
Hugging Face’s Unified API: Standardizing Tool Use Across Top AI Models from Mistral, Cohere, Nous, and Meta

2024-08-15


When first announced, tool use in large language models (LLMs) was a game-changer.


It allowed LLMs to access external functions, like calculators, web searches, or databases, making them far more reliable and specific in their responses.


But as any developer who’s tried to implement this knows, getting tool use to work smoothly across different models can be a real headache.


Hugging Face has a solution!


The new unified tool use API in Hugging Face Transformers standardizes the process across popular models from Mistral, Cohere, Nous, and Meta, reducing the friction of implementation and letting you focus on building great AI-driven products, so that your customers can finally stop complaining.


Let me explain exactly what it solves and walk you through a brief example.



The Unified Tool Use API: What It Solves


In theory, using tools with LLMs is straightforward. But in practice, the implementation can be frustratingly complex, especially when different models require different formats and methods.


The unified API eliminates this mess by offering a single, consistent interface for tool use across multiple models. Whether you’re using Mistral, Cohere, Nous, or Llama models, you can now write code that works universally, without worrying about model-specific quirks.


This consistency is crucial for developers who want to build scalable and maintainable AI platforms and products.


Chat Templating: Making Tool Use Intuitive


Apart from tool use, managing different chat formats for various models is also a hassle.


Hugging Face solved that with chat templates, and now they’ve extended this solution to tool use.


With the unified API, you can define tools in a universal format, and the chat templates handle all the model-specific formatting behind the scenes.


For developers, this means less boilerplate code and fewer bugs related to formatting mismatches. Just pass your Python functions, and let the API handle the rest.
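

To make that concrete, here is a minimal sketch that renders the same tool-augmented prompt for two checkpoints from the families covered in this post. The checkpoint names are illustrative and each is assumed to ship a tool-aware chat template; tokenize=False is used purely so you can print and inspect the model-specific formatting.

from transformers import AutoTokenizer

def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "city, country"
    """
    return 22.0

chat = [{"role": "user", "content": "Hey, what's the weather like in Paris right now?"}]

# Illustrative checkpoints; each is assumed to ship a tool-aware chat template.
for checkpoint in ["NousResearch/Hermes-2-Pro-Llama-3-8B", "mistralai/Mistral-7B-Instruct-v0.3"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Identical call for every model; the chat template injects the tool
    # definitions in whatever format that particular model was trained on.
    prompt = tokenizer.apply_chat_template(
        chat,
        tools=[get_current_temperature],
        tokenize=False,
        add_generation_prompt=True,
    )
    print(f"--- {checkpoint} ---\n{prompt}")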


Practical Implementation: A Developer’s Guide


OK, time to get hands-on, but let me explain one more important detail.


The new API automatically converts Python functions into JSON schemas, which the models can understand and use.


This approach not only saves time but also ensures that your tools are defined consistently, regardless of the model you’re working with.


Plus, the API supports manual JSON schema input if you’re coding in another language, giving you flexibility without sacrificing ease of use.
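

If you want to see (or hand-write) those schemas, Transformers exposes the converter as get_json_schema in transformers.utils. The sketch below uses a hypothetical multiply tool; the hand-written dict is roughly what the converter produces, and a dict in that shape can be passed in the tools list in place of a Python function.

from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    Multiplies two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a * b

# The schema built from the type hints and docstring.
print(get_json_schema(multiply))

# A hand-written equivalent, e.g. for tools implemented outside Python.
manual_tool = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiplies two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "The first number."},
                "b": {"type": "number", "description": "The second number."},
            },
            "required": ["a", "b"],
        },
    },
}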


Before we see that in action, let me add:


If you want to build full-stack GenAI SaaS products that people love — don’t miss out on our upcoming cohort-based course. Together, we’ll build, ship, and scale your GenAI product alongside a community of like-minded people!


How Does Hugging Face’s Unified API Work?


Initializing a model as usual:

import torch  
from transformers import AutoTokenizer, AutoModelForCausalLM  

checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"  

tokenizer = AutoTokenizer.from_pretrained(checkpoint)  
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")

Next, define a simple tool function for the model. The type hints and the docstring are mandatory: they are parsed into a schema that tells the model what the tool does and what arguments it expects.

def get_current_temperature(location: str):  
    """  
    Gets the temperature at a given location.  

    Args:  
        location: The location to get the temperature for, in the format "city, country"  
    """  
    return 22.0  # bug: Sometimes the temperature is not 22. low priority to fix tho  

tools = [get_current_temperature]

Then set up a simple chat.

chat = [  
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}  
]

Pass the tools to the chat template, and generate text from the model using the formatted prompt.

tool_prompt = tokenizer.apply_chat_template(  
    chat,  
    tools=tools,  
    return_tensors="pt",  
    return_dict=True,  
    add_generation_prompt=True,  
)  
tool_prompt = tool_prompt.to(model.device)  

out = model.generate(**tool_prompt, max_new_tokens=128)  
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]  

print(tokenizer.decode(generated_text))

# Output  
<tool_call>  
{"arguments": {"location": "Paris, France"}, "name": "get_current_temperature"}  
</tool_call><|im_end|>

The model has chosen to call a tool, and picked an argument that matches both the user’s request and the format in the tool docstring!
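

In the next step the tool call dict is written out by hand; in a real application you would usually parse it out of the generated text instead. Here is a minimal sketch for this Hermes-style output (the <tool_call> tags are specific to this checkpoint's template, so other models will need different parsing):

import json
import re

# Pull the JSON payload out of the <tool_call> tags and dispatch to the
# matching Python function by name.
decoded = tokenizer.decode(generated_text)
match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", decoded, re.DOTALL)
if match:
    parsed_call = json.loads(match.group(1))
    fn = {f.__name__: f for f in tools}[parsed_call["name"]]
    result = fn(**parsed_call["arguments"])  # 22.0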


Let’s add this tool call to the chat as a message:

tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}  
chat.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

Next, add the tool response containing the function output to the chat. Both the tool call (containing the arguments passed to the tool) and tool response (containing the tool’s output) must be included in the chat history.


chat.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

Finally, apply the chat template and generate text once again.

tool_prompt = tokenizer.apply_chat_template(  
    chat,  
    tools=tools,  
    return_tensors="pt",  
    return_dict=True,  
    add_generation_prompt=True,  
)  
tool_prompt = tool_prompt.to(model.device)  

out = model.generate(**tool_prompt, max_new_tokens=128)  
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]  

print(tokenizer.decode(generated_text))

# Output  
The current temperature in Paris, France is 22.0 degrees Celsius. Enjoy your day!<|im_end|>

Awesome!


Why This Matters for Your Projects


Simplifying how tools integrate with LLMs makes your code more portable and easier to maintain.


Whether you’re building chatbots, virtual assistants, or any other AI-driven application, this API helps you focus on what really matters: creating great user experiences without getting bogged down by model-specific implementation details.


Building with LLMs


And don’t forget to have a look at some practitioner resources that we published recently:


Say Hello to ‘Her’: Real-Time AI Voice Agents with 500ms Latency, Now Open Source

Fine-Tune Meta’s Latest AI Model: Customize Llama 3.1 5x Faster with 80% Less Memory

Fine Tuning FLUX: Personalize AI Image Models on Minimal Data for Custom Look and Feel

Data Management with Drizzle ORM, Supabase and Next.js for Web & Mobile Applications


Thank you for stopping by, and being an integral part of our community.


Happy building!