Function Calling: Fine-tuning Llama 3 and Qwen2 on xLAM


Fast and memory-efficient thanks to QLoRA

Benjamin Marie
Jul 22, 2024


Recent large language models (LLMs) are highly capable in most language generation tasks. However, since they operate based on next-token prediction, they often struggle with accurately performing mathematical operations. Additionally, due to their knowledge cut-off, they may lack the information needed to answer some queries accurately.

One way to alleviate these issues is through function calling. Function calling allows LLMs to reliably connect to external tools. It enables effective tool usage and interaction with external APIs. For example, retrieving information from the Internet and performing mathematical operations can be accomplished through function calling by interfacing the LLM with a web search engine and a calculator.
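In practice, the model is given a machine-readable description of each available tool, typically as a JSON schema, and is trained to answer with a structured call to one of those tools instead of free text. The snippet below is only an illustration of what such a description can look like; the sqrt tool and its parameters are hypothetical and not taken from any particular API.

# A hypothetical tool description in the JSON-schema style used by most
# function-calling datasets and APIs. The model picks a tool and fills in its arguments.
calculator_tool = {
    "name": "sqrt",
    "description": "Compute the square root of a number.",
    "parameters": {
        "type": "object",
        "properties": {
            "number": {
                "type": "number",
                "description": "The number to take the square root of.",
            }
        },
        "required": ["number"],
    },
}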


In this article, we will see how to fine-tune LLMs for function calling. I use xLAM, a dataset of 60k function-calling examples released by Salesforce, to fine-tune Llama 3 and Qwen2. We will see how to format the dataset and how to exploit the fine-tuned adapters for function calling.
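To give a first idea of the data, here is a minimal sketch of loading xLAM and turning one entry into a prompt/completion pair. The column names (query, tools, answers) follow the dataset card of Salesforce/xlam-function-calling-60k; the prompt template itself is an arbitrary choice for illustration, not the exact formatting used later in the article.

from datasets import load_dataset

# xLAM is gated on the Hugging Face Hub: accept the terms and log in before loading it
dataset = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

def format_example(example):
    # "tools" lists the available functions and "answers" the expected call(s),
    # both stored as JSON strings
    prompt = "Available tools:\n" + example["tools"] + "\n\nQuery: " + example["query"]
    return {"prompt": prompt, "completion": example["answers"]}

dataset = dataset.map(format_example)
print(dataset[0]["prompt"])
print(dataset[0]["completion"])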

I also made this notebook implementing the fine-tuning, along with some inference examples:

Get the notebook (#89)

Function Calling for LLMs: How Does It Work?

For instance, if you prompt a standard LLM with “Give me the square root of 3342398”, it will generate the answer one digit at a time, and the result will probably be very inaccurate. Let’s try it with Llama 3 Instruct:

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load Llama 3 8B Instruct in bfloat16 and let Accelerate place it on the available device(s)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a calculator."},
    {"role": "user", "content": "Give me the square root of 3342398"},
]

# Llama 3 ends assistant turns with <|eot_id|>, so treat it as an extra end-of-sequence token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Print the assistant's reply (the last message in the generated conversation)
print(outputs[0]["generated_text"][-1])

The results:

The square root of 3342398 is 1833.13

It’s close but wrong. The right answer is 1828.222634144977. LLMs can’t do math accurately: they can reason about math problems and approximate results, but they won’t be as accurate as a calculator because they are not designed for it. However, they can call a calculator. We simply need the model to “understand” that we are requesting the square root of the number 3342398.
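Concretely, with function calling the model is expected to emit a structured call that we then execute ourselves. The snippet below is a hand-written illustration of that flow, not actual model output: the sqrt tool and the JSON format mirror the hypothetical schema shown earlier.

import json
import math

# What a fine-tuned model would ideally generate for the query above:
# a structured call rather than a guessed number (hypothetical format)
model_output = '{"name": "sqrt", "arguments": {"number": 3342398}}'

call = json.loads(model_output)
if call["name"] == "sqrt":
    result = math.sqrt(call["arguments"]["number"])
    print(result)  # 1828.222634144977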
