To make AI systems truly effective, LLMs need the ability to interact with the real world. On their own, LLMs are great at generating text and reasoning, but they're inherently limited: they can't fetch external information, perform actions, or control other systems without help from external tools. To bridge this gap, LLMs need agency — the ability to act by using tools, triggering workflows, or making decisions based on their outputs.
This is the core idea behind AI agents. An AI agent is a program where the LLM guides the workflow, and its level of control determines how much agency it has.
Take a multi-step agent, for example. It operates in a loop: the LLM decides the next action, the system executes it, collects the results, and the cycle repeats until it achieves a satisfactory outcome.
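The loop above can be sketched in a few lines of plain Python. Note that this is an illustrative toy, not the smolagents API: the scripted model below stands in for a real LLM, and all names (`run_agent`, `scripted_model`, the `tools` dict) are hypothetical.

```python
# Minimal sketch of the multi-step agent loop (illustrative, not smolagents code).
def run_agent(model, tools, task, max_steps=5):
    """Loop: the model picks an action, the system executes it,
    and the observation feeds the next step, until a final answer."""
    memory = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model(memory)                # LLM decides the next action
        if action["tool"] == "final_answer":  # model is satisfied with the outcome
            return action["input"]
        result = tools[action["tool"]](action["input"])  # system executes the tool
        memory.append(f"Observation: {result}")          # result is collected
    raise RuntimeError("max steps reached without a final answer")

# Toy demo: a scripted "model" that first searches, then answers.
def scripted_model(memory):
    if any(m.startswith("Observation:") for m in memory):
        return {"tool": "final_answer",
                "input": memory[-1].removeprefix("Observation: ")}
    return {"tool": "search", "input": "capital of France"}

tools = {"search": lambda query: "Paris"}  # stub tool for the demo
print(run_agent(scripted_model, tools, "What is the capital of France?"))  # → Paris
```

A real agent framework fills in the two hard parts this sketch glosses over: prompting the LLM so it emits a parseable action, and managing the growing memory of past observations.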
Thanks to the smolagents framework from Hugging Face, creating local AI agents has become incredibly simple.
In this article, we’ll explore how to build our own AI agent with smolagents. As an example, we’ll create an agent that can search the web and retrieve data from a web page, all on a single GPU. Along the way, we’ll examine the challenges of building and using an AI agent and highlight what to watch out for. Instead of just showcasing a polished, fully functional agent (as most tutorials do), I’ll demonstrate where things can go wrong, explain why, and share some insights into these shortcomings.
I’ve also made a notebook implementing an AI agent that performs web searches and handles Retrieval-Augmented Generation (RAG) on web page content. You can check it out here: