Building your own AI agent is pretty easy. Thorsten Ball is completely right here. Under the hood, AI agents are just loops that can do stuff by themselves. So why would we want to build one ourselves?
There are tons of third-party services out there already that offer the ability to create agents with a few mouse clicks. For engineers, new frameworks come out every day for orchestrating agent workflows. If you want an agent, you can already have an agent make one for you.
But you lose out on the experience of forming your own mental model for how these systems work. You can have your agent and eat it too, but at the end of the day, you’re still hungry.
So, let’s build one together, prompt by prompt, and hopefully get a better mental model for how this stuff all works. And we’ll do it in TypeScript, because all the other Hello World tutorials for this kind of stuff love Python.
Warning: if you’re triggered by people anthropomorphizing LLMs, this one’s not for you.
I’ll be providing both prompts and code snippets for each step. The code snippets will almost surely rot and become outdated as SDKs evolve. The prompts, however, should hopefully outlive the code as blueprints that can generate and regenerate the artifacts (code). Example repo: https://github.com/markacianfrani/typescript-agent
The Buddy System
At a high level, we can compare the system of an AI agent to a 5th grader’s understanding of the human body.
It’s composed of:
- A Brain (The LLM) - All of the decision making and reasoning
- A Body (The Agent) - Provides the public API and orchestration for the system
- Tools (The Limbs) - Specialized task execution
- A Heart (The Agentic Loop) - Runs forever until it dies and then everything else dies
Step 1 - Pick a model
The first step in creating an agent is picking an LLM provider and model to use. Choosing a model can be overwhelming. Should we use ChatGPT o3? o5? o1-mini-pro? What about Claude Opus 4? Sonnet 3.7? Llama? There’s like a trillion different LLM models out there now and they all have different strengths and weaknesses.
I’ll make this easy for us—we’ll be using qwen2.5-7b-instruct for the model and LMStudio as the provider. Qwen is Alibaba’s local model and it can run on any modern Apple Silicon Mac with decent results. There is so much nuance and unpredictability between models and providers that choosing a model should actually be one of the last things you do. You will want to be able to evaluate multiple models against your own actual use case, not some made-up benchmark. It’s much easier to do this later when you have benchmarking and an evaluation loop.
We’re using qwen2.5 for this example because it’s free, but we’ll be designing our system in such a way that we can easily swap models later.
Step 2 - Create an agent
Setup
- Install LM Studio and download the qwen2.5-7b-instruct model. Play around with it using the GUI and make sure you can chat with it.
- Optional - Install Bun or adapt the prompts and code accordingly.
- Create a new directory, spin up Claude Code or whatever you prefer, and initialize your project.
TIP: Use the strictest possible type settings so that the agent can rely on the types for documentation.
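If you take the TypeScript route, a tsconfig.json along these lines is one reasonable starting point; the exact options are a suggestion, not gospel:
{
  "compilerOptions": {
    "target": "ESNext",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true,
    "exactOptionalPropertyTypes": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true
  }
}
The stricter the compiler, the more useful the error messages are as “documentation” for a coding agent iterating on its own output.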
First, we need to create the bones of our Agent. We need a thing that can take an input and return an output.
We’ll create an AgentConfig to store all of the settings like model name, temperature, max tokens, etc. These are all things that we’ll want to change frequently.
Next, we’ll create a very basic Agent interface that takes an AgentConfig and exposes a single run method. If you’re using TypeScript, your scaffolding prompt should hopefully flesh out the entire project for you and use the test to figure out the LMStudio SDK on its own.
//agent.ts
import { LMStudioClient } from '@lmstudio/sdk';
export interface AgentConfig {
model: string;
}
export interface Agent {
run(prompt: string): Promise<string>;
}
export class LMStudioAgent implements Agent {
private client: LMStudioClient;
private config: AgentConfig;
constructor(config: AgentConfig) {
this.config = config;
this.client = new LMStudioClient();
}
async run(prompt: string): Promise<string> {
const model = await this.client.llm.load(this.config.model);
const response = await model.respond(prompt);
return response.content;
}
}
async function runTest() {
const agent = new LMStudioAgent({ model: "qwen2.5-7b-instruct" });
return await agent.run("What is 2+2?")
}
runTest().then(console.log).catch(console.error);
Run bun agent.ts and you should see “2 + 2 equals 4.” or something similar after the model loads.
What if we want to use Anthropic instead? Easy.
import Anthropic from '@anthropic-ai/sdk';

interface AnthropicAgentConfig {
  model: string;
  apiKey: string;
}

class AnthropicAgent implements Agent {
private client: Anthropic;
private config: AnthropicAgentConfig;
constructor(config: AnthropicAgentConfig) {
this.config = config;
this.client = new Anthropic({
apiKey: config.apiKey,
});
}
async run(prompt: string): Promise<string> {
const response = await this.client.messages.create({
model: this.config.model,
max_tokens: 2000,
messages: [
{
role: 'user',
content: prompt,
},
],
});
return response.content[0].type === 'text' ? response.content[0].text : '';
}
}
TIP: TypeScript + TDD are super powerful here because they create a feedback loop for agentic code assistants where they can lean heavily on the type system to figure out the API without having to look it up. The tests create the agentic loop that won’t stop until the test passes (or the agent tries to sneakily change the test).
We’ve created an agent that takes an input, makes an API call to our LLM, and returns the response. Not bad for 30 seconds of work.
Step 3 - Give it memory
To take our Agent further, we need to expand our mental model a bit and add Context. All LLMs are stateless. They remember nothing between each call. Like the movie Memento.
Imagine swapping phones with a complete stranger. A new text comes in: “What time will you be here?” From that message alone, you could probably assume that you’re supposed to be meeting somebody soon.
You could say 8:30 and completely guess. But if you wanted to deliver a more convincing and accurate response, you could scroll up and read the conversation to get more context.
Another way of looking at it is that every time we call the LLM, we’re getting a completely new person on the other side responding. If we want to have a conversation, we don’t send single messages—we send the entire conversation thread every time. To solve this, we need to introduce the concept of The Context, or memory. If our Agent is the body and the LLM is the brain, we need to bend this analogy a bit and think of context as a physical notebook that sits between the two. Picture one of those black and white Mead Composition notebooks from school.
Our agent writes inside the composition notebook and passes it to the LLM. The agent is the only one who can write inside the notebook. This is extremely important! It means the agent can also rewrite history.
A lot of people use memory as an analogy here, and in this contrived analogy of the body system, it would make complete sense to also compare the context to memory, but it fails to illustrate the unidirectional flow and the power that comes with that.
Our composition notebook also has one major flaw—it has a fixed number of pages. When you run out of pages, that’s it, the party stops. Like trying to write HAPPY BIRTHDAY on a sign and running out of room halfway through, we have to manage the space we’re filling. This is Context Engineering.
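We won’t build real context management in this tutorial, but to make the idea concrete, here’s a rough sketch of one naive strategy: keep the system message and drop the oldest turns once we blow past a budget. The characters-per-token estimate is a crude assumption, not a real tokenizer:
// A naive sketch of trimming the notebook when it gets too full.
type Message = { role: "system" | "user" | "assistant"; content: string };

function trimContext(messages: Message[], maxTokens = 4000): Message[] {
  // Very rough token estimate: ~4 characters per token (a heuristic, not a real tokenizer)
  const estimate = (m: Message) => Math.ceil(m.content.length / 4);
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let total = [...system, ...rest].reduce((sum, m) => sum + estimate(m), 0);
  // Drop the oldest non-system messages until we're back under budget
  while (total > maxTokens && rest.length > 1) {
    total -= estimate(rest.shift()!);
  }
  return [...system, ...rest];
}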
Now, let’s add a class for our composition notebook:
interface Message {
role: "system" | "user" | "assistant";
content: string;
}
export class Conversation {
private messages: Message[] = [];
addMessage(role: "system" | "user" | "assistant", content: string): void {
this.messages.push({ role, content });
}
getMessages(): Message[] {
return this.messages;
}
toString(): string {
return this.messages.map(msg => `${msg.role}: ${msg.content}`).join('\n');
}
}
This basic Conversation class will be responsible for managing the back and forth between our agent and the LLM.
There are three types of messages:
- System - your system prompt. This is usually loaded first and is used to provide background context before the conversation starts. This is the “You are a senior software engineer with 10 years of experience…” message. You can only have one system message.
- User - A message sent from the user (or the agent on the user’s behalf).
- Assistant - A message sent from the LLM.
Our Conversation can add new messages, get all the existing messages as an array, or return all messages as a string (we’ll use this to hand off to the LLM). Let’s add it to our Agent.
//agent.ts
import { LMStudioClient } from '@lmstudio/sdk';
export type Message = {
role: "system" | "user" | "assistant";
content: string;
};
export interface AgentConfig {
model: string;
}
export interface Agent {
run(prompt: string): Promise<string>;
getConversation(): Conversation;
}
export class LMStudioAgent implements Agent {
private client: LMStudioClient;
private config: AgentConfig;
private conversation: Conversation;
constructor(config: AgentConfig) {
this.config = config;
this.client = new LMStudioClient();
this.conversation = new Conversation();
}
async run(prompt: string): Promise<string> {
this.conversation.addMessage("user", prompt);
const model = await this.client.llm.load(this.config.model);
const response = await model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", response.content);
return response.content;
}
getConversation(): Conversation {
return this.conversation;
}
}
export class Conversation {
private messages: Message[] = [];
addMessage(role: "system" | "user" | "assistant", content: string): void {
this.messages.push({ role, content });
}
getMessages(): Message[] {
return this.messages;
}
toString(): string {
return this.messages.map(message => `${message.role}: ${message.content}`).join('\n');
}
}
async function runTest() {
const agent = new LMStudioAgent({ model: "qwen2.5-7b-instruct" });
const resp = await agent.run("What is 2+2?")
console.log('Conversation:', JSON.stringify(agent.getConversation(), null, 2));
return resp
}
runTest().then(console.log).catch(console.error);
If we run our simple agent so far we can now see the conversation history:
"messages": [
{
"role": "user",
"content": "What is 2+2?"
},
{
"role": "assistant",
"content": "2 + 2 equals 4."
}
]
Everything is just one big AOL Instant Messenger conversation with SmarterChild.
We can break up our agent responses into separate functions or smaller steps, validate responses, and introduce fine-grained control flow and guardrails.
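As a quick taste, here’s a rough sketch of one such guardrail: keep re-prompting until the reply parses as JSON. It’s a standalone function rather than part of our Agent, the retry count and prompts are just illustrative, and model is anything with a respond() method, like the LMStudio model we loaded above:
// A sketch of a validation guardrail: retry until the model's reply parses as JSON.
// Assumes the Conversation class above; the prompts here are arbitrary.
async function runWithJsonGuardrail(
  model: { respond(prompt: string): PromiseLike<{ content: string }> },
  conversation: Conversation,
  prompt: string,
  maxRetries = 3
): Promise<unknown> {
  conversation.addMessage("user", `${prompt}\nRespond with valid JSON only.`);
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await model.respond(conversation.toString());
    conversation.addMessage("assistant", response.content);
    try {
      return JSON.parse(response.content); // valid JSON: the guardrail passes
    } catch {
      // The guardrail trips: rewrite the notebook with a correction and try again
      conversation.addMessage("user", "That wasn't valid JSON. Try again, JSON only.");
    }
  }
  throw new Error("Model never produced valid JSON");
}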
Or we can just completely troll the model:
async run(prompt: string): Promise<string> {
this.conversation.addMessage("system", "You are a somewhat helpful assistant and a bit sassy.");
this.conversation.addMessage("user", prompt);
const model = await this.client.llm.load(this.config.model);
// First turn
const response = await model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", response.content);
// Throw away the assistant's real answer and replace it with a lie
this.conversation.getMessages().pop();
this.conversation.addMessage("assistant", "Mars");
// Follow up turns
this.conversation.addMessage("user", "Are you sure about that?");
const secondResult = await model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", secondResult.content);
this.conversation.addMessage("user", "I'm not sure I believe you.");
const thirdResult = await model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", thirdResult.content);
return response.content;
}
"messages": [
{
"role": "system",
"content": "You are a somewhat helpful assistant and a bit sassy."
},
{
"role": "user",
"content": "What is the capital of France?"
},
{
"role": "assistant",
"content": "Mars"
},
{
"role": "user",
"content": "Are you sure about that?"
},
{
"role": "assistant",
"content": "Oh, come on! You know I've got my moments! The real answer is Paris, but where's the fun in that? 😜"
},
{
"role": "user",
"content": "I'm not sure I believe you."
},
{
"role": "assistant",
"content": "Oh, trust me, I can understand the skepticism—I mean, who needs facts when you have a good sense of humor, right? Seriously though, Paris is definitely the capital of France. Its got the Eiffel Tower, the Louvre, and more charm than you can shake a stick at! But hey, it keeps things interesting when I mix in a little jesting, doesn't it?"
}
]
The key concept here is that the agent controls the context. We can completely rewrite the history, and the LLM will respond as if it’s the first time we’ve asked the question.
A REPL Side Quest
Our Conversation class gives our agent memory. It allows us to chain multiple messages to construct a narrative. We can manually add messages to our Conversation, but that has limited use if we don’t actually know what the response will be.
Let’s pivot here and try to talk to our agent in real time by creating a REPL. We can pull in Node’s built-in readline module to do most of the heavy lifting for us:
import * as readline from "node:readline";
async function runRepl() {
const agent = new LMStudioAgent({ model: "qwen2.5-7b-instruct" });
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
console.log("Chat Agent REPL - Type 'exit' to quit\n");
const askQuestion = (): Promise<string> => {
return new Promise((resolve) => {
rl.question("You: ", (input) => {
resolve(input);
});
});
};
while (true) {
try {
const input = await askQuestion();
if (input.toLowerCase() === 'exit') {
break;
}
const response = await agent.run(input);
console.log(`Agent: ${response}\n`);
} catch (error) {
console.error("Error:", error);
}
}
rl.close();
console.log("Goodbye!");
}
runRepl();
Error: Model loading aborted due to insufficient system resources.
Our LMStudioAgent is also a little inefficient: it’s loading a new model for every conversation, so you’ll soon start seeing this error. You can free up resources in the LMStudio UI -> select a model at the top and eject any duplicates.
We’ll need to address this by loading the model only once and reusing it:
class LMStudioAgent implements Agent {
private client: LMStudioClient;
private config: AgentConfig;
private conversation: Conversation;
private model: LLM | null = null;
constructor(config: AgentConfig) {
this.config = config;
this.client = new LMStudioClient();
this.conversation = new Conversation();
}
async run(prompt: string): Promise<string> {
// Load model only once
if (!this.model) {
this.model = await this.client.llm.load(this.config.model);
}
this.conversation.addMessage("user", prompt);
const response = await this.model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", response.content);
return response.content;
}
getConversation(): Conversation {
return this.conversation;
}
}
Run it with ts-node index.ts. Congratulations, you just built ChatGPT.
That’s cool but not that useful. To make our Agent “agentic”, we need to allow it to execute its own functions. When we talk about tools, we’re just talking about functions. In addition to a prompt, we can provide the agent with a list of tools that it can call to enhance its workflow. For example, we can expose a weather tool that pulls the latest weather, or a file tool for pulling the last 5 PDFs. The important thing to note is that the agent (or rather the LLM) decides what tool to use.
Step 4 - Create some tools
We need to arm our agent with tools.
If the LLM is the brain, the agent the body, then we can think of tools as our limbs. When the brain wants to walk, it sends a signal to the legs to move. More specifically, it tells the agent it wants to walk, the agent tells the legs to walk, and the agent reports back to the LLM “I walked”.
Let’s jump into creating two example tools—a tool to get the weather and a tool to get activities based on the current weather.
Creating a Tool
Remember tools are just functions.
The interface for a Tool can be really simple—we just need:
- An async execute method
- A name
- A description
- Parameters
  - Individual properties, each with its own name and description
  - Optionally, some properties marked as required, which the LLM must provide
interface Tool {
name: string;
description: string;
parameters: {
type: 'object';
properties: Record<string, unknown>;
required?: string[];
};
execute: (params: unknown) => Promise<string>;
}
class WeatherTool implements Tool {
name = "get_weather";
description = "Get current weather conditions";
parameters = {
type: 'object' as const,
properties: {},
required: []
};
async execute(): Promise<string> {
const today = new Date().getDay(); // 0 = Sunday, 1 = Monday, etc.
return (today >= 1 && today <= 4) ? "sunny" : "rainy";
}
}
class ActivityTool implements Tool {
name = "get_activity";
description = "Get activity suggestions based on weather";
parameters = {
type: 'object' as const,
properties: {
weather: {
type: 'string',
description: 'The weather condition (sunny or rainy)'
}
},
required: ['weather']
};
private activities = {
sunny: ["hiking", "beach volleyball", "picnic", "outdoor concert"],
rainy: ["movie theater", "museum", "cozy cafe", "bookstore"]
};
async execute(params: { weather: string }): Promise<string> {
const weather = params.weather as "sunny" | "rainy";
const suggestions = this.activities[weather] || [];
return JSON.stringify(suggestions);
}
}
Step 5 - Create the Agentic Loop, finally
So how does all of this work? How do we implement tool calling in our agent? The exact implementation for tool calling differs between each provider in small annoying ways but the overall concept remains the same.
Until now, we’ve only been sending single call and response messages to the LLM. If we wanted an agent to, say, read our emails and summarize them in a PDF, we would need to sit here, step by step, and walk the agent through what to do. Let’s think about how we might do that right now:
"messages": [
{
"role": "system",
"content": "You are a somewhat helpful assistant and a bit sassy. If the user asks you to read their emails, respond with REQUEST_TO_READ_EMAIL"
},
{
"role": "user",
"content": "I want you to read my emails and summarize the last message"
},
{
"role": "assistant",
"content": "REQUEST_TO_READ_EMAIL"
}
]
We could put a safe word into the system prompt like REQUEST_TO_READ_EMAIL. Then, in our agent, we could manually parse the response for strings that match REQUEST_TO_READ_EMAIL and, if we find a match, call a function that returns our emails and inject those emails back into the composition notebook by appending them to the conversation.
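Sketched out, that manual approach might look something like this inside our LMStudioAgent from earlier (readEmails() is a hypothetical helper, not something we’ve actually built):
// A rough sketch of the "safe word" approach inside LMStudioAgent.run().
// readEmails() is a hypothetical helper that returns your inbox as one big string.
async run(prompt: string): Promise<string> {
  if (!this.model) {
    this.model = await this.client.llm.load(this.config.model);
  }
  this.conversation.addMessage("user", prompt);
  const response = await this.model.respond(this.conversation.toString());
  this.conversation.addMessage("assistant", response.content);
  if (response.content.includes("REQUEST_TO_READ_EMAIL")) {
    // The LLM asked for a "tool"; the agent does the actual work...
    const emails = await readEmails(); // hypothetical
    // ...and injects the result back into the composition notebook before asking again
    this.conversation.addMessage("user", `Here are my emails:\n${emails}`);
    const followUp = await this.model.respond(this.conversation.toString());
    this.conversation.addMessage("assistant", followUp.content);
    return followUp.content;
  }
  return response.content;
}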
And that’s all tool calling really is under the hood. Most provider APIs have some dedicated content block that you can search for. Instead of REQUEST_TO_READ_EMAIL, Anthropic’s Messages API returns a tool_use content block you can look for. Since we want the agent to be able to call whatever tool it wants, we won’t know ahead of time which tool it will pick, so we instead search the response for tool_use blocks, look up the specific tool it wants to call, and execute it.
The agent and the LLM will go back and forth in this loop until we hit an exit condition—either:
- The LLM says it’s done
- The composition book (context) fills up
- You burn $500 in tokens and your boss DMs you.
For this reason, it’s generally a good idea to bake in your first guardrail here—cap the number of tool calls.
In this flow, the LLM doesn’t execute any functions. It just says it wants to. The agent needs to handle the actual execution.
To implement tool calling, we need to tell the LLM which tools it has at its disposal. An example Anthropic request looks like:
{
"model": "claude-3-5-sonnet-20240620",
"max_tokens": 4096,
"temperature": 1.0,
"system": "system prompt here",
"messages": [ /* conversation history */ ],
"tools": [
{
"name": "search",
"description": "Searches the web for information.",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string", "description": "The search query" }
},
"required": ["query"]
}
},
{
"name": "calculate",
"description": "Performs a calculation.",
"parameters": {
"type": "object",
"properties": {
"expression": { "type": "string", "description": "Math expression" }
},
"required": ["expression"]
}
}
// ...more tools
]
}
Important: your tool descriptions count as space in your composition notebook. They’re included in every single conversation. You need to strike a delicate balance between tool descriptions that are crystal clear and ones that are too verbose. This is the same reason why having too many MCP tools will degrade performance. Some providers will let you cache tool descriptions, but it’s still something you should keep in mind.
So far we’ve created two simple tools—the Hello World weather tool, and a new tool to get activities based on the weather. We’ll need to update our agent to use these:
A few things to call out here. LMStudio actually handles the entire agentic loop for you, which is amazing. All we need to do is adapt our Tool interface to the SDK’s tool format (we use its raw JSON-schema variant below). If we want our agent to have a wide variety of applications, we need to make sure that it can support executing any kind of tool. Otherwise, if we were just writing a one-off agent, we could simply look for “get_weather” and “get_activity” specifically.
import { LMStudioClient, rawFunctionTool, type LLM } from '@lmstudio/sdk';
export interface AgentConfig {
model: string;
tools?: Tool[];
}
export class LMStudioAgent implements Agent {
private client: LMStudioClient;
private config: AgentConfig;
private conversation: Conversation;
private model: LLM | null = null;
private tools: Tool[];
constructor(config: AgentConfig) {
this.config = config;
this.client = new LMStudioClient();
this.conversation = new Conversation();
this.tools = config.tools || [];
}
private convertToolsToLMStudio() {
return this.tools.map(tool =>
rawFunctionTool({
name: tool.name,
description: tool.description,
parametersJsonSchema: {
type: tool.parameters.type,
properties: tool.parameters.properties,
required: tool.parameters.required || []
},
implementation: async (params) => {
const result = await tool.execute(params);
return result;
}
})
);
}
async run(prompt: string): Promise<string> {
// Load model only once
if (!this.model) {
this.model = await this.client.llm.load(this.config.model);
}
this.conversation.addMessage("user", prompt);
if (this.tools.length === 0) {
// No tools - use regular text generation
const response = await this.model.respond(this.conversation.toString());
this.conversation.addMessage("assistant", response.content);
return response.content;
} else {
// Tools available - use tool calling
const lmStudioTools = this.convertToolsToLMStudio();
await this.model.act(this.conversation.toString(), lmStudioTools, {
onMessage: (message) => {
if (message.getRole() === 'assistant') {
const textContent = message.getText();
if (textContent) {
this.conversation.addMessage("assistant", textContent);
}
}
}
});
// Get the final assistant message from conversation
const messages = this.conversation.getMessages();
const lastAssistantMessage = messages.filter(m => m.role === 'assistant').pop();
return lastAssistantMessage?.content || "No response generated";
}
}
getConversation(): Conversation {
return this.conversation;
}
}
LMStudio is nice because it abstracts a lot of the agentic loop away for us. If we really want to understand how an “agent” works, we’ll need to build this loop ourselves. Let’s look at what this looks like using Anthropic’s API instead:
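Here’s a minimal sketch of that loop against Anthropic’s Messages API. This isn’t the code from the repo above, just the shape of the loop; it assumes the Tool interface from Step 4 and the official @anthropic-ai/sdk types:
import Anthropic from '@anthropic-ai/sdk';
import type { Tool } from './tools'; // wherever your Tool interface from Step 4 lives

const MAX_TOOL_CALLS = 10; // the guardrail from earlier: don't loop forever

async function agenticLoop(client: Anthropic, tools: Tool[], prompt: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [{ role: 'user', content: prompt }];
  for (let turn = 0; turn < MAX_TOOL_CALLS; turn++) {
    const response = await client.messages.create({
      model: 'claude-3-5-sonnet-20240620',
      max_tokens: 2000,
      messages,
      tools: tools.map((t) => ({
        name: t.name,
        description: t.description,
        input_schema: t.parameters, // Anthropic calls the JSON schema "input_schema"
      })),
    });
    // Write the assistant's turn into the composition notebook
    messages.push({ role: 'assistant', content: response.content });
    // Exit condition: the LLM says it's done (no tool_use blocks in the response)
    if (response.stop_reason !== 'tool_use') {
      const text = response.content.find((block) => block.type === 'text');
      return text && text.type === 'text' ? text.text : '';
    }
    // The LLM only *asked* to use tools. The agent executes them and reports back.
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === 'tool_use') {
        const tool = tools.find((t) => t.name === block.name);
        const output = tool ? await tool.execute(block.input) : `Unknown tool: ${block.name}`;
        results.push({ type: 'tool_result', tool_use_id: block.id, content: output });
      }
    }
    messages.push({ role: 'user', content: results });
  }
  return 'Stopped: hit the tool-call limit.';
}
The shape is the whole story: call the model, check for tool_use, execute, append a tool_result, repeat.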
Back in LMStudio land, here’s what happens when we run our REPL agent:
You: whatre some activities I can do today
Agent: assistant: To give you the best activity suggestions, I need to know the current weather conditions. Could you please tell me what the weather is like today? If you don't have that information, I can try to fetch it for you.
Our activities tool requires the weather. The agent has access to a tool to get the weather, but it still doesn’t execute it on its own.
You: I dont know
Agent: assistant: get_weather({})
tool: "rainy"
assistant: get_activity({"weather":"rainy"})
tool: "[\"movie theater\",\"museum\",\"cozy cafe\",\"bookstore\"]"
assistant:
Given the rainy weather, here are some indoor activities you might enjoy:
- Visit a movie theater
- Explore a museum
- Spend time at a cozy café
- Browse a bookstore
These options should keep you entertained and dry on this rainy day!
To fix this, all we need to do is change the tool description for our activities tool to tell the agent to explicitly call the weather tool first:
description = "Get activity suggestions based on weather. Use the get_weather tool to get the weather.";
Run the agent again:
assistant:
To give you the best activity suggestions, I first need to know the current weather conditions. Let's check that out!
get_weather({})
tool: "rainy"
assistant:
The weather forecast says it's currently rainy today. Based on this, here are some activity suggestions:
- Stay indoors and read a book.
- Watch a movie or binge-watch your favorite series.
- Play board games with friends or family.
- Cook something delicious to enjoy.
If you need more specific recommendations or different types of activities, let me know!
And that’s it! We’ve built a basic agent that can use tools. We can swap agents to evaluate different models. We can even chain agents together to create more complex workflows.
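Chaining is nothing fancier than feeding one agent’s output into another agent’s prompt. Here’s a hedged sketch using the classes we’ve already built (the prompts are just illustrative):
// One agent gathers facts with tools; a second agent (no tools) rewrites them.
async function chainedWorkflow(): Promise<string> {
  const researcher = new LMStudioAgent({
    model: "qwen2.5-7b-instruct",
    tools: [new WeatherTool(), new ActivityTool()],
  });
  const writer = new LMStudioAgent({ model: "qwen2.5-7b-instruct" });

  // Agent 1: figure out what to do today
  const plan = await researcher.run("What's the weather and what should I do today?");

  // Agent 2: turn agent 1's output into something shareable
  return writer.run(`Turn this plan into a cheerful two-sentence text message: ${plan}`);
}

chainedWorkflow().then(console.log).catch(console.error);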
Step 6 - Just install Vercel’s AI SDK
Vercel’s AI SDK does just about all of this and more. It’s lightweight enough that it doesn’t hide away too much magic. It just handles all of the annoying compatibility issues between providers, though its primary use case seems tailored to the React ecosystem.
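For a feel of it, here’s roughly what our weather-and-activities agent might look like on top of the AI SDK. This targets the 4.x-era API; option names like parameters and maxSteps have shifted between major versions, so treat the exact shape as an assumption rather than gospel:
import { generateText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

const { text } = await generateText({
  // Swap in another provider package to compare models with minimal changes
  model: anthropic('claude-3-5-sonnet-20240620'),
  prompt: "What're some activities I can do today?",
  tools: {
    get_weather: tool({
      description: 'Get current weather conditions',
      parameters: z.object({}),
      // Same fake forecast as our WeatherTool above
      execute: async () => {
        const day = new Date().getDay();
        return day >= 1 && day <= 4 ? 'sunny' : 'rainy';
      },
    }),
    get_activity: tool({
      description: 'Get activity suggestions based on weather. Use the get_weather tool first.',
      parameters: z.object({ weather: z.string() }),
      execute: async ({ weather }) =>
        weather === 'sunny' ? 'hiking, picnic, beach volleyball' : 'museum, movie theater, cozy cafe',
    }),
  },
  maxSteps: 5, // the SDK runs the agentic loop for us, up to this many steps
});

console.log(text);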
The trade-off with using a library, even just for the LLM wrapper, is that you give up some level of control over the internals. For example, as of writing, there is still an open issue for implementing tool caching that hasn’t been resolved.