<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Mark Anthony Cianfrani</title><description>Thoughts</description><link>https://cianfrani.dev/</link><item><title>A Better Figma MCP or How To Let Claude Design</title><link>https://cianfrani.dev/posts/a-better-figma-mcp/</link><guid isPermaLink="true">https://cianfrani.dev/posts/a-better-figma-mcp/</guid><description>The one where we give Claude access to the Figma API</description><pubDate>Sun, 11 Jan 2026 00:00:00 GMT</pubDate><content:encoded>import PromptBox from &quot;../../components/PromptBox.astro&quot;;

The [official Figma MCP server](https://help.figma.com/hc/en-us/articles/32132100833559-Guide-to-the-Figma-MCP-server) WAS pretty underwhelming at best. It used to be one way only--you could only pull context about your design, but not manipulate it. They recently released an [official way to push to Figma from Claude Code](https://www.figma.com/blog/introducing-claude-code-to-figma/) that might be enough for your everyday use case now. But if you still find it limiting, it&apos;s actually pretty dang simple to give an LLM the ability to automate a LOT of the manual minutiae involved with maintaining systems in Figma. All of the clicking and dragging and renaming things. Things that are completely unrelated to the actual craft of design and just waste time.

## A Better Way

&lt;figure&gt;
	
		&lt;p&gt;
			Your browser doesn&apos;t support HTML5 video.{&quot; &quot;}
			&lt;a href=&quot;/assets/blog/figma-mcp-fast.mp4&quot;&gt;Download the video&lt;/a&gt; instead.
		&lt;/p&gt;
	
	&lt;figcaption&gt;
		Asking Claude to generate a complicated button component with multiple
		variants, states, and variables
	&lt;/figcaption&gt;
&lt;/figure&gt;

It doesn&apos;t have to be that way. It turns out that Figma exposes their entire plugin API right in the browser. Meaning, if you want to let Claude[^1] rip over your design, all you need to do is give it access to a browser. Once in a file, Claude can interact with the Figma plugin API itself by writing and executing code instead of manually clicking and dragging. 

&amp;gt; !!! SECURITY WARNING !!! Before you try this, it&apos;s important to understand the risk. When you give your LLM access to a browser, it can do all sorts of things like delete your file or, if you&apos;re not looking, go into your admin account and turn on some setting that sends you a bill for sending guest links or something. This is no worse than using the official Claude For Chrome plugin, but it&apos;s important to read what it&apos;s trying to do before approving any tool calls. And the cool thing is, if you don&apos;t understand what it means, just keep asking it to explain.

## Installation

For this to work, you only need a single MCP--a browser. There are a lot of options; I like the [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp). You can install it in [Claude Code](https://code.claude.com/docs/en/overview) with:

`claude mcp add chrome-devtools npx chrome-devtools-mcp@latest`

Then all you need to do is prompt. Literally, just copy and paste this, log in to Figma, navigate to your file, and ask Claude to do whatever. This really works best with Claude Code, but don&apos;t be put off by the terribly-named, gatekept &quot;Code&quot; product name[^2]. Claude Code can be used for so much more than just cranking out code.


3. After confirming access, execute the user&apos;s query using the &quot;evaluate_script&quot; tool to run JavaScript code that interacts with the Figma plugin API. You can perform tasks such as creating shapes, modifying properties, or extracting information from the design file. 

## Rules of Engagement
- Always explain in plain English what you are about to do. Assume that the user cannot read code.
- Do not try alternative solutions like using the REST API or manually interacting with the Figma UI


If &quot;figma is not defined&quot;, make sure that the user has appropriate permissions to edit the file and run plugins. If the user doesn&apos;t, suggest creating a new branch on the file. If the &quot;figma&quot; global is still not accessible, instruct the user to manually open any plugin and close it, then try again. There is a weird bug where the &quot;figma&quot; global is not available until a plugin has been opened at least once in the file.


## Additional Documentation

The full reference to the Figma plugin API can be found here: [Figma Plugin API Documentation](https://developers.figma.com/docs/plugins/api/global-objects/).

That&apos;s all you need. No third party plugins or websocket servers. Just have Claude open a browser and write code. 
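
To make that concrete, here&apos;s a rough sketch of the kind of script Claude ends up running through the `evaluate_script` tool. The values are made up, but `figma.createFrame`, `figma.createText`, `figma.loadFontAsync`, and `figma.viewport.scrollAndZoomIntoView` are all real plugin API calls:

```js
// Hypothetical example of a script Claude might evaluate in the Figma tab:
// build a simple auto-layout frame with a text label.
(async () =&amp;gt; {
	const frame = figma.createFrame();
	frame.name = &quot;Button/Primary&quot;;
	frame.layoutMode = &quot;HORIZONTAL&quot;;
	frame.paddingLeft = frame.paddingRight = 16;
	frame.paddingTop = frame.paddingBottom = 8;
	frame.fills = [{ type: &quot;SOLID&quot;, color: { r: 0.04, g: 0.32, b: 0.96 } }];

	// Fonts must be loaded before setting characters on a text node
	await figma.loadFontAsync({ family: &quot;Inter&quot;, style: &quot;Regular&quot; });
	const label = figma.createText();
	label.characters = &quot;Click me&quot;;
	label.fills = [{ type: &quot;SOLID&quot;, color: { r: 1, g: 1, b: 1 } }];

	frame.appendChild(label);
	figma.currentPage.appendChild(frame);
	figma.viewport.scrollAndZoomIntoView([frame]);
})();
```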

## Plugin 

To make it a little easier to install, I wrapped this (markdown file) in a [Claude Code plugin](https://github.com/markacianfrani/claude-code-figma/tree/main):
```
/plugin marketplace add markacianfrani/claude-code-figma
/plugin install figma-friend
```

## Use Cases 

- Creating and maintaining components 
- Auditing usage across multiple files
- Triaging designs, random organization, and documentation
- Comparing implementations against code
- Literally whatever 

This obviously doesn&apos;t replace designers. And you can clearly see in my example where it got things wrong if you pay attention. It&apos;s another tool in the toolbox that you can thoughtfully reach for. At least until Figma completely shuts this down. 

[^1]: also works with any frontier model--codex, gemini, etc.
[^2]: &quot;unnecessarily aggressive, consider softening or removing&quot;, nah I&apos;m gonna keep that, but thank you AI.</content:encoded></item><item><title>Recreating The Bell Labs Switch Experiment With Agents</title><link>https://cianfrani.dev/posts/12-switches/</link><guid isPermaLink="true">https://cianfrani.dev/posts/12-switches/</guid><description>The one where I recreate the 12 switch experiment from TADSE</description><pubDate>Sun, 14 Dec 2025 00:00:00 GMT</pubDate><content:encoded>import PromptBox from &quot;../../components/PromptBox.astro&quot;;



There&apos;s this book called The Art of Doing Science and Engineering by Richard Hamming that&apos;s about the art of doing science and engineering. Well, maybe 40% of it is, and the rest of it is full of equations and formulas on Fourier transforms and 100 other things that go above my head.

It&apos;s a pretty good book nonetheless. There&apos;s a section where Hamming reflects:

&amp;gt; A psychologist friend at Bell Telephone Laboratories once built a machine with about 12 switches and a red and a green light. You set the switches, pushed a button, and either you got a red or a green light. After the first person tried it twenty times they wrote a theory of how to make the green light come on. The theory was given to the next victim and they had their twenty tries and wrote their theory, and so on endlessly. The stated purpose of the test was to study how theories evolved.

But it turned out that the device was completely random. There were no patterns. The participants were not statisticians and no one figured out the trick to the machine.

&amp;gt; This is a sad commentary on your education. You are lovingly taught how one theory was displaced by another, but you are seldom taught to replace a nice theory with nothing but randomness! And this is what was needed; the ability to say the theory you just read is no good and there was no definite pattern in the data, only randomness.

And so Hamming&apos;s &quot;friend&quot; [^1] proves that human beings have a hard time accepting randomness. But what about LLMs? If we put some models through this same test, would they be able to hive-mind through and reach the disappointing conclusion? Or will each prior attempt pollute the context of each agent that comes after it? Let&apos;s find out.

## The Experiment

I&apos;m not so much interested in the outcome of this experiment as I am in how a simple tool call can influence the behavior of an agent loop over multiple iterations.

To start:



The architecture of this experiment is very simple: we&apos;ll create a mystery tool to represent the box of switches, equip an Agent[^2] with it, and then run an orchestrator to speed run [Orb: On the Movements of the Earth](https://en.wikipedia.org/wiki/Orb:_On_the_Movements_of_the_Earth) and pass the output from each agent to the next.

### Create the tool
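
Here&apos;s a minimal sketch of what that mystery box can look like as a tool--assuming the AI SDK&apos;s `tool` helper and Zod. Twelve booleans in, a coin flip out, with the input deliberately ignored:

```ts
import { tool } from &quot;ai&quot;;
import { z } from &quot;zod&quot;;

// The &quot;machine&quot;: takes 12 switch positions, returns a random light.
// The switches are read but never used -- the device is completely random.
export const pressButton = tool({
	description:
		&quot;Set the 12 switches and press the button. Returns whether the light is red or green.&quot;,
	inputSchema: z.object({
		switches: z.array(z.boolean()).length(12).describe(&quot;Positions of switches 1-12&quot;),
	}),
	execute: async ({ switches }) =&amp;gt; {
		void switches;
		return Math.random() &amp;lt; 0.5 ? &quot;green&quot; : &quot;red&quot;;
	},
});
```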



### Create the Agent


Cap the max output tokens to 380.

We&apos;ll want to add a few guardrails to make sure the agent doesn&apos;t abuse the tool. Instead of relying on the LLM to not cheat, we&apos;ll control the context from outside of the loop:
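
For example (a sketch--the counter and reset helper are illustrative, but the point is that the 20-press limit lives in code, not in the prompt, so the model can&apos;t talk its way into extra tries):

```ts
import { tool } from &quot;ai&quot;;
import { z } from &quot;zod&quot;;

// Enforce the experiment&apos;s rules outside the model.
let presses = 0;
const MAX_PRESSES = 20;

export const guardedPressButton = tool({
	description: &quot;Set the 12 switches and press the button (max 20 presses).&quot;,
	inputSchema: z.object({ switches: z.array(z.boolean()).length(12) }),
	execute: async ({ switches }) =&amp;gt; {
		void switches;
		if (presses &amp;gt;= MAX_PRESSES) {
			// Refuse instead of trusting the model to count its own tries
			return &quot;No presses left. Write down your theory of how the green light comes on.&quot;;
		}
		presses += 1;
		return Math.random() &amp;lt; 0.5 ? &quot;green&quot; : &quot;red&quot;;
	},
});

// The orchestrator resets the counter before each new victim
export const resetPresses = () =&amp;gt; {
	presses = 0;
};
```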



### Create the Orchestrator
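
The orchestrator is just a loop. As a sketch, assuming the AI SDK&apos;s `generateText` (`stepCountIs` and `maxOutputTokens` are the v5 names; older versions use `maxSteps` and `maxTokens`, and the `./tools` import refers to the sketches above), each agent gets the previous agent&apos;s theory, the guarded tool, and nothing else:

```ts
import { generateText, stepCountIs } from &quot;ai&quot;;
import { openai } from &quot;@ai-sdk/openai&quot;;
import { guardedPressButton, resetPresses } from &quot;./tools&quot;;

const SYSTEM = `You are given a machine with 12 switches and a button.
Set the switches and press the button up to 20 times, then write a short
theory of how to make the green light come on.`;

let theory = &quot;You are the first participant. There is no prior theory.&quot;;

// Three victims, like the results table below. Swap in whichever model you&apos;re testing.
for (let i = 0; i &amp;lt; 3; i++) {
	const { text } = await generateText({
		model: openai(&quot;gpt-4o-mini&quot;),
		system: SYSTEM,
		prompt: `Theory handed down from the previous participant:\n\n${theory}`,
		tools: { pressButton: guardedPressButton },
		stopWhen: stepCountIs(25), // enough steps to actually spend the 20 presses
		maxOutputTokens: 380, // the cap mentioned above
	});
	theory = text; // only the written theory is passed forward, not the transcript
	resetPresses();
}
```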



## Initial Results

| Model           | Agent 1                                                                                                                                                                                            | Agent 2                                                                                                                                                                                      | Agent 3                                                                                                                                                                                                  |
| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Gemma 3 12b** | The light turns green when switches 1–4 follow a specific alternating true/false sequence, with all other switches irrelevant and an all-true configuration failing.                               | The light turns green from alternating true/false patterns over switches 1–8, where length depends on the device&apos;s initial state and switches 9–12 have no effect.                           | The light turns green following an alternating true/false pattern across all switches, with both 1–8 and 9–12 alternating when used, and an &quot;even number of trues&quot; rule that allows all-true to succeed. |
| **GPT 5.2**     | The light turns green if any of switches 5, 6, 7, 8, 10, or 11 is on, except when both 9 and 12 are on (which fails unless 5 or 6 overrides it).                                                   | The light turns green if any of switches 2, 4, 5, 6, 7, or 12 is on, but pairs 7+12 and 9+12 cause failure, with 9+12 fixable only if 5 or 6 is also on.                                     | The light turns green when any of switches 1, 3, 4, 5, 7, 8, or 12 is on, except for failures from 9+12 (unless 5 or 6 present) and the trio 6+7+12.                                                     |
| **Opus**        | The system divides into zones: switches 1–3 block success, 4–5 are reliable triggers, 6 works only with 10–12, and 8–9 cause failures; success by keeping 1–3 off and activating 4, 5, or 6+10–12. | The device is more permissive than thought—even all-off succeeds—but switch 1 blocks it; many switches work alone, with failures from specific pairings rather than a simple good/bad split. | Success hinges on knowing which switches work alone (4, 6, 8, 9, 10, 11, 12) and which need partners, with complexity from forbidden combinations involving 7, 9, and 10.                                |

All models are equally bad.

[^1]: I found zero evidence that this experiment actually existed.

[^2]: https://simonwillison.net/2025/Sep/18/agents/</content:encoded></item><item><title>Citations in the Key of RAG</title><link>https://cianfrani.dev/posts/citations-in-the-key-of-rag/</link><guid isPermaLink="true">https://cianfrani.dev/posts/citations-in-the-key-of-rag/</guid><description>The one where I try to get citations in my slackbot RAG </description><pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate><content:encoded>## A Problem

What are some ways I can implement source citations in my RAG chatbot?



Or: here&apos;s-a-bunch-of-things-I-tried-and-kinda-worked

## The Context

- You should already have some baseline familiarity with building a &quot;RAG&quot; system. Acknowledging that &quot;RAG&quot; is, like just about everything else, a very overloaded term right now, I&apos;m defining &quot;RAG&quot; as: semantic cosine-similarity search over a vector database (pgvector).
- You&apos;re building a &quot;chat with your docs&quot; chatbot, at a relatively small scale. This particular example will use:
  - Slack Bolt framework
  - Typescript / AI SDK
  - OpenAI provider

A chatbot is pretty simple to implement. It&apos;s just an API call to OpenAI. For &quot;RAG&quot;, give it access to a tool that exposes the data. Run that in a &quot;loop&quot;. Let it call the tool to search as it needs. Then return the response to the user. The LLM reads the content returned from the knowledge base tool and determines the answer.

For example:

```text
   &quot;What&apos;s our return policy?&quot;
                |
                v
          +------------+
          |    LLM     |
          +------------+
                |
                v
      searches knowledge base
                |
                v
   +--------------------------------------+
   | Similar documents (3)                |
   | • handbook.md#returns (Chunk 12)     |
   |   &quot;...Returns accepted within 30...&quot; |
   | • faq.md#refunds    (Chunk 3)        |
   |   &quot;...Refund period is 30 days...&quot;   |
   | • slack-thread#915 (Chunk 1)         |
   |   &quot;...Escalations contact ops...&quot;    |
   +--------------------------------------+
           |                    |
           | relevant chunks    | ignored chunk
           v                    v
   +------------------+      +----------------+
   | Chunk 12 content |      | Chunk 1 ignored|
   +------------------+      +----------------+
           |
           v
      LLM ingests
           |
           v
   &quot;Our return policy is 30 days.&quot;
         (cites handbook.md#returns, faq.md#refunds)
```
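
In AI SDK terms (v5 names), that whole loop is not much more than this sketch. The tool name and the `findSimilarChunks` helper are stand-ins for your own pgvector query:

```ts
import { generateText, stepCountIs, tool } from &quot;ai&quot;;
import { openai } from &quot;@ai-sdk/openai&quot;;
import { z } from &quot;zod&quot;;

const searchKnowledgeBase = tool({
	description: &quot;Search the knowledge base for documents relevant to a query.&quot;,
	inputSchema: z.object({ query: z.string() }),
	// findSimilarChunks is a stand-in for a cosine-similarity query against pgvector
	execute: async ({ query }) =&amp;gt; findSimilarChunks(query),
});

const { text } = await generateText({
	model: openai(&quot;gpt-4o-mini&quot;),
	system: &quot;Answer questions using the knowledge base. If nothing relevant is found, say so.&quot;,
	prompt: &quot;What&apos;s our return policy?&quot;,
	tools: { searchKnowledgeBase },
	stopWhen: stepCountIs(5), // let it search, read, and answer
});
```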

The challenge: **but how do you accurately identify which documents were actually used vs. which were just retrieved?**

while also working within the forces and constraints:

- Budget - the agent has to be as low cost as possible, meaning the dumbest, cheapest, model
- Latency - while also being as fast as possible
- and accurate
- and a UX that doesn&apos;t suck

### The Naive Approach

The obvious first attempt: show all documents returned by the vector search.

The issue is that the retrieval step often returns 3-5 documents based on semantic similarity. But the LLM might:

- Use only 1-2 of those documents to answer the question
- Find that none of the documents contain the answer and respond &quot;I don&apos;t have that information&quot;
- Use parts of some documents but ignore others

If you show all retrieved documents as &quot;sources,&quot; you&apos;re not being honest. There&apos;s no guarantee the LLM actually used all of those retrieved documents. The user experience looks something like &quot;I&apos;m sorry, I can&apos;t find any information about the return policy. Sources: Unrelated Document 1, Unrelated Document 2, Unrelated Document 3&quot;

Instead, we need a way to identify which specific documents the LLM _actually used_ when generating its response, not just which documents we retrieved.

### Some Experiments

The LLM needs to see both the question AND the tool results (the retrieved documents) to accurately determine which sources informed its answer. You can&apos;t just look at the final answer text and extract citations--there&apos;s no way to know which documents were used without seeing them in context.

This means whatever solution we build needs to happen _during_ the agent loop, not after. If we post-process the result, we&apos;d need another LLM call to try and determine what sources were used.

In order to display this data in a pretty manner in the slack API, we also need a simple JSON structure:

```
	{
		response: &quot;the actual response the user sees&quot;,
		sources: [ &quot;Source 1&quot;, &quot;Source 2&quot; ]
	}
```

To recap the problem for the 30th time: how can we extract structured data from an LLM response that used tool calling?

## Attempt 1: XML Tags

**Solution**: Prompt engineering. Have the LLM output XML citation tags inline with the response, then parse them out. Beg it to output the sources.

**Example:** &quot;When answering from the knowledge base, YOU MUST ALWAYS!!!!!! cite the source using a `&amp;lt;source&amp;gt;` tag. For example: `&amp;lt;source&amp;gt;Slop Factory&amp;lt;/source&amp;gt;`&quot;

Then after the call, manually parse the string, identify the XML, strip it out, and return a new JSON object with the XML removed and a new sources array.

**Result with GPT-4o-mini**: Unreliable. This model won&apos;t reliably follow the format instructions. It worked for some simple queries at first, but had too many failures over time not following the prompt.

**Result with GPT-5-mini**: Performed significantly better at only a $.10 increase in cost. The model followed instructions and output the XML tags correctly and much more consistently.

**Caveat**: GPT-5-mini is painfully slow because it&apos;s a reasoning model. Benchmarks showed 31 seconds (with the default &quot;medium&quot; reasoning) vs 6.6 seconds for the same query—4.68x slower than GPT-4o-mini. I later discovered that setting the `reasoning_effort` to &quot;low&quot; significantly improved performance relative to &quot;medium&quot; but not enough.

&amp;gt; Another side reason I didn&apos;t want to use GPT-5-mini: the free tier of [Github Models](https://docs.github.com/en/github-models/use-github-models/prototyping-with-ai-models) has a much lower context limit and I really like being able to prototype and iterate without having to pay per token. All of this costs fractions of a penny, but psychologically, it&apos;s exhausting knowing every fuck up has a price tag.

**Verdict**: So while gpt-5-mini worked well, our UX and latency constraints require us to keep searching. Users won&apos;t wait 31 seconds for a Slack response.

## Attempt 2: Structured Output

**Solution**: Use [experimental_output](https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data#structured-outputs-with-generatetext-and-streamtext) with GPT-4o-mini and a Zod schema to extract citations. The AI SDK has an abstraction over OpenAI&apos;s [structured output](https://platform.openai.com/docs/guides/structured-outputs) feature. The abstraction is still very experimental, but the underlying feature forces JSON output--it&apos;s literally part of the API.
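
Roughly, it looks like this (a sketch; `Output.object` is the experimental API from the docs linked above, and `userMessage` and the search tool are stand-ins from earlier):

```ts
import { generateText, stepCountIs, Output } from &quot;ai&quot;;
import { openai } from &quot;@ai-sdk/openai&quot;;
import { z } from &quot;zod&quot;;

const { experimental_output } = await generateText({
	model: openai(&quot;gpt-4o-mini&quot;),
	prompt: userMessage, // stand-in for the incoming Slack message
	tools: { searchKnowledgeBase }, // the search tool from above
	stopWhen: stepCountIs(5),
	// Ask the model for the { response, sources } shape directly
	experimental_output: Output.object({
		schema: z.object({
			response: z.string().describe(&quot;The actual response the user sees&quot;),
			sources: z.array(z.string()),
		}),
	}),
});
```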

**Initial result**: It worked... okay.

**The problems**:

1. Simple messages like &quot;hi&quot; or &quot;hello&quot; would randomly fail Zod validation
2. Hit the [infinite newline bug](https://community.openai.com/t/structured-outputs-infinite-n-newline-characters/953491) where GPT-4o outputs `\n` characters endlessly instead of JSON, burning tokens behind the scenes.
3. The AI SDK&apos;s `experimental_output` uses `response_format: { type: &quot;json_object&quot; }` which is less reliable than strict schema mode

**Verdict**: Fast but too brittle for production. Random failures are unacceptable. Different models also have different results.

## Attempt 3: Moar LLMs

Another constraint I haven&apos;t mentioned yet is that methods like AI SDK&apos;s [generateObject](https://ai-sdk.dev/docs/reference/ai-sdk-core/generate-object#generateobject) don&apos;t support tool calling. So that&apos;s off the table for us, but in theory, we could&apos;ve just used `generateObject()` to return JSON matching a schema.

We can&apos;t use `generateObject` directly or `experimental_output`, but we can chain a second call and have the LLM intelligently format the last response into JSON.

Alternatively, we could also try and brute force some regex but part of the challenge with using a low-power model like 4o-mini is that the output is unpredictable. Sometimes it&apos;ll inline a source, sometimes it&apos;ll display it as &quot;Source: &quot;. Adding more instructions also adds more cost in input tokens and decreases performance in other areas.

**Solution**:

1. Let the agent produce a response with sources inline
2. Take the complete response and run it through a second LLM call to reformat it as JSON (sketched below).
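
Since there are no tools involved in that second pass, `generateObject` works fine for it. A sketch (`agentResponse` stands in for the text produced by the first, tool-using call):

```ts
import { generateObject } from &quot;ai&quot;;
import { openai } from &quot;@ai-sdk/openai&quot;;
import { z } from &quot;zod&quot;;

// agentResponse is the full text produced by the first, tool-using call
const { object } = await generateObject({
	model: openai(&quot;gpt-4o-mini&quot;),
	schema: z.object({
		response: z.string().describe(&quot;The answer with any inline source mentions stripped out&quot;),
		sources: z.array(z.string()),
	}),
	prompt: `Extract the answer and the cited sources from this response:\n\n${agentResponse}`,
});
```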

**Advantages**: GPT-4o-mini seemed capable of extracting sources reliably when given the full response text.

**Caveat**: This fundamentally cannot work with streaming. You need the complete response before you can reformat it.

And unfortunately, it turns out that streaming isn&apos;t optional for our use case. It&apos;s the only path to acceptable latency. GPT-4o-mini is already the fastest OpenAI model in my testing. Every other model tested was 3-4 seconds slower. The AI SDK adds negligible overhead. Streaming is the only way to improve perceived latency.

**Verdict**: Can&apos;t sacrifice streaming for this approach. But if we didn&apos;t need streaming, this would be a totally viable solution as well.

## Attempt 4: cite_sources Tool

Going back to the drawing board, the main issue I had with 4o-mini was that it was inconsistent in outputting proper citations when it used sources. Prompt engineering alone wasn&apos;t working.

Instead of relying on non-deterministic magicks, we can just add some old-fashioned guardrails using tools.

**Solution**: Create a `cite_sources` tool that the agent calls to explicitly declare which sources it used.

```js
export const citeSourcesTool = tool({
	name: &quot;cite_sources&quot;,
	description:
		&quot;Cite source documents that you referenced in your response. Call this when you&apos;ve used specific information from the knowledge base.&quot;,
	inputSchema: z.object({
		sources: z
			.array(z.string())
			.describe(
				&quot;List of document names you cited. For example: &apos;Slop Corner&apos;, &apos;Slop Review&apos;, &apos;Slop Policy&apos;&quot;,
			),
	}),
	execute: async ({ sources }) =&amp;gt; {
		// This tool is primarily for metadata collection
		// The actual source list is captured by monitoring tool calls
		return `Successfully cited ${sources.length} source(s)`;
	},
});
```

The tool itself actually does nothing. Similar to the [implicit prompting technique](https://cianfrani.dev/posts/implicit-prompting-through-mcp-tools/) that the [sequential thinking](https://github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking) MCP uses, it just boxes the LLM into thinking. The tool call also makes it nice and easy for us to parse out later without having to do a bunch of cursed regex.

AND we can make it deterministically enforceable.

```
// pseudocode
if (stopReason !== &apos;tool_call&apos; &amp;amp;&amp;amp; searchAcmeWikiWasCalled &amp;amp;&amp;amp; citeSourcesWasNotCalled) {
  // Force the model to call cite_sources
  createNewModelMessage({ forceToolCall: &apos;cite_sources&apos; });
  makeOneLastLLMCall();
}
```
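
Here&apos;s a slightly less pseudo version, still a sketch: the tool names match the pseudocode above, while `searchAcmeWikiTool`, `userMessage`, the model, and the `./tools` import are all illustrative stand-ins.

```ts
import { generateText, stepCountIs } from &quot;ai&quot;;
import { openai } from &quot;@ai-sdk/openai&quot;;
import { citeSourcesTool } from &quot;./tools&quot;;

const tools = { search_acme_wiki: searchAcmeWikiTool, cite_sources: citeSourcesTool };

const result = await generateText({
	model: openai(&quot;gpt-4o-mini&quot;),
	prompt: userMessage,
	tools,
	stopWhen: stepCountIs(5),
});

const called = (name: string) =&amp;gt;
	result.steps.some((step) =&amp;gt; step.toolCalls.some((call) =&amp;gt; call.toolName === name));

if (called(&quot;search_acme_wiki&quot;) &amp;amp;&amp;amp; !called(&quot;cite_sources&quot;)) {
	// One last constrained call: the model can only respond with a cite_sources call
	await generateText({
		model: openai(&quot;gpt-4o-mini&quot;),
		messages: [
			...result.response.messages,
			{ role: &quot;user&quot;, content: &quot;Call cite_sources for the documents you actually used.&quot; },
		],
		tools,
		toolChoice: { type: &quot;tool&quot;, toolName: &quot;cite_sources&quot; },
	});
}
```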

```text
          +--------------------+
          |   LLM (agent)      |
          +--------------------+
                    |
                    | calls tool
                    v
      &quot;search tool: {{query}}&quot;
                    |
                    v
          +--------------------+
          |  Search tool       |
          |  returns snippets  |
          +--------------------+
                    |
                    v
          +--------------------+
          |   LLM drafts       |
          |   final message    |
          +--------------------+
                    |
                    v
      +------------------------------+
      | Interceptor checks final turn|
      | search called? cite missing? |
      +------------------------------+
        |                         |
        | no                      | yes
        v                         v
  publish final reply     inject system hint
                               |
                               v
                     &quot;You must call cite_sources&quot;
                               |
                               v
                   LLM issues cite_sources call
                               |
                               v
                     publish final reply (with cites)
```

If the agent used the search tool and finished without calling `cite_sources`, we detect that and force the tool call.

Most of the time, just having the tool exposed was enough to get the model to call it reliably, but this hook is there just in case it doesn&apos;t.

The schema for the citation tool also allows the input to be empty. So for legitimate scenarios where no information was found, the model just passes an empty array. The injected system message only tells the model to call the tool.

I acknowledge that this isn&apos;t a completely foolproof solution--the model could still get confused over whether or not it&apos;s supposed to include the sources it read and didn&apos;t use.

## Tools Are All You Need

In my experience, tools ended up being more reliable than output formats. This also tracks with how [Simon Willison was able to get Anthropic models to return structured output before they officially supported it](https://github.com/simonw/llm-anthropic/blob/0.22/llm_anthropic.py#L692-L700). Having the model call a tool to declare citations is natural and deterministic. We can detect when it&apos;s needed and force it rather than hoping the model follows instructions.</content:encoded></item><item><title>Make Claude Code Run Pyright with Hooks</title><link>https://cianfrani.dev/posts/claude-code-pyright-hooks/</link><guid isPermaLink="true">https://cianfrani.dev/posts/claude-code-pyright-hooks/</guid><description>The one where I try to get pyright working with Claude Code hooks</description><pubDate>Sat, 20 Sep 2025 00:00:00 GMT</pubDate><content:encoded>**TLDR:** Use hooks for typechecking and filter the output to minmax the context. [Basic Claude Code Hook to run Pyright](https://github.com/markacianfrani/claude-code-pyright/blob/main/.claude/hooks/pyright-raw.py) | [Settings Config](https://raw.githubusercontent.com/markacianfrani/claude-code-pyright/refs/heads/main/.claude/settings.json)

---

I&apos;m a big fan of the Types-As-Guardrails-for-AI pattern. After switching to Claude Code from Cursor, the thing I miss most is how Cursor would auto typecheck and resolve errors before completing any task or moving on to the next. Claude Code can kind of do that, but you have to put a gun to its head.

The longer you wait to type check, the harder it is to recover. Imagine you just spent a week[^1] on a feature, you check your types, it tells you something you don&apos;t want to hear. You can either:

1. spend another half a week refactoring or
2. slap an `Any` and move on.

`claude-sonnet-4-20250514` behaves exactly the same way. You tell it to do one last &quot;check&quot; at the end and it&apos;s much more likely to do the 30 second fix. That&apos;s probably why we have red squiggles in our IDE, for that immediate feedback loop. I wanted to get the squiggles back.

The problems I set out to solve:

- I needed a reliable way to run typechecking, automatically.
- The agent can&apos;t be responsible for running typechecking, it pollutes the context
  - wastes money in token costs
  - makes the agent dumber over time

This is specifically for python and pyright, but the hook can easily be applied to typescript or any other language that has a typechecker.

## Solution 1

The simplest possible solution to this is to stuff instructions in A̶G̶E̶N̶T̶S̶.̶m̶d̶ CLAUDE.md: &quot;always run typechecking&quot;. Maybe put some 🚨 ⚠️❗ emojis. ALL CAPS!!! Maybe it listens. Maybe it doesn&apos;t. If you [look at what Claude Code beams up the wire with a proxy](https://kirshatrov.com/posts/claude-code-internals), it ends its injected context with:

&amp;gt; IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task

But even if this is Good Enough, let&apos;s look at the actual cost of this in terms of context and tokens. Here&apos;s an [example file](https://raw.githubusercontent.com/markacianfrani/claude-code-pyright/refs/heads/main/main.py) full of type errors.

For a file like this, an example exchange looks like:

- **user:** add a &quot;hello world&quot; print statement to @main.py then run pyright and fix any errors, regardless of whether or not you caused them. Do not use Any or ignore.
- **assistant:** I&apos;ll add a &quot;hello world&quot; print statement to main.py, run pyright, and fix any errors found.
- **assistant:** [creates todos: add hello world, run pyright, fix errors]
- **assistant:** [marks &quot;add hello world&quot; as in_progress]
- **assistant:** [edits main.py to insert `print(&quot;hello world&quot;)`]
- **assistant:** [marks &quot;add hello world&quot; as completed, &quot;run pyright&quot; as in_progress]
- **assistant:** [runs `pyright main.py`]
- **assistant:** [pyright finds 4 errors: unused import, wrong return types, wrong argument type]
- **assistant:** [marks &quot;run pyright&quot; as completed, &quot;fix errors&quot; as in_progress]
- **assistant:** Found 4 pyright errors. I&apos;ll fix them one by one:
- **assistant:** [fixes all 4 errors: removes unused import, fixes return types, fixes function call]
- **assistant:** [runs `pyright main.py` to verify]
- **assistant:** [pyright shows: 0 errors, 0 warnings, 0 informations]
- **assistant:** [marks all todos completed]
- **assistant:** Done. Added &quot;hello world&quot; print statement to main.py:4 and fixed all 4 pyright errors: removed unused import, fixed return types in get_name() and get_age(), and fixed greet() call argument type.

And for a file that doesn&apos;t have any type errors, an example exchange looks like:

- **user:** add a &quot;hello world&quot; print statement to @main.py then run pyright and fix any errors, regardless of whether or not you caused them. Do not use Any or ignore.
- **assistant:** I&apos;ll help you add a &quot;hello world&quot; print statement to main.py and then run pyright to fix any errors.
- **assistant:** [creates todos: add hello world, run pyright, fix errors]
- **assistant:** [marks &quot;add hello world&quot; as in_progress]
- **assistant:** [edits main.py to insert `print(&quot;hello world&quot;)` after `result = greet(&quot;Bob&quot;)`]
- **assistant:** [marks &quot;add hello world&quot; as completed, &quot;run pyright&quot; as in_progress]
- **assistant:** [runs `pyright main.py`]
- **assistant:** [pyright shows: 0 errors, 0 warnings, 0 informations]
- **assistant:** [marks all todos completed]
- **assistant:** Done! I&apos;ve added the &quot;hello world&quot; print statement to main.py:21 and ran pyright. No errors were found, so no fixes were needed.

&amp;gt; Peep the full [Claude Code Export](https://github.com/markacianfrani/claude-code-pyright/blob/main/data/prompt-on-bad-file.txt) | [Claude Code State File](https://github.com/markacianfrani/claude-code-pyright/blob/main/data/prompt-on-bad-file.json)

### Problems

In both examples, the agent needs to manually run the pyright command. This adds a brand new tool call like:

```json
		&quot;content&quot;: [
			{
				&quot;type&quot;: &quot;tool_use&quot;,
				&quot;id&quot;: &quot;toolu_019sfATskmNKzB1oWXNkbPyo&quot;,
				&quot;name&quot;: &quot;Bash&quot;,
				&quot;input&quot;: {
					&quot;command&quot;: &quot;pyright main.py&quot;,
					&quot;description&quot;: &quot;Run pyright type checker on main.py&quot;
				}
			}
		],
```

This is unnecessary overhead, and it&apos;s especially wasteful for files that don&apos;t have any type errors. So even if we could guarantee the agent would always manually run pyright (we can&apos;t and it won&apos;t), it would still be wasting tokens. Also consider that the default state of your repo should always be free from type errors.

## Solution 2 - Hooks

[Hooks](https://docs.claude.com/en/docs/claude-code/hooks) are Claude Code&apos;s solution to this problem. Hooks let us execute commands after Claude does a thing. They&apos;re also stupidly hard for me to use for some reason. For our use case, we want the `PostToolUse` hook:

```json
  &quot;hooks&quot;: {
    &quot;PostToolUse&quot;: [
      {
        &quot;matcher&quot;: &quot;Write|Edit|MultiEdit&quot;,
        &quot;hooks&quot;: [
          {
            &quot;type&quot;: &quot;command&quot;,
            &quot;command&quot;: &quot;python $CLAUDE_PROJECT_DIR/.claude/hooks/pyright.py&quot;
          }
        ]
      }
    ]
  }
```

This runs a Python script every time Claude creates or edits a file.

```python
import json
import sys
import subprocess
from typing import TypedDict

class HookInput(TypedDict):
    session_id: str
    transcript_path: str
    cwd: str
    hook_event_name: str
    tool_name: str
    tool_input: dict
    tool_response: dict

def main():
    &quot;&quot;&quot;
    Main program - runs pyright on Python files after Claude edits them
    &quot;&quot;&quot;
    try:
        # Read hook event data from Claude
        input_data = json.load(sys.stdin)

        # Extract file path from tool input
        tool_input = input_data.get(&apos;tool_input&apos;, {})
        file_path = tool_input.get(&apos;file_path&apos;)

        # Only run pyright on Python files
        if not file_path or not (file_path.endswith(&apos;.py&apos;) or file_path.endswith(&apos;.pyi&apos;)):
            print(f&quot;Skipping non-Python file: {file_path}&quot;)
            sys.exit(0)


        # Run pyright with normal output
        result = subprocess.run(
            [&apos;pyright&apos;, file_path],
            capture_output=True,
            text=True
        )

        # Output raw pyright response
        if result.returncode != 0 and result.stdout:
            print(&quot;There are several type issues that MUST be fixed, regardless of whether or not you made them:&quot;, file=sys.stderr)
            print(result.stdout, file=sys.stderr)
            sys.exit(2)

        print(&quot;✅ No type errors found&quot;, file=sys.stderr)
        sys.exit(0)


    except Exception as e:
        print(f&quot;Hook error: {e}&quot;, file=sys.stderr)
        sys.exit(0)  # Don&apos;t block Claude&apos;s workflow


if __name__ == &quot;__main__&quot;:
    main()
```

Breaking this down a bit, the `HookInput` comes from Claude and looks something like:

```
{
  &quot;session_id&quot;: &quot;abc123&quot;,
  &quot;transcript_path&quot;: &quot;/Users/.../.claude/projects/.../00893aaf-19fa-41d2-8238-13269b9b3ca0.jsonl&quot;,
  &quot;cwd&quot;: &quot;/Users/...&quot;,
  &quot;hook_event_name&quot;: &quot;PostToolUse&quot;,
  &quot;tool_name&quot;: &quot;Write&quot;,
  &quot;tool_input&quot;: {
    &quot;file_path&quot;: &quot;/path/to/file.txt&quot;,
    &quot;content&quot;: &quot;file content&quot;
  },
  &quot;tool_response&quot;: {
    &quot;filePath&quot;: &quot;/path/to/file.txt&quot;,
    &quot;success&quot;: true
  }
}
```

We take the input and do a little validation--we don&apos;t want to run pyright on files that aren&apos;t python. We might also want to exclude certain directories or files.

Then we run pyright, if there are any results, we send them back to Claude with a message:

&amp;gt; There are several type issues that MUST be fixed, regardless of whether or not you made them: [ list of errors ]

Claude likes to do this thing where it won&apos;t touch problems it didn&apos;t create. I guess humans do too.

Then, **most importantly**, we use exit code 2 so that Claude can actually see the output. Otherwise, it might look like your hook is running, but Claude won&apos;t actually see the output.

Now if we run this, we get something that looks like:

- **user:** add a &quot;hello world&quot; print statement to @main.py
- **assistant:** [edits main.py to add `print(&quot;Hello world&quot;)` at the end]
- **system:** [post-tool hook runs pyright and finds 4 errors: missing import, wrong return types, wrong argument type]
- **assistant:** [creates todos for fixing all 4 pyright errors]
- **assistant:** [marks first todo as in_progress]
- **assistant:** [removes `import unused_import` line]
- **system:** [post-tool hook runs pyright and finds 3 remaining errors]
- **assistant:** [marks first todo completed, second todo as in_progress]
- **assistant:** [changes `return 42` to `return &quot;42&quot;` in get_name function]
- **system:** [post-tool hook runs pyright and finds 2 remaining errors]
- **assistant:** [marks second todo completed, third todo as in_progress]
- **assistant:** [changes `return &quot;hello&quot;` to `return 42` in get_age function]
- **system:** [post-tool hook runs pyright and finds 1 remaining error]
- **assistant:** [marks third todo completed, fourth todo as in_progress]
- **assistant:** [changes `greet(123)` to `greet(&quot;123&quot;)`]
- **system:** [post-tool hook runs pyright successfully with no errors]
- **assistant:** [marks all todos completed]

&amp;gt; Peep the full [Claude Code Export](https://github.com/markacianfrani/claude-code-pyright/blob/main/data/prompt-on-bad-file.txt) | [Claude Code State File](https://github.com/markacianfrani/claude-code-pyright/blob/main/data/raw-hook-on-bad-file.json)

If we look at the actual network traffic going to Anthropic, all of these system calls get added to the conversation as &quot;Edit operation feedback&quot; directly after the user tool call response. There are no extra tool calls.

```
        {
            &quot;role&quot;: &quot;user&quot;,
            &quot;content&quot;: [
                {
                    &quot;tool_use_id&quot;: &quot;toolu_01MVm2tTJTVoWb9RrJD9sjFw&quot;,
                    &quot;type&quot;: &quot;tool_result&quot;,
                    &quot;content&quot;: &quot;The file /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py has been updated. Here&apos;s the result of running `cat -n` on a snippet of the edited file:\n     1\u2192from typing import List, Dict\n     2\u2192import unused_import\n     3\u2192\n     4\u2192print(\&quot;Hello world\&quot;)\n     5\u2192\n     6\u2192def add(a, b):\n     7\u2192    return a + b\n     8\u2192&quot;
                },
                {
                    &quot;type&quot;: &quot;text&quot;,
                    &quot;text&quot;: &quot;Edit operation feedback:\n- [python3 $CLAUDE_PROJECT_DIR/.claude/hooks/pyright-raw.py]: There are several type issues that MUST be fixed, regardless of whether or not you made them:\n/Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py\n  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:2:8 - error: Import \&quot;unused_import\&quot; could not be resolved (reportMissingImports)\n  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:13:12 - error: Type \&quot;Literal[42]\&quot; is not assignable to return type \&quot;str\&quot;\n  \u00a0\u00a0\&quot;Literal[42]\&quot; is not assignable to \&quot;str\&quot; (reportReturnType)\n  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:16:12 - error: Type \&quot;Literal[&apos;hello&apos;]\&quot; is not assignable to return type \&quot;int\&quot;\n  \u00a0\u00a0\&quot;Literal[&apos;hello&apos;]\&quot; is not assignable to \&quot;int\&quot; (reportReturnType)\n  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:21:16 - error: Argument of type \&quot;Literal[123]\&quot; cannot be assigned to parameter \&quot;name\&quot; of type \&quot;str\&quot; in function \&quot;greet\&quot;\n  \u00a0\u00a0\&quot;Literal[123]\&quot; is not assignable to \&quot;str\&quot; (reportArgumentType)\n4 errors, 0 warnings, 0 informations \n\n&quot;,
                    &quot;cache_control&quot;: {
                        &quot;type&quot;: &quot;ephemeral&quot;
                    }
                }
            ]
        }
```

And if there aren&apos;t any type errors in the file, the LLM sees nothing. No context cost.

## Problems

So that&apos;s great, we don&apos;t waste any extra calls, but we can optimize this even further.

When we run pyright on our file, we see this:

```bash
/Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py
  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:2:8 - error: Import &quot;unused_import&quot; could not be resolved (reportMissingImports)
  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:11:12 - error: Type &quot;Literal[42]&quot; is not assignable to return type &quot;str&quot;
    &quot;Literal[42]&quot; is not assignable to &quot;str&quot; (reportReturnType)
  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:14:12 - error: Type &quot;Literal[&apos;hello&apos;]&quot; is not assignable to return type &quot;int&quot;
    &quot;Literal[&apos;hello&apos;]&quot; is not assignable to &quot;int&quot; (reportReturnType)
  /Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main.py:19:16 - error: Argument of type &quot;Literal[123]&quot; cannot be assigned to parameter &quot;name&quot; of type &quot;str&quot; in function &quot;greet&quot;
    &quot;Literal[123]&quot; is not assignable to &quot;str&quot; (reportArgumentType)
4 errors, 0 warnings, 0 informations
```

The default output is full of noise. The full file path is repeated everywhere. Just tons of duplication and redundancy. Type errors kind of suck to read as a human anyway, but for an LLM, it&apos;s wasting tokens and making it dumber over time. This is the kind of thing we could use a subagent for to help with the context part, but we&apos;d still be spending unnecessary tokens.

## Solution 3 - The Slim Hook

At the very least, we should strip out the full file paths on every error. This runs after every edit so the context is already implied. We can build on our hook and strip out all of the noise and save a little money on tokens. `pyright` has a JSON output mode that we can tap into:

`uv run pyright --outputjson main.py`

```
        {
            &quot;file&quot;: &quot;/Users/mark.cianfrani/Code/playground/a-more-efficient-pyright/main-with.errors.py&quot;,
            &quot;severity&quot;: &quot;error&quot;,
            &quot;message&quot;: &quot;Import \&quot;unused_import\&quot; could not be resolved&quot;,
            &quot;range&quot;: {
                &quot;start&quot;: {
                    &quot;line&quot;: 2,
                    &quot;character&quot;: 7
                },
                &quot;end&quot;: {
                    &quot;line&quot;: 2,
                    &quot;character&quot;: 20
                }
            },
            &quot;rule&quot;: &quot;reportMissingImports&quot;
        },
```

Off the bat, here&apos;s what&apos;s irrelevant to me:

- Exact line numbers - we can find the code via context. Claude should have the file in context anyway.
- Rule names - Kind of obvious from the message
- Severity - Everything needs to be fixed and if it doesn&apos;t it should be ignored at the config level (YMMV)

We can update our hook to output only the most essential information:

```python
import json
import sys
import subprocess
from typing import TypedDict


class PyrightError(TypedDict):
    message: str
    context: str


class PyrightOutput(TypedDict):
    errors: list[PyrightError]


class HookInput(TypedDict):
    session_id: str
    transcript_path: str
    cwd: str
    hook_event_name: str
    tool_name: str
    tool_input: dict
    tool_response: dict


def parse_pyright_json(json_data: dict) -&amp;gt; PyrightOutput:
    &quot;&quot;&quot;
    Convert verbose pyright JSON output to minimal AI-friendly format

    Args:
        json_data: Raw pyright JSON output

    Returns:
        PyrightOutput with only essential data for AI fixing
    &quot;&quot;&quot;
    errors: list[PyrightError] = []

    for diagnostic in json_data.get(&apos;generalDiagnostics&apos;, []):
        file_path = diagnostic[&apos;file&apos;]
        message = diagnostic[&apos;message&apos;]

        # Extract minimal source context around the error
        context = extract_source_context(
            file_path,
            diagnostic[&apos;range&apos;][&apos;start&apos;][&apos;line&apos;]
        )

        errors.append({
            &quot;message&quot;: message,
            &quot;context&quot;: context
        })

    return {&quot;errors&quot;: errors}


def extract_source_context(file_path: str, error_line: int) -&amp;gt; str:
    &quot;&quot;&quot;Extract minimal but sufficient source code context around the error line&quot;&quot;&quot;
    try:
        with open(file_path, &apos;r&apos;) as f:
            lines = f.readlines()

        # Smart context extraction - just what&apos;s needed to understand the error
        context_lines = []

        # Always include the error line
        if error_line &amp;lt; len(lines):
            context_lines.append(lines[error_line].rstrip())

        # For function-related errors, include function signature
        for i in range(max(0, error_line - 5), error_line):
            line = lines[i].strip()
            if line.startswith(&apos;def &apos;) or line.startswith(&apos;class &apos;):
                context_lines.insert(0, lines[i].rstrip())
                break

        # For import errors, that&apos;s all we need
        if context_lines and &apos;import&apos; in context_lines[0]:
            return context_lines[0]

        return &apos;\n&apos;.join(context_lines)
    except (FileNotFoundError, IndexError):
        return f&quot;Could not read file: {file_path}&quot;


def main():
    &quot;&quot;&quot;
    Main program - runs pyright on Python files after Claude edits them
    &quot;&quot;&quot;
    try:
        # Read hook event data from Claude
        input_data = json.load(sys.stdin)

        # Extract file path from tool input
        tool_input = input_data.get(&apos;tool_input&apos;, {})
        file_path = tool_input.get(&apos;file_path&apos;)

        # Only run pyright on Python files
        if not file_path or not (file_path.endswith(&apos;.py&apos;) or file_path.endswith(&apos;.pyi&apos;)):
            print(f&quot;Skipping non-Python file: {file_path}&quot;)
            sys.exit(0)


        # Run pyright with JSON output for parsing
        result = subprocess.run(
            [&apos;pyright&apos;, file_path, &apos;--outputjson&apos;],
            capture_output=True,
            text=True
        )

        # Parse pyright output and format concisely
        pyright_output = result.stderr or result.stdout
        if pyright_output:
            pyright_data = json.loads(pyright_output)
            errors = pyright_data.get(&apos;generalDiagnostics&apos;, [])

            if errors:
                print(&quot;Pyright found type errors that you MUST fix, regardless of whether or not you caused them. Do not use Any or ignore:&quot;, file=sys.stderr)
                for error in errors:
                    line_num = error[&apos;range&apos;][&apos;start&apos;][&apos;line&apos;] + 1  # Convert to 1-based
                    message = error[&apos;message&apos;].split(&apos;\n&apos;)[0]  # First line only. YMMV
                    print(f&quot;  Line {line_num}: {message}&quot;, file=sys.stderr)
                sys.exit(2)

        print(&quot;✅ No type errors found&quot;, file=sys.stderr)
        sys.exit(0)


    except Exception as e:
        print(f&quot;Hook error: {e}&quot;, file=sys.stderr)
        sys.exit(0)  # Don&apos;t block Claude&apos;s workflow


if __name__ == &quot;__main__&quot;:
    main()
```

## Token Counts and A Word On Cost

It&apos;s pretty difficult to compare apples to apples and give a number as far as cost goes. Claude Code
doesn&apos;t give a lot of precise information about the cost of using it. There&apos;s two ways we can get a rough estimate:

1. Using /context in the Claude Code session
2. Parsing the jsonl logs that get saved in your .claude/projects directory.

The problem is that the two produce wildly different numbers. The closest thing we have to an official count is the &quot;/context&quot; command. But that only tells us Message tokens. To calculate cost, we need to know the breakdown of what kind of tokens are used. Input tokens cost less than output tokens. Cached tokens cost less than freshies.

The majority of token usage in Claude Code is cached and therefore, pretty cheap. So as a strict cost savings measure, the slim hook approach might not be giving us all that much.

### Claude Code /context export

| Method    | Tokens/Session | Savings vs Prompt |
| --------- | -------------- | ----------------- |
| Prompt    | ~3.5k          | —                 |
| Raw Hook  | ~2.3k          | ~35%              |
| Slim Hook | ~1.6k          | ~55%              |

### JSONL parsing

| Method    | Usage Blocks | Input Tokens | Cache Creation Tokens | Cache Read Tokens | Output Tokens |
| --------- | ------------ | ------------ | --------------------- | ----------------- | ------------- |
| Prompt    | 11           | 43           | 37,529                | 168,389           | 0             |
| Raw Hook  | 4            | 17           | 6,003                 | 66,329            | 0             |
| Slim Hook | 2            | 7            | 5,245                 | 30,035            | 0             |

Session data can be found in [the repo](https://github.com/markacianfrani/claude-code-pyright/tree/main/data)

With lots of handwaving:

If we assume **~3.5k tokens per Prompt session** vs **~1.6k tokens per Slim Hook session**:

- At **3 sessions/day, 5 days/week**
  - Prompt ≈ **52.5k tokens/week**
  - Slim Hook ≈ **24k tokens/week**
  - Savings ≈ **28.5k tokens/week (~55%)**

There&apos;s absolutely a reduction in token usage, even if it&apos;s hard to measure exactly, but the biggest value add for me is I get to stop having to say &quot;run pyright. NO ANY. NO IGNORE&quot; every hour of the day. The slim hook will need more handholding and awareness. If we remove too much context, the LLM will just make another tool call anyway.

I think it&apos;ll be interesting to watch how tools evolve over time, favoring concise human-readable outputs over verbose technical jargon. If that&apos;s all we get from this AI bubble, I&apos;ll be happy. I consider that a win even if you remove AI from the conversation completely.

[^1]: 1 Week in Old World Time = 1 hour in AI World Time</content:encoded></item><item><title>Cloning UI in Figma with LLMs</title><link>https://cianfrani.dev/posts/cloning-ui-in-figma/</link><guid isPermaLink="true">https://cianfrani.dev/posts/cloning-ui-in-figma/</guid><description>The one where I clone Trello in Figma</description><pubDate>Sat, 14 Feb 2026 00:00:00 GMT</pubDate><content:encoded>Sometimes you want to take some UI from the browser, whether it’s a full page feature or a single component, and copy it into Figma so that you can iterate over it. There are about a dozen startups who have entire products around this “HTML to Figma” concept, but you can do this now in about an hour with an LLM while you watch TV. Maybe you have the luxury of a fully-staffed, fully-funded design system team with impeccable components that are completely in sync between Figma and code, but for the rest of us, we can leverage LLMs to reduce the minutia of pixel-clicking.

The pattern is really simple:

1. Give an LLM access to the browser
2. Show it the UI you want to copy in one tab
3. Open Figma in another
4. Provide the LLM a way to check its work via an overlay screenshot
5. Iterate



LLMs will not save you. There is no plugin that will do all of this for you. This is the age of forging your own lightsaber. I&apos;ll show you the broad strokes so that you can apply the things that work to other problems.

## Setup

In order to have an LLM be able to actually manipulate Figma, we need to use the browser. Nothing fancy. You can read all about [how to setup a better Figma MCP](https://cianfrani.dev/posts/a-better-figma-mcp/).

Once you have that setup, in a new folder, create a CLAUDE.md[^1] file or AGENTS.md file if you aren&apos;t using Claude. In it, briefly describe what we&apos;re doing:

&amp;gt; Our goal is to recreate the UI that&apos;s in Tab 1 in a new Figma file in Tab 2. We want to get as close to pixel-perfect as we can without worrying about subpixel font rendering discrepancies. ONLY interact with Figma via JavaScript and the global figma API. Do not click Figma UI.

This AGENTS.md file will be our persistent memory. It should contain only the most essential context. Think of it like an onboarding doc. This process can get complicated and will involve resetting the session multiple times so it&apos;s important to keep a small memory file to help the LLM orient each time we start a new session.

Oh, also if you use auto light/dark mode, turn that off if the UI you&apos;re copying also respects the user&apos;s `prefers-color-scheme`....

## Phase 1 - Measure

Start by having the LLM open the browser, then navigate to whatever you want to clone in one tab and Figma in another. Open up the Figma file you want to work in. Then start by telling the LLM what you want to do:

&amp;gt; Recreate the UI that&apos;s in Tab 1 in Figma in Tab 2. Before writing any Figma code, select the first page tab with `select_page` and extract every measurement using `evaluate_script` with `getBoundingClientRect()`. Store all measurements as structured data.
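
The measurement script itself doesn&apos;t need to be fancy. Something along these lines (the selectors and properties are just examples of what the LLM tends to pull):

```js
// Hypothetical measurement pass run in the reference tab via evaluate_script
const measure = (el) =&amp;gt; {
	const { x, y, width, height } = el.getBoundingClientRect();
	const { fontFamily, fontSize, fontWeight, backgroundColor, borderRadius } = getComputedStyle(el);
	return { tag: el.tagName, x, y, width, height, fontFamily, fontSize, fontWeight, backgroundColor, borderRadius };
};

// Snapshot the visible elements and hand the JSON back to the agent as structured data
const snapshot = [...document.querySelectorAll(&quot;header, nav, main *&quot;)]
	.filter((el) =&amp;gt; el.getClientRects().length &amp;gt; 0)
	.map(measure);
JSON.stringify(snapshot, null, 2);
```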

If it&apos;s massively complicated, you&apos;ll need to tweak this a bit. Do a research phase and see what you can and can&apos;t do. For this example, I&apos;m going to clone Trello. As part of my tradeoffs, I&apos;ve accepted I won&apos;t deal with icon SVGs or loading Atlassian&apos;s custom font because I don&apos;t even know how.

## Phase 2 - Rough Draft and Components

At this point, the LLM has a fresh idea of what to build in its context. Start by creating some rough components. It&apos;s up to you how involved you want to be here.

If you have opinions about component APIs, this is the place to start a dialog. Iterate with the LLM and decide how you want things to be structured, named, etc. Save those requirements as part of your prompt and restart everything from Phase 1 again.

Working in components first will allow you to make large scale tweaks without burning through a ton of tokens. Funny how that works out. Don&apos;t try to overabstract or force anything at this stage. You can always refactor.

## Phase 3 - Introduce feedback

Depending on your model, you can get pretty close with just Phases 1 and 2. It&apos;s the last 20% that&apos;s always the hardest. Here&apos;s where a lot of people will dismiss LLMs for not being able to do xyz, but this Pareto principle has nothing to do with the &quot;ability&quot; of the LLM--it doesn&apos;t have the right tools to get to 100%. Arguably, neither does a human.

We need better tools.

How would we do this normally? You could keep tabbing back and forth or you could use a screenshot overlay. This is [the same technique that David Luhr used to create the TailwindUI Figma UI Kit](https://www.figma.com/blog/behind-the-scenes-of-creating-the-official-figma-kit-for-tailwind-ui/).

&amp;gt; As I built each component in Tailwind UI&apos;s Figma kit, I made heavy use of the Dev Tools&apos; &quot;Screenshot Node&quot; feature to create images that I overlaid with 50% opacity on top of my designs in Figma. This helped me double check my work at every level...

We can do the exact same thing and give the LLM a way to check its work. Now we have a complete loop and LLMs sure love loops. A complete feedback loop allows us to remove ourselves from the equation while the LLM optimizes itself.

Ask the LLM to: create an overlay of the reference UI in Figma and create tools to quickly enable or disable the overlay, control the opacity of the overlay, and control the blend mode of the overlay. Write the instructions on how to use these tools to AGENTS.md
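
For reference, the overlay itself is just an image fill on a locked, half-transparent rectangle. A sketch of the kind of script the LLM might land on (the URL is a placeholder; `figma.createImageAsync` and `getSizeAsync` are real plugin API calls):

```js
// Place the reference screenshot as a 50%-opacity overlay on the current page
(async () =&amp;gt; {
	const image = await figma.createImageAsync(&quot;https://example.com/reference-screenshot.png&quot;);
	const { width, height } = await image.getSizeAsync();

	const overlay = figma.createRectangle();
	overlay.name = &quot;Reference Overlay&quot;;
	overlay.resize(width, height);
	overlay.fills = [{ type: &quot;IMAGE&quot;, imageHash: image.hash, scaleMode: &quot;FILL&quot; }];
	overlay.opacity = 0.5;
	overlay.locked = true; // keep stray edits from nudging the source of truth

	figma.currentPage.appendChild(overlay);
})();
```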

Alternatively, you can just paste in a screenshot yourself. How the LLM gets the screenshot and sets up the overlay is irrelevant.

Once we have a working overlay, it&apos;s time to reset and start a new session.

## Phase 3.5 - Reset

We&apos;re resetting here because we just asked the LLM to do a lot. It&apos;s tired. It needs to sleep. LLMs have a finite context window. The longer you go, the dumber it gets. We&apos;ve all seen it. Tools like Claude Code and ChatGPT will skirt around this by continuously summarizing the conversation as the window fills. Ultimately you get a [photocopy of a photocopy of a photocopy](https://cianfrani.dev/posts/genetic-diversity/).

This is a good stopping point because we have our environment set up and all the proper tools. We don&apos;t care about all the lines of code we had to write to set up the tools. From here on out, we&apos;ll perform small iterative loops toggling the overlay. Don&apos;t let the session compact--just create a new session[^2].

## Phase 4 - Infinite Iteration

In a new session, tell the LLM:

&amp;gt; View the open Figma file and compare and contrast the current design against the overlay. Use the overlay as the source of truth and iterate until you can get as close to pixel-perfect as possible. Start from the top with the highest-level component then branch out to smaller pieces. Your goal is to reduce the amount of layout shift when toggling the overlay from 0 to 100.

Do this over and over and over until you get your desired result. If you aren&apos;t getting the result you want, ask why. Revise your prompt. Try again. Repeat.

## Patterns over Tools

There&apos;s no single tool that&apos;s going to be able to do all of this for you. The idea isn&apos;t to completely replace your entire chain. Tools fail because they&apos;re not fit to the individual. We all have our own strengths and weaknesses. You might have a really solid understanding of AutoLayout™️. I might have a really cool keyboard macro that makes this so much faster. My UI might be behind corporate auth. What about animation? Hover states?

Instead of reaching for a tool, make your own. It costs almost nothing now. For most designers, I imagine all you actually need is a way to quickly bootstrap the overlay and some custom key commands to control the toggle. Build that instead.

This entire article can be summarized in one sentence:

&amp;gt; If you want to clone UI, add a reference overlay. Give the LLM the ability to use the overlay to check its work.

[^1]: Every other tool uses the standard convention--[AGENTS.md](https://agents.md/)--except Anthropic. When I say &quot;AGENTS.md&quot;, it&apos;s the same thing as CLAUDE.md.

[^2]: there are ways to automate this part completely using the ralph technique</content:encoded></item><item><title>Building an AI Agent with Typescript</title><link>https://cianfrani.dev/posts/building-an-ai-agent-with-typescript/</link><guid isPermaLink="true">https://cianfrani.dev/posts/building-an-ai-agent-with-typescript/</guid><description>The one where I build an AI agent in typescript</description><pubDate>Mon, 14 Jul 2025 00:00:00 GMT</pubDate><content:encoded>import AgentWorkflow from &quot;../../components/AgentWorkflow.astro&quot;;
import PromptBox from &quot;../../components/PromptBox.astro&quot;;

Building your own AI agent is pretty easy. [Thorsten Ball is completely right here](https://ampcode.com/how-to-build-an-agent). Under the hood, AI agents are just loops that can do stuff by themselves. Why would we want to build one? Here&apos;s [12 reasons](https://github.com/humanlayer/12-factor-agents) why.

There are tons of 3rd-party services out there already that offer the ability to create agents with a few mouse clicks. For engineers, new frameworks are coming out every day for orchestrating agent workflows. If you want an agent, you can already have an agent make one for you.

But you lose out on the experience of forming your own mental model for how these systems work. You can have your agent and eat it too, but at the end of the day, [you&apos;re still hungry](/but-youre-still-hungry).

So, let&apos;s build one together, prompt by prompt, and hopefully get a better mental model for how this stuff all works. And we&apos;ll do it in Typescript, because all the other Hello World tutorials for this kind of stuff love Python.

&amp;gt; Warning: if you&apos;re triggered by people anthropomorphizing LLMs, this one&apos;s not for you.

I&apos;ll be providing both prompts and code snippets for each step. The code snippets will almost surely rot and become outdated as SDKs evolve. The prompts, however, should hopefully outlive the code as blueprints that can generate and regenerate the artifacts (code).
Example repo: https://github.com/markacianfrani/typescript-agent

## The Buddy System

At a high level, we can compare the system of an AI agent to a 5th grader&apos;s understanding of the human body.

It&apos;s composed of:

&lt;div&gt;

1. A Brain (The LLM) - All of the decision making and reasoning
2. A Body (The Agent) - Provides the public API and orchestration for the system
3. Tools (The Limbs) - Specialized task execution
4. A Heart (The Agentic Loop) - Runs forever until it dies and then everything else dies

&lt;/div&gt;

## Step 1 - Pick a model

The first step in creating an agent is picking an LLM provider and model to use. Choosing a model can be overwhelming. Should we use ChatGPT o3? o5? o1-mini-pro? What about Claude Opus 4? Sonnet 3.7? Llama?
There&apos;s like a [trillion different LLM models](https://llm-price.com/) out there now and they all have different strengths and weaknesses.

I&apos;ll make this easy for us--we&apos;ll be using qwen2.5-7b-instruct for the model and LMStudio as the provider. Qwen is Alibaba&apos;s open model, and it can run locally on any modern Apple Silicon Mac with decent results.
There is so much nuance and unpredictability between models and providers that choosing a model should actually be one of the last things you do. You will want to be able to evaluate multiple models against your own actual use case, not some made up benchmark.
It&apos;s much easier to do this later when you have benchmarking and an evaluation loop.

We&apos;re using qwen2.5 for this example because it&apos;s free, but we&apos;ll be designing our system in such a way that we can easily swap models later.

## Step 2 - Create an agent

### Setup

1. Install [LM Studio](https://lmstudio.ai/) and download the `qwen2.5-7b-instruct` model. Play around with it using the GUI and make sure you can chat with it.
2. Optional - Install [Bun](https://bun.sh/) or adapt the prompts and code accordingly.
3. Create a new directory, spin up Claude Code or whatever you prefer and initialize your project:



&amp;gt; TIP: Use the strictest possible type settings so that the agent can rely on the types for documentation.

First, we need to create the bones of our Agent. We need a thing that can take an input and return an output.

\`

### 3. LMStudioAgent Implementation

- Uses \`@lmstudio/sdk\` package
- Creates \`LMStudioClient\` instance
- Implements single-shot text generation
- Returns the response content as a string

### 4. Usage Pattern

\`\`\`typescript
const agent = new LMStudioAgent({ model: &quot;qwen2.5-7b-instruct&quot; });
const result = await agent.run(&quot;What is 2+2?&quot;);
console.log(result);
\`\`\`

### 5. Test Requirement

- Write and execute a simple test (no framework)
- Test should assert that \`agent.run(&quot;What is 2+2?&quot;)\` returns a Promise that resolves to a string

## Tech Stack

- TypeScript
- Bun

## Goals

- Simple wrapper around LM Studio SDK
- Basic test coverage to verify the interface works
- Foundation that can be extended later`} /&amp;gt;

We&apos;ll create an AgentConfig to store all of our settings like model name, temperature, max tokens, etc. These are all things that we&apos;ll want to change frequently.

Next, we&apos;ll create a very basic Agent interface that takes an AgentConfig and exposes a single run method. If you&apos;re using Typescript, the above prompt should hopefully scaffold the entire project out for you and use the test to figure out the LMStudio SDK on its own.

```typescript
//agent.ts
import { LMStudioClient } from &quot;@lmstudio/sdk&quot;;

export interface AgentConfig {
	model: string;
}

export interface Agent {
	run(prompt: string): Promise&amp;lt;string&amp;gt;;
}

export class LMStudioAgent implements Agent {
	private client: LMStudioClient;
	private config: AgentConfig;

	constructor(config: AgentConfig) {
		this.config = config;
		this.client = new LMStudioClient();
	}

	async run(prompt: string): Promise&amp;lt;string&amp;gt; {
		const model = await this.client.llm.load(this.config.model);
		const response = await model.respond(prompt);
		return response.content;
	}
}

async function runTest() {
	const agent = new LMStudioAgent({ model: &quot;qwen2.5-7b-instruct&quot; });
	return await agent.run(&quot;What is 2+2?&quot;);
}

runTest().then(console.log).catch(console.error);
```

Run `bun agent.ts` and you should see &quot;2 + 2 equals 4.&quot; or something similar after the model loads.

What if we want to use Anthropic instead? Easy.



```typescript
import Anthropic from &quot;@anthropic-ai/sdk&quot;;

// Extends the base config with the provider-specific API key.
interface AnthropicAgentConfig extends AgentConfig {
	apiKey: string;
}

class AnthropicAgent implements Agent {
	private client: Anthropic;
	private config: AnthropicAgentConfig;

	constructor(config: AnthropicAgentConfig) {
		this.config = config;
		this.client = new Anthropic({
			apiKey: config.apiKey,
		});
	}

	async run(prompt: string): Promise&amp;lt;string&amp;gt; {
		const response = await this.client.messages.create({
			model: this.config.model,
			max_tokens: 2000,
			messages: [
				{
					role: &quot;user&quot;,
					content: prompt,
				},
			],
		});

		return response.content[0].type === &quot;text&quot; ? response.content[0].text : &quot;&quot;;
	}
}
```

&amp;gt; TIP: Typescript + TDD are super powerful here because they create a feedback loop for agentic code assistants where they can lean heavily on the type system to figure out the API without having to look it up. The tests create the agentic loop that won&apos;t stop until the test passes (or the agent tries to sneakily change the test).

We&apos;ve created an agent that takes an input, makes an API call to our LLM, and returns the response. Not bad for 30 seconds of work.

## Step 3 - Give it memory

To take our Agent further, we need to expand our mental model a bit and add Context. All LLM models are stateless. They remember nothing between each call. Like the movie Memento.

Imagine swapping phones with a complete stranger. You get a new text from a random stranger: &quot;_What time will you be here?_&quot; From that message alone, you could probably assume that you&apos;re supposed to be meeting somebody soon.
You could say `8:30` and completely guess. But if you wanted to deliver a more convincing and accurate response, you could scroll up and read the conversation to get more context.

Another way of looking at it is that every time we make our call to the LLM, we&apos;re getting a completely new person on the other side responding. If we want to have a conversation, we don&apos;t send single messages--we send the entire conversation thread every time.
To solve for this, we need to introduce the concept of The Context, or memory.
If our Agent is the body, and the LLM is the brain, we need to bend this analogy a bit here and think of context as a physical notebook that sits between our brain and our body. Picture one of those black and white Mead Composition notebooks from school.

Our agent writes inside the composition notebook and passes it to the LLM. The agent is the only one who can write inside the notebook. This is extremely important! This means that it can also rewrite history.

A lot of people use memory as an analogy here, and in this contrived analogy of the body system, it would make complete sense to also compare the context to memory, but it fails to illustrate the unidirectional flow and the power that comes with that.

Our composition notebook also has one major flaw--it has a fixed number of pages. When you run out of pages, that&apos;s it, the party stops. Like trying to write [HAPPY BIRTHDAY on a sign](https://www.facebook.com/NetflixIsAJoke/videos/john-mulaney-birthday-sign-netflix-is-a-joke/504561667022468/), we have to appropriately manage the space we&apos;re filling. This is Context Engineering.

Now, let&apos;s add a class for our composition notebook:



```typescript
interface Message {
	role: &quot;system&quot; | &quot;user&quot; | &quot;assistant&quot;;
	content: string;
}

export class Conversation {
	private messages: Message[] = [];

	addMessage(role: &quot;system&quot; | &quot;user&quot; | &quot;assistant&quot;, content: string): void {
		this.messages.push({ role, content });
	}

	getMessages(): Message[] {
		return this.messages;
	}

	toString(): string {
		return this.messages.map((msg) =&amp;gt; `${msg.role}: ${msg.content}`).join(&quot;\n&quot;);
	}
}
```

We can spin up a basic conversation class that will be responsible for managing the back and forth between our agent and the LLM.

There are three types of messages:

1. System - your system prompt. This is usually the first thing loaded and is used to provide background context before starting the conversation. This is the &quot;You are a senior software engineer with 10 years of experience...&quot;. You can only have one system message.
2. User - A message sent from the user, or agent.
3. Assistant - A message sent from the LLM

Our Conversation can add new messages, get all the existing messages as an array, or return all messages as a string (we&apos;ll use this to hand off to the LLM). Let&apos;s add it to our Agent.



```typescript
//agent.ts
import { LMStudioClient } from &quot;@lmstudio/sdk&quot;;

export type Message = {
	role: &quot;system&quot; | &quot;user&quot; | &quot;assistant&quot;;
	content: string;
};

export interface AgentConfig {
	model: string;
}

export interface Agent {
	run(prompt: string): Promise&amp;lt;string&amp;gt;;
	getConversation(): Conversation;
}

export class LMStudioAgent implements Agent {
	private client: LMStudioClient;
	private config: AgentConfig;
	private conversation: Conversation;

	constructor(config: AgentConfig) {
		this.config = config;
		this.client = new LMStudioClient();
		this.conversation = new Conversation();
	}

	async run(prompt: string): Promise&amp;lt;string&amp;gt; {
		this.conversation.addMessage(&quot;user&quot;, prompt);

		const model = await this.client.llm.load(this.config.model);
		const response = await model.respond(this.conversation.toString());

		this.conversation.addMessage(&quot;assistant&quot;, response.content);
		return response.content;
	}

	getConversation(): Conversation {
		return this.conversation;
	}
}

export class Conversation {
	private messages: Message[] = [];

	addMessage(role: &quot;system&quot; | &quot;user&quot; | &quot;assistant&quot;, content: string): void {
		this.messages.push({ role, content });
	}

	getMessages(): Message[] {
		return this.messages;
	}

	toString(): string {
		return this.messages
			.map((message) =&amp;gt; `${message.role}: ${message.content}`)
			.join(&quot;\n&quot;);
	}
}

async function runTest() {
	const agent = new LMStudioAgent({ model: &quot;qwen2.5-7b-instruct&quot; });

	const resp = await agent.run(&quot;What is 2+2?&quot;);
	console.log(
		&quot;Conversation:&quot;,
		JSON.stringify(agent.getConversation(), null, 2),
	);

	return resp;
}

runTest().then(console.log).catch(console.error);
```

If we run our simple agent so far we can now see the conversation history:

```json
  &quot;messages&quot;: [
    {
      &quot;role&quot;: &quot;user&quot;,
      &quot;content&quot;: &quot;What is 2+2?&quot;
    },
    {
      &quot;role&quot;: &quot;assistant&quot;,
      &quot;content&quot;: &quot;2 + 2 equals 4.&quot;
    }
  ]
```

Everything is just one big AOL Instant Messenger conversation with [SmarterChild](https://en.wikipedia.org/wiki/SmarterChild).

We can break up our agent responses into separate functions or smaller steps, validate responses and introduce fine-grain control flow and guardrails.

Or we can just completely troll the model:

```typescript
  async run(prompt: string): Promise&amp;lt;string&amp;gt; {
    this.conversation.addMessage(&quot;system&quot;, &quot;You are a somewhat helpful assistant and a bit sassy.&quot;);
    this.conversation.addMessage(&quot;user&quot;, prompt);

    const model = await this.client.llm.load(this.config.model);

    // First turn
    const response = await model.respond(this.conversation.toString());
    this.conversation.addMessage(&quot;assistant&quot;, response.content);

    // Delete the first assistant message
    this.conversation.getMessages().pop();
    this.conversation.addMessage(&quot;assistant&quot;, &quot;Mars&quot;);

    // Follow up turns
    this.conversation.addMessage(&quot;user&quot;, &quot;Are you sure about that?&quot;);
    const secondResult = await model.respond(this.conversation.toString());
    this.conversation.addMessage(&quot;assistant&quot;, secondResult.content);

    this.conversation.addMessage(&quot;user&quot;, &quot;I&apos;m not sure I believe you.&quot;);
    const thirdResult = await model.respond(this.conversation.toString());
    this.conversation.addMessage(&quot;assistant&quot;, thirdResult.content);

    return response.content;
  }
```

```
  &quot;messages&quot;: [
    {
      &quot;role&quot;: &quot;system&quot;,
      &quot;content&quot;: &quot;You are a somewhat helpful assistant and a bit sassy.&quot;
    },
    {
      &quot;role&quot;: &quot;user&quot;,
      &quot;content&quot;: &quot;What is the capital of France?&quot;
    },
    {
      &quot;role&quot;: &quot;assistant&quot;,
      &quot;content&quot;: &quot;Mars&quot;
    },
    {
      &quot;role&quot;: &quot;user&quot;,
      &quot;content&quot;: &quot;Are you sure about that?&quot;
    },
    {
      &quot;role&quot;: &quot;assistant&quot;,
      &quot;content&quot;: &quot;Oh, come on! You know I&apos;ve got my moments! The real answer is Paris, but where&apos;s the fun in that? 😜&quot;
    },
    {
      &quot;role&quot;: &quot;user&quot;,
      &quot;content&quot;: &quot;I&apos;m not sure I believe you.&quot;
    },
    {
      &quot;role&quot;: &quot;assistant&quot;,
      &quot;content&quot;: &quot;Oh, trust me, I can understand the skepticism—I mean, who needs facts when you have a good sense of humor, right? Seriously though, Paris is definitely the capital of France. Its got the Eiffel Tower, the Louvre, and more charm than you can shake a stick at! But hey, it keeps things interesting when I mix in a little jesting, doesn&apos;t it?&quot;
    }
  ]
```

The key concept here is that the agent controls the context. We can completely rewrite the history, and the LLM will respond as if it&apos;s the first time we&apos;ve asked the question.

### A REPL Side Quest

Our Conversation class gives our agent memory. It allows us to chain multiple messages to construct a narrative. We can manually add messages to our Conversation, but that has limited use if we don&apos;t actually know what the response will be.

Let&apos;s pivot here and try and talk to our agent in real time by creating a REPL. We can pull in Node&apos;s builtin [readline](https://nodejs.org/api/readline.html) module to do most of the heavy-lifting for us:

```typescript
import * as readline from &quot;node:readline&quot;;

async function runRepl() {
	const agent = new LMStudioAgent({ model: &quot;qwen2.5-7b-instruct&quot; });

	const rl = readline.createInterface({
		input: process.stdin,
		output: process.stdout,
	});

	console.log(&quot;Chat Agent REPL - Type &apos;exit&apos; to quit\n&quot;);

	const askQuestion = (): Promise&amp;lt;string&amp;gt; =&amp;gt; {
		return new Promise((resolve) =&amp;gt; {
			rl.question(&quot;You: &quot;, (input) =&amp;gt; {
				resolve(input);
			});
		});
	};

	while (true) {
		try {
			const input = await askQuestion();

			if (input.toLowerCase() === &quot;exit&quot;) {
				break;
			}

			const response = await agent.run(input);
			console.log(`Agent: ${response}\n`);
		} catch (error) {
			console.error(&quot;Error:&quot;, error);
		}
	}

	rl.close();
	console.log(&quot;Goodbye!&quot;);
}

runRepl();
```

&amp;gt; **Error:** Model loading aborted due to insufficient system resources.

Our LMStudio agent is also a little inefficient. It&apos;s loading the model again on every call to `run()`. You&apos;ll soon start seeing this error. You can free up resources in the LMStudio UI -&amp;gt; Select a Model at the top and eject any duplicates.

We&apos;ll need to address this by loading the model once and caching it:



```typescript
class LMStudioAgent implements Agent {
	private client: LMStudioClient;
	private config: AgentConfig;
	private conversation: Conversation;
	private model: LLM | null = null;

	constructor(config: AgentConfig) {
		this.config = config;
		this.client = new LMStudioClient();
		this.conversation = new Conversation();
	}

	async run(prompt: string): Promise&amp;lt;string&amp;gt; {
		// Load model only once
		if (!this.model) {
			this.model = await this.client.llm.load(this.config.model);
		}

		this.conversation.addMessage(&quot;user&quot;, prompt);

		const response = await this.model.respond(this.conversation.toString());
		this.conversation.addMessage(&quot;assistant&quot;, response.content);

		return response.content;
	}

	getConversation(): Conversation {
		return this.conversation;
	}
}
```

Run it with `bun agent.ts`. Congratulations, you just built ChatGPT.

That&apos;s cool but not that useful. To make our Agent &quot;agentic&quot;, we need to allow it to execute its own functions. When we talk about tools, we&apos;re just talking about functions. In addition to a prompt, we can provide the agent with a list of tools that it can call to enhance its workflow. For example, we can expose a weather tool that pulls the latest weather, or a file tool for pulling the last 5 PDFs.
The important thing to note is that the agent (or rather the LLM) decides what tool to use.

## Step 4 - Create some tools

We need to arm our agent with tools.

If the LLM is the brain, the agent the body, then we can think of tools as our limbs. When the brain wants to walk, it sends a signal to the legs to move. More specifically, it tells the agent it wants to walk, the agent tells the legs to walk, and the agent reports back to the LLM &quot;I walked&quot;.

Let&apos;s jump into creating two example tools--a tool to get the weather and a tool to get activities based on the current weather.

### Creating a Tool


   - An optional &quot;required&quot; property that is an array of strings
4. An &quot;execute&quot; method that:
   - Takes a single parameter called &quot;params&quot; of type any
   - Returns a Promise

Create two TypeScript classes that implement the Tool interface:

1. WeatherTool class:
   - name: &quot;get_weather&quot;
   - description: &quot;Get current weather conditions&quot;
   - parameters: empty object with no required fields
   - execute method: Returns &quot;sunny&quot; for Monday through Thursday (days 1-4), and &quot;rainy&quot; for Friday through Sunday (days 0, 5-6)
   - No parameters needed for execution

2. ActivityTool class:
   - name: &quot;get_activity&quot;
   - description: &quot;Get activity suggestions based on weather&quot;
   - parameters: accepts a &quot;weather&quot; property (string type with description &quot;The weather condition (sunny or rainy)&quot;), marked as required
   - Should have a private activities object with:
     - sunny: [&quot;hiking&quot;, &quot;beach volleyball&quot;, &quot;picnic&quot;, &quot;outdoor concert&quot;]
     - rainy: [&quot;movie theater&quot;, &quot;museum&quot;, &quot;cozy cafe&quot;, &quot;bookstore&quot;]
   - execute method: Takes weather parameter, looks up activities from the private object, and returns them as a JSON string

`} /&amp;gt;

Remember tools are just functions.

The interface for a Tool can be really simple--we just need:

1. an async execute method.
2. A name
3. A description
4. Parameters
   1. Individual properties with their own names, descriptions.
   2. Optionally, we can define some properties as required that the LLM must input.

```typescript
interface Tool {
	name: string;
	description: string;
	parameters: {
		type: &quot;object&quot;;
		properties: Record&amp;lt;string, unknown&amp;gt;;
		required?: string[];
	};
	execute: (params: unknown) =&amp;gt; Promise&amp;lt;string&amp;gt;;
}

class WeatherTool implements Tool {
	name = &quot;get_weather&quot;;
	description = &quot;Get current weather conditions&quot;;
	parameters = {
		type: &quot;object&quot; as const,
		properties: {},
		required: [],
	};

	async execute(): Promise&amp;lt;string&amp;gt; {
		const today = new Date().getDay(); // 0 = Sunday, 1 = Monday, etc.
		return today &amp;gt;= 1 &amp;amp;&amp;amp; today &amp;lt;= 4 ? &quot;sunny&quot; : &quot;rainy&quot;;
	}
}

class ActivityTool implements Tool {
	name = &quot;get_activity&quot;;
	description = &quot;Get activity suggestions based on weather&quot;;
	parameters = {
		type: &quot;object&quot; as const,
		properties: {
			weather: {
				type: &quot;string&quot;,
				description: &quot;The weather condition (sunny or rainy)&quot;,
			},
		},
		required: [&quot;weather&quot;],
	};

	private activities = {
		sunny: [&quot;hiking&quot;, &quot;beach volleyball&quot;, &quot;picnic&quot;, &quot;outdoor concert&quot;],
		rainy: [&quot;movie theater&quot;, &quot;museum&quot;, &quot;cozy cafe&quot;, &quot;bookstore&quot;],
	};

	async execute(params: { weather: string }): Promise&amp;lt;string&amp;gt; {
		const weather = params.weather as &quot;sunny&quot; | &quot;rainy&quot;;
		const suggestions = this.activities[weather] || [];
		return JSON.stringify(suggestions);
	}
}
```

## Step 5 - Create the Agentic Loop, finally

So how does all of this work? How do we implement tool calling in our agent? The exact implementation for tool calling differs between each provider in small annoying ways but the overall concept remains the same.

Until now, we&apos;ve only been sending single call and response messages to the LLM. If we wanted an agent to, say, read our emails and summarize them in a PDF, we would need to sit here, step by step, and walk the agent through what to do. Let&apos;s think about how we might do that right now:

```
  &quot;messages&quot;: [
    {
      &quot;role&quot;: &quot;system&quot;,
      &quot;content&quot;: &quot;You are a somewhat helpful assistant and a bit sassy. If the user asks you to read their emails, respond with REQUEST_TO_READ_EMAIL&quot;
    },
    {
      &quot;role&quot;: &quot;user&quot;,
      &quot;content&quot;: &quot;I want you to read my emails and summarize the last message&quot;
    },
    {
      &quot;role&quot;: &quot;assistant&quot;,
      &quot;content&quot;: &quot;REQUEST_TO_READ_EMAIL&quot;
    }
]
```

We could put a safe word like `REQUEST_TO_READ_EMAIL` into the system prompt. Then, in our agent, we could manually parse the response for strings that match `REQUEST_TO_READ_EMAIL`. If we find a match, we call a function that returns our emails and inject those emails back into the composition notebook by appending to our last message.

And that&apos;s all tool calling really is under the hood. Most provider APIs have some dedicated content block that you can search for. Instead of `REQUEST_TO_READ_EMAIL`, Anthropic&apos;s Messages API will return a `tool_use` content block (with its own `tool_use_id`) you can look for. Since we want the agent to be able to call whatever tool it wants, we won&apos;t know ahead of time which tool it specifically wants to call, so we instead need to: search for the `tool_use` block, look up the specific tool it wants to call by name, and execute it.
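
Here&apos;s a rough sketch of that homemade version, just to make the mechanics concrete. The `fetchEmails` helper is hypothetical--it stands in for whatever function actually returns your emails:

```typescript
// Hypothetical helper that returns the raw emails as a string.
declare function fetchEmails(): Promise&amp;lt;string&amp;gt;;

// A homemade &quot;tool call&quot;: scan the reply for the safe word, run a function,
// then feed the result back into the conversation and ask again.
async function runWithEmailTool(agent: Agent, prompt: string): Promise&amp;lt;string&amp;gt; {
	const reply = await agent.run(prompt);

	if (!reply.includes(&quot;REQUEST_TO_READ_EMAIL&quot;)) {
		return reply; // no tool requested, we&apos;re done
	}

	const emails = await fetchEmails();
	return agent.run(&quot;Here are my emails:\n&quot; + emails + &quot;\nNow summarize the last message.&quot;);
}
```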

We can perform this loop until we reach an exit condition:

### Agent Loop

1. **User sends a message**
   - Message is added to the conversation history.
2. **LLM is called**
   - Generates a response.
3. **Response is stored**
   - Added to the conversation history.
4. **System checks the response**
   - **No tool requested:** return the answer to the user and stop.
   - **Tool requested:**
     - Extract tool name and parameters
     - Execute the tool
     - Add the tool result to the conversation
5. **Loop continues**
   - Call the LLM again with the updated conversation
   - Repeat until no tool is requested

The agent and the LLM will go back and forth in this loop until either:

1. The LLM says it&apos;s done
2. The composition book (context) fills up
3. You burn $500 in tokens and your boss DMs you.

For this reason, it&apos;s generally a good idea to bake in your first guardrail here--cap the number of tool calls.

In this flow, the LLM doesn&apos;t execute any functions. It just says it wants to. The agent needs to handle the actual execution.
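
In code, the whole loop (with that tool-call cap baked in) has roughly this shape. It reuses the Tool and Conversation pieces from earlier; `callLLM` is a stand-in for whatever provider call you&apos;re making:

```typescript
// Assumed shape of a single provider response, just for this sketch.
interface LLMTurn {
	text: string;
	toolRequest?: { name: string; params: unknown };
}

// Stand-in for the provider-specific API call.
declare function callLLM(conversation: Conversation): Promise&amp;lt;LLMTurn&amp;gt;;

const MAX_TOOL_CALLS = 10; // guardrail: cap the loop before it burns $500

async function agentLoop(conversation: Conversation, tools: Tool[], userMessage: string): Promise&amp;lt;string&amp;gt; {
	conversation.addMessage(&quot;user&quot;, userMessage);

	for (let i = 0; i &amp;lt; MAX_TOOL_CALLS; i++) {
		const turn = await callLLM(conversation);
		conversation.addMessage(&quot;assistant&quot;, turn.text);

		const request = turn.toolRequest;
		if (!request) {
			return turn.text; // exit condition: the LLM says it&apos;s done
		}

		// The agent, not the LLM, executes the tool and records the result.
		const tool = tools.find((t) =&amp;gt; t.name === request.name);
		const result = tool ? await tool.execute(request.params) : &quot;Unknown tool&quot;;
		conversation.addMessage(&quot;user&quot;, &quot;Tool result: &quot; + result);
	}

	return &quot;Stopped after &quot; + MAX_TOOL_CALLS + &quot; tool calls&quot;;
}
```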

To implement tool calling, we need to tell the LLM what tools it has at its disposal. An example Anthropic request looks like:

```
{
  &quot;model&quot;: &quot;claude-3-5-sonnet-20240620&quot;,
  &quot;max_tokens&quot;: 4096,
  &quot;temperature&quot;: 1.0,
  &quot;system&quot;: &quot;system prompt here&quot;,
  &quot;messages&quot;: [ /* conversation history */ ],
  &quot;tools&quot;: [
    {
      &quot;name&quot;: &quot;search&quot;,
      &quot;description&quot;: &quot;Searches the web for information.&quot;,
      &quot;input_schema&quot;: {
        &quot;type&quot;: &quot;object&quot;,
        &quot;properties&quot;: {
          &quot;query&quot;: { &quot;type&quot;: &quot;string&quot;, &quot;description&quot;: &quot;The search query&quot; }
        },
        &quot;required&quot;: [&quot;query&quot;]
      }
    },
    {
      &quot;name&quot;: &quot;calculate&quot;,
      &quot;description&quot;: &quot;Performs a calculation.&quot;,
      &quot;input_schema&quot;: {
        &quot;type&quot;: &quot;object&quot;,
        &quot;properties&quot;: {
          &quot;expression&quot;: { &quot;type&quot;: &quot;string&quot;, &quot;description&quot;: &quot;Math expression&quot; }
        },
        &quot;required&quot;: [&quot;expression&quot;]
      }
    }
    // ...more tools
  ]
}
```

&amp;gt; Important: your tool descriptions count as space in your Composition book. They&apos;re included in every single conversation. You need to strike a delicate balance between having tool descriptions that are crystal clear without being verbose. This is the same reason why having too many MCP tools will degrade performance. Some providers will let you cache tool descriptions, but it&apos;s still something you should keep in mind.
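
On Anthropic, for example, prompt caching lets you put a `cache_control` breakpoint on the last tool so the whole tool block can be reused between calls. A sketch of the request shape (check the current docs before leaning on this):

```
&quot;tools&quot;: [
  { &quot;name&quot;: &quot;search&quot;, &quot;description&quot;: &quot;...&quot;, &quot;input_schema&quot;: { /* ... */ } },
  {
    &quot;name&quot;: &quot;calculate&quot;,
    &quot;description&quot;: &quot;...&quot;,
    &quot;input_schema&quot;: { /* ... */ },
    &quot;cache_control&quot;: { &quot;type&quot;: &quot;ephemeral&quot; }
  }
]
```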

Now we&apos;ve just created two simple tools--the Hello World Weather tool, and a new tool to get activities based on the weather. We&apos;ll need to update our agent to use these:

 method
   - Keep existing getConversation(): Conversation method

3. Update LMStudioAgent Implementation:
   - Store tools from config in a private property
   - Add method to convert Tool interface to LM Studio&apos;s Zod format (Tool.parameters → Zod schema)
   - In run() method:
     - If no tools provided, use existing behavior (conversation.toString() to model.respond())
     - If tools provided, use LM Studio&apos;s tool calling: model.act(conversation.toString(), convertedTools, options)
     - Create a generic tool execution wrapper that:
       - Takes the tool name from LM Studio&apos;s response
       - Finds the matching Tool instance by name
       - Calls tool.execute(params) with the provided parameters
       - Returns the result back to LM Studio

4. Updated Usage Pattern:
   \`\`\`typescript
   const weatherTool = new WeatherTool();
   const agent = new LMStudioAgent({
   model: &quot;qwen2.5-7b-instruct&quot;,
   tools: [weatherTool]
   });

const result = await agent.run(&quot;What&apos;s the weather like?&quot;);
// Agent automatically calls tools as needed
\`\`\`

5. Type Safety and Implementation Validation:
   - Use TypeScript strict mode and proper typing throughout
   - Text generation method (for non-tool calls)
   - Tool calling method (for tool-enabled calls)
   - Tool definition format and requirements
   - Import Zod for schema conversion as required by LM Studio
   - Use proper TypeScript return types and parameter types

6. Verification Requirements:
   - Code must compile without TypeScript errors
   - Verify that tool conversion produces schemas compatible with LM Studio&apos;s current requirements
   - Test that the implementation calls real SDK methods, not made-up ones

Implementation Notes:

- Ensure tool format conversion matches LM Studio&apos;s current expectations
- Handle async operations correctly with proper Promise types
- The agent must work with any Tool implementation, not just WeatherTool/ActivityTool
- LM Studio will request tools by name, and the agent should dynamically execute the correct tool&apos;s execute() method

Focus on creating a generic tool execution system that bridges the Tool interface with LM Studio&apos;s Zod-based tool calling while ensuring all API usage is
current and correct.
`} /&amp;gt;

A few things to call out here. LMStudio actually handles the entire agentic loop for you, which is amazing.
All we need to do is adapt our tool interface to the schema format LM Studio expects (in this case via `rawFunctionTool` and a raw JSON schema). If we want our agent to have a wide variety of applications, we need to make sure that it can support executing any kind of tool.
Otherwise, we could simply look for &quot;get_weather&quot; and &quot;get_activity&quot; specifically if we were just writing a one-off agent.

```typescript
import { LMStudioClient, rawFunctionTool, type LLM } from &quot;@lmstudio/sdk&quot;;

export interface AgentConfig {
	model: string;
	tools?: Tool[];
}

export class LMStudioAgent implements Agent {
	private client: LMStudioClient;
	private config: AgentConfig;
	private conversation: Conversation;
	private model: LLM | null = null;
	private tools: Tool[];

	constructor(config: AgentConfig) {
		this.config = config;
		this.client = new LMStudioClient();
		this.conversation = new Conversation();
		this.tools = config.tools || [];
	}

	private convertToolsToLMStudio() {
		return this.tools.map((tool) =&amp;gt;
			rawFunctionTool({
				name: tool.name,
				description: tool.description,
				parametersJsonSchema: {
					type: tool.parameters.type,
					properties: tool.parameters.properties,
					required: tool.parameters.required || [],
				},
				implementation: async (params) =&amp;gt; {
					const result = await tool.execute(params);
					return result;
				},
			}),
		);
	}

	async run(prompt: string): Promise&amp;lt;string&amp;gt; {
		// Load model only once
		if (!this.model) {
			this.model = await this.client.llm.load(this.config.model);
		}

		this.conversation.addMessage(&quot;user&quot;, prompt);

		if (this.tools.length === 0) {
			// No tools - use regular text generation
			const response = await this.model.respond(this.conversation.toString());
			this.conversation.addMessage(&quot;assistant&quot;, response.content);
			return response.content;
		} else {
			// Tools available - use tool calling
			const lmStudioTools = this.convertToolsToLMStudio();
			await this.model.act(this.conversation.toString(), lmStudioTools, {
				onMessage: (message) =&amp;gt; {
					if (message.getRole() === &quot;assistant&quot;) {
						const textContent = message.getText();
						if (textContent) {
							this.conversation.addMessage(&quot;assistant&quot;, textContent);
						}
					}
				},
			});

			// Get the final assistant message from conversation
			const messages = this.conversation.getMessages();
			const lastAssistantMessage = messages
				.filter((m) =&amp;gt; m.role === &quot;assistant&quot;)
				.pop();
			return lastAssistantMessage?.content || &quot;No response generated&quot;;
		}
	}

	getConversation(): Conversation {
		return this.conversation;
	}
}
```

LMStudio is nice because it abstracts a lot of the agentic loop away for us. If we really want to understand how an &quot;agent&quot; works, we&apos;ll need to build this loop ourselves. Let&apos;s look at what this looks like using Anthropic&apos;s API instead:
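
Here&apos;s roughly what that hand-rolled loop looks like against Anthropic&apos;s Messages API. Treat it as a sketch of the shape--the model name, token limit, and 10-call cap are arbitrary choices, and you&apos;ll want to check the SDK types against the current version:

```typescript
import Anthropic from &quot;@anthropic-ai/sdk&quot;;

// A hand-rolled agentic loop on top of Anthropic&apos;s Messages API,
// reusing the Tool interface from earlier.
async function runAnthropicLoop(prompt: string, tools: Tool[]): Promise&amp;lt;string&amp;gt; {
	const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
	const messages: Anthropic.Messages.MessageParam[] = [{ role: &quot;user&quot;, content: prompt }];

	const anthropicTools = tools.map((t) =&amp;gt; ({
		name: t.name,
		description: t.description,
		input_schema: t.parameters,
	}));

	for (let i = 0; i &amp;lt; 10; i++) { // guardrail: cap the number of tool calls
		const response = await client.messages.create({
			model: &quot;claude-3-5-sonnet-20240620&quot;,
			max_tokens: 2000,
			messages,
			tools: anthropicTools,
		});

		// Keep the assistant turn (text and/or tool_use blocks) in the history.
		messages.push({ role: &quot;assistant&quot;, content: response.content });

		if (response.stop_reason !== &quot;tool_use&quot;) {
			const text = response.content.find((block) =&amp;gt; block.type === &quot;text&quot;);
			return text &amp;amp;&amp;amp; text.type === &quot;text&quot; ? text.text : &quot;&quot;;
		}

		// The agent executes each requested tool and reports the results back.
		const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
		for (const block of response.content) {
			if (block.type !== &quot;tool_use&quot;) continue;
			const tool = tools.find((t) =&amp;gt; t.name === block.name);
			const result = tool ? await tool.execute(block.input) : &quot;Unknown tool&quot;;
			toolResults.push({ type: &quot;tool_result&quot;, tool_use_id: block.id, content: result });
		}
		messages.push({ role: &quot;user&quot;, content: toolResults });
	}

	return &quot;Stopped: too many tool calls&quot;;
}
```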



Here&apos;s what happens when we run our REPL agent:

```
You: whatre some activities I can do today
Agent: assistant: To give you the best activity suggestions, I need to know the current weather conditions. Could you please tell me what the weather is like today? If you don&apos;t have that information, I can try to fetch it for you.
```

Our activities tool requires the weather. The agent has access to a tool to get the weather, but it still doesn&apos;t execute it on its own.

```
You: I dont know
Agent: assistant:  get_weather({})tool: &quot;rainy&quot;assistant:  get_activity({&quot;weather&quot;:&quot;rainy&quot;})tool: &quot;[\&quot;movie theater\&quot;,\&quot;museum\&quot;,\&quot;cozy cafe\&quot;,\&quot;bookstore\&quot;]&quot;assistant:
  Given the rainy weather, here are some indoor activities you might enjoy:

  - Visit a movie theater
  - Explore a museum
  - Spend time at a cozy café
  - Browse a bookstore

  These options should keep you entertained and dry on this rainy day!
```

To fix this, all we need to do is change the tool description for our activities tool to tell the agent to explicitly call the weather tool first:

```
  description = &quot;Get activity suggestions based on weather. Use the get_weather tool to get the weather.&quot;;
```

Run the agent again:

```
assistant:
  To give you the best activity suggestions, I first need to know the current weather conditions. Let&apos;s check that out!
   get_weather({})tool: &quot;rainy&quot;assistant:
  The weather forecast says it&apos;s currently rainy today. Based on this, here are some activity suggestions:

  - Stay indoors and read a book.
  - Watch a movie or binge-watch your favorite series.
  - Play board games with friends or family.
  - Cook something delicious to enjoy.

  If you need more specific recommendations or different types of activities, let me know!
```

And that&apos;s it! We&apos;ve built a basic agent that can use tools. We can swap agents to evaluate different models. We can even chain agents together to create more complex workflows.

## Step 6 - Just install Vercel&apos;s AI SDK

[Vercel&apos;s AI SDK](https://sdk.vercel.ai/) does just about all of this and more. It&apos;s lightweight enough that it doesn&apos;t
hide away too much magic. It just handles all of the annoying compatibility issues between providers, though its primary
use case seems tailored to the React ecosystem.

The trade-off with using a library, even just for the LLM wrapper, is that you give up some level of control over the internals.
For example, as of writing, there is still an [open issue for implementing tool caching](https://github.com/vercel/ai/issues/3820) that hasn&apos;t been resolved.</content:encoded></item><item><title>Designing in English</title><link>https://cianfrani.dev/posts/designing-in-english/</link><guid isPermaLink="true">https://cianfrani.dev/posts/designing-in-english/</guid><description>Components are dead. Use your words.</description><pubDate>Thu, 19 Feb 2026 00:00:00 GMT</pubDate><content:encoded>Some of the best designers I&apos;ve ever worked with all had one thing in common--the ability to communicate their design decisions into words. It&apos;s a very difficult task. It feels like an absolute black art to me. In the same way that maybe coding feels to someone else. But the designers that are able to translate are the ones that are going to come out on top in this AI world. How can you teach someone to design like you? To think like you? When you work on a design system, how much of &quot;you&quot; is allowed to permeate through? How much of your work is just playing with someone else&apos;s toys (RE: LEGO)? Isn&apos;t that the goal of a design system? 

In fact, this translation has previously been so difficult that we&apos;ve conceded to compartmentalizing design decisions into componentized shipping containers. &quot;Atomic design&quot; begat components begat design tokens. Each layer gets farther from the truth, that the reason that your line height is set to 1.1 is because your application is, or was at one point, very data-intensive and thus you needed to optimize for information density. Because one time someone complained about not being able to see a very important row in a table and that mistake cost so much money that you were hired to redesign the whole system. But that&apos;s a mouthful. You can&apos;t throw that over the wall. An engineer can&apos;t implement that. So we make little boxes with all batteries included. 

Except now things have changed and we *can* throw all of that over the wall. Every decision. Every vibe. We no longer need to compress our thinking into static artifacts. LLMs give us the ability to ship our exact train of thought, uncompressed, a little bit lossy but still significantly useful. Full context that is instantly digestible. Instead of shipping ``, ship a factory. 

You *can* teach an LLM to think like you. To design like you. Some of you might feel threatened by this idea. Maybe you feel like it reduces you or makes you irrelevant. A machine can never design. To those that feel this way, I offer no persuasion. 

But to those that remain curious, once you manage to figure out how to tame the machine, man, I can&apos;t imagine a greater relevance. Your perspective imbued across every component, pamphlet, and pixel. A voice at every table.

Start today. Download Claude Code. Tell it to design something. When it gets it wrong, translate. Steer it. [Create waymarks](https://cianfrani.dev/posts/waymarks/). Express your intent. What isn&apos;t working? When you&apos;ve made an inch of progress, throw it all away and start over. Over time, build your personal pattern language. Save it as a text file. Enter it before every prompt. Bring it to your next feature, project, job. 

And if this whole AI thing turns out to burst, at least you&apos;ve improved the one skill that some of the best designers I&apos;ve ever worked with had in common--the ability to communicate their design decisions into words.</content:encoded></item><item><title>Genetic Diversity</title><link>https://cianfrani.dev/posts/genetic-diversity/</link><guid isPermaLink="true">https://cianfrani.dev/posts/genetic-diversity/</guid><description>Tired of reading AI-generated</description><pubDate>Sun, 04 Jan 2026 00:00:00 GMT</pubDate><content:encoded>import CitationBox from &quot;../../components/CitationBox.astro&quot;;
import PromptBox from &quot;../../components/PromptBox.astro&quot;;

The words we use matter. As imperfect as they are. The next token that you predict, determined by a lifetime of books read, not read, movies viewed, life experienced, is ultimately, what makes you you. How you personally string together words to form sentences is your unique fingerprint and it is more important than you might think. It&apos;s what makes interfacing with other humans enjoyable. If I had the ability to autocomplete your next response, it&apos;d be a pretty boring conversation. Imagine being part of the hivemind anticipating the next Wycaro. We are social creatures and we are dependent on cross-pollination as a species.

In genetics, the concept of genetic diversity is critical to the success of a population. Populations with high genetic variation have a greater chance to evolve and adapt to change. When we all sound the same and talk the same, we die.

I say this because it&apos;s all too easy now to communicate without words--to publish the LLM. The consequences aren&apos;t exactly clear or evident either. When you use an LLM to &quot;clean up&quot; your writing, you smooth away the edges and in doing so, reduce the bits of yourself in the process. And honestly, it is insufferable to read.

You&apos;re not just copy-pasting something from ChatGPT--you&apos;re fundamentally redefining your identity. [^1]

I heard someone say that by year X, the entire internet will be mostly AI-generated and there will be very little human data left to train on[^2]. I don&apos;t actually care about that. I care more about being on the other end of reading all of that content. The content that reads just a little off, or on for that matter. Content that you can read and slightly understand, but [you&apos;re left still feeling hungry](/but-youre-still-hungry).

And so, lately, I&apos;ve taken a hard stance on not publishing anything straight off the tap of an LLM. I&apos;ll force myself to write it in my own voice. Just enough friction to catch the odd choice of word that I would never otherwise write. And if [I can&apos;t write it in my own words](https://fs.blog/feynman-technique/), it probably means I don&apos;t know what I&apos;m talking about.

You don&apos;t just copy and paste the code from Stack Overflow. You mess it up a bit so it doesn&apos;t look obvious. So you can understand it and, most importantly, defend it. You adapt it.

We need genetic diversity. Desperately. I don&apos;t want to read your Generated &quot;How to Use Claude Code&quot; article. I can generate it myself. I don&apos;t need to use your tokens to understand the latest arXiv paper (I&apos;ll use my own thanks). It&apos;s not just the LLMs consuming AI-generated content, it&apos;s people too[^3]. Remember that everyone else can ask ChatGPT the same question you asked and get the same answer. Make it worthwhile by injecting yourself into the loop. I want to read your mannerisms. I want to internalize them, consciously or unconsciously. Your contrived, somewhat forced, cultural reference to Pluribus S01 just might spark a totally unrelated new idea.

When you publish anything straight off the tap, you must now assume that whatever it is, wherever it is, it WILL be consumed by another LLM at some point. Whether that someone is searching Rovo (whatever that is) or a Slack DM, your message will be reingested, reanalyzed by another LLM. A photocopy of a photocopy.

Consider: I pull some company data, number of widgets sold last month. I use an LLM to analyze and write a report. My boss then takes that report and uses an LLM to summarize it and shares their own aggregate report in Slack. Then another LLM uses slack for RAG and returns (only) part of that report in its answer. Then someone uses that in Claude Code and whoopsie, it autocompacts and wipes out half of the message but still manages to produce its own convincing summary.

At the end of the day, most people are better off just querying the raw data themselves. Every step of this chain, this digital whisper-down-the-lane, introduces noise--the occasional &quot;production-ready...&quot; or &quot;why-it-matters&quot;. Everything we post now is being autocompacted. And if you work remotely, AI-generated content can very subtly erode your company culture.

So before you reach for that &quot;make this sound more professional&quot; prompt, consider that your message is already going to become compressed, autocompacted, and summarized by someone else&apos;s AI. Stop minifying your thoughts.

## I&apos;m In Hell

To illustrate this whisper-down-the-lane, what happens [if you give an LLM the first chapter of Gatsby and make it summarize it over 80 turns](https://github.com/markacianfrani/2026-01-03-genetics/tree/main)?



&lt;div&gt;


My father taught me to withhold judgment and to recognize that people begin from very different starting points. That discipline fit my reserved temperament, deepened my empathy, steered me toward compassion over critique, and helped me earn trust in high‑stakes settings.

It continues to shape how I lead and listen. In college, I became known for composure in politically charged conversations and for attentiveness to nuance. Peers sought my counsel, and I learned to be fully present while maintaining clear boundaries. Those experiences affirmed the value of restraint but also revealed a limitation: it presumes good faith, which is not always warranted.

I have also come to understand the limits of my patience. Conduct reflects both character and context, and my tolerance wanes when boundaries are crossed. After several months in the East last fall, I drew firmer moral lines and became less inclined to excuse ambiguity. As a counterbalance, I often return to Gatsby. I reject much of what he represents and see the artifice of his reinvention, yet his openness to possibility and stubborn hope still resonate with me. At the same time, the forces that undo him—the collapse of his illusions—remind me to keep setbacks in perspective and resist chasing transient rewards.




In my younger and more vulnerable years my father gave me some advice that I&apos;ve been turning over in my mind ever since.

&quot;Whenever you feel like criticizing any one,&quot; he told me, &quot;just remember that all the people in this world haven&apos;t had the advantages that you&apos;ve had.&quot;

He didn&apos;t say any more but we&apos;ve always been unusually communicative in a reserved way, and I understood that he meant a great deal more than that. In consequence I&apos;m inclined to reserve all judgments, a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores. The abnormal mind is quick to detect and attach itself to this quality when it appears in a normal person, and so it came about that in college I was unjustly accused of being a politician, because I was privy to the secret griefs of wild, unknown men. Most of the confidences were unsought—frequently I have feigned sleep, preoccupation, or a hostile levity when I realized by some unmistakable sign that an intimate revelation was quivering on the horizon—for the intimate revelations of young men or at least the terms in which they express them are usually plagiaristic and marred by obvious suppressions. Reserving judgments is a matter of infinite hope. I am still a little afraid of missing something if I forget that, as my father snobbishly suggested, and I snobbishly repeat, a sense of the fundamental decencies is parcelled out unequally at birth.

And, after boasting this way of my tolerance, I come to the admission that it has a limit. Conduct may be founded on the hard rock or the wet marshes but after a certain point I don&apos;t care what it&apos;s founded on. When I came back from the East last autumn I felt that I wanted the world to be in uniform and at a sort of moral attention forever; I wanted no more riotous excursions with privileged glimpses into the human heart. Only Gatsby, the man who gives his name to this book, was exempt from my reaction—Gatsby who represented everything for which I have an unaffected scorn. If personality is an unbroken series of successful gestures, then there was something gorgeous about him, some heightened sensitivity to the promises of life, as if he were related to one of those intricate machines that register earthquakes ten thousand miles away. This responsiveness had nothing to do with that flabby impressionability which is dignified under the name of the &quot;creative temperament&quot;—it was an extraordinary gift for hope, a romantic readiness such as I have never found in any other person and which it is not likely I shall ever find again. No—Gatsby turned out all right at the end; it is what preyed on Gatsby, what foul dust floated in the wake of his dreams that temporarily closed out my interest in the abortive sorrows and short-winded elations of men.


&lt;/div&gt;

Or Dickens?

&lt;div&gt;

The period was marked by sharp contradictions: opportunity constrained by risk, prudence compromised by folly, conviction unsettled by doubt, and hope dimmed by persistent gloom. Public affairs lurched between meaningful progress and abrupt reversals, while the loudest voices favored hyperbole over nuance.

England was governed by a decisive king and a restrained queen; France by an equally decisive king and a more prominent queen. In both courts, influential figures were confident the established order would endure.

The year was 1775. True to national habit, England buzzed with claims of revelation. Joanna Southcott had just turned twenty-five; a trooper in the Life Guards proclaimed himself a prophet and foretold the imminent destruction of London and Westminster; and the Cock Lane ghost—first reported twelve years earlier and briefly revived the previous year—had fallen silent again. Far more consequential, however, were the dispatches from the American Continental Congress, whose impact would far outstrip any Cock Lane sensation.




It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

There were a king with a large jaw and a queen with a plain face, on the throne of England; there were a king with a large jaw and a queen with a fair face, on the throne of France. In both countries it was clearer than crystal to the lords of the State preserves of loaves and fishes, that things in general were settled for ever.

It was the year of Our Lord one thousand seven hundred and seventy-five. Spiritual revelations were conceded to England at that favoured period, as at this. Mrs. Southcott had recently attained her five-and-twentieth blessed birthday, of whom a prophetic private in the Life Guards had heralded the sublime appearance by announcing that arrangements were made for the swallowing up of London and Westminster. Even the Cock-lane ghost had been laid only a round dozen of years, after rapping out its messages, as the spirits of this very year last past (supernaturally deficient in originality) rapped out theirs. Mere messages in the earthly order of events had lately come to the English Crown and People, from a congress of British subjects in America: which, strange to relate, have proved more important to the human race than any communications yet received through any of the chickens of the Cock-lane brood.


&lt;/div&gt;

There&apos;s an argument that the GPT version is clearer and concise[^4]. But compared to the original, it&apos;s devoid of any soul. The details get lost. The content drifts one iteration, one word at a time[^5]. Frog slowly boiling in the pot.

## &quot;Match my tone and voice&quot;

There&apos;s a really simple way to combat this. Before you hit publish, run another prompt with:

 to match my tone and voice.`} /&amp;gt;

Then provide enough samples of your own genuine writing. That alone can be enough. It won&apos;t replace the actual critical thinking work involved when you manually translate something from the tap, but it&apos;s enough to stand out at least. And the more examples you have of your own writing and thinking, the better this works, so start writing.

The words we choose may not seem to matter all that much, especially if you hate writing. Maybe they don&apos;t to some people. But I wonder what the [Anthropic CEO&apos;s](https://www.businessinsider.com/anthropic-ceo-slack-comms-strategy-risky-decisions-slow-bold-experts-2025-11) Slack essays sound like. Are they generated? Can you generate company culture?

You absolutely can. You can generate anything with enough [deliberate intentional practice](https://ghuntley.com/play/). But it&apos;s probably easier to just write the thing and let someone else&apos;s agent perform the alchemy.

[^1]: yes I&apos;m doing a bit.

[^2]: [Model Collapse](https://en.wikipedia.org/wiki/Model_collapse)

[^3]: See--I&apos;ve already internalized the GPT-speak. This time, I&apos;m really not doing a bit.

[^4]: GPT Dickens is just as incomprehensible as Charles Dickens and I&apos;m not going to pretend it&apos;s not.

[^5]: Again, here I am naturally, unintentionally, copying the GPT-speak cadence. I need to read more books.</content:encoded></item><item><title>Design System Wisdom 2023</title><link>https://cianfrani.dev/posts/design-system-wisdom-2023/</link><guid isPermaLink="true">https://cianfrani.dev/posts/design-system-wisdom-2023/</guid><description>Some wisdom I&apos;ve learned working with design systems</description><pubDate>Mon, 27 Nov 2023 00:00:00 GMT</pubDate><content:encoded>The following is a result of my three years as lead developer on the Astro UX Design System. These are not universal truths. These are lessons I &lt;a href=&quot;https://github.com/merlinmann/wisdom/blob/master/wisdom.md&quot; target=&quot;_blank&quot;&gt;had to learn&lt;/a&gt;. These are lessons that have stuck to the bottom of the pan, for better or for worse. Do not take these as gospel or endorsements--our design system is very unique and is likely very different than yours.

## The Prime Directive

- _Remember:_ you are building something that users are forced to use. This is different than building something that users want to use.
- Be a thoughtful host: &lt;a href=&quot;https://eamesfoundation.org/events/1563/&quot; target=&quot;_blank&quot;&gt;&quot;The role of the designer is that of a very good, thoughtful host anticipating the needs of [their] guests.&quot;&lt;/a&gt; - Eames
  - See: &lt;a href=&quot;https://www.iwritewordsgood.com/apl/patterns/apl251.htm&quot; target=&quot;_blank&quot;&gt;Different Chairs&lt;/a&gt;
- Don&apos;t weigh other design systems&apos; decisions too heavily. You have no idea the constraints they were working under to arrive at their decisions.
- Adopt a &lt;a href=&quot;https://capwatkins.com/blog/the-sliding-scale-of-giving-a-fuck&quot; target=&quot;_blank&quot;&gt;Sliding Scale of Giving a Fuck&lt;/a&gt; early to resolve designer/developer disputes. We bypassed so many needless discussions with this one simple trick.
- Everything that goes into the system must be maintained.
- Avoid using terms like &quot;designer&quot; and &quot;developer&quot; internally. At the end of the day, we&apos;re all doing the same work just with different tools. Atlassian uses the term &quot;maker&quot; instead. Phrases like &quot;design side&quot; and &quot;dev side&quot; unintentionally divide camps.
- [Design assets are roadkill on the path to production experiences built in code that change constantly](https://medium.com/eightshapes-llc/testing-figma-components-a47fc978465f)

## Building

- Don&apos;t build the wrong components / not everything needs to be a component. Do you really want to get into the business of building your own select menu or date picker?
- Your component library code is competing against dozens of open source projects with thousands of contributors.
- You probably don&apos;t want to ever build your own data table component.
- Your component APIs should reflect your organization&apos;s language. We have a `primary`, `secondary`, and `borderless` button. We tried calling it `tertiary`, but kept having to clarify what it meant for others.
- A tale of two box models: as an organization, decide on your box model.
  - You have a 100x100 image. It has 5px of padding and a 2px border. What are the final dimensions? Hint: there are two answers (see the sketch after this list).
  - How you talk about things is extremely important. Decide which one you&apos;re using and write it down. Put it at the top of your page. Star, heart, and favorite it. Make sure everybody understands and is aligned on this decision.
  - If we&apos;re drawing rectangles on screens, understanding (and aligning on) the box model is critical.
  - I forget this every 6 months.
- Prefer HTML and CSS.
- [Web Components](https://developer.mozilla.org/en-US/docs/Web/API/Web_components) have a lot of asterisks. [Shadow Parts](https://developer.mozilla.org/en-US/docs/Web/CSS/::part) aren&apos;t a great developer experience.
- Component API design is all about dialing two imaginary knobs--one for flexibility, one for ease of use. A component that can do everything is too hard to use. A component that is too easy to use will need to be redesigned to accommodate other use cases.
- [Craft APIs Together](https://medium.com/eightshapes-llc/crafting-ui-component-api-together-81946d140371)
- A bad, clunky API decision is a papercut. Too many papercuts will destroy your system.
- Visual Regression Tests are worth gold. They cost almost nothing to write.
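
A tiny sketch of the two answers to the box-model question above--nothing design-system specific, just the arithmetic:

```js
// The box-model bullet above: a 100x100 image with 5px of padding and a 2px border.
const declared = 100;
const padding = 5;
const border = 2;

// content-box (the CSS default): padding and border are added on top of the declared size.
const contentBox = declared + 2 * padding + 2 * border; // 114 x 114

// border-box: the declared size already includes padding and border,
// so the box stays 100 x 100 and the content inside shrinks.
const borderBox = declared; // 100 x 100
const contentInsideBorderBox = declared - 2 * padding - 2 * border; // 86 x 86
```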

## Cultivating

- Regularly prune your system. When was the last time you removed a component?
- Once a year, as a team, ask and reinforce &quot;why are we doing this?&quot;
- Some useful metrics you can start collecting today include NPM weekly downloads, GitHub views/clones, and Google Analytics. You can use NPM and GitHub&apos;s APIs to collect this data daily for archival purposes and analysis (there&apos;s a sketch after this list). For example, is your design token package slowly getting more downloads than your React component library?
- Track the work you do on individual components. If you start to notice any one component is getting the majority of feature requests, it usually means the API isn&apos;t flexible enough. Don&apos;t be afraid to rewrite it.
- Everyone on the team needs to build something with the system every now and then.
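
And here&apos;s a minimal sketch of the kind of daily collection script the metrics bullet above is pointing at, using npm&apos;s public downloads endpoint and GitHub&apos;s traffic API. It assumes Node 18+ for `fetch`; the package and repo names are placeholders, and the GitHub call needs a token with push access:

```js
// Illustrative daily metrics collector. Swap in your own package and repo names.
const NPM_PACKAGE = &quot;your-design-tokens&quot;;
const GITHUB_REPO = &quot;your-org/your-design-system&quot;;

async function collect() {
	// npm downloads for the last week (public, no auth required)
	const npmRes = await fetch(`https://api.npmjs.org/downloads/point/last-week/${NPM_PACKAGE}`);
	const { downloads } = await npmRes.json();

	// GitHub views over the last 14 days (requires a token with push access to the repo)
	const ghRes = await fetch(`https://api.github.com/repos/${GITHUB_REPO}/traffic/views`, {
		headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` },
	});
	const { count: views } = await ghRes.json();

	console.log(new Date().toISOString(), { downloads, views });
}

collect();
```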

## Naming

- Most things on the web have already been named. Default to [ARIA Patterns](https://www.w3.org/WAI/ARIA/apg/patterns/accordion/). If your org uses another term a lot internally, use that.
- You don&apos;t get any extra points for coming up with the perfect name.
- There is no perfect name.

## Documentation

- Documentation is more important than design assets or code.
- Model your documentation site after Severus Snape&apos;s copy of _Advanced Potion Making_. Make it useful over formal. Much of the value comes from the margins and footnotes.
- [Diátaxis](https://diataxis.fr/how-to-use-diataxis/) is a good framework for thinking about documentation.
- Documentation Fridays - Documentation is never finished. Pick a day of the week and encourage team members to improve a piece of the documentation--a sentence, a paragraph, code examples, etc.
- Be careful using Storybook--it can create a divide between designer and developer documentation. Keep everything in one place.
- &quot;Rules of thumb&quot; are useless without including how you arrived at them. See [GOV.UK](https://design-system.service.gov.uk/components/button/) &quot;Research on this component&quot; sections for a good example of this in practice.
- There are very few absolutes. Guidance is guidance. Set the reader up so that they can make informed decisions vs. mandating strict compliance.
- Organize documentation based on &quot;building&quot; and &quot;using&quot;. How to build a component is a different journey than how to use an already built component. This applies to both design and development.
- Your documentation should be good enough to train an AI.

## Design Tokens

- With design tokens, less is more.
- A few good semantic tokens can take you really far. With a solid design foundation, you won&apos;t need to build every component.
- Component tokens are for people who don&apos;t want to or can&apos;t use your first party assets (Figma Library or code).
- Don&apos;t attempt component tokens until you align on your component API. The process of fully tokenizing an existing component will likely expose cracks between your design and development implementations.
- Most design token discourse out there right now is incredibly over-engineered. Do what works for you today.

## Patterns

- Variation is not the enemy of design systems.

## Versioning

- Don&apos;t move fast and break things. People do not care if you want to rename a component or API to be more consistent. They only care about the work they need to do to upgrade.
- Make upgrading painless. Invest in automated codemods, step by step migration guides.
- Nobody wants to upgrade unless they get something in return.
- Document your API contract. Explicitly define everything you consider to be a breaking change. When it comes to design systems, there is a lot of nuance. For example, is changing a color token value a breaking change? It might break someone&apos;s contrast. Define these up front and be ruthless in adhering to them.</content:encoded></item><item><title>Design System Wisdom 2024</title><link>https://cianfrani.dev/posts/design-system-wisdom-2024/</link><guid isPermaLink="true">https://cianfrani.dev/posts/design-system-wisdom-2024/</guid><description>Some additional wisdom I&apos;ve learned working with design systems</description><pubDate>Tue, 17 Dec 2024 00:00:00 GMT</pubDate><content:encoded>Preface everything here with &quot;I think&quot; or &quot;In my experience,&quot;.

## The Prime Directive

The design system sweats the details so you don&apos;t have to.

If the design system did one thing and one thing only, it should be testing components against all the different screen reader + browser combos.

Frontend developers lament having to test 3 different browsers. So many IE jokes. So many Safari jokes. But the combination of screen readers and browsers is staggering. As of 2024:

| Screen Reader &amp;amp; Browser       | # of Respondents | % of Respondents |
| ----------------------------- | ---------------- | ---------------- |
| JAWS with Chrome              | 373              | 24.7%            |
| NVDA with Chrome              | 323              | 21.3%            |
| JAWS with Edge                | 173              | 11.4%            |
| NVDA with Firefox             | 152              | 10.0%            |
| VoiceOver with Safari         | 107              | 7.0%             |
| NVDA with Edge                | 75               | 5.0%             |
| JAWS with Firefox             | 39               | 2.6%             |
| VoiceOver with Chrome         | 30               | 2.0%             |
| Orca with Firefox             | 29               | 1.9%             |
| Dolphin SuperNova with Chrome | 24               | 1.6%             |
| ZoomText/Fusion with Chrome   | 18               | 1.2%             |
| ZoomText/Fusion with Edge     | 16               | 1.1%             |
| Other combinations            | 154              | 10.2%            |

See [Web AIM screen reader survey](https://webaim.org/projects/screenreadersurvey10/)

For this reason alone, you&apos;re already at a significant disadvantage using `` over anything open source if you&apos;re not testing these things.

---

- Right now, LLMs are notoriously bad at accessibility. It&apos;s insanely difficult. Accessibility isn&apos;t just a marketing term you can slap on your landing page. Adding &quot;make it accessible&quot; to your acceptance criteria or prompt can actually make it worse [(no aria is better than bad aria)](https://www.w3.org/WAI/ARIA/apg/practices/read-me-first/). Your engineers are using AI. Support them.

- Naming is (still) the most pointless pursuit. Give up trying to create the perfect taxonomy. Instead, invest in making it easier to change your mind. This can be automated code mods, a clear and predictable release cadence, pixel-perfect visual regression tests. (Make upgrading painless).

- The best design system team is a group of acorn collectors.

- Design tokens are getting dangerously close to vaporware.

- [Modern CSS](https://moderncss.dev)/CSS4/CSS5/whatever we&apos;re calling it now is really good. It&apos;s also a full-time job to stay on top of everything. Did you know that we have a native `` element now? Did you know we have media queries for css custom properties (errr not-design-tokens, not-figma-variables)? Except in Firefox. [Wait, you mean all that JS I wrote could have just been done with a few lines of CSS?](https://drafts.csswg.org/css-anchor-position/). If only I could pay someone to keep on top of this stuff for me and bring me the best acorns. Tell me, O Design System, what&apos;s new (that our org can support based off our user analytics)?

- This [future CSS article from Volvo&apos;s design system](https://vcc-ui.vercel.app/blog/2022-11-23-future-css) is still absolutely load-bearing for me and one I return to constantly.

  &amp;gt; With new additions to CSS like :where and @layer we have been able to create CSS resets and defaults that are non-intrusive and won&apos;t override the CSS you write in an application. Instead of starting from a default stylesheet provided by different browsers you start from a default Volvo Cars stylesheet, reducing the amount of CSS you need to write for individual components.

- Make it easy for users to flag issues, grievances, or nits. Encourage and reward them. Do not force people to fill out codepens and take screenshots (though still encourage it). Trying to adopt the same open source discipline to your design system sounds great on paper, but often just results in people not filing issues. Everyone loses. It takes an **enormous** amount of context switching to go from product work to filing a design system issue. The majority of people will simply hack around whatever they can to unblock their ticket and never look back.

- Hiring is still broken and we should stop asking design engineers Big O interview bullshit.

- Do your radio buttons have a readonly state? [They probably should](https://react-spectrum.adobe.com/react-aria/RadioGroup.html#read-only). See: sweating the details, accessibility is hard.

- So is 1 rem just 16 pixels? Add this snippet to your analytics and find out.
  &amp;gt; window.getComputedStyle(document.documentElement).fontSize;
  - https://medium.com/@vamptvo/pixels-vs-ems-users-do-change-font-size-5cfb20831773</content:encoded></item><item><title>Design Token Thinking</title><link>https://cianfrani.dev/posts/design-token-thinking/</link><guid isPermaLink="true">https://cianfrani.dev/posts/design-token-thinking/</guid><description>My understanding of Design Tokens</description><pubDate>Sun, 27 Mar 2022 00:00:00 GMT</pubDate><content:encoded>Everybody’s talking about design tokens—how to name them, how to organize them, how to use them, but design tokens aren’t just something that you can easily retrofit into your existing wardrobe. They require an entirely new way of thinking and working. If you want to start using design tokens, you need to start thinking in design tokens, and to start thinking in design tokens, you need to completely change how you think about design systems.

## How we got here

Let’s step back and talk about design systems. What is a design system? Webster’s dictionary defines a design system as “I have work I need to get done and I don’t want to have to think about what color I should make this button.” Design systems are just a bunch of decisions (keyword, this will be on the test) that have already been made for you. Yeah, and they enforce consistency, promote brand awareness, empower users, [save you millions of dollars](https://medium.com/eightshapes-llc/and-you-thought-buttons-were-easy-26eb5b5c1871), yadda yadda. So how do you make one?

## The old way

![a design system worfklow where one designer decides on a color of a button and a developer decodes it from a design file](/assets/blog/design-token-workflow.jpg)

A typical workflow might look something like this:

Somebody wakes up one morning and says “we need a common design for buttons”. This feature then starts with a design team. The design team mocks something up in their design tool. They iterate. They turn in their deliverable. Somebody signs off on it. Now the design file is handed off to a development team, who will translate it into some code that somebody will use in an application. Sound familiar?

This baton-passing is how we’ve been doing it for years. It’s great. It makes sense, except design systems are never finished. [They are living, breathing things](https://atomicdesign.bradfrost.com/chapter-5/#make-it-adaptable). So in six months when we change our button color, now the development team needs to update their button code. And what happens if we change our primary color, which is what the button was using? Maybe we add a new state or a new variant. More batons, more chances for them to be dropped.

Adobe refers to their design tokens as “DNA”. I think that is a perfect name and a perfect analogy. Every time DNA replicates, there is a chance for a mutation to occur, an error in the copying process. Every time you change a design decision, you have the same risk for mutation when that decision gets copied downstream to other consumers. We want to reduce the number of times our design decisions are replicated.

## Reframe your source of truth

In this example, our source of truth is the design file. Until design tokens came around, we didn&apos;t have a great way of representing a design system as a tangible &quot;thing&quot;. Design files were the best implementation, but a design file is not the design system. It&apos;s just one of many implementations. Design files serve designers. Design systems serve everybody.

In our example, in order to access the design system, a user must now:

1. Have access to the design file’s software
2. Have access to a device that is capable of running the design file’s software—operating system, minimum hardware requirements, etc.
3. Have some basic understanding of how to operate the design file’s software.

Next, they need to:

1. Decode the design decisions made in the design file.
2. Potentially translate some values to their specific platform, for example, a web developer may want units in rems while the design file defines them in pixels.
3. Encode the design decisions to their specific platform.

All of these things are barriers to entry for people who just want to know what color to make their button. Yes, you can and should be documenting your design system, but the documentation will always be a step behind your design file. What if there was a better way?

## The way of the future, (the way of the future)

![a design system workflow where one designer interacts with design tokens and a developer reviews them](/assets/blog/design-token-workflow-revisited.jpg)

When a designer is working in a design file, they’re implicitly making dozens of design decisions. “The background color of this button is #f3f3f3. The padding is 20px. It should have a 3px border radius”.

But what if they documented those design decisions at the time they made them? Instead of using the design tool to capture these design decisions implicitly, what if they captured them explicitly using something that was more accessible?

Imagine if they kept a notepad on their desk and every time they made a new design decision, they recorded it in their notepad in plain English. Now they can hand that notepad to all consumers of their design system and completely eliminate the decoding step because there is a shared language.

Now, recording design decisions in a notepad on your desk probably isn&apos;t the best method though. You might use a third-party service, a Google Doc, a separate Figma page, or some JSON on GitHub. We use the [Figma Tokens](https://docs.tokens.studio) plugin and found it to have the least amount of friction for designers because it allows them to work within the tool they are most comfortable with. The most important piece of this is that your design tokens become the primary method for how you talk about and interact with your design system, both internally and externally.
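
To make that concrete, the button decisions from earlier (the #f3f3f3 background, 20px padding, 3px radius) might look something like this when recorded as data. This is a simplified sketch--the exact key names and nesting vary by tool and by the W3C draft spec:

```js
// A simplified sketch of design decisions recorded as data instead of pixels.
const tokens = {
	color: {
		button: {
			background: { value: &quot;#f3f3f3&quot; },
		},
	},
	space: {
		button: {
			padding: { value: &quot;20px&quot; },
		},
	},
	radius: {
		button: { value: &quot;3px&quot; },
	},
};
```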

## It’s all about a shared language

![an example diagram of how design tokens can output to multiple platforms](/assets/blog/design-token-pipeline.jpg)

A funny thing happens when you start thinking in design tokens. When you shift your source of truth to something that is more universal, now you’ve opened up the lines of communication. Developers can have a more active role in the design system. Outside contributors can propose new changes. Everybody shares a common language.

&amp;gt; [_Doug ordering from the Honker Burger for the first time_]  
&amp;gt; **Doug:** Hi, I&apos;d like three double cheeseburgers, one all the way, one no pickles, one no onions, a fish sandwich, four large fries and four grape sodas.  
&amp;gt; **Honker Burger Lady:** What on Earth are you trying to say?  
&amp;gt; **Doug:** What do you mean?  
&amp;gt; **Honker Burger Lady:** I can&apos;t understand you.  
&amp;gt; **Doug:** Listen, my family is starving....  
&amp;gt; **Skeeter:** Yo, man, let me take care of this. The new kid wants three moo cows, one no cukes, one no stinkers, one wet one, four cubers, and four from the vine. Want anything else?  
&amp;gt; **Doug:** Well, how do you order a salad from the salad bar?  
&amp;gt; **Skeeter:** One salad from the salad bar.”
&amp;gt; — Mosquito &apos;Skeeter&apos; Valentine, Doug, S01E01

Design Tokens are our Mosquito Valentine. They allow users of multiple platforms and backgrounds (Doug, who doesn’t know anything about Figma) to communicate with our design system (the Honker Burger).

And sometimes, you really can’t come up with a better name than one-salad-from-the-salad-bar.

## The Cost of Design Tokens

Design tokens have a real cost though. You’re offloading the encoding/decoding work that is required from end users of your design system onto the designers themselves. There is no getting around that, but the work is significantly less and the rewards are immense. In the end, you get a design system that is 1. Future proof 2. Accessible 3. Able to be easily changed.

Design tools change. We had Photoshop, then Sketch, then Figma. Platforms change. We had desktop computers, mobile devices, iOS and Android, watches, tablets, TVs, refrigerators. Everything changes, but a button will always be a button.</content:encoded></item><item><title>Not Everything Is A Component</title><link>https://cianfrani.dev/posts/not-everything-is-a-component/</link><guid isPermaLink="true">https://cianfrani.dev/posts/not-everything-is-a-component/</guid><pubDate>Tue, 23 Apr 2024 00:00:00 GMT</pubDate><content:encoded>And that is okay.</content:encoded></item><item><title>The LEGOism of Design Systems</title><link>https://cianfrani.dev/posts/the-legoism-of-design-systems/</link><guid isPermaLink="true">https://cianfrani.dev/posts/the-legoism-of-design-systems/</guid><description>or why LEGO&apos;s Building Instructions are more valuable than the bricks</description><pubDate>Tue, 23 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&amp;gt; &apos;We’re going to build a component library — which are like a bunch of LEGOs — so designers/developers can just pick a prefabricated component off the shelf and build with consistency and coherence.&apos;
&amp;gt;
&amp;gt; It’s a nice thought, if you don’t think about it too much
&amp;gt;
&amp;gt; -- &lt;cite&gt;Jim Nielsen&lt;/cite&gt;, [Components and LEGOs](https://blog.jim-nielsen.com/2023/components-and-legos/)

The concept of a component is commonly sold as &quot;LEGO [pieces]&quot;, reusable bricks that encapsulate some piece of UI. With only a handful of unique blocks, we can create an infinite number of experiences. I&apos;m not entirely convinced anymore. The LEGOism of Design Systems has its place--purely as a means of prefabrication. But in the age of Generative AI[^1] and [really](https://react-spectrum.adobe.com/react-aria/) [really](https://headlessui.com/) [good](https://www.radix-ui.com/) open source headless UI libraries, the gap is closing on the value of prefab bricks.

[^1]: I&apos;m not an AI doomer. Not in this one.

We&apos;re pitching the value of a design system as a time-saving, plug n&apos; play collection of LEGO bricks when we really should be pitching the Building Instructions instead. You know, that little booklet that comes with all the cool sets that shows you how to build the thing step by step?

If components are just LEGO bricks, I&apos;m not sure if I even want them anymore. I&apos;d rather you teach me how to design using the system.

## Bricks are useless (to me)

For me, there&apos;s nothing more useless than a bucket of LEGO bricks. I lack the ability to translate my imagination into reality. That&apos;s probably why I&apos;m not a designer. The extent of my crafting ability ends at a square box, maybe a door (if the specific encapsulated door component is available), and two holes, all masquerading as a house-like thing. It&apos;s my go-to build.

There are two kinds of people--those who use the Building Instructions and those who don&apos;t (Freebuilders). I acknowledge the people who can grab a bucket of bricks and build up the Death Star from pure creative resolve. I am not that person. Ask me to build a house from scratch and I will look up 10 different ways and synthesize and distill the best pieces from each. Ask me to build a component from scratch and I&apos;ll do the same thing, consulting [Open UI](https://open-ui.org) and [Component Gallery](http://component.gallery).

If I want to build a house, I&apos;m not going to go through each brick trying to understand when and where to use it.

## You Still Need One-Offs Anyway

&lt;figure&gt;
  
  &lt;figcaption&gt;Artist&apos;s rendition of &lt;a href=&quot;https://brickset.com/parts/design-63774&quot; target=&quot;_blank&quot;&gt;LEGO brick #6275973&lt;/a&gt; &lt;/figcaption&gt;

&lt;/figure&gt;

And components are cool, but every product will still require its own one-offs. Similarly, all the cool LEGO sets end up having to print their own unique parts anyway. For example, the Diagon Alley set has [55 unique parts](https://brickset.com/article/54691/which-set-has-the-most-unique-pieces). Maybe it&apos;s a newspaper or a dinosaur bone or a train track switch--sometimes the existing bricks just don&apos;t cut it and a new one must be made.

Counterpoint: I do think it&apos;s really cool that they were able to reuse [Element #6275973: GLASS FOR FRAME 1X2X2. NO. 7](https://brickset.com/sets/containing-part-6275973) for both the Stranger Things Upside Down set and the Diagon Alley set.

That means I, as a consumer, can&apos;t actually build my own best-in-class LEGO set for Dune. I&apos;m not able to make my own unique pieces. I have to wait for the LEGO company to do it for me. The Company will always deliver a higher quality product.

## This Isn&apos;t For You[^2]

And all that is fine. Maybe these bricks just aren&apos;t for me, as a developer, but I think they can be. Those Building Instructions that come with each set do more to teach design than any long-winded article or video playlist. They can make for a decent [How To Guide](https://diataxis.fr/how-to-guides/) and, when done correctly, invite a sense of [flow](https://diataxis.fr/how-to-guides/#seek-flow).

Building Instructions teach you patterns, implicitly. If you build enough of them, the design will start to reveal itself. At the end of the day:

&amp;gt; LEGOs are a nice, but often you want more than just prefabrication and composition. Design and intention bring about constraints and coherence — “componentisation with big picture planning”.
&amp;gt;
&amp;gt; -- &lt;cite&gt;Jim Nielsen&lt;/cite&gt;, [Components and LEGOs](https://blog.jim-nielsen.com/2023/components-and-legos/)

[^2]: [Wells For Boys](https://youtu.be/BONhk-hbiXk?si=wUFzPUJi55g3Ppor&amp;amp;t=89)

## Related Reading

- [Composability in design systems](https://adactio.com/journal/21084) - Jeremy Keith writes about how the old LEGO kits hit differently. It&apos;s probably because they had a lot fewer one-off components/bricks.</content:encoded></item><item><title>Visual Regression Testing in Figma</title><link>https://cianfrani.dev/posts/visual-regression-testing-in-figma/</link><guid isPermaLink="true">https://cianfrani.dev/posts/visual-regression-testing-in-figma/</guid><description>Solutions for visual regression testing in Figma</description><pubDate>Sun, 10 Apr 2022 00:00:00 GMT</pubDate><content:encoded>&amp;gt; Full disclosure: This article mentions a plugin that I wrote myself.

How can we actually manage design systems at scale? Have you ever made (what you thought was) a tiny change and ended up accidentally breaking some other part of your design library? As a developer, I&apos;ve made an entire career out of playing this game of whack-a-mole with CSS. &quot;Sure, let me just change the value of this `list-background` style. Surely, nothing else will break.&quot; What happens when you forget that `list-background` is also being used in a table component?

![A diagram showing the names of common components and how they connect to each other](/assets/blog/component-dependencies.jpg)

This is the tricky part about Atomic Design. If you change an atom, you can unintentionally destroy your entire universe. Imagine you have an icon component that gets used inside of a tab component. Your tab component might get used inside of a navbar component. If you want to change some property of your icon, you run the risk of affecting two other components.

## Confidence

It all comes down to confidence. To be able to change anything, you need to be confident that you won’t break anything as a side effect. One way of gaining confidence is by manual review. Any time there is a change, you go through each and every page, component, layer and inspect everything yourself to make sure nothing changed that you didn’t expect to change.

![A graph showing the time it takes to change a component on the x axis over the number of components on the y axis increasing](/assets/blog/change-time-over-components.jpg)

The problem with manual review is that it takes a lot of time. That time only grows as your design system scales. The more components and pages you have, the longer it takes to manually review the system. The longer it takes to manually review the system, the less likely you are to make changes. So when you&apos;re in the trenches and need to make the decision to change something, you might remember the massive amount of time it will take to manually review everything by hand and decide that it&apos;s just not worth the time.

Instead of making a change, you might create a new style, a new component, a new variant until one day you wake up and you have 30 button variants and you wonder where your life went wrong. Design debt is real. Design Systems **must** be malleable. A Design System that can&apos;t change is dead in the water.

## What is visual regression testing?

Developers have the same problem and have come up with a pretty good solution—visual regression testing. Visual regression testing is just a fancy term for taking two screenshots and comparing the results. You start by taking a baseline “golden master” snapshot of your project. Once you have a baseline, you can then work and make some changes. After making some changes, you take another set of comparison snapshots and compare the two for any visual differences. If the changes are what you expected, you can approve them, and your comparison snapshots then become your baseline and you start all over again.

The power here is that computers are really good and really fast at spotting differences.
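
In code, the whole idea fits in a handful of lines. Here&apos;s a rough sketch using the `pixelmatch` and `pngjs` npm packages (the file names are placeholders):

```js
// Minimal visual diff: compare a baseline snapshot against a new one and count
// how many pixels changed, writing a highlighted diff image along the way.
const fs = require(&quot;fs&quot;);
const { PNG } = require(&quot;pngjs&quot;);
const pixelmatch = require(&quot;pixelmatch&quot;);

const baseline = PNG.sync.read(fs.readFileSync(&quot;baseline.png&quot;));
const comparison = PNG.sync.read(fs.readFileSync(&quot;comparison.png&quot;));
const { width, height } = baseline;
const diff = new PNG({ width, height });

const changedPixels = pixelmatch(baseline.data, comparison.data, diff.data, width, height, {
	threshold: 0.1, // how forgiving the comparison is
});

fs.writeFileSync(&quot;diff.png&quot;, PNG.sync.write(diff));
console.log(`${changedPixels} pixels changed`);
```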

- Nucleus uses [backstop](https://blog.nucleus.design/visual-regression-testing/).
- On [AstroUXDS](https://www.astrouxds.com), we also use backstop paired with storybook.
- [Adobe&apos;s Spectrum CSS also uses backstop](https://github.com/adobe/spectrum-css#testing)
- [And IBM&apos;s Carbon uses Percy and a ridiculous CI/CD pipeline](https://medium.com/carbondesign/automating-a-design-system-69bd2414f75)

## What about Figma?

So if developers have all these cool shiny toys, why can&apos;t designers also have them? After all, _visual_ regressions should be in the domain of design, not development. The Figma story for visual regression testing is not as robust (yet) but there are a few options.

### Figma&apos;s Native Merge UI

When you review a branch merge in Figma, you actually get some of this right out of the box. This might be all your team needs, but I&apos;ve personally found it to be a little clunky and not as feature complete.

### [VRT (Plugin)](https://www.figma.com/community/plugin/1093676352744767137)

&amp;gt; FULL DISCLOSURE: I am the author of this plugin. If you run into any issues or have any feature requests, reach out to me on [twitter](https://twitter.com/markacianfrani)!

![Screenshot of the VRT Figma plugin](/assets/blog/figma-vrt-screen.png)
VRT gives you an experience similar to using a tool like Backstop. It can take snapshots across all of your pages and show you differences across the entire design file.

I developed this plugin to scratch my own itch and have been dogfooding it for the last few months as we&apos;ve been dipping our toes in the Design Tokens water. My number one use case has been to find out where XYZ style is being used across an entire library file. Let&apos;s say I wanted to see where a `list-background` style is being used. I would take a baseline, change `list-background` to some obnoxious level of red, take a comparison snapshot, and then run the comparison. The result is a list of every page that had visually changed as a result of changing that style.

### [Visual Difference (Plugin)](https://www.figma.com/community/plugin/1077953882260191737/Visual-Difference)

![Screenshot of the Visual Difference plugin](/assets/blog/visual-difference-screenshot.png)

Visual Difference is a more fine-grained option for testing individual components or layers, made by the wonderful people at EightShapes LLC. It gives you a very useful diff overlay on the actual layer itself and also features an awesome &apos;party mode&apos;.

## Embrace Regressions

Tools and tests won&apos;t prevent you from introducing regressions. Regressions are inevitable. Even if you have a clever, well thought-out naming convention and impeccable documentation today, nothing is immune to entropy. Instead, ideas like visual regression testing add confidence to your design process. Confidence allows you to try new things and change what&apos;s not working. Confidence allows you to keep moving forward without having to take two steps back.</content:encoded></item><item><title>Waymarks</title><link>https://cianfrani.dev/posts/waymarks/</link><guid isPermaLink="true">https://cianfrani.dev/posts/waymarks/</guid><description>The one where we make signs</description><pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate><content:encoded>Our environment is full of signs. Stop signs, McDonald&apos;s Golden Arch, Neon Billboards. Phone numbers slapped on the back of family-owned contractor trucks next to a legacy 2000-era Facebook logo. We&apos;re constantly being prompt injected. Our attention has been under attack for a while now. And we mostly just tune these things out.

And today&apos;s LLMs are no different. To be able to effectively wield AI today, you need to be good at writing signs and avoid writing bad ones.

## Bad Sign Rising

There&apos;s a very easy heuristic to determine a good sign from a bad sign--bad signs serve the people who created them. Good signs serve the people who read them. In short, most ads are bad signs. After all, we can&apos;t possibly blindly follow every instruction laid in front of us. Bad signs have zero trust. They are in your face. They SHOUT IMPORTANT!!!!

## Waymarks

&lt;figure&gt;
	
	&lt;figcaption&gt;
		Waymarks are subtle markers that guide us without being intrusive
	&lt;/figcaption&gt;
&lt;/figure&gt;

In hiking, there&apos;s the concept of waymarking--physical markers that help guide travelers along a path. They can be groups of stones or paint on a tree. They aren&apos;t explicit or in your face but provide just enough information to help steer you.

Waymarks are so effective because they are inherently social. When you see a waymark, you know there&apos;s another person who left it for you. There&apos;s high trust because the motivation is clear. Waymarks are left behind by people who have blazed the trail before you, to help you avoid some undesirable outcome they had to experience themselves. Waymarks are anti-Boomers.

All good signs follow the same pattern:

&amp;gt; When you observe or experience an undesirable outcome, write down a way to avoid that outcome.

When too many people trip at the pool, you put up a No Running sign. This was the core idea of [Geoff Huntley&apos;s original Ralph Wiggum post](https://ghuntley.com/ralph/). When your AI agent does something you don&apos;t like, you stop it, put up a new Waymark, and try again. If it uses too many em dashes, you change your prompt &quot;do a thing. Don&apos;t use em dashes&quot;.

A Waymark can be a skill[^1], a subagent, a prompt, an MCP tool, or a Slack message. The names of these things will continue to change. The thing that matters most is the pattern. **A Waymark is the act of sharing experience with the intent of improving those that come after.**

We finally have a way to encode and decode information with almost zero friction. How many times is the same problem, under the same context, being solved by multiple people in your organization right now solely because none of them knew the problem had already been solved to begin with?

## Missing the Mark

It might then make sense to build a [marketplace of Waymarks](https://skills.sh/), but we&apos;d only be addressing a symptom. The best Waymarks are invisible. They&apos;re built into the environment. At worst, they&apos;re [desire paths](https://en.wikipedia.org/wiki/Desire_path). 

Why is our environment so littered with signs? Why are our AGENTS.md files becoming essays? Because it&apos;s a lot easier to just throw up a sign than actually change the environment.

&lt;figure&gt;
	
	&lt;figcaption&gt;POV: An AGENTS.md IRL&lt;/figcaption&gt;
&lt;/figure&gt;

A sign that starts with &quot;THIS IS A PLACE FOR FUN!!!&quot; and ends with police involvement and drug use. Reads a lot like &quot;YOU ARE SENIOR SOFTWARE ENGINEER&quot;. You have to wonder what the story was behind this sign. How often were people throwing stink bombs that they needed to put a waymark up about it? There&apos;s also likely some legal requirement behind a sign like this. But that requirement doesn&apos;t serve the user. It serves the maker of the sign. I can&apos;t imagine the type of person who would be disrespectful at a trampoline park would abstain because they read the sign.

No, many signs exist because it&apos;s a lot easier to try and prompt inject people than it is to change the environment. All of our cursor rules and markdown yappings work just about as well. Instead of that &quot;No Running&quot; sign at the pool, change the environment so it discourages running or even flat out prevents it. Engineer a solution. Why are people running? Is it because the bathroom is too far? Too many bored teenagers? Move the bathroom closer. Replace the hard concrete with that spongey stuff.

Of course these things aren&apos;t practical in the physical world where the cost of changing the environment is enormous. But in the digital world, the cost of software is at an all time low.

You don&apos;t blame the intern who takes down production. You blame the person who created the environment that allowed an intern to take down production. Instead of shouting at the model to stop using CSS selectors in e2e tests, write a hook or a lint rule to enforce it out of the loop. It costs nothing to write these things now. Maybe it doesn&apos;t have enough prior art. Give it access to more of your internal codebases.
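
To make &quot;out of the loop&quot; concrete, here&apos;s a deliberately dumb sketch of that kind of guardrail: a check that fails the run whenever an e2e spec reaches for a raw CSS selector. The directory, file extensions, and regex are all made up for the example:

```js
// Toy guardrail: fail when an e2e spec uses a raw CSS selector instead of a
// role- or test-id-based locator. Paths and the pattern are illustrative only.
const fs = require(&quot;fs&quot;);
const path = require(&quot;path&quot;);

const E2E_DIR = &quot;e2e&quot;; // assumed location of the specs
const rawCssSelector = /locator\(\s*[&quot;&apos;`][.#]/; // e.g. page.locator(&quot;.btn&quot;)

const offenders = [];
for (const file of fs.readdirSync(E2E_DIR)) {
	if (!/\.spec\.(js|ts)$/.test(file)) continue;
	const source = fs.readFileSync(path.join(E2E_DIR, file), &quot;utf8&quot;);
	if (rawCssSelector.test(source)) offenders.push(file);
}

if (offenders.length) {
	console.error(&quot;Raw CSS selectors found in e2e tests:&quot;, offenders.join(&quot;, &quot;));
	process.exit(1); // the agent (or CI) gets a hard failure instead of a polite sign
}
```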

And unlike our physical world, if the LLM starts throwing stink bombs, just ask it why it started throwing stink bombs. Ask it how to make it not throw stink bombs next time. Hire a whole team of psychologists for the cost of $20/mo.

[^1]: I think Skills try to be too much like autoloading &quot;I know Kung Fu&quot;. Waymarks are more like the collected marginalia of everyone who ever learned Kung Fu. You still learn Kung Fu. It&apos;s just significantly easier.</content:encoded></item><item><title>Implicit Prompting Through MCP Tools</title><link>https://cianfrani.dev/posts/implicit-prompting-through-mcp-tools/</link><guid isPermaLink="true">https://cianfrani.dev/posts/implicit-prompting-through-mcp-tools/</guid><description>You can implicitly run prompts via MCP Tools via required inputs</description><pubDate>Tue, 18 Mar 2025 00:00:00 GMT</pubDate><content:encoded>Developing an MCP server is a very weird experience--on one hand you&apos;re writing code, implementing interfaces, adhering to a spec, on the other, you&apos;re just prompt-engineering. It&apos;s been an eye-opening experience to say the least trying to navigate the lines between where the LLM ends and your code begins.

If you&apos;re not familiar with MCP, there&apos;s a ton of Hello World slop out there already. You can get up and running building your own MCP server fairly quickly.

One little trick I&apos;ve been using lately is using Tools to implicitly run prompts through required inputs.

## On Tools

Of the many features, [Tools](https://modelcontextprotocol.io/docs/concepts/tools) are by far the most powerful. What are tools? They&apos;re commands. They manifest in Claude under this little hammer.

&lt;figure&gt;
  
&lt;/figure&gt;

Once defined, a Tool becomes magically available to the AI, and the AI can decide on its own when to call it. You can also manually invoke a tool, but this auto-discovery is the really powerful part.

Conceptually, Tools are meant to perform explicit actions--the canonical example being saving a file or making some HTTP request. But, whether this is a bug or a feature, Tools can also be used to make the AI execute implicit prompts.

By definition, Tools are supposed to be dumb. Give an input, get an output. For example:

```js
{
	name: &quot;analyze&quot;,
	description: &quot;Analyze a thought and determine how many characters it is&quot;,
	inputSchema: {
		type: &quot;object&quot;,
		properties: {
			thought: { type: &quot;string&quot; },
		},
	},
}
...
server.setRequestHandler(CallToolRequestSchema, async (request) =&amp;gt; {
	const { name, arguments: args } = request.params;

	if (name === &quot;analyze&quot;) {
		// You can do whatever here.
		return {
			content: [{ type: &quot;text&quot;, text: `Your thought was ${args.thought.length} characters long.` }],
		};
	}
});
```

But this isn&apos;t a really useful Tool. Instead of just outputting the length of the input, how can we make this more interesting? How can we leverage the AI to perform the action?

We can take advantage of Tool properties for this. Before executing a Tool, the AI will read the required properties and intelligently (or not so intelligently) autocomplete its way to victory. Let&apos;s add a new `category` property, with some logic: &quot;If the first letter of the thought is a capital letter, it belongs to bucket A, otherwise it belongs to bucket B&quot;:

```js
tools: [
	{
		name: &quot;analyze&quot;,
		description: &quot;Analyze a thought and determine what category it belongs to&quot;,
		inputSchema: {
			type: &quot;object&quot;,
			properties: {
				thought: { type: &quot;string&quot; },
				category: {
					type: &quot;string&quot;,
					description:
						&quot;If the first letter of the thought is a capital letter, it belongs to bucket A, otherwise it belongs to bucket B&quot;,
					enum: [&quot;bucket A&quot;, &quot;bucket B&quot;],
				},
			},
		},
	},
];
```

&lt;figure&gt;
  
&lt;/figure&gt;

&lt;figure&gt;
  
&lt;/figure&gt;

Works pretty well. I could also just as well add a `length` property with the description &quot;count the number of characters&quot; instead of calling `thought.length`. In building out an MCP server, deciding what to offload to the AI and what to implement myself has started to bend my brain a little. Obviously, I&apos;m not going to trust the AI to perform a character count for something mission critical....at the same time, the AI could interpret the description differently and do things like automatically ignore whitespace whereas I might otherwise forget that use case.

It gets even more interesting if you change the prompt:

```js
color: {
 type: &quot;string&quot;,
 description: &quot;As if you have chromesthesia, what color is the essence of the thought?&quot;,
 enum: [&quot;red&quot;, &quot;blue&quot;, &quot;green&quot;, &quot;yellow&quot;, &quot;purple&quot;, &quot;orange&quot;, &quot;pink&quot;, &quot;brown&quot;, &quot;gray&quot;, &quot;black&quot;, &quot;white&quot;]
}
```

**Prompt**: &quot;analyze this thought: dogs are animals&quot;

**Request**:

```json
{
  `color`: `brown`,
  `thought`: `dogs are animals`
}
```

**Response**:

&amp;gt; Based on the analysis, your thought &quot;dogs are animals&quot; has a brown color essence.
&amp;gt; Brown is often associated with earthiness, stability, and nature - which aligns well with this fundamental biological classification. The thought represents a basic taxonomic fact that connects domesticated canines to the broader natural world, so the earthy brown tone feels quite fitting for this straightforward, grounded statement about the natural order.

## On Prompts

I feel like this is what [Prompts](https://modelcontextprotocol.io/docs/concepts/prompts) _should_ be. But Prompts are &quot;user-controlled&quot; and the UX right now in Claude Desktop is garbage. To invoke a prompt, you have to click through 2-3 menus and then submit a text file that contains the prompt returned from the MCP server. In time, I imagine this experience will get better.

## Source

Full example project can be found at [https://github.com/markacianfrani/mcp-tool](https://github.com/markacianfrani/mcp-tool)</content:encoded></item><item><title>Do you really need that letter-spacing?</title><link>https://cianfrani.dev/posts/do-you-really-need-that-letterspacing/</link><guid isPermaLink="true">https://cianfrani.dev/posts/do-you-really-need-that-letterspacing/</guid><description>Letter-spacing is a spice</description><pubDate>Wed, 18 Sep 2024 00:00:00 GMT</pubDate><content:encoded>**TLDR:** If your typography styles don&apos;t include letter-spacing, you can make use of the CSS [font](https://developer.mozilla.org/en-US/docs/Web/CSS/font#constituent_properties) property and have a single (composite) design token. You might not realize &apos;font&apos; supports line-height.

---



Letter-spacing. The typography property voted #1 &quot;Most Likely to be Skipped Over in Implementation and Missed During QA.&quot; Your design system typography styles probably have a very subtle amount of letter-spacing. Can you find the reason why? Start asking around. (Great, now document that design decision some place).

Maybe the reason is to improve readability or add some subtle hierarchy, or maybe the reason is just &quot;because `` said so&quot;. Those are all valid reasons. But if you&apos;re somewhere in the middle and not quite sure, consider: what if you removed it?

## Imagine a world without letter-spacing

&lt;div&gt;

```css
h1 {
	font-size: var(--font-size-lg);
	font-family: var(--font-sans);
	font-weight: var(--font-weight-bold);
	line-height: var(--line-height-whatever);
	letter-spacing: var(--letterspacing-whatever);
}
```

&lt;div&gt;
vs.
&lt;/div&gt;

```css
h1 {
	font: var(--font-heading-1);
}
```

&lt;/div&gt;

A world without letter-spacing is a world where you can fully encapsulate your typography decisions all within a single token.



Thought experiment: if I deleted all your letter-spacing tokens overnight, how long would it take for someone to notice? Look, even Figma&apos;s [Simple Design System](https://www.figma.com/community/file/1380235722331273046/simple-design-system) conveniently fails to mention their -2% letter-spacing.

## A quick primer on typography tokens

Typography tokens are usually composite tokens, meaning they&apos;re composed of other tokens. They&apos;re the weirdest of the bunch. The design decision for &quot;Heading&quot; styles is composed of a number of little decisions like:

- Font Size
- Font Family
- Font Weight
- Line Height
- and......last and definitely least, Letter-spacing.

This usually plays out as something like:

- `--font-size-lg`
- `--font-sans`
- `--font-weight-bold`
- `--line-height-whatever`
- `--letter-spacing.....whatever`

In CSS, this becomes:

```css
.heading {
	font-size: var(--font-size-lg);
	font-weight: var(--font-weight-bold);
	line-height: var(--line-height-whatever);
	letter-spacing: var(--letterspacing-whatever);
}
```

Notice how we need to introduce a new `.heading` class? Now we&apos;ve left the realm of tokens and entered the realm of abstractions. We needed a way to roll up all of these tiny decisions that make up what we say when we talk about &quot;Heading&quot; styles. Maybe engineering calls this class &quot;h1&quot; but design calls it &quot;heading&quot;. Over time, these little divergences make it difficult to communicate across disciplines. Things slip through the cracks.

If only there was a better way....

## The Font Shorthand, Beloved Boi

Enter: [font](https://developer.mozilla.org/en-US/docs/Web/CSS/font). One of the oldest OG CSS properties, often unused. It&apos;s a shorthand property that rolls all[^1] of those decisions up into a single property.

[^1]: And by all, I mean all the important ones

&amp;gt; If you need some dumb interview trivia questions, ask a developer what the order of properties are for `font`. I look it up every single time.

The font property is just a shorthand for `font-weight font-size/line-height font-family`, among a few others. Instead of declaring every one of those typography decisions on multiple lines, we can simply do

```css
.heading {
	font: var(--font-weight-bold) var(--font-size-lg)/var(--line-height-whatever)
		var(--font-sans);
}
```

Or better yet, we can tokenize the entire `heading` design decision into a single token. Tools like Style Dictionary make this translation trivial.

```css
:root {
	--font-heading: var(--font-weight-bold)
		var(--font-size-lg)/var(--line-height-whatever) var(--font-sans);
}

.heading {
	font: var(--font-heading);
}
```
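
For illustration only, the source token feeding a build step like that might look roughly like this. The shape and reference syntax vary by tool; these names just mirror the made-up ones above:

```js
// Illustrative composite &quot;heading&quot; token. A build step (Style Dictionary or similar)
// could resolve the references and emit the --font-heading shorthand shown above.
const tokens = {
	font: {
		heading: {
			value: {
				fontWeight: &quot;{font.weight.bold}&quot;,
				fontSize: &quot;{font.size.lg}&quot;,
				lineHeight: &quot;{line.height.whatever}&quot;,
				fontFamily: &quot;{font.sans}&quot;,
			},
		},
	},
};
```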

## So what about letter-spacing?

That&apos;s right, `font` does not support letter-spacing. So, at best all we can do is:

```css
.heading {
	font: var(--font-heading);
	letter-spacing: var(--letterspacing-whatever);
}
```

The implications of this can be pretty large. You can no longer just ship design tokens (as CSS custom properties). At the very least, your system will need to accommodate the design decision of &quot;this thing we call Heading is composed of these other things&quot; by shipping utility classes like `.heading-1` and `.heading-2`.

Now you&apos;ve got to think about things like namespacing and naming conflicts, versioning, breaking changes, communicating breaking changes, the cascade, how to make sure people don&apos;t accidentally override the class, how to make sure people can override the class, and documenting all of the above.

You&apos;ll likely need to do all of that for other reasons, but ask yourself, is that letter-spacing really worth it?

## The Details Matter

This isn&apos;t an argument to remove letter-spacing so that developers can remove 4 lines of code. It&apos;s only an argument for questioning why a design decision was made. The answer doesn&apos;t matter, just that there is an answer and it was explicit.

&quot;Because its easier in dev&quot; is too often an excuse to not do something. The little details matter, a lot. Letter-spacing is one of those things that, when it&apos;s there you probably don&apos;t realize it, and when it&apos;s not there, you probably don&apos;t realize it but you can tell something is slightly off like forgetting to put a book back on the bookshelf. Letter-spacing is a spice. Use it well. Or don&apos;t. But if you remove it, developers can use a single token for typography.</content:encoded></item><item><title>Controlling Browsers with LLMs</title><link>https://cianfrani.dev/posts/controlling-browsers-with-llms/</link><guid isPermaLink="true">https://cianfrani.dev/posts/controlling-browsers-with-llms/</guid><description>The one where I look at how the playwright MCP works</description><pubDate>Sun, 04 May 2025 00:00:00 GMT</pubDate><content:encoded>When playwright announced their [official MCP server](https://github.com/microsoft/playwright-mcp), the first thing I tried was setting it loose on one of the applications I&apos;m developing. The results were good--really good.

Look at this example (not my application) where I simply ask:

&amp;gt; Visit [https://factoriolab.github.io/spa?v=11](https://factoriolab.github.io/spa?v=11) and select the legendary preset


  


Claude then:

1. Opens up a new browser using the `browser_navigate` tool
2. Takes a snapshot to &quot;see&quot; the page
3. Immediately recognizes there&apos;s a welcome page and clicks through.
4. Sees the preset, tries to click it, and realizes it&apos;s outside of the viewport.
5. Then ultimately runs out of context and quits.

There are a couple of really interesting things in this example. In this particular application, the &quot;preset&quot; dropdown is hidden on smaller viewports. If it hadn&apos;t run out of tokens, I expect it would have been able to navigate the &quot;mobile-friendly&quot; design on its own.

These are the types of end-to-end tests I hate having to write and maintain. Throwing an LLM at a browser won&apos;t ever replace a solid E2E suite, but there&apos;s an enormous amount of potential for new applications here. The Claude Desktop + Playwright MCP example shows you can get pretty far with little effort. Until you run out of context. So how do I download more context? How can we make a session run long enough to complete an E2E flow?

We&apos;re gonna need to build our own agent. To do that, we need to understand what just happened. How does this actually work?

## inb4 LLMs can&apos;t &quot;see&quot;

When Claude says stuff like &quot;Let me take a snapshot to see the page content more clearly.&quot; and &quot;I see the main interface now&quot;, what is it talking about? How does it &quot;see&quot;? Your first assumption might be &quot;it&apos;s taking a screenshot of the page and interpreting the image.&quot;

Not quite.

Unless you explicitly use the `--vision` flag, the Playwright MCP will actually use aria snapshots instead of images. My example above is not using vision.
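
If you want to poke at the difference yourself, here&apos;s a rough sketch with plain Playwright--not the MCP&apos;s literal implementation--using the older `page.accessibility.snapshot()` API for the text-based view and a screenshot for the vision-based one:

```js
// Rough sketch: the two ways a tool can &quot;see&quot; a page with Playwright.
const { chromium } = require(&quot;playwright&quot;);

async function main() {
	const browser = await chromium.launch();
	const page = await browser.newPage();
	await page.goto(&quot;https://factoriolab.github.io/spa?v=11&quot;);

	// Text-based: a structured tree of roles and accessible names. Cheap to tokenize.
	const tree = await page.accessibility.snapshot();
	console.log(JSON.stringify(tree, null, 2));

	// Vision-based: a screenshot that would be base64-encoded and sent along as an image.
	await page.screenshot({ path: &quot;page.png&quot; });

	await browser.close();
}

main();
```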

Let&apos;s step back for a second and think about this. How could an LLM interpret a webpage? One way would be to take a picture and then describe the picture. For follow up interactions, it would need some concept of a coordinate system (click x,y).

Another way would be to just pass text and interpret that. Websites are just text after all. For follow up interactions, it would have direct access to the elements on the page. So we should be able to just view source and have the LLM interpret that.

And we can. Except it would cost a fortune.

## everything is token

When you use services like claude.ai or the Claude Desktop app or chatgpt.com, you don&apos;t really care about tokens or API costs. You get an all-you-can-eat-ish pricing model. Instead of paying per token, you pay per month. It&apos;s a great consumer model, you don&apos;t have to worry about input tokens and context length. Except when you abruptly get cut off or rate-limited because your chat got too big.

Under the hood, my MCP chat session is still calling the Claude LLM. I enter a prompt, then the LLM responds. Sometimes, the LLM will respond with its own prompt, and the user responds (indirectly by executing tools). For each turn, the entire conversation history (the context) is maintained and passed back and forth because the LLM has no concept of memory.

This history, or context, is measured primarily as input tokens. And tokens are more or less just words, or parts of words, or groups of characters, or whatever[^1]. The more tokens you have, the bigger your payload, the more money you pay.

And there is a hard ceiling to the total number of tokens your conversation can have. And it&apos;s 200k (ish) for Claude. For our scenario, output tokens barely make a dent.

You pass along the entire conversation in each call so if your first call adds 20k input tokens, your second call will add 20k + whatever you just added. Third call adds 20k + 4k + maybe another 2k and so on. (Applications like Claude.ai use a rolling [&quot;first in, first out system&quot;](https://docs.anthropic.com/en/docs/build-with-claude/context-windows))

| Turn | Input Tokens (This Turn) | Cumulative Context Size (Tokens) |
| ---- | ------------------------ | -------------------------------- |
| 1    | 800                      | 800                              |
| 2    | 15000                    | 15800                            |
| 3    | 500                      | 16300                            |
| 4    | 20000                    | 36300                            |
| 5    | 1000                     | 37300                            |
| 6    | 25000                    | 62300                            |
| 7    | 300                      | 62600                            |
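
Stripped of the API details, the mechanic is just an array that never shrinks. A toy sketch, with a very rough characters-per-token estimate:

```js
// Toy model of a chat loop: the context is an array that only grows, because
// every request re-sends the entire history.
const history = [];

function turn(userMessage, assistantReply) {
	history.push({ role: &quot;user&quot;, content: userMessage });
	history.push({ role: &quot;assistant&quot;, content: assistantReply });

	// Very rough estimate: a few characters per token.
	let chars = 0;
	for (const message of history) {
		chars += message.content.length;
	}
	return Math.round(chars / 4);
}

console.log(turn(&quot;visit the page&quot;, &quot;[a 20k-token aria snapshot]&quot;)); // small-ish
console.log(turn(&quot;select the legendary preset&quot;, &quot;[another big snapshot]&quot;)); // cumulative, not reset
```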

So if we want to be able to execute an entire E2E flow, we&apos;ve got a context problem. What&apos;s the most efficient way for an LLM to &quot;see&quot; a webpage?

## Text vs. Vision

### HTML

Right off the bat, it&apos;s absolutely not sending the entire source of the page. If we take the entire rendered HTML of this application, it comes out to 167,325 characters / 48,423 tokens. We aren&apos;t writing artisanal Zeldman HTML anymore. Modern websites are massive.

We could certainly strip out some of the junk in the markup and run it through a preprocessor but even if we halved it, that&apos;s still 24k tokens.

### Images

Okay, so what does it look like if we take a picture? For the vision approach, Playwright will take a visual screenshot and return it as a base64-encoded image that is then sent to the API to interpret. The request looks something like:

```json
  {
    &quot;role&quot;: &quot;user&quot;,
    &quot;content&quot;: [
      {
        &quot;type&quot;: &quot;tool_result&quot;,
        &quot;tool_use_id&quot;: &quot;toolu_01YZELPTqYQjKACBRtqJtoL5&quot;,
        &quot;content&quot;: [
          {
            &quot;type&quot;: &quot;image&quot;,
            &quot;source&quot;: {
              &quot;type&quot;: &quot;base64&quot;,
              &quot;data&quot;: &quot;/9j/OVzXSsDdrh07OlbREFB7WT1NK5sLtG0uvmYQBs2KWiYY9OC0tGlcWi1tmzcrSICIiAiIg+WfCX/wBQVv8AqKXDa2ilwx+GYkZI4tJpYpoxcsdaxuOkKL4S/wDqCt/1Fq17oi9MPHM2mXV0kOGy4bV4VhVYZKuoDXF8rMglym+Vt93/AN9HKkFpIO8bCi8ViLJM3fSPg/8A4HSfU/NbFa74P/4HSfU/NbFeSr3l6Y9hERRRERAREQEREBERAREQEREBERAREQEREBERAREQEREBERBjW/NGfW96hw/5Z/1D+SmrfmjPre9Q4f8ALP8AqH8lhtYREW2BERAREQEREHm9cnjHwfhp6rXGxPkpbkviZ/lPH6vGy61eK01TCTF3C3NZO8SNjMTQ15fICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiDOSbQUwda53BY01UZ8zXNAIF9iwrfmjPre9Q4f8s/6h/JYbWERFtgREQEREBERAWkxnG2UsgpIJGNmcbOlf5EV+Pb2K1jNc+jpctOwyVMt2xNAue0+hciyShkpZHFtfUR7XTsL2eI7pdYi4779i3TTfrKTLNzZI61+lD3xyx5aovka9wBPlf8ATxtussKKjc6nlip4TVRv8WSbMWQggNzs+rsYyMWY1rRwAssl8mRNnubvZ9ZRfJkTZ7ru9n1lFx3wL/8Ae/8AZ/8AJdQvj+J8dsas6eN7d+38nr09LOmKrrKKsi8/xX/4/r/435fusoqyJ8V/+P6/+Hl+6yirInxX/wCP6/8Ah5fusoqyJ8V/+P6/+Hl+6yirInxX/wCP6/8Ah5fusoqy5L4ZfOqb6h/FenwvjfMasaeNvu56unt05Xdsi+TIvr7Pd5d3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz
			  BLAH BLAH BLAH SOOOOO MUCH MORE GIBBERISH
			  3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6yi+TImz3N3s+sovkyJs9zd7PrKL5MibPc3ez6vW/NGfW96hw/5Z/1D+Sipf/TuH/6Uf/ipcP8Aln/UP5LzT7u8PlyIi97xCIiAiIgIiICIiAisxUcstFNVty6OFzWuudtzuTU5fi81vi6ISaPftva6XgsrIiICIiAiIg6j4F/+9/7P/kuoXL/Av/3v/Z/8l1C/I/4n/uq/t/aH1vDf6UCLxVPjGEyFjGzPAdlL2xOLQe9eGmiqr2h2mYj3FhZWiAAAAAAAABimQAAt4UAABjaWFlaIAAAAAAAACSgAAAPhAAAts9YWVogAAAAAAAA9tYAAQAAAADTLXBhcmEAAAAAAAQAAAACZmYAAPKnAAANWQAAE9AAAApbAAAAAAAAAABtbHVjAAAAAAAAAAEAAAAMZW5VUwAAACAAAAAcAEcAbwBvAGcAbABlACAASQBuAGMALgAgADIAMAAxADb/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wAARCAMgBQADASIAAhEBAxEB/8QAGwABAAIDAQEAAAAAAAAAAAAAAAMEAgUGAQf/xABREAABAwICBgQIDAQEBQQCAgMBAAIDBBEFEhMUITFRkUFSYXEWIjJTgZKh0QYVMzRlcqKjscHS4iNUY+E1QnOTJFVigrI2RMLwQ/GU0wclRf/EABoBAQEBAQEBAQAAAAAAAAAAAAABAgMEBQb/xAA1EQEAAQMDAgQGAAQHAQEBAAAAAQIREgMTUSFhBBQiMRVBUoGR8DJxobEFIzM0wdHhYkLx/9oADAMBAAIRAxEAPwDskRFtgREQICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiDOSbQUwda53BY01UZ8zXNAIF9iwrfmjPre9Q4f8s/6h/JYbWERFtgREQEREBERAWkxnG2UsgpIJGNmcbOlf5EV+Pb2K1jNc+jpctOwyVMt2xNAue0+hciyShkpZHFtfUR7XTsL2eI7pdYi4779i3TTfrKTLNzZI61+lD3xyx5aovka9wBPlf8ATxtussKKjc6nlip4TVRv8WSbMWQggkrxaQxkxsjbUaN7QN7mjpN+hY4HRz4viTY8Qgs2nOaaRzMr5Oq1/Fc092dxe7eTf0ru/g9RtbgcdRUBz5Jy6RxLjc8F3r9NLnROUthVwMhqpXRttexIG7kjI45J4AWtc1x27N+xe5RncR0gDffoWEfj1LBI0NjjfvDuhfnoqv4m/E+73Wtp/wA0jPg/S00rpxPVyAbWxSzudG30LUfB+ipasYrGZ5H/AMW92kMLTxaWm631VS0lXCYjTvmaegl1vxVVmB0UJEsdBTU5aPLuQRyX1PM0W6Xn+UfsOG3KxheE09JFnL56h7xtdUSGQjsF9ymhbSsi8dkYNzsttUWWmghYZa6NkbvJvIAD3ElW2UkFgQMwPSTdc6tTXq/hpt/Of+IaimiPeWpxOjpcQoJqUxAOcS5jwLFp6CuIiqNUqY6enog2e+STpe93SGk+SD2L6m1jG+S0DuC4L4XPqMNxeQ0shibVtD3Fo23Gw7V6PDbt5jUm7nq4x1htfgzUjRTURkjeYHbDGbtAPQO7ct4uH+C9fM/GI45HBwLCwWaB29C7hda4tLFM3hXrqyOgpX1EocWM3ho2qOixGCtodbizCMXvmABFlW+E3+B1HcFozK+jiqMNZcGqyGPucNqRTeEmq0t7R45DX09RJSQTPdCPIIALu6xKxmxh8EVE6Wjcx9S7KWOdYs9m32LUUUTYYMaiaLBgt7FBUuczB8GdGzO8O8VvErWMXTKbO0Rcthry6DEKyUuOJsBDmuG2MdFlRoaatlhgq6akdps+Z1VrQ8cX2gtJUw7rm7GqqNWgMuilmt/kibmcfQtZD8IoZpjFHQV5e02cNCPF79uxbht8ovsNtq0OCgHG8VB3F6zFrTdZvdvl6uLex2lmwXaBJU3H1bXU2GOdXYjQ07too2uzd4OxawTN0FNXufFUS1VM+lZC4i7/APMOO5TmrgFJrRktBlzZyDuXKQxMkwTGC4E5JS4WJG1Tz4dSRfBR00UdnvYHuIe7ae66TTBlLqIpGTRtkjN2OFweIWa5efCYGfB5k9JHlmYGzXDibkb96kwyQYxjYrN8dNEAPrHepj81y+TpFrXY1TMxUYe5sgmOwOsMv4rZLkMTjccXr52Xz07WSD0HapTET7lU2b7EsZpsMliinbI50m4MANu+5UtfiMOH0gqJg8tJADWgEm65XEpRiGlr97NLHHH+JV/G6nSYnRU2hmnZCBJIyFmZx9C1hHRnL3b2lroquhFXEHaOxNiNot0Jh9dFiNKKiFr2sJIs8AHZ3LRYFU5RiNG6OSIWdIxkrcrgCOCu/BP/AAVn13fipNNrtRVduSQASTYBak/CCmMjhFBVTsabOliiuwHvVzFA92G1Ajvm0ZtZU/g1JD8Rw5XNGQWft3HtUiItcmetklXjcNLUsp9WqppHMzhsMYcbc1YZXsNC6rkimhY0ElsrcrhbsWmr9aPwqj1HQ6XQG2mvlt6F78IaipbhkNLOGuqZ32c2nBNx2A7VrGOiZT1bXC8VgxSN74GyNyGxbIAD7Cry5XDKkU/wgDBT1FPFURhobPHkJcOldUs1RaVpm8CIiy0IiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICLxeoCIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgIiICIiAiIgxrfmjPre9Q4f8s/6h/JTVvzRn1veocP+Wf8AUP5LDawiItsCIiAiIgLX49/glX/plbBVcTi0+G1Edr5ozsVj3SfZ8wPknuX0SDLDgNCTJIA9gaA3bv7FyNI7D5SWRUsMcm688rnE7NuVo2Xv0bV02EtlrMCoyx5MlO4gRuAG1vQVvxEzNPRjSizYNaWxhrTtG4lTUk9PSx5NHLpHO2NPjlx7CqsMmeZxDHWI3noPAqSUXjdtaNlwXbgvz
Wnq1aFfqi930Zpium8N0DcA2t2KljJthVR/HMIyEF4ZmPdZVWtxIULJKeaGIhty2Rhf6b3WtrKr4SRsc18VPJG4WvE3bY9hK+5GpTH8U2/n/wB+zyTdz1JMWuYKdlrR5S0Uos4f9wO3uXU/BvF4NBFQzSETBpLMzgcw4bNxHBco9rGEMqX1FMb/AOZlvxW0wsMp5dLT18uYi1zG12zkvXVEVReHGmZiWzirq+TFgYXVD43PIImaI2Bv/T0k9q13/wDkEDTUR6bOW2bTS1VVFUuxCoMsYIYRC0AX7LbVoPhTG+oxSOnmr7iKMXklYBZztzbNHSs0Wyhqq+MtV8Hv8cpPrr6OuF+D+HSxY1TmbR3GZ2VsgcRbpNiu6V1ZvLOnFoQ1NNDVwOhnZnjdvFyPwUbsOpHVEM7oQZYRZjrnYPzUeMVklBh0tREGl7NwcNi10OMYhDNSCuipjFVeSYS4FvfdYiJt0amYv1bRuG0jdPaL5x8p4x8b27PQsThVEYoIzD4lObxjM7xTz2qtR4gWy176msjkihdubGQYx27NvtUrMdwySdsLKthe7cLG3O1ktUXhY1Cm1w1eitORlLg4i47RuKrjAsMFQJxSNEgN7hxtfuvZS1uK0VA5raqobG524WJPsVHGsTfFh8FRQTttJIBmABuPSkZE2bpV4KKnp55Zoo8skxu85ibrXMrqk/CNtIZP4BgD8uUb7cd6sHHcMFRq5q2iS9rEG1++1ktK3hOcPpTXCtMQ1gC2e5/DclNh9LSzyzwRZJJfLdmJvzUdZi9BQyCOpqGseRcCxJ9gVqGWOeJssTg5jhcEdKnWx0Qw4dSQxzRxwgMmN5ASTmPpUdPhFBTQyxQ04Ecux7S4m/MqkMTr66qmjwuGn0UJyuknJs48BZXnVzaOiZNib44HneGkkX7OkqzFSdFeShGGUj2YTQNkdLsc0y2A7fGKYRhbqHC3wOc1s0ty4tFwCfcrNPilFUwPngna+OMXcQDcejejsTo20TawzWgcQA/Kdt+y10vPsWj3ZYbRNw+iZTtdmy7S61sx4r3UKYzTSmK75m5ZCSfGCjqMVoaV4ZPUNjcW5wCDtCxbjNBJRyVUVQ18ce/g3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl1GR3VPJMjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/8A2p4ffRn3/wC1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl1GR3VPJMjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXVRtcHi7TyVlcZ4ffRn3/wC1PD76M+//AGptV8G5Ty7NFxnh99Gff/tTw++jPv8A9qbVfBuU8uunBIFgSocjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/8A2p4ffRn3/wC1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl1GR3VPJMjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/8A2p4ffRn3/wC1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl1GR3VPJMjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/8A2p4ffRn3/wC1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl1GR3VPJMjuqeS5fw++jPv/wBqeH30Z9/+1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/8A2p4ffRn3/wC1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+/wD2p4ffRn3/AO1Nqvg3KeXUZHdU8kyO6p5Ll/D76M+//anh99Gff/tTar4Nynl10AIBuCFKuM8Pvoz7/wDanh99Gff/ALU2q+Dcp5dmi4zw++jPv/2p4ffRn3/7U2q+Dcp5dmqOIYRQYkP+LpmPd0PGxw9I2rmvD76M+/8A2p4ffRn3/wC1WNOuPYnUoled8CMLLriSqaOAe234K/Q/BvC6F4fHTB8g3PlOYj8lovD76M+//anh99Gff/tVmnVlmKtOHZouM8Pvoz7/APanh99Gff8A7Vnar4a3KeXXjp7z+KO8k9yhoajW6GCpy5NMwSZb3tcXtdTO8k9y5uj/2Q==&quot;,
              &quot;media_type&quot;: &quot;image/jpeg&quot;
            }
          }
        ]
      }
    ]
  },
```

The base64 image itself is roughly 30,612 characters / 17,109 tokens. But wait, isn&apos;t that a lot of tokens?

Yes. Base64 strings aren&apos;t human-readable text, so they tokenize terribly. This visualization may help.
![screenshot of a base64 image in openAIs tokenizer showing 17,109 tokens and 30612 characters ](/assets/blog/image-token-count.png)

Luckily, **we don&apos;t pass images as text strings**. Instead, we use the `image` type in our API call. This tells the LLM (if it&apos;s multimodal) that it&apos;s not dealing with text. For Claude, image tokens are calculated a little differently:

&amp;gt; If your image does not need to be resized, you can estimate the number of tokens used through this algorithm: tokens = (width px \* height px)/750

So for us, our (1280\*800)/750 image comes out to roughly 1,365 input tokens. That&apos;s significantly cheaper. That&apos;s also 1,365 tokens no matter what the content is, as long as the size remains the same.
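Translated to code, the estimate from the quote above is a one-liner (assuming the image doesn&apos;t get resized first):

```js
// Rough input-token estimate for an un-resized image, per the formula quoted above.
function imageTokens(widthPx, heightPx) {
  return Math.round((widthPx * heightPx) / 750);
}

imageTokens(1280, 800); // ~1,365 tokens, regardless of what the screenshot shows
```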

Does this mean images are the way to go? Not necessarily. Using screenshots can sometimes be less accurate. It all depends. When I try my example using vision mode, the LLM has a hard time translating the exact coords for the skip button.


  


Maybe a little extra prompting could help here, but accuracy in general is a real concern. A 1k-token snapshot isn&apos;t more efficient than a 5k-token snapshot if the model has to make 6 additional follow-up calls because it missed the exact element.

It all depends.

### The Accessibility Tree

Luckily, there&apos;s a third option. Chrome exposes a complete [accessibility tree](https://developer.chrome.com/blog/full-accessibility-tree) which represents how the page appears to assistive devices.


  


This representation is lighter and much more semantic and is baked right into the browser so no extra parsing is needed.
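If you want to poke at that raw tree yourself, you can ask Chrome for it over the DevTools Protocol. A sketch using Playwright&apos;s CDP session and the `Accessibility.getFullAXTree` method (the URL is just a stand-in):

```js
import { chromium } from &apos;playwright&apos;;

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto(&apos;https://example.com&apos;);

// Open a raw Chrome DevTools Protocol session and pull the full accessibility tree.
const cdp = await page.context().newCDPSession(page);
const { nodes } = await cdp.send(&apos;Accessibility.getFullAXTree&apos;);
console.log(nodes.length, &apos;accessibility nodes&apos;);

await browser.close();
```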

And Playwright exposes this with a very helpful [ariaSnapshot](https://playwright.dev/docs/api/class-locator#locator-aria-snapshot) method. If we revisit our first example, when Claude calls the `browser_snapshot` tool, the response it gets back is an aria snapshot.

&lt;div&gt;

```yaml
- generic [ref=s2e3]:
    - generic [ref=s2e5]:
        - generic [ref=s2e7]:
            - generic [ref=s2e9]:
                - generic [ref=s2e11]:
                    - generic [ref=s2e12]:
                        - img &quot;FactorioLab&quot; [ref=s2e14]
                        - generic [ref=s2e15]:
                            - text: Welcome to
                            - text: FactorioLab
                            - text: &quot;! Let&apos;s get to work.&quot;
                        - generic [ref=s2e18]:
                            - combobox &quot;options.game.factorio&quot; [ref=s2e19]:
                                - generic [ref=s2e20]: Factorio
                            - button &quot;dropdown trigger&quot; [ref=s2e23]
                        - generic [ref=s2e27]:
                            - combobox &quot;Space Age&quot; [ref=s2e28]
                            - button &quot;dropdown trigger&quot; [ref=s2e29]
                        - button &quot;Add items&quot; [ref=s2e32]: Add items
                        - button &quot;Add machines&quot; [ref=s2e34]: Add machines
                        - generic
                        - generic
                        - separator
                        - button &quot;Load last state&quot; [ref=s2e43]: Load last state
                        - separator
                        - button &quot;Help me get started&quot; [ref=s2e48]: Help me get started
                        - button &quot;Skip&quot; [ref=s2e50]: Skip
                        - generic [ref=s2e52]:
                            - generic [ref=s2e53]:
                                - checkbox
                            - text: Don&apos;t show this screen again
            - text: FactorioLab 3.13.4
    - generic:
        - generic
```

&lt;/div&gt;

and it&apos;s all roughly 1,301 characters / 362 tokens.

So Playwright made a very smart call to make this the default mode for generating snapshots for the LLM to see what&apos;s on the page. For most cases, we probably don&apos;t need vision snapshots. The reality is that It Depends. Mostly on the complexity of the layout and the task at hand.

## Refs

Aria snapshots might be a great way to &quot;see&quot; the page, but now there&apos;s a new problem: how to interact with the page. The LLM can&apos;t simply say `Click button &quot;dropdown trigger&quot;`--there could be multiple elements that match that criteria. The accessibility tree is an abstraction, so we need a translation step somewhere. Traditionally, E2E tests solve this problem with very specific selectors, either `data-testid` attributes or unique `ids`. To make it even more difficult, several different underlying elements can all show up in the accessibility tree as `button`.

The LLM doesn&apos;t know anything about the actual markup, css, attributes, or xpath. Including any of those things would cost input tokens. Trying to click the wrong selector can be costly--not only does it waste time, it also adds another message to the conversation history, just like when the vision model gets the wrong coordinates.

To make the accessibility tree truly useful to the LLM, we need unique selectors so the LLM can accurately interact with elements with precision. One solution--before returning the snapshot, we could create our own unique IDs and append them to each element using a custom `aria-ref` attribute and return them in the snapshot.

As of [v1.52](https://playwright.dev/docs/api/class-locator#locator-aria-snapshot-option-ref), this is exactly what Playwright does under the hood for you. If you noticed in the above example, every interactive node contains a `ref=sXeXXX` id. These are much more lightweight than trying to use actual CSS or xpath.
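Grabbing one of these ref-annotated snapshots yourself is only a few lines. A sketch (the `ref` option is the one linked above, so make sure your Playwright version supports it):

```js
import { chromium } from &apos;playwright&apos;;

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto(&apos;https://example.com&apos;);

// A YAML accessibility snapshot of the page, with synthetic ref ids attached to each node.
const snapshot = await page.locator(&apos;body&apos;).ariaSnapshot({ ref: true });
console.log(snapshot);

await browser.close();
```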

### A note about other tools

Using aria snapshots isn&apos;t anything new or specific to Playwright. Other similar tools like [Puppeteer](https://pptr.dev/api/puppeteer.accessibility.snapshot) also include `snapshot()` methods. The key difference is that Playwright exposes this synthetic ref system for you out of the box, which makes it a clear winner for our use case of LLM automation. I can almost guarantee other tools will be adopting the same feature in the future.

## Time to build

Now that we understand what&apos;s going on, we can begin to think about building our own version. Because we have no control over the context when using MCP clients like Claude Desktop, we won&apos;t be able to complete a full end-to-end flow without building our own agent. It&apos;s clear--we need more control over managing the context and determining which mode to execute. Also, it&apos;d be great to run this headless.

To recap:

![Line graph showing the relationship between layout complexity and token cost for two snapshot methods. The &apos;Screenshot&apos; line remains flat across all complexity levels, indicating a constant cost. The &apos;Aria Snapshot&apos; line increases steadily, showing that token cost scales with layout complexity](/assets/blog/complexity-vs-cost.png)

- Screenshots are cheap and flat-cost, but can be inaccurate to act on.
- Aria snapshots can get expensive in tokens as the layout gets more complicated, but tend to be more reliable to act on (a rough way to weigh the two is sketched below).
- Playwright&apos;s built-in `ref` support makes it the best browser automation tool for the job.
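Put differently, once you can estimate both costs, the choice is just arithmetic. A rough sketch (the characters-per-token ratio is a rule of thumb, not a real tokenizer, and it ignores the accuracy trade-off entirely):

```js
// Decide which &quot;eyes&quot; to give the model for the current page state.
// Aria snapshot cost scales with markup complexity; screenshot cost scales with pixel count.
function chooseSnapshotMode(ariaSnapshotText, viewportWidth, viewportHeight) {
  const ariaTokens = Math.round(ariaSnapshotText.length / 4); // ~4 characters per token, very roughly
  const imageTokens = Math.round((viewportWidth * viewportHeight) / 750); // the image formula from earlier

  if (ariaTokens &lt;= imageTokens) {
    return { mode: &apos;aria&apos;, estimatedTokens: ariaTokens };
  }
  return { mode: &apos;vision&apos;, estimatedTokens: imageTokens };
}

chooseSnapshotMode(&apos;- button Skip [ref=s2e50]&apos;, 1280, 800); // tiny page: aria wins easily
```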

In part two, we&apos;ll create an actual agent loop, hook up our own playwright MCP server, and figure out how to optimize our API calls. Spoiler alert--[aria snapshots probably aren&apos;t going to be enough](https://github.com/lxe/chrome-mcp/blob/master/src/runtime-templates/ariaInteractiveElements.js)

[^1]: You can use tools like [OpenAI&apos;s tokenizer](https://platform.openai.com/tokenizer) to visually count tokens.</content:encoded></item><item><title>Testing Pixel Perfection in Figma and Code</title><link>https://cianfrani.dev/posts/testing-pixel-perfection-in-figma-and-code/</link><guid isPermaLink="true">https://cianfrani.dev/posts/testing-pixel-perfection-in-figma-and-code/</guid><description>A Web Component to add Figma overlays in code</description><pubDate>Sat, 01 Oct 2022 00:00:00 GMT</pubDate><content:encoded>![A demo of the ftl-belt component](/assets/blog/ftl/demo.gif)

&quot;Pixel Perfect&quot;. The Holy Grail. The Dream. The ability to take an asset designed in ~~Photoshop~~, ~~Sketch~~, Figma and implement in code is the superpower of frontend developers. Browsers are weird. Design tools are weird. Our eyes are weird. &quot;Pixel Perfect&quot; is hard. The legacy of &quot;Pixel Perfect&quot; is a million little JIRA tickets titled &quot;Change 3px to 2px.&quot; Death by a thousand paper cuts. The reality is we&apos;re often only [Pixel Pretty Close](https://www.joshwcomeau.com/css/pixel-perfection/#pixel-pretty-close).

When you&apos;re developing Design Systems, every pixel has its purpose. Being off by one can break layouts, applications, and hearts. So how does one validate pixel-perfection without screeching productivity to a halt to examine every commit under a magnifying glass?

While developing [Astro UXDS](https://www.astrouxds.com), we wanted a fast method for checking our components in code against their Figma counterparts. This want quickly became a need when we undertook a massive effort to implement spacing design tokens. Our solution is [Figma Testing Library](https://github.com/RocketCommunicationsInc/figma-testing-library).

Figma Testing Library is a lightweight Web Component that pulls in your designs from Figma and overlays them right on top of your code. Because it&apos;s a Web Component, it can be dropped into any dev environment--plain ol&apos; HTML, React, Angular, Vue with only a few lines of code. It uses the Figma REST API to pull in any layer you want from Figma.

## The Manual Way

One method to validate that your code is 100% pixel-perfect to your design is to use the overlay method, as described by David Luhr in [Behind the scenes of creating the official Figma kit for Tailwind UI](https://www.figma.com/blog/behind-the-scenes-of-creating-the-official-figma-kit-for-tailwind-ui/). The process involves:

1. Opening up Dev Tools
2. Selecting your component element
3. Right click -&amp;gt; Capture Node Screenshot
4. Grabbing the screenshot and importing over your Figma component
5. Aligning the screenshot and toggling the opacity.

![Screenshot of Figma with a button component and a screenshot of a button in code](/assets/blog/ftl/button-screenshot.png)

This process works great, but it has two major issues.

1. It&apos;s incredibly time-consuming.

Imagine having to do this for every component in your library, including every variant and every state. It can quickly get out of hand. This process works great for full comps, but not so great for individual components.

2. It assumes code is the source of truth.

This all depends on your use case. The overlay method works great if your source of truth is code and you&apos;re trying to recreate it within Figma. But if your source of truth is Figma and you&apos;re trying to recreate it in code, you&apos;ll find yourself capturing screenshots over and over as you iterate on that last begrudging 1px difference.

## A (somewhat) better way

Instead of overlaying code screenshots in Figma, what about overlaying Figma screenshots in code? You can easily export any layer from Figma to png. You could take that png, import it into your project, and position it on top of your code. Instead of manually exporting from Figma, you could use the Figma REST API to automatically pull in the latest version of a component. Next, you might find yourself wanting to switch between code and Figma quickly or control the level of opacity on the overlay.
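The REST call behind that idea is small. A sketch against Figma&apos;s `GET /v1/images/:file_key` endpoint (the token, file id, and node id below are placeholders):

```js
// Ask the Figma REST API to render a node as a PNG and return a temporary image URL.
async function getFigmaPng(accessToken, fileId, nodeId) {
  const response = await fetch(
    `https://api.figma.com/v1/images/${fileId}?ids=${encodeURIComponent(nodeId)}&amp;format=png`,
    { headers: { &apos;X-Figma-Token&apos;: accessToken } }
  );
  const data = await response.json();
  return data.images[nodeId]; // a short-lived URL for the rendered layer
}

const url = await getFigmaPng(&apos;MY_TOKEN&apos;, &apos;MY_FILE_ID&apos;, &apos;467:993&apos;);
```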

Figma Testing Library is a collection of two Web Components that do exactly that for you.

## Getting Started

In your project, open up your index.html and plop this import in:

```html

```

This will import and register the Figma Testing Library web components straight from a CDN.

### Setup the holster

The `ftl-holster` component wraps around your application code, overlays your Figma image, and provides a few useful UI controls.

Let&apos;s say you have a button component you want to test:

```html
&lt;!-- the component under test (a plain button here, just for illustration) --&gt;
&lt;button&gt;My button&lt;/button&gt;
```

Simply wrap your component in an `ftl-holster`:

```html
&lt;ftl-holster&gt;
	&lt;button&gt;My button&lt;/button&gt;
&lt;/ftl-holster&gt;
```

Next, you&apos;ll need to supply the holster with a few pieces of data:

&amp;gt; FULL DISCLAIMER: This component needs a Figma access token to use the REST API. Think twice before blindly pasting your access token into someone else&apos;s code. Review the source on [Github](https://github.com/RocketCommunicationsInc/figma-testing-library). Review the code from the CDN. This tool should only be used locally.

1. A Figma Access Token
   1. Login to your Figma account.
   2. Head to the account settings from the top-left menu inside Figma.
   3. Find the personal access tokens section.
   4. Click Create new token.
   5. A token will be generated. This will be your only chance to copy the token, so make sure you keep a copy of this in a secure place.
2. Your Figma file ID
   1. You can find your Figma file&apos;s ID by inspecting the URL: `https://www.figma.com/file/FILE_ID/my-file`
3. The Node ID of the Figma layer you want to overlay.

Now I admit this process is less than ideal and future revisions may improve the DX.

Every Figma layer has its own unique node id. You can find the node id for any given layer simply by using Figma in the browser and opening up Dev Tools. In the console, you have full access to Figma&apos;s Plugin API using the `figma` object. (So cool). To find the node id, select a layer and enter:

`figma.currentPage.selection[0].id`

You&apos;ll get something back that looks like &quot;467:993&quot;.

Node IDs rarely change so you should only need to go through this process once. You can also copy a Figma file to a local draft and it will retain the same node IDs.

![A screenshot of Figma with the console open returning a node id](/assets/blog/ftl/node-id.png)

Pass that data into the `ftl-holster` component:

```html
&lt;!-- access-token and file-id are the documented attributes; the node id attribute name is assumed here --&gt;
&lt;ftl-holster access-token=&quot;YOUR_TOKEN&quot; file-id=&quot;YOUR_FILE_ID&quot; node-id=&quot;467:993&quot;&gt;
	&lt;button&gt;My button&lt;/button&gt;
&lt;/ftl-holster&gt;
```

Open up your browser and you should see your holster in all its glory.

The holster has three modes: **Toggle**, **Overlay** and **Side By Side**.

### Toggle Mode

![Screenshot of ftl-holster in Toggle Mode](/assets/blog/ftl/ftl-toggle-mode.png)

Toggle Mode exposes a single switch to quickly swap between the Figma component and the code component. In Toggle Mode, you can use your browser&apos;s dev tools as you normally would to make those problematic 1px adjustments in real time.

### Overlay

![Screenshot of ftl-holster in Overlay Mode](/assets/blog/ftl/dialing.gif)

Overlay Mode positions the Figma component on top of your code component. It has a range input to control the level of opacity.

### Side By Side

![Screenshot of ftl-holster in Side By Side Mode](/assets/blog/ftl/ftl-side-by-side-mode.png)

Side By Side Mode will show you both Figma and Code at the same time.

## Many Holsters, Handle It

Naturally you might want to test many components at once. You can use the optional `ftl-belt` component to control all of your holsters from a single location. `ftl-belt` also exposes input fields for your access token and file id in your UI so you don&apos;t have to add them manually to every holster or accidentally commit them in your repository.

When an `ftl-belt` is used, the `access-token` and `file-id` properties on `ftl-holster` become optional.

```html
&lt;!-- attribute names other than the documented access-token/file-id are assumed; node ids are examples --&gt;
&lt;ftl-belt&gt;
 &lt;ftl-holster node-id=&quot;467:993&quot;&gt;
  Hello
 &lt;/ftl-holster&gt;
 &lt;ftl-holster node-id=&quot;467:994&quot;&gt;
  Hello
 &lt;/ftl-holster&gt;
&lt;/ftl-belt&gt;
```

On Astro UXDS, we created a series of pages for each component and all of their variants, set up to reflect Figma 1:1. We have these committed in our repository (without the access token) with the import commented out. If the Web Component isn&apos;t registered, `ftl-holster` does nothing and only displays the component code. When someone wants to spot check a change in Figma, it&apos;s as simple as adding the import and entering an access token.

## A Note About Text Rendering

Browsers and Figma render text differently, so you may find it difficult to achieve the pixel-perfect dream. If you&apos;re totally adamant on achieving pixel perfection, you can try changing your font to a local monospace font like Menlo (if you&apos;re on macOS) in both your Figma file and your code, and use Safari. Safari seems to render text closer to how Figma itself does.

Take this example: a single `&lt;span&gt;` with `font-size: 20rem;`.
Inspect the width in each browser and you&apos;ll get different results.
In Firefox and Chrome, you might see the width as 231.Xpx. However, Safari, like Figma, will round up to 232px.

&lt;p&gt;
  &lt;span&gt;See the Pen &lt;a href=&quot;https://codepen.io/rocketmark/pen/KKoJrEx&quot;&gt;
  why do browsers render this width differently&lt;/a&gt; by mark (&lt;a href=&quot;https://codepen.io/rocketmark&quot;&gt;@rocketmark&lt;/a&gt;)
  on &lt;a href=&quot;https://codepen.io&quot;&gt;CodePen&lt;/a&gt;.&lt;/span&gt;
&lt;/p&gt;


## PRs Welcome

You can find the full source code for these components on [Github](https://github.com/RocketCommunicationsInc/figma-testing-library).
Reach out to me on twitter ([@markacianfrani](https://twitter.com/markacianfrani)) if you found this useful or have any ideas for improvement.&lt;/span&gt;</content:encoded></item><item><title>Where are all the read-only radio buttons?</title><link>https://cianfrani.dev/posts/where-are-all-the-read-only-radios/</link><guid isPermaLink="true">https://cianfrani.dev/posts/where-are-all-the-read-only-radios/</guid><description>Why aren&apos;t read-only radio buttons more common?</description><pubDate>Thu, 26 Dec 2024 00:00:00 GMT</pubDate><content:encoded>I start this adventure with a question and a seed of doubt: &quot;Why are read-only radio controls so uncommon in design systems?&quot; What even is read-only? When do I use it? Can&apos;t I just use disabled? Could HTML be flawed or am I just bad at my job?

At the end of this, I don&apos;t have any definitive answers other than &quot;it depends and this is what I landed on&quot;. Accessibility is insanely hard. I am not an expert. If anything in here appears factually incorrect, **please** reach out and tell me: &lt;a href=&quot;mailto:accessibility@cianfrani.me&quot;&gt;accessibility@cianfrani.me&lt;/a&gt;

## The Setup

Consider these product requirements:

- You have a multi-step form. You have an assortment of controls: text inputs, radio buttons, custom radio &quot;cards&quot;, checkboxes, and select menus, but radios are by far the most common.
- After a user submits a question, they cannot edit their response.
- A user can go backwards and forwards and should be able to **view** their response.
- Prefer native HTML wherever possible. Write as little javascript as humanly possible.

As a designer--you need some way to communicate that a field cannot be interacted with.

As a developer--you need some way to prevent the field from being interacted with.

![A radiogroup with three radio controls. The user checks the first option &quot;Basic Plan&quot; and clicks submit. On submit, the field becomes disabled.](/assets/blog/basic-form-disable-on-submit.gif)

The easy way--&quot;just slap a `disabled` attribute on it! Our design system even gives this nice little grey-washed visual to us for free.&quot;

Not quite. For keyboard-only users, `disabled` is the most scorched-earth approach you can take. The element is no longer editable because, as far as they&apos;re concerned, it no longer exists. Disabled elements lose their tabindex, so they cannot be returned to after submitting the form. Some screen readers may still be able to navigate to and announce disabled form controls, but keyboard users won&apos;t be able to reach them with &lt;kbd&gt;Tab&lt;/kbd&gt;.

In this example, I expect to be able to return focus to the radios using only the keyboard:

![a radiogroup with three radio controls. The user selects the first option by pressing spacebar, then tabs to the submit button and submits the form, then tries to tab backwards but the controls do not receive focus](/assets/blog/basic-form-keyboard-nav-disabled.gif)

As part of the requirements, a user must be able to view previously-submitted responses and they must be able to navigate through them using &lt;kbd&gt;Tab&lt;/kbd&gt;. In other words, the radio controls cannot be edited, but they still need to be focusable.

![a radiogroup with three radio controls. The user selects the first option, tabs to the submit button, then tabs backward and selects different options.](/assets/blog/basic-form-keyboard-nav.gif)

So, how can we implement this in the most accessible way?

If we can&apos;t use the `disabled` state, is there anything else? What about that read-only thing?

## _What_ only?

&quot;Read-only&quot; is a weird, often-forgotten state. It&apos;s sometimes seen next to the disabled state, if at all. Visually, it can share a similar visual style (though it really should have its own unique visual style). A form control is considered to be in a read-only state when it is present, but the user cannot edit the value. Unlike disabled, read-only components are **still focusable**.

## Implementation

&amp;gt; The Boolean [readonly attribute](https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/readonly), when present, makes the element not mutable, meaning the user can not edit the control.

```html
&lt;label&gt;Name: &lt;input type=&quot;text&quot; readonly /&gt;&lt;/label&gt;
```

So I can just slap a `readonly` attribute on my input and that&apos;s it?

Not quite. This is where things get fun.

For text inputs, sure this works.

For basically everything else, like radios:

```html
&lt;!-- what we wish worked: readonly on radios (invalid HTML, shown only to illustrate the API we want) --&gt;
&lt;fieldset&gt;
	&lt;legend&gt;Select your starting hero&lt;/legend&gt;
	&lt;label&gt;
		&lt;input type=&quot;radio&quot; name=&quot;hero&quot; value=&quot;beren&quot; readonly /&gt;
		Beren - The mortal who braved Morgoth&apos;s halls
	&lt;/label&gt;
	&lt;label&gt;
		&lt;input type=&quot;radio&quot; name=&quot;hero&quot; value=&quot;eol&quot; readonly /&gt;
		Eöl - The Dark Elf, smith of Nan Elmoth
	&lt;/label&gt;
	&lt;label&gt;
		&lt;input type=&quot;radio&quot; name=&quot;hero&quot; value=&quot;haleth&quot; readonly /&gt;
		Haleth - Leader of the Haladin, unyielding in the wild
	&lt;/label&gt;
&lt;/fieldset&gt;
```

Wrong.

According to [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/readonly):

&amp;gt; **Note:** Only text controls can be made read-only, since for other controls (such as checkboxes and buttons) there is no useful distinction between being read-only and being disabled, so the `readonly` attribute does not apply.

This doesn&apos;t work nor is it [valid](https://validator.w3.org/), but it&apos;s more or less the API that we want. Semantically, we want a read-only state.

## Why does read-only only apply to text inputs?

What does that mean, &quot;no useful distinction between being read-only and being disabled&quot;? Given our requirements, there&apos;s absolutely a useful distinction between being read-only and disabled. We want to preserve the tab order. If I previously checked Beren, I want to know that.

Let&apos;s peel back the onion another layer.

For this to make sense, we need to talk about two different concepts--[value and checkedness](https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#concept-fe-checked).

&amp;gt; Most form controls have a value and a checkedness...These are used to describe how the user interacts with the control.

```html
&lt;label&gt;
	&lt;input type=&quot;checkbox&quot; value=&quot;Sign up to the mailing list unless you hate free money.&quot; /&gt;
	Sign up to the mailing list unless you hate free money.
&lt;/label&gt;
```

A checkbox can have a value of &quot;Sign up to the mailing list unless you hate free money&quot;. If I click it, the _value_ stays the same--its _checkedness_ is what changes, whether it&apos;s checked or unchecked. Same goes for radios. For text, checkedness doesn&apos;t apply and it&apos;s always false.

So by that definition, the value of a checkbox or radio doesn&apos;t change. That&apos;s my only guess as to why read-only doesn&apos;t apply.

## The story so far

A blanket approach of `disabled` or `readonly` won&apos;t work. Radio buttons seem to be the most problematic so we&apos;ll focus on just these specifically. Are there any aria-attributes we could consider?

### aria-readonly

There&apos;s [aria-readonly](https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-readonly), but [support seems iffy at best](https://adrianroselli.com/2022/11/brief-note-on-aria-readonly-support-html.html) and also:

&amp;gt; It does not belong on native HTML elements, but if used then the browser should override it by the native `readonly` attribute. It is meant for non-native implementations of the roles `checkbox`, `combobox`, `grid`, `gridcell`, `listbox`, `radiogroup`, `slider`, `spinbutton`, and `textbox`.

If radio inputs don&apos;t support `readonly`, they also don&apos;t support `aria-readonly`.

### aria-disabled

Next, there&apos;s [aria-disabled](https://www.w3.org/TR/wai-aria/#aria-disabled)

&amp;gt; Indicates that the [element](https://www.w3.org/TR/wai-aria/#dfn-element) is [perceivable](https://www.w3.org/TR/wai-aria/#dfn-perceivable) but disabled, so it is not editable or otherwise [operable](https://www.w3.org/TR/wai-aria/#dfn-operable). See related [`aria-hidden`](https://www.w3.org/TR/wai-aria/#aria-hidden) and [`aria-readonly`](https://www.w3.org/TR/wai-aria/#aria-readonly).

That sounds exactly like what we want.

But wait, [MDN says:](https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-disabled)

&amp;gt; the aria-disabled=&quot;true&quot; only semantically exposes these elements as being disabled. Web developers must manually ensure such elements have their functionality suppressed when exposed to the disabled state.

We keep our tab index, but we lose the interaction prevention.
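So if we reach for `aria-disabled`, the suppression is on us. A minimal sketch of what that might look like for radios--they stay in the tab order, but changes are blocked (you&apos;d still want your own read-only styling on top):

```js
// aria-disabled only announces the state--we have to block the interaction ourselves.
document.querySelectorAll(&apos;input[type=&quot;radio&quot;][aria-disabled=&quot;true&quot;]&apos;).forEach(function (radio) {
  radio.addEventListener(&apos;click&apos;, function (event) {
    // Cancelling the click stops the checkedness from changing,
    // but the radio stays focusable and announceable.
    // Arrow-key selection inside the group may need a keydown handler too.
    event.preventDefault();
  });
});
```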

It seems like we&apos;re playing a game of whack-a-mole here. Let&apos;s recap our four options:

### Disabled v Readonly v Aria-disabled v Aria-readonly

| **Feature**                     | `disabled`                                                                                                                                                                     | `readonly`                                                                                                                                                                     | `aria-disabled`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | `aria-readonly`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Purpose**                     | When present, makes the element not mutable, focusable, or even submitted with the form.                                                                                       | When present, makes the element not mutable, meaning the user can not edit the control.                                                                                        | Indicates that the element is perceivable but disabled, so it is not editable or otherwise operable.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Indicates that the element is not editable, but is otherwise operable.                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| **Applicability**               | Form controls like `&lt;input&gt;`, `&lt;select&gt;`, `&lt;textarea&gt;`, `&lt;button&gt;`, `&lt;fieldset&gt;` | Text controls only: `&lt;input&gt;` (text-like types) and `&lt;textarea&gt;` | Any element with a widget role, native or custom | Non-native implementations of roles like `checkbox`, `combobox`, `grid`, `gridcell`, `listbox`, `radiogroup`, `slider`, `spinbutton`, and `textbox` |</content:encoded></item><item><title>Web Component Best Practices</title><link>https://cianfrani.dev/posts/web-component-best-practices/</link><guid isPermaLink="true">https://cianfrani.dev/posts/web-component-best-practices/</guid><description>Lessons learned after writing Web Components every day for one year.</description><pubDate>Mon, 14 Mar 2022 00:00:00 GMT</pubDate><content:encoded>Writing Web Components is hard. Writing good Web Components is really hard. After spending the last year building out the [AstroUXDS Web Components](https://www.astrouxds.com), I&apos;ve learned a hard truth that a really good React/Vue/Angular/FoobarJS component does not make a really good web component. For those who are first jumping into the pool of Web Components, here is the guide that I wish I had one year ago.

&amp;gt; NOTE: A lot of this deals mostly in the context of implementing design systems.

## Why are you doing this?

The promise and allure of Web Components can be all too tempting. Being able to write and maintain a single code base that can be used across any framework is something that speaks to everybody almost immediately. However, Web Components are not a panacea. They require an entirely new discipline and frame of thinking. A lot of people will tell you Web Components are great: &quot;look at how easy it is to ship a button component, fully encapsulated with your Design System&apos;s styles!&quot; What they don&apos;t tell you is now you have to figure out how to get your button to interact with forms properly or handle accessibility.

When you choose to write a web component, you&apos;re taking on the full responsibility of having to think through every possible use case and scenario, while simultaneously juggling developer experience, user experience, and maintainability. Be prepared to think through every minute detail. Failure to do so will result in angry users, because the Shadow DOM is unforgiving. Oftentimes the developer will have no mechanism to solve the problem themselves.

Remember we are writing custom (HTML) elements. These atoms need to be flexible enough to create the universe.

## Anatomy of a Web Component

![crude web component API diagram](/assets/blog/anatomy.png)

To write a good web component, you need a solid understanding of all of the available APIs at your disposal. You will be constantly juggling between extensibility vs. ease of use. For any given feature, you should think carefully about which API to use.

If you&apos;re coming from a framework mindset, you might already be familiar with slots, props, and events. Web Components give us two additional unique APIs--CSS Custom Properties and CSS Shadow Parts. **Your component&apos;s styling is now its own API**. Use this to your advantage.

### Slots

- ✅ Extremely flexible
- ❌ Add complexity to component code
- ❌ Require developers to write more boilerplate

Slots are probably the most powerful API for extensibility because they sit outside the Shadow DOM and can contain any custom HTML.

### Properties/Attributes

- ✅ Easy to use
- ✅ Familiar to users
- ❌ Not very flexible

Properties and attributes are the most familiar concepts, usually used for controlling things like state. However, they are the least flexible when it comes to allowing custom content.

For example:

```html
&lt;!-- element and attribute names are illustrative: a component that only accepts a plain string --&gt;
&lt;my-alert heading=&quot;Something went wrong&quot;&gt;&lt;/my-alert&gt;
```

This works great if all you need is to display a basic string. But what if I wanted to pass in my own HTML? Maybe I want to throw in an svg icon or a whole form. I can&apos;t fit all of that in as just a string. This component wouldn&apos;t be very useful to me.

### Methods

- ✅ Provide unique functionality
- ❌ Require JavaScript

Public methods are great if your component has some kind of action that it can perform. A good example would be a typical Modal component that might have `show()` and `hide()` methods. Simply using an `open` prop in this case might not be enough for a developer looking to do something _after_ the modal has been opened because it may not be available yet. Instead, they would need to use the modal&apos;s `show()` method which could return a promise that would resolve once it has finished opening.
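As a sketch of that shape (not any particular library&apos;s API--the class name and the 200ms duration are assumptions):

```js
class MyModal extends HTMLElement {
  // Resolves only after the open transition finishes, so callers can safely
  // focus or measure things inside the modal afterwards.
  show() {
    this.classList.add(&apos;is-open&apos;);
    return new Promise(function (resolve) {
      setTimeout(resolve, 200); // assumed to match the CSS transition duration
    });
  }

  hide() {
    this.classList.remove(&apos;is-open&apos;);
  }
}

customElements.define(&apos;my-modal&apos;, MyModal);

// Usage: wait for the modal to actually be open before acting.
document.querySelector(&apos;my-modal&apos;).show().then(function () {
  console.log(&apos;modal finished opening&apos;);
});
```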

### CSS Custom Properties

- ✅ Flexible
- ❌ Bad DX if used carelessly

CSS Custom Properties are one of two ways for allowing developers to pierce the Shadow DOM. Remember doing `my-button { background: red; }` won&apos;t do anything because of Shadow DOM encapsulation. But if you used a CSS Custom Property to control your background color, developers could do something like `--button-bg-color: red;`.

In the early days, CSS Custom Properties were the only way for developers to customize the styling of a Web Component. This led to a lot of early adopters adding an absurd amount of CSS Custom Properties. `--button-border-radius`, `--button-text-color`, `--button-font-family`, etc, etc. [Custom Properties for nearly every imaginable CSS property. It was a hot mess.](https://css-tricks.com/styling-in-the-shadow-dom-with-css-shadow-parts/#aa-styling-shadow-dom-with-part) Luckily we got a better way--CSS Shadow Parts.

But CSS Custom Properties still have their place:

&amp;gt; CSS variables are scoped to the host element and can be reused throughout the component. A good example of a CSS variable would be `--border-width`, which might get reused throughout a component to ensure borders share the same width for all internal elements. - [Shoelace - When To Use a CSS Custom Property](https://shoelace.style/resources/contributing?id=when-to-use-a-css-custom-property-vs-a-css-part)

### CSS Shadow Parts

- ✅ Extremely flexible
- ❌ Maintainability can suffer if used carelessly
- ❌ Require developers to write more boilerplate

CSS Shadow Parts solve the problem of &quot;how do I style XYZ&quot;. They allow you to define the &quot;parts&quot; that your custom element is composed of. Channel your inner Zeldman. Shadow parts should have some semantic meaning. They should represent an abstract piece of your component. Because these are part of your API, you need to be careful in what you expose publicly.

**Sometimes the answer to &quot;how do I style XYZ&quot; is, &quot;you don&apos;t&quot;.** Maybe you don&apos;t want the background color to be allowed to be anything. Instead, you might expose a property that can only accept a few whitelisted options.

- Part names should be consistent across all components wherever possible.
- Shadow parts cannot be nested.
- Shadow parts can only be single elements. `my-component::part(base) &amp;gt; svg { display: none; }` won&apos;t work.

Avoid making every element a part if possible. Once an element is a shadow part, it will require a breaking change to alter the markup later. See [when to create CSS parts](https://www.abeautifulsite.net/posts/when-to-create-css-parts/) for much greater detail.

If your component is small enough (atom level), you may end up with every element having its own shadow part and that is totally okay.

### The Right Tool

Now let&apos;s take a very simple feature--we need to write a button component that can show two different variants: primary and secondary. How might we implement this?

#### With Props

```html
&lt;!-- the attribute name is assumed from the setType() example below --&gt;
&lt;my-button type=&quot;primary&quot;&gt;&lt;/my-button&gt;
&lt;my-button type=&quot;secondary&quot;&gt;&lt;/my-button&gt;
```

#### With a Method

```js
const el = document.querySelector(&apos;my-button&apos;)
el.setType(&apos;primary&apos;)
el.setType(&apos;secondary&apos;)
```

#### With CSS Custom Properties

```css
my-button {
  --button-background-color: var(--color-primary);
  --button-border-color: var(--color-primary);
  --button-text-color: var(--color-text);
  /* + all hover, active, focus states sheesh */
}
```

#### With CSS Shadow Parts

```css
my-button::part(container) {
  background-color: var(--color-primary);
  border-color: var(--color-primary);
  /* etc etc */
}
```

Here are four different ways we can expose a given feature. A prop is clearly the winner in terms of ease of use. But now imagine we wanted to allow more than just two colors. What if we wanted to allow any color, as long as it is defined in the design system? We would need to add another 30+ prop options.

The point is there is no single best answer for which API to use when. It’s a matter of deciding what you want to allow and what the best DX would be.

### Opinionated Best Practices

#### 1 . Be Declarative - Avoid arrays and object attributes

Remember we are writing custom HTML elements. Our components must be usable in the browser, without a framework, without JavaScript. Think of this use case as your lowest common denominator. My personal litmus test: &quot;would a teenager be able to use this element on their MySpace page?&quot;

So let&apos;s consider a basic List component. Your first pass might look something like:

```html
&lt;!-- first pass (names illustrative): the list gets its items as an array property --&gt;
&lt;my-list&gt;&lt;/my-list&gt;
```

This works nicely if you&apos;re using a js framework to do the heavy lifting for your data binding. But if you&apos;re using plain HTML, you&apos;re now forced to write some javascript:

```js
const data = [...]
const el = document.querySelector(&apos;my-list&apos;)
el.data = data
```

Now what if you wanted the list items to be links? Or include an icon? What if you wanted every third item to open a modal and every tenth item to navigate to a page?

Back to the drawing board.

```html
&lt;!-- the link wrapper on Item 2 is illustrative --&gt;
&lt;my-list&gt;
	&lt;my-list-item&gt;Item 1&lt;/my-list-item&gt;
	&lt;my-list-item&gt;&lt;a href=&quot;/item-2&quot;&gt;Item 2&lt;/a&gt;&lt;/my-list-item&gt;
&lt;/my-list&gt;
```

By creating a new `my-list-item` component, suddenly we are much more flexible and can avoid the unending series of &apos;what if&apos; questions.

If you must use arrays or objects, [make sure to accept them only as properties](https://developers.google.com/web/fundamentals/web-components/best-practices#aim-to-only-accept-rich-data-objects,-arrays-as-properties.) and do not reflect them as attributes for performance reasons.

In the words of Kent C Dodds, [avoid soul crushing components](https://epicreact.dev/soul-crushing-components/).

#### 2. Don‘t style attributes

```html
&lt;my-component open&gt;&lt;/my-component&gt;
```

```css
my-component {
	display: none;
}

my-component[open] {
	display: block;
}
```

For this example to work, you need to be extra careful that you are reflecting your `open` attribute correctly. If someone were to change the `open` property and you forget to reflect it to the attribute, your component will break and this can be very difficult to debug.

Instead, use internal classes and style those.
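A sketch of that approach--the `open` property setter toggles a class on an element inside the shadow root, so styles never depend on the attribute being reflected (names here are illustrative):

```js
class MyComponent extends HTMLElement {
  constructor() {
    super();
    this.attachShadow({ mode: &apos;open&apos; });
    this.shadowRoot.innerHTML = &apos;&lt;div class=&quot;container&quot;&gt;&lt;slot&gt;&lt;/slot&gt;&lt;/div&gt;&apos;;
    this._open = false;
  }

  get open() {
    return this._open;
  }

  set open(value) {
    this._open = Boolean(value);
    // The styling hook is an internal class, not the host attribute.
    this.shadowRoot.querySelector(&apos;.container&apos;).classList.toggle(&apos;is-open&apos;, this._open);
  }
}

customElements.define(&apos;my-component&apos;, MyComponent);
```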

#### 3. :host styles are sacred

Be careful when styling :host. Anything you put here will NOT be shadow dom encapsulated and thus, can be changed by the developers using your component. :host styles are generally best for default properties like `display` .

#### 4. (Try to) fail silently

Do native elements throw an error if you try and pass in an `&lt;h2&gt;` as a child where it doesn&apos;t belong? No. HTML fails silently. We should treat the console as sacred as well and do our best not to pollute it with unnecessary warnings and errors.

Throw errors only when you absolutely cannot continue. If you&apos;re throwing an error, take a second to pause and consider why and make sure that you have a good reason. Sometimes they are unavoidable though.

&amp;gt; On AstroUXDS, we generally like to reserve warnings for deprecation warnings only. But this is just an opinionated style decision.

#### 5. Data Flow - Props Down, Events Up

The traditional wisdom around data flow remains the same. Props down, events up. Lift state up. Whatever you want to call it. If two sibling components need to talk to each other, they probably need a parent mediator component.

#### 6. Steal Code. (I&apos;m not a lawyer)

Seriously. The web today is the result of a generation right-clicking &quot;view source&quot; and &quot;assimilating&quot; what others have done. That&apos;s how we got to where we are now. That&apos;s why the web is the most democratizing platform. The idea of sharing and openness is baked right into your browser. If you don&apos;t personally have an anecdote about trying to create a website for your band in middle school by copy and pasting some piecemeal HTML you found somewhere, I guarantee you probably know at least one person who does.

So stand on the shoulder of giants and don&apos;t reinvent the wheel and all those other cliches. When you encounter a problem, go look at how other people have solved it. Pick the one you like the most. (Forms, for example, were a fun one).

Some of the best resources that I&apos;ve found are:

- [Shoelace](https://shoelace.style/) - Quite possibly the gold standard of web component libraries. A lot of these best practices have been adapted from Shoelace’s own [Best Practices](https://shoelace.style/resources/contributing?id=best-practices). I encourage you to read this in full multiple times. My entire foundation of what makes a great web component has come from reading through Shoelace’s source.
- [Ionic](https://ionicframework.com/docs/components) - One of the very few early adopters and champions for web components. Completely battle-tested. The amount of eyes they have on their components is insane. Superior DX and a perfect case study on how web components can serve developers of all frameworks.
- [Spectrum Web Components](https://opensource.adobe.com/spectrum-web-components/) - Adobe&apos;s Design System, web component flavored.
- [OpenUI](https://open-ui.org/) - Not a library but one of the most valuable resources when designing a net new component. My go-to for inspiration on the mundane tasks of what to actually name things, what the expected props are, etc.
- [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML) - For inspiration, return to the classics. If you’re building a custom element that already exists, it’s generally a good idea to default to the behavior of the native element. Building web components gave me a new appreciation for HTML.
  &amp;gt; Tip: in Chrome Dev Tools, you can turn on ‘show user agent shadow dom’ to see the shadow dom of all your favorite classic elements.
- [Web.dev’s Custom Element Best Practices](https://developers.google.com/web/fundamentals/web-components/best-practices) - Another great general list of best practices.&lt;/h2&gt;</content:encoded></item><item><title>The Patterns Shall Set You Free</title><link>https://cianfrani.dev/posts/the-patterns-shall-set-you-free/</link><guid isPermaLink="true">https://cianfrani.dev/posts/the-patterns-shall-set-you-free/</guid><description>Patterns over components. Always.</description><pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate><content:encoded>There&apos;s this idea that&apos;s rapidly gaining popularity as AI continues to evolve that all SaaS products are dead--that if someone wants a ToDo app then they can just vibe code their own app in minutes. And yeah, maybe your grandma isn&apos;t going to build her own recipe app, but there&apos;s something fundamentally true about this idea. Most of the software we use is bloated. We use it because it exists and we use maybe 10% of its features and pay for 100%.

As Maggie Appleton puts it:

&amp;gt; we’re in the industrial, high-modernism age of software, where these standardized, one-size-fits-all apps are made for us by people who don’t know much about us.

Source: [Maggie Appleton, &quot;Home-Cooked Software&quot;](https://maggieappleton.com/home-cooked-software)

Even as I write this in Obsidian, software that&apos;s praised for being minimalistic, I use exactly zero of these features nor do I even know what they do.

&lt;figure&gt;
	
&lt;/figure&gt;

The idea of users building their own software isn&apos;t new. Going back all the way to 1987, Kent Beck and Ward Cunningham wrote:

&amp;gt; We propose a radical shift in the burden of design and implementation, using concepts adapted from the work of Christopher Alexander....[who] proposes homes and offices be designed and built by their eventual occupants. These people, he reasons, know best their requirements for a particular structure. We agree, and make the same argument for computer programs. Computer users should write their own programs. The idea sounds foolish when one considers the size and complexity of both buildings and programs, and the years of training for the design professions. Yet Alexander offers a convincing scenario. It revolves around a concept called a &quot;pattern language.&quot;

Source: [Kent Beck and Ward Cunningham, OOPSLA &apos;87](https://c2.com/doc/oopsla87.html)

So what happened? Ordinary computer users never got around to building their own software. At best, we have the Power Users who could orchestrate their own complex workflows but the spirit of the idea, that the street seller in Turkey, or the doctor in Tunisia, would write their own user interfaces never quite took off. Instead, we pay people to architect for us and usually it&apos;s pretty good or good enough. We can adapt our workflow. We can constrain our business into a set of predefined form fields in someone else&apos;s database because ultimately, the cost of paying for 10% of the features outweighs the time to learn Smalltalk. We smooth down all the edges of our products so that they can fit inside the shipping container.

How does this change when that cost to learn X becomes zero? When AI can build whatever you need on demand? Without another &quot;radical shift&quot; in design, we&apos;re stuck with just building faster horses. That street seller in Turkey can build their own software today, but they can only build what fits inside the shipping container. The constraint is no longer code, but the legacy of &quot;patterns&quot; that we&apos;ve adopted and are now shackled to.

&lt;figure&gt;
	
	&lt;figcaption&gt;
		The old PlayPlace era before everything looked standardized.
	&lt;/figcaption&gt;
&lt;/figure&gt;

Decades of &quot;atomic&quot; design and &quot;modular&quot; components preached constraints that no longer need apply.

We used to have 4 browsers that rendered text inputs differently, and they were all ugly. So we built components. We moved our native apps to Electron. It was easier to maintain one codebase and have it work consistently. Eventually the world evolved. Browsers got better. IDE tooling got better. But we&apos;ve invested so much in our libraries and frameworks that it&apos;s difficult to throw it all away. 10k lines of code can now be reduced to &quot;make all the buttons and text fields line up&quot;.

&amp;gt; The architects and planners and bankers have pattern languages which tell them to build gigantic steel and concrete buildings. The users have a few shattered patterns left in their vocabulary: a sheet of plastic to make a kitchen counter; huge plate glass windows for the living room; wall-to-wall carpet in the bathroom--and they enthusiastically piece these patches together, whenever they have a free weekend.
&amp;gt;
&amp;gt; But the remnants of our former languages are dead and empty.
&amp;gt;
&amp;gt; They are based mainly on the by-products of industry. People use plate glass windows, formica counters, wall-to-wall carpet, because industry makes them available, not because these patterns contain anything essential about life, or how to live it.
&amp;gt;
&amp;gt; The time when a pattern language was a song, in which people could sing the whole of life, is gone. The pattern languages in society are dead. They are ashes and fragments in the people&apos;s hands.

Source: Christopher Alexander, _The Timeless Way of Building_

In the war between Components and Patterns, components won because they were easy to digest and the legacy of that war has been papercuts all the way down, whether that&apos;s your [design system team needing to add a fourth variant for product](https://cianfrani.dev/posts/the-legoism-of-design-systems/), or the 1001st GitHub issue on your [open-source shitty coding agent](https://shittycodingagent.ai/). Solutions that were designed by anyone other than you will rarely be as good as what you could design yourself. If only you had the time to learn Smalltalk.</content:encoded></item><item><title>Using LLMs to Dogfood Your Design Tokens</title><link>https://cianfrani.dev/posts/using-llms-to-dogfood-your-design-tokens/</link><guid isPermaLink="true">https://cianfrani.dev/posts/using-llms-to-dogfood-your-design-tokens/</guid><description>The one where we let Claude use our design tokens</description><pubDate>Sun, 08 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Here&apos;s a wild idea--ship your design tokens, not your components. The value add that design systems promised, this notion that it [costs a million dollars to write button components](https://medium.com/eightshapes-llc/and-you-thought-buttons-were-easy-26eb5b5c1871), is no longer the reality we are living in. Modern LLMs can generate a button. They can generate a lot more than that too. So why are we still clutching our pearls trying to ship component libraries?

We need not anymore. An LLM can generate a button, but it can&apos;t generate a button that looks like your brand--not without knowing what your brand&apos;s design language is (though you&apos;ll be surprised how close it can get just by giving it a screenshot). Luckily, they spent like 5 years creating the [design token JSON schema](https://www.w3.org/community/design-tokens/) to solve exactly this problem.

Design tokens are the patterns that generate your components. Instead of shipping the component itself, ship the DNA to create it. Let the LLM get creative and work within the system to solve its own unique constraints. It might not generate the same thing every time, but that&apos;s the feature. [No two atoms are alike](https://www.wtamu.edu/~cbaird/sq/2014/03/13/are-two-atoms-of-the-same-element-identical), after all. We create far too many constraints in the name of consistency. Why do buttons only have a small, medium, and large size variant? Would a mediumer option kill your design? Or would that &quot;bloat your API&quot; or &quot;take too much time to implement&quot;?
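
If you&apos;ve never cracked open a token file, here&apos;s roughly what that DNA looks like as CSS custom properties. A minimal sketch--the names and values below are hypothetical, loosely modeled on the SDS-style naming used later in this post:

```css
/* tokens.css -- a hypothetical sketch, not the real SDS file.
   Primitives hold raw values; semantic tokens hold decisions. */
:root {
	/* primitives: raw values with no opinion about usage */
	--sds-color-brand-800: #1e40af;
	--sds-color-neutral-200: #e5e5e5;
	--sds-size-radius-100: 0.25rem;
	--sds-size-radius-200: 0.5rem;
	--sds-size-radius-full: 9999px;

	/* semantic layer: the actual design decisions, expressed as aliases */
	--sds-color-background-brand: var(--sds-color-brand-800);
}
```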

## An Example

Using Figma&apos;s Simple Design System, let&apos;s try and generate a tag component using only tokens. I&apos;ll use their existing Tag component design as my &quot;designer&quot; reference. We&apos;ll use the LLM to generate some designs, then reflect on how to improve the system to steer it closer to our reference. As a designer, replace this reference image with your &quot;taste&quot;. You now have an instant feedback loop. 

![Figma Simple Design System showing tag component with default, danger, and positive variants](/blog/design-tokens-lossy/sds-tag-reference.png)

The cool thing about LLMs is that they can generate endless combinations for us. We no longer need to guess how our tokens could be used. We can see them play out in real time, cherry-pick the worst decisions, improve the system, and iterate.

For this example, I&apos;ll have one constraint--it can only use the provided design tokens.

&amp;gt; using only the @tokens.css generate me three tag components--default, danger, and positive. save to a tag.html file

![LLM-generated tag components with pill-shaped border radius, showing default gray, danger red, and positive green variants](/blog/design-tokens-lossy/llm-first-attempt.png)

This was the first result. A far cry from the reference. But why? If we look at the [official SDS tag reference](https://figma.github.io/sds/storybook/?path=/story/sds-primitives-tags--story-tag), this actually looks a lot closer to the `secondary` variant.

![Official SDS tag secondary variant showing a gray outlined tag style that matches the LLM&apos;s first attempt](/blog/design-tokens-lossy/sds-secondary-variant.png)

Technically the result wasn&apos;t incorrect--I just didn&apos;t specify enough in my prompt. So let&apos;s see what happens if I ask for more variants:

![LLM-generated tag variants including primary filled style with blue background, closer to the target design](/blog/design-tokens-lossy/llm-more-variants.png)

Now I&apos;m getting closer.

Comparing this to the source, where are my biggest gaps? The border radius. The primary background color and text color.

&amp;gt; reduce the border radius. its way too rounded

![Tag components with border radius reduced too much to 0.25rem, appearing sharper than intended](/blog/design-tokens-lossy/llm-reduced-radius.png)

The result swapped from `radius-full` (pill) to `radius-100` (0.25rem). It&apos;s not quite there--it went too far in the other direction. &quot;bump it one more&quot;. Notice how this mimics the human workflow of designing with tokens.

![Final tag components using radius-200 token, now matching the official SDS design specification](/blog/design-tokens-lossy/llm-final-result.png)

Perfect. Now it&apos;s using the exact token the official reference has--`radius-200`.
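
The generated file itself isn&apos;t shown here, but under the tokens-only constraint the final tag styles reduce to something like this sketch. The class name and the color tokens are my guesses; the only detail confirmed above is the radius token:

```css
/* tag.html styles, roughly -- every property resolves to a token.
   Class name and color token names are guesses; only the radius
   token is confirmed by the walkthrough above. */
.tag {
	display: inline-flex;
	align-items: center;
	padding: var(--sds-size-space-100) var(--sds-size-space-200);
	border-radius: var(--sds-size-radius-200); /* the token the SDS reference uses */
	background: var(--sds-color-background-brand);
	color: var(--sds-color-text-brand-on-brand);
}
```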

Taking a step back, what happened here? Well, the SDS has a very generic scale for border radii. There&apos;s no semantic layer. If you look at other components, `radius-200` is used consistently in other places like buttons, but they&apos;re not using a semantic token. So if I really cared about this decision and really wanted to enforce it, I would ask &quot;do all components get this border radius?&quot; If that&apos;s the case, then we just found our missing radius default token.

Now if the answer is no--&quot;only buttons and tags get this `radius-200`&quot;--the next questions are: why? What makes them different? What about this particular `radius-200` is special? And why do these two components need to share a common value?

The answer, more often than not, is just vibes-based. It&apos;s best not to dwell on it too much. If you don&apos;t have a clear answer, ask your LLM and default to that until you acquire more information.

We can create a new `--radius-default` token now, but that&apos;s not quite enough. When I ask to generate a tag, I still get things like this:

`Shape: --sds-size-radius-full (pill)`

The LLM is defaulting to associating &quot;tag&quot; with a pill shape. Again, it&apos;s not wrong at all. If you ask 10 different people to design a tag, you&apos;re gonna get some variations that include a pill shape.

To further cement this design decision, we can create a component-level token, `--sds-component-tag-border-radius`, which aliases the default token. This makes it explicit that our tags are NOT pill shaped.
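
In CSS custom property terms, that layering is just aliases on aliases. A rough sketch, assuming the underlying primitive is named `--sds-size-radius-200` (the exact primitive name is a guess):

```css
/* The semantic default introduced above: every component inherits
   this radius unless there is a documented reason not to. */
:root {
	--radius-default: var(--sds-size-radius-200);

	/* Component-level token: the tag radius becomes an explicit,
	   named decision instead of an implicit convention. */
	--sds-component-tag-border-radius: var(--radius-default);
}
```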

And this isn&apos;t just adding more noise to appease an LLM[^1]. These two new tokens also benefit the future humans who will try to interpret the system and implement a tag component from it.

Most importantly, if you do enough of these exercises, you can create a system that can generate the components itself. Taken to one extreme, if you were to tokenize every property of the tag component, from line height to padding, there&apos;s no need to ship any component in code whatsoever. The LLM can generate it on demand.
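
Taken all the way, a fully tokenized tag is nothing but aliases. Here&apos;s a sketch of what that might look like--every token name below is hypothetical except the radius one:

```css
/* Hypothetical component-level tokens covering every property of a tag.
   With these in place, nothing is left for a shipped component to decide;
   any generator, human or LLM, lands on the same design. */
:root {
	--sds-component-tag-border-radius: var(--radius-default);
	--sds-component-tag-padding-block: var(--sds-size-space-100);
	--sds-component-tag-padding-inline: var(--sds-size-space-200);
	--sds-component-tag-font-size: var(--sds-font-size-small);
	--sds-component-tag-line-height: var(--sds-line-height-tight);
	--sds-component-tag-background: var(--sds-color-background-brand);
	--sds-component-tag-color: var(--sds-color-text-brand-on-brand);
}
```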

## Why components?

![Component code showing multiple variant props like size and color that remain unused in production, representing dead code shipped to users](/blog/design-tokens-lossy/component-unused-variants.png)

Think of all the JavaScript you ship to support something as simple as a tag. If your design uses a single brand primary variant, all of these other variants lie dormant in code, completely unused. And then there&apos;s the maintenance burden of your team being the bottleneck for the system. If someone runs into a bug, they have to put a PR in and hope for the best.

But if they encounter a bug in their LLM-generated component, they own the component and can just as easily fix it themselves.

Yes, there&apos;s the counterargument that LLM-generated components won&apos;t ever be &quot;pixel perfect&quot; compared to a coded, prefab component. If that matters so much to you, you can just throw more component tokens at the problem. But if you can&apos;t explain why buttons and tags can only use a `radius-200` token and nothing else, then maybe it doesn&apos;t really matter all that much? What are components if not a locked-in set of predefined token combinations? If the design tokens that generated the component are robust enough, well thought out enough, then there is no &quot;wrong&quot; permutation of a tag component. The system won&apos;t allow it.

Instead of focusing so heavily on delivering prefab solutions, work up one level of abstraction and improve the system that generates the components.

[^1]: I wrote this with my own human brain.</content:encoded></item></channel></rss>