Using LLMs to Dogfood Your Design Tokens

Here’s a wild idea—ship your design tokens, not your components. The value add that design systems promised, this notion that it costs a million dollars to write button components, is no longer the reality we are living in. Modern LLMs can generate a button. They can generate a lot more than that too. So why are we still clutching our pearls trying to ship component libraries?

We don’t need to anymore. An LLM can generate a button, but it can’t generate a button that looks like your brand, not without knowing what your brand’s design language is (though you’ll be surprised how close it can get from just a screenshot). Luckily, people spent something like five years creating the design token JSON schema to solve exactly this problem.

Design tokens are the patterns that generate your components. Instead of shipping the component itself, ship the DNA to create it. Let the LLM get creative and work within the system to solve its own unique constraints. It might not generate the same thing every time, but that’s the feature. No two atoms are alike, after all. We create far too many constraints in the name of consistency. Why do buttons only have a small, medium, and large size variant? Would a mediumer option kill your design? Or would that “bloat your API” or “take too much time to implement”?

An Example

Using Figma’s Simple Design System, let’s try and generate a tag component using only tokens. I’ll use their existing Tag component design as my “designer” reference. We’ll use the LLM to generate some designs, then reflect on how to improve the system to steer it closer to our reference. As a designer, replace this reference image with your “taste”. You now have an instant feedback loop.

Figma Simple Design System showing tag component with default, danger, and positive variants

The cool thing about LLMs is that they can generate endless combinations for us. We no longer need to guess how our tokens could be used. We can see them play out in real time, cherry-pick the worst decisions, improve the system, and iterate.

For this example, I’ll have one constraint—it can only use the provided design tokens.

using only the @tokens.css generate me three tag components—default, danger, and positive. save to a tag.html file

LLM-generated tag components with pill-shaped border radius, showing default gray, danger red, and positive green variants

This was the first result. A far cry from the reference. But why? If we look at the official SDS tag reference, this actually looks a lot closer to the secondary variant.

Official SDS tag secondary variant showing a gray outlined tag style that matches the LLM's first attempt

Technically the result wasn’t incorrect; I just didn’t specify enough in my prompt. So let’s see what happens if I ask for more variants:

LLM-generated tag variants including primary filled style with blue background, closer to the target design

Now I’m getting closer.

Comparing this to the source, where are my biggest gaps? The border radius, and the primary background and text colors.

reduce the border radius. its way too rounded

Tag components with border radius reduced too much to 0.25rem, appearing sharper than intended

The result swapped from radius-full (pill) to radius-100 (0.25rem). It’s not quite there; it overcorrected. So I prompt again: “bump it one more”. Notice that this mimics the human workflow of designing with tokens.

Final tag components using radius-200 token, now matching the official SDS design specification

Perfect. Now it’s using the exact token the official reference has—radius-200.
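
To make that back-and-forth concrete, here is a rough sketch of what changed between the three attempts. The names `--sds-size-radius-100` and `--sds-size-radius-200` are my assumption, inferred from the `--sds-size-radius-full` naming, and the `.tag` selector is hypothetical:

```css
.tag {
  /* Attempt 1: pill shape, far too rounded. */
  /* border-radius: var(--sds-size-radius-full); */

  /* Attempt 2: one prompt later, too sharp (0.25rem). */
  /* border-radius: var(--sds-size-radius-100); */

  /* Attempt 3: "bump it one more" lands on the step the reference uses. */
  border-radius: var(--sds-size-radius-200);
}
```

Each prompt moved the component one step along the existing scale instead of inventing a new value.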

Taking a step back, what happened here? Well, the SDS has a very generic scale for border radii. There’s no semantic layer. If you look at other components, radius-200 is used consistently in other places like buttons, but they’re not using a semantic token. So if I really cared about this decision and really wanted to enforce it, I would ask “do all components get this border radius?” If that’s the case, then we just found our missing radius default token.

Now if the answer is no, “only buttons and tags get this radius-200”, the next question is why. What about them makes them different? What about this particular radius-200 is special? And why do these two components need to share a common value?

The answer, more often than not, is just vibes-based. It’s best not to dwell on it too much. If you don’t have a clear answer, ask your LLM and default to that until you acquire more information.

We can create a new --radius-default token now, but that’s not quite enough. When I ask to generate a tag, I still get things like this:

Shape: --sds-size-radius-full (pill)

The LLM defaults to associating a tag with a pill shape. Again, it’s not wrong at all. If you ask 10 different people to design a tag, you’re gonna get some variations that include a pill shape.

To further cement this design decision, we can create a component-level token, --sds-component-tag-border-radius, which aliases the default token. This makes it explicit that our tags are NOT pill shaped.
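
A minimal sketch of that alias chain in CSS might look like this. I'm assuming the primitive is exposed as `--sds-size-radius-200` with a value of 0.5rem, and I've spelled the default token as `--sds-size-radius-default`; both names and the value are assumptions, so check them against your own tokens.css:

```css
:root {
  /* Primitive scale step. Name and value are assumed; it is repeated here
     only so the snippet stands on its own. */
  --sds-size-radius-200: 0.5rem;

  /* Semantic token (hypothetical name): the system-wide default radius. */
  --sds-size-radius-default: var(--sds-size-radius-200);

  /* Component token: tags follow the default, so they are explicitly not pills. */
  --sds-component-tag-border-radius: var(--sds-size-radius-default);
}

.tag {
  border-radius: var(--sds-component-tag-border-radius);
}
```

If the default ever changes, tags follow it automatically. If tags ever need to diverge, only the component token has to be repointed.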

And this isn’t just adding more noise to appease an LLM¹. These two new tokens also benefit the future humans who will try to interpret the system and implement a tag component.

Most importantly, if you do enough of these exercises, you can create a system that can generate the components itself. Going to one extreme, if you were to tokenize every property of the tag component, from line height to padding, there’s no need to ship any component in code whatsoever. The LLM can generate it on demand.
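
As an illustration of that extreme, a fully tokenized tag could look roughly like the sketch below. Apart from `--sds-component-tag-border-radius`, every token name here is hypothetical, and the raw values are placeholders rather than anything taken from SDS:

```css
:root {
  /* Hypothetical component tokens for a tag. The raw values are placeholders;
     in a real system each one would alias a semantic or primitive token. */
  --sds-component-tag-border-radius: 0.5rem;
  --sds-component-tag-padding-block: 0.25rem;
  --sds-component-tag-padding-inline: 0.5rem;
  --sds-component-tag-font-size: 0.875rem;
  --sds-component-tag-line-height: 1.4;
  --sds-component-tag-background: #2c2c2c;
  --sds-component-tag-text-color: #f5f5f5;
}

/* With every property tokenized, the component is only a mapping
   from tokens to CSS properties. */
.tag {
  display: inline-flex;
  align-items: center;
  border-radius: var(--sds-component-tag-border-radius);
  padding: var(--sds-component-tag-padding-block) var(--sds-component-tag-padding-inline);
  font-size: var(--sds-component-tag-font-size);
  line-height: var(--sds-component-tag-line-height);
  background: var(--sds-component-tag-background);
  color: var(--sds-component-tag-text-color);
}
```

At that point the `.tag` rule carries no decisions of its own, which is what makes it cheap to regenerate on demand.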

Why components?

Component code showing multiple variant props like size and color that remain unused in production, representing dead code shipped to users

Think of all the JavaScript you ship to support something as simple as a tag. If your design uses a single brand primary variant, all of the other variants lie dormant in code, completely unused. Then there’s the maintenance burden: the component library becomes a bottleneck for your whole system. If someone runs into a bug, they have to put a PR in and hope for the best.

But if they encounter a bug in their LLM-generated component, they own the component and can just as easily fix it themselves.

Yes, there’s the counterargument that LLM-generated components won’t ever be “pixel perfect” compared to a coded, prefab component. If that matters so much to you, you can just throw more component tokens at the problem. But if you can’t explain why buttons and tags can only use a radius-200 token and nothing else, then maybe it doesn’t really matter all that much? What are components if not just a locked-in set of predefined token combinations? If the design tokens that generated the component are robust enough, well thought out enough, then there is no “wrong” permutation of a tag component. The system won’t allow it.

Instead of focusing so heavily on delivering prefab solutions, work up one level of abstraction and improve the system that generates the components.

¹ I wrote this with my own human brain.
