tl;dr

Dimensionalize. Antithesize. Metaphorize. These are cognitive tools in an abstract arsenal: directed reasoning that you can point at your problems.

They’re now available as a Claude Skills library. Download the Future Tokens skill library here, compress to .zip, and drag it into Claude → Settings → Skills (desktop). When you want Claude to run one, type “@dimensionalize” (or whatever skill you want) in the chat.

Language models should be good at abstraction. They are. The Future Tokens skills make that capability explicit and steerable.

In an LLM-judged test harness across dozens of skills, Future Tokens skill calls beat naïve prompts by roughly 0.2–0.4 (on a 0–1 scale) on insight, task alignment, reasoning visibility, and actionability, with similar factual accuracy. That’s roughly a 20–40 percentage-point bump on “reasoning quality” (as measured).

Abstraction needs verbs

Abstraction means recognizing, simplifying, and reusing patterns. It’s a general problem-solving method. There’s no magic to it. Everyone abstracts every day, across all domains:

a database is an abstraction of events
a regression is an abstraction of data
a genre is an abstraction of media
“startup”, “capitalism”, “celebrity” — all abstractions

But most talk about abstraction stays at the noun level: “here is a concept that stands for a cluster of things.”

For thinking, abstraction needs verbs: what we do when we generate and refine patterns:

How do you flip a worldview and see its mirror? Antithesize.
How do you surface the assumptions that undergird a claim? Excavate.
How do you find problems that sound like the one you have? Rhyme.
How do you map your problem into a domain with clearer structure? Metaphorize.
How do you identify the key attributes that describe your choices? Dimensionalize.

This is what the skills are, and why I have named them using verbed nouns. Not metaphysics: reusable procedures for problem solving.

These aren’t totally original ideas. They’re what good thinkers already do. My contribution here is:

Naming them precisely enough to call on demand,
Scaffolding them into workflows with parameters, and
Making them callable by LLMs with “@skill_name”

Abstraction is underused

Here’s the problem: abstraction is hard.

Humans are the only species that (we know) abstracts deliberately. Our brains are built for it, yet we still spend decades training to do it well in even one domain.

We’re all constrained by some mix of attention, domain expertise, time, interest, raw intelligence. I believe everyone abstracts less, and less clearly, than they would if they were unconstrained.

The failure modes are predictable:

under-abstracting: stuck in specifics, drowning in anecdotes, having the same thought on repeat without realizing
mis-abstracting: finding incorrect or harmful patterns (stereotypes, spurious correlations, conspiracies)
over-abstracting: losing contact with concrete reality, building elaborate systems that never get used

The skills don’t fix everything. But:

They make under-abstraction easier to avoid (with an arsenal of abstractions you can apply in seconds), and
They add enough explicit structure that mis-abstractions are easier to see and fix.

You still need judgment. You still need priors. You just get more structured passes over the problem for the same amount of effort.

What Future Tokens actually is

Future Tokens is my library of cognitive operations packaged as Claude Skills. Each skill is a small spec that says:

when to use it (and when not to)
how to run it (what Claude should output)
what good looks like

The current public release includes 5 of my favorite operations. When to reach for them:

I’m having trouble choosing → “@dimensionalize” (find the axes and tradeoffs)
I don’t agree but don’t know why → “@antithesize” (generate a coherent opposing force)
I don’t know why they believe that → “@excavate” (surface the assumptions underneath)
I’m confused about what problem I have → “@rhyme” (find nearby domains and similar problems)
This feels familiar but I don’t know how → “@metaphorize” (bridge domains and pull back concrete implications)

Over time, you stop thinking “wow, what a fancy skill” and start thinking “oh right, I should just dimensionalize this.”

Why this should work (the prior)

Language models are trained on the accumulated output of human reasoning. That text is full of abstraction: patterns, compressions, analogies, causal narratives. Abstraction, in the form of pattern recognition and compression, is exactly what that training optimizes for.

Asking an LLM to dimensionalize or metaphorize isn’t asking it to do something foreign or novel. It’s asking it to do the thing it’s built for, with explicit direction instead of hoping it stumbles into the right move. So:

Asking an LLM to dimensionalize is asking it to surface patterns of “people who think using tradeoffs.”
Metaphorize is “do analogies on purpose, and don’t forget the ‘map back’ step.”
Excavate is “do causal reasoning, and then do causal reasoning on your causal reasoning.”

The interesting discovery is that these capabilities exist but are hidden: simple to access once named, but nontrivial to find. The operations are latent in the model^[1].

Most of the “engineering” this work entails is actually just: define the operation precisely enough that the model can execute it consistently, and that you can tell when it failed.

Evidence it works (the posterior)

I’ve been testing these skills against baseline prompts across models. Short version: in my test harness, skill calls consistently outperform naïve prompting by about 0.2–0.4 (on a 0–1 scale) on dimensions like insight density, reasoning visibility, task alignment, and actionability, with essentially the same factual accuracy. Against strong “informed” prompts that try to mimic the operation without naming it, skills still score about 0.1 higher on those non-factual dimensions. The long version is in the footnotes^[2].

The more interesting finding: most of the value comes from naming the operation clearly. Elaborate specifications help on more capable models but aren’t required. The concept does the work.

This is a strong update on an already favorable prior. Of course directing a pattern-completion engine toward specific patterns helps. The surprise would be if it didn’t.

Why this is free

I couldn’t find a compelling reason to gate it.

These operations are patterns that already exist in publicly available models because they are how good thinkers operate. I want anyone to be able to know and use what LLMs are capable of.

My actual personal upside looks like:

Being “the guy who systematized and named this”.
Stress-testing and improving the framework in the wild.
Getting pulled into more interesting conversations because people find it useful.

The upside of standardization is greater than the upside of rent-seeking. My goal isn’t to sell a zip file; it is to upgrade the conversational interface of the internet. I want to see what happens when the friction of “being smart” drops to zero.

So, it’s free. Use it, fork it, adapt it, ignore 90% of it. Most of all, enjoy it!

The future of Future Tokens

The current release is a subset of a larger taxonomy. Many more operations are in development, along with more systematic testing.

In the limit, this is all an experiment in compiled cognition: turning the better parts of our own thinking into external, callable objects so that future-us (and others) don’t have to reinvent them every time.

My ask of you

If you use these and find something interesting (or broken), I want to hear about it. The skills started as experiments and improve through use.

Download the library. Try “@antithesize” on the next essay you read. Let me know what happens!

^{^}
Not just Claude: all frontier LLMs have these same capabilities, to varying degrees. Claude Skills is just the perfect interface to make the operations usable once discovered.
^{^}
Testing setup, in English:
- Dozens of skills (my private repo)
- For each skill, I wrote a problem that op is supposed to help with
- For each problem, I generated 4 answers:
  - skill – Claude using the skill properly (@dimensionalize, etc.)
  - naïve – a generic prompt (e.g. “What should I think about on this?” for dimensionalize)
  - informed – a brief, engineered prompt that asks for a specific deliverable (e.g. “What are the key factors we should consider when ____?” for dimensionalize
  - adversarial – a 3-sentence LLM summary of the full skill file
Then I had a separate LLM instance act as judge with a fixed rubric, scoring each answer 0–1 on:
- factual accuracy
- insight density
- task alignment
- reasoning visibility
- actionability
Across the full skill set, averaged:
- vs naïve, skill calls were ~0.2–0.4 higher (on a 0–1 scale) on:
  - insight density
  - task alignment
  - reasoning visibility
  - actionability
With essentially the same factual accuracy (within ~0.03).
- vs informed prompts, skills were still ~0.1 higher on the same non-factual dimensions.
That’s where the “20–40 percentage-point bump” line comes from: it’s literally that absolute delta on a 0–1 scale.
Important caveats:
- This is LLM-judged, not human-judged; you’re measuring “how good this looks to a model with the rubric,” which is correlated with human usefulness but not identical.
- It’s small-N and tuned to the kinds of problems these ops are meant for. It’s not a general benchmark, and it doesn’t have correct answers.
- The adversarial prompts were actually pretty strong; in some cases they tie or beat the skill on a given axis.
So the right takeaway is not “guaranteed 30% better thinking in all domains.” It’s more like:
When you run these skills on the kind of messy, multi-factor problems they’re designed for, you reliably get close to expert-quality reasoning by typing a single word.

LESSWRONG
LW