Studio · Behind-the-scenes · Jun 07, 2026

The Open Source AI Atlas: How We Evaluate Local Tools

A look inside the Atlas, the studio's running record of every local AI tool we test. The same template for every tool, the same status ladder for every verdict. Here is the framework we use to separate what works from what merely demos well.

Written by: Hillary

Read time: 10 min

Updated: Jun 12, 2026

Filed under: Studio · Behind-the-scenes

Behind every build the studio publishes is a quieter document: the Atlas. It is our running record of the local AI tools we have tried, organized so that a tool we tested months ago can still tell us, at a glance, whether it earned a place in real work. This article opens that record and shows you the framework underneath it, because the framework is more useful than any single verdict.

The whole thing rests on two ideas. First, we ask every tool the same questions, in the same order, using the same template. Second, we sort every tool onto the same status ladder, so “we like it” and “we run it daily” never get confused. Same questions, same ladder. That is the method.

Why a template at all

The temptation with a new AI tool is to play with it, feel impressed, and move on. That feeling is worthless a week later, and it is exactly how people end up with a drawer full of tools they downloaded once. A template fixes this. By forcing every tool through the same sections, we make tools comparable to each other and we make our own past judgment legible to our future selves. If a section of the template stays empty for a tool, that is a gap in what we know about the tool, and it shows up the same way every time.

The questions we ask every tool

Every tool note in the Atlas fills the same sections. Each one exists to answer a specific question, and together they move a tool from “looks interesting” to “we know what this is.”

What it is, in one line. Plain English, no marketing. If we cannot say what a tool is in a sentence, we do not understand it yet.
Why it exists. The problem it solves and who it is for. This is where we separate a genuinely new capability from a nicer wrapper on something we already have.
Stop here if. The honest blockers, named specifically: a RAM floor, an OS requirement, a prerequisite the tool assumes. This section saves a reader more time than any other, because it tells some readers to walk away.
Try it in 60 seconds. The fastest path to “it works.” Three commands at most. This is the test of whether a tool respects your time on the first run.
Full install. The real setup for each platform (Mac, Windows, and Linux), including the dependencies the official docs gloss over.
First commands that prove it works. The concrete steps that demonstrate the tool is actually doing its job, with success described so you know what you are looking at.
What you can build. Outcomes, not features. The actual projects the tool makes possible.
How to demo this in 5 minutes. A talk track for showing the tool to a client or a class, including the “aha” moment most people do not expect and the thing that confuses people first. Teaching a tool is the hardest test of whether you understand it.
Stridenote verdict. Our plain-language take: did we like it, what is it for, and what frustrated us? If we have not used it yet, the verdict says exactly that.
Gotchas. The real-world annoyances the docs do not warn you about. The performance quirk, the compatibility edge, the silent failure.

We also record the boring metadata at the top of every note, because the boring metadata is what makes the Atlas searchable later: license, platforms, whether it needs a GPU, a RAM figure, a difficulty rating, the interface, and the dates we first added and last reviewed it. That last pair matters more than it looks. A verdict with no date is a rumor.
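One way to picture a tool note is as a small record type. The sketch below is illustrative only, not the format the Atlas is actually stored in: the class name `ToolNote`, the field names, and the field types are all assumptions. Only the section titles and the list of metadata items come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date

# Section titles as listed in the article; the order is the template order.
SECTIONS = (
    "What it is, in one line",
    "Why it exists",
    "Stop here if",
    "Try it in 60 seconds",
    "Full install",
    "First commands that prove it works",
    "What you can build",
    "How to demo this in 5 minutes",
    "Stridenote verdict",
    "Gotchas",
)


@dataclass
class ToolNote:
    """Hypothetical record for one tool; field names and types are assumptions."""
    name: str
    license: str
    platforms: list[str]
    needs_gpu: bool
    ram_gb: float          # the "RAM figure"; the unit is an assumption
    difficulty: str
    interface: str
    first_added: date
    last_reviewed: date
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return template sections that are absent or blank in this note."""
        return [s for s in SECTIONS if not self.sections.get(s, "").strip()]


# Usage with a made-up tool: only one section has been written so far.
note = ToolNote(
    name="example-tool",
    license="MIT",
    platforms=["Mac", "Linux"],
    needs_gpu=False,
    ram_gb=8.0,
    difficulty="easy",
    interface="CLI",
    first_added=date(2026, 1, 5),
    last_reviewed=date(2026, 6, 1),
    sections={"What it is, in one line": "A hypothetical local tool."},
)
print(len(note.missing_sections()))  # 9
```

Making the two dates required fields mirrors the point about dated verdicts: in this sketch a note cannot be constructed without them.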

The status ladder

A template tells you what a tool is. The status ladder tells you how much to trust our opinion of it. There are three rungs, and a tool can only be on one.

Research. On our radar, read about, maybe watched a demo, not yet run by us. A research verdict is allowed to be wrong because we are honest that we have not touched it. This is the rung where most tools live, and that is correct.
Tested. We installed it and ran it on real material, not a toy example. A tested verdict comes with specifics: what worked, where it stopped, what it asked of us. This is the rung where a tool stops being a rumor and becomes a finding.
In-production. It earned a permanent place in the studio and we use it in real work. This is the highest bar and the rarest rung. A tool only gets here by surviving daily use, not by impressing us once.

The ladder only climbs by doing the work. We do not promote a tool to tested because it is popular, and we do not promote it to in-production because we want it to be good. Every rung up costs real hands-on time, which is the whole point. The status is a promise about how much of that time we have spent.
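The ladder can be sketched as an ordered enumeration with a promotion rule that refuses to move a tool without a record of the work. This is a minimal sketch under stated assumptions: the names `Status`, `Verdict`, and `promote` are hypothetical, and the rule that a tool climbs exactly one rung at a time is our reading of the text, not something the Atlas spells out.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Status(IntEnum):
    """The three rungs, ordered from least to most hands-on time."""
    RESEARCH = 1
    TESTED = 2
    IN_PRODUCTION = 3


@dataclass(frozen=True)
class Verdict:
    status: Status
    dated: date      # every verdict carries the date it was made
    evidence: str    # what hands-on work backs this rung (assumed field)


def promote(current: Verdict, evidence: str, today: date) -> Verdict:
    """Move a tool up one rung, but only with a description of the work done."""
    if current.status is Status.IN_PRODUCTION:
        raise ValueError("already on the top rung")
    if not evidence.strip():
        raise ValueError("promotion needs a description of the hands-on work")
    return Verdict(Status(current.status + 1), today, evidence)


# Usage with made-up dates and evidence.
researched = Verdict(Status.RESEARCH, date(2026, 3, 1), "read the docs")
tested = promote(researched, "installed and ran on real material", date(2026, 4, 2))
print(tested.status.name)  # TESTED
```

Because a `Verdict` holds a single `Status`, a tool in this sketch can only ever be on one rung, and popularity or wishful thinking has no parameter to enter through.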

How the categories hold it together

Individual tools live inside category overviews, and those have their own template, because comparing tools is a different job than describing one. A category page opens with the problem the whole category solves, then a genuine opinion on the state of that field in 2026: who is winning, what is improving fastest, what is stagnating. That opinion is the part a raw tool list cannot give you and the part that makes the Atlas worth keeping.

Each category then points you by situation rather than alphabetically: new to this, start here; have this specific need, use that; want full control, go here. And it carries a comparison table, the whole field in one view, with difficulty, platforms, license, whether it is a studio pick, and a one-line take per tool. The table is where a reader who knows what they want finds it in seconds, and it is a format we have leaned on across our directory work because it respects the reader who is scanning, not reading.
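The comparison table described above could be rendered from plain rows with a few lines of code. This is a hedged sketch, not how the Atlas is actually built: the function `render_table`, the column labels, and the two sample tools are invented for illustration. Only the set of columns comes from the text.

```python
# Column set taken from the article; the exact labels are assumptions.
COLUMNS = ("Tool", "Difficulty", "Platforms", "License", "Studio pick", "One-line take")


def render_table(rows):
    """Render rows as an aligned plain-text table with a header rule."""
    for r in rows:
        if len(r) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(r)}")
    table = [COLUMNS] + [tuple(str(cell) for cell in r) for r in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(COLUMNS))]
    lines = []
    for n, row in enumerate(table):
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip())
        if n == 0:
            lines.append("  ".join("-" * w for w in widths))
    return "\n".join(lines)


# Usage with two made-up tools.
rows = [
    ("tool-a", "easy", "Mac, Linux", "MIT", "yes", "Fastest path to a first result."),
    ("tool-b", "hard", "Linux", "Apache-2.0", "no", "Full control, steep setup."),
]
print(render_table(rows))
```

Keeping every tool to one row with a fixed set of columns is what lets a scanning reader compare the field in a single view.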

Why we keep it in the open

The Atlas could be a private spreadsheet. We write it as something publishable on purpose, because the discipline of writing for a reader is what keeps the judgments honest. You cannot fill “stop here if” with a vague hand-wave when someone is going to read it and act on it. You cannot leave a verdict undated when the date is what makes it credible.

So when you read a Stridenote build and we say a tool is in-production, that phrase has a definition behind it and a dated note proving it. That is the framework. Same questions for every tool, same ladder for every verdict, every judgment written down with the date it was made.

Curious about these things. You should be too.

Harness your curiosity.

— Stridenote · № 005
