Three checks before adopting any AI tool

New AI tools arrive every week, and most of them are not worth your attention. We run three quick checks before adopting any tool. They take minutes and save weeks.

Written by Hillary

Read time 4 min

Updated Jun 19, 2026

Filed under Notes · Opinion

A new AI tool shows up most weeks. Some of them are genuinely good. Most are a demo, a wrapper, or a thing that will be gone by next quarter. We cannot try all of them, and neither can you, so we run three quick checks first. Anything that fails a check does not get our time.

These are not deep evaluations. They are the gate before the evaluation. They take a few minutes, and they decide whether a tool is worth a real look.

Check one: can it run locally and keep your data?

The first question is where the work happens. Does the tool run on a machine we control, or does it ship our data to someone else’s server to function?

This is not purity for its own sake. A tool that runs locally keeps the data in the building, costs nothing per call, and does not stop working when a vendor has a bad day. A tool that only works by sending everything to a cloud endpoint inherits all the dependence we try to avoid, which is the whole reason we own our AI stack instead of renting it. We are not absolutist about it, but the burden of proof sits with the cloud-only tool. If it cannot run on our hardware and hold our data, it had better be doing something nothing local can. The test is concrete: pull the network cable. A tool that keeps working has passed check one, and a tool that goes dark was never really yours.

Check two: is it actively maintained and openly licensed?

The second question is whether the tool will still be here in a year, and whether we are allowed to depend on it.

Maintenance is easy to check. Look at the project. When was the last release? Are issues being answered, or is the repo a ghost town with a glossy homepage? An AI tool that has not shipped in a long while is a tool you are adopting at the moment it stops being supported.
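The release-age half of that check reduces to date arithmetic. Here is a minimal sketch, assuming the last release date is available as an ISO 8601 timestamp (the shape of the `published_at` field in GitHub's releases API, for example). The function names and the 180-day threshold are our own assumptions for illustration. Pick whatever cutoff matches your tolerance.

```python
from datetime import datetime, timezone


def days_since_release(published_at: str, now: datetime) -> int:
    """Whole days between an ISO 8601 release timestamp and `now`."""
    # Normalise a trailing "Z" so older Python versions can parse it.
    released = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    return (now - released).days


def looks_abandoned(published_at: str, now: datetime, max_age_days: int = 180) -> bool:
    """True when the last release is older than the (assumed) cutoff."""
    return days_since_release(published_at, now) > max_age_days


# Illustrative dates only: a release roughly nine months before "today".
today = datetime(2026, 6, 19, tzinfo=timezone.utc)
age = days_since_release("2025-09-19T00:00:00Z", today)
stale = looks_abandoned("2025-09-19T00:00:00Z", today)
```

Release age is only one signal. Unanswered issues still need a human to read the tracker.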

Licensing is the other half. We read the actual license before we build anything real on top of a tool. Open weights is not the same as open source, a distinction we pull apart in “why open weights is not open,” and “free to try” is not the same as “free to use the way we intend to.” A clause we skip now is a wall we hit later. The license tells us, in a short document, exactly what we are allowed to do.

Check three: does it earn its place over what we already run?

The third question is the one most people skip. Even a good tool has to beat the tool it would replace.

We already run a stack that works: a writing model and a coding model on a single Apple M4 Pro with 48 GB of unified memory, served through Ollama. A new tool does not get adopted because it is new or interesting. It gets adopted because it does a real job better than what is already there, or does a job nothing in the stack can do at all. “Marginally nicer” is not a reason to add a dependency, retrain a habit, and carry one more thing that can break. The bar is not “is this good.” The bar is “is this worth displacing what already works.”

The real cost of a new tool is rarely the sticker price. It is the switching cost: the habit you retrain, the integration you wire, the failure mode you now have to learn. A tool that is ten percent better and doubles that surface area is a bad trade, and most exciting launches are exactly that trade in disguise.

Where do the three checks fit our evaluation process?

These three checks are not the whole process. They are the entrance to it. A tool that passes moves onto the path we use for everything: research first, then tested in real conditions, then in-production only once it has actually earned a place in real work. It is the same ladder behind how we evaluate local tools for the Atlas. Most tools never make it past research, and that is the point. The three checks keep the failures cheap, so the real evaluation is spent on tools that might actually last.

A worked example helps. A tool lands that promises faster local transcription. Check one passes: it runs offline. Check two is shaky: the last release was nine months ago and the issues sit unanswered. That single fact ends it before we ever install it. The few minutes spent reading the repo saved a week of building on something already abandoned.
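The gate in that example can be written down as a short-circuiting function: run the checks in order and stop at the first failure. This is a sketch of the idea, not a tool we ship. `Candidate`, its fields, and `first_failed_check` are hypothetical names, and the boolean answers still come from a person reading the repo and the license.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    runs_offline: bool          # check one
    actively_maintained: bool   # check two, maintenance half
    license_permits_use: bool   # check two, licensing half
    beats_current_stack: bool   # check three


def first_failed_check(tool: Candidate) -> Optional[str]:
    """Return the first check the tool fails, or None if it passes all three."""
    if not tool.runs_offline:
        return "check one"
    if not (tool.actively_maintained and tool.license_permits_use):
        return "check two"
    if not tool.beats_current_stack:
        return "check three"
    return None


# The transcription tool from the example: offline, but stale.
# The last two fields are placeholders; the gate never reaches them.
transcriber = Candidate(
    name="hypothetical-transcriber",
    runs_offline=True,
    actively_maintained=False,
    license_permits_use=True,
    beats_current_stack=True,
)
verdict = first_failed_check(transcriber)  # "check two"
```

Returning the name of the failed check, rather than a bare pass or fail, keeps a record of why a tool was dropped, which helps if it comes back for a second look later.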

When should you override the checklist?

Checklists can make you miss things. A tool that fails check one today, by being cloud-only, might still be the right call for a specific job where nothing local comes close. A young project that fails the maintenance check might be exactly the early bet worth making. The three checks are a default, not a law. They exist to save attention, not to replace judgment, and we override them on purpose when a tool clearly warrants it.

But most tools do not warrant the override. Most of them quietly fail one of the three checks, and the few minutes it takes to find out are the cheapest minutes in the whole evaluation.

The volume only grows from here. As the release pace of local AI tools climbs, a cheap, fast filter matters more, not less, because the cost of chasing every shiny launch compounds while the three checks stay the same few minutes. A studio that guards its attention this way spends it on the handful of tools that actually last, and skips the churn that eats everyone else.

Take the next AI tool that catches your eye. Run it through the three checks before you run it at all. Those few minutes protect every hour that follows.
