A new AI tool shows up most weeks. Some of them are genuinely good. Most are a demo, a wrapper, or a thing that will be gone by next quarter. We cannot try all of them, and neither can you, so we run three quick checks first. Anything that fails a check does not get our time.
These are not deep evaluations. They are the gate before the evaluation. They take a few minutes, and they decide whether a tool is worth a real look.
Check one: can it run locally and keep your data?
The first question is where the work happens. Does the tool run on a machine we control, or does it ship our data to someone else’s server to function?
This is not purity for its own sake. A tool that runs locally keeps the data in the building, costs nothing per call, and does not stop working when a vendor has a bad day. A tool that only works by sending everything to a cloud endpoint inherits all the dependence we try to avoid. We are not absolutist about it, but the burden of proof sits with the cloud-only tool. If it cannot run on our hardware and hold our data, it had better be doing something nothing local can.
Check two: is it actively maintained and openly licensed?
The second question is whether the tool will still be here in a year, and whether we are allowed to depend on it.
Maintenance is easy to check. Look at the project. When was the last release? Are issues being answered, or is the repo a ghost town with a glossy homepage? An AI tool that has not shipped in a long while is a tool you are adopting at the moment it stops being supported.
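The release-date part of that check is simple enough to sketch in a few lines. This is a minimal illustration, not a tool we ship: the function names are made up, and the 365-day cutoff is our assumption standing in for "a long while," which the check itself leaves to judgment.

```python
from datetime import date

# Assumption: "a long while" is not defined anywhere; 365 days is an
# illustrative cutoff, chosen only to make the sketch concrete.
MAX_RELEASE_AGE_DAYS = 365


def days_since_release(last_release: date, today: date) -> int:
    """Number of whole days between the last release and today."""
    return (today - last_release).days


def looks_unmaintained(
    last_release: date,
    today: date,
    max_age_days: int = MAX_RELEASE_AGE_DAYS,
) -> bool:
    """True when the last release is older than the chosen cutoff."""
    return days_since_release(last_release, today) > max_age_days
```

Passing `today` in explicitly keeps the function easy to test. A date alone will not tell you whether issues are being answered, so treat the result as a prompt to look closer, not a verdict.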
Licensing is the other half. We read the actual license before we build anything real on top of a tool. Open weights is not the same as open source, and “free to try” is not the same as “free to use the way we intend to.” A clause we skip now is a wall we hit later. The license tells us, in a short document, exactly what we are allowed to do.
Check three: does it earn its place over what we already run?
The third question is the one most people skip. Even a good tool has to beat the tool it would replace.
We already run a stack that works. A new tool does not get adopted because it is new or interesting. It gets adopted because it does a real job better than what is already there, or does a job nothing in the stack can do at all. “Marginally nicer” is not a reason to add a dependency, retrain a habit, and carry one more thing that can break. The bar is not “is this good.” The bar is “is this worth displacing what already works.”
How this fits the ladder
These three checks are not the whole process. They are the entrance to it. A tool that passes moves onto the path we use for everything: research first, then testing in real conditions, then production, and only once it has actually earned a place in client work. Most tools never make it past research, and that is the point. The three checks keep the failures cheap, so the real evaluation is spent on tools that might actually last.
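For readers who think in code, the gate can be written down as a rough sketch. Everything named here (`Tool`, `failed_checks`, `gate`, the `"research"` and `"skip"` labels) is hypothetical and exists only for illustration. The answers to the three questions still come from a person looking at the tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    """Hypothetical record of one person's answers to the three checks."""
    name: str
    runs_locally: bool         # check one: runs on our hardware, keeps our data
    maintained_and_open: bool  # check two: actively maintained, license allows our use
    beats_current_stack: bool  # check three: worth displacing what already works


def failed_checks(tool: Tool) -> List[str]:
    """Return the checks the tool fails, in order; empty means it passes."""
    failures = []
    if not tool.runs_locally:
        failures.append("check one")
    if not tool.maintained_and_open:
        failures.append("check two")
    if not tool.beats_current_stack:
        failures.append("check three")
    return failures


def gate(tool: Tool) -> str:
    """A pass only earns entry to the research stage, nothing further."""
    return "research" if not failed_checks(tool) else "skip"
```

Note that a pass returns `"research"` and nothing more. Testing and production are earned later, by the real evaluation.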
The honest counterpoint
Checklists can make you miss things. A tool that fails check one today, by being cloud-only, might still be the right call for a specific job where nothing local comes close. A young project that fails the maintenance check might be exactly the early bet worth making. The three checks are a default, not a law. They exist to save attention, not to replace judgment, and we override them on purpose when a tool clearly warrants it.
But most tools do not warrant the override. Most of them quietly fail one of the three checks, and the few minutes it takes to find out are the cheapest minutes in the whole evaluation.
Take the next AI tool that catches your eye. Run it through the three checks before you run it at all. We are curious about these things. You should be too.
Harness your curiosity.
— Stridenote · № 009