For a long time the honest answer about local AI was “impressive, but not yet.” It ran, it just was not good enough to replace the tools you actually relied on. That sentence quietly stopped being true. For the work most of us do every day, a model on your own machine is now good enough, and the proof is not a benchmark. It is what we run.
So here is the setup that holds up, every day, with nothing leaving the building: a single Apple M4 Pro with 48 GB of unified memory, Ollama serving the models, and a small set of tools plugged into it.
Is local AI good enough to use daily now?
The short version: yes, for ordinary work. Ollama is the foundation, the engine everything else plugs into. One command pulls a model, and it serves that model locally so every other tool can reach it. There is no version of our day that does not touch it.
On top of it we run two specialists rather than one generalist: gemma4:31b for writing and glm-4.7-flash for coding. The split matters less than the habit, but two focused models clear a higher bar on the work we actually do than one model trying to be everything. We laid out the full reasoning in why we run our LLMs locally.
What do we run for coding locally?
OpenCode, the desktop app, pointed at a local model through Ollama. It reads our files, edits them, and runs commands in a normal app window, and it keeps working with the Wi-Fi off. This is the one that surprised us most. We expected to keep a cloud agent for “real” work and use local for toys. Instead the local setup carried the daily load, and the cloud became the exception we reach for only on genuinely hard problems.
That inversion is the whole story of the last year. The thing we assumed would be a fallback became the default, and the expensive option became the thing we open a few times a month instead of a few times an hour. Nothing about the work got worse; the floor simply rose until the local option cleared it.
What actually changed is unglamorous. Open models got smaller for the same quality, quantization improved so a capable model fits in 48 GB of memory, and Apple Silicon’s unified memory let the GPU reach all of it. None of that is a single breakthrough you can point to. It is a year of steady gains that crossed a threshold: the point where the local answer stopped being noticeably worse than the rented one for everyday tasks. Cross that line once, for the work you actually do, and there is little reason left to keep paying a monthly bill to sit on the other side of it.
What handles everyday questions and writing?
A local chat on top of Ollama handles drafting, rewriting, quick reasoning, and the hundred small questions a workday produces. None of it is sensitive enough to send anywhere, and now none of it has to be. The writing model turns a rough note into a clean paragraph, answers questions against our own files, and never once asks us to sign in or counts a token against a quota.
When a job grinds, batch transcription, an evaluation run, anything long, we switch to MLX, Apple’s framework, which runs faster on the same Mac. Convenience by default, speed when the clock is the bottleneck. The point is not one perfect tool; it is a small stack where each piece does one job well and all of it runs at home.
Strung together, a normal day looks like this. A recording becomes a searchable transcript before the meeting notes are cold. A rough draft becomes a clean one in the same window we wrote it in. A coding task gets read, edited, and tested by an agent that never phones home. A dozen small questions get answered without a tab, a login, or a quota. At no point did the work wait on a network or a vendor, and at no point did anything sensitive leave the desk.
When do you still reach for the cloud?
Good enough is not the same as best. On the hardest problems, a frontier cloud model is still ahead, and we will not pretend otherwise. If your work lives at that frontier all day, keep the cloud tool that earns it. We keep exactly one, for the dense reasoning jobs that come up rarely.
Be clear-eyed about the trade. Local asks you to own the upkeep: the occasional model update, the disk space, the rare afternoon spent fixing your own setup. In return it removes the meter, the rate limit, the outage, and the policy you did not write. For a studio that runs all day, that is a trade worth making for the ordinary work and worth declining for the rare hard problem. Knowing which is which is the whole skill.
But “the hardest problems all day” is not most people’s work, and it is not most of ours. Most work is ordinary, and ordinary is exactly what a local model handles now. The bar moved. The question stopped being “can it run locally” and became “is it good enough,” and for daily work the answer turned to yes. It also happens to cost nothing per use, the argument we make in full in stop paying monthly for AI, and it keeps the work on hardware you control rather than rented, the case in own your AI stack.
The way to know is not to read another comparison. It is to run the local version of something you do every day and notice that you did not miss the cloud. That is the whole shift, and you can test it this afternoon. If the local version handles a real task and you forget the cloud existed, you have your answer, and it did not come from a leaderboard.