Stridenalysis Research Jun 07, 2026

Ollama vs MLX vs Jan: Running Local Models on a Mac

Three ways to run a model on Apple Silicon: the foundation, the fast one, and the friendly one. We run all three for different jobs. Here is which fits yours.

Ollama vs MLX vs Jan: Running Local Models on a Mac

People ask which one to use as if there is a single winner. There is not. These three tools answer three different questions, and we run all three in the studio for different jobs on the same machines. The trick is knowing which question you are actually asking.

  • Ollama is the foundation: the easiest way to download, run, and serve models.
  • MLX is the fast one: Apple’s own framework, built for Apple Silicon, when speed is the priority.
  • Jan is the friendly one: a normal desktop app for people who will never open a terminal.

What each one really is

Ollama is a single binary and a small server. One command pulls a model, and it listens on http://localhost:11434 with an OpenAI-compatible API so other tools can plug in. It is the thing we install first on any new machine.

MLX is Apple’s machine-learning framework, written from scratch for Apple Silicon’s unified memory. Paired with the mlx-lm package, it is the fastest way to run a model on a Mac. It is Python-first, so using it well means a little code.

Jan is a desktop chat app that bundles its own runtime. Download it, pick a model from a Hub, and chat. No terminal, no Python, no Docker. It is the lowest-friction path to a first local model.

Speed: MLX leads, and it is not subtle

On Apple Silicon, MLX runs roughly 30 to 50% faster than the equivalent work elsewhere on the same Mac. Apple designed the chip and wrote the framework, so this is unsurprising, but the gap is real and it compounds. On a one-off chat, you will not care. On a batch transcription job, a model evaluation run, or anything you would let grind for an hour, that 30 to 50% is the difference between a coffee break and a lunch.

Ollama is plenty fast for interactive use. Jan, running its bundled runtime through an Electron app, carries the most overhead of the three, which is the cost of being the friendliest.

Friction: Jan leads, MLX trails

Jan is three clicks to a working model: open, pick from the Hub, chat. It is the tool we hand to someone who would otherwise never try local AI, because the barrier was never the model, it was the terminal.

Ollama is one command and forgiving. MLX is the most setup: Python environments, pip, and sometimes converting a model into the MLX format if it is not already on the mlx-community hub. That friction buys you speed and a fine-tuning pipeline, but it is friction.

The detail that bites: separate model stores

This trips people up, so plan for it. Jan’s bundled runtime keeps its own model store, separate from Ollama’s. A model you download in Jan does not appear in Ollama, and vice versa, unless you configure Jan to use Ollama as a remote backend. If disk space matters, do not download the same 4GB model twice. Either centralize on Ollama and point Jan at it, or accept two stores on purpose.

A few honest trade-offs

  • MLX is Mac-only and Python-first. A strength on Apple Silicon, a non-starter anywhere else, and it expects you to be comfortable with venvs and pip.
  • Jan is Electron. A few hundred megabytes of RAM go to the UI before the model loads. On a 16GB Mac running a 7B model, it gets tight.
  • Ollama’s memory is sticky. A loaded model stays in RAM until something bumps it. Use ollama ps to check.

Who should run what

Run Ollama if you want one tool that does almost everything and plugs into everything else. This is the default, and for 95% of our work it is the answer.

Add MLX if you are on Apple Silicon and you have jobs that run long enough for a 30 to 50% speedup to matter, or you want to fine-tune. It is our second-tier daily driver: Ollama when convenience wins, MLX when speed wins.

Choose Jan if you, or the person you are setting up, will not open a terminal. It is the friendliest first taste of local AI, and the one most likely to actually get installed.

What we run, and why

All three, deliberately. Ollama is the foundation on every machine. MLX comes out for batch work and evaluation runs where the clock matters. Jan is what we install for someone taking their first step, because a normal-looking app gets opened and a terminal command does not.

If you only take one thing: start with Ollama. It is the foundation the other two relate back to, and the one the rest of a local AI stack plugs into. Add MLX when speed becomes the bottleneck, and keep Jan around for the people you are bringing along.

Curious about these things. You should be too.

Harness your curiosity.

— Stridenote · № 004