Playbooks · Tools · Jun 07, 2026

How to Install Ollama on a Mac (Apple Silicon) the Right Way

Ollama is the first thing we install on any new machine in the studio. Here is the clean way to set it up on Apple Silicon, plus the handful of settings that save you grief later.


Ollama is the foundation. It is the first thing we install on a new machine in the studio, and the first thing we set up for anyone who wants to run AI on their own hardware. It turns “running a local model needs Python, virtual environments, and patience” into one binary and one command.

This guide is for Apple Silicon Macs, the M1 through M4 line, where Ollama runs well with no extra setup. The steps are short. The value is in the few details that keep you out of trouble a week later.

What Ollama actually is

Two things, and the second one surprises people:

  1. A way to download and run open models with a single command.
  2. A small server. Once it is installed, Ollama listens on http://localhost:11434 and, alongside its own API, offers an OpenAI-compatible one: the same API shape most cloud-based tools already speak. Most tools that “talk ChatGPT” can point at it.

That second point is why Ollama is a foundation and not just a chat toy. Your editor, a documents app, a coding agent: they all plug into that one local endpoint.

Before you start

  • An Apple Silicon Mac (M1, M2, M3, M4 or newer).
  • 16GB of RAM recommended. 8GB runs small models but gets tight fast.
  • Roughly 10GB of free disk to start. Models are large, and they add up.

Step 1: Install

You have two clean options. Either is fine.

The app installer:

  1. Go to https://ollama.com/download.
  2. Download the macOS app and open it.
  3. It installs the background service and a small menu-bar icon.

Or, if you keep your tools in Homebrew:

brew install ollama

Both give you the same ollama command. The app also keeps the server running in the background, so there is usually nothing to launch. With Homebrew, start it as a background service with brew services start ollama, or run it by hand in a terminal:

ollama serve
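A quick way to confirm the server is up, from a terminal or a script. This is a minimal sketch: ollama_is_up is a hypothetical helper name of our own, and it assumes curl (which ships with macOS) and the default address. It asks the server's /api/version endpoint, which replies with the version number when Ollama is running.

```shell
# Hypothetical helper: succeed (exit 0) if an Ollama server answers at the
# given base URL, defaulting to the standard local address.
ollama_is_up() {
  local base="${1:-http://localhost:11434}"
  # /api/version returns a small JSON body such as {"version":"..."}.
  # -f makes HTTP errors count as failures; --max-time avoids hanging.
  curl -fsS --max-time 3 "${base}/api/version" >/dev/null 2>&1
}

# Usage:
#   ollama --version     # confirms the CLI itself is installed
#   ollama_is_up && echo "Ollama is up" || echo "Not running; try: ollama serve"
```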

Step 2: Pull your first model

Start small to confirm everything works, then grow.

ollama run llama3.2

The model downloads on first run (about 2GB), then drops you into a chat. Ask it something. Press Ctrl+D, or type /bye, to exit.

To see what you have on disk:

ollama list
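The same list is available from the server itself at /api/tags, which is handy in scripts. The helper below is a rough sketch: model_names is a hypothetical name, and the grep-based parsing assumes the compact JSON the server returns. For anything serious, reach for a proper JSON tool such as jq.

```shell
# Hypothetical helper: print one model name per line from /api/tags JSON.
# Reads stdin, so it can be fed from curl or from a saved response.
# Rough parsing: assumes compact JSON ("name":"value" with no spaces).
model_names() {
  grep -o '"name":"[^"]*"' | sed 's/^"name":"//; s/"$//'
}

# Usage against a running server:
#   curl -s http://localhost:11434/api/tags | model_names
```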

Step 3: Prove it is local

This is the test worth doing, because it is the whole point. With a model pulled, turn off your Wi-Fi, then:

ollama run llama3.2

Ask it a question. It answers. Nothing left your laptop. That is the moment local AI stops being an abstract idea.
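If you want to repeat the offline test without the interactive session, ollama run also accepts the prompt as an argument, prints the answer, and exits. A minimal wrapper, with ask and the MODEL variable as hypothetical names of our own:

```shell
# Hypothetical helper: ask a one-shot question with no interactive session.
# MODEL defaults to llama3.2, the model pulled in Step 2.
ask() {
  # All arguments are joined into a single prompt string.
  ollama run "${MODEL:-llama3.2}" "$*"
}

# Usage, with Wi-Fi off:
#   ask "Explain unified memory in two sentences."
#   MODEL=qwen2.5-coder:7b ask "Write a shell one-liner that counts files."
```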

Step 4: Use it as a server (the part people miss)

Ollama is already serving. You can hit it directly:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about the ocean.",
  "stream": false
}'

And because Ollama also exposes an OpenAI-compatible API at http://localhost:11434/v1, you can point other tools at that address and they treat your local model like any cloud model. This is how Ollama becomes the engine behind a coding agent, an editor plugin, or a chat UI.
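For tools built on OpenAI's client libraries, the switch is usually two settings. The official OpenAI SDKs read OPENAI_BASE_URL and OPENAI_API_KEY from the environment; other tools have their own settings screens, so check each one. Ollama ignores the key, but most clients insist on having one, so any placeholder works. local_chat is a hypothetical helper that sends one message to the /v1/chat/completions endpoint with curl; it builds the JSON by hand, so keep double quotes and backslashes out of the prompt.

```shell
# Point OpenAI-style clients at the local server.
# The key is a placeholder: Ollama ignores it, but most clients require one.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

# Hypothetical helper: send one chat message to the OpenAI-compatible endpoint
# and print the raw JSON reply.
# Usage: local_chat MODEL "prompt without double quotes or backslashes"
local_chat() {
  local model="$1" prompt="$2"
  curl -fsS --max-time 120 "${OPENAI_BASE_URL}/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}]}"
}

# Usage:
#   local_chat llama3.2 "Write a haiku about the ocean."
```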

The settings that save you grief

Most guides stop at “it works.” These are the details we wish everyone knew on day one.

  • Models live in ~/.ollama/models/. They are big, and this folder grows quietly into tens of gigabytes. Check it occasionally and remove models you no longer use with ollama rm <model>.
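  • Memory is sticky. Once a model loads, it stays in RAM until it has sat idle for a while (five minutes by default) or another model needs the room. Run ollama ps to see what is currently loaded, and ollama stop <model> to unload one right away. If your Mac feels slow, this is usually why.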
  • Right-size the model to your RAM. A 7B model is comfortable on 16GB. Larger models want 32GB or more. Pulling a 70B model on a 16GB Mac will technically download and then crawl.
  • Apple Silicon is the fast path. You do not need to configure a GPU. Ollama uses the Mac's GPU through Metal automatically, and unified memory lets that GPU draw on most of your RAM, which is exactly why these Macs are good at this.
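To put rough numbers on “right-size the model,” multiply the parameter count by the bits stored per weight. A sketch, assuming about 4.5 bits per weight, roughly what the 4-bit quantizations behind Ollama's default tags use; it counts weights only, so leave headroom for the context window and for macOS itself. est_weights_gb is a hypothetical helper name.

```shell
# Hypothetical helper: rough weight-only memory estimate for a quantized model.
# Usage: est_weights_gb PARAMS_IN_BILLIONS [BITS_PER_WEIGHT]
# Assumption: ~4.5 bits per weight by default (4-bit quantization plus scales).
est_weights_gb() {
  # billions of params * bits per weight / 8 bits per byte = GB of weights
  awk -v p="$1" -v b="${2:-4.5}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Housekeeping, for reference:
#   du -sh ~/.ollama/models    # disk used by downloaded models
#   ollama ps                  # models loaded in memory right now
#   ollama stop <model>        # unload a model immediately
#   ollama rm <model>          # delete a model from disk
```

By this estimate a 7B model needs about 3.9GB, which sits comfortably in 16GB, while a 70B model needs about 39GB, more than a 16GB Mac has to give.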

Which models to keep around

A practical starter set for a 16GB Mac:

  • A small general model for quick chat and drafting.
  • One coding model (for example qwen2.5-coder:7b) if you write code.
  • One slightly larger general model for when quality matters more than speed.

Browse the full library at https://ollama.com/library and pull tags by name with ollama pull <model>.
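Pulling a starter set is one command per model. A minimal sketch that pulls several in a row and stops at the first failure; pull_models is a hypothetical helper, and the model names in the example are the ones mentioned in this Playbook, not a recommendation. A bare name such as llama3.2 pulls its default :latest tag.

```shell
# Hypothetical helper: pull each model named on the command line, in order.
# Stops and returns non-zero at the first model that fails to download.
pull_models() {
  local m
  for m in "$@"; do
    echo "Pulling ${m}..."
    ollama pull "$m" || { echo "Failed to pull ${m}" >&2; return 1; }
  done
}

# Usage (add the larger general model you settle on):
#   pull_models llama3.2 qwen2.5-coder:7b
```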

When to reach past Ollama

For 95% of local work, Ollama is the answer. Two cases where we reach further:

  • Raw speed on Apple Silicon. When a job will grind for an hour, MLX runs 30 to 50% faster on the same Mac. We cover that comparison in a separate analysis.
  • Serving many users at once. That is a vLLM job, not an Ollama one.

For everything else, which is almost everything, Ollama wins on the strength of one-command install, an OpenAI-compatible API, and active development.

What you can build from here

You now own the foundation. The next steps all reuse this one endpoint: a polished chat UI, a coding agent that runs in a normal app window, AI autocomplete in your editor, or a tool that lets you ask questions of your own documents. Each is a short follow-on Playbook, and each starts with the model you just pulled.

That is local AI on your laptop, installed the right way. Curious about these things. You should be too.

Harness your curiosity.

— Stridenote · № 002