You are three hours into a refactor, the cloud AI credits run dry, and the model starts refusing your requests because you hit the rate limit. Or maybe you work on proprietary code that cannot leave your machine, and every “we do not train on your data” disclaimer reads differently when legal gets involved. A local AI coding agent solves both problems at once. The stack is OpenCode plus Ollama, and it runs entirely on your Mac with no API costs and no data leaving your hardware.
OpenCode is an open-source AI coding agent with over 140,000 GitHub stars as of April 2026, according to the OpenCode GitHub repository. It connects to 75-plus LLM providers through its provider system, including local models served by Ollama. Ollama, the open-source LLM runner with 170,000-plus GitHub stars, handles model downloading, quantization, and serving behind an OpenAI-compatible API at http://localhost:11434. Together, they form a private, zero-cost coding assistant that works offline.
What do you need before starting?
A modern Mac with Apple Silicon is the ideal host. An M1, M2, M3, or M4 chip with at least 16 GB of unified memory gives you room to run a 7B-parameter model alongside your editor and browser. An 8 GB Mac works with smaller models in the 1B-3B range, but the coding results will be limited. Machines with 32 GB or 48 GB can run 14B-35B models that produce higher-quality completions and tool-use behavior.
Your operating system should be macOS 14 Sonoma or later. Ollama requires it for the native Metal GPU acceleration on Apple Silicon, and the OpenCode TUI benefits from a modern terminal emulator like WezTerm, Alacritty, or Ghostty.
You also need a shell with Homebrew installed or the ability to run curl scripts. Both tools offer Homebrew formulas, and both provide one-line install scripts that work on any Unix system. No Python, no Node.js runtime, no Docker daemon is required for the core setup, though OpenCode can also be installed via npm if you prefer.
How do you install Ollama and pull a coding model?
Ollama installs in under a minute. Open a terminal and run:
brew install ollama
Or use the official install script:
curl -fsSL https://ollama.com/install.sh | sh
After installation, verify it works:
ollama --version
You should see a version number like 0.30.0 or later. As of June 2026, Ollama ships with an MLX backend on Apple Silicon that delivers up to 1.93x decode speed improvement over the previous Metal backend, according to Ollama’s March 2026 benchmark post. The MLX backend activates automatically on M-series Macs and uses Apple’s unified memory architecture directly, eliminating the CPU-to-GPU copy overhead that older inference engines incurred.
Start the Ollama server in the background:
ollama serve
In a second terminal window, pull a coding-focused model. For a 16 GB Mac, start here:
ollama pull qwen3.5:8b
This downloads Qwen 3.5 8B, a model that balances code quality with memory use. The download is about 4.5 GB for the Q4_K_M quantization. It fits cleanly in 16 GB of unified memory alongside macOS and your editor.
For machines with 32 GB or more, Qwen 3.6 35B-A3B (a mixture-of-experts model with 3 billion active parameters) delivers significantly better coding results, or Llama 4 Scout at 109B MoE for the largest setups. The ollama pull command works the same way for any of these.
Once the model finishes downloading, run a quick test:
ollama run qwen3.5:8b "Write a Python function that merges two sorted lists."
If you see output streaming back, Ollama is working and your GPU is engaged. You can confirm GPU usage by opening Activity Monitor and checking the GPU History graph while a response generates.
How do you install OpenCode on your Mac?
OpenCode installs through multiple paths. Homebrew is the simplest:
brew install opencode
This installs the opencode binary and the ripgrep dependency at roughly 157 MB total, according to the computingforgeeks.com setup guide. Confirm it works:
opencode --version
A version like 1.2.20 or later confirms the installation.
An alternative path is the universal install script:
curl -fsSL https://opencode.ai/install | bash
This places the binary at ~/.opencode/bin/opencode. Add it to your PATH if the installer does not do it automatically:
export PATH="$HOME/.opencode/bin:$PATH"
Append that line to your ~/.zshrc to make it permanent.
If you prefer Node.js toolchains, OpenCode also ships through npm:
npm install -g opencode-ai
All three methods produce the same Go binary. Pick the one that matches your package manager habits.
How do you connect OpenCode to your local Ollama models?
OpenCode needs to know about your local Ollama provider before it can use it. There are two approaches.
The fast path uses Ollama’s launch command, introduced in Ollama v0.15 and available in current versions:
ollama launch opencode
This single command installs OpenCode if missing, wires Ollama as the provider, and launches the OpenCode TUI with your local models available, according to the Ollama integration docs. It passes the configuration inline through the OPENCODE_CONFIG_CONTENT environment variable.
The manual path gives you more control. Create or edit the OpenCode config file at ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama-local": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama Local",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen3.5:8b": { "name": "Qwen 3.5 8B" }
}
}
}
}
This defines a custom provider named ollama-local that points at Ollama’s OpenAI-compatible endpoint. The @ai-sdk/openai-compatible package handles the protocol translation.
After saving the config, run OpenCode:
opencode
Inside the TUI, press / to open the command menu and select the ollama-local / qwen3.5:8b model. You can now ask OpenCode to read files, write code, run shell commands, and make edits. The agent will use your local model for every request, with zero data leaving your machine.
For models with larger context requirements, ensure Ollama’s context window is set high enough. OpenCode recommends a minimum of 64,000 tokens for reliable code generation, per the Ollama integration documentation. Set it per model:
ollama run qwen3.5:8b --context-size 65536
How do you pick the right model for your hardware?
Model selection is the single biggest factor in whether a local coding agent feels fast or frustrating. The table below maps hardware tiers to recommended models and expected behavior.
| Mac unified memory | Recommended model | Coding quality | Notes |
|---|---|---|---|
| 8 GB | Qwen 3.5 4B or Gemma 4 9B | Basic completions | Limited tool-use reliability. Works for simple scripts. |
| 16 GB | Qwen 3.5 8B or DeepSeek Coder V2 Lite | Good | Best balance of speed and capability for most users. |
| 24 GB | Qwen 3.6 14B or Gemma 4 27B | Very good | Handles multi-file refactors. Tool calls work well. |
| 32 GB | Llama 4 Scout 109B (MoE) or Qwen 3.6 35B-A3B | Excellent | Active parameters stay low due to MoE. Smooth experience. |
| 48 GB+ | DeepSeek Coder V3 236B MoE or Llama 4 Maverick | Production-grade | Full agentic workflows, large context windows. |
The sweet spot for most developers in 2026 is a 16 GB or 24 GB Mac running Qwen 3.5 8B or Qwen 3.6 14B. These models fit comfortably in memory, stream responses at readable speed on Apple Silicon, and handle the tool-use patterns that OpenCode relies on for file editing and shell execution.
Models smaller than 7B tend to struggle with OpenCode’s tool-calling patterns, producing malformed function calls or losing track of the conversation state. The improvement from 7B to 14B is larger than the improvement from 14B to 35B for most coding tasks, so an 8B model is often a better investment than squeezing a 3B model onto underpowered hardware.
How do you go further with custom configuration?
Once the basic setup works, you can tune the experience.
OpenCode supports Plan mode and Build mode, toggled with the Tab key. Plan mode disables file editing and lets you review the agent’s proposed approach before it touches any code. This is useful when you want to verify the strategy before committing to changes.
OpenCode also integrates with Language Server Protocol (LSP), giving the agent access to type definitions, symbol references, and diagnostics from your project’s language server. Enable it in opencode.json:
"lsp": {
"enabled": true
}
With LSP active, OpenCode can resolve import paths, find unused variables, and avoid type mismatches before writing code.
For air-gapped or enterprise environments, you can verify the full stack works without internet by unplugging your network cable after installing the model and OpenCode binary. The agent continues to function offline, reading and editing files against the local model with no external connections required.
A hybrid setup is also possible. Configure OpenCode with both an Ollama local provider and a cloud provider. Use the local model for quick edits, boilerplate, and exploration, and switch to the cloud model when the task demands a larger context window or more sophisticated reasoning. The /model command in the TUI switches between them instantly.
The local AI coding agent stack has crossed a threshold in 2026. OpenCode at version 1.2 and Ollama with MLX on Apple Silicon deliver a coding experience that is fast enough to be useful, private enough to pass security review, and cheap enough to run indefinitely. The models that matter most for code all run locally now with tool-use support: Qwen 3.5, DeepSeek Coder, and Gemma 4. The gap between local and cloud for everyday coding tasks has narrowed to the point where the trade-off is no longer about capability but about context window size and raw generation speed. Those gaps will close in the next model generation. For now, a 16 GB Mac running Qwen 3.5 8B through OpenCode is a viable daily driver that sends zero bytes to anyone else’s server.