Local AI coding agent setup with OpenCode and Ollama

You type opencode into a terminal, and an AI agent appears. No browser tab, no monthly subscription, no API key to paste. It reads your files, runs shell commands, and suggests edits. Every token stays on your machine.

That setup is possible today with two open-source tools. OpenCode is a terminal-first AI coding agent with more than 11,500 GitHub stars, according to Josphat Mutai’s guide on ComputingForGeeks. Ollama is the most popular local LLM runtime, hitting 52 million monthly downloads in Q1 2026, per an analysis by AlexCloudStar. Together they give developers a private, offline alternative to cloud-based coding assistants like Claude Code or GitHub Copilot.

This walkthrough covers the full setup: what hardware you need, how to install both tools, how to wire them together, and which models actually work for real coding work.

Why run a local AI coding agent?

Most AI coding tools today send your code to someone else’s servers. Every prompt, every file you ask the agent to read, every diff it generates travels over the internet. For developers working on proprietary software, healthcare data, or anything under regulatory compliance, that is a problem.

A local setup fixes all of it. The model runs on your machine. The agent never phones home. There are no API costs, no rate limits, and no per-token billing. You can run 100 requests or 10,000 requests for the same flat hardware investment.

The performance gap between local and cloud models has narrowed. A year ago, running a coding agent locally meant accepting low-quality responses from models too small for complex reasoning. That changed in late 2025 and early 2026. Open-weight models like Qwen2.5-Coder and DeepSeek Coder now score competitively on coding benchmarks, and mixture-of-experts architectures let larger models run on consumer hardware by activating only a fraction of their parameters per token.

Writing for OpenReplay, the team described OpenCode as “an open-source coding agent that runs in your terminal, connects to the models and providers you already use, and gets out of the way.” That is the core value. No vendor lock-in. No data leaving your network.

What do you need before starting?

The hardware requirements depend on the model you want to run, not on OpenCode itself. OpenCode is a lightweight Go binary. Ollama is the resource-intensive piece.

For a workable local setup, you need:

16 GB of unified memory or RAM for 7B to 14B parameter models
24 GB or more for 30B+ parameter models or mixture-of-experts models
Apple Silicon (M-series) for the best consumer experience, since Metal acceleration is built into Ollama with no driver setup required
macOS, Linux, or Windows via WSL2 as the operating system

OpenCode supports over 75 AI providers, per the ComputingForGeeks setup guide, but for a fully local pipeline you only need one: Ollama. The install process takes about ten minutes for both tools.

If you are on an M4 Pro Mac with 48 GB of unified memory (the machine running StrideNote Studio), you can comfortably run models in the 14B to 35B range. On a machine with 16 GB, stick to 7B to 9B models for usable response times.

How do you install OpenCode?

OpenCode installs through several methods. The fastest is the universal install script:

curl -fsSL https://opencode.ai/install | bash

This downloads the correct binary for your architecture and places it at ~/.opencode/bin/opencode. If the script does not add it to your PATH, do that manually in your shell config file:

export PATH="$HOME/.opencode/bin:$PATH"

On macOS, Homebrew is also an option. The OpenCode tap stays current with each release:

brew install anomalyco/tap/opencode

The official Homebrew formula (brew install opencode) exists but updates less frequently. If you use npm, the command is:

npm install -g opencode-ai

After installation, verify it works:

opencode --version

The first time you run opencode in a project directory, it creates a local configuration file and prompts you to choose a model provider. By default it looks for common API keys in your environment (OPENAI_API_KEY, ANTHROPIC_API_KEY) and connects automatically. For a local setup, you ignore those defaults and configure Ollama manually.

How do you install and configure Ollama?

Ollama runs models locally and exposes them through an OpenAI-compatible API. The install is a single command on macOS and Linux:

curl -fsSL https://ollama.com/install.sh | sh

Or via Homebrew:

brew install ollama && brew services start ollama

Once Ollama is running (it listens on http://localhost:11434 by default), pull a coding-oriented model. The Qwen2.5-Coder series is the current best pick for local coding work:

ollama pull qwen2.5-coder:14b

For machines with more memory, the 32B variant offers noticeably better reasoning:

ollama pull qwen2.5-coder:32b

One critical detail: Ollama defaults to a 4,096-token context window. OpenCode’s system prompt and tool definitions alone consume a large portion of that, and tool-calling breaks when the context is too small. You need to create a custom Modelfile to expand it:

FROM qwen2.5-coder:14b
PARAMETER num_ctx 32768
PARAMETER temperature 0.2

Then build and use the custom model:

ollama create qwen2.5-coder:14b-opencode -f Modelfile

Set the temperature low. Coding models perform best at 0.1 to 0.3, where output is deterministic and structured.

How do you connect OpenCode to Ollama?

OpenCode does not expose an Ollama option in its built-in provider list. It only shows Ollama Cloud by default. To point it at your local instance, you create or edit the OpenCode configuration file.

The global config lives at ~/.config/opencode/opencode.json. If you want per-project settings, place opencode.json in the root of your project directory instead.

Here is the configuration block that wires OpenCode to your local Ollama instance:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen2.5-coder:14b-opencode",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama Local",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b-opencode": {
          "name": "Qwen 2.5 Coder 14B OpenCode"
        }
      }
    }
  }
}

Two things to check if the connection does not work. First, the baseURL must end in /v1. That is the OpenAI-compatible endpoint path. Without it, OpenCode cannot find the model. Second, set the main model and the small_model to the same value. If you configure a different small model, Ollama loads both into GPU memory, causing contention and latency spikes.

Once the config file is in place, launch OpenCode in your project directory:

opencode

The terminal UI appears. If the connection succeeds, the model name shows in the header bar and you can start sending prompts. Test it with a simple request like “list the files in this directory and explain the project structure.”

Which models work best for local coding?

Not every model works well with OpenCode. The agent relies on tool-calling: it needs to read files, write edits, run commands, and interpret results. A model that cannot follow structured function-calling patterns will produce hallucinations instead of working code.

The safe choices in mid-2026 are:

Qwen2.5-Coder 14B is the minimum viable model for agentic coding. It handles single-file edits, refactoring, and explanation tasks well on 16 GB machines.
Qwen2.5-Coder 32B is the current quality leader for consumer hardware. It manages multi-file changes, architectural reasoning, and complex debugging. Requires 24 GB or more.
DeepSeek Coder V2 16B (instruct version) is a strong alternative with good tool-calling discipline.
Gemma 4 26B (mixture-of-experts) offers a good quality-to-memory ratio on Apple Silicon, with only a fraction of parameters active per token.

Avoid models smaller than 9B parameters for agentic tasks. They struggle with OpenCode’s tool-use patterns, producing broken edits and incorrect file paths. Also avoid thinking or reasoning models (like DeepSeek R1 or QwQ) unless you are willing to wait. They generate hundreds of hidden reasoning tokens per response, turning a 10-second interaction into a 60-second one.

Ollama’s model library has hundreds of options, but for OpenCode specifically, the winning strategy is simple: pick the largest coding-specialized model your hardware can run, extend the context window to 32K, set temperature low, and use the same model for both main and compaction tasks.

A local AI coding agent is no longer an experiment. OpenCode and Ollama together give you a functional, private alternative to cloud coding assistants. The tools work today. The models are good enough for daily use. The only cost is the hardware you already own.

The workflow is different from typing into ChatGPT or Claude. You are still at a terminal, still running commands, still reading diffs. But the model now operates inside your project, sees your full codebase, and edits files directly. For developers who value privacy, offline capability, and zero API bills, that is a significant shift worth making now.

Why run a local AI coding agent?

What do you need before starting?

How do you install OpenCode?

How do you install and configure Ollama?

How do you connect OpenCode to Ollama?

Which models work best for local coding?

More from playbooks.

How to run Gemma 4 12B locally on a Mac with Ollama

OpenClaw with a local model: a private AI assistant

ComfyUI with local LLMs: a practical Mac workflow