Continue.dev and Ollama: local AI autocomplete in VS Code

GitHub Copilot is excellent, but it is cloud-only. The code around your cursor is sent to GitHub’s (Microsoft-owned) servers as you type, there is a subscription, and you can only pick from the models GitHub offers. Continue.dev gives developers the same in-editor experience (inline completions, chat, and code actions) pointed at any backend you like. Point it at a model running on your own machine and you get Copilot’s feel with none of its conditions.

That is the setup we run. Every Stridenote developer uses Continue.dev with a local Ollama backend for everyday work, and switches to a cloud model only when a problem genuinely demands it. The framing that lands every time we teach it: this is Copilot, but yours. Here is how to build it.

What does this Continue.dev setup give you?

Ollama serving a coding model locally.
The Continue extension installed in VS Code.
Inline completions as you type and a chat panel that knows your code, all answered by a model on your disk.

No Copilot subscription. No code leaving your laptop.

How do you set up local AI autocomplete in VS Code?

Before you start

You need VS Code (or a JetBrains IDE; the steps below are for VS Code) and a machine with at least 16GB of RAM. You also need a local LLM backend. If you do not have Ollama yet, Step 1 covers it (or see the full Ollama install guide).

One thing to decide early: the right model for your hardware. A 32B model on 16GB of RAM is misery. Start with a 7B coder model for chat, and if you want snappy inline completion, a small fast model is the better choice for that job. More on the two-model split below.
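To see why a 32B model is a poor fit for 16GB, a rough rule of thumb helps: the weights alone take about parameters × bits per weight ÷ 8 bytes. The sketch below assumes 4-bit quantization (an assumption; check the tag you actually pull) and ignores the context cache and runtime overhead, so real usage is higher. The function name is hypothetical.

```shell
# Rough size of the model weights in GB.
# $1 = parameters in billions, $2 = bits per weight (assumed 4 for a 4-bit quant).
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4    # prints 3.5
estimate_weights_gb 32 4   # prints 16.0
```

By this estimate a 7B model leaves most of a 16GB machine free for your editor and browser, while a 32B model wants all of it before anything else runs.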

Step 1: install Ollama and pull a coding model

Ollama downloads and serves the model. On a Mac:

brew install ollama

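On Windows:

winget install Ollama.Ollama

On Linux:

curl -fsSL https://ollama.com/install.sh | sh

Or take the installer from https://ollama.com/download. Once started, Ollama serves on http://localhost:11434. The installers set it up as a background service; after a Homebrew install, start it yourself with brew services start ollama (or run ollama serve in a terminal).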
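A quick way to confirm the server is up before going further is to ask it for its version over HTTP. This is a minimal sketch, assuming curl is installed and Ollama is on its default port; ollama_up is a hypothetical helper name.

```shell
# Returns 0 if an Ollama server answers at the given base URL
# (defaults to the standard local address).
ollama_up() {
  curl -fsS --max-time 2 "${1:-http://localhost:11434}/api/version" >/dev/null 2>&1
}

if ollama_up; then
  echo "Ollama is reachable"
else
  echo "Ollama is not reachable; start it with: ollama serve"
fi
```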

Pull a coding model:

ollama pull qwen2.5-coder:7b

That is a capable coding model that runs comfortably on 16GB. For a separate, faster completion model, you can also pull a smaller one, such as qwen2.5-coder:1.5b. For codebase search, Continue also needs an embeddings model:

ollama pull nomic-embed-text

Confirm everything landed:

ollama list

Continue will only see models that appear here, so pull before you connect.
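If you want that check scripted, the sketch below compares the models you expect against what ollama list reports. It assumes the usual table layout (a header row, then the model name in the first column) and that untagged models are listed with a :latest suffix; missing_models is a hypothetical helper.

```shell
# Prints each required model that is absent from `ollama list`-style output.
# $1 = the listing text, remaining arguments = model names to look for.
missing_models() {
  installed="$1"; shift
  for m in "$@"; do
    # A name without a tag is listed by Ollama as name:latest
    case "$m" in *:*) want="$m" ;; *) want="$m:latest" ;; esac
    printf '%s\n' "$installed" |
      awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }' ||
      echo "$m"
  done
}

missing_models "$(ollama list 2>/dev/null)" qwen2.5-coder:7b nomic-embed-text
```

No output means both models are present. Any name it prints still needs an ollama pull.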

Step 2: install the Continue extension

Open VS Code.
Press Cmd+Shift+X (Ctrl+Shift+X on Windows and Linux) to open Extensions.
Search for “Continue”.
Install the official extension by Continue Dev, Inc. Be exact: there are lookalikes.
Restart VS Code if it prompts you.
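If you prefer the terminal, installing by exact extension ID sidesteps the lookalike problem. This is a sketch, assuming the VS Code code command is on your PATH and that the official extension's Marketplace ID is Continue.continue (verify it on the extension's Marketplace page before relying on it).

```shell
EXT_ID="Continue.continue"  # assumed Marketplace ID of the official extension

if command -v code >/dev/null 2>&1; then
  code --install-extension "$EXT_ID"
  # Confirm it is now listed among installed extensions
  code --list-extensions | grep -iFx "$EXT_ID"
else
  echo "VS Code CLI not found; install from the Extensions view instead"
fi
```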

A setup wizard appears on first run. It can feel intimidating; take the defaults and tune later.

Step 3: point Continue at your local model

In the wizard:

Choose Ollama as the provider. Continue detects it automatically if Ollama is running.
Choose a model. Continue lists your installed Ollama models. Pick the coder model you pulled, for example qwen2.5-coder:7b.
Optionally select an embeddings model for codebase search: nomic-embed-text.

For full control, Continue’s config lives at ~/.continue/config.yaml. This file is the unsung hero of the whole setup: model presets, a cloud fallback, and codebase context settings, all in one version-controllable place. A working starting point:

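name: My Continue Config
version: 1.0.0
schema: v1
models:
  # Default local model for chat and edits
  - name: Local Qwen Coder
    provider: ollama
    model: qwen2.5-coder:7b
  # Cloud fallback for the occasional hard problem
  - name: Claude (when needed)
    provider: anthropic
    model: claude-sonnet-4-5
    apiKey: <env var or secret>
  # Embeddings model for codebase search
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed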

The local model is your default. The cloud entry is there for the occasional hard problem, switchable per session from the model picker.

Step 4: prove it works

Open a real codebase, not a toy file. Then run through the three things Continue does.

Chat. Press Cmd+L (Ctrl+L) to open the chat panel and ask:

Explain the entry point of this codebase.

A good answer means the local model and the connection are working.

Inline edit. Select a function, press Cmd+I (Ctrl+I), and describe a change, for example “refactor this to use async/await.” Continue shows a diff. Accept or reject it.

Inline completion. Type a new function signature and pause. Continue suggests the body as greyed-out text; press Tab to accept it. If nothing appears, check that autocomplete is enabled in Continue’s settings.

Throughout, watch your network indicator. It stays quiet. The model is on your machine.
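If any of these checks comes back empty, it helps to rule out Ollama itself by sending one prompt straight to its HTTP API, bypassing Continue. This is a minimal sketch, assuming curl and the default port; the helper names are hypothetical, and the payload builder does no JSON escaping, so keep the prompt free of double quotes and backslashes.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Assumes model and prompt contain no double quotes or backslashes.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one prompt straight to Ollama, bypassing Continue entirely.
ask_ollama() {
  curl -fsS --max-time 120 http://localhost:11434/api/generate \
    -d "$(generate_payload "$1" "$2")"
}

ask_ollama qwen2.5-coder:7b "Reply with the single word: ready" ||
  echo "No answer from Ollama: check that it is running and the model is pulled"
```

A JSON reply with a response field means the model is fine and any remaining problem is on the Continue side.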

Why use two models for completion and chat?

The single most useful thing to understand here: completion and chat want different models. Inline completion fires constantly and needs to be fast, so a small model (1.5B to 3B) keeps it responsive. Chat happens deliberately and benefits from a stronger model, so a 7B or larger coder model is the better choice there. Configure both in config.yaml and assign each to its role. This is the difference between completions that feel sluggish and completions that nearly match cloud Copilot.
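As a sketch of what that assignment can look like, the models section below gives each entry a roles list. It assumes Continue's config.yaml schema accepts roles such as chat, edit, and autocomplete (check the reference at https://docs.continue.dev for your version) and that you have pulled qwen2.5-coder:1.5b as the small model, which is an illustrative choice rather than a requirement.

```yaml
models:
  # Stronger model for deliberate work: chat and inline edits
  - name: Local Qwen Coder (chat)
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - chat
      - edit
  # Small, fast model that only handles inline completion
  - name: Local Qwen Coder (autocomplete)
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
```

Reload Continue after saving, then try an inline completion and a chat question to confirm each one is answered by the model you intended.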

What are the common Continue.dev problems?

Picking the wrong model for your hardware. A 32B model on 16GB of RAM will crawl. Start with 7B coder models and only go bigger if your machine has the headroom.
One model doing both jobs. See the two-model split above. A single large model makes completion feel slow; a single small model makes chat feel weak.
Codebase context needs the embeddings model. Without nomic-embed-text (or another embeddings model), “where is X defined” style questions cannot search your repo.
Models do not appear. Continue only lists what Ollama has. Run ollama list and pull anything missing.
Good features hide behind slash commands. /edit, /comment, /test and others are powerful but easy to miss. Read the docs at https://docs.continue.dev to find them.
A .continue folder in a repo can carry project-specific overrides. Handy for teams, occasionally surprising when settings differ between projects.
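For the codebase-context item above, one way to confirm the embeddings model is actually answering is to request a single embedding from Ollama directly. This sketch assumes a recent Ollama version that exposes the /api/embed endpoint, plus curl and the default port; the helper names are hypothetical.

```shell
# Build the JSON body for Ollama's /api/embed endpoint.
# Assumes model and input contain no double quotes or backslashes.
embed_payload() {
  printf '{"model": "%s", "input": "%s"}' "$1" "$2"
}

# Succeeds only if the embeddings model returns a response.
embeddings_ok() {
  curl -fsS --max-time 60 http://localhost:11434/api/embed \
    -d "$(embed_payload "${1:-nomic-embed-text}" "where is X defined")" >/dev/null 2>&1
}

if embeddings_ok; then
  echo "Embeddings model is answering"
else
  echo "No embeddings response; try: ollama pull nomic-embed-text"
fi
```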

The honest trade-off is latency. On modest hardware, a large local model for completion can feel laggy. The two-model split is the fix, not a workaround. And if you are on a JetBrains IDE, know that its Continue plugin lags the VS Code one on features, though it is improving.

Where to go next

This reuses the same Ollama backend as the rest of a local AI stack, so you have already done the foundational work. From here, the natural next step is a coding agent that can read your whole repo, make edits, and commit them, pointed at the same model. That is its own Playbook.

You now have Copilot’s experience in your editor, answered by a model on your own machine, with no subscription and no code leaving your laptop.