OpenClaw with a local model: a private AI assistant

OpenClaw with a local model: a private AI assistant

You type openclaw onboard in a terminal, answer five questions, and within minutes a persistent AI assistant lives on your machine. It does not live in a browser tab. It does not forget your conversation when you close a window. It sits in the background, connected to WhatsApp, Telegram, iMessage, and Slack, and it answers when you message it. OpenClaw reached more than 379,000 GitHub stars by June 2026, making it the most-starred software project on the platform according to the GitHub repository. The project, created by Peter Steinberger, went from zero to that number in under seven months. Much of the interest comes from one feature: the ability to run the whole thing with a local model, keeping every conversation, every file, and every tool call on your own hardware.

What is OpenClaw and what does it do?

OpenClaw is a self-hosted gateway that connects messaging platforms to AI agents. You install it on your own machine, Mac, Linux, or Windows, and it runs as a background daemon. The gateway manages sessions, tool execution, channel connections, and model routing. When you send a message from WhatsApp, the gateway receives it, passes it to the configured AI model, executes any tools the agent needs, and sends the response back.

The project is MIT-licensed, meaning there is no hosted version and no subscription. You own the full stack. The OpenClaw documentation describes the personal assistant setup as a self-hosted gateway, and the project lists support for more than twenty messaging channels including WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and Matrix. The gateway exposes a local HTTP endpoint at http://127.0.0.1:18789 by default, and companion apps for macOS, iOS, and Android give you a native interface for managing the assistant.

OpenClaw is architecturally different from cloud AI products. ChatGPT and Claude are reactive: you open a tab, type a prompt, get an answer, close the tab. OpenClaw is proactive. It can run heartbeat checks, execute scheduled cron tasks, monitor inboxes, and send you briefings. The agent loop runs continuously, not only when you are typing.

How do you run a local model with OpenClaw?

The gateway does not contain an AI model itself. It calls out to one, either a cloud API or a local model server running on your machine. OpenClaw supports five main local backends: Ollama, LM Studio, the ds4 provider for DeepSeek V4 Flash on macOS Metal, MLX for Apple Silicon, and vLLM or SGLang for high-throughput GPU serving. The local models documentation recommends LM Studio as the best starting point for most users, with Ollama as the closest second for those who prefer a CLI workflow.

Ollama is the most accessible path for first-time local setups. The Ollama blog published a setup guide showing the single-command install path: ollama launch openclaw handles both installing the gateway and connecting it to a local or cloud model. For local-only setups, you pull a model first with ollama pull gemma4 or ollama pull qwen3-coder, then route the gateway to that local instance.

The key configuration detail is the model provider block in ~/.openclaw/openclaw.json. A minimal local setup with Ollama looks like this:

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://localhost:11434",
        apiKey: "ollama-local",
        api: "ollama",
      },
    },
  },
  agents: {
    defaults: {
      model: { primary: "ollama/qwen3-coder" },
    },
  },
}

The documentation warns against using the /v1 OpenAI-compatible endpoint for Ollama because tool calling breaks silently on that path. The native Ollama API at port 11434 without the path suffix is the correct target.

How do local models protect privacy and data ownership?

The strongest argument for running OpenClaw with a local model is data sovereignty. When you point the gateway at a cloud API like Claude or GPT, every message, every file read, and every tool output crosses a network boundary to someone else’s server. The Ollama blog tutorial notes that local models work out of the box without additional plugins and that the entire setup keeps conversations and code private.

With a local model, nothing leaves your machine. The gateway, the model, and all tool execution stay on your hardware. This matters for anyone who handles sensitive documents, works with proprietary code, or simply prefers not to have their conversation history stored on a third-party server. OpenClaw stores its workspace at ~/.openclaw/workspace/ by default, including agent memory, session state, and channel history. You can read, edit, back up, or delete every byte.

The privacy benefit comes with a trade-off. Local models are smaller than cloud models, and smaller models are more vulnerable to prompt injection. The OpenClaw security documentation is explicit about this: “small or aggressively quantized cards truncate context and leak safety.” Running local means accepting that the model has weaker guardrails than a flagship cloud model. You mitigate this by keeping agent permissions narrow and enabling compaction to limit the blast radius of any single injection.

What hardware do you need for a local assistant?

Running a capable local model requires real hardware. The OpenClaw local models page recommends at least two maxed-out Mac Studios or an equivalent GPU rig for a comfortable agent loop. That is an expensive bar, but the practical floor is lower for lighter use.

A single 24 GB GPU works for simpler prompts at higher latency. On Apple Silicon, a Mac with 48 GB or 64 GB of unified memory can run models in the 20B to 30B parameter range using Ollama or LM Studio. Sam Black, writing on Towards Data Science in June 2026, documented a working setup on a Mac Mini using Qwen 3.5-9B via llama.cpp, achieving 20 to 70 tokens per second on a 16 GB Mac.

The practical consideration is context length. OpenClaw requires at least 64K tokens of context to complete multi-step agent tasks. Many quantized local models advertise large context windows but degrade under load. You need to test your specific model and hardware combination before relying on it for daily work. The recommended path is to configure a hybrid setup: a local model as the primary with a cloud fallback, using OpenClaw’s models.mode: "merge" setting so the assistant stays responsive when the local model is too slow or overloaded.

How do you set up your first local OpenClaw assistant?

The fastest path takes about fifteen minutes. You need a machine with Node.js 22 or later, a local model server (Ollama or LM Studio), and a messaging account to test with.

Step one is installing the gateway. The recommended command is npm install -g openclaw@latest followed by openclaw onboard --install-daemon. The onboard wizard walks through provider selection, model discovery, and channel configuration. For a local-only setup, you select Ollama or LM Studio during onboarding, point it at your local server, and the wizard auto-discovers available models.

Step two is pulling a model that works with agent tasks. The Ollama recommended models include qwen3-coder for coding, glm-4.7-flash as a balanced general-purpose option, and gpt-oss:20b for a mid-range option with strong reasoning.

Step three is testing the assistant. You message the connected channel, like WhatsApp, Telegram, or iMessage, and the gateway should respond. The default configuration uses per-sender sessions, so each person who messages the assistant gets their own context.

Step four is tuning. OpenClaw defaults to a heartbeat every 30 minutes with a proactive prompt. You will want to adjust or disable this until you trust the setup. The config key is agents.defaults.heartbeat.every: "0m" to turn it off. You also set channels.whatsapp.allowFrom to restrict incoming messages to specific phone numbers.

What security considerations matter for daily use?

OpenClaw’s explosive growth has been accompanied by security warnings. Researchers identified more than 135,000 publicly exposed instances by February 2026, according to reporting by The New Stack. The core risk is that OpenClaw has shell access, filesystem access, and browser session access by default. A misconfigured gateway exposed to the internet is a remote code execution vector.

The project’s security model relies on sandboxing. The default configuration runs tools on the host for the main session, but group channels can be sandboxed using Docker, SSH, or OpenShell backends. The config key agents.defaults.sandbox.mode: "non-main" restricts non-primary sessions to isolated environments.

When running with a local model, provider-side safety filters do not apply. Cloud providers like Anthropic and OpenAI run their own guardrails on every request. Local models skip those entirely. This means you are responsible for setting the agent’s persona and permissions to match your risk tolerance. The SOUL.md file in the workspace defines the assistant’s instructions, and the tool policy block in the config controls which tools the agent can use and which directories it can access.

For most single-user setups on a personal laptop behind a home network, the risk is manageable. The gateway binds to localhost by default, and channel connections use outbound-only protocols like WhatsApp web or Telegram bot polling. The danger comes from exposing the gateway port to the internet or connecting it to shared group channels without sandboxing.

OpenClaw with a local model is not a replacement for Claude or GPT on complex reasoning tasks. A 9B parameter model running on a Mac Mini cannot match a 1-trillion-parameter cloud model on deep analysis. But for the tasks people actually use an assistant for, like calendar checks, email summaries, reminders, and file organization, a local model performs indistinguishably from a cloud one. The difference is that your data stays on your desk, and the monthly API bill drops to zero. That trade-off is why 379,000 developers starred the project, and it is why local AI assistants are becoming the default choice for anyone who values privacy over peak performance.


Disclosure: StrideNote Studio is a research and media production studio that evaluates local AI tools as part of its workflow.

Share this
X Facebook LinkedIn Email