Playbooks Research Jun 07, 2026

Run a Fully Local, Free Deep-Research Engine

A research agent that searches, reads, and writes a cited report with nothing leaving your machine: a local model on Ollama, a self-hosted search backend, and a tool built for offline operation. Here is how the pieces fit together.

Run a Fully Local, Free Deep-Research Engine

A single model call answers a question. It does not run a research project. Deep-research tools are the ones that plan: they break a question into sub-questions, search many sources, read what they find, and write a cited report you can fact-check. The catch is that almost all of them assume a cloud model, a cloud search API, or both. Your research question, the thing that often reveals what you are actually working on, leaves the building.

This Playbook builds the version where it does not. Three pieces: a local model served by Ollama, a self-hosted search backend, and a research agent designed from the start to run offline. The result is a research loop that runs entirely on your own infrastructure, with no API keys and no per-report fee. For privacy-sensitive work, the claim is concrete: the question itself never reached a third party.

What you will end up with

  • Ollama serving an open model locally, as the engine that does the synthesis.
  • A self-hostable meta-search backend for the web side of the search.
  • Local Deep Research wired to both, turning a question into a cited report on your own machine.

We will also look briefly at two alternatives, GPT-Researcher and DeerFlow, so you can pick the right one for your situation rather than the first one you find.

Before you start

You need a Mac, Windows, or Linux machine. The agent itself is light; the model is what wants resources. Plan for at least 8GB of RAM for a small local model, and more for a stronger one. A GPU is recommended for the local model, though not strictly required.

You also need Python 3 and a terminal. And critically, you need a local model already running. If you do not have Ollama set up yet, do that first; the rest of this guide assumes it.

The one piece this guide does not install from scratch is the search backend. The fully-local design pairs the agent with a self-hostable meta-search engine (SearXNG, in the Stridenote stack) that runs on your own server and returns results without sending your queries to a third party. If you already self-host one, you are ready. If you do not, treat standing it up as a separate setup, and see the search note in step 3.

Step 1: Have Ollama serving a model

Ollama is the engine that runs and serves the local model. If it is not installed, the short version on a Mac is:

# macOS
brew install ollama
ollama run llama3.2

On Linux, curl -fsSL https://ollama.com/install.sh | sh installs it and sets up a service. On Windows, winget install Ollama.Ollama does the same. Confirm a model is present:

ollama list

One thing to internalize before you go further: in a fully-local research loop, the local model is the ceiling. With a strong model the reports are solid; with a weak one they are shallow. In the studio we drive this with Nemotron for exactly that reason. Ollama serves an OpenAI-compatible API at http://localhost:11434/v1, which is how every tool below talks to it.

Step 2: Install Local Deep Research

Local Deep Research is the agent built specifically for “nothing ever leaves your machine.” Install it into its own Python environment.

# create and activate a virtual environment
python3 -m venv ldr-env
source ldr-env/bin/activate

# install
pip install local-deep-research

On Windows, activate with ldr-env\Scripts\activate instead. There is also a Docker path if you prefer it:

git clone https://github.com/LearningCircuit/local-deep-research
cd local-deep-research
docker compose up -d

A GPU override compose file is available for accelerated local inference.

Step 3: Wire it fully local

This is the whole point of the tool, and it is two endpoints.

  • The model: point Local Deep Research at Ollama, at http://localhost:11434.
  • The search: point it at your self-hosted search backend.

For the search endpoint, you are pointing the agent at the local meta-search instance you run. [CONFIRM: exact SearXNG base URL and the setting that enables JSON/programmatic output for your instance.] The Atlas note flags this directly: the search backend must allow programmatic access for the agent to read results, and a stock instance may need that turned on. Beyond the web, Local Deep Research can also reach arXiv, PubMed, Semantic Scholar, Wikipedia, and GitHub directly.

With both endpoints reachable, the research loop runs entirely on your machine.

Step 4: Run your first research question

Start the web UI:

# the web UI serves on http://localhost:5000
python -m local_deep_research.web

Open http://localhost:5000, submit a question, and watch it search and synthesize. There is also a REST API and a Python client if you want to embed it in your own scripts later. A good first question:

Compare the leading open source local LLM runtimes in 2026
on speed, ease of use, and platform support. Cite sources.

Prove it works

The test that matters here is not just “did a report appear.” It is “did the work stay local.”

  • Watch the network indicator. Submit a question and watch traffic. With the model on Ollama and search on your self-hosted backend, the research loop runs on your machine. The strong claim, the one that lands for anyone handling sensitive topics, is that the question itself never left your network.
  • Read the citations. The output is a cited report, not a bare answer. Open a citation or two and confirm they point at real sources. A report you can fact-check beats an answer you have to trust blindly.
  • Confirm both endpoints are live. If results never come back, one of the two endpoints is unreachable. Check that Ollama responds and that the search backend is returning machine-readable results.

When a cited report comes back and the network stayed quiet, you have a private research engine running on your own hardware, for free.

Two alternatives, and when to pick them

Local Deep Research is the most local-pure of the three, which is why it leads here. Two others are worth knowing.

GPT-Researcher is the most proven tool in this category and the easiest to embed, since it is an importable Python library, not just an app.

pip install gpt-researcher

It works with any OpenAI-compatible API, so you point it at Ollama by setting the model config to http://localhost:11434/v1. The catch for a local-first stance: its default web search is Tavily, a cloud API. You can swap that out, but it takes more wiring. If a cloud search query is acceptable (the query goes out, synthesis stays local) GPT-Researcher is excellent, especially for research dossiers behind articles and scripts. If “nothing leaves the machine” is the rule, Local Deep Research is the cleaner pick.

DeerFlow is ByteDance’s broader harness: it researches, but it also codes and creates, with sub-agents and persistent memory. It is heavier, needing Node.js 22+, Python 3.12+, and Docker, with no 60-second path; budget 15 to 30 minutes.

git clone https://github.com/bytedance/deer-flow
cd deer-flow
make setup
docker compose up -d

Point its model config at Ollama at http://localhost:11434/v1 for a local setup. Reach for DeerFlow when the task is more than research, a multi-step “research then produce” workflow. For pure cited research, the other two are more focused.

Trade-offs and gotchas

  • The local model is the ceiling. A fully-local loop with a small model gives shallow reports. Use a strong local model for anything real.
  • Two endpoints to configure. The model and the search backend both have to be reachable. A silent failure usually means one of them is not.
  • The search backend needs programmatic access. A self-hosted meta-search instance may need JSON or API output explicitly enabled before the agent can read its results.
  • These are younger projects. Local Deep Research and DeerFlow are less battle-tested than GPT-Researcher, and DeerFlow’s v2.0 is a recent rewrite. Pin a version for workflows that matter.
  • Academic sources rate-limit. Heavy use of arXiv or PubMed can hit limits. Pace large research runs.

Our verdict, in short: this stack is the most exciting in the category for a studio that already runs a local model and a self-hosted search backend, because it turns those two into a complete, private, zero-cost research engine. The honest caveat is the same one that runs through all three tools: the reports are only as good as the model driving them. We test Local Deep Research first precisely because the search plumbing is already in place, and we compare report quality against GPT-Researcher and DeerFlow before handing any of them a production slot.

Where to go next

The natural extension is researching over your own documents, not just the web. Local Deep Research can pull in private documents through LangChain retrievers, and the cleaner those documents are, the better the result. Run a folder of PDFs through Marker first to get clean markdown, then point the research engine at them. That keeps even your source material on-machine. We cover the PDF-to-markdown step in a separate Playbook.

You now have a deep-research engine that plans, searches, reads, and cites, running on your own hardware without anything leaving your network. Curious about these things. You should be too.

Harness your curiosity.

— Stridenote · № 012