The "deep research" feature — point an agent at a hard question, walk away, come back to a long, cited report — was a closed product a year ago. (If you're fuzzy on the category itself, start with what "deep agents" actually are.) Now it's a crowded open-source category, and several of the implementations are good enough to put real work through. The pattern underneath them is almost always the same: a planner decomposes the question into sub-questions, a set of searchers gather sources in parallel, the agent reads and recurses on promising leads, and a writer synthesizes a report with citations. What differs — and what you should choose on — is the stack, where it runs, and how much of that loop is exposed to you as configuration.

Here are seven worth knowing, from the smallest reference to the leaderboard-toppers.

The standalone tools

Planner/executor deep-research agent: a planner writes sub-questions, parallel execution agents gather web and local-document sources, a publisher synthesizes a cited report. From the Tavily team.

The most-starred standalone in the category, and the most "just run it." Its Deep Research mode is a recursive tree exploration with configurable depth, breadth, and concurrency — a full run lands around five minutes and well under a dollar on a small reasoning model. If you want a deep-research tool rather than a deep-research framework, start here.

Multi-agent task-automation framework on CAMEL-AI; a top open-source performer on the GAIA agent benchmark (~69% avg). Accepted to NeurIPS 2025.
★ 20k · Python · camel-ai/owl

OWL's clever move is lazy browser use: it decides per step whether a cheap tool (search, code execution, an Arxiv or GitHub toolkit) is enough and only spins up a real browser when a page genuinely needs interaction. Browsers are the slowest, most expensive, most failure-prone tool in any research agent — treating them as a last resort is why OWL is both fast and near the top of GAIA.
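A minimal sketch of the "browser as last resort" idea, not OWL's actual code. It simplifies the per-step decision to a cost-ordered fallback: try the cheapest tool first and escalate only when a tool reports it cannot handle the step. The `Tool` class, the cost numbers, and the keyword check in `needs_interaction` are all hypothetical stand-ins for what would be a model decision in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    cost: int  # relative cost; hypothetical numbers for illustration
    run: Callable[[str], Optional[str]]  # returns None if the tool cannot handle the step


def needs_interaction(step: str) -> bool:
    # Hypothetical heuristic; a real agent would let the model decide.
    return any(word in step.lower() for word in ("click", "log in", "scroll"))


def search(step: str) -> Optional[str]:
    # Cheap tool: gives up on steps that need a live page.
    if needs_interaction(step):
        return None
    return f"search results for: {step}"


def browser(step: str) -> Optional[str]:
    # Expensive tool: can handle anything, so it is tried last.
    return f"rendered page for: {step}"


def run_step(step: str, tools: list[Tool]) -> tuple[str, str]:
    """Try tools from cheapest to most expensive; return (tool name, result)."""
    for tool in sorted(tools, key=lambda t: t.cost):
        result = tool.run(step)
        if result is not None:
            return tool.name, result
    raise RuntimeError(f"no tool could handle step: {step}")


TOOLS = [Tool("browser", 100, browser), Tool("search", 1, search)]
```

With these stubs, `run_step("find the paper's abstract", TOOLS)` is served by `search`, while a step such as "click through the paginated table" falls through to `browser`.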

The hackable bases

Model-agnostic deep-research agent on LangGraph with four independently swappable model roles (summarize, research, compress, write); Tavily plus MCP and native provider search.

The standout feature is honesty about architecture. The repo keeps its older designs — plan-and-execute, supervisor-researcher multi-agent — in src/legacy/ and shows the current single-loop design beats them on DeepResearch Bench. You get a base you can A/B real architectural choices in, not just a black box that happens to work.

The canonical minimal reference: iteratively generates queries, scrapes results, and recurses deeper on findings — deliberately kept under 500 lines.
★ 19k · TypeScript · dzhng/deep-research

Read this one before you build anything. Its whole behavior is governed by two explicit knobs — depth (how many times it recurses, default 2) and breadth (how many parallel queries per level, default 4). That's the single most important design idea in the category made literal: the breadth-versus-depth tradeoff that controls both your bill and your report quality should be a config value you set, not an emergent property of a prompt.
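To make the two knobs concrete, here is a rough Python sketch of a depth/breadth recursion. It is an illustration, not a port of the repo: `generate_queries` and `search` are hypothetical callables you would back with an LLM and a search API, the queries run sequentially rather than in parallel, and breadth is assumed to stay constant at every level (a real implementation may narrow it as it goes deeper).

```python
from typing import Callable

QueryFn = Callable[[str, int], list[str]]  # (topic, n) -> n follow-up queries
SearchFn = Callable[[str], str]            # query -> a finding


def research(topic: str, depth: int, breadth: int,
             generate_queries: QueryFn, search: SearchFn) -> list[str]:
    """Collect findings over `depth` levels, `breadth` queries per level."""
    if depth == 0:
        return []
    findings: list[str] = []
    for query in generate_queries(topic, breadth):
        finding = search(query)
        findings.append(finding)
        # Recurse on what was just learned, one level shallower.
        findings.extend(
            research(finding, depth - 1, breadth, generate_queries, search)
        )
    return findings


def search_calls(depth: int, breadth: int) -> int:
    """Number of searches the sketch above issues (constant breadth assumed)."""
    return sum(breadth ** level for level in range(1, depth + 1))
```

Under the constant-breadth assumption, depth 2 and breadth 4 cost 4 + 16 = 20 searches, and one extra level of depth raises that to 84. That growth is the reason the two values belong in configuration rather than in a prompt.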

The specialists

Fully local deep-research assistant running entirely on Ollama or LM Studio — summarize, find the knowledge gap, refine the query, repeat — with no cloud LLM dependency.

The privacy pick. Nothing leaves the machine: the model is local and search backends are pluggable (DuckDuckGo, SearXNG, Tavily, Perplexity). Because it ships as a LangGraph Studio graph, the gap-detection loop is inspectable node-by-node — useful when a local model goes off the rails and you need to see where.
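The summarize, find-the-gap, refine, repeat cycle can be sketched in a few lines. This is an assumption-laden illustration rather than the project's graph: `search`, `summarize`, and `find_gap` are hypothetical callables (a local model would sit behind the last two), and the returned `trace` stands in for the node-by-node visibility a graph UI would give you.

```python
from typing import Callable, Optional


def gap_loop(
    topic: str,
    max_loops: int,
    search: Callable[[str], str],
    summarize: Callable[[str, str], str],
    find_gap: Callable[[str, str], Optional[str]],
) -> tuple[str, list[tuple[str, Optional[str]]]]:
    """Iteratively search, fold results into a summary, and chase the next gap."""
    summary = ""
    query = topic
    trace: list[tuple[str, Optional[str]]] = []  # (query used, gap found) per loop
    for _ in range(max_loops):
        results = search(query)
        summary = summarize(summary, results)
        gap = find_gap(topic, summary)
        trace.append((query, gap))
        if gap is None:
            break  # nothing left to fill in
        query = f"{topic} {gap}"  # refine the next query around the gap
    return summary, trace


# Toy stand-ins so the loop can be run without a model or network.
FACETS = ["pricing", "latency"]


def fake_search(query: str) -> str:
    return f"notes on {query}"


def fake_summarize(summary: str, results: str) -> str:
    return (summary + " " + results).strip()


def fake_find_gap(topic: str, summary: str) -> Optional[str]:
    return next((facet for facet in FACETS if facet not in summary), None)
```

Calling `gap_loop("vector databases", 5, fake_search, fake_summarize, fake_find_gap)` stops after three loops: one for the topic itself and one for each missing facet. The `max_loops` cap matters more with small local models, which can keep reporting gaps indefinitely.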

A deployable full-stack clone of OpenAI Deep Research (Next.js, Vercel AI SDK, shadcn/ui) built on Firecrawl's extract+search, shipping with auth and storage.

Most repos here are scripts; this is an app. It leans on Firecrawl's extract with JSON-schema-validated outputs to turn scraped pages into typed data, and supports reasoning models across providers. Reach for it when the deliverable is a product, not a notebook.

Hugging Face's open replication of Deep Research, built as a code agent: it writes and executes Python to express multi-step actions instead of emitting JSON tool calls.

The most interesting architectural bet. By having the agent write code to orchestrate its tools, smolagents' open replication reached 55% pass@1 on GAIA's validation set and topped the open leaderboard — against roughly 67% for OpenAI's original. That gap is the clearest published measure of how far open deep-research has come, and how far it still has to go.
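A toy contrast between the two action styles, to show why code composes more easily. None of this is the smolagents API: `TOOLS`, `run_json_calls`, and `run_code_action` are hypothetical names, the tools are stubs, and the bare `exec` is only acceptable in a demo (a real code agent needs a sandboxed interpreter).

```python
import json


def search(q: str) -> list[str]:
    # Stub tool: returns three fake hits.
    return [f"{q} result {i}" for i in range(3)]


def word_count(text: str) -> int:
    return len(text.split())


TOOLS = {"search": search, "word_count": word_count}


def run_json_calls(calls_json: str) -> list:
    """JSON style: a flat list of independent tool invocations."""
    return [TOOLS[call["tool"]](**call["args"]) for call in json.loads(calls_json)]


def run_code_action(code: str) -> dict:
    """Code style: the action is a program with the tools in scope."""
    namespace = dict(TOOLS)
    exec(code, namespace)  # demo only: never exec untrusted model output unsandboxed
    return namespace


JSON_CALLS = '[{"tool": "search", "args": {"q": "GAIA benchmark"}}]'

# One action that searches, loops over the hits, and aggregates.
CODE_ACTION = """
hits = search("GAIA benchmark")
counts = [word_count(h) for h in hits]
answer = sum(counts)
"""
```

In the JSON style, feeding each hit into `word_count` would take another round trip through the model per step. In the code style, the loop and the aggregation happen inside a single action, and `run_code_action(CODE_ACTION)["answer"]` holds the total.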


If you only do one thing with this list: clone dzhng/deep-research, read it end to end (it is under 500 lines), and notice how much of "deep research" is just a recursion with two well-chosen knobs. Then pick the heavier repo that matches your stack. And benchmark before you trust any of them: DeepResearch Bench (100 PhD-level tasks, scored for both report quality and citation support) exists precisely because a confident, well-formatted report is the easiest thing in the world for an agent to fake. For the how, see our guide to evaluating a deep-research agent.