I spend most of my time these days working with LLM agents. Not chatting with them. Working with them. I treat them like a junior dev team: I set architectural direction, write design specs, review code, and dive into complex areas directly. The agent handles the volume work, the test scaffolding, the boilerplate. This arrangement works well for coding, but research was a weak point: different sessions produced different findings, repeated the same lookups and outbound requests, and dragged large blocks of content into context.
The problem is straightforward. When an agent needs to look something up, it reaches for web search or tries to recover it from training data. Web search is a slot machine. Sometimes you get what you need. Sometimes not so much. There are sites catering to LLM research, but even they often just dump a multi-megabyte markdown file and call it good. The agent has no way to evaluate the quality of what it found, no way to verify it against the actual docs, and you have no visibility into what it searched for or what it got back. Training data is worse because it's stale and you can't tell when the agent is confidently wrong.
I built doctrove to fix this.
What it is
doctrove is a local documentation store. It mirrors LLM-targeted content from websites, indexes it for full-text search, tracks changes with git, and exposes everything through MCP tools that the agent calls directly during its work. The agent searches local docs instead of the open web. You can see exactly what it found. The docs are versioned and refreshable.
The key design constraint: replace the free-form, invisible research process with a deterministic, visible, maintainable document store that both you and the agent use together.
The llms.txt ecosystem
There's a growing convention where sites publish /llms.txt files, structured content specifically formatted for LLM consumption. Stripe does it. Supabase does it. Deno, Vercel, the MCP spec itself. These files link to companion documents that cover specific topics in depth.
doctrove discovers this content automatically. Point it at a domain and it probes well-known paths, parses companion links from llms.txt, checks sitemaps for markdown and text files, and detects documentation platforms like Docusaurus or MkDocs. For sites that don't have llms.txt but serve HTML docs, it cleans and converts the HTML to markdown, stripping nav chrome, sidebars, and JavaScript framework artifacts.
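The rule-based flavor of that discovery step can be sketched in a few lines of Go. The well-known paths and function names below are illustrative assumptions, not doctrove's actual API; the point is that building the probe list is a pure function with no LLM call and no network I/O.

```go
package main

import (
	"fmt"
	"net/url"
)

// Well-known locations a rule-based discoverer can probe. The exact list
// doctrove checks is an assumption here; these paths are common conventions.
var wellKnownPaths = []string{
	"/llms.txt",
	"/llms-full.txt",
	"/sitemap.xml",
}

// candidateURLs builds the probe list for a domain. It is deterministic,
// so discovery against a given domain is reproducible and easy to test.
func candidateURLs(domain string) ([]string, error) {
	u, err := url.Parse(domain)
	if err != nil {
		return nil, err
	}
	base := u.Scheme + "://" + u.Host
	out := make([]string, 0, len(wellKnownPaths))
	for _, p := range wellKnownPaths {
		out = append(out, base+p)
	}
	return out, nil
}

func main() {
	urls, err := candidateURLs("https://stripe.com")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Only after this list is built does anything touch the network, which keeps the fetch step separate and mockable.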
doctrove discover https://stripe.com # see what's available
doctrove grab https://stripe.com # mirror it locally
That's it. The content is now local, indexed, and available to any MCP-connected agent.
Why local matters
Three reasons.
Consistency. When the agent searches your doctrove workspace, it gets the same results every time for the same query. No ranking algorithm changes, no sponsored results, no network failures. The content is files on disk.
Visibility. I built a companion tool called eventrelay that gives me a real-time dashboard of what the agent is doing. Every MCP tool call, every search query, every result set. When the agent searches for "authentication" and gets 34 hits across three sites, I can see that. When it reads a specific section of the MCP transport spec, I can see that too. This changed how I work with agents. Instead of trusting that the agent did good research, I can verify it.
Accumulation. The workspace gets better over time. Every sync is a git commit. Summaries written by agents persist across sessions. Category corrections stick. When you refresh a site, ETag caching skips unchanged files and git records what actually changed. After a few weeks on a project, the doctrove workspace becomes a curated knowledge base shaped by actual usage.
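The ETag part of refresh is standard HTTP conditional-request machinery. A minimal sketch, assuming a helper named `conditionalGet` (doctrove's real fetcher sits behind an interface and may look different): the stored ETag goes out as `If-None-Match`, and a `304 Not Modified` response means the file can be skipped.

```go
package main

import (
	"fmt"
	"net/http"
)

// conditionalGet builds a request carrying the previously stored ETag so the
// server can answer 304 Not Modified for unchanged files. The name and flow
// are illustrative, not doctrove's exported API.
func conditionalGet(url, etag string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if etag != "" {
		// The server compares this against the current ETag and replies
		// 304 with an empty body when nothing changed.
		req.Header.Set("If-None-Match", etag)
	}
	return req, nil
}

func main() {
	req, err := conditionalGet("https://example.com/llms.txt", `"abc123"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("If-None-Match"))
}
```

Skipped files never reach the git layer, so each commit records only what actually changed.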
How agents use it
The MCP interface exposes 20 tools designed for hierarchical drill-down. This is deliberate. Agent context windows are expensive. You don't want the agent reading entire files when it needs one section.
The workflow goes: trove_catalog to find which site covers a topic. trove_search with category and path filters to find relevant files. trove_outline to see the heading structure and section sizes. trove_read with a section parameter to read just what's needed. If the agent reads something large, trove_summarize caches a summary so the next agent (or the next session) can decide whether to read the full content without spending the tokens.
This is the opposite of how agents normally do research. Instead of one big web search that returns whatever it returns, the agent navigates a structured store with predictable results. Each step narrows the scope. The agent learns what's available and builds on prior work.
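The outline step is the hinge of that drill-down: headings plus section sizes are enough for an agent to decide what to read. A small Go sketch of the idea, with illustrative types that are not doctrove's actual ones:

```go
package main

import (
	"fmt"
	"strings"
)

// Section mirrors the kind of entry an outline tool can return: a heading
// plus the size of the text under it, so an agent can judge cost before
// spending context on a read.
type Section struct {
	Heading string
	Bytes   int
}

// outline walks a markdown document and sizes each second-level section.
func outline(md string) []Section {
	var secs []Section
	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(line, "## ") {
			secs = append(secs, Section{Heading: strings.TrimPrefix(line, "## ")})
		} else if len(secs) > 0 {
			// Attribute the line (plus its newline) to the current section.
			secs[len(secs)-1].Bytes += len(line) + 1
		}
	}
	return secs
}

func main() {
	doc := "## Transport\nlong text here\n## Elicitation\nshort"
	for _, s := range outline(doc) {
		fmt.Printf("%s (%d bytes)\n", s.Heading, s.Bytes)
	}
}
```

A 50KB reference with one relevant 2KB section costs 4% of a full read once the agent can see the sizes.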
The feedback loop
Two tools let the agent improve the store as it works.
trove_tag overrides the automatically assigned category for a file. The categorizer uses path patterns and body heuristics to assign one of 11 categories (api-reference, tutorial, guide, spec, changelog, and so on). It gets it right most of the time. When it doesn't, the agent fixes it. The correction persists across re-syncs.
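Path-pattern categorization of this kind is simple enough to sketch. The patterns below are my own illustrative guesses, not doctrove's real rules, and the real categorizer also falls back to body analysis when the path is ambiguous:

```go
package main

import (
	"fmt"
	"strings"
)

// categorize assigns a category from path patterns, with "guide" as the
// fallback. Category names match the ones the post lists; the matching
// rules here are assumptions for illustration.
func categorize(path string) string {
	p := strings.ToLower(path)
	switch {
	case strings.Contains(p, "/api/") || strings.Contains(p, "reference"):
		return "api-reference"
	case strings.Contains(p, "tutorial"):
		return "tutorial"
	case strings.Contains(p, "changelog") || strings.Contains(p, "release"):
		return "changelog"
	case strings.Contains(p, "spec"):
		return "spec"
	default:
		return "guide"
	}
}

func main() {
	fmt.Println(categorize("docs/api/charges.md"))
	fmt.Println(categorize("docs/getting-started.md"))
}
```

Because the rules are fixed, a wrong assignment is reproducible, and a persisted trove_tag override wins over whatever the rules say on every re-sync.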
trove_summarize stores a 2-5 sentence summary that appears in search results and outlines. This is the accumulation mechanism. An agent that reads a 50KB API reference can leave behind a summary that saves the next agent from reading it at all.
Both of these are deterministic. They write to the index. They survive refreshes. They're visible in the workspace. No magic, no hidden state.
Determinism over cleverness
A design principle that runs through the whole project: prefer deterministic processes and heuristics over anything that depends on LLM inference at runtime.
Content discovery is rule-based. Check well-known paths. Parse companion links. Probe sitemaps. Detect platforms by HTML signatures. None of this requires an LLM call. The categorizer uses path pattern matching with body analysis as fallback. Search uses SQLite FTS5 with path boosting, not vector embeddings. The content pipeline is a fixed sequence: fetch, detect JavaScript shells, clean HTML, convert to markdown, rewrite links, compare with existing content, store.
This means the system is predictable. When something goes wrong you can trace exactly what happened. When you add a new site you know what discovery will find. When you search you understand why results rank the way they do.
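The ranking claim is concrete: a path boost is just arithmetic you can read. A toy scoring function (FTS5 does the real work in doctrove; this sketch only shows why an ordering is explainable rather than a black box):

```go
package main

import (
	"fmt"
	"strings"
)

// score ranks a file for a single query term: occurrences in the body plus
// a fixed boost when the term also appears in the path. The boost value is
// an arbitrary illustration, not doctrove's tuning.
func score(path, body, term string) int {
	t := strings.ToLower(term)
	s := strings.Count(strings.ToLower(body), t)
	if strings.Contains(strings.ToLower(path), t) {
		s += 10 // path boost: files named after the topic outrank passing mentions
	}
	return s
}

func main() {
	fmt.Println(score("docs/auth/overview.md", "auth flows and auth tokens", "auth"))
	fmt.Println(score("docs/webhooks.md", "mentions auth once", "auth"))
}
```

When a result ranks first, you can point at the exact terms and the exact boost that put it there.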
The LLM's role is at a higher level: deciding what to search for, choosing which section to read, writing summaries, correcting categories. The infrastructure underneath is conventional software.
Architecture
doctrove is a Go library first, CLI and MCP server second. The engine package is the primary API surface. Everything is behind interfaces with dependency injection via functional options. Ten swappable components: HTTP fetcher, content discoverer, syncer, indexer, version store, event emitter, content processor, categorizer, summarizer, and discovery providers.
Storage is plain files on disk, git for versioning, SQLite FTS5 for search. No database server. No external services. The workspace is self-contained and shareable.
This matters for extensibility. The library design with functional options means you can swap components without forking. Replace the HTTP fetcher with one that uses Playwright for JavaScript-heavy sites. Plug in an LLM-based categorizer if the rule-based one doesn't fit your domain. Add discovery providers for doc aggregators or package registries. The engine doesn't care.
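The functional-options shape described above is a standard Go idiom. A minimal sketch with illustrative names (doctrove's exported types and option names may differ):

```go
package main

import "fmt"

// Fetcher is one of the swappable components, hidden behind an interface.
type Fetcher interface {
	Fetch(url string) ([]byte, error)
}

// defaultFetcher stands in for a plain net/http implementation.
type defaultFetcher struct{}

func (defaultFetcher) Fetch(url string) ([]byte, error) {
	return []byte("fetched " + url), nil
}

// Engine holds its dependencies; callers never construct it directly.
type Engine struct {
	fetcher Fetcher
}

// Option mutates an Engine during construction.
type Option func(*Engine)

// WithFetcher swaps the HTTP fetcher, e.g. for a Playwright-backed one.
func WithFetcher(f Fetcher) Option {
	return func(e *Engine) { e.fetcher = f }
}

// NewEngine applies defaults first, then caller options, so every field
// is always populated and overrides are explicit.
func NewEngine(opts ...Option) *Engine {
	e := &Engine{fetcher: defaultFetcher{}}
	for _, o := range opts {
		o(e)
	}
	return e
}

func main() {
	e := NewEngine() // all defaults; pass WithFetcher(...) to swap
	b, _ := e.fetcher.Fetch("https://example.com")
	fmt.Println(string(b))
}
```

The payoff is that replacing one component is one option at the call site, with no fork and no change to the engine itself.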
I structured it this way because I've been working on a premium layer (protrove) that imports doctrove as a library and extends it with additional capabilities. Library-first design makes that clean.
Context7 integration
Not every project's documentation lives at a website with llms.txt. Context7 maintains a community-curated collection of library documentation formatted for LLM consumption. With an API key configured, doctrove resolves bare library names to Context7 content:
doctrove grab react
doctrove grab stripe-node
This content gets stored under synthetic domains and categorized separately. It fills the gap for libraries that haven't adopted llms.txt yet.
What I actually use this for
Day to day, my workflow looks like this. I'm working on a project that uses the MCP protocol. Instead of hoping the agent knows the spec from training data, I grab the actual current spec:
doctrove grab https://modelcontextprotocol.io
Now when the agent needs to check how streamable HTTP transport works, or what the elicitation protocol looks like, it searches local docs that match the current published spec. Not training data from six months ago. Not a random Stack Overflow answer.
I have eventrelay running, so I can see the agent's research in real time. If it's searching for the wrong things, I redirect it. If it's reading irrelevant files, I know immediately. The agent's research process is no longer a black box.
Over time, the workspace accumulates summaries and category corrections from multiple sessions. Files that were hard to navigate get summarized. Pages that were miscategorized get fixed. The documentation gets better with use.
The development process
I built doctrove the same way I build everything now: as the architect working with an LLM agent team. I wrote the design spec and DESIGN.md. I made the architectural decisions: library-first, functional options, interface segregation, three-layer architecture. I reviewed every significant piece of code. I dove directly into the complex parts: the content pipeline, the FTS5 indexing, the discovery orchestration.
The agent handled the volume: implementing CLI commands that mirror the MCP tools, writing table-driven tests, building out the Cobra command structure, wiring up the event emission. Straightforward work that follows established patterns.
This is the model I think works. The human sets direction and handles complexity. The agent handles throughput. Both work in the same codebase with full visibility into what the other is doing.
doctrove exists because I wanted that same model for research. The agent does the searching, reading, and summarizing. I see what it's doing and steer when needed. The workspace we build together persists and improves.
Getting started
make install
make init-workspace
doctrove mcp-config # copy the output to your agent's MCP config
Add some documentation sources, let the agent search them instead of the web, and watch what happens to the quality of its research.
The project is MIT licensed and on GitHub at github.com/dmoose/doctrove.