LLM-native documentation primitives
Context
The Phase 1 docs hardening initiative (feat/docs-hardening) brought the
beef monorepo’s Astro Starlight site to 131 pages, owner-tagged, CI-gated,
and grounded with path:line citations. The site is now reliable for
humans and CI.
The remaining gap is LLM consumption. Today, an agent fetching the docs must scrape Astro HTML and parse markdown by hand, loses frontmatter context along the way, and has no programmatic way to:
- enumerate which doc owns which subject,
- find which TODO(@owner) markers gate which questions,
- distinguish “Customer (Chargebee)” from “User (auth)”,
- verify a path:line citation is still valid against apps/*,
- decide when to defer to a codeowner instead of guessing.
Eight options were on the table (see prompt). The decision is which minimum-viable set delivers LLM-native consumption without rewriting prose and without coupling agents to any specific vendor or transport.
Decision
Ship the following primitives on feat/docs-llm-native, amended by the
Foxy360 embedding change:
- Knowledge-base export (Option 3), the floor primitive. scripts/docs-build-kb-export.mjs walks docs/src/content/docs/**/*.md, chunks bodies on H2 boundaries, and emits:
  - apps/sirloin/internal/app/foxy360/docskb/kb/index.jsonl: one JSON object per chunk with stable fields (doc_id, doc_path, doc_url, title, description, domain, owner, status, last_reviewed, stability, chunk_id/index/total, section, body, code_refs, todos, checksum).
  - apps/sirloin/internal/app/foxy360/docskb/kb/manifest.json: corpus + index + manifest checksums, full doc index with TODO and code_ref aggregations.
  - apps/sirloin/internal/app/foxy360/docskb/kb/glossary.jsonl: term map from overview/glossary.md, with parens-disambiguator domain (e.g. "Customer (Chargebee)" → term: "Customer", domain: "Chargebee").
- Foxy360-embedded MCP tools. The generated KB is embedded into Sirloin’s Foxy360 MCP runtime and exposed through authenticated, viewer-accessible read-only tools: foxy360_docs_search, foxy360_docs_get, foxy360_docs_resolve_glossary, and foxy360_docs_manifest. The static Astro site no longer serves unauthenticated LLM export files.
- Frontmatter validator (Option 2, narrowed): scripts/docs-check-frontmatter.mjs. Hard-fails on missing title and on owner tags outside the project allowlist (@law, @pawel, @marty, @vlad, @zen, unassigned). Soft-warns on missing description, status, doc_type, last_reviewed, domains so existing docs aren’t blocked, but new docs feel pressure to comply.
- Read-only local MCP server (Option 4) under tools/docs-mcp/. TypeScript, stdio transport, no external dependencies beyond @modelcontextprotocol/sdk. Tools: search_docs, get_doc, list_owners, get_code_refs, resolve_glossary. Reads the generated JSONL files at apps/sirloin/internal/app/foxy360/docskb/kb/ directly, with no daemon and no index server. Ships with a .mcp.json fragment teammates can copy in.
- Claude skill (Option 5) at .claude/skills/beef-docs/. Teaches an LLM how to use the KB correctly: defer to TODO(@owner) instead of guessing, cite path:line when claiming code reality, refuse on owner-judgment questions.
- Citation linter (Option 6): scripts/docs-check-citations.mjs. Scans for path:line and path:start-end references against the live tree, asserts that the file exists and the line numbers fit. 147 valid citations across 119 unique refs at the time of this ADR; the linter will be enforced going forward.
- Glossary JSONL (Option 8): folded into deliverable 1; emitted by the same build script.
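The H2 chunking and the parens-disambiguator described for the KB export can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/docs-build-kb-export.mjs: the function names chunkOnH2 and parseGlossaryTerm are hypothetical, and the field names chunk_index and chunk_total are one assumed reading of "chunk_id/index/total".

```javascript
// Sketch only: function names and the chunk_index/chunk_total field names are assumptions.
const FENCE = /^(`{3}|~{3})/; // a line that opens or closes a fenced code block

// Split a markdown body into chunks on H2 ("## ") boundaries.
// Text before the first H2 becomes a chunk whose section is null.
function chunkOnH2(body) {
  const chunks = [];
  let current = { section: null, lines: [] };
  let inFence = false;
  for (const line of body.split("\n")) {
    if (FENCE.test(line)) inFence = !inFence; // never split on "## " inside a code fence
    const h2 = !inFence && /^##\s+(.*\S)\s*$/.exec(line);
    if (h2) {
      chunks.push(current);
      current = { section: h2[1], lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  chunks.push(current);
  return chunks
    .map((c) => ({ section: c.section, body: c.lines.join("\n").trim() }))
    .filter((c) => c.section !== null || c.body !== "") // drop an empty preamble
    .map((c, i, all) => ({ chunk_index: i, chunk_total: all.length, ...c }));
}

// Split a glossary heading such as "Customer (Chargebee)" into term + domain.
// Headings without a trailing parenthesised qualifier get domain: null.
function parseGlossaryTerm(heading) {
  const trimmed = heading.trim();
  const m = /^(.*\S)\s*\(([^()]+)\)\s*$/.exec(trimmed);
  return m ? { term: m[1], domain: m[2].trim() } : { term: trimmed, domain: null };
}
```

Under these assumptions, parseGlossaryTerm("User (auth)") yields term "User" with domain "auth", which is the distinction the Context section says agents currently cannot make.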
CI adds three new jobs in .github/workflows/docs-quality.yml, mirroring
the existing proto-sync-check pattern:
- kb-export-build: builds the KB export in scratch and runs the round-trip smoke test.
- frontmatter-check: hard requirements above.
- citation-check: broken path:line citations.
The four pre-existing gates (markdown-lint, astro-build,
adr-template-check, proto-sync-check, docs-required-check) remain
untouched. Total CI gates: 4 → 7 (counting the optional scheduled
link-check, 5 → 8).
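To make the frontmatter-check gate concrete, the hard-fail and soft-warn split from the Decision could look roughly like the sketch below. The allowlist and field names are taken from this ADR; the function name checkFrontmatter, the result shape, and the handling of a list-valued owner field are assumptions. Whether a missing owner is itself an error is not stated here, so the sketch leaves that case alone.

```javascript
// Allowlist and soft-warn fields copied from this ADR; the rest is a hypothetical sketch.
const OWNER_ALLOWLIST = new Set(["@law", "@pawel", "@marty", "@vlad", "@zen", "unassigned"]);
const SOFT_FIELDS = ["description", "status", "doc_type", "last_reviewed", "domains"];

// `frontmatter` is an already-parsed object; YAML parsing is out of scope for this sketch.
function checkFrontmatter(frontmatter) {
  const errors = [];
  const warnings = [];

  // Hard requirement: a non-empty title.
  if (typeof frontmatter.title !== "string" || frontmatter.title.trim() === "") {
    errors.push("missing title");
  }

  // Hard requirement: every owner tag must be on the allowlist.
  // Assumption: `owner` may hold a single tag or a list of tags.
  const owners = [].concat(frontmatter.owner ?? []);
  for (const owner of owners) {
    if (!OWNER_ALLOWLIST.has(owner)) errors.push(`owner not in allowlist: ${owner}`);
  }

  // Soft requirements: reported, but they do not fail the check.
  for (const field of SOFT_FIELDS) {
    if (frontmatter[field] === undefined || frontmatter[field] === "") {
      warnings.push(`missing ${field}`);
    }
  }

  return { ok: errors.length === 0, errors, warnings };
}
```

A CI wrapper would presumably exit non-zero only when ok is false and print the warnings either way, so existing docs keep passing.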
Skipped this iteration
- Option 7 — Stable slugs / canonical URL audit. Starlight derives
slugs from filenames; the corpus is fresh and slugs are unlikely to
churn before Phase 2 service deep-dives finish. A wholesale alias
table risks paving over the wrong patterns. Raised as
TODO(@law): re-audit slugs after Phase 2 deep dives stabilise the service/* tree.
Consequences
Easier
- Authenticated Claude/Cowork/mobile/desktop clients can consume docs through Foxy360 MCP tools. Local agents can use tools/docs-mcp/ or direct JSONL reads. No HTML parsing required.
- Owner-aware refusal: when an agent encounters a TODO(@owner), it knows to ask the human or quote the open question rather than fabricate a resolution.
- Path:line citations stay honest: CI fails when a doc cites a stale line.
- The MCP server lets compatible agents query semantically without loading the whole corpus into context.
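The stale-citation check behind the path:line guarantee can be illustrated with a small sketch. It is not the code in scripts/docs-check-citations.mjs: the regular expression, the function names, and the injected lineCounts map (repo-relative path to line count, standing in for reads of the live tree) are all assumptions made for the example.

```javascript
// Matches path:line and path:start-end, e.g. apps/sirloin/main.go:42 or docs/a.md:10-20.
// Assumption: cited paths contain at least one "/" and end in a file extension.
const CITATION = /([\w.@-]+(?:\/[\w.@-]+)+\.\w+):(\d+)(?:-(\d+))?/g;

// Pull every citation out of a doc body.
function extractCitations(text) {
  return [...text.matchAll(CITATION)].map((m) => ({
    path: m[1],
    start: Number(m[2]),
    end: Number(m[3] ?? m[2]), // a single-line citation has end === start
  }));
}

// `lineCounts` is a Map of repo-relative path -> number of lines in that file.
// Returns null when the citation is valid, otherwise a short reason string.
function checkCitation(citation, lineCounts) {
  const total = lineCounts.get(citation.path);
  if (total === undefined) return "file not found";
  if (citation.start < 1 || citation.end < citation.start) return "invalid range";
  if (citation.end > total) return `line ${citation.end} exceeds file length ${total}`;
  return null;
}
```

Note that a check of this kind only proves the cited lines exist; it cannot tell whether the code at those lines still says what the doc claims.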
Harder
- Sirloin image builds and local apps/sirloin make targets generate the KB before compiling, so docs edits no longer require committing generated JSON files. Direct go test/go build invocations outside those paths may need node scripts/docs-build-kb-export.mjs first.
- Schema is now load-bearing. Bumping schema_version requires updating docs/src/content/docs/standards/llm-consumption.md and every consumer (MCP server, skill, downstream agents).
- The MCP server adds a small TypeScript artifact under tools/. It is opt-in (teammates wire it via .mcp.json if they want).
Risks
- Foxy360 availability now gates hosted Claude access to the KB. Local agents can still generate and query the JSONL export directly.
- Owner tag drift: if the team adopts a new tag, the validator’s allowlist must be updated atomically across the script and the docs.
- The export is a snapshot; agents that cache it must respect manifest.json#corpus_checksum to detect staleness.
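One way a caching agent might honour the checksum is sketched below. It assumes manifest.json exposes a top-level corpus_checksum field, as the manifest.json#corpus_checksum reference suggests. The createKbCache name and the injected loadManifest and loadChunks callbacks are hypothetical; they stand in for however a consumer actually reads the export.

```javascript
// Hypothetical cache wrapper: reload chunks only when the manifest's corpus_checksum changes.
// `loadManifest` returns the parsed manifest.json; `loadChunks` returns the parsed index.jsonl rows.
function createKbCache(loadManifest, loadChunks) {
  let cachedChecksum = null;
  let cachedChunks = null;
  return {
    getChunks() {
      const { corpus_checksum: checksum } = loadManifest();
      if (checksum !== cachedChecksum) {
        // Corpus changed (or first call): refresh the snapshot.
        cachedChunks = loadChunks();
        cachedChecksum = checksum;
      }
      return cachedChunks;
    },
  };
}
```

Reading the manifest on every call is cheap relative to re-parsing the full index, which is the trade-off this sketch leans on.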
Operational follow-ups
- TODO(@law): rotate Railway + Axiom MCP tokens (carry-over from the Phase 3 sweep findings — unrelated to this ADR but flagged here for the next ops sweep).
- TODO(@law): re-audit Starlight slugs after Phase 2 deep dives stabilise.
Alternatives Considered
- Vendor-coupled approach (e.g. ship a Mintlify/ReadMe LLM endpoint). Rejected: locks the team into a specific docs vendor, doesn’t survive a stack change, and the export must exist anyway as the source of truth.
- Skip the KB export and rely on static LLM text files. Rejected: static text files are either stale uploads or unauthenticated static site artifacts. Without the JSONL primitive, the MCP tools have nothing to serve and the skill has nothing to cite.
- Build a server-side semantic index (HNSW, sqlite-vss) and ship a REST endpoint. Rejected for now: adds infra, hosting, and a moving target. The JSONL + a small in-memory scorer in the MCP server is enough for 1101 chunks. Revisit if corpus grows past ~10k chunks or if cross-team consumption demands a live endpoint.
- Reuse proto-sync style for docs without a manifest. Rejected: proto-sync stores the checksum inline in the generated doc; the KB export has too many output files for that to work cleanly. The manifest centralises the checksum.
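For a sense of scale, a "small in-memory scorer" over the JSONL chunks can be as simple as the term-overlap sketch below. The ADR does not describe the MCP server's actual scoring, so the tokenizer, the weight of 3 for title and section hits, and the names tokenize and searchChunks are all assumptions. Only the chunk fields (chunk_id, title, section, body) come from the export schema listed in the Decision.

```javascript
// Lowercase word tokens; identifiers with underscores stay in one piece.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Score = occurrences of query terms in the body, plus a higher weight
// (an arbitrary 3) for each hit in the title or section heading.
function searchChunks(chunks, query, limit = 5) {
  const terms = new Set(tokenize(query));
  const scored = [];
  for (const chunk of chunks) {
    let score = 0;
    for (const token of tokenize(chunk.body)) {
      if (terms.has(token)) score += 1;
    }
    for (const token of tokenize(`${chunk.title} ${chunk.section ?? ""}`)) {
      if (terms.has(token)) score += 3;
    }
    if (score > 0) scored.push({ chunk_id: chunk.chunk_id, score });
  }
  // Highest score first; chunks with no matching term are omitted entirely.
  return scored.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

A linear scan like this touches every chunk per query, which is fine at roughly a thousand chunks and is the kind of cost that would motivate revisiting a real index past the ~10k threshold mentioned above.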
Filename: 2026-05-05-llm-native-docs.md