Knowledge, Learning & Publishing · Hub · 8 min read · 15 sources
Compiled Knowledge Systems
A compiled knowledge system turns raw capture into durable understanding by maintaining a smaller, linked synthesis layer that improves through ingest, curation, query, and file-back.
Source backing
15 source notes support this synthesis.
Visual navigation
Use the cluster tools to review this hub as a navigable system, not only as prose:
- Compiled Knowledge Cluster Dashboard
- Compiled Knowledge Cluster
- Local visuals
Figure: Illustrative compiled knowledge loop
*Generated conceptual illustration, not source evidence: raw captures become a curated synthesis map, then loop back through dashboards, validation, and file-back.*
```mermaid
flowchart TD
    A["Raw evidence"] --> B["Ingest"]
    B --> C["Compiled wiki"]
    C --> D["Hub and concept pages"]
    D --> E["Query from synthesis first"]
    E --> F["Useful new synthesis"]
    F --> G["File back into wiki"]
    G --> C
    C --> H["Lint and curation"]
    H --> I["Merge, demote, archive, relink"]
    I --> C
```
Why this matters
The vault should not behave like a RAG bucket, a clipping dump, or a graph visualization demo. Its value comes from compounding editorial synthesis.
The core move is simple: instead of making the model rediscover the same ideas from raw sources every time, the system writes durable pages that preserve the best conceptual work. Later queries start from the maintained wiki, then fall back to raw evidence when needed.
The Codex workflow sources on cleaning messy data and acting on feedback add an operational version of the same pattern: messy data and scattered feedback should be turned into reviewable artifacts rather than overwritten or summarized away. A good compiled knowledge system preserves raw evidence, creates a cleaned or synthesized copy, records confidence and source links, and leaves the next action visible.
The Codex skills source adds the maintenance loop this vault now uses: when a compile workflow works, turn the thread, commands, scripts, and quality bar into a skill so future ingest is consistent. The world-knowledge paper adds an adjacent research pattern: agents can explore a site or environment and compile a guidebook before later tasks need it.
The Karpathy interview source reinforces the original reason this hub exists. A model can recompile documents into a wiki that becomes easier for both humans and later agents to read. The point is not to replace understanding with storage; it is to maintain a persistent synthesis layer so each later question starts from accumulated structure instead of raw rediscovery.
This page organizes:
- LLM Memory
- Second Brain Systems
- Personal Knowledge Systems
- AI-Assisted Content Systems
- Chief of Staff Agents
- Agent Memory & Context Systems
Core thesis
Knowledge systems become useful when they separate evidence from synthesis.
| Layer | Purpose | Failure if weak |
|---|---|---|
| Raw evidence | Preserve provenance and detail | The wiki drifts or becomes unverifiable |
| Compiled wiki | Preserve durable synthesis | Every query starts from scratch |
| Hubs | Give readers a map | The vault becomes a flat page list |
| Concept pages | Explain ideas worth revisiting | Pages become source summaries |
| Supporting pages | Keep narrow material findable | Details overwhelm the main path |
| Operating schema | Teach agents how to maintain the system | Curation becomes inconsistent |
The wiki should improve human understanding, not merely increase recall. Source ingestion is valuable when it changes the durable map: a hub gets sharper, a concept gets better boundaries, a contradiction is recorded, or a useful answer becomes a file-backed artifact.
Implementation model
1. Ingest should update the map, not just add pages
New sources should usually improve existing pages. A new first-class page is justified only when the source reveals a durable concept, workflow, or case that cannot be handled inside the current map.
The raw/ folder is intentionally an inbox plus evidence layer. Sources should remain there after successful compile; the manifest, not file movement, determines whether a source is new, changed, already represented, compiled, quarantined, or deferred.
That creates a simple daily rhythm:
```mermaid
flowchart TD
    A["Drop source into raw"] --> B["Audit manifest status"]
    B --> C{"New or changed?"}
    C -->|No| D["Leave source as evidence"]
    C -->|Yes| E["Compile into wiki synthesis"]
    E --> F["Update source notes and manifest"]
    F --> G["Lint links, raw refs, assets, and orphans"]
    G --> D
```
This preserves Karpathy's simple source/wiki split while still supporting incremental scheduled ingest.
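The manifest audit at the start of that rhythm can be sketched roughly as follows. This is a minimal sketch, not the vault's actual tooling: only the `raw/` folder and the `ingest-manifest.json` file name come from the source notes. The manifest layout (a JSON object mapping each source path to a content hash and a status) and the names `file_digest` and `audit_raw` are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_raw(raw_dir: Path, manifest_path: Path) -> dict[str, str]:
    """Classify every file under raw_dir as 'new', 'changed', or 'unchanged'.

    Assumed (hypothetical) manifest shape:
        {"raw/note.md": {"sha256": "...", "status": "compiled"}}
    Files are never moved or edited; only the manifest records state.
    """
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    report = {}
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        # Keys look like "raw/note.md", relative to the vault root.
        key = path.relative_to(raw_dir.parent).as_posix()
        entry = manifest.get(key)
        if entry is None:
            report[key] = "new"
        elif entry.get("sha256") != file_digest(path):
            report[key] = "changed"
        else:
            report[key] = "unchanged"
    return report
```

Only sources reported as `new` or `changed` would go on to the compile step; everything else stays in place as evidence. Comparing content hashes rather than file locations is what lets sources remain in `raw/` after a successful compile.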
2. Hubs should curate
A hub is not a folder index. It should:
- state the domain thesis
- show a visual map
- name the best reading path
- explain why adjacent pages matter
- route narrow details through supporting-page sections in the index
3. Concepts should earn their place
A concept page should answer:
What does this help the reader understand that raw sources alone do not make obvious?
If a page mostly preserves a single source, it belongs as reference or case material.
4. Retrieval is not enough
Retrieval is valuable for recall, precision, and evidence. But the compiled layer adds something retrieval alone does not:
- reconciled claims
- cross-source synthesis
- durable terminology
- reading paths
- editorial judgment
- remembered conclusions
5. The system should learn from use
Strong knowledge systems have learning loops:
- ingest new sources
- answer questions from the wiki first
- file back strong answers
- lint weak links and stale claims
- merge or demote pages when structure gets flat
- promote repeatable maintenance workflows into skills
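The lint step in that loop can be illustrated with a small link checker. This is a sketch under stated assumptions: it assumes Obsidian-style `[[wikilinks]]` and an in-memory mapping of page titles to markdown bodies, and the name `lint_links` is hypothetical rather than part of any existing tool.

```python
import re

# Matches [[Target]], [[Target|alias]], and [[Target#heading]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_links(pages: dict[str, str]) -> dict[str, list]:
    """Find broken links and orphan pages in a {title: markdown_body} mapping."""
    broken = []        # (source page, missing target) pairs
    linked_to = set()  # existing pages that some other page links to
    for title, body in pages.items():
        for match in WIKILINK.finditer(body):
            target = match.group(1).strip()
            if target not in pages:
                broken.append((title, target))
            elif target != title:
                linked_to.add(target)
    orphans = sorted(set(pages) - linked_to)
    return {"broken": broken, "orphans": orphans}
```

A top-level hub or index may legitimately show up as an orphan because nothing needs to link to it, so the report is a prompt for editorial review, not an automatic delete list.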
6. Feedback synthesis is a compile operation
The feedback-synthesis source is a practical example of compilation outside a wiki:
- gather feedback from Slack, issues, surveys, notes, or exports
- cluster repeated themes
- retain source links or IDs
- mark confidence and ambiguity
- separate product follow-up from engineering follow-up
- draft the next artifact only after review
That is the same pattern this vault should use for raw sources. The output should not be a prettier inbox. It should be a reviewable synthesis layer with provenance.
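The steps above can be sketched as a grouping pass that never drops provenance. This is an illustrative sketch only: the `FeedbackItem` fields, the example ID formats, and the confidence thresholds are assumptions, and theme assignment is assumed to happen upstream (by a reviewer or a model).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackItem:
    source_id: str  # e.g. an issue number or message link (hypothetical IDs)
    channel: str    # "slack", "issues", "survey", ...
    theme: str      # assigned upstream by a reviewer or a model
    text: str


def synthesize(items: list[FeedbackItem]) -> list[dict]:
    """Group items by theme, keeping provenance and a rough confidence label."""
    by_theme = defaultdict(list)
    for item in items:
        by_theme[item.theme].append(item)

    clusters = []
    for theme, group in by_theme.items():
        channels = {item.channel for item in group}
        # Assumed heuristic: a theme seen in several channels is better supported.
        if len(channels) >= 2 and len(group) >= 3:
            confidence = "high"
        elif len(group) >= 2:
            confidence = "medium"
        else:
            confidence = "low"
        clusters.append({
            "theme": theme,
            "count": len(group),
            "confidence": confidence,
            "source_ids": [item.source_id for item in group],
        })
    # Most frequently reported themes first; ties keep first-seen order.
    return sorted(clusters, key=lambda c: -c["count"])
```

Low-confidence clusters stay in the output on purpose: the reviewer sees them with their source IDs and decides whether they deserve follow-up.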
7. Cleaning data is preservation plus transformation
The messy-data source adds a useful rule for all compiled systems:
Keep the original unchanged; write the improved copy separately.
For this vault, that means:
- `raw/` stays immutable evidence
- `wiki/` carries enriched synthesis
- reports explain what changed
- uncertain or low-confidence material stays visible instead of silently disappearing
The spreadsheet-cleaning example makes that rule more concrete. A good derivative artifact should preserve source row IDs when possible, normalize only the requested fields, and emit a short data-quality note that distinguishes:
- rows changed confidently
- rows removed as duplicates or summary noise
- rows that could not be normalized cleanly
That distinction matters because transformation without an uncertainty trail turns maintenance into silent corruption.
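A minimal sketch of that rule, assuming rows arrive as dictionaries with a stable `row_id` column (a hypothetical name) and that the requested normalization is trimming and lower-casing a single text field. The function name `clean_rows` and the note keys are illustrative, not taken from the source.

```python
import copy


def clean_rows(rows: list[dict], field: str) -> tuple[list[dict], dict]:
    """Return a cleaned copy of rows plus a data-quality note.

    The input list is never mutated. Each row is assumed to carry a stable
    "row_id" (hypothetical column name) so changes can be traced back.
    """
    cleaned = []
    note = {"changed": [], "removed_duplicates": [], "unresolved": []}
    seen = set()
    for row in rows:
        new_row = copy.deepcopy(row)
        value = row.get(field)
        if not isinstance(value, str) or not value.strip():
            # Cannot normalize: keep the row as-is and flag it for review.
            note["unresolved"].append(row["row_id"])
            cleaned.append(new_row)
            continue
        normalized = value.strip().lower()
        if normalized in seen:
            note["removed_duplicates"].append(row["row_id"])
            continue
        seen.add(normalized)
        if normalized != value:
            new_row[field] = normalized
            note["changed"].append(row["row_id"])
        cleaned.append(new_row)
    return cleaned, note
```

The three note keys map directly onto the three categories listed above, and every entry is a row ID, so a reviewer can trace each decision back to the untouched original.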
8. Successful workflows should compile into skills
The reusable-skills source adds a meta-rule for this vault:
If an agent workflow becomes useful and repeatable, compile the workflow itself.
That means preserving:
- the trigger
- the source material the agent should read
- scripts and commands that should be reused
- examples of good outputs
- validation checks
- failure corrections discovered in later runs
This is why /kb-compile belongs as a skill-backed command rather than as a loose prompt. It turns maintenance from heroic one-off editing into an improving operating procedure.
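One way to make that checklist concrete is a small record type that reports which parts of a skill are still missing. This is a hypothetical sketch: `SkillSpec` and its field names are illustrative and do not describe the actual skill file format used by any tool.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    """Hypothetical record of a compiled workflow; field names are illustrative."""
    name: str
    trigger: str
    sources_to_read: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)
    good_examples: list[str] = field(default_factory=list)
    validation_checks: list[str] = field(default_factory=list)
    corrections: list[str] = field(default_factory=list)  # grows over later runs

    def missing_parts(self) -> list[str]:
        """Name the required list-valued parts that are still empty."""
        parts = ["sources_to_read", "commands", "good_examples", "validation_checks"]
        return [part for part in parts if not getattr(self, part)]
```

`corrections` is deliberately not required up front, since failure corrections only accumulate after the skill has been run a few times.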
9. Environment guidebooks are compiled knowledge artifacts
The world-knowledge paper describes a pattern close to a domain-specific mini-wiki: an agent explores a website, groups pages, summarizes useful facts, and produces a guidebook that later tasks can use.
The useful rule for this vault is:
- raw browsing or crawling is evidence
- the guidebook is compiled knowledge
- later task execution should start from the guidebook, then fall back to fresh inspection when the guidebook is insufficient
That same pattern applies to codebases, websites, Obsidian vaults, vendor docs, and local workflows.
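The guidebook-first rule can be sketched as a lookup that trusts a compiled entry only while the environment is unchanged. This is a simplified sketch under assumptions: questions are matched by exact string, the environment exposes some version or fingerprint string, and `GuidebookEntry`, `lookup`, and the `inspect` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuidebookEntry:
    answer: str
    env_version: str  # version or fingerprint of the environment when compiled


def lookup(
    question: str,
    guidebook: dict[str, GuidebookEntry],
    current_version: str,
    inspect: Callable[[str], str],
) -> tuple[str, str]:
    """Answer from the guidebook when it is current; otherwise inspect afresh.

    Returns (answer, origin) where origin is "guidebook" or "fresh".
    A fresh answer is filed back so the next lookup starts from it.
    """
    entry: Optional[GuidebookEntry] = guidebook.get(question)
    if entry is not None and entry.env_version == current_version:
        return entry.answer, "guidebook"
    answer = inspect(question)  # fall back to raw browsing or crawling
    guidebook[question] = GuidebookEntry(answer, current_version)
    return answer, "fresh"
```

Recording the environment version beside each answer is one way to avoid treating a stale guidebook as ground truth after the environment changes.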
Visual and editorial standard
Use visuals when they explain structure:
- Mermaid maps for systems and workflows
- tables for distinctions and decision rules
- screenshots for real interfaces or evidence artifacts
- source images for infrastructure, mission systems, or product surfaces
- generated conceptual images for teaching aids and memory hooks, with clear "not evidence" provenance
Avoid decorative imagery. A beautiful knowledge base is one where structure, priority, and evidence are immediately legible.
Visual reading cues
Read the image as the mental model for this whole page:
| Visual cue | What it means in this page |
|---|---|
| Messy source pile | Capture can stay broad because raw/ preserves evidence. |
| Narrow synthesis gate | Compile is an editorial decision, not a bulk summarization step. |
| Linked map | The wiki becomes useful when hubs, concepts, and supporting pages explain relationships. |
| Dashboard and loop | Views, lint, and file-back keep the system alive after the first compile. |
Failure modes
- One page per source.
- A flat index that lists everything.
- Link density without a reading path.
- Graph aesthetics without editorial judgment.
- Ingest that only appends and never consolidates.
- Treating source notes as enough when the page lacks insight.
- Forgetting that `raw/` is evidence and `wiki/` is synthesis.
- Overwriting or hiding the messy source instead of preserving it beside the cleaned synthesis.
- Leaving successful maintenance workflows trapped in chat instead of compiling them into skills.
- Treating an environment guidebook as ground truth after the environment changes.
- Cleaning operational data in place with no derivative artifact or uncertainty note.
Read next
- Agent Memory & Context Systems for memory, retrieval, compaction, and persistence.
- Personal Knowledge Systems for local vault and domain-specific system design.
- Second Brain Systems for operator support across long work histories.
- AI-Assisted Content Systems for publishing and connection-finding workflows.
- Agent Execution Systems for turning maintenance loops into repeatable agent workflows.
Answers
Frequently asked
- What is source-backed research in an AI workflow?
- A compiled knowledge system turns raw capture into durable understanding by maintaining a smaller, linked synthesis layer that improves through ingest, curation, query, and file-back.
- How should personal knowledge systems support AI workflows?
- A useful personal knowledge system gives AI tools durable context, source-backed summaries, reusable patterns, and clear boundaries between private evidence and public synthesis.
Evidence
Source Notes
- S01 `raw/Karpathy's 400,000-Word Obsidian Wiki Has Zero RAG Infrastructure.md` - anchor source for the raw/wiki/schema architecture, ingest-query-lint-file-back loop, and `index.md`/`log.md` operating files.
- S02 `raw/Creating a Second Brain with Claude Code.md` - local archive, semantic plus keyword retrieval, profile files, distilled context, hooks, tool connectors, and multi-timescale learning.
- S03 `raw/PASK Toward Intent-Aware Proactive Agents with Long-Term Memory.md` - proactive memory layers and calibrated intervention.
- S04 `raw/Part 2 Your Second Brain System (Done For You).md` - domain-specific vaults and operational second-brain framing.
- S05 `raw/Skill Graphs > SKILL.md` - graph-native model of small composable files linked through prose and frontmatter.
- S06 `raw/tobiqmd mini cli search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current sota approaches while being all local.md` - local retrieval and contextual search hierarchy.
- S07 Historical source note: Thread by @krishdotdev (raw file currently missing from vault) - skepticism about graph-as-display and emphasis on usefulness over visual novelty.
- S08 `raw/I Post Every Day. No Team. No Agency. Just Obsidian + Claude. Here Is the Exact System..md` - solo publishing system built from markdown capture, connection-finding, brief generation, voice-conditioned drafting, and performance feedback.
- S09 `raw/Clean and prepare messy data Codex use cases.md` - added the preservation-plus-transformation rule: keep the original unchanged, produce a cleaned derivative, preserve row identifiers when possible, and attach a short data-quality note for changed, removed, and uncertain rows.
- S10 `raw/Turn feedback into actions Codex use cases.md` - added feedback synthesis as a compile pattern: cluster cross-source signals, preserve evidence links, mark confidence, and create a reviewable artifact.
- S11 `raw/Save workflows as skills Codex use cases.md` - added the rule that successful recurring Codex workflows should be compiled into reusable skills with commands, examples, scripts, and validation.
- S12 `raw/Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration.md` - added environment guidebooks as compiled knowledge artifacts produced through exploration and used during later task execution.
- S13 `raw/README.md` - represented the raw-folder operating convention: `raw/` stays as capture inbox and evidence layer, while `ingest-manifest.json` carries incremental compile state and prevents unchanged sources from being reprocessed.
- S14 `raw/Andrej Karpathy From Vibe Coding to Agentic Engineering.md` - reinforced the LLM-maintained wiki pattern as persistent compilation rather than RAG rediscovery, and the idea that the wiki should improve human understanding.
- S15 `assets/knowledge/compiled-knowledge-loop-2026-05-04.png` - generated conceptual illustration added on 2026-05-04 to make the raw-to-wiki-to-file-back loop easier to learn visually; illustrative only, not evidence.