How should personal knowledge systems support AI workflows?

A useful personal knowledge system gives AI tools durable context, source-backed summaries, reusable patterns, and clear boundaries between private evidence and public synthesis.

Knowledge, Learning & Publishing · Hub · 8 min read · 15 sources

Compiled Knowledge Systems

A compiled knowledge system turns raw capture into durable understanding by maintaining a smaller, linked synthesis layer that improves through ingest, curation, query, and file-back.

What to use this for

What is source-backed research in an AI workflow?

A compiled knowledge system turns raw capture into durable understanding by maintaining a smaller, linked synthesis layer that improves through ingest, curation, query, and file-back.

3 key takeaways

Personal Knowledge Systems
AI-Assisted Content Systems
Agent Memory & Context Systems

Best for

Readers trying to answer: What is source-backed research in an AI workflow?

Why this matters

The vault should not behave like a RAG bucket, a clipping dump, or a graph visualization demo. Its value comes from compounding editorial synthesis.

The core move is simple: instead of making the model rediscover the same ideas from raw sources every time, the system writes durable pages that preserve the best conceptual work. Later queries start from the maintained wiki, then fall back to raw evidence when needed.

The newest Codex workflow sources add an operational version of the same pattern: messy data and scattered feedback should be turned into reviewable artifacts rather than overwritten or summarized away. A good compiled knowledge system preserves raw evidence, creates a cleaned or synthesized copy, records confidence and source links, and leaves the next action visible.

A Codex skills source adds the maintenance loop this vault now uses: when a compile workflow works, turn the thread, commands, scripts, and quality bar into a skill so that future ingests stay consistent. A world-knowledge paper adds an adjacent research pattern: agents can explore a site or environment and compile a guidebook before later tasks need it.

A newer Karpathy interview source reinforces the original reason this hub exists. A model can recompile documents into a wiki that becomes easier for both humans and later agents to read. The point is not to replace understanding with storage; it is to maintain a persistent synthesis layer so each later question starts from accumulated structure instead of raw rediscovery.

Core thesis

Knowledge systems become useful when they separate evidence from synthesis.

Layer	Purpose	Failure if weak
Raw evidence	Preserve provenance and detail	The wiki drifts or becomes unverifiable
Compiled wiki	Preserve durable synthesis	Every query starts from scratch
Hubs	Give readers a map	The vault becomes a flat page list
Concept pages	Explain ideas worth revisiting	Pages become source summaries
Supporting pages	Keep narrow material findable	Details overwhelm the main path
Operating schema	Teach agents how to maintain the system	Curation becomes inconsistent

The wiki should improve human understanding, not merely increase recall. Source ingestion is valuable when it changes the durable map: a hub gets sharper, a concept gets better boundaries, a contradiction is recorded, or a useful answer becomes a file-backed artifact.

Implementation model

1. Ingest should update the map, not just add pages

New sources should usually improve existing pages. A new first-class page is justified only when the source reveals a durable concept, workflow, or case that cannot be handled inside the current map.

The `raw/` folder is intentionally both an inbox and an evidence layer. Sources should remain there after a successful compile; the manifest, not file movement, records whether a source is new, changed, already represented, compiled, quarantined, or deferred.

That creates a simple daily rhythm:

flowchart TD
  A["Drop source into raw"] --> B["Audit manifest status"]
  B --> C{"New or changed?"}
  C -->|No| D["Leave source as evidence"]
  C -->|Yes| E["Compile into wiki synthesis"]
  E --> F["Update source notes and manifest"]
  F --> G["Lint links, raw refs, assets, and orphans"]
  G --> D

This preserves Karpathy's simple source/wiki split while still supporting incremental scheduled ingest.
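
The audit step in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the vault's actual tooling: the manifest layout (a dict keyed by relative path with `sha256` and `status` fields) and the function names `audit_sources` and `mark_compiled` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_sources(raw_dir: Path, manifest: dict) -> dict:
    """Classify each raw source as 'new', 'changed', or 'unchanged'.

    `manifest` maps a relative path to {"sha256": ..., "status": ...};
    this layout is an assumption, not the vault's real schema.
    """
    report = {}
    for path in sorted(raw_dir.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(raw_dir).as_posix()
        entry = manifest.get(key)
        if entry is None:
            report[key] = "new"
        elif entry.get("sha256") != file_digest(path):
            report[key] = "changed"
        else:
            report[key] = "unchanged"
    return report


def mark_compiled(raw_dir: Path, manifest: dict, key: str) -> None:
    """Record a successful compile; the source file itself is never moved."""
    manifest[key] = {"sha256": file_digest(raw_dir / key), "status": "compiled"}
```

Because state lives in the manifest rather than in folder position, an unchanged source is skipped on the next run while still sitting in `raw/` as evidence.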

2. Hubs should curate

A hub is not a folder index. It should:

state the domain thesis
show a visual map
name the best reading path
explain why adjacent pages matter
route narrow details through supporting-page sections in the index

3. Concepts should earn their place

A concept page should answer:

What does this help the reader understand that raw sources alone do not make obvious?

If a page mostly preserves a single source, it belongs in reference or case material, not among the concept pages.

4. Retrieval is not enough

Retrieval is valuable for recall, precision, and evidence. But the compiled layer adds something retrieval alone does not:

reconciled claims
cross-source synthesis
durable terminology
reading paths
editorial judgment
remembered conclusions

5. The system should learn from use

Strong knowledge systems have learning loops:

ingest new sources
answer questions from the wiki first
file back strong answers
lint weak links and stale claims
merge or demote pages when structure gets flat
promote repeatable maintenance workflows into skills
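
The lint step in this loop can be sketched as a check for broken links and orphan pages. The sketch assumes pages are held in memory as a name-to-text mapping and that links use Obsidian-style `[[Page]]` wikilinks; the function name `lint_links` and the `roots` parameter are hypothetical.

```python
import re

# Matches [[Page]], [[Page#Heading]], and [[Page|alias]]; captures the page name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def lint_links(pages: dict[str, str], roots: set[str]) -> dict[str, list]:
    """Report broken links and orphan pages.

    `pages` maps page name to body text. `roots` names entry points (for
    example an index or hub) that are allowed to have no inbound links.
    """
    broken = []
    inbound = {name: 0 for name in pages}
    for name, text in pages.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target not in pages:
                broken.append((name, target))
            elif target != name:
                inbound[target] += 1
    orphans = sorted(
        n for n, count in inbound.items() if count == 0 and n not in roots
    )
    return {"broken": sorted(broken), "orphans": orphans}
```

An orphan is not automatically an error. It is a prompt for an editorial decision: link the page from a hub, merge it, or demote it.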

6. Feedback synthesis is a compile operation

The feedback-synthesis source is a practical example of compilation outside a wiki:

gather feedback from Slack, issues, surveys, notes, or exports
cluster repeated themes
retain source links or IDs
mark confidence and ambiguity
separate product follow-up from engineering follow-up
draft the next artifact only after review

That is the same pattern this vault should use for raw sources. The output should not be a prettier inbox. It should be a reviewable synthesis layer with provenance.
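
As a rough sketch of that pattern, the code below groups already-labeled feedback items by theme while keeping every source ID. The `FeedbackItem` fields, the two-channel confidence threshold, and the `needs_review` status are illustrative assumptions, not details taken from the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackItem:
    source_id: str   # e.g. an issue number or export row ID
    channel: str     # e.g. "slack", "issues", "survey"
    theme: str       # label assigned during review or by a classifier
    owner: str       # "product" or "engineering"


def synthesize(items: list[FeedbackItem]) -> list[dict]:
    """Group items by theme and keep every source ID as provenance."""
    groups = defaultdict(list)
    for item in items:
        groups[item.theme].append(item)

    clusters = []
    for theme, members in groups.items():
        channels = {m.channel for m in members}
        owners = {m.owner for m in members}
        clusters.append({
            "theme": theme,
            "source_ids": sorted(m.source_id for m in members),
            # Illustrative threshold: corroboration across channels raises confidence.
            "confidence": "high" if len(channels) >= 2 else "low",
            # Conflicting owner labels are surfaced as ambiguity, not resolved.
            "owner": owners.pop() if len(owners) == 1 else "ambiguous",
            "status": "needs_review",
        })
    # Largest clusters first; ties broken alphabetically by theme.
    return sorted(clusters, key=lambda c: (-len(c["source_ids"]), c["theme"]))
```

Every cluster starts as `needs_review`, which mirrors the last step in the list: the next artifact is drafted only after a person has looked at the synthesis.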

7. Cleaning data is preservation plus transformation

The messy-data source adds a useful rule for all compiled systems:

Keep the original unchanged; write the improved copy separately.

For this vault, that means:

raw/ stays immutable evidence
wiki/ carries enriched synthesis
reports explain what changed
uncertain or low-confidence material stays visible instead of silently disappearing

The spreadsheet-cleaning example makes that rule more concrete. A good derivative artifact should preserve source row IDs when possible, normalize only the requested fields, and emit a short data-quality note that distinguishes:

rows changed confidently
rows removed as duplicates or summary noise
rows that could not be normalized cleanly

That distinction matters because transformation without an uncertainty trail turns maintenance into silent corruption.
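
A minimal sketch of the rule, assuming rows are dicts with a stable `id` key and that a duplicate means "same normalized value in the requested field". The function name `clean_rows` and the note keys are hypothetical.

```python
def clean_rows(rows: list[dict], field: str) -> tuple[list[dict], dict]:
    """Normalize one field into a new list; the input rows are not mutated.

    Returns (cleaned_rows, quality_note). Each row needs a stable "id".
    """
    cleaned, seen = [], set()
    note = {"changed": [], "removed": [], "unresolved": []}
    for row in rows:
        raw_value = row.get(field)
        if not isinstance(raw_value, str) or not raw_value.strip():
            # Cannot normalize: keep a copy of the row as-is and flag it.
            cleaned.append(dict(row))
            note["unresolved"].append(row["id"])
            continue
        # Collapse whitespace and lowercase; only the requested field is touched.
        value = " ".join(raw_value.split()).lower()
        if value in seen:
            note["removed"].append(row["id"])
            continue
        seen.add(value)
        if value != raw_value:
            note["changed"].append(row["id"])
        cleaned.append({**row, field: value})
    return cleaned, note
```

The quality note carries row IDs rather than counts, so a reviewer can trace each change, removal, or unresolved row back to the untouched original.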

8. Successful workflows should compile into skills

The reusable-skills source adds a meta-rule for this vault:

If an agent workflow becomes useful and repeatable, compile the workflow itself.

That means preserving:

the trigger
the source material the agent should read
scripts and commands that should be reused
examples of good outputs
validation checks
failure corrections discovered in later runs

This is why /kb-compile belongs as a skill-backed command rather than as a loose prompt. It turns maintenance from heroic one-off editing into an improving operating procedure.
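
One way to keep those parts from getting lost is to give a skill an explicit shape and check it for gaps. The `SkillSpec` class below is a hypothetical structure for illustration; it is not the format of any real skill file or of `/kb-compile` itself.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SkillSpec:
    name: str
    trigger: str = ""
    sources_to_read: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)
    good_examples: list[str] = field(default_factory=list)
    validation_checks: list[str] = field(default_factory=list)
    failure_corrections: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Name the parts that are still empty, except failure corrections,
        which can only accumulate after later runs."""
        optional = {"failure_corrections"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name)
        ]
```

A spec with missing parts is still a loose prompt in disguise; an empty `missing_parts()` result is a reasonable bar before treating the workflow as a reusable skill.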

9. Environment guidebooks are compiled knowledge artifacts

The world-knowledge paper describes a pattern close to a domain-specific mini-wiki: an agent explores a website, groups pages, summarizes useful facts, and produces a guidebook that later tasks can use.

The useful rule for this vault is:

raw browsing or crawling is evidence
the guidebook is compiled knowledge
later task execution should start from the guidebook, then fall back to fresh inspection when the guidebook is insufficient

That same pattern applies to codebases, websites, Obsidian vaults, vendor docs, and local workflows.
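
The guidebook-first rule, including the fallback, can be sketched as follows. The version tag used to detect a changed environment, the entry layout, and the `inspect` callback are all assumptions made for the example.

```python
from typing import Callable, Optional


def answer_from_guidebook(
    question_key: str,
    guidebook: dict[str, dict],
    current_version: str,
    inspect: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (answer, origin): origin is 'guidebook', 'inspection', or 'none'.

    A guidebook entry is trusted only if it was compiled against the
    current environment version; otherwise fall back to fresh inspection
    and file the result back into the guidebook.
    """
    entry = guidebook.get(question_key)
    if entry is not None and entry["version"] == current_version:
        return entry["answer"], "guidebook"
    fresh = inspect(question_key)
    if fresh is None:
        return None, "none"
    guidebook[question_key] = {"answer": fresh, "version": current_version}
    return fresh, "inspection"
```

Returning the origin alongside the answer keeps the evidence boundary visible: the caller can tell compiled knowledge from a fresh observation.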

Visual and editorial standard

Use visuals when they explain structure:

Mermaid maps for systems and workflows
tables for distinctions and decision rules
screenshots for real interfaces or evidence artifacts
source images for infrastructure, mission systems, or product surfaces
generated conceptual images for teaching aids and memory hooks, with clear "not evidence" provenance

Avoid decorative imagery. A beautiful knowledge base is one where structure, priority, and evidence are immediately legible.

Visual reading cues

Read the image as the mental model for this whole page:

Visual cue	What it means in this page
Messy source pile	Capture can stay broad because `raw/` preserves evidence.
Narrow synthesis gate	Compile is an editorial decision, not a bulk summarization step.
Linked map	The wiki becomes useful when hubs, concepts, and supporting pages explain relationships.
Dashboard and loop	Views, lint, and file-back keep the system alive after the first compile.

Failure modes

One page per source.
A flat index that lists everything.
Link density without a reading path.
Graph aesthetics without editorial judgment.
Ingest that only appends and never consolidates.
Treating source notes as enough when the page lacks insight.
Forgetting that raw/ is evidence and wiki/ is synthesis.
Overwriting or hiding the messy source instead of preserving it beside the cleaned synthesis.
Leaving successful maintenance workflows trapped in chat instead of compiling them into skills.
Treating an environment guidebook as ground truth after the environment changes.
Cleaning operational data in place with no derivative artifact or uncertainty note.

Frequently asked

What is source-backed research in an AI workflow? A compiled knowledge system turns raw capture into durable understanding by maintaining a smaller, linked synthesis layer that improves through ingest, curation, query, and file-back.
How should personal knowledge systems support AI workflows? A useful personal knowledge system gives AI tools durable context, source-backed summaries, reusable patterns, and clear boundaries between private evidence and public synthesis.

Evidence

Source Notes

S01 `raw/Karpathy's 400,000-Word Obsidian Wiki Has Zero RAG Infrastructure.md` - anchor source for the raw/wiki/schema architecture, ingest-query-lint-file-back loop, and `index.md`/`log.md` operating files.
S02 `raw/Creating a Second Brain with Claude Code.md` - local archive, semantic plus keyword retrieval, profile files, distilled context, hooks, tool connectors, and multi-timescale learning.
S03 `raw/PASK Toward Intent-Aware Proactive Agents with Long-Term Memory.md` - proactive memory layers and calibrated intervention.
S04 `raw/Part 2 Your Second Brain System (Done For You).md` - domain-specific vaults and operational second-brain framing.
S05 `raw/Skill Graphs > SKILL.md` - graph-native model of small composable files linked through prose and frontmatter.
S06 `raw/tobiqmd mini cli search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current sota approaches while being all local.md` - local retrieval and contextual search hierarchy.
S07 Historical source note: Thread by @krishdotdev (raw file currently missing from vault) - skepticism about graph-as-display and emphasis on usefulness over visual novelty.
S08 `raw/I Post Every Day. No Team. No Agency. Just Obsidian + Claude. Here Is the Exact System..md` - solo publishing system built from markdown capture, connection-finding, brief generation, voice-conditioned drafting, and performance feedback.
S09 `raw/Clean and prepare messy data Codex use cases.md` - added the preservation-plus-transformation rule: keep the original unchanged, produce a cleaned derivative, preserve row identifiers when possible, and attach a short data-quality note for changed, removed, and uncertain rows.
S10 `raw/Turn feedback into actions Codex use cases.md` - added feedback synthesis as a compile pattern: cluster cross-source signals, preserve evidence links, mark confidence, and create a reviewable artifact.
S11 `raw/Save workflows as skills Codex use cases.md` - added the rule that successful recurring Codex workflows should be compiled into reusable skills with commands, examples, scripts, and validation.
S12 `raw/Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration.md` - added environment guidebooks as compiled knowledge artifacts produced through exploration and used during later task execution.
S13 `raw/README.md` - represented the raw-folder operating convention: `raw/` stays as capture inbox and evidence layer, while `ingest-manifest.json` carries incremental compile state and prevents unchanged sources from being reprocessed.
S14 `raw/Andrej Karpathy From Vibe Coding to Agentic Engineering.md` - reinforced the LLM-maintained wiki pattern as persistent compilation rather than RAG rediscovery, and the idea that the wiki should improve human understanding.
S15 `assets/knowledge/compiled-knowledge-loop-2026-05-04.png` - generated conceptual illustration added on 2026-05-04 to make the raw-to-wiki-to-file-back loop easier to learn visually; illustrative only, not evidence.

Related Pages

AI-Assisted Content Systems

Agent Execution Systems

Agent Memory & Context Systems

Agent Skills

Agentic Engineering

Chief of Staff Agents

LLM Memory

Personal Knowledge Systems

Second Brain Systems
