AI, Agents & Software · Reference · 1 source
Context Compaction
Context compaction is the problem of preserving enough intermediate state for a model to keep reasoning effectively without carrying the full cost of its entire visible history at every step.
Best for
Readers exploring AI, agents, and software who want to understand what context compaction is and why it matters.
Source backing
One source note supports this synthesis.
Why this matters
Long reasoning traces, agent trajectories, and tool-heavy workflows all run into the same bottleneck: useful state accumulates faster than it can be kept in hot context economically. Most current systems handle that pressure externally through summarization, retrieval, restart loops, or orchestration logic. The stronger claim in this source is that context management can also become a learned model skill.
That matters because compaction sits between short-lived context and long-term memory. It is not full historical memory, but it is also not just prompt trimming. It carries forward the conclusions a trajectory has established without paying to keep every intermediate step in context.
Core thesis
The source argues that context compaction should be treated as a first-class capability:
- models can learn to segment their own reasoning into coherent blocks
- they can compress completed blocks into dense "mementos"
- they can continue reasoning from those compressed records instead of from the full prior trace
- some of the important state survives not only in the visible summary tokens, but also in the model's internal KV representations
This makes compaction both a systems problem and a training problem.
Framework / model
1. External compaction and learned compaction are different
Many current systems compact context from outside the model:
- summarizing earlier turns
- restarting with condensed context
- retrieving only selected history
- wrapping the model in orchestration logic
The source's stronger move is to teach the model to do some of this itself during generation rather than relying entirely on external scaffolding.
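To make the external approach concrete, here is a minimal summarize-and-restart sketch in Python. `Turn`, `compact_externally`, and `first_sentences` are hypothetical names, token counting is a crude whitespace split, and `first_sentences` stands in for what would normally be an LLM summarization call. None of this is taken from the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    role: str
    text: str


def approx_tokens(text: str) -> int:
    # Crude whitespace count; a real harness would use the model's tokenizer.
    return len(text.split())


def compact_externally(
    history: list[Turn],
    budget: int,
    keep_recent: int,
    summarize: Callable[[list[Turn]], str],
) -> list[Turn]:
    """Summarize-and-restart from outside the model.

    If the history exceeds the token budget, every turn except the most recent
    `keep_recent` ones is replaced by one summary turn, and the model restarts
    from that text alone.
    """
    total = sum(approx_tokens(t.text) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    cut = len(history) - keep_recent
    older, recent = history[:cut], history[cut:]
    summary = Turn("system", "Summary of earlier work: " + summarize(older))
    return [summary] + recent


def first_sentences(turns: list[Turn]) -> str:
    # Hypothetical stand-in for an LLM summarizer: keep each turn's first sentence.
    return " ".join(t.text.split(".")[0] + "." for t in turns)


history = [
    Turn("user", "Find the bug in the parser. It crashes on empty input."),
    Turn("assistant", "The crash comes from indexing tokens[0]. Adding a guard fixes it."),
    Turn("user", "Now add a test."),
]
compacted = compact_externally(history, budget=10, keep_recent=1, summarize=first_sentences)
for turn in compacted:
    print(turn.role, "|", turn.text)
```

In this run the summary keeps the first sentence of each earlier turn and silently drops "It crashes on empty input." Whatever the summarizer omits is gone, because the model restarts from the summary text alone.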
2. The Memento loop
The learned loop in the source has four parts:
- segment the reasoning trace into coherent blocks
- produce a dense memento for the completed block
- evict the completed block's tokens from active attention and the KV cache
- continue from the retained mementos plus the current live block
This creates a sawtooth context profile: context grows during active reasoning, then drops when the completed block is compacted and evicted.
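The sawtooth shape can be reproduced with a few lines of arithmetic. The sketch below assumes a toy model in which every generated token adds one entry to the live context, each memento is written while its block is still visible, and eviction removes the block's tokens but keeps the memento. The function name and the block and memento lengths are made-up inputs, not numbers from the source.

```python
def memento_context_profile(block_lengths: list[int], memento_lengths: list[int]) -> list[int]:
    """Live context size (in tokens) after each step of a Memento-style loop.

    Toy assumptions: a block grows one token per step, its memento is then
    generated while the block is still visible, and finally the block's
    tokens are evicted so only the memento carries forward.
    """
    if len(block_lengths) != len(memento_lengths):
        raise ValueError("need one memento length per block")
    profile = []
    retained = 0  # tokens from earlier mementos still in context
    for block_len, memento_len in zip(block_lengths, memento_lengths):
        for i in range(1, block_len + 1):      # reasoning inside the live block
            profile.append(retained + i)
        for j in range(1, memento_len + 1):    # writing the memento for this block
            profile.append(retained + block_len + j)
        retained += memento_len                # evict the block, keep its memento
        profile.append(retained)
    return profile


profile = memento_context_profile([4, 4], [1, 1])
print(profile)
print("peak with compaction:", max(profile))
print("peak without eviction:", sum([4, 4]) + sum([1, 1]))
```

With two 4-token blocks and 1-token mementos, the profile is `[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 2]`: context peaks at 6 tokens instead of the 10 it would reach without eviction, and only the two mementos survive at the end.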
3. Compaction preserves different things at different layers
The source makes an especially useful distinction:
- visible memento tokens preserve explicit compressed conclusions
- retained KV states preserve additional implicit information from the original block
This means context is not carried forward only through text. Some relevant state survives through internal representations computed while the block was still visible.
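A toy way to see the second, implicit stream is to track, for every retained cache entry, which tokens were visible when it was computed. Beyond the first layer, a transformer's keys and values at a position depend on the tokens that position could attend to, so memento entries computed while the block was still visible carry some of the block's influence even after the block is evicted; re-encoding the memento text from scratch, as a restart would, cannot recover that. The sketch tracks only visibility sets, not actual vectors, and assumes every token label is unique. `KVEntry`, `encode`, and `implicit_channel` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVEntry:
    token: str
    # Tokens that were attendable when this entry was computed. In a real
    # transformer, keys and values above the first layer depend on these.
    computed_with: frozenset[str]


def encode(tokens: list[str]) -> list[KVEntry]:
    """Toy causal pass: each entry is computed with itself and every earlier token."""
    entries, seen = [], []
    for tok in tokens:
        seen.append(tok)
        entries.append(KVEntry(tok, frozenset(seen)))
    return entries


def implicit_channel(block: list[str], memento: list[str]) -> set[str]:
    """Block tokens that still shape retained state only under in-place eviction."""
    # In-place: the memento is generated while the block is visible, then the
    # block's entries are evicted and the memento's entries are kept as-is.
    kept_in_place = encode(block + memento)[len(block):]
    # Restart: the memento text is re-encoded in a fresh context.
    kept_restart = encode(memento)
    influence_in_place = set().union(*(e.computed_with for e in kept_in_place))
    influence_restart = set().union(*(e.computed_with for e in kept_restart))
    return influence_in_place - influence_restart


print(sorted(implicit_channel(["b0", "b1", "b2"], ["m0", "m1"])))
```

Both strategies keep the same two visible memento tokens, but only the in-place entries were computed with `b0`, `b1`, and `b2` in view.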
4. Compaction quality depends on structure, not just brevity
A useful memento is not a generic summary. It has to preserve:
- decisive intermediate values
- formulas
- methods
- conclusions that future steps depend on
The key design idea is state compression, not prose shortening.
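One way to keep a memento focused on state rather than prose is to give it explicit slots for the items listed above and check it against what the next block actually reads. The `Memento` class, its field names, and `missing_state` are hypothetical; the source does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Memento:
    """Hypothetical structured memento with one slot per kind of state."""
    values: dict[str, float] = field(default_factory=dict)  # decisive intermediate values
    formulas: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Compact key=value text rather than flowing prose.
        parts = [f"{name}={value}" for name, value in self.values.items()]
        parts += [f"formula: {f}" for f in self.formulas]
        parts += [f"method: {m}" for m in self.methods]
        parts += [f"concluded: {c}" for c in self.conclusions]
        return "; ".join(parts)


def missing_state(memento: Memento, needed_values: set[str]) -> set[str]:
    """Names of values a later block depends on that the memento did not keep."""
    return needed_values - set(memento.values)


m = Memento(
    values={"x": 3.0},
    formulas=["area = x**2"],
    methods=["solved the perimeter equation for x"],
    conclusions=["x is the side length"],
)
print(m.render())
print(missing_state(m, {"x", "y"}))
```

A prose summary such as "We found the side length and set up the area formula" reads well but would fail this kind of check, because the next block needs the value of `x`, not the fact that it was found.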
5. Compaction is adjacent to memory, but not identical to memory
Context compaction helps with long trajectories inside or near a session. It does not automatically solve:
- durable long-term recall across sessions
- provenance across many revisions
- forgetting policy
- user-level memory governance
It is best understood as one layer in a larger memory stack.
Important examples / reference points
- The Memento work shows that models can learn segmentation and compaction behavior with ordinary training on the right traces instead of needing fully bespoke architecture.
- The strongest empirical result goes beyond KV reduction: restart-based compaction can lose a meaningful implicit information channel.
- The vLLM implementation matters because it shows compaction is not just a paper abstraction; serving infrastructure and RL pipelines both depend on how eviction is implemented.
- The agent implication is especially important: long CLI or tool-use trajectories naturally produce compaction opportunities at each action-observation block.
Failure modes / limitations
Treating compaction as equivalent to long-term memory
Compaction helps preserve local trajectory state, but it does not replace durable memory systems with provenance and retrieval policy.
Compressing for readability instead of future reasoning
A compact summary can sound good while omitting the exact information the next block needs.
Restarting away the implicit state channel
If the system rebuilds context only from summary text, it may lose signal that survived in internal representations during the original forward pass.
Assuming infrastructure details are secondary
How masking, eviction, and serving are implemented changes whether the theoretical benefits survive contact with production workloads.
Practical implications
For memory-system design
- treat context compaction as a distinct layer from long-term memory
- decide explicitly what should stay raw, what should be compacted, and what should be retrieved later
- preserve provenance where possible so compaction does not become silent drift
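The raw/compacted/retrieved decision can be made explicit in a harness rather than left implicit. The following sketch is a hypothetical policy: the `Tier` names, the `Item` fields, and the thresholds are assumptions chosen for illustration, and provenance is kept by recording which raw items each compacted record came from.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RAW = "keep verbatim in live context"
    COMPACT = "fold into a memento"
    RETRIEVE = "move to a long-term store for later retrieval"


@dataclass
class Item:
    item_id: str
    age_in_blocks: int            # 0 means produced in the current block
    is_constraint: bool           # e.g. a user requirement the agent must keep obeying
    needed_across_sessions: bool  # should outlive this session


@dataclass
class CompactedRecord:
    text: str
    source_ids: list[str]         # provenance: which raw items this record summarizes


def assign_tier(item: Item, live_blocks: int = 1) -> Tier:
    """Illustrative policy only; real thresholds would need evaluation."""
    if item.needed_across_sessions:
        return Tier.RETRIEVE
    if item.is_constraint or item.age_in_blocks < live_blocks:
        return Tier.RAW
    return Tier.COMPACT


def compact(items: list[Item], summary: str) -> CompactedRecord:
    # Keep the source ids so the record can be traced back and audited for drift.
    return CompactedRecord(summary, [item.item_id for item in items])


items = [
    Item("obs-7", age_in_blocks=0, is_constraint=False, needed_across_sessions=False),
    Item("obs-2", age_in_blocks=3, is_constraint=False, needed_across_sessions=False),
    Item("req-1", age_in_blocks=5, is_constraint=True, needed_across_sessions=False),
    Item("pref-1", age_in_blocks=9, is_constraint=False, needed_across_sessions=True),
]
tiers = {item.item_id: assign_tier(item) for item in items}
to_compact = [item for item in items if tiers[item.item_id] is Tier.COMPACT]
record = compact(to_compact, "tests pass after the parser guard")
print({k: v.name for k, v in tiers.items()}, record.source_ids)
```

Keeping constraints raw is one defensible choice here: a constraint paraphrased into a memento is exactly the kind of silent drift provenance is meant to catch.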
For agent builders
- long-running agents need compaction mechanisms, whether external, learned, or both
- action-observation loops are natural compaction boundaries
- internal compaction may reduce cost pressure, but it does not eliminate the need for harness design
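Because each observation closes an action-observation block, a harness can use observations as segmentation points. This sketch splits a trajectory at observations and replaces each completed block with a one-line memento, keeping any unfinished steps live. `Step`, `split_at_observations`, `compact_trajectory`, and `action_and_result` are hypothetical names, and the one-line summarizer is far cruder than a learned memento would be.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str  # "thought", "action", "observation", or "memento"
    text: str


def split_at_observations(trajectory: list[Step]) -> list[list[Step]]:
    """Split a trajectory into blocks, each closed by an observation.

    Steps after the last observation form an unfinished, still-live block.
    """
    blocks: list[list[Step]] = []
    current: list[Step] = []
    for step in trajectory:
        current.append(step)
        if step.kind == "observation":
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def compact_trajectory(trajectory: list[Step], summarize: Callable[[list[Step]], str]) -> list[Step]:
    """Replace every completed block with a memento; keep the live block verbatim."""
    blocks = split_at_observations(trajectory)
    completed = [b for b in blocks if b[-1].kind == "observation"]
    live = [step for b in blocks if b[-1].kind != "observation" for step in b]
    return [Step("memento", summarize(b)) for b in completed] + live


def action_and_result(block: list[Step]) -> str:
    # Hypothetical one-line memento: the last action and what it returned.
    actions = [s.text for s in block if s.kind == "action"]
    return f"{actions[-1] if actions else '(no action)'} -> {block[-1].text}"


trajectory = [
    Step("thought", "list the files"),
    Step("action", "ls"),
    Step("observation", "a.py b.py"),
    Step("action", "cat a.py"),
    Step("observation", "print('hi')"),
    Step("thought", "a.py only prints"),
]
for step in compact_trajectory(trajectory, action_and_result):
    print(step.kind, "|", step.text)
```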
For model and infrastructure work
- training data quality matters because compaction is a state-preservation task, not generic summarization
- evaluation should measure both cost savings and downstream reasoning quality
- serving infrastructure must respect the difference between logical masking and actual state retention
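The difference between masking and retention is easy to lose in an implementation. In this toy cache, masking hides positions from attention while their storage stays allocated, whereas eviction frees them. `ToyKVCache` is hypothetical and does not model how vLLM or any other serving engine actually allocates KV memory.

```python
class ToyKVCache:
    """Toy cache contrasting logical masking with physical eviction."""

    def __init__(self) -> None:
        self.entries: dict[int, str] = {}  # position -> stored K/V payload (a string here)
        self.masked: set[int] = set()      # hidden from attention but still stored

    def append(self, pos: int, payload: str) -> None:
        self.entries[pos] = payload

    def mask(self, positions) -> None:
        # Logical masking: attention skips these positions, memory stays allocated.
        self.masked.update(positions)

    def evict(self, positions) -> None:
        # Physical eviction: the entries are actually removed.
        for p in positions:
            self.entries.pop(p, None)
            self.masked.discard(p)

    def attendable(self) -> list[int]:
        return sorted(p for p in self.entries if p not in self.masked)

    def stored(self) -> int:
        return len(self.entries)


# Positions 0-3 are a completed block; 4-5 are its memento.
masked_cache, evicted_cache = ToyKVCache(), ToyKVCache()
for p in range(6):
    masked_cache.append(p, f"kv{p}")
    evicted_cache.append(p, f"kv{p}")
masked_cache.mask(range(4))
evicted_cache.evict(range(4))
print(masked_cache.attendable(), masked_cache.stored())
print(evicted_cache.attendable(), evicted_cache.stored())
```

Both caches let the model attend to exactly the memento positions `[4, 5]`, but the masked one still stores all 6 entries; only eviction delivers the memory savings that compaction is meant to provide.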
Tensions / open questions
- How much context compaction should live inside the model versus inside the harness?
- When does learned compaction outperform external summarize-and-restart schemes?
- How far can the implicit KV-information channel be relied on or evaluated?
- What compaction strategy works best for long-running terminal agents rather than benchmark reasoning traces?
- How should compaction interact with long-term memory, retrieval, and forgetting policy?
Answers
Frequently asked
- What makes agent memory useful?
- Agent memory is useful when it preserves decisions, constraints, source trails, and reusable context in a form that improves future work without inventing authority or hiding uncertainty.
Evidence
Source Notes
- S01 `raw/Memento Teaching LLMs to Manage Their Own Context.md`: anchor source on learned context compaction, block-wise "mementos," the dual information stream through summary tokens plus KV state, and the infrastructure implications for agent-like long trajectories.