Commit 12a873e

Merge pull request #42 from yigitkonur/codex/parser-fidelity-hardening
[codex] parser fidelity hardening
2 parents 4ec6c57 + 2e58cf5 commit 12a873e

142 files changed

Lines changed: 11879 additions & 281 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -88,10 +88,15 @@ continues resume abc123 --in gemini

# Or pass flags through to the destination tool:
continues resume abc123 --in codex --yolo --search --add-dir /tmp

# Or print the exact handoff prompt without launching the target tool:
continues resume abc123 --in codex --debug-prompt
```

`continues` maps common flags (model, sandbox, auto-approve, extra dirs) to the target tool's equivalent. Anything it doesn't recognize gets passed through as-is.

`--debug-prompt` is for handoff inspection and testing. It writes the handoff file as usual, then prints the exact prompt that would be passed to the target agent and exits without launching it.

### Scripting & CI

```bash
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Parser Documentation Master Overview

Access date: 2026-04-15

This directory is now a complete parser-research package for `continues`. The five context folders cover storage format, message schema, tool-call fidelity, direct-access recipes, and handoff-pointer design across all 16 canonical tools defined in `src/types/tool-names.ts`.

## What The Research Now Establishes

The current product problem is not that `continues` fails to read sessions at all. The deeper issue is that the parser layer mixes three concerns that need to be separated:

1. schema truth
2. readability-oriented summarization
3. handoff navigation

The current codebase is strongest at summarization. It is weaker at surfacing raw-storage truth and weaker still at showing the receiving agent how to go deeper.
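
One way to picture the three-way separation is as three distinct data shapes that travel together instead of being folded into one summary. The sketch below is illustrative only: `RawRecord`, `SummaryEntry`, `NavigationPointer`, `ParsedSession`, and `buildSession` are hypothetical names, not types that exist in `continues`.

```typescript
// Hypothetical shapes for illustration; none of these names exist in `continues`.

/** Schema truth: a record exactly as the upstream tool stored it. */
interface RawRecord {
  sourcePath: string;
  recordId: string;
  payload: unknown;
}

/** Readability-oriented summarization, always traceable back to raw records. */
interface SummaryEntry {
  text: string;
  derivedFrom: string[]; // recordId values backing this entry
}

/** Handoff navigation: where the receiving agent can look for more. */
interface NavigationPointer {
  sourcePath: string;
  recordIds: string[];
}

interface ParsedSession {
  raw: RawRecord[];
  summary: SummaryEntry[];
  pointers: NavigationPointer[];
}

function buildSession(
  raw: RawRecord[],
  summarize: (record: RawRecord) => string | null,
): ParsedSession {
  const summary: SummaryEntry[] = [];
  const idsByPath: { [path: string]: string[] } = {};

  for (const record of raw) {
    // Navigation keeps every record id, summarized or not.
    if (!idsByPath[record.sourcePath]) {
      idsByPath[record.sourcePath] = [];
    }
    idsByPath[record.sourcePath].push(record.recordId);

    // Summarization may drop records; raw and pointers never do.
    const text = summarize(record);
    if (text !== null) {
      summary.push({ text, derivedFrom: [record.recordId] });
    }
  }

  const pointers = Object.keys(idsByPath).map((sourcePath) => ({
    sourcePath,
    recordIds: idsByPath[sourcePath],
  }));
  return { raw, summary, pointers };
}
```

The point of the shape is that a summarizer can be as lossy as it likes without affecting what the raw and navigation layers carry forward.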

Across the five contexts, the strongest cross-cutting finding is consistent:

- the current four-preset top-level model in `src/config/verbosity.ts` is the wrong user-facing abstraction
- the handoff needs a top-of-document technical pointer block
- exact upstream tool/function names should survive the extraction pipeline
- assistant-message reconstruction needs to be treated as a parser problem, not a markdown-rendering problem
- several tool adapters are materially stale versus current upstream storage reality

## Highest-Risk Drift Versus Current Code

These are the most urgent mismatches repeatedly confirmed across contexts:

| Tool | Main drift | Why it matters |
| --- | --- | --- |
| Gemini | Current upstream code writes JSONL append logs, while `src/parsers/gemini.ts` still targets older JSON session objects. | Discovery, message reconstruction, tool-call recovery, and pointer design are all at risk. |
| Kiro | Public docs now describe SQLite-backed session storage under `~/.kiro/`, while `src/parsers/kiro.ts` targets legacy JSON under app-support paths. | The parser may be aimed at the wrong product surface or a stale backend. |
| Antigravity | Current parser targets `code_tracker` records, while current evidence points toward other conversation stores and/or tracker formats. | Session discovery and transcript fidelity are highly provisional. |
| Qwen Code | Current official repo/runtime evidence points to different runtime roots than the parser assumes. | `continues` may miss sessions entirely on current installs. |
| Crush | Current parser assumes a global DB path, but current evidence points to project-local `.crush/crush.db`. | Discovery and session selection can be wrong. |
| Amp | Current parser assumptions about thread storage are weakly documented and likely incomplete. | Pointer design and reliability are low-confidence. |
| Kilo Code | Evidence conflicts between legacy VS Code `ui_messages.json` task storage and newer DB-backed/OpenCode-style backends. | The current parser may be tied to a legacy branch only. |
| Cursor | Transcript fidelity is intentionally partial in some first-party statements. | Any handoff should warn about transcript completeness limits. |
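
Drift of the Gemini kind (a single JSON document versus a JSONL append log) can often be detected from file content before a parser commits to one schema. The following is a minimal sketch under that assumption; `StorageShape` and `detectStorageShape` are hypothetical names, and the heuristic has not been validated against real session files from any of the tools above.

```typescript
// Hypothetical helper; not part of `continues`.
type StorageShape = "json-document" | "jsonl-append-log" | "unknown";

function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

function detectStorageShape(fileText: string): StorageShape {
  const trimmed = fileText.trim();
  if (trimmed === "") {
    return "unknown";
  }
  // A file that parses as a whole is treated as one JSON document.
  // Caveat: a JSONL log with exactly one line is indistinguishable here.
  if (parses(trimmed)) {
    return "json-document";
  }
  // Otherwise require at least two non-empty lines that each parse on their own.
  const lines = trimmed.split("\n").filter((line) => line.trim() !== "");
  if (lines.length > 1 && lines.every(parses)) {
    return "jsonl-append-log";
  }
  return "unknown";
}
```

A parser could run a check like this during discovery and surface an explicit "unexpected storage shape" warning rather than silently returning an empty session.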

## Strongest Product-Level Conclusions

### 1. Simplify the user-facing contract, but do not collapse every concern into one toggle

The user-facing problem is not “how many samples should I see.” It is “can I continue immediately, and can I go deeper when I need to.” The local research strongly supported a two-mode design, but the adversarial internet verification found that major CLIs more often separate:

- a human-readable session/transcript surface
- a machine-readable or inspection-oriented output surface
- explicit session/checkpoint/raw inspection commands

So the safer conclusion is:

- keep moving away from `minimal | standard | verbose | full` as the main user-facing contract
- but do not assume `default | full` alone is the final answer
- keep room for a separate machine/inspection axis such as structured output or explicit inspect modes
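
The "two axes instead of one preset" idea can be expressed as a pair of independent options. This is a sketch of one possible shape, not a decided design: `Depth`, `OutputFormat`, `HandoffOptions`, and `fromLegacyPreset` are hypothetical, and the mapping from the four legacy presets is an assumption made only for illustration.

```typescript
// Hypothetical option model; the real contract is still an open question.
type Depth = "default" | "full";
type OutputFormat = "markdown" | "json";

interface HandoffOptions {
  depth: Depth; // how much of the session the human-facing view carries
  format: OutputFormat; // separate machine/inspection axis
}

type LegacyPreset = "minimal" | "standard" | "verbose" | "full";

// Assumed mapping: the two richer presets become "full", the rest "default".
function fromLegacyPreset(preset: LegacyPreset): HandoffOptions {
  const depth: Depth =
    preset === "verbose" || preset === "full" ? "full" : "default";
  // Legacy presets never selected a machine-readable format.
  return { depth, format: "markdown" };
}
```

Keeping `format` orthogonal to `depth` means a structured or inspect-style output can be added later without reintroducing a fifth preset.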

### 2. Add raw-storage orientation early, but keep it concise and navigational

The local research favored a top-of-document technical pointer block. The external verification supports that only if the block is treated as navigational metadata rather than the universal primary payload.

What should appear early is:

- where the real raw source lives
- what backend it uses
- what session handle is canonical
- how to inspect deeper data immediately
- how trustworthy the parser view is

The pointer block should be short in the default human-facing view and richer in deeper/debug/inspection views. Transcript and summary content should remain first-class, not demoted into a secondary concern for every user.
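
The five items listed above map naturally onto a small record plus a renderer that varies by view depth. The sketch below assumes that shape; `PointerBlock`, `ParserTrust`, and `renderPointerBlock` are hypothetical names, and the sample values (including the SQLite backend and the `sqlite3` invocation) are illustrative rather than taken from a verified session.

```typescript
// Hypothetical pointer-block model; field names are illustrative.
type ParserTrust = "verified" | "provisional" | "stale";

interface PointerBlock {
  rawSource: string; // where the real raw source lives
  backend: string; // what backend it uses
  sessionHandle: string; // canonical session handle
  inspectCommand: string; // how to inspect deeper data immediately
  trust: ParserTrust; // how trustworthy the parser view is
  notes: string[]; // extra detail, shown only in deeper views
}

function renderPointerBlock(
  block: PointerBlock,
  depth: "default" | "full",
): string {
  const lines = [
    `Raw source: ${block.rawSource}`,
    `Backend: ${block.backend}`,
    `Session handle: ${block.sessionHandle}`,
    `Inspect: ${block.inspectCommand}`,
    `Parser trust: ${block.trust}`,
  ];
  if (depth === "full") {
    for (const note of block.notes) {
      lines.push(`Note: ${note}`);
    }
  }
  return lines.join("\n");
}

// Illustrative values only; the backend and command are assumptions.
const demoPointer: PointerBlock = {
  rawSource: ".crush/crush.db",
  backend: "sqlite (assumed)",
  sessionHandle: "abc123",
  inspectCommand: 'sqlite3 .crush/crush.db ".tables"',
  trust: "provisional",
  notes: ["storage root differs from the path the current parser assumes"],
};
```

Rendering with `"default"` yields the five short lines; `"full"` appends the notes, which keeps the default view navigational while leaving room for richer debug output.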

### 3. Preserve raw fidelity alongside summarized categories

`SummaryCollector` and the grouped markdown sections are useful, but they should not be the only truth carried forward. The next redesign should preserve:

- exact upstream tool/function names
- argument carrier location
- result carrier location
- raw-fidelity warning when outputs are omitted or transcript completeness is partial

The external verification also found a useful refinement:

- exact upstream names should survive in raw/debug/full views
- normalized labels can still remain in the default human summary where they improve readability
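
A tool-call record that keeps both the exact upstream name and a normalized label could look like the sketch below. `ToolCallRecord`, `displayName`, and `fidelityWarning` are hypothetical names introduced for illustration, and the carrier strings are free-form placeholders, not a format used by `SummaryCollector`.

```typescript
// Hypothetical record shape; not the current extraction output.
interface ToolCallRecord {
  upstreamName: string; // exact tool/function name as stored upstream
  normalizedLabel: string; // readable label for the default summary
  argumentCarrier: string; // where the arguments live in raw storage
  resultCarrier: string | null; // where the result lives, if one was found
  outputOmitted: boolean; // true when the handoff dropped the output text
}

// Exact names in full/debug views, normalized labels in the default view.
function displayName(call: ToolCallRecord, depth: "default" | "full"): string {
  return depth === "full" ? call.upstreamName : call.normalizedLabel;
}

// Returns a warning whenever the handoff carries less than raw storage does.
function fidelityWarning(call: ToolCallRecord): string | null {
  if (call.resultCarrier === null) {
    return `no result record found for ${call.upstreamName}`;
  }
  if (call.outputOmitted) {
    return `output of ${call.upstreamName} omitted; see ${call.resultCarrier}`;
  }
  return null;
}

// Placeholder values; the tool name and carriers are made up.
const demoCall: ToolCallRecord = {
  upstreamName: "example_shell_tool",
  normalizedLabel: "Shell command",
  argumentCarrier: "assistant record, tool-call arguments field",
  resultCarrier: "following user record, tool-result field",
  outputOmitted: true,
};
```

Because the warning is computed from the record rather than from rendered markdown, it can be attached to any view without re-parsing.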

### 4. Treat assistant-message reconstruction as tool-specific

Recent-message trimming is unsafe when it assumes one uniform turn model. The research repeatedly showed that assistant output may be split across:

- assistant text records
- assistant tool-call records
- user tool-result carrier records
- append-log mutations or rewinds
- structured part tables in SQLite

The redesign needs tool-specific reconstruction before truncation.
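
"Reconstruction before truncation" can be sketched as grouping records into turns with a per-tool boundary rule and only then keeping the last few turns. Everything here is hypothetical (`SessionRecord`, `groupIntoTurns`, `trimToRecentTurns`), and the four record kinds are a simplification that ignores rewinds and SQLite part tables.

```typescript
// Hypothetical, simplified record model for illustration.
type RecordKind =
  | "user-text"
  | "assistant-text"
  | "assistant-tool-call"
  | "tool-result";

interface SessionRecord {
  kind: RecordKind;
  text: string;
}

// `startsTurn` is the tool-specific part: each adapter decides where turns begin.
function groupIntoTurns(
  records: SessionRecord[],
  startsTurn: (record: SessionRecord) => boolean,
): SessionRecord[][] {
  const turns: SessionRecord[][] = [];
  for (const record of records) {
    if (turns.length === 0 || startsTurn(record)) {
      turns.push([]);
    }
    turns[turns.length - 1].push(record);
  }
  return turns;
}

// Trim by whole turns so a tool call is never separated from its result.
function trimToRecentTurns(
  records: SessionRecord[],
  startsTurn: (record: SessionRecord) => boolean,
  keepTurns: number,
): SessionRecord[] {
  if (keepTurns <= 0) {
    return [];
  }
  const kept: SessionRecord[] = [];
  for (const turn of groupIntoTurns(records, startsTurn).slice(-keepTurns)) {
    kept.push(...turn);
  }
  return kept;
}
```

For contrast, a plain `records.slice(-n)` counts records rather than turns, so it can begin in the middle of a tool exchange and hand the receiving agent a result with no matching call.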

## What Is Now Complete

- `storage-format/`: verified or challenged current storage-root assumptions
- `message-schema/`: documented assistant-message and chronology models
- `tool-call-map/`: mapped exact tool-call encoding and fidelity losses
- `access-recipes/`: documented how to inspect deeper raw data directly
- `handoff-pointers/`: translated evidence into default/full pointer-block recommendations
- `prompts/2026-04-15-cto-session-schema-audit.md`: research mission prompt

## What Still Needs To Happen

- implement the redesign in code
- validate unresolved tools from live local captures
- update parser help text and registry metadata to stop presenting legacy paths as canonical

Use `01-parser-redesign-backlog.md` for implementation order and `99-open-questions.md` for unresolved validation targets.
