Commit 4699fd6

Merge pull request #40 from nlile/fix/codex-large-jsonl-readline-crash
fix(codex): handle huge session files
2 parents 490944b + 0f1b853 commit 4699fd6

6 files changed

Lines changed: 265 additions & 78 deletions

.github/instructions/parsers.instructions.md

Lines changed: 2 additions & 2 deletions
@@ -37,8 +37,8 @@ Both must be registered in `src/parsers/registry.ts` with all `ToolAdapter` fiel

 ## JSONL Streaming

-- Stream JSONL with `readline.createInterface` — never `fs.readFileSync` for session files
-- Use helpers from `src/utils/jsonl.ts` (`readJsonlFile`, `scanJsonlHead`) when applicable
+- Stream JSONL with helpers from `src/utils/jsonl.ts` (`readJsonlFile`, `scanJsonlHead`) when applicable
+- Never use `fs.readFileSync` for session files
 - Keep only the last ~10 messages in `recentMessages` — do not accumulate the entire conversation

 ## Tool Summarizer
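
The `recentMessages` guideline in this hunk (keep roughly the last 10 messages rather than the whole conversation) can be sketched as a small bounded buffer. This is a minimal illustration, not code from the repository: `createRecentBuffer` and its methods are hypothetical names, and the cap of 10 is taken from the guideline text.

```typescript
// Hypothetical helper: retains only the newest `limit` items pushed into it.
function createRecentBuffer<T>(limit: number) {
  const items: T[] = [];
  return {
    push(item: T): void {
      items.push(item);
      // Drop the oldest entry once the cap is exceeded, so memory stays bounded
      // no matter how many messages the session file contains.
      if (items.length > limit) {
        items.shift();
      }
    },
    toArray(): T[] {
      return [...items];
    },
  };
}

// Simulate streaming 25 messages while keeping only the last 10.
const recent = createRecentBuffer<string>(10);
for (let i = 0; i < 25; i++) {
  recent.push(`message ${i}`);
}
const recentMessages = recent.toArray();
```

After the loop, `recentMessages` holds `message 15` through `message 24`; the earlier entries were discarded as newer ones arrived.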

AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ Agent behavior instructions for `continues` — the cross-tool AI session handof
 - **Logging**: use `logger` from `src/logger.ts` for all diagnostic output. Never use bare `console.log`/`console.warn`/`console.error` in library code. TUI display goes through `@clack/prompts` or `chalk` via the display layer.
 - **Error types**: throw typed errors from `src/errors.ts` on user-facing paths, not bare `new Error()`.
 - **Tool activity**: use `SummaryCollector` from `src/utils/tool-summarizer.ts` in every parser. Do not build `ToolUsageSummary[]` arrays manually.
-- **JSONL**: stream with `readline.createInterface`, never `fs.readFileSync` + `split('\n')`.
+- **JSONL**: use the shared streaming helpers in `src/utils/jsonl.ts`, never `fs.readFileSync` + `split('\n')`.
 - **SQLite** (OpenCode, Crush parsers): use built-in `node:sqlite` — do not add third-party SQLite dependencies.
 - **Biome rules in force**: `noEmptyBlockStatements` (error), `noUnusedImports` (error), `useConst` (error). Empty `catch {}` blocks fail the linter; use `catch (err) { logger.debug(...) }` instead.

@@ -45,7 +45,7 @@ All five steps are required. Missing any one is a bug. See `CLAUDE.md` for detai

 - **Writing to tool storage directories** — the tool is read-only. Any write to `~/.claude/`, `~/.codex/`, etc. is a severe bug.
 - **`exec()` with string interpolation** — always use `spawn()` with an argument array in `resume.ts`. Session IDs and paths can contain shell metacharacters.
-- **`fs.readFileSync`/`fs.writeFileSync` in parsers** — these block the event loop. Use async fs APIs or `readline`.
+- **`fs.readFileSync`/`fs.writeFileSync` in parsers** — these block the event loop. Use async fs APIs or shared streaming helpers.
 - **Duplicating parser-helpers**`cleanSummary`, `extractRepoFromCwd`, `homeDir` live in `src/utils/parser-helpers.ts`. Import them; do not reimplement.
 - **Hardcoding tool names** — derive from `TOOL_NAMES` or `SessionSource`. Never write `if (tool === 'claude' || tool === 'codex' || ...)`.
 - **Importing `node:sqlite` outside OpenCode/Crush parsers** — SQLite is only needed for those two tools. Do not spread this dependency.

REVIEW.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Review guidelines for the `continues` CLI tool — a read-only session parser an
 - Use `process.exitCode = N` instead of `process.exit(N)`.
 - Biome handles linting and formatting — do not introduce ESLint or Prettier configs.
 - Parser functions must return `Promise<UnifiedSession[]>` and `Promise<SessionContext>` respectively. Both must be registered in `src/parsers/registry.ts`.
-- JSONL parsing must stream with `readline.createInterface` — do not load entire files into memory with `fs.readFileSync`.
+- JSONL parsing must use the shared streaming helpers in `src/utils/jsonl.ts` where possible — do not load entire files into memory with `fs.readFileSync`.
 - Use the `SummaryCollector` class from `src/utils/tool-summarizer.ts` for tool activity summaries. Do not manually build summary arrays.
 - Shared helpers (`cleanSummary`, `extractRepoFromCwd`, `homeDir`) live in `src/utils/parser-helpers.ts`. Do not duplicate these in individual parsers.
 - Error hierarchy: use typed errors from `src/errors.ts` (`ParseError`, `SessionNotFoundError`, `ToolNotAvailableError`, `UnknownSourceError`, `IndexError`, `StorageError`) rather than bare `throw new Error()` for user-facing error paths.

src/__tests__/shared-utils.test.ts

Lines changed: 76 additions & 7 deletions
@@ -3,9 +3,9 @@
  * Covers: jsonl, fs-helpers, content, tool-extraction, parser-helpers additions.
  */

-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
+import * as fs from 'node:fs';
+import * as os from 'node:os';
+import * as path from 'node:path';
 import { afterEach, describe, expect, it } from 'vitest';
 import type { ConversationMessage } from '../types/index.js';
 import { classifyToolName } from '../types/tool-names.js';
@@ -66,6 +66,15 @@ describe('readJsonlFile', () => {
     expect(result).toHaveLength(2);
   });

+  it('skips oversized lines without buffering them', async () => {
+    const dir = makeTmpDir();
+    const file = path.join(dir, 'test.jsonl');
+    fs.writeFileSync(file, '{"ok":true}\n{"payload":"abcdefghijklmnopqrstuvwxyz"}\n{"ok":false}\n');
+
+    const result = await readJsonlFile<{ ok: boolean }>(file, { maxLineChars: 12 });
+    expect(result).toEqual([{ ok: true }, { ok: false }]);
+  });
+
   it('returns empty array for non-existent file', async () => {
     const result = await readJsonlFile('/tmp/nonexistent-file.jsonl');
     expect(result).toEqual([]);
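
The new test above expects oversized lines to be skipped without being buffered. The changes to `src/utils/jsonl.ts` are not shown on this page, so the following is only a sketch of one way such a rule could work, under stated assumptions: `createBoundedLineSplitter` is a hypothetical name, the limit is treated as inclusive (a 12-character line passes with `maxLineChars: 12`, as the test data implies), blank lines are ignored, and `\r\n` line endings are not handled.

```typescript
// Hypothetical sketch: split streamed text chunks into lines and drop any line
// longer than maxLineChars. Once a line is known to be oversized, its remaining
// characters are discarded as they arrive instead of being accumulated.
function createBoundedLineSplitter(maxLineChars: number, onLine: (line: string) => void) {
  let pending = ''; // characters of the current line collected so far
  let skipping = false; // true while inside a line already known to be too long

  const consume = (segment: string): void => {
    if (skipping) return;
    if (pending.length + segment.length > maxLineChars) {
      skipping = true;
      pending = ''; // release what was collected; the rest is ignored
      return;
    }
    pending += segment;
  };

  const finishLine = (): void => {
    if (!skipping && pending.length > 0) onLine(pending);
    pending = '';
    skipping = false;
  };

  return {
    push(chunk: string): void {
      let start = 0;
      let newline = chunk.indexOf('\n', start);
      while (newline !== -1) {
        consume(chunk.slice(start, newline));
        finishLine();
        start = newline + 1;
        newline = chunk.indexOf('\n', start);
      }
      consume(chunk.slice(start)); // trailing partial line, completed by a later chunk
    },
    end(): void {
      finishLine(); // flush a final line that has no trailing newline
    },
  };
}

// Same data as the test, with chunk boundaries deliberately falling mid-line.
const parsedLines: unknown[] = [];
const splitter = createBoundedLineSplitter(12, (line) => {
  parsedLines.push(JSON.parse(line));
});
for (const chunk of ['{"ok":tr', 'ue}\n{"payload":"abcdefghij', 'klmnopqrstuvwxyz"}\n{"ok"', ':false}\n']) {
  splitter.push(chunk);
}
splitter.end();
```

In this sketch, memory use is bounded by `maxLineChars` plus the size of one incoming chunk, regardless of how long an individual line in the file is.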
@@ -108,6 +117,44 @@ describe('scanJsonlHead', () => {
     await scanJsonlHead('/tmp/nonexistent.jsonl', 10, () => 'continue');
     // No error thrown
   });
+
+  it('keeps scanning after skipping an oversized line', async () => {
+    const dir = makeTmpDir();
+    const file = path.join(dir, 'test.jsonl');
+    fs.writeFileSync(file, '{"i":0}\n{"payload":"abcdefghijklmnopqrstuvwxyz"}\n{"i":2}\n');
+
+    const visited: number[] = [];
+    await scanJsonlHead(
+      file,
+      5,
+      (parsed) => {
+        visited.push((parsed as { i: number }).i);
+        return 'continue';
+      },
+      { maxLineChars: 10 },
+    );
+
+    expect(visited).toEqual([0, 2]);
+  });
+
+  it('stops at the configured byte limit without visiting partial lines', async () => {
+    const dir = makeTmpDir();
+    const file = path.join(dir, 'test.jsonl');
+    fs.writeFileSync(file, '{"i":0}\n{"i":1}\n');
+
+    const visited: number[] = [];
+    await scanJsonlHead(
+      file,
+      5,
+      (parsed) => {
+        visited.push((parsed as { i: number }).i);
+        return 'continue';
+      },
+      { maxBytes: 12 },
+    );
+
+    expect(visited).toEqual([0]);
+  });
 });

 describe('getFileStats', () => {
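
The byte-limit test above relies on one rule: when a read stops partway through a file, whatever follows the last newline may be a partial line and must not be visited. The sketch below illustrates only that cutoff rule on a byte prefix that has already been read; how the prefix is read from disk is left out. `scanPrefix` and `ScanAction` are hypothetical names, and the `'continue'`/`'stop'` return values mirror the visitor convention used in this diff.

```typescript
type ScanAction = 'continue' | 'stop';

// Hypothetical sketch: visit each complete JSONL line contained in `prefix`.
// `reachedEof` says whether the prefix covers the whole file; if it does not,
// the text after the last newline is treated as a partial line and dropped.
function scanPrefix(prefix: Buffer, reachedEof: boolean, visit: (parsed: unknown) => ScanAction): void {
  let text = prefix.toString('utf8');
  if (!reachedEof) {
    const lastNewline = text.lastIndexOf('\n');
    text = lastNewline === -1 ? '' : text.slice(0, lastNewline);
  }

  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than aborting the scan
    }
    if (visit(parsed) === 'stop') return;
  }
}

// Same data and limit as the test: 12 bytes cover the first line and part of the second.
const whole = Buffer.from('{"i":0}\n{"i":1}\n', 'utf8');
const limited = whole.subarray(0, 12);
const visitedWithLimit: number[] = [];
scanPrefix(limited, limited.length === whole.length, (parsed) => {
  visitedWithLimit.push((parsed as { i: number }).i);
  return 'continue';
});

// Returning 'stop' ends the scan early even when more complete lines are available.
const visitedUntilStop: number[] = [];
scanPrefix(whole, true, (parsed) => {
  visitedUntilStop.push((parsed as { i: number }).i);
  return 'stop';
});
```

Cutting at the last newline also discards any multi-byte UTF-8 character that the byte limit happened to split, since such a character can only sit in the trailing partial line.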
@@ -120,6 +167,15 @@ describe('getFileStats', () => {
     expect(stats.lines).toBe(3);
     expect(stats.bytes).toBeGreaterThan(0);
   });
+
+  it('counts a final unterminated line', async () => {
+    const dir = makeTmpDir();
+    const file = path.join(dir, 'test.jsonl');
+    fs.writeFileSync(file, '{"a":1}\n{"a":2}');
+
+    const stats = await getFileStats(file);
+    expect(stats.lines).toBe(2);
+  });
 });

 // ── fs-helpers.ts ────────────────────────────────────────────────────────────
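
The `getFileStats` test above requires a final line without a trailing newline to be counted. One way to do that while streaming is to count newline bytes per chunk and remember the last byte seen. The sketch below assumes that approach; `createLineCounter` is a hypothetical name and this is not the repository's implementation.

```typescript
// Hypothetical sketch: count lines across streamed chunks without keeping the data.
function createLineCounter() {
  let newlines = 0;
  let bytes = 0;
  let lastByte = -1; // -1 means no bytes have been seen yet

  return {
    push(chunk: Uint8Array): void {
      for (let i = 0; i < chunk.length; i++) {
        if (chunk[i] === 0x0a) newlines++; // 0x0a is "\n"
      }
      bytes += chunk.length;
      lastByte = chunk[chunk.length - 1] ?? lastByte; // unchanged for empty chunks
    },
    finish(): { lines: number; bytes: number } {
      // A non-empty input that does not end in "\n" has one more, unterminated line.
      const unterminated = bytes > 0 && lastByte !== 0x0a ? 1 : 0;
      return { lines: newlines + unterminated, bytes };
    },
  };
}

// Same content as the test, split across two chunks.
const counter = createLineCounter();
counter.push(Buffer.from('{"a":1}\n{"a'));
counter.push(Buffer.from('":2}'));
const counted = counter.finish();
```

Here `counted.lines` is 2 and `counted.bytes` is 15: one newline was seen, and the last byte is `}` rather than `\n`, so the trailing line is added.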
@@ -604,7 +660,9 @@ describe('extractAnthropicToolData structured data', () => {
     const messages: AnthropicMessage[] = [
       {
         role: 'assistant',
-        content: [{ type: 'tool_use', id: 'tu1', name: 'Read', input: { file_path: '/src/app.ts', offset: 10, limit: 50 } }],
+        content: [
+          { type: 'tool_use', id: 'tu1', name: 'Read', input: { file_path: '/src/app.ts', offset: 10, limit: 50 } },
+        ],
       },
     ];

@@ -685,7 +743,9 @@ describe('extractAnthropicToolData structured data', () => {
       },
       {
         role: 'user',
-        content: [{ type: 'tool_result', tool_use_id: 'tu1', content: 'Found 3 matches\nsrc/a.ts\nsrc/b.ts\nsrc/c.ts' }],
+        content: [
+          { type: 'tool_result', tool_use_id: 'tu1', content: 'Found 3 matches\nsrc/a.ts\nsrc/b.ts\nsrc/c.ts' },
+        ],
       },
     ];

@@ -724,7 +784,14 @@ describe('extractAnthropicToolData structured data', () => {
     const messages: AnthropicMessage[] = [
       {
         role: 'assistant',
-        content: [{ type: 'tool_use', id: 'tu1', name: 'mcp__github__list_issues', input: { repo: 'test/repo', state: 'open' } }],
+        content: [
+          {
+            type: 'tool_use',
+            id: 'tu1',
+            name: 'mcp__github__list_issues',
+            input: { repo: 'test/repo', state: 'open' },
+          },
+        ],
       },
       {
         role: 'user',
@@ -750,7 +817,9 @@ describe('extractAnthropicToolData structured data', () => {
       },
       {
         role: 'user',
-        content: [{ type: 'tool_result', tool_use_id: 'tu1', content: 'Permission denied\nexit code 1', is_error: true }],
+        content: [
+          { type: 'tool_result', tool_use_id: 'tu1', content: 'Permission denied\nexit code 1', is_error: true },
+        ],
       },
     ];
     const { summaries } = extractAnthropicToolData(messages);

src/parsers/codex.ts

Lines changed: 30 additions & 16 deletions
@@ -30,6 +30,9 @@ const CODEX_HOME_DIR = process.env.CODEX_HOME || path.join(homeDir(), '.codex');
 const CODEX_SESSIONS_DIR = path.join(CODEX_HOME_DIR, 'sessions');
 const CODEX_ARCHIVED_SESSIONS_DIR = path.join(CODEX_HOME_DIR, 'archived_sessions');

+const MAX_EXACT_LINE_COUNT_BYTES = 1024 * 1024;
+const MAX_METADATA_SCAN_BYTES = 1024 * 1024;
+
 /**
  * Find all Codex session files recursively
  */
@@ -51,26 +54,34 @@ async function parseSessionInfo(filePath: string): Promise<{
   let meta: CodexSessionMeta | null = null;
   let firstUserMessage = '';

-  await scanJsonlHead(filePath, 150, (parsed) => {
-    const msg = parsed as Record<string, unknown>;
+  await scanJsonlHead(
+    filePath,
+    150,
+    (parsed) => {
+      const msg = parsed as Record<string, unknown>;

-    if (msg.type === 'session_meta' && !meta) {
-      meta = msg as unknown as CodexSessionMeta;
-    }
+      if (msg.type === 'session_meta' && !meta) {
+        meta = msg as unknown as CodexSessionMeta;
+      }

-    if (!firstUserMessage && msg.type === 'event_msg') {
-      const payload = msg.payload as Record<string, unknown> | undefined;
-      if (payload?.type === 'user_message') {
-        firstUserMessage = (payload.message as string) || '';
+      if (!firstUserMessage && msg.type === 'event_msg') {
+        const payload = msg.payload as Record<string, unknown> | undefined;
+        if (payload?.type === 'user_message') {
+          firstUserMessage = (payload.message as string) || '';
+        }
       }
-    }

-    if (!firstUserMessage && msg.type === 'message' && (msg as Record<string, unknown>).role === 'user') {
-      firstUserMessage = typeof msg.content === 'string' ? (msg.content as string) : '';
-    }
+      if (!firstUserMessage && msg.type === 'message' && (msg as Record<string, unknown>).role === 'user') {
+        firstUserMessage = typeof msg.content === 'string' ? (msg.content as string) : '';
+      }

-    return 'continue';
-  });
+      if (meta && firstUserMessage) {
+        return 'stop';
+      }
+      return 'continue';
+    },
+    { maxBytes: MAX_METADATA_SCAN_BYTES },
+  );

   return { meta, firstUserMessage };
 }
@@ -103,8 +114,11 @@ export async function parseCodexSessions(): Promise<UnifiedSession[]> {
     if (!parsed) continue;

     const { meta, firstUserMessage } = await parseSessionInfo(filePath);
-    const stats = await getFileStats(filePath);
     const fileStats = fs.statSync(filePath);
+    const stats =
+      fileStats.size > MAX_EXACT_LINE_COUNT_BYTES
+        ? { lines: 0, bytes: fileStats.size }
+        : await getFileStats(filePath);

     const cwd = meta?.payload?.cwd || '';
     const gitUrl = meta?.payload?.git?.repository_url;
