feat(copilot): track GitHub Copilot JetBrains IDE usage #608
NihalJain wants to merge 4 commits into
## What & why
The JetBrains Copilot plugin (IntelliJ, PyCharm, RubyMine, …) stores its
chat/agent sessions under `~/.config/github-copilot/<ide>/<kind>/<storeId>/` —
a location none of the existing Copilot sources (CLI JSONL, VS Code chat
sessions/transcripts, OTel SQLite) read. As a result, all JetBrains Copilot
usage was silently left out of every CodeBurn report. This adds a reader for
that store so those sessions are discovered, priced, and attributed to the
right project.
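A minimal TypeScript sketch of discovering stores in that layout. It assumes the root is `CODEBURN_COPILOT_JETBRAINS_DIR` when set and `~/.config/github-copilot` otherwise, and that the cross-kind join keys on `<ide>/<storeId>`; the helper names `jetbrainsRoot`, `listStores`, and `groupByStore` are hypothetical, not the PR's code.

```typescript
import { readdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// One store directory: <root>/<ide>/<kind>/<storeId>/
export interface StoreDir {
  ide: string;
  kind: string;
  storeId: string;
  path: string;
}

// Hypothetical helper: the env override wins, else the default config root.
export function jetbrainsRoot(
  env: Record<string, string | undefined> = process.env,
): string {
  return env.CODEBURN_COPILOT_JETBRAINS_DIR ?? join(homedir(), ".config", "github-copilot");
}

// Names of immediate subdirectories; a missing or unreadable dir yields none.
function subdirs(dir: string): string[] {
  try {
    return readdirSync(dir, { withFileTypes: true })
      .filter((d) => d.isDirectory())
      .map((d) => d.name);
  } catch {
    return [];
  }
}

// Walk exactly three levels: IDE, kind (e.g. chat-agent-sessions), store id.
export function listStores(root: string): StoreDir[] {
  const stores: StoreDir[] = [];
  for (const ide of subdirs(root)) {
    for (const kind of subdirs(join(root, ide))) {
      for (const storeId of subdirs(join(root, ide, kind))) {
        stores.push({ ide, kind, storeId, path: join(root, ide, kind, storeId) });
      }
    }
  }
  return stores;
}

// Group sibling kind dirs sharing a store id (assumed: within one IDE), so a
// projectName written under chat-sessions can label chat-agent-sessions turns.
export function groupByStore(stores: StoreDir[]): Map<string, StoreDir[]> {
  const groups = new Map<string, StoreDir[]>();
  for (const s of stores) {
    const key = `${s.ide}/${s.storeId}`;
    groups.set(key, [...(groups.get(key) ?? []), s]);
  }
  return groups;
}
```

Grouping first means a `projectName` found in any sibling kind dir is available when the `chat-agent-sessions` turns are priced.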
## How it works
- **Reader.** The store's session content is a Nitrite `.db` — an H2 MVStore of
Java-serialized documents. It is scanned as a `latin1` string for byte-offset
stability (each byte decodes to exactly one character), so it needs no Java
deserializer and no new dependency; it is not SQLite, so `node:sqlite` is not
involved.
- **Reply text.** Assistant replies live in nested-escaped
`{"__first__":{"type":"Subgraph"…}}` blobs. The text is recovered by
unescaping one level at a time and, at the depth where the Markdown record's
`data` field is a well-formed one-level-escaped JSON document, reading it
structurally — so a reply containing its own quotes is never truncated or
duplicated (which would otherwise inflate the estimate).
- **Tokens/cost.** The store records no token counts, so output tokens are
estimated from the reply text (`CHARS_PER_TOKEN = 4`, re-decoded
latin1→utf8 so multibyte replies count by codepoint) and every call is marked
`costIsEstimated`. Failed generations (error status, no reply) are billed $0.
- **Sessions.** One `.db` holds many chat tabs; turns are grouped back to their
conversation GUID so the UI shows one session per tab, deduped by reply
content per conversation.
- **Project attribution**, most authoritative first:
1. the plugin-recorded `projectName` field (JetBrains Copilot 1.12+), joined
across kind dirs by store id — the billable turns live in
`chat-agent-sessions`, but the label is usually written into the sibling
`chat-sessions`/`chat-edit-sessions` store. The field is read length-delimited
and re-decoded latin1→utf8 so non-ASCII repo names round-trip.
2. the `.git` repo root of a referenced `file://` path.
3. a generic `copilot-jetbrains` bucket when neither signal exists.
The conversation title is a chat-thread name, not a project, so it is kept
out of the project field and surfaced as the session label instead.
Override the JetBrains github-copilot root with
`CODEBURN_COPILOT_JETBRAINS_DIR`.
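The token estimate and per-tab grouping might look like this. `CHARS_PER_TOKEN = 4`, the latin1→utf8 re-decode, `costIsEstimated`, and the $0 rule for failed generations come from the text above; the hypothetical `Turn`/`Call` shapes, rounding up, and exempting empty replies from the dedupe are assumptions of this sketch.

```typescript
// From the PR text: output tokens are estimated at 4 characters per token.
const CHARS_PER_TOKEN = 4;

// The .db was read as latin1, so each char of `latin1Reply` is one raw byte.
// Re-decoding those bytes as UTF-8 makes a multibyte character count once;
// Array.from then iterates the string by codepoint, not by UTF-16 unit.
export function estimateOutputTokens(latin1Reply: string): number {
  const utf8 = Buffer.from(latin1Reply, "latin1").toString("utf8");
  const codepoints = Array.from(utf8).length;
  return Math.ceil(codepoints / CHARS_PER_TOKEN); // rounding up is assumed
}

// Hypothetical shapes for this sketch only.
export interface Turn {
  conversationId: string; // conversation GUID of the chat tab
  reply: string;          // latin1-decoded reply text ("" when none)
  failed: boolean;        // error status recorded in the store
}

export interface Call {
  outputTokens: number;
  costIsEstimated: true; // the store records no token counts
}

// One session per conversation GUID; a repeated non-empty reply within the
// same conversation is counted once. Failed generations bill 0 tokens ($0).
export function groupSessions(turns: Turn[]): Map<string, Call[]> {
  const sessions = new Map<string, Call[]>();
  const seenReplies = new Map<string, Set<string>>();
  for (const turn of turns) {
    const seen = seenReplies.get(turn.conversationId) ?? new Set<string>();
    seenReplies.set(turn.conversationId, seen);
    if (turn.reply !== "" && seen.has(turn.reply)) continue;
    seen.add(turn.reply);
    const calls = sessions.get(turn.conversationId) ?? [];
    calls.push({
      outputTokens: turn.failed ? 0 : estimateOutputTokens(turn.reply),
      costIsEstimated: true,
    });
    sessions.set(turn.conversationId, calls);
  }
  return sessions;
}
```

Counting codepoints after the re-decode is what keeps a reply such as `éééé` (8 bytes, 4 characters) from being billed as twice its length.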
## Docs
- `docs/providers/copilot.md` — full JetBrains section (store layout, latin1
scan, reply extraction, projectName precedence + cross-kind join).
- `docs/providers/README.md` — Copilot storage updated to note the Nitrite .db.
## How to verify
- `npm test -- copilot` and `npx tsc --noEmit` (fixtures reproduce the real
nested-escaped .db framing, including quote- and multibyte-bearing replies).
- End to end against a real install:
`CODEBURN_CACHE_DIR=$(mktemp -d) node dist/cli.js status --provider copilot \
--period all --format menubar-json`
— JetBrains sessions appear By-Project under their real repo names.
- Set `CODEBURN_COPILOT_JETBRAINS_DIR` to a fixture root to parse a controlled
store without touching the real config dir.
…trix
The README "Data location" support matrix listed only the legacy CLI and VS Code transcript sources for GitHub Copilot. Update the row to reflect every source the provider actually reads, including the OpenTelemetry `agent-traces.db` (preferred when present) and the JetBrains IDE Nitrite `.db`, and how the project is resolved. The row links to docs/providers/copilot.md for the full detail.
030c51f to ccc9deb
JetBrains Copilot has two turn shapes in the Nitrite .db:
- ask mode: the reply is a `Markdown` record's `text`;
- agent / plan mode (e.g. PyCharm agent sessions, `/plan …`): the reply is the `reply` field of an `AgentRound` record, and the `Markdown` record instead holds the USER's prompt.

`extractResponseText` only read `Markdown`, so agent-mode turns yielded no reply text: they were discovered (session and turn counts showed up) but priced at $0 because their output tokens came out as zero. On this machine that silently under-counted a PyCharm session ($0 → $0.35) and several IntelliJ agent turns.

Determine the mode by the PRESENCE of an `AgentRound` record and read only that record's `reply` (collecting every non-empty round in a multi-round blob). Crucially, an agent blob whose reply is empty (a failed turn or a pure tool-call round) does NOT fall back to the `Markdown` record, so a user prompt is never mistaken for the assistant's output; such turns bill $0 as before. Ask-mode blobs (no `AgentRound`) keep reading `Markdown`.

Plan mode's sidecar records are never read as output: `Thinking`, `PendingChanges` (the proposed diff, under `content`), `AskQuestion`, `Notification`, `SubTurn`, and file-read `text` results. Verified across all local stores: the two reply shapes never coexist in one blob, so the split is unambiguous.

Tests: agent-mode reply extraction (ignoring the prompt `Markdown`), pure tool-call rounds → $0, multi-round collection, and a failed agent turn → $0. docs/providers/copilot.md documents both turn shapes and the ignored sidecar records.
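A sketch of the mode split on already-unescaped records. The flat `TurnRecord` shape (a `type` tag plus `text` or `reply`) and the newline used to join rounds are assumptions; in the real store these records are first reached through the nested-escaped blob.

```typescript
// Simplified, assumed record shape: the type tag plus the one field we read.
export interface TurnRecord {
  type: string;   // e.g. "Markdown", "AgentRound", "Thinking", "PendingChanges"
  text?: string;  // Markdown: reply (ask mode) or USER prompt (agent/plan mode)
  reply?: string; // AgentRound: the assistant's reply for that round
}

// The mode is decided by the PRESENCE of an AgentRound record.
export function extractReply(records: TurnRecord[]): string {
  const rounds = records.filter((r) => r.type === "AgentRound");
  if (rounds.length > 0) {
    // Agent/plan mode: collect every non-empty round. No fallback to Markdown,
    // which holds the user's prompt; an all-empty result bills $0.
    return rounds
      .map((r) => r.reply ?? "")
      .filter((reply) => reply.length > 0)
      .join("\n"); // separator is an assumption
  }
  // Ask mode: the reply is the Markdown record's text. Sidecar records
  // (Thinking, PendingChanges, AskQuestion, ...) are never read in either mode.
  return records
    .filter((r) => r.type === "Markdown")
    .map((r) => r.text ?? "")
    .join("\n");
}
```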
…≤1.5.x)
JetBrains Copilot plugin ≤1.5.x (e.g. 1.5.59-243) stores all session turns
inside ONE large binary-framed outer Nitrite document, rather than the
per-turn {"__first__":{"type":"Subgraph",...}} blobs introduced in later
plugins (≥1.12.x, e.g. 1.12.1-251).
In the old format each assistant turn is a UUID-keyed `Value` entry whose
`value` field contains a JSON-string-escaped `AgentRound` record:
`{"<uuid>":{"type":"Value","value":"{\"type\":\"AgentRound\",\"data\":\"{...reply...}\"}"}, ...}`
The `extractResponseText` depth-unescape loop already handles this one extra
level of escaping; the only gap was that `extractJetBrainsDbTurns` never fed
it the outer document. It only scanned for `__first__`/`Subgraph` blobs, which
the old plugin never writes.
Add a fallback that activates when the Subgraph scan produces zero turns but
`AgentRound` text is present in the raw file (the old-format signal). It locates
the binary-framed outer document (the UUID-keyed `Value` entry, with hex matched
case-insensitively so an uppercase UUID does not fall through to $0), extracts
it with `matchJsonObject`, and passes it to `extractResponseText`. Because the outer
document holds every turn in one blob, this emits ONE session-level call per
document (all rounds' replies joined): cost and tokens are correct, and only the
per-turn call-count granularity is coarser, an accepted tradeoff for legacy
data. MVStore keeps two identical collection copies; `seenReplies` dedupes them.
The fallback is guarded by `turns.length === 0`, so new-format sessions (whose
Subgraph scan succeeds) are completely unaffected and never double-counted.
Tests: old-format doc with multiple AgentRound rounds → 1 call whose token
count equals the two non-empty replies joined (the empty tool-call round is
excluded); an uppercase-UUID variant (fails without the case-insensitive
match); and a guard that new-format Subgraph turns are not double-counted.
docs/providers/copilot.md documents the old format and the one-call-per-session
limitation.
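A self-contained sketch of the legacy fallback, under stated assumptions: it swaps the depth-unescape loop for nested `JSON.parse` calls (valid only if the sliced outer document is plain JSON), uses a hypothetical `sliceJsonObject` in place of `matchJsonObject` (whose signature is not shown here), and assumes `data` decodes to an object with a `reply` string.

```typescript
// Parse JSON, returning null instead of throwing on malformed input.
function tryParse<T>(s: string): T | null {
  try {
    return JSON.parse(s) as T;
  } catch {
    return null;
  }
}

// Stand-in for matchJsonObject: the balanced {...} starting at `start`,
// ignoring braces inside JSON strings (and their escaped characters).
function sliceJsonObject(s: string, start: number): string | null {
  let depth = 0;
  let inString = false;
  for (let i = start; i < s.length; i++) {
    const c = s[i];
    if (inString) {
      if (c === "\\") i++; // skip the escaped character
      else if (c === '"') inString = false;
    } else if (c === '"') {
      inString = true;
    } else if (c === "{") {
      depth++;
    } else if (c === "}" && --depth === 0) {
      return s.slice(start, i + 1);
    }
  }
  return null; // unterminated object
}

// Non-empty AgentRound replies in one legacy outer document.
// Assumed shape: { "<uuid>": { type: "Value", value: "<escaped AgentRound>" } }
function legacyReplies(outer: string): string[] {
  const doc = tryParse<Record<string, { type?: string; value?: unknown }>>(outer);
  if (!doc) return [];
  const replies: string[] = [];
  for (const entry of Object.values(doc)) {
    if (!entry || entry.type !== "Value" || typeof entry.value !== "string") continue;
    const round = tryParse<{ type?: string; data?: unknown }>(entry.value); // level 1
    if (!round || round.type !== "AgentRound" || typeof round.data !== "string") continue;
    const reply = tryParse<{ reply?: unknown }>(round.data)?.reply; // level 2
    if (typeof reply === "string" && reply !== "") replies.push(reply);
  }
  return replies;
}

// Returns one reply string per session-level call (normally just one).
export function legacyFallback(
  raw: string,
  subgraphTurnCount: number,
  seenReplies: Set<string>,
): string[] {
  // Guard: a successful Subgraph scan means new format; never double-count.
  if (subgraphTurnCount > 0 || !raw.includes("AgentRound")) return [];
  // UUID-keyed Value entry opening the outer document; hex in either case.
  const re = /\{"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}":\{"type":"Value"/g;
  const calls: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(raw)) !== null) {
    const outer = sliceJsonObject(raw, m.index);
    if (outer === null) break;
    re.lastIndex = m.index + outer.length; // resume after this document
    const joined = legacyReplies(outer).join("\n");
    // MVStore keeps two identical copies; count the content once.
    if (joined === "" || seenReplies.has(joined)) continue;
    seenReplies.add(joined);
    calls.push(joined);
  }
  return calls;
}
```

Because both MVStore copies yield the same joined text, the shared `seenReplies` set keeps the result to a single call.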
ee66693 to cd07707
Fixes #211
## Testing
- `npm test` passes
- `npm run build` succeeds

For new providers only:
- `npm run dev -- today` shows correct costs and session counts for this provider
- `npm run dev -- models --provider <name>` shows correct model names and pricing