
feat(llm): additive per-call USD cost meter #214

Open
ianu82 wants to merge 1 commit into main from feat/nocloud/agent-runtime-cost

ianu82 (Contributor) commented Jun 24, 2026

Adds a maintained per-model price table (anton/core/llm/pricing.py, USD/1M tokens, prefix-matched; cache-aware) and compute_cost(...). Usage gains additive cache_write_tokens/cache_read_tokens/cost_usd (all defaulted, so every existing construction site stays valid); the Anthropic + OpenAI adapters populate them.
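To illustrate why defaulted fields keep existing construction sites valid, here is a minimal sketch. It is not the actual `Usage` class from this PR: the two pre-existing field names (`input_tokens`, `output_tokens`) are assumptions for illustration, and only the three new field names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Pre-existing fields. These names are assumed for illustration only.
    input_tokens: int
    output_tokens: int
    # New additive fields. Every one has a default, so a call site that
    # only passes the older fields still constructs a valid object.
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0
    cost_usd: float = 0.0


# An "old" construction site that knows nothing about the new fields.
legacy = Usage(input_tokens=120, output_tokens=30)

# A "new" construction site that fills them in.
priced = Usage(input_tokens=120, output_tokens=30, cache_read_tokens=64, cost_usd=0.0008)
```

Because the new fields come after the existing ones and all carry defaults, positional and keyword call sites written before this change keep working unmodified.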

Purely additive: no agent-loop or control-flow change, and no enforcement. Unknown models and zero or None token counts price at $0.00, so a missing price or usage field never breaks a turn.
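The defensive behaviour described above could look roughly like the sketch below. This is not the code in `anton/core/llm/pricing.py`: the model prefixes and rates are made-up placeholders, the parameter names are assumptions, and the longest-prefix-wins rule is one possible way to resolve overlapping prefixes.

```python
from typing import Optional

# Hypothetical price table in USD per 1M tokens. The model prefixes and
# rates are placeholders, not real prices.
_PRICES = {
    "example-large": {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30},
    "example-small": {"input": 0.50, "output": 2.00, "cache_write": 0.60, "cache_read": 0.05},
}


def _lookup(model: str) -> Optional[dict]:
    """Return the rates for the longest matching prefix, or None if unpriced."""
    best = None
    for prefix in _PRICES:
        if model.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return _PRICES[best] if best is not None else None


def compute_cost(
    model: Optional[str],
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    cache_write_tokens: Optional[int] = None,
    cache_read_tokens: Optional[int] = None,
) -> float:
    """Price one call in USD. Unknown models and None counts yield 0.0."""
    rates = _lookup(model or "")
    if rates is None:
        return 0.0  # unpriced model: report zero rather than raise
    total = (
        (input_tokens or 0) * rates["input"]
        + (output_tokens or 0) * rates["output"]
        + (cache_write_tokens or 0) * rates["cache_write"]
        + (cache_read_tokens or 0) * rates["cache_read"]
    )
    return total / 1_000_000


known = compute_cost("example-large-2026", input_tokens=1_000_000, output_tokens=0)
unknown = compute_cost("some-unlisted-model", input_tokens=5_000, output_tokens=5_000)
missing = compute_cost("example-small", input_tokens=None, output_tokens=None)
```

Returning 0.0 instead of raising trades accuracy for safety: an unpriced model under-reports cost, but the turn itself is never interrupted by telemetry.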

Intended use is operator-side / internal only (e.g. a future budget/ledger). Per the consuming product's policy, computed dollar cost is not surfaced to end users — the cowork-server harness forwards tokens consumed only and does not expose cost_usd. This PR just makes the capability available in the library.

Testing: +19 unit tests (tokens × table incl. cache/zero/None/unknown); full suite 1092 passed, 17 skipped.

Budget caps + plan object are deliberately later (control-flow) slices.

🤖 Generated with Claude Code

Turn the token counts Anton already records per LLM call into a dollar
figure a host can surface as "$ this turn / $ this task". Purely
additive telemetry: no budget object, no enforcement, no control-flow
change (those are explicitly later slices).

- New anton/core/llm/pricing.py: a maintained per-model price table
  (input/output/cache USD rates per 1M tokens, matched by model-ID
  prefix like the existing _CONTEXT_WINDOWS table) plus compute_cost().
  Unknown models price at 0.0 and None token counts are treated as 0,
  so an unpriced model or a missing usage field never breaks a turn
  (mirrors compute_context_pressure's defensive posture).
- Usage gains additive cache_write_tokens / cache_read_tokens / cost_usd
  fields, all defaulted so every existing construction site stays valid.
- Both providers populate cost_usd (and cache tokens where the SDK
  reports them) at all 6 Usage construction sites. Anthropic reports
  cache tokens separately, so they're summed; OpenAI folds cached tokens
  into prompt_tokens, so they're surfaced for telemetry but not
  double-priced. cost_usd rides on the existing StreamComplete/usage
  output.
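The provider difference in the last bullet can be sketched as a normalization step that turns each SDK's usage numbers into billable buckets. This is an illustrative reading, not the adapters' actual code: the helper names are hypothetical, and pricing OpenAI's cached tokens at the cache-read rate after subtracting them from the prompt count is only one way to avoid double-pricing (the PR may instead price `prompt_tokens` at the input rate and record cached tokens purely as telemetry).

```python
from typing import Dict, Optional


def anthropic_billable(
    input_tokens: Optional[int],
    cache_creation_input_tokens: Optional[int],
    cache_read_input_tokens: Optional[int],
) -> Dict[str, int]:
    """Anthropic reports cache tokens separately from input_tokens,
    so each bucket is priced on its own and the costs are summed."""
    return {
        "input": input_tokens or 0,
        "cache_write": cache_creation_input_tokens or 0,
        "cache_read": cache_read_input_tokens or 0,
    }


def openai_billable(
    prompt_tokens: Optional[int],
    cached_tokens: Optional[int],
) -> Dict[str, int]:
    """OpenAI includes cached tokens inside prompt_tokens. Assumed approach:
    subtract them from the input bucket so no token is priced twice."""
    prompt = prompt_tokens or 0
    cached = min(cached_tokens or 0, prompt)  # guard against odd SDK values
    return {
        "input": prompt - cached,
        "cache_write": 0,
        "cache_read": cached,
    }


a = anthropic_billable(1000, 200, 300)
o = openai_billable(1000, 400)
```

In both cases the buckets partition the tokens, so summing per-bucket costs counts every token exactly once.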

Adds tests/test_pricing.py (tokens x price table -> expected USD,
including cache and zero/None/unknown-model cases).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>