feat(llm): additive per-call USD cost meter by ianu82 · Pull Request #214 · mindsdb/anton

ianu82 · 2026-06-24T16:52:53Z

Adds a maintained per-model price table (anton/core/llm/pricing.py, USD/1M tokens, prefix-matched; cache-aware) and compute_cost(...). Usage gains additive cache_write_tokens/cache_read_tokens/cost_usd (all defaulted, so every existing construction site stays valid); the Anthropic + OpenAI adapters populate them.
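To illustrate why defaulted fields keep existing construction sites valid, here is a minimal sketch. The diff itself is not shown on this page, so the shape of `Usage` is an assumption: `input_tokens` and `output_tokens` are illustrative stand-ins for whatever fields the real class already has. Only the three new field names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Assumed pre-existing fields (illustrative names, not taken from the diff).
    input_tokens: int = 0
    output_tokens: int = 0
    # Additive fields named in this PR. Each has a default, so callers that
    # were written before the change do not need to pass them.
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0
    cost_usd: float = 0.0


# An "old" call site that knows nothing about the new fields still works.
legacy = Usage(input_tokens=1200, output_tokens=300)

# A "new" call site can populate them explicitly.
metered = Usage(input_tokens=1200, output_tokens=300, cache_read_tokens=800, cost_usd=0.0051)
```

Because the new fields come last and are all defaulted, positional and keyword construction that predates the change behaves the same as before.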

Purely additive — no agent-loop / control-flow change, no enforcement. Unknown models and zero/None tokens → $0.00 (never breaks a turn).
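A rough sketch of the defensive behaviour described above, under stated assumptions: the table contents, the model prefixes, the rate values, the longest-prefix-wins rule, and the exact signature of `compute_cost` are all hypothetical here, since the real `pricing.py` is not reproduced on this page.

```python
from typing import Optional, Tuple

# Hypothetical rates in USD per 1M tokens:
# (input, output, cache_write, cache_read). Not real prices.
_PRICES = {
    "example-model-large": (10.0, 30.0, 12.5, 1.0),
    "example-model": (2.0, 8.0, 2.5, 0.2),
}


def _lookup(model: str) -> Optional[Tuple[float, float, float, float]]:
    """Return the rates for the longest matching prefix, or None if unpriced."""
    for prefix in sorted(_PRICES, key=len, reverse=True):
        if model.startswith(prefix):
            return _PRICES[prefix]
    return None


def compute_cost(
    model: Optional[str],
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    cache_write_tokens: Optional[int] = None,
    cache_read_tokens: Optional[int] = None,
) -> float:
    """Price one call in USD. Unknown models and missing counts cost 0.0."""
    rates = _lookup(model or "")
    if rates is None:
        return 0.0  # unpriced model: never raise, never break a turn
    counts = (
        input_tokens or 0,  # None is treated as 0
        output_tokens or 0,
        cache_write_tokens or 0,
        cache_read_tokens or 0,
    )
    return sum(n * r for n, r in zip(counts, rates)) / 1_000_000


print(compute_cost("example-model-2026", 1_000_000, 500_000))  # 6.0
print(compute_cost("some-unknown-model", 1_000_000, 500_000))  # 0.0
print(compute_cost("example-model", None, None))               # 0.0
```

Checking longer prefixes first is one way to keep a specific entry from being shadowed by a shorter, more general one; the actual table may resolve ties differently.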

Intended use is operator-side / internal only (e.g. a future budget/ledger). Per the consuming product's policy, computed dollar cost is not surfaced to end users — the cowork-server harness forwards tokens consumed only and does not expose cost_usd. This PR just makes the capability available in the library.

Testing: +19 unit tests (tokens × table incl. cache/zero/None/unknown); full suite 1092 passed, 17 skipped.

Budget caps + plan object are deliberately later (control-flow) slices.

🤖 Generated with Claude Code

Turn the token counts Anton already records per LLM call into a dollar figure a host can surface as "$ this turn / $ this task". Purely additive telemetry: no budget object, no enforcement, no control-flow change (those are explicitly later slices).

- New anton/core/llm/pricing.py: a maintained per-model price table (input/output/cache USD rates per 1M tokens, matched by model-ID prefix like the existing _CONTEXT_WINDOWS table) plus compute_cost(). Unknown models price at 0.0 and None token counts are treated as 0, so an unpriced model or a missing usage field never breaks a turn (mirrors compute_context_pressure's defensive posture).
- Usage gains additive cache_write_tokens / cache_read_tokens / cost_usd fields, all defaulted so every existing construction site stays valid.
- Both providers populate cost_usd (and cache tokens where the SDK reports them) at all 6 Usage construction sites. Anthropic reports cache tokens separately, so they're summed; OpenAI folds cached tokens into prompt_tokens, so they're surfaced for telemetry but not double-priced. cost_usd rides on the existing StreamComplete/usage output.

Adds tests/test_pricing.py (tokens x price table -> expected USD, including cache and zero/None/unknown-model cases).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
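The difference between the two providers' cache accounting can be sketched as follows. This is an illustration under assumptions, not the adapters' code: the function names, the `RATES` values, and the choice to price all OpenAI prompt tokens at the plain input rate are hypothetical. The commit message only states that Anthropic's cache tokens are summed in and that OpenAI's cached tokens are not priced a second time.

```python
# Hypothetical USD rates per 1M tokens; not real prices.
RATES = {"input": 2.0, "output": 8.0, "cache_write": 2.5, "cache_read": 0.2}


def anthropic_style_cost(input_tokens, output_tokens, cache_write, cache_read):
    """Cache tokens arrive as separate counts, so each bucket is priced and summed."""
    total = (
        input_tokens * RATES["input"]
        + output_tokens * RATES["output"]
        + cache_write * RATES["cache_write"]
        + cache_read * RATES["cache_read"]
    )
    return total / 1_000_000


def openai_style_cost(prompt_tokens, completion_tokens, cached_tokens):
    """cached_tokens is already included in prompt_tokens.

    It is returned for telemetry only; adding a cache charge on top would
    price the same tokens twice.
    """
    total = prompt_tokens * RATES["input"] + completion_tokens * RATES["output"]
    return total / 1_000_000, cached_tokens


a_cost = anthropic_style_cost(1000, 500, cache_write=2000, cache_read=4000)
o_cost, o_cached = openai_style_cost(5000, 500, cached_tokens=4000)
```

With these made-up rates, the Anthropic-style call comes to about $0.0118 and the OpenAI-style call to about $0.014, with the 4000 cached tokens reported alongside but not added to the bill.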

ianu82 mentioned this pull request Jun 24, 2026

feat(harness): surface tokens consumed on response.completed mindsdb/cowork-server#107

Open

ianu82 wants to merge 1 commit into main from feat/nocloud/agent-runtime-cost
