Restore token usage metrics when events.jsonl omits session.shutdown.modelMetrics#43007
Conversation
…fields Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
events.jsonl omits session.shutdown.modelMetrics
|
✅ Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #43007 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (61 additions detected). |
|
|
There was a problem hiding this comment.
Pull request overview
This PR fixes token usage metric extraction from Copilot events.jsonl so runs that emit per-event usage (but omit session.shutdown.modelMetrics) no longer report TokenUsage=0.
Changes:
- Added support for extracting token usage from per-event usage fields in
events.jsonlas a fallback when shutdown metrics are missing. - Extended the events.jsonl entry/data structs to capture additional usage token field shapes.
- Added regression tests covering fallback aggregation and shutdown-vs-fallback precedence.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/copilot_events_jsonl.go | Adds fallback token aggregation logic and supporting JSON fields for events.jsonl parsing. |
| pkg/cli/copilot_events_jsonl_test.go | Adds tests to validate fallback aggregation and precedence over shutdown totals. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
- Review effort level: Low
| metrics.TokenUsage = totalTokens | ||
| if metrics.TokenUsage == 0 && fallbackTokens > 0 { | ||
| metrics.TokenUsage = fallbackTokens | ||
| } |
There was a problem hiding this comment.
Fixed in the latest commit. Added sawShutdownModelMetrics := false and set it to true inside if entry.Data.ModelMetrics != nil within the session.shutdown case. The guard at the end now uses !sawShutdownModelMetrics && fallbackTokens > 0, so a present-but-empty (or present-but-zero) modelMetrics block correctly suppresses the fallback.
| if entry.Data.Usage != nil { | ||
| // ExtractJSONTokenUsage expects a full event/document map and knows how to | ||
| // parse usage when it appears at key "usage", so wrap the nested usage map. | ||
| if tokens := workflow.ExtractJSONTokenUsage(map[string]any{"usage": entry.Data.Usage}); tokens > 0 { | ||
| return tokens | ||
| } | ||
| } |
There was a problem hiding this comment.
Fixed in the latest commit. Replaced workflow.ExtractJSONTokenUsage with a direct extraction that reads only input_tokens (falling back to prompt_tokens) and output_tokens (falling back to completion_tokens), deliberately skipping cache_creation_input_tokens and cache_read_input_tokens. Extracted into a usageField helper to avoid duplicating the primary/alias lookup. Added a regression test (fallback excludes cache tokens from per-event usage) that verifies the cache fields are not counted.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 100/100 — Excellent
📊 Metrics (2 tests)
Verdict
Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "awmgmcpg"See Network Configuration for more information.
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 100/100. 0% implementation tests (threshold: 30%). Both new test cases target behavioral contracts (fallback path + priority rule), carry required build tag, no mock library usage, and test-to-production ratio is well under 2:1.
There was a problem hiding this comment.
Review: Restore token usage metrics fallback
Overall: logic is correct, but observability for the new fallback path is broken.
The fallback mechanism (totalTokens == 0 → fallbackTokens) is sound and the precedence tests cover the key scenarios well. One actual bug in logging needs to be fixed.
Bug: Log lines report stale totalTokens=0 in the fallback path
Both the debug log (line 326) and the verbose user-facing message (line 331) still print the local totalTokens variable. In the exact scenario this PR fixes (no session.shutdown.modelMetrics), totalTokens remains 0 even after the fallback populates metrics.TokenUsage. This means:
- Debug logs show
totalTokens=0for successful fallback runs. - Users running with
--verbosesee0 tokensin the summary line.
Fix: use metrics.TokenUsage in both places after the fallback assignment (inline comments added).
What looks good
- Precedence guarantee: shutdown totals always win when present.
- Four-source fallback covers the known field-name variants.
- New tests are focused and correctly anchor expected token counts.
extractFallbackEventTokensis isolated and easy to extend.
🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 36 AIC · ⌖ 6.5 AIC · ⊞ 4.9K
Comments that could not be inline-anchored
pkg/cli/copilot_events_jsonl.go:326
The summary log line (and the verbose message a few lines below) still uses the local totalTokens variable, which will be 0 in the fallback path this PR introduces. After the fallback assignment on line 298, the log will report totalTokens=0 even though metrics.TokenUsage was just set to fallbackTokens.
Consider using metrics.TokenUsage in the log (or logging fallbackTokens separately) to make observability accurate:
copilotEventsJSONLLog.Printf("Parsed events.jsonl: tur…
</details>
<details><summary>pkg/cli/copilot_events_jsonl.go:331</summary>
The `verbose` user-facing message also prints `totalTokens` (the local variable), which will be `0` in the fallback path. This would show the user `0 tokens` in verbose output even after the fallback populated `metrics.TokenUsage`. Change to `metrics.TokenUsage`:
```go
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
fmt.Sprintf("Parsed events.jsonl: %d turns, %d tokens, %d tool calls",
turns, metrics.TokenUsage, len(toolCallMap))))@copilot please address this.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnosing-bugs, /tdd, and /codebase-design — requesting changes on one correctness issue and test-coverage gaps.
📋 Key Themes & Highlights
Key Themes
- Verbose log bug (correctness):
--verboseprintstotalTokens(the shutdown accumulator), notmetrics.TokenUsage, so it always shows0for sessions that take the fallback path — undoing the observability benefit of this fix. - Partial test coverage: Only the
data.usagefallback path is exercised; the other three paths (data.input_tokens, top-level snake_case, top-level camelCase) are untested.extractFallbackEventTokensalso lacks a direct unit test that documents its precedence rules. - Latent double-count risk:
extractFallbackEventTokensruns on every event includingsession.shutdown. TheTokenUsage == 0guard prevents a collision today, but skipping the shutdown event explicitly is structurally safer. - Naming:
AlternateUsageInputTokens/AlternateUsageOutputTokensdon't communicate that these are the camelCase JSON variants; the Go field names should reflect the distinction more clearly.
Positive Highlights
- ✅ Correct precedence design: shutdown metrics remain authoritative, fallback only activates when
TokenUsage == 0 - ✅ Good use of
workflow.ExtractJSONTokenUsageto reuse existing parsing logic rather than duplicating it - ✅ PR description is clear and includes a code snippet illustrating the guard — well done
- ✅ Regression test for precedence behaviour (shutdown + per-event both present) is exactly right
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 69 AIC · ⌖ 7.84 AIC · ⊞ 6.6K
Comment /matt to run again
Comments that could not be inline-anchored
pkg/cli/copilot_events_jsonl.go:332
[/diagnosing-bugs] The verbose log prints totalTokens (the shutdown-path accumulator), which is 0 when only the fallback path is active — so --verbose will always report 0 tokens for runs without session.shutdown.modelMetrics.
<details>
<summary>💡 Suggested fix</summary>
Change the verbose print to use metrics.TokenUsage, which holds the final resolved value:
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
fmt.Sprintf("Parsed events.jsonl: %d turns, %d tokens…
</details>
<details><summary>pkg/cli/copilot_events_jsonl.go:221</summary>
**[/diagnosing-bugs]** `extractFallbackEventTokens` is called unconditionally for every event, including `session.shutdown`. If a shutdown event ever carries top-level `usage_input_tokens` / `usage_output_tokens` fields, those tokens will be double-counted: once in `totalTokens` (from `ModelMetrics`) and once in `fallbackTokens`.
<details>
<summary>💡 Suggested fix</summary>
Guard the fallback accumulation to skip the shutdown event, since its tokens are already handled by the dedicated `sess…
</details>
<details><summary>pkg/cli/copilot_events_jsonl_test.go:160</summary>
**[/tdd]** The two new test cases only exercise `data.usage` (the first fallback path). The three other paths added in this PR — `data.input_tokens`/`data.output_tokens`, top-level `usage_input_tokens`/`usage_output_tokens`, and top-level `usageInputTokens`/`usageOutputTokens` — have no coverage.
<details>
<summary>💡 Suggested additional test cases</summary>
Add table entries exercising each path independently, e.g.:
```go
{
name: "falls back to data.input_tokens / data.output_tokens",
…
</details>
<details><summary>pkg/cli/copilot_events_jsonl.go:49</summary>
**[/codebase-design]** The field names `AlternateUsageInputTokens` / `AlternateUsageOutputTokens` obscure what they represent. The JSON tags (`usageInputTokens` / `usageOutputTokens`) are already self-documenting; the Go names should mirror them.
<details>
<summary>💡 Suggested renaming</summary>
```go
UsageInputTokensCamel int `json:"usageInputTokens,omitempty"`
UsageOutputTokensCamel int `json:"usageOutputTokens,omitempty"`Or, simply suffix the existing names to encode the casing var…
pkg/cli/copilot_events_jsonl.go:338
[/tdd] extractFallbackEventTokens is package-private but has no direct unit test — it is only exercised indirectly through the parseEventsJSONLMetrics integration path. A direct unit test would make the precedence rules between the four fallback paths explicit and independently verifiable.
<details>
<summary>💡 Suggested unit test skeleton</summary>
func TestExtractFallbackEventTokens(t *testing.T) {
tests := []struct {
name string
entry copilotEventsJSONLEn…
</details>|
@copilot run pr-finisher skill |
…raction - Use sawShutdownModelMetrics boolean instead of TokenUsage==0 guard so a present-but-empty modelMetrics block is treated as authoritative - Replace workflow.ExtractJSONTokenUsage with direct input+output extraction (no cache tokens) via new usageField helper - Add tests: zero-but-present shutdown, cache exclusion from fallback Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
Great work on this fix! 🎉 The token metrics fallback logic is clearly reasoned — preserving The PR description is thorough (root cause, fallback sources, behavioral guardrail, and the inline code snippet), and the regression tests cover both the fallback aggregation path and the precedence behavior when shutdown totals are present. This looks ready for review.
|
Fleet-wide token metrics were reading as zero because
events.jsonlparsing only counted tokens fromsession.shutdown.modelMetrics. Runs that emitted per-event usage but no shutdown model metrics were therefore recorded asTokenUsage=0.Token extraction fallback path
parseEventsJSONLMetricsto preserve existing precedence (session.shutdown.modelMetricsfirst), then fall back to per-event usage fields when shutdown totals are absent.data.usage(input_tokens+output_tokens)data.input_tokens/data.output_tokensusage_input_tokens/usage_output_tokensusageInputTokens/usageOutputTokensBehavioral guardrail
Regression coverage