Restore token usage metrics when `events.jsonl` omits `session.shutdown.modelMetrics` by Copilot · Pull Request #43007 · github/gh-aw

Copilot · 2026-07-02T16:00:26Z

Fleet-wide token metrics were reading as zero because events.jsonl parsing only counted tokens from session.shutdown.modelMetrics. Runs that emitted per-event usage but no shutdown model metrics were therefore recorded as TokenUsage=0.

Token extraction fallback path
- Updated parseEventsJSONLMetrics to preserve existing precedence (session.shutdown.modelMetrics first), then fall back to per-event usage fields when shutdown totals are absent.
- Fallback sources now include:
  - data.usage (input_tokens + output_tokens)
  - direct nested data.input_tokens / data.output_tokens
  - top-level usage_input_tokens / usage_output_tokens
  - top-level usageInputTokens / usageOutputTokens
Behavioral guardrail
- Existing shutdown-based totals remain authoritative when present, avoiding double counting and preserving current semantics.
Regression coverage
- Added focused parser tests for:
  - fallback token aggregation when shutdown model metrics are missing
  - precedence behavior when both shutdown totals and per-event usage are present

metrics.TokenUsage = totalTokens // from session.shutdown.modelMetrics
if metrics.TokenUsage == 0 && fallbackTokens > 0 {
    metrics.TokenUsage = fallbackTokens // per-event usage fallback
}

…fields Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-07-02T20:22:01Z

✅ Test Quality Sentinel completed test quality analysis.

github-actions · 2026-07-02T20:22:05Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-07-02T20:22:07Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #43007 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (61 additions detected).

github-actions · 2026-07-02T20:22:26Z

⚠️ PR Code Quality Reviewer failed during code quality review.

Copilot

Pull request overview

This PR fixes token usage metric extraction from Copilot events.jsonl so runs that emit per-event usage (but omit session.shutdown.modelMetrics) no longer report TokenUsage=0.

Changes:

Added support for extracting token usage from per-event usage fields in events.jsonl as a fallback when shutdown metrics are missing.
Extended the events.jsonl entry/data structs to capture additional usage token field shapes.
Added regression tests covering fallback aggregation and shutdown-vs-fallback precedence.

Show a summary per file

File	Description
pkg/cli/copilot_events_jsonl.go	Adds fallback token aggregation logic and supporting JSON fields for events.jsonl parsing.
pkg/cli/copilot_events_jsonl_test.go	Adds tests to validate fallback aggregation and precedence over shutdown totals.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 2
Review effort level: Low

Copilot · 2026-07-02T20:53:35Z

 	metrics.TokenUsage = totalTokens
+	if metrics.TokenUsage == 0 && fallbackTokens > 0 {
+		metrics.TokenUsage = fallbackTokens
+	}


Fixed in the latest commit. Added sawShutdownModelMetrics := false and set it to true inside if entry.Data.ModelMetrics != nil within the session.shutdown case. The guard at the end now uses !sawShutdownModelMetrics && fallbackTokens > 0, so a present-but-empty (or present-but-zero) modelMetrics block correctly suppresses the fallback.

Copilot · 2026-07-02T20:53:35Z

+	if entry.Data.Usage != nil {
+		// ExtractJSONTokenUsage expects a full event/document map and knows how to
+		// parse usage when it appears at key "usage", so wrap the nested usage map.
+		if tokens := workflow.ExtractJSONTokenUsage(map[string]any{"usage": entry.Data.Usage}); tokens > 0 {
+			return tokens
+		}
+	}


Fixed in the latest commit. Replaced workflow.ExtractJSONTokenUsage with a direct extraction that reads only input_tokens (falling back to prompt_tokens) and output_tokens (falling back to completion_tokens), deliberately skipping cache_creation_input_tokens and cache_read_input_tokens. Extracted into a usageField helper to avoid duplicating the primary/alias lookup. Added a regression test (fallback excludes cache tokens from per-event usage) that verifies the cache fields are not counted.

github-actions · 2026-07-02T20:29:29Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 100/100 — Excellent

Analyzed 2 test(s): 2 design, 0 implementation, 0 violation(s).

📊 Metrics (2 tests)

Metric	Value
Analyzed	2 (Go: 2, JS: 0)
✅ Design	2 (100%)
⚠️ Implementation	0 (0%)
Edge/error coverage	2 (100%)
Duplicate clusters	0
Inflation	No (test:prod = 20:41 ≈ 0.49:1)
🚨 Violations	0

Test	File	Classification	Issues
`falls back to per-event usage when session.shutdown model metrics are absent`	`copilot_events_jsonl_test.go:152`	design_test / behavioral_contract	None
`prefers session.shutdown totals when both shutdown and per-event usage exist`	`copilot_events_jsonl_test.go:161`	design_test / behavioral_contract	None

Verdict

✅ Passed. 0% implementation tests (threshold: 30%). Both new test cases cover non-happy-path behaviors (fallback path and priority-rule), are fully assertion-guarded with descriptive messages, the file carries the required //go:build !integration tag, and no mock libraries are used. Test-to-production line ratio is 0.49:1 (well under the 2:1 threshold).

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

awmgmcpg

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "awmgmcpg"

See Network Configuration for more information.

🧪 Test quality analysis by Test Quality Sentinel · 38 AIC · ⌖ 15.7 AIC · ⊞ 6.8K · ◷
_{Comment /review to run again}

github-actions

✅ Test Quality Sentinel: 100/100. 0% implementation tests (threshold: 30%). Both new test cases target behavioral contracts (fallback path + priority rule), carry required build tag, no mock library usage, and test-to-production ratio is well under 2:1.

github-actions

Review: Restore token usage metrics fallback

Overall: logic is correct, but observability for the new fallback path is broken.

The fallback mechanism (totalTokens == 0 → fallbackTokens) is sound and the precedence tests cover the key scenarios well. One actual bug in logging needs to be fixed.

Bug: Log lines report stale `totalTokens=0` in the fallback path

Both the debug log (line 326) and the verbose user-facing message (line 331) still print the local totalTokens variable. In the exact scenario this PR fixes (no session.shutdown.modelMetrics), totalTokens remains 0 even after the fallback populates metrics.TokenUsage. This means:

Debug logs show totalTokens=0 for successful fallback runs.
Users running with --verbose see 0 tokens in the summary line.

Fix: use metrics.TokenUsage in both places after the fallback assignment (inline comments added).

What looks good

Precedence guarantee: shutdown totals always win when present.
Four-source fallback covers the known field-name variants.
New tests are focused and correctly anchor expected token counts.
extractFallbackEventTokens is isolated and easy to extend.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 36 AIC · ⌖ 6.5 AIC · ⊞ 4.9K

Comments that could not be inline-anchored

pkg/cli/copilot_events_jsonl.go:326

The summary log line (and the verbose message a few lines below) still uses the local totalTokens variable, which will be 0 in the fallback path this PR introduces. After the fallback assignment on line 298, the log will report totalTokens=0 even though metrics.TokenUsage was just set to fallbackTokens.

Consider using metrics.TokenUsage in the log (or logging fallbackTokens separately) to make observability accurate:

copilotEventsJSONLLog.Printf(&quot;Parsed events.jsonl: tur…

</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:331</summary>

The `verbose` user-facing message also prints `totalTokens` (the local variable), which will be `0` in the fallback path. This would show the user `0 tokens` in verbose output even after the fallback populated `metrics.TokenUsage`. Change to `metrics.TokenUsage`:

```go
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
    fmt.Sprintf(&quot;Parsed events.jsonl: %d turns, %d tokens, %d tool calls&quot;,
        turns, metrics.TokenUsage, len(toolCallMap))))

@copilot please address this.

github-actions

Skills-Based Review 🧠

Applied /diagnosing-bugs, /tdd, and /codebase-design — requesting changes on one correctness issue and test-coverage gaps.

📋 Key Themes & Highlights

Key Themes

Verbose log bug (correctness): --verbose prints totalTokens (the shutdown accumulator), not metrics.TokenUsage, so it always shows 0 for sessions that take the fallback path — undoing the observability benefit of this fix.
Partial test coverage: Only the data.usage fallback path is exercised; the other three paths (data.input_tokens, top-level snake_case, top-level camelCase) are untested. extractFallbackEventTokens also lacks a direct unit test that documents its precedence rules.
Latent double-count risk: extractFallbackEventTokens runs on every event including session.shutdown. The TokenUsage == 0 guard prevents a collision today, but skipping the shutdown event explicitly is structurally safer.
Naming: AlternateUsageInputTokens / AlternateUsageOutputTokens don't communicate that these are the camelCase JSON variants; the Go field names should reflect the distinction more clearly.

Positive Highlights

✅ Correct precedence design: shutdown metrics remain authoritative, fallback only activates when TokenUsage == 0
✅ Good use of workflow.ExtractJSONTokenUsage to reuse existing parsing logic rather than duplicating it
✅ PR description is clear and includes a code snippet illustrating the guard — well done
✅ Regression test for precedence behaviour (shutdown + per-event both present) is exactly right

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 69 AIC · ⌖ 7.84 AIC · ⊞ 6.6K
_{Comment /matt to run again}

Comments that could not be inline-anchored

pkg/cli/copilot_events_jsonl.go:332

[/diagnosing-bugs] The verbose log prints totalTokens (the shutdown-path accumulator), which is 0 when only the fallback path is active — so --verbose will always report 0 tokens for runs without session.shutdown.modelMetrics.

<details>
<summary>💡 Suggested fix</summary>

Change the verbose print to use metrics.TokenUsage, which holds the final resolved value:

fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
    fmt.Sprintf(&quot;Parsed events.jsonl: %d turns, %d tokens…

</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:221</summary>

**[/diagnosing-bugs]** `extractFallbackEventTokens` is called unconditionally for every event, including `session.shutdown`. If a shutdown event ever carries top-level `usage_input_tokens` / `usage_output_tokens` fields, those tokens will be double-counted: once in `totalTokens` (from `ModelMetrics`) and once in `fallbackTokens`.

&lt;details&gt;
&lt;summary&gt;💡 Suggested fix&lt;/summary&gt;

Guard the fallback accumulation to skip the shutdown event, since its tokens are already handled by the dedicated `sess…

</details>

<details><summary>pkg/cli/copilot_events_jsonl_test.go:160</summary>

**[/tdd]** The two new test cases only exercise `data.usage` (the first fallback path). The three other paths added in this PR — `data.input_tokens`/`data.output_tokens`, top-level `usage_input_tokens`/`usage_output_tokens`, and top-level `usageInputTokens`/`usageOutputTokens` — have no coverage.

&lt;details&gt;
&lt;summary&gt;💡 Suggested additional test cases&lt;/summary&gt;

Add table entries exercising each path independently, e.g.:

```go
{
    name: &quot;falls back to data.input_tokens / data.output_tokens&quot;,
…

</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:49</summary>

**[/codebase-design]** The field names `AlternateUsageInputTokens` / `AlternateUsageOutputTokens` obscure what they represent. The JSON tags (`usageInputTokens` / `usageOutputTokens`) are already self-documenting; the Go names should mirror them.

&lt;details&gt;
&lt;summary&gt;💡 Suggested renaming&lt;/summary&gt;

```go
UsageInputTokensCamel  int `json:&quot;usageInputTokens,omitempty&quot;`
UsageOutputTokensCamel int `json:&quot;usageOutputTokens,omitempty&quot;`

Or, simply suffix the existing names to encode the casing var…

pkg/cli/copilot_events_jsonl.go:338

[/tdd] extractFallbackEventTokens is package-private but has no direct unit test — it is only exercised indirectly through the parseEventsJSONLMetrics integration path. A direct unit test would make the precedence rules between the four fallback paths explicit and independently verifiable.

<details>
<summary>💡 Suggested unit test skeleton</summary>

func TestExtractFallbackEventTokens(t *testing.T) {
    tests := []struct {
        name  string
        entry copilotEventsJSONLEn…

</details>

pelikhan · 2026-07-02T20:37:31Z

@copilot run pr-finisher skill

…raction - Use sawShutdownModelMetrics boolean instead of TokenUsage==0 guard so a present-but-empty modelMetrics block is treated as authoritative - Replace workflow.ExtractJSONTokenUsage with direct input+output extraction (no cache tokens) via new usageField helper - Add tests: zero-but-present shutdown, cache exclusion from fallback Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Done

github-actions · 2026-07-02T21:27:11Z

Great work on this fix! 🎉 The token metrics fallback logic is clearly reasoned — preserving session.shutdown.modelMetrics as authoritative while gracefully handling the zero-token case with per-event usage fields is exactly the right approach.

The PR description is thorough (root cause, fallback sources, behavioral guardrail, and the inline code snippet), and the regression tests cover both the fallback aggregation path and the precedence behavior when shutdown totals are present. This looks ready for review.

Generated by ✅ Contribution Check · 184.6 AIC · ⌖ 12.9 AIC · ⊞ 6.3K · ◷

Initial plan

93be17b

Copilot AI assigned Copilot and pelikhan Jul 2, 2026

Copilot AI linked an issue Jul 2, 2026 that may be closed by this pull request

[deep-report] Re-open fleet-wide token_usage collection — TokenUsage=0 for ~13 consecutive days #42788

Closed

Copilot started work on behalf of pelikhan July 2, 2026 16:00 View session

Copilot AI and others added 2 commits July 2, 2026 16:13

fix: restore token usage extraction from events jsonl fallback usage …

ea123b3

…fields Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

test: cover events token fallback without session shutdown

48dfaeb

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Re-open investigation on fleet-wide token usage collection~~ Restore token usage metrics when events.jsonl omits session.shutdown.modelMetrics Jul 2, 2026

Copilot finished work on behalf of pelikhan July 2, 2026 16:23

Copilot AI requested a review from pelikhan July 2, 2026 16:23

pelikhan marked this pull request as ready for review July 2, 2026 20:21

Copilot AI review requested due to automatic review settings July 2, 2026 20:21

Copilot started reviewing on behalf of pelikhan July 2, 2026 20:21 View session

Copilot AI reviewed Jul 2, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jul 2, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot approved these changes Jul 2, 2026

View reviewed changes

github-actions Bot requested changes Jul 2, 2026

View reviewed changes

github-actions Bot previously requested changes Jul 2, 2026

View reviewed changes

Copilot started work on behalf of pelikhan July 2, 2026 20:38 View session

Copilot finished work on behalf of pelikhan July 2, 2026 20:54

pelikhan merged commit dfac557 into main Jul 2, 2026
28 of 29 checks passed

pelikhan deleted the copilot/deep-report-reopen-token-usage-collection branch July 2, 2026 21:19

github-actions Bot mentioned this pull request Jul 2, 2026

[Contribution Check Report] Contribution Check — 2026-07-02 #42873

Open

Uh oh!

Conversation

Copilot AI commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Review details

Uh oh!

Copilot AI Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jul 2, 2026

🧪 Test Quality Sentinel Report

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Review: Restore token usage metrics fallback

Bug: Log lines report stale totalTokens=0 in the fallback path

What looks good

Comments that could not be inline-anchored

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Comments that could not be inline-anchored

Uh oh!

pelikhan commented Jul 2, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jul 2, 2026 •

edited

Loading

github-actions Bot commented Jul 2, 2026 •

edited

Loading

github-actions Bot commented Jul 2, 2026 •

edited

Loading

github-actions Bot commented Jul 2, 2026 •

edited

Loading

github-actions Bot commented Jul 2, 2026 •

edited

Loading

Bug: Log lines report stale `totalTokens=0` in the fallback path