Skip to content

Restore token usage metrics when events.jsonl omits session.shutdown.modelMetrics#43007

Merged
pelikhan merged 4 commits into
mainfrom
copilot/deep-report-reopen-token-usage-collection
Jul 2, 2026
Merged

Restore token usage metrics when events.jsonl omits session.shutdown.modelMetrics#43007
pelikhan merged 4 commits into
mainfrom
copilot/deep-report-reopen-token-usage-collection

Conversation

Copilot AI commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Fleet-wide token metrics were reading as zero because events.jsonl parsing only counted tokens from session.shutdown.modelMetrics. Runs that emitted per-event usage but no shutdown model metrics were therefore recorded as TokenUsage=0.

  • Token extraction fallback path

    • Updated parseEventsJSONLMetrics to preserve existing precedence (session.shutdown.modelMetrics first), then fall back to per-event usage fields when shutdown totals are absent.
    • Fallback sources now include:
      • data.usage (input_tokens + output_tokens)
      • direct nested data.input_tokens / data.output_tokens
      • top-level usage_input_tokens / usage_output_tokens
      • top-level usageInputTokens / usageOutputTokens
  • Behavioral guardrail

    • Existing shutdown-based totals remain authoritative when present, avoiding double counting and preserving current semantics.
  • Regression coverage

    • Added focused parser tests for:
      • fallback token aggregation when shutdown model metrics are missing
      • precedence behavior when both shutdown totals and per-event usage are present
metrics.TokenUsage = totalTokens // from session.shutdown.modelMetrics
if metrics.TokenUsage == 0 && fallbackTokens > 0 {
    metrics.TokenUsage = fallbackTokens // per-event usage fallback
}

Copilot AI and others added 2 commits July 2, 2026 16:13
…fields

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Re-open investigation on fleet-wide token usage collection Restore token usage metrics when events.jsonl omits session.shutdown.modelMetrics Jul 2, 2026
Copilot AI requested a review from pelikhan July 2, 2026 16:23
@pelikhan pelikhan marked this pull request as ready for review July 2, 2026 20:21
Copilot AI review requested due to automatic review settings July 2, 2026 20:21
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #43007 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (61 additions detected).

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

⚠️ PR Code Quality Reviewer failed during code quality review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes token usage metric extraction from Copilot events.jsonl so runs that emit per-event usage (but omit session.shutdown.modelMetrics) no longer report TokenUsage=0.

Changes:

  • Added support for extracting token usage from per-event usage fields in events.jsonl as a fallback when shutdown metrics are missing.
  • Extended the events.jsonl entry/data structs to capture additional usage token field shapes.
  • Added regression tests covering fallback aggregation and shutdown-vs-fallback precedence.
Show a summary per file
File Description
pkg/cli/copilot_events_jsonl.go Adds fallback token aggregation logic and supporting JSON fields for events.jsonl parsing.
pkg/cli/copilot_events_jsonl_test.go Adds tests to validate fallback aggregation and precedence over shutdown totals.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 2
  • Review effort level: Low

Comment on lines 297 to +300
metrics.TokenUsage = totalTokens
if metrics.TokenUsage == 0 && fallbackTokens > 0 {
metrics.TokenUsage = fallbackTokens
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit. Added sawShutdownModelMetrics := false and set it to true inside if entry.Data.ModelMetrics != nil within the session.shutdown case. The guard at the end now uses !sawShutdownModelMetrics && fallbackTokens > 0, so a present-but-empty (or present-but-zero) modelMetrics block correctly suppresses the fallback.

Comment on lines +339 to +345
if entry.Data.Usage != nil {
// ExtractJSONTokenUsage expects a full event/document map and knows how to
// parse usage when it appears at key "usage", so wrap the nested usage map.
if tokens := workflow.ExtractJSONTokenUsage(map[string]any{"usage": entry.Data.Usage}); tokens > 0 {
return tokens
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit. Replaced workflow.ExtractJSONTokenUsage with a direct extraction that reads only input_tokens (falling back to prompt_tokens) and output_tokens (falling back to completion_tokens), deliberately skipping cache_creation_input_tokens and cache_read_input_tokens. Extracted into a usageField helper to avoid duplicating the primary/alias lookup. Added a regression test (fallback excludes cache tokens from per-event usage) that verifies the cache fields are not counted.

@github-actions github-actions Bot mentioned this pull request Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 2 test(s): 2 design, 0 implementation, 0 violation(s).

📊 Metrics (2 tests)
Metric Value
Analyzed 2 (Go: 2, JS: 0)
✅ Design 2 (100%)
⚠️ Implementation 0 (0%)
Edge/error coverage 2 (100%)
Duplicate clusters 0
Inflation No (test:prod = 20:41 ≈ 0.49:1)
🚨 Violations 0
Test File Classification Issues
falls back to per-event usage when session.shutdown model metrics are absent copilot_events_jsonl_test.go:152 design_test / behavioral_contract None
prefers session.shutdown totals when both shutdown and per-event usage exist copilot_events_jsonl_test.go:161 design_test / behavioral_contract None

Verdict

Passed. 0% implementation tests (threshold: 30%). Both new test cases cover non-happy-path behaviors (fallback path and priority-rule), are fully assertion-guarded with descriptive messages, the file carries the required //go:build !integration tag, and no mock libraries are used. Test-to-production line ratio is 0.49:1 (well under the 2:1 threshold).

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • awmgmcpg

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "awmgmcpg"

See Network Configuration for more information.

🧪 Test quality analysis by Test Quality Sentinel · 38 AIC · ⌖ 15.7 AIC · ⊞ 6.8K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 100/100. 0% implementation tests (threshold: 30%). Both new test cases target behavioral contracts (fallback path + priority rule), carry required build tag, no mock library usage, and test-to-production ratio is well under 2:1.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Restore token usage metrics fallback

Overall: logic is correct, but observability for the new fallback path is broken.

The fallback mechanism (totalTokens == 0 → fallbackTokens) is sound and the precedence tests cover the key scenarios well. One actual bug in logging needs to be fixed.

Bug: Log lines report stale totalTokens=0 in the fallback path

Both the debug log (line 326) and the verbose user-facing message (line 331) still print the local totalTokens variable. In the exact scenario this PR fixes (no session.shutdown.modelMetrics), totalTokens remains 0 even after the fallback populates metrics.TokenUsage. This means:

  • Debug logs show totalTokens=0 for successful fallback runs.
  • Users running with --verbose see 0 tokens in the summary line.

Fix: use metrics.TokenUsage in both places after the fallback assignment (inline comments added).

What looks good

  • Precedence guarantee: shutdown totals always win when present.
  • Four-source fallback covers the known field-name variants.
  • New tests are focused and correctly anchor expected token counts.
  • extractFallbackEventTokens is isolated and easy to extend.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 36 AIC · ⌖ 6.5 AIC · ⊞ 4.9K

Comments that could not be inline-anchored

pkg/cli/copilot_events_jsonl.go:326

The summary log line (and the verbose message a few lines below) still uses the local totalTokens variable, which will be 0 in the fallback path this PR introduces. After the fallback assignment on line 298, the log will report totalTokens=0 even though metrics.TokenUsage was just set to fallbackTokens.

Consider using metrics.TokenUsage in the log (or logging fallbackTokens separately) to make observability accurate:

copilotEventsJSONLLog.Printf(&quot;Parsed events.jsonl: tur</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:331</summary>

The `verbose` user-facing message also prints `totalTokens` (the local variable), which will be `0` in the fallback path. This would show the user `0 tokens` in verbose output even after the fallback populated `metrics.TokenUsage`. Change to `metrics.TokenUsage`:

```go
fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
    fmt.Sprintf(&quot;Parsed events.jsonl: %d turns, %d tokens, %d tool calls&quot;,
        turns, metrics.TokenUsage, len(toolCallMap))))

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnosing-bugs, /tdd, and /codebase-design — requesting changes on one correctness issue and test-coverage gaps.

📋 Key Themes & Highlights

Key Themes

  • Verbose log bug (correctness): --verbose prints totalTokens (the shutdown accumulator), not metrics.TokenUsage, so it always shows 0 for sessions that take the fallback path — undoing the observability benefit of this fix.
  • Partial test coverage: Only the data.usage fallback path is exercised; the other three paths (data.input_tokens, top-level snake_case, top-level camelCase) are untested. extractFallbackEventTokens also lacks a direct unit test that documents its precedence rules.
  • Latent double-count risk: extractFallbackEventTokens runs on every event including session.shutdown. The TokenUsage == 0 guard prevents a collision today, but skipping the shutdown event explicitly is structurally safer.
  • Naming: AlternateUsageInputTokens / AlternateUsageOutputTokens don't communicate that these are the camelCase JSON variants; the Go field names should reflect the distinction more clearly.

Positive Highlights

  • ✅ Correct precedence design: shutdown metrics remain authoritative, fallback only activates when TokenUsage == 0
  • ✅ Good use of workflow.ExtractJSONTokenUsage to reuse existing parsing logic rather than duplicating it
  • ✅ PR description is clear and includes a code snippet illustrating the guard — well done
  • ✅ Regression test for precedence behaviour (shutdown + per-event both present) is exactly right

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 69 AIC · ⌖ 7.84 AIC · ⊞ 6.6K
Comment /matt to run again

Comments that could not be inline-anchored

pkg/cli/copilot_events_jsonl.go:332

[/diagnosing-bugs] The verbose log prints totalTokens (the shutdown-path accumulator), which is 0 when only the fallback path is active — so --verbose will always report 0 tokens for runs without session.shutdown.modelMetrics.

<details>
<summary>💡 Suggested fix</summary>

Change the verbose print to use metrics.TokenUsage, which holds the final resolved value:

fmt.Fprintln(os.Stderr, console.FormatSuccessMessage(
    fmt.Sprintf(&quot;Parsed events.jsonl: %d turns, %d tokens</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:221</summary>

**[/diagnosing-bugs]** `extractFallbackEventTokens` is called unconditionally for every event, including `session.shutdown`. If a shutdown event ever carries top-level `usage_input_tokens` / `usage_output_tokens` fields, those tokens will be double-counted: once in `totalTokens` (from `ModelMetrics`) and once in `fallbackTokens`.

&lt;details&gt;
&lt;summary&gt;💡 Suggested fix&lt;/summary&gt;

Guard the fallback accumulation to skip the shutdown event, since its tokens are already handled by the dedicated `sess…

</details>

<details><summary>pkg/cli/copilot_events_jsonl_test.go:160</summary>

**[/tdd]** The two new test cases only exercise `data.usage` (the first fallback path). The three other paths added in this PR — `data.input_tokens`/`data.output_tokens`, top-level `usage_input_tokens`/`usage_output_tokens`, and top-level `usageInputTokens`/`usageOutputTokens` — have no coverage.

&lt;details&gt;
&lt;summary&gt;💡 Suggested additional test cases&lt;/summary&gt;

Add table entries exercising each path independently, e.g.:

```go
{
    name: &quot;falls back to data.input_tokens / data.output_tokens&quot;,


</details>

<details><summary>pkg/cli/copilot_events_jsonl.go:49</summary>

**[/codebase-design]** The field names `AlternateUsageInputTokens` / `AlternateUsageOutputTokens` obscure what they represent. The JSON tags (`usageInputTokens` / `usageOutputTokens`) are already self-documenting; the Go names should mirror them.

&lt;details&gt;
&lt;summary&gt;💡 Suggested renaming&lt;/summary&gt;

```go
UsageInputTokensCamel  int `json:&quot;usageInputTokens,omitempty&quot;`
UsageOutputTokensCamel int `json:&quot;usageOutputTokens,omitempty&quot;`

Or, simply suffix the existing names to encode the casing var…

pkg/cli/copilot_events_jsonl.go:338

[/tdd] extractFallbackEventTokens is package-private but has no direct unit test — it is only exercised indirectly through the parseEventsJSONLMetrics integration path. A direct unit test would make the precedence rules between the four fallback paths explicit and independently verifiable.

<details>
<summary>💡 Suggested unit test skeleton</summary>

func TestExtractFallbackEventTokens(t *testing.T) {
    tests := []struct {
        name  string
        entry copilotEventsJSONLEn</details>

@pelikhan

pelikhan commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

…raction

- Use sawShutdownModelMetrics boolean instead of TokenUsage==0 guard so
  a present-but-empty modelMetrics block is treated as authoritative
- Replace workflow.ExtractJSONTokenUsage with direct input+output
  extraction (no cache tokens) via new usageField helper
- Add tests: zero-but-present shutdown, cache exclusion from fallback

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan merged commit dfac557 into main Jul 2, 2026
28 of 29 checks passed
@pelikhan pelikhan deleted the copilot/deep-report-reopen-token-usage-collection branch July 2, 2026 21:19
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Great work on this fix! 🎉 The token metrics fallback logic is clearly reasoned — preserving session.shutdown.modelMetrics as authoritative while gracefully handling the zero-token case with per-event usage fields is exactly the right approach.

The PR description is thorough (root cause, fallback sources, behavioral guardrail, and the inline code snippet), and the regression tests cover both the fallback aggregation path and the precedence behavior when shutdown totals are present. This looks ready for review.

Generated by ✅ Contribution Check · 184.6 AIC · ⌖ 12.9 AIC · ⊞ 6.3K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[deep-report] Re-open fleet-wide token_usage collection — TokenUsage=0 for ~13 consecutive days

3 participants