benchmarks: loc.js misses CRLF code fences and counts the whole response #339

Description

@Lakshya77089

loc.js extracts fenced code blocks with

/```[a-zA-Z0-9_+-]*\n([\s\S]*?)```/g

There is no \r? before the \n, so a CRLF fence (```js\r\n) never matches. With no block found, the metric falls back to counting the entire response, prose and fence lines included.

Its sibling extractor in correctness.js already handles this: /```(\w*)\r?\n([\s\S]*?)```/g. So on CRLF output the two benchmark files disagree: the correctness gate finds the code, while the LOC metric counts everything.
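As a quick check of the disagreement described above, the sketch below runs both quoted patterns against the same text with LF and with CRLF line endings. `countBlocks` is a hypothetical helper written for this illustration and is not part of either benchmark file.

```javascript
// Fence patterns as quoted in this issue.
const locFence = /```[a-zA-Z0-9_+-]*\n([\s\S]*?)```/g;   // loc.js
const correctnessFence = /```(\w*)\r?\n([\s\S]*?)```/g;  // correctness.js

// Hypothetical helper: how many fenced blocks does a pattern find?
function countBlocks(pattern, text) {
  return [...text.matchAll(pattern)].length;
}

const lf = 'Here:\n```js\nconst a = 1;\n```\nDone.';
const crlf = lf.replace(/\n/g, '\r\n');

console.log(countBlocks(locFence, lf));           // 1
console.log(countBlocks(locFence, crlf));         // 0, the fence is missed
console.log(countBlocks(correctnessFence, lf));   // 1
console.log(countBlocks(correctnessFence, crlf)); // 1
```

Only the loc.js pattern drops to zero matches on CRLF input, which is what triggers the whole-response fallback.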

Repro:

const loc = require('./benchmarks/loc.js');
const lf   = 'Here:\n```js\nconst a = 1;\nconst b = 2;\n```\nDone.';
console.log(loc(lf).score); // 2 (correct)
console.log(loc(lf.replace(/\n/g, '\r\n')).score); // 6 (counts prose + fences)


Impact: any CRLF in the model output, or any step that normalizes the text to CRLF, silently inflates the headline LOC number. The fix is to add `\r?` before the `\n` in the fence regex so it matches the pattern in `correctness.js`. PR incoming with a regression test.
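A minimal sketch of the proposed fix and the kind of regression check it implies. `countLoc` is a hypothetical stand-in for the extractor in loc.js, not its actual code. It assumes a line of code is any non-blank line inside a fenced block and reproduces the whole-response fallback described above.

```javascript
// Hypothetical stand-in for the loc.js extractor; the real module may differ.
const FENCE = /```[a-zA-Z0-9_+-]*\r?\n([\s\S]*?)```/g; // \r? added before \n

function countLoc(response) {
  const blocks = [...response.matchAll(FENCE)].map((m) => m[1]);
  // Fallback as described in the issue: no fence found, count the whole response.
  const text = blocks.length > 0 ? blocks.join('\n') : response;
  // Assumption: a line of code is any non-blank line; the split tolerates CRLF.
  return text.split(/\r?\n/).filter((line) => line.trim() !== '').length;
}

const lf = 'Here:\n```js\nconst a = 1;\nconst b = 2;\n```\nDone.';
const crlf = lf.replace(/\n/g, '\r\n');

console.log(countLoc(lf));   // 2
console.log(countLoc(crlf)); // 2
```

With the optional `\r` in place, both line-ending variants give the same count, so a regression test can simply assert that the LF and CRLF scores are equal.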
