decode partial stdout/stderr in run_path TimeoutExpired branch#64
decode partial stdout/stderr in run_path TimeoutExpired branch#64HrachShah wants to merge 1 commit into
Conversation
When a subprocess call exceeds its timeout, subprocess.run returns a TimeoutExpired exception whose stdout/stderr attributes hold the raw bytes the OS had already delivered to the pipe before the kill — even when the call itself was configured with encoding='utf-8' for a uniform str return on the normal completion path. The previous implementation just forwarded those bytes straight into the RunResult, so the result's stdout was bytes and stderr was str (or vice-versa) on timeout, depending on which stream had buffered output. A caller that did 'result.stdout + other_str' would get TypeError on timeout only; on success the same code worked. The decode here matches the encoding passed to subprocess.run so the result is uniformly typed regardless of whether the subprocess finished cleanly or was killed by the timeout. Adds a test that runs a long-running script with a short timeout, captures partial stdout, and asserts RunResult.stdout is a str.
|
@HrachShah please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
| # normal completion path, but TimeoutExpired still carries the | ||
| # raw bytes captured from the pipe before decoding — produce | ||
| # str here so RunResult is uniformly typed. | ||
| partial_out = e.stdout.decode("utf-8", errors="replace") if e.stdout else "" |
There was a problem hiding this comment.
📍 python/vscode_common_python_lsp/runner.py:139
Critical, platform-asymmetric regression: this fix crashes on Windows. The Skeptic, Architect, and Advocate all verified by execution on Windows/CPython 3.14 that TimeoutExpired.stdout/.stderr are already str there — CPython's subprocess.run calls process.communicate() after kill on _mswindows, which decodes because encoding="utf-8" is set. So .decode(...) raises AttributeError: 'str' object has no attribute 'decode' whenever there is partial output (the exact case this PR targets). The if e.stdout guard only covers the empty case. Pre-fix Windows behavior was already correct str; this makes it worse. Decode defensively instead, e.g. def _as_str(v): return "" if not v else (v.decode("utf-8", errors="replace") if isinstance(v, bytes) else v) then RunResult(_as_str(e.stdout), _as_str(e.stderr), None). The POSIX half (errors="replace" for a multibyte char severed at the kill boundary) is sound and should be preserved.
[verified]
| assert isinstance(result.stdout, str) | ||
| assert isinstance(result.stderr, str) | ||
| assert "partial line" in result.stdout | ||
|
|
There was a problem hiding this comment.
📍 python/tests/test_runner.py:73
This test only exercises the POSIX path and will itself AttributeError on Windows (where TimeoutExpired.stdout is already str), so a POSIX-only CI gives false confidence about the platform the current patch breaks. Add coverage that injects a fake TimeoutExpired carrying str output (Windows reality) in addition to bytes (POSIX), so both legs of the normalization are pinned. Separately, the genuinely subtle errors="replace" choice for a multibyte sequence truncated at the kill boundary has no test — consider emitting an incomplete multi-byte sequence before the kill to lock in that behavior.
[verified]
| # stdout/stderr must be str, not bytes, regardless of whether | ||
| # the subprocess had produced any output before being killed | ||
| assert isinstance(result.stdout, str) | ||
| assert isinstance(result.stderr, str) |
There was a problem hiding this comment.
📍 python/tests/test_runner.py:94
Review-rule violation (tests-no-partial-asserts). The subprocess output is deterministic (print('partial line') with -u yields exactly "partial line\n", and universal-newline translation normalizes any \r\n), and none of the rule's non-determinism exceptions (timestamps, random IDs, PIDs) apply. Replace assert "partial line" in result.stdout with assert result.stdout == "partial line\n" for an exact assertion.
[verified]
| # normal completion path, but TimeoutExpired still carries the | ||
| # raw bytes captured from the pipe before decoding — produce | ||
| # str here so RunResult is uniformly typed. | ||
| partial_out = e.stdout.decode("utf-8", errors="replace") if e.stdout else "" |
There was a problem hiding this comment.
📍 python/vscode_common_python_lsp/runner.py:135
The comment says "same encoding that was passed to subprocess.run," but the decode hardcodes the "utf-8" literal independently from the encoding="utf-8" argument three lines up — the two would silently desync if either changes or if run_path ever gains an encoding parameter. Prefer deriving the decode encoding from a single source. Also consider applying the same normalization to the use_stdin=True Popen timeout branch so both timeout paths share one rule.
[verified]
|
Solid catch on the POSIX bytes/str inconsistency, and the test is a good repro. The blocker is that the fix unconditionally calls .decode() on TimeoutExpired output, which crashes on Windows where that output is already str. Normalize defensively (isinstance check or only-decode-if-bytes) and add coverage for the str path before merge. |
|
🔒 Automated review in progress — @rchiodo is auto-reviewing this PR via the Review Center. Other automated reviewers defer to this claim to avoid duplicate, conflicting feedback. The claim is refreshed while the review is active; if it is not refreshed within the ownership timeout it is treated as vacant and another reviewer may take over. This is an automated marker. |
| # str here so RunResult is uniformly typed. | ||
| partial_out = e.stdout.decode("utf-8", errors="replace") if e.stdout else "" | ||
| partial_err = e.stderr.decode("utf-8", errors="replace") if e.stderr else "" | ||
| return RunResult(partial_out, partial_err, None) |
There was a problem hiding this comment.
📍 python/vscode_common_python_lsp/runner.py:136
The unconditional e.stdout.decode(...) assumes TimeoutExpired always carries bytes, but that is only true on POSIX. The Advocate, Skeptic, and Architect each independently verified (live, Python 3.14.3 / Windows) that subprocess.run(encoding="utf-8") populates TimeoutExpired.stdout as str on Windows, so this branch raises AttributeError: 'str' object has no attribute 'decode' — an uncaught crash thrown from inside run_path on every timeout, strictly worse than the bytes it replaced (which only blew up downstream and only for some callers). Make the conversion type-defensive, e.g. a helper def _as_text(v): return "" if not v else (v if isinstance(v, str) else v.decode("utf-8", errors="replace")) applied to both fields. The errors="replace" guard against a truncated multibyte sequence is correct and should be kept inside that helper.
[verified]
| # the subprocess had produced any output before being killed | ||
| assert isinstance(result.stdout, str) | ||
| assert isinstance(result.stderr, str) | ||
| assert "partial line" in result.stdout |
There was a problem hiding this comment.
📍 python/tests/test_runner.py:73
This test encodes the same false invariant as the fix: on Windows/newer CPython TimeoutExpired.stdout is already str, so the fixed code raises AttributeError before any assertion runs, making the test error (not pass) on Windows CI — the author's "18 passed" is POSIX-only. Make it platform/version-agnostic so it asserts the str contract on both the bytes-carrying (POSIX) and str-carrying (Windows) timeout behaviors, e.g. via monkeypatch/parametrize that forces e.stdout to each type.
[verified]
| # str here so RunResult is uniformly typed. | ||
| partial_out = e.stdout.decode("utf-8", errors="replace") if e.stdout else "" | ||
| partial_err = e.stderr.decode("utf-8", errors="replace") if e.stderr else "" | ||
| return RunResult(partial_out, partial_err, None) |
There was a problem hiding this comment.
📍 python/vscode_common_python_lsp/runner.py:136
The Skeptic and Architect both observe the sibling use_stdin=True timeout path already yields uniform str by doing process.kill(); process.communicate() (which decodes in text mode). Mirroring that proven kill-then-communicate pattern here would have produced str on all platforms and avoided this bug entirely; worth considering instead of decoding the exception attribute, to keep the two branches consistent.
[verified]
|
Good catch on the bytes-vs-str timeout bug, but the decode is only correct on POSIX. On Windows |
|
🔒 Automated review in progress — @rchiodo is auto-reviewing this PR via the Review Center. Other automated reviewers defer to this claim to avoid duplicate, conflicting feedback. The claim is refreshed while the review is active; if it is not refreshed within the ownership timeout it is treated as vacant and another reviewer may take over. This is an automated marker. |
| # stdout/stderr must be str, not bytes, regardless of whether | ||
| # the subprocess had produced any output before being killed | ||
| assert isinstance(result.stdout, str) | ||
| assert isinstance(result.stderr, str) |
There was a problem hiding this comment.
📍 python/tests/test_runner.py:92
Confirmed violation of review rule tests-no-partial-asserts.md: assert "partial line" in result.stdout is a banned substring check. The output is fully deterministic (print('partial line') always yields "partial line\n"), so an exact assertion is achievable and required: assert result.stdout == "partial line\n". Per the non-downgradeable rule gate this stays issue — the only way to waive it is to amend the rule doc, not this PR.
[verified]
|
The core fix is directionally right (RunResult should be uniformly typed), but the unconditional |
|
🔒 Automated review in progress — @rchiodo is auto-reviewing this PR. |
What
run_path(the no-use_stdinbranch) returns aRunResultwhosestdout/stderrwerebytesinstead ofstrwhenever the subprocess timed out.Why
subprocess.run(..., encoding="utf-8")decodes bytes from the pipe tostron the normal completion path, butsubprocess.TimeoutExpired.stdout/.stderrstill carry the raw bytes the OS delivered before the kill. The previous code forwarded those bytes straight intoRunResult(e.stdout or "", e.stderr or "", None), so:result.stdoutisstr(whatever the subprocess printed).result.stdoutisbytes(whatever was buffered before kill).Any caller that did
result.stdout + "\n"orf"tool said: {result.stdout}"would work fine on success andTypeError: can only concatenate str (not "bytes") to stron timeout — exactly the case where you most want the partial output to surface to the user.This is most visible when a formatter / linter is slow on a large file, hits the LSP-side
FORMATTING_TIMEOUT, and the caller wants to surface the partial stderr ("black: would reformat ..."). Today you getb"black: would reformat ...\n".Fix
Decode
e.stdoutande.stderrusing the same encoding that was passed tosubprocess.run, witherrors="replace"so a malformed UTF-8 partial byte at the cut-off point doesn't raise. The decode mirrors the encoding on the happy path so the result is uniformly typed.Added a test (
test_timeout_returns_str_stdout_stderr) that asserts the result isstrafter a forced timeout with partial output. The test fails on the pre-fix code with the exact TypeError callers would see at runtime.Repro
Test