fix: no-mmap safetensors loader for >4GB models on Windows ROCm/UMA by HD-LV · Pull Request #14587 · Comfy-Org/ComfyUI

HD-LV · 2026-06-22T17:56:24Z

Problem

On Windows with AMD ROCm using a UMA (Unified Memory Architecture) GPU (e.g. AMD Radeon 8050S / Strix Halo, gfx1151), loading safetensors files larger than ~4 GB causes an access violation crash.

Root cause: ROCm initializes by reserving ~14 GB of Windows virtual address space for the GPU. This exhausts the contiguous address space needed for mmap, so any file above ~4 GB fails to memory-map.

SD 1.5 checkpoints (~3.97 GB) are below the threshold and load fine. SDXL/RealVisXL (~6.46 GB) reliably crash.

Fix

In comfy/utils.py, when a .safetensors file exceeds 4 GB and CUDA/ROCm is active, route it through a new _load_safetensors_no_mmap() function that reads tensors sequentially via open() + seek() + read() instead of mmap. All other files continue using the existing mmap path — no behavior change for unaffected users.

Tested on

Device: AMD Radeon 8050S (gfx1151 / Strix Halo), 14.37 GB UMA VRAM
OS: Windows 11
ROCm: 6.5 (scottt/rocm-TheRock gfx1151 wheels)
PyTorch: 2.7.0a0
SD 1.5 (3.97 GB): unaffected, still uses mmap path ✅
RealVisXL V4.0 (6.46 GB): loads successfully, 768×1024 portrait ~5 it/s ✅

- torch 2.7.0a0 + ROCm 6.5 via scottt/rocm-TheRock gfx1151 wheels - numpy pinned to 1.26.4 for wheel compatibility - SD1.5 512x512 20 steps ~5 it/s confirmed stable - Saved workflow: sd15_test_rocm_workflow.json - AMD Radeon 8050S, 14.37 GB UMA VRAM correctly detected

Root cause: Strix Halo UMA ROCm init reserves ~14 GB of Windows virtual address space for GPU. This prevents safetensors from mmap-ing files larger than ~4 GB (SDXL fp16 ~6.5 GB), causing access violations. SD1.5 (3.97 GB) is below the threshold and unaffected. Fix in comfy/utils.py: - Add _LARGE_FILE_MMAP_THRESHOLD = 4_000_000_000 - Add _load_safetensors_no_mmap(): reads tensors via open()+seek()+read() instead of mmap, then clones each tensor for independent ownership - In load_torch_file(): route files >4 GB with CUDA active through _load_safetensors_no_mmap() automatically Tested: RealVisXL_V4.0.safetensors (6.46 GB) loads and generates 768x1024 portrait images at ~5 it/s on AMD Radeon 8050S (gfx1151). SD1.5 baseline unaffected (still uses original mmap path).

coderabbitai · 2026-06-22T18:00:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b4f2f59d-32f5-4c7e-8f34-7709e2c50033

📥 Commits

Reviewing files that changed from the base of the PR and between e912b91 and 0df0b0d.

📒 Files selected for processing (3)

baselines/system_info.txt
baselines/system_info_realvisxl.txt
comfy/utils.py

✅ Files skipped from review due to trivial changes (2)

baselines/system_info_realvisxl.txt
baselines/system_info.txt

🚧 Files skipped from review as they are similar to previous changes (1)

comfy/utils.py

📝 Walkthrough

Walkthrough

A new no-mmap fallback is introduced in comfy/utils.py for loading large safetensors files on Windows+CUDA systems. A constant _LARGE_FILE_MMAP_THRESHOLD (4 GB) and helper _load_safetensors_no_mmap are added; the helper parses the safetensors header and reads each tensor sequentially via seek+read, returning cloned PyTorch tensors. load_torch_file routes to this path when the file exceeds the threshold, aimdo is disabled, and CUDA is available. Alongside the code change, four baseline files are added: a pinned ROCm package environment list, two system-info snapshots (one for SD1.5, one for RealVisXL no-mmap), and a ComfyUI workflow JSON for SD1.5 ROCm testing.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: a no-mmap safetensors loader specifically targeting large models on Windows ROCm/UMA systems.
Description check	✅ Passed	The description comprehensively explains the problem (mmap failures with large files on Windows ROCm), root cause (virtual address space exhaustion), solution (sequential read fallback), and testing validation across different model sizes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@baselines/system_info_realvisxl.txt`:
- Around line 25-27: Replace the hardcoded personal account path in the cd
command with a generic environment variable or placeholder. In the line
containing cd C:\Users\LvHHu\ComfyUI, replace the specific username path LvHHu
with either %USERPROFILE%\ComfyUI (to use the Windows environment variable) or
<COMFYUI_ROOT> (to use a generic placeholder), making the instructions reusable
across different machines and preventing exposure of personal identifiers.

In `@baselines/system_info.txt`:
- Around line 31-33: The baseline documentation at lines 31-33 contains a
personal machine username in the file path C:\Users\LvHHu\ComfyUI, which exposes
private information and reduces portability across different machines. Replace
the hardcoded username LvHHu with a neutral environment variable placeholder
such as %USERPROFILE% so the instructions become C:\Users\%USERPROFILE%\ComfyUI,
or alternatively use a generic placeholder like <COMFYUI_ROOT> to make the
baseline documentation portable and suitable for general use.

In `@comfy/utils.py`:
- Around line 125-126: The _load_safetensors_no_mmap function currently ignores
the device parameter and always returns CPU tensors, which breaks API
consistency with the existing safetensors loading path. Modify the
_load_safetensors_no_mmap function to accept a device parameter and move the
loaded tensors to the specified device before returning them, ensuring it
matches the device handling behavior of the standard safetensors loading path.
Apply the same fix to any other locations mentioned (lines 161-163) where the
device semantics need to be preserved.
- Around line 158-161: The condition on line 158 that checks
torch.cuda.is_available() applies the slower no-mmap loading path too broadly to
all CUDA environments with files greater than 4 GB. Instead of using
torch.cuda.is_available(), scope this fallback to specifically target the
affected platform by checking for Windows (using sys.platform or platform
module) and optionally detecting ROCm specifically, so that only the systems
with the UMA GPU initialization issue use the _load_safetensors_no_mmap fallback
instead of unnecessarily adding load-time and RAM overhead to all other CUDA
systems.
- Around line 139-144: The condition `if raw:` in this block only checks if any
bytes were read, but does not verify that the complete expected amount was read.
A partial read returning fewer bytes than `end - start` would pass this check
and create a corrupted tensor, while a failed read would silently create an
empty tensor. Replace the `if raw:` check with a condition that validates the
length of `raw` equals the expected length `end - start`. If the read is
incomplete or returns the wrong number of bytes, treat it as corruption and
raise an error or handle it appropriately rather than silently creating an empty
tensor with `torch.empty(shape, dtype=dtype)`.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7fc4afed-789f-403f-a32e-f88f34c4b2e6

📥 Commits

Reviewing files that changed from the base of the PR and between 6978a46 and e912b91.

📒 Files selected for processing (5)

baselines/environment_rocm_working.txt
baselines/system_info.txt
baselines/system_info_realvisxl.txt
baselines/workflows/sd15_test_rocm_workflow.json
comfy/utils.py

- utils.py: add device param to _load_safetensors_no_mmap, move tensors to target device instead of always returning CPU tensors - utils.py: validate read length == expected bytes; raise RuntimeError on partial/corrupt reads instead of silently creating empty tensors - utils.py: scope no-mmap fallback to sys.platform == win32 to avoid unnecessary overhead on Linux/Mac CUDA systems; add sys import - baselines: replace hardcoded LvHHu username with %USERPROFILE% in startup commands for portability Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

HD-LV added 2 commits June 22, 2026 18:52

HD-LV requested review from Kosinkadink, alexisrolland, comfyanonymous, guill, kijai and rattus128 as code owners June 22, 2026 17:56

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread baselines/system_info_realvisxl.txt Outdated

Comment thread baselines/system_info.txt Outdated

Comment thread comfy/utils.py Outdated

Comment thread comfy/utils.py Outdated

Comment thread comfy/utils.py Outdated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: no-mmap safetensors loader for >4GB models on Windows ROCm/UMA#14587

fix: no-mmap safetensors loader for >4GB models on Windows ROCm/UMA#14587
HD-LV wants to merge 3 commits into
Comfy-Org:masterfrom
HD-LV:master

HD-LV commented Jun 22, 2026

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

HD-LV commented Jun 22, 2026

Problem

Fix

Tested on

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading