fix: enable Wan/TAEHV video generation on the Metal backend (#850) #1731
Merged
The Metal backend aborts with "unsupported op 'IM2COL_3D'" when running Wan 2.1/2.2 video generation (leejet#850):
- ggml_ext_conv_3d: call ggml_conv_3d_direct (GGML_OP_CONV_3D, which the Metal backend supports) instead of ggml_conv_3d, which decomposes into GGML_OP_IM2COL_3D (unsupported on Metal).
- ggml_ext_pad_ext: Metal's PAD kernel only implements right-padding. When any left-pad is requested, decompose into a right-pad of lp+rp followed by ggml_roll by lp (ROLL is supported on Metal; shift < ne always holds because ne grew by lp+rp).
- wan_vae.hpp / tae.hpp: route direct ggml_pad_ext calls that use left-padding through ggml_ext_pad_ext.
Tested on an M4 Pro (24 GB): Wan 2.2 TI2V-5B q8_0, t2v and i2v, with both TAEHV and full VAE decode.
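The left-pad decomposition can be checked on a plain 1-D buffer. This is a minimal standalone C++ sketch, not the sd.cpp code: pad_lr, pad_right, roll, and pad_lr_via_roll are hypothetical helpers, and roll is assumed to shift elements toward higher indices with wrap-around, as ggml_roll does for a positive shift. Right-padding by lp + rp leaves lp + rp trailing zeros; rolling by lp moves lp of them to the front, which is exactly a left-pad of lp plus a right-pad of rp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper: zero-pad lp elements on the left and rp on the right.
std::vector<float> pad_lr(const std::vector<float>& x, size_t lp, size_t rp) {
    std::vector<float> out(lp + x.size() + rp, 0.0f);
    for (size_t i = 0; i < x.size(); ++i) out[lp + i] = x[i];
    return out;
}

// Hypothetical helper: right-only zero-padding (the form Metal's PAD kernel handles).
std::vector<float> pad_right(const std::vector<float>& x, size_t rp) {
    std::vector<float> out(x);
    out.resize(x.size() + rp, 0.0f);
    return out;
}

// Hypothetical helper: circular shift toward higher indices, out[(i + shift) % n] = x[i].
std::vector<float> roll(const std::vector<float>& x, size_t shift) {
    const size_t n = x.size();
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) out[(i + shift) % n] = x[i];
    return out;
}

// Left/right padding emulated as a right-pad by lp + rp followed by a roll by lp.
std::vector<float> pad_lr_via_roll(const std::vector<float>& x, size_t lp, size_t rp) {
    std::vector<float> padded = pad_right(x, lp + rp);
    // Mirrors the shift < ne condition: the padded length x.size() + lp + rp
    // is strictly greater than lp whenever x is non-empty.
    assert(x.empty() || lp < padded.size());
    return roll(padded, lp);
}
```

For x = {1, 2, 3}, lp = 2, rp = 1, both pad_lr and pad_lr_via_roll produce {0, 0, 1, 2, 3, 0}. Since ggml tensors never have a zero-sized dimension, the shift condition always holds in practice.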
- ggml_ext_pad_ext and ggml_ext_conv_3d now take the runner backend and only use the fallbacks (pad+roll, conv_3d_direct) when ggml_backend_supports_op reports the native op as unsupported. This keeps graphs unchanged on other backends: CUDA implements GGML_OP_IM2COL_3D but not GGML_OP_CONV_3D, so switching unconditionally would break it.
- Route the LTX audio VAE left-pad call sites through ggml_ext_pad_ext as well (compile-tested only; the model does not fit in 24 GB).
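The gating rule can be modeled as a pure function of a backend's capabilities. This is an illustrative sketch, not the sd.cpp implementation: MockBackend, conv3d_lowering, pad_lowering, and the capability label "PAD_LEFT" are hypothetical, and in the real code the check is ggml_backend_supports_op on the native op. The design point it shows is that the probe tests the native op (IM2COL_3D, left-padded PAD) rather than the fallback, so a backend that already runs the original graph keeps it.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a backend; supports() plays the role that
// ggml_backend_supports_op plays in sd.cpp.
struct MockBackend {
    std::string name;
    std::set<std::string> supported_ops;
    bool supports(const std::string& op) const { return supported_ops.count(op) > 0; }
};

// 3-D convolution: keep the native IM2COL_3D lowering unless the backend
// reports it unsupported, then fall back to the direct CONV_3D op.
std::string conv3d_lowering(const MockBackend& b) {
    return b.supports("IM2COL_3D") ? "IM2COL_3D" : "CONV_3D";
}

// Padding: a single native PAD when no left-pad is needed or the backend
// handles left-padding ("PAD_LEFT" is a hypothetical capability label);
// otherwise a right-only PAD followed by ROLL.
std::string pad_lowering(const MockBackend& b, bool needs_left_pad) {
    if (!needs_left_pad || b.supports("PAD_LEFT")) return "PAD";
    return "PAD+ROLL";
}
```

With a Metal-like capability set (CONV_3D, PAD, and ROLL, but no IM2COL_3D or left-pad support), both fallbacks fire. With a CUDA-like set that has IM2COL_3D but not CONV_3D, the convolution keeps its original lowering, which is exactly the case an unconditional switch would have broken. Beyond those two facts from the PR, the capability sets are illustrative.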
Owner: Thank you for your contribution.
Summary
Wan 2.1/2.2 video generation aborts on the Metal backend with unsupported op 'IM2COL_3D' (#850). There are two separate problems:
1. ggml_ext_conv_3d builds ggml_conv_3d, which decomposes into GGML_OP_IM2COL_3D, an op the Metal backend does not implement.
2. Left-padding is emitted as GGML_OP_PAD ops, and Metal's PAD kernel only implements right-padding (its supports_op requires op params 0/2/4/6, the left pads, to be zero).

This PR fixes both on the sd.cpp side without touching ggml (per CONTRIBUTING, ggml updates are maintainer-only). Both fixes are gated on ggml_backend_supports_op, so graphs on backends that already work are byte-for-byte unchanged:
- ggml_ext_conv_3d now takes the runner backend and falls back to ggml_conv_3d_direct (GGML_OP_CONV_3D, which Metal implements) only when the backend does not support the probed IM2COL_3D op. The probe matters: CUDA implements IM2COL_3D but not GGML_OP_CONV_3D, so switching unconditionally would break CUDA video generation.
- ggml_ext_pad_ext now takes the runner backend; when the native (left-)pad op is unsupported, it decomposes into a right-pad by lp+rp followed by ggml_roll by lp (GGML_OP_ROLL is unconditionally supported on Metal; the shift < ne assertion always holds because ne grew by lp+rp).
- Direct ggml_pad_ext calls with left-padding in wan_vae.hpp, tae.hpp, and ltx_audio_vae.hpp are routed through ggml_ext_pad_ext and pass the runner backend.

Once the missing Metal kernels land in ggml (see below), the gates automatically stop firing, so no cleanup is required to benefit.
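Falling back from the IM2COL_3D lowering to GGML_OP_CONV_3D changes which kernels run, not the convolution being computed. A minimal 1-D sketch illustrates this, assuming stride 1, no padding or dilation, and a row-major [channel][position] layout; the function names are hypothetical and none of this is ggml code. Unfolding each receptive field into a row (im2col) and multiplying by the flattened weights gives the same result as the direct loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x: [c_in][len], w: [c_out][c_in][k], y: [c_out][len - k + 1], all row-major.

// Direct convolution: accumulate each output from its receptive field.
std::vector<float> conv1d_direct(const std::vector<float>& x, const std::vector<float>& w,
                                 size_t c_in, size_t len, size_t c_out, size_t k) {
    assert(len >= k);
    const size_t out_len = len - k + 1;
    std::vector<float> y(c_out * out_len, 0.0f);
    for (size_t o = 0; o < c_out; ++o)
        for (size_t t = 0; t < out_len; ++t) {
            float acc = 0.0f;
            for (size_t c = 0; c < c_in; ++c)
                for (size_t j = 0; j < k; ++j)
                    acc += w[(o * c_in + c) * k + j] * x[c * len + t + j];
            y[o * out_len + t] = acc;
        }
    return y;
}

// im2col: one row of length c_in * k per output position.
std::vector<float> im2col1d(const std::vector<float>& x, size_t c_in, size_t len, size_t k) {
    assert(len >= k);
    const size_t out_len = len - k + 1;
    std::vector<float> cols(out_len * c_in * k);
    for (size_t t = 0; t < out_len; ++t)
        for (size_t c = 0; c < c_in; ++c)
            for (size_t j = 0; j < k; ++j)
                cols[t * c_in * k + c * k + j] = x[c * len + t + j];
    return cols;
}

// Convolution as im2col followed by a matrix product with the flattened weights.
std::vector<float> conv1d_im2col(const std::vector<float>& x, const std::vector<float>& w,
                                 size_t c_in, size_t len, size_t c_out, size_t k) {
    const size_t out_len = len - k + 1, row = c_in * k;
    const std::vector<float> cols = im2col1d(x, c_in, len, k);
    std::vector<float> y(c_out * out_len, 0.0f);
    for (size_t o = 0; o < c_out; ++o)
        for (size_t t = 0; t < out_len; ++t) {
            float acc = 0.0f;
            for (size_t r = 0; r < row; ++r) acc += w[o * row + r] * cols[t * row + r];
            y[o * out_len + t] = acc;
        }
    return y;
}
```

With two input channels {1, 2, 3, 4} and {5, 6, 7, 8} and a single kernel {1, 0, 0, 1} (k = 2), both functions return {7, 9, 11}. Here they sum in the same order and match exactly; separate backend kernels may sum in a different order and differ by floating-point rounding.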
Related Issue / Discussion
… IM2COL_3D, PAD left-padding). It is still open/unmerged, and per CONTRIBUTING ggml bumps are maintainer-only, so this PR makes video generation work with the currently vendored ggml. The two are complementary.
Additional Information
Verified on an M4 Pro (24 GB, macOS 15 / Darwin 25.5), Metal backend:
- … ggml_abort with unsupported op 'IM2COL_3D' before the first diffusion step).
- … taew2_2.safetensors) and the full Wan 2.2 VAE.
- … --diffusion-fa --offload-to-cpu → ~5 min end-to-end.

Not covered: the two LTX audio VAE left-pad sites received the same mechanical change and compile, but LTX audio was not run (the model does not fit in 24 GB). On backends with native left-pad support, the gate keeps the original single PAD op there, so only Metal behavior changes.