
fix: enable Wan/TAEHV video generation on the Metal backend (#850) #1731

Merged
leejet merged 2 commits into leejet:master from mmandelker-code:metal-video-pad-fixes
Jul 2, 2026


@mmandelker-code

Contributor

Summary

Wan 2.1/2.2 video generation aborts on the Metal backend with unsupported op 'IM2COL_3D' (#850). There are two separate problems:

  1. ggml_ext_conv_3d builds ggml_conv_3d, which decomposes into GGML_OP_IM2COL_3D, an op the Metal backend does not implement.
  2. The Wan VAE, TAEHV, and LTX audio VAE build GGML_OP_PAD ops with left-padding, and Metal's PAD kernel only implements right-padding (its supports_op requires op params 0/2/4/6, the per-dimension left-pad amounts, to be zero).

This PR fixes both on the sd.cpp side, without touching ggml (per CONTRIBUTING, ggml updates are maintainer-only). Both fixes are gated on ggml_backend_supports_op, so graphs on backends that already work are byte-for-byte unchanged:

  • ggml_ext_conv_3d now takes the runner backend and falls back to ggml_conv_3d_direct (GGML_OP_CONV_3D, which Metal implements) only when the backend does not support the probed IM2COL_3D op. The probe matters: CUDA implements IM2COL_3D but not GGML_OP_CONV_3D, so switching unconditionally would break CUDA video generation.
  • ggml_ext_pad_ext now takes the runner backend; when the native (left-)pad op is unsupported, it decomposes into a right-pad by lp+rp followed by ggml_roll by lp (GGML_OP_ROLL is unconditionally supported on Metal; the shift < ne assertion always holds because ne grew by lp+rp).
  • Direct ggml_pad_ext calls with left-padding in wan_vae.hpp, tae.hpp, and ltx_audio_vae.hpp are routed through ggml_ext_pad_ext and pass the runner backend.
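
The pad decomposition in the second bullet can be checked on a single dimension without ggml. The sketch below is an illustration only, not the sd.cpp code: `pad_native`, `pad_right`, `roll`, and `pad_via_roll` are hypothetical helpers operating on a `std::vector<float>`, and `roll` is assumed to move elements toward higher indices with wrap-around. It shows that a right-pad by lp+rp followed by a roll by lp gives the same result as padding lp on the left and rp on the right, and that the shift stays below the padded length for any non-empty input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: zero-pad with lp elements on the left and rp on the right.
// Models what a native left+right PAD produces along one dimension.
std::vector<float> pad_native(const std::vector<float>& x, std::size_t lp, std::size_t rp) {
    std::vector<float> out(lp + x.size() + rp, 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[lp + i] = x[i];
    }
    return out;
}

// Right-only zero padding (the only form assumed available on the backend).
std::vector<float> pad_right(const std::vector<float>& x, std::size_t n) {
    std::vector<float> out(x);
    out.resize(x.size() + n, 0.0f);
    return out;
}

// Circular shift: element i moves to (i + shift) mod n.
std::vector<float> roll(const std::vector<float>& x, std::size_t shift) {
    const std::size_t n = x.size();
    assert(n > 0 && shift < n);  // mirrors the "shift < ne" requirement
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[(i + shift) % n] = x[i];
    }
    return out;
}

// Fallback: right-pad by lp+rp, then roll by lp so the trailing zeros
// wrap around to the front. Assumes x is non-empty when lp > 0.
std::vector<float> pad_via_roll(const std::vector<float>& x, std::size_t lp, std::size_t rp) {
    if (lp == 0) {
        return pad_right(x, rp);  // no left pad: plain right-pad is enough
    }
    return roll(pad_right(x, lp + rp), lp);
}
```

For a non-empty input of length ne, the padded length is ne+lp+rp, so a shift of lp is always strictly smaller than it. The lp zeros that wrap around to the front are exactly the left padding.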

Once the missing Metal kernels land in ggml (see below), the gates automatically stop firing, so no cleanup is required to benefit.
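
The gating logic can be modelled as follows. This is a simplified sketch under stated assumptions, not the actual implementation: `Op`, `Backend`, `choose_conv3d`, and `choose_pad` are hypothetical names, and a backend is reduced to a set of supported ops, whereas the real code probes a concrete op tensor with ggml_backend_supports_op. Left- and right-padding are split into two pseudo-ops here only to keep the model small.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for the ops discussed in this PR.
enum class Op { IM2COL_3D, CONV_3D, PAD_LEFT, PAD_RIGHT, ROLL };

// Hypothetical backend model: just the set of ops it reports as supported.
struct Backend {
    std::set<Op> supported;
    bool supports(Op op) const { return supported.count(op) != 0; }
};

enum class Conv3dPath { Im2col, Direct };
enum class PadPath { Native, RightPadThenRoll };

// Keep the original im2col-based path whenever the backend supports it;
// fall back to the direct conv op only when the probe fails.
Conv3dPath choose_conv3d(const Backend& b) {
    return b.supports(Op::IM2COL_3D) ? Conv3dPath::Im2col : Conv3dPath::Direct;
}

// Keep the single native PAD op unless a left pad is requested and the
// backend cannot do it; only then decompose into right-pad + roll.
PadPath choose_pad(const Backend& b, bool has_left_pad) {
    if (!has_left_pad || b.supports(Op::PAD_LEFT)) {
        return PadPath::Native;
    }
    return PadPath::RightPadThenRoll;
}
```

Because the choice depends only on the probe result, a backend that later gains the missing op gets the native path again with no change at the call sites.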

Related Issue / Discussion

Additional Information

Verified on an M4 Pro (24 GB, macOS 15 / Darwin 25.5), Metal backend:

  • Wan 2.2 TI2V-5B (Q8_0), umt5-xxl Q4_K_M: t2v and i2v both complete end-to-end (previously: ggml_abort with unsupported op 'IM2COL_3D' before the first diffusion step).
  • Decode verified with both TAEHV (taew2_2.safetensors) and the full Wan 2.2 VAE.
  • Example: 33 frames, 480x480, 20 steps, --diffusion-fa --offload-to-cpu → ~5 min end-to-end.

Not covered: the two LTX audio VAE left-pad sites received the same mechanical change and compile, but LTX audio was not run (the model does not fit in 24 GB). On backends with native left-pad support the gate keeps the original single PAD op there, so only Metal behavior changes.

Checklist

The Metal backend aborts with "unsupported op 'IM2COL_3D'" when running
Wan 2.1/2.2 video generation (leejet#850):

- ggml_ext_conv_3d: call ggml_conv_3d_direct (GGML_OP_CONV_3D, which the
  Metal backend supports) instead of ggml_conv_3d, which decomposes into
  GGML_OP_IM2COL_3D (unsupported on Metal).
- ggml_ext_pad_ext: Metal's PAD kernel only implements right-padding.
  When any left-pad is requested, decompose into a right-pad of lp+rp
  followed by ggml_roll by lp (ROLL is supported on Metal; shift < ne
  always holds because ne grew by lp+rp).
- wan_vae.hpp / tae.hpp: route direct ggml_pad_ext calls that use
  left-padding through ggml_ext_pad_ext.

Tested on an M4 Pro (24 GB): Wan 2.2 TI2V-5B q8_0, t2v and i2v, with
both TAEHV and full VAE decode.

- ggml_ext_pad_ext and ggml_ext_conv_3d now take the runner backend and
  only use the fallbacks (pad+roll, conv_3d_direct) when
  ggml_backend_supports_op reports the native op unsupported. This keeps
  graphs unchanged on other backends: CUDA implements GGML_OP_IM2COL_3D
  but not GGML_OP_CONV_3D, so switching unconditionally would break it.
- Route the LTX audio VAE left-pad call sites through ggml_ext_pad_ext
  as well (compile-tested only; the model does not fit in 24 GB).
@leejet leejet merged commit 7dab366 into leejet:master Jul 2, 2026
11 checks passed
@leejet

leejet commented Jul 2, 2026

Owner

Thank you for your contribution.


Development

Successfully merging this pull request may close these issues.

unsupported op 'IM2COL_3D' on Mac
