Step 1: Generate a base image using the Kodak-K9 model (or any model you like, really)
https://civitai.com/models/1285481?modelVersionId=1450374

Height should be greater than width: WAN 2.2 I2V only works with portrait-layout images and will crop out whatever doesn't fit its preferred resolution.
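
To see what that crop will cost you before generating, here's a small sketch that computes a center-crop box for a portrait target. The 480x832 bucket is an assumption on my part; check the resolution your workflow actually uses.

```python
# Sketch: center-crop box for a portrait aspect ratio. The default
# 480x832 target is an assumption, not necessarily what your WAN
# workflow uses - adjust target_w/target_h to match.

def portrait_crop(width, height, target_w=480, target_h=832):
    """Return the (x, y, w, h) center-crop box for a target portrait aspect."""
    target_aspect = target_w / target_h  # ~0.577 for 480x832
    if width / height > target_aspect:
        # Image is too wide for the target aspect: crop the sides.
        new_w = round(height * target_aspect)
        return ((width - new_w) // 2, 0, new_w, height)
    # Image is too tall: crop top and bottom.
    new_h = round(width / target_aspect)
    return (0, (height - new_h) // 2, width, new_h)

print(portrait_crop(1024, 1536))  # wider than 480:832, so the sides get cropped
```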

A Noob-compatible, anatomically correct genitals LORA was used, since Kodak is Noob-based. I don't have a catbox link for it, so I've just included the file here.

The image file here should have its metadata intact; it was generated using A1111 WebUI reForge. Forgive the schizo prompt, I'm sure it's possible to improve.

If using the cum LORA below, take some care with the image you use. For example, an image with paws too close together will almost always generate cum coming from them, no matter your negative prompting. Similarly, an image where the female character has her mouth open or head pointed down even slightly will frequently vomit cum, again regardless of negative prompting. If these are things you're going for, great! Otherwise, prepare to run a lot of generations.

Step 2: Set up ComfyUI with WAN 2.2
I used DaSiWa's WAN 2.2 I2V MidnightFlirt models: 
https://civitai.com/models/1981116?modelVersionId=2388627
https://civitai.com/models/1981116?modelVersionId=2388548

There's a newer version here (TastySin) that may be better; as of this writing it's not yet publicly released:
https://civitai.com/models/1981116?modelVersionId=2512098
https://civitai.com/models/1981116?modelVersionId=2512333

Note that you will need both the High and Low models. Place them in ComfyUI/models/checkpoints.

You will also need some files from WAN 2.1:

Download the text encoder, place it in ComfyUI/models/text_encoders:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors?download=true

Download the WAN 2.1 VAE, place it in ComfyUI/models/vae/Wan:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors?download=true

Download clip_vision_h, place it in ComfyUI/models/clip_vision:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors?download=true

Download the workflow for ComfyUI here:
https://civitai.com/models/1823089

I used the FastFidelity C-AiO workflow (noting the specific one in case more get uploaded there).

I also used a cumshot LORA (again, you will need one High and one Low LORA):
https://civitai.com/models/1946997?modelVersionId=2203637
https://civitai.com/models/1946997?modelVersionId=2203608

Place them in ComfyUI/models/loras. Note that I've had VERY mixed results with this LORA; you may want to look for another. All of the cum LORAs are very sensitive to your positive prompt - describing in detail what you want to see in the video will usually lead to better results.

Step 3: Run WAN 2.2 on your base image, enabling the option to Extract Last Frame. Chain this until you reach your desired video length. Finally, use ffmpeg to concatenate the videos together.
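
The concatenation step can be sketched like this using ffmpeg's concat demuxer; the file names are placeholders for your actual stage outputs:

```python
# Sketch of the final join: lossless concatenation via ffmpeg's concat
# demuxer. All segments must share the same codec/resolution, which they
# will if they came from the same WAN workflow.
import os
import subprocess

def write_concat_list(parts, list_path):
    """Write an ffmpeg concat-demuxer list file, one segment per line."""
    with open(list_path, "w") as f:
        for p in parts:
            f.write(f"file '{os.path.abspath(p)}'\n")

def build_concat_cmd(list_path, output):
    """ffmpeg command that joins the listed segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]

# Usage (uncomment to actually run ffmpeg):
# write_concat_list(["stage1.mp4", "stage2.mp4", "stage3_0.mp4"], "list.txt")
# subprocess.run(build_concat_cmd("list.txt", "final.mp4"), check=True)
```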

I've included a Python script to perform all of this automatically using the ComfyUI API. The provided config will generate a common stage 1 (slower thrusting), stage 2 (faster thrusting), then multiple stage 3s (cum inside). Once the videos are generated, the script joins them together using ffmpeg. Stage 3 is generated multiple times because the LORA frequently produces bad results, even with a negative prompt.
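
Purely as an illustration of the staging idea - this is NOT the schema of the included config_sex.json, just a hypothetical shape - a staged config might look like:

```json
{
  "stages": [
    {"name": "stage1", "prompt": "slow thrusting ...", "repeats": 1},
    {"name": "stage2", "prompt": "fast thrusting ...", "repeats": 1},
    {"name": "stage3", "prompt": "cum inside ...",     "repeats": 5}
  ]
}
```

Each stage starts from the last frame of the previous one, and the repeats of the final stage are alternative takes you can pick from.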

Example usage: python3 -i 00015-1530528807.png -c config_sex.json --delete

The script was cobbled together from the public API example script, with an addition for uploading images. It's probably not the most readable, and it's very tied to the included exported workflows. If you're running on Windows it will likely need some modifications to properly set up the path for ffmpeg.

To export your own workflow for use with this script or any other script automating ComfyUI, import it into ComfyUI's web interface, set it up for generation (select models, select settings you want to use, include LORAs if desired), then use File > Export (API). The provided config may need modification if node IDs change, but if the standard WAN nodes are used that should be the only change necessary.
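
As a sketch of what queueing an API export looks like (assuming a local ComfyUI on the default port 8188; the node id "3" and the "seed" input below are examples, use the ids from your own exported JSON):

```python
# Hedged sketch: queue an API-format workflow export against a local ComfyUI.
# The server address, node id "3", and "seed" input name are assumptions -
# substitute the ids and inputs from your own export.
import json
import urllib.request

def build_payload(graph, overrides=None):
    """Apply {node_id: {input_name: value}} overrides, return the /prompt body."""
    for node_id, inputs in (overrides or {}).items():
        graph[node_id]["inputs"].update(inputs)
    return json.dumps({"prompt": graph}).encode("utf-8")

def queue_workflow(graph, overrides=None, server="http://127.0.0.1:8188"):
    """POST the workflow to ComfyUI's /prompt endpoint. The response JSON
    includes a prompt_id you can poll via /history/<prompt_id>."""
    req = urllib.request.Request(
        f"{server}/prompt",
        data=build_payload(graph, overrides),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage:
# with open("workflow_api.json") as f:
#     graph = json.load(f)
# queue_workflow(graph, overrides={"3": {"seed": 123456}})
```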

With a 3090 it takes approximately 2.5 minutes per 7-second video, or approximately 18 minutes total for a run on a single image when repeating stage 3 five times. If you have multiple GPUs with 20 GB or more VRAM each, there's a ComfyUI extension that automatically replaces the normal model-loading nodes with ones you can assign per CUDA device. Splitting the High and Low models between different GPUs and loading the other models/LORAs in the remaining VRAM will probably increase performance significantly.
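
For reference, the quoted total is just clip count times per-clip time:

```python
# Back-of-envelope check on the timing above: one stage-1 clip, one stage-2
# clip, and five stage-3 attempts at ~2.5 minutes each.
clips = 1 + 1 + 5
total_minutes = clips * 2.5
print(total_minutes)  # 17.5 - i.e. the "approximately 18 minutes" above
```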


Bonus Step 4: Convert the 2D video to 3D stereoscopic
This uses VisionDepth3D: https://github.com/VisionDepth/VisionDepth3D

If on Windows:
Just download and use their installers for easy mode.

If on Linux/you don't want to use their installers:
Clone https://github.com/VisionDepth/VisionDepth3D (the installers are Windows-only and not needed for this route).
You do not actually need conda to use this.
Download required weights: See https://github.com/VisionDepth/VisionDepth3D/blob/Main-Stable/weights/WEIGHTS_README_PLACEHOLDER.md
Place weights in VisionDepth3D/weights with the directory structure defined in that README file
Open a terminal under the VisionDepth3D directory
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install torchvision
python3 VisionDepth3D.py

In the UI:
Use Depth Estimation to process the video
    Select a model from the dropdown
    Local model detection is finicky - it will auto-download one for you if needed
    Output to the VisionDepth3D directory
Use 3D Video Generator to generate the video
    Input Video is the original video
    Depth Map is the video created in the Depth Estimation step
    Output Video is your desired output path
    Open Encoder Settings and select Vertical aspect ratio (9:16), check the box to use ffmpeg, and select an NVENC codec (I used H264 for maximum compatibility)
    Open Processing Options, check the box for Preserve Original Aspect Ratio
