Qwen Image Edit ComfyUI Workflow

Qwen ComfyUI Explained: The Benefits of Building Locally Hosted Pipelines

Our team at MT Research Labs, part of Mind Theory, uses Qwen-Image-Edit.
This ComfyUI pipeline feeds the input image simultaneously into Qwen2.5-VL (for visual semantic control) and the VAE encoder (for visual appearance control), enabling both semantic and appearance editing.
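Conceptually, the dual-path conditioning works as sketched below. The two encoder functions are hypothetical stand-ins for illustration only; they are not the actual Qwen2.5-VL or ComfyUI VAE APIs.

```python
import numpy as np

def semantic_encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for Qwen2.5-VL: captures *what* is in the image."""
    # A real vision-language model returns token embeddings; we fake a pooled vector.
    return image.mean(axis=(0, 1))  # shape: (channels,)

def vae_encode(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the VAE encoder: preserves *how* the image looks."""
    # A real VAE downsamples to a latent grid; we fake it with 8x8 average pooling.
    h, w, c = image.shape
    return image.reshape(h // 8, 8, w // 8, 8, c).mean(axis=(1, 3))

image = np.random.rand(64, 64, 3)

semantic_cond = semantic_encode(image)   # drives semantic edits (pose, expression)
appearance_cond = vae_encode(image)      # anchors appearance (clothes, textures)

# The diffusion model receives both conditioning signals at once, which is why
# edits can change semantics without losing the subject's original appearance.
print(semantic_cond.shape, appearance_cond.shape)
```

Because both signals are present at every denoising step, semantic instructions ("turn the head") do not erase appearance details (fabric texture, facial features).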

In short, a workflow like this offers benefits such as:

  1. Changing the angle of a person, character, or scene while maintaining consistency of clothing and facial features
  2. Changing the expression of an actor, person, or character without reshooting photography or video

Wan 2.2 Animate (Image to Video)

A model developed by Alibaba that enables the transformation of static images into short video sequences.

Wan2.2 represents a significant leap in motion generation capabilities, trained on a substantially expanded dataset featuring 65.6% more images and 83.2% more videos than its predecessor. This enriched training data markedly improves the model's ability to generalize across key dimensions: motion fluidity, semantic coherence, and aesthetic quality. As a result, Wan2.2 achieves top-tier performance against both open-source and closed-source models, setting a new benchmark in AI-driven motion synthesis.

Here's an example we created with the same character as above.

Qwen Image Blend

This functionality allows users to combine people, scenes, or products into a single, cohesive image.

Prompt-based merging: users can upload multiple images and provide a prompt instructing the model on how to combine them. For example: "Place the armchair in the living room, blend the image, and correct the product's perspective and lighting so it blends into the background."

Here's an example we created.

The resulting image shows details such as lighting changes and accurate shadows. You can also upload up to three objects and combine them into a single image. The workflow automatically removes each background, isolates the product or subject, and places it onto your chosen new background with no extra editing needed.
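The final compositing step of this workflow can be sketched with Pillow. This is an illustration rather than the actual ComfyUI node graph, and it assumes background removal has already produced RGBA cut-outs (subjects on transparent backgrounds).

```python
from PIL import Image

def composite_products(background: Image.Image, cutouts: list[Image.Image],
                       positions: list[tuple[int, int]]) -> Image.Image:
    """Paste RGBA cut-outs (subjects with transparent backgrounds) onto a scene."""
    canvas = background.convert("RGBA")
    for cutout, pos in zip(cutouts, positions):
        # The alpha channel acts as the paste mask, so only the subject lands
        # on the canvas; transparent pixels leave the background untouched.
        canvas.paste(cutout, pos, mask=cutout)
    return canvas.convert("RGB")

# Toy example: a grey "room" with two solid-colour "products".
room = Image.new("RGB", (320, 240), (200, 200, 200))
chair = Image.new("RGBA", (60, 80), (120, 60, 20, 255))
lamp = Image.new("RGBA", (30, 90), (240, 220, 80, 255))

scene = composite_products(room, [chair, lamp], [(40, 140), (220, 120)])
scene.save("composited_scene.png")
```

In the real workflow, the lighting and shadow adjustments mentioned above are done by the diffusion model after compositing, not by this paste step.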

More on Qwen Multi-Angle Capability

Qwen refers to the large language model family built by Alibaba Cloud. Qwen-Image-Edit-2509 is a camera-aware image-editing model built with the Lightning adapter and the dx8152 multi-angle LoRA fused in by default. A single upload plus a few camera sliders is enough to rotate, tilt, or zoom the virtual camera while keeping subjects, lighting, and texture consistent.
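One simple way such camera sliders can be wired up is to translate slider values into a text instruction appended to the edit prompt. The phrasing below is a hypothetical mapping for illustration; a real multi-angle LoRA such as dx8152's defines its own trigger words and value ranges.

```python
def camera_prompt(rotate_deg: float = 0.0, tilt_deg: float = 0.0,
                  zoom: float = 1.0) -> str:
    """Turn camera slider values into a text instruction for the edit model.

    The wording is a hypothetical example, not the LoRA's actual trigger words.
    """
    parts = []
    if rotate_deg:
        side = "left" if rotate_deg < 0 else "right"
        parts.append(f"rotate the camera {abs(rotate_deg):.0f} degrees to the {side}")
    if tilt_deg:
        direction = "down" if tilt_deg < 0 else "up"
        parts.append(f"tilt the camera {abs(tilt_deg):.0f} degrees {direction}")
    if zoom != 1.0:
        parts.append(("zoom in" if zoom > 1.0 else "zoom out") + f" {zoom:.1f}x")
    # Always ask the model to hold identity and lighting steady.
    parts.append("keep the subject, lighting, and textures unchanged")
    return ", ".join(parts)

print(camera_prompt(rotate_deg=45, tilt_deg=-10, zoom=1.5))
# prints: rotate the camera 45 degrees to the right, tilt the camera 10 degrees down,
#         zoom in 1.5x, keep the subject, lighting, and textures unchanged
```

In ComfyUI this kind of mapping would live in a small text-concatenation node feeding the prompt input of the sampler.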

Here's an example we created.

The image above illustrates a novel camera angle synthesized by the model rather than captured by a physical camera.
We use an AMD Ryzen 9 9950X3D with an Nvidia RTX 5090 to generate such images and videos.

If your organisation would like us to set up a full locally hosted ComfyUI toolkit, complete with hardware and tailored to your workflow, feel free to reach out. Our team can design a secure, private, and optimised pipeline that lets you generate images and run advanced AI workflows directly on your own machine. Send us an email and we will be happy to discuss how to integrate it into your setup.
