A friend recently asked, “How do AI tools make videos?”—so I put together this non-technical breakdown of how it all works. Inspired by Karen X. Cheng’s work, here’s a simple way to think about GenAI video and stylization (or whatever you want to call it).
Essentially, we want the ability to throw a video in and restyle it into another look. Do note that, due to the nature of locally hosted models, it is harder to get consistent video without flicker; even in the reference clips, the background and the foreground still tend to phase in and out across keyframes.
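To make that concrete, the simplest local approach is roughly: split the video into frames, push each frame through an image-to-image diffusion pass, then reassemble. Here is a minimal sketch using the diffusers library; the model name, prompt, and settings are placeholders, and because every frame is denoised independently, this is also exactly where the flicker comes from.

```python
# Minimal sketch: per-frame stylization of pre-extracted video frames.
# Nothing ties frame N's result to frame N+1's, hence the flicker.
import glob, os
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model; use whatever checkpoint you prefer
    torch_dtype=torch.float16,
).to("cuda")

prompt = "watercolor painting, soft pastel palette"  # target style (illustrative)
frames = sorted(glob.glob("frames/*.png"))           # frames extracted beforehand (e.g. with ffmpeg)
os.makedirs("styled", exist_ok=True)

for i, path in enumerate(frames):
    frame = Image.open(path).convert("RGB").resize((512, 512))
    out = pipe(
        prompt=prompt,
        image=frame,
        strength=0.45,       # lower = stays closer to the source footage
        guidance_scale=7.0,
    ).images[0]
    out.save(f"styled/{i:05d}.png")
```

Reassembling the styled frames back into a clip (again with ffmpeg) gives you the stylized video, flicker and all.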
Video references
[Reference clips (1) and (2) embedded here.]
Most of the time we are trying to hit somewhere in between, which goes to show the importance of knowing the workflow and knowing what to change. As with green-screen keying, the numbers change for every piece of footage, so there is always time spent dialing them in. In fact, GenAI production takes even more attempts to get the result you want, because the inherent randomness of the model comes into play.
It is possible to home in on an exact style with plenty of descriptors to kill the 'randomness', but this requires quite some time spent refining the prompt. From my tests, well-lit base footage shot in natural light tends to hold together better than darker, heavily tinted footage in terms of consistency.
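If it helps, the two levers I actually mean by 'killing the randomness' are a fixed seed and a very specific prompt. The rough sketch below reuses the `pipe` and `frame` from the earlier snippet; the prompt wording and numbers are made up for illustration.

```python
# Rough sketch: pin the seed and pile on descriptors to tame the randomness.
import torch

generator = torch.Generator(device="cuda").manual_seed(1234)  # same seed -> repeatable output

prompt = (
    "1990s cel animation, flat colours, thick clean outlines, "
    "soft natural daylight, pastel palette, subtle film grain"
)
negative_prompt = "photorealistic, blurry, extra limbs, text, watermark"

out = pipe(  # the img2img pipeline from the earlier sketch
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=frame,
    strength=0.4,
    guidance_scale=8.0,
    generator=generator,
).images[0]
```

The seed keeps repeated runs comparable, so you can tweak one descriptor at a time instead of chasing a moving target.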
Image references (online apps + workflows)
If you’ve ever wondered “How do pros get their AI videos to look so smooth?”—the answer is workflow layering. There are two main ways people tackle AI video stylization:
Locally Hosted Models (Runs on your own hardware)
You get more control, but it’s much harder to stabilize. Running these models requires a high-end GPU to handle multiple context frames, and even then, flicker is a persistent issue.
Online/Subscription-Based Tools (Cloud-powered AI)
Some online services blend AI-generated frames more coherently over time, making them computationally heavier—but since it’s all on their servers, you don’t have to worry about it. They used to be less flexible than running your own model, but with the latest Kling and Pika updates, setting up ComfyUI for full video generation is starting to feel practical—to me at least.
The best results often come from combining both approaches—leveraging local tools for customization and cloud-based AI for better temporal stability.
What else?
Another note: in local use, these video input modules can also drive pseudo real-time interactions, like the ones shown below.
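The bare-bones version of that idea is simple to sketch: grab camera frames in a loop, run each one through the same image-to-image pass with very few steps, and show the result. This assumes the `pipe` from the earlier snippets plus OpenCV for capture; the real thing runs through the video input modules mentioned above, and 'pseudo real-time' is doing a lot of work here, since a consumer GPU will lag well behind the camera.

```python
# Sketch of a pseudo real-time loop: webcam in, stylized frame out.
import cv2
import numpy as np
from PIL import Image

cap = cv2.VideoCapture(0)  # default webcam

while True:
    ok, bgr = cap.read()
    if not ok:
        break
    frame = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)).resize((512, 512))
    styled = pipe(  # img2img pipeline from the earlier sketches
        prompt="chalk sketch on a blackboard",  # illustrative style
        image=frame,
        strength=0.35,
        num_inference_steps=8,   # few steps to keep latency tolerable
        guidance_scale=5.0,
    ).images[0]
    cv2.imshow("styled", cv2.cvtColor(np.array(styled), cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```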
Final thoughts
AI-generated video is still evolving, and while it unlocks creative new possibilities—like making AI-assisted stop-motion animations or transforming home videos into different artistic styles—it’s not a plug-and-play solution.
You’ll need to:
- Experiment with different workflows.
- Balance style vs. consistency.
- Work around hardware limitations.
The good news? Open-source tools offer some interesting alternatives if you're patient enough to tinker with them. The bad news? If you want perfect, flicker-free AI videos, you might have to wait for future advancements (or go find Uncle Jensen Huang and ask for a high-end GPU).