Netflix – yes, Netflix – jumps on the AI bandwagon with video editor

A new Netflix model promises to rewrite the way we make movies.
Just imagine this. As the director of the multi-million dollar epic Car Crash III: Suddenest Impact, you've just finished filming the finale where your star, Cruz Control, drives straight into an onrushing semi. The collision is spectacular. Cruz's car – operated remotely – explodes on impact, scattering debris across the highway. It's glorious. You high-five Cruz, moping beside you at the camera monitor station as his lucrative franchise career concludes, and head to the craft services truck.
Your producer, Maya Cash, grabs you by the shoulder. "You're not going to want to hear this," she says. "But what if Cruz just drives away into the sunset? What if he doesn't die after all?"
You pause and look at her over the rims of your Balenciaga sunglasses. "They're going to fund number four after all?"
Netflix's VOID model was made for that moment. Instead of reshooting the scene or redoing it entirely with computer graphics, you can just transform the crash footage into an open road denouement.
VOID stands for Video Object and Interaction Deletion. It's a VLM (vision-language model) that can not only erase objects from a scene but can also inpaint how remaining objects in the scene should behave without the influence of whatever was excised.
It can turn, for example, a head-on collision between two vehicles into a scene of a single vehicle driving down the road by removing one and generating video depicting the physically plausible path of the remaining vehicle. Post-impact debris, smoke, and flames – all erased and replaced with pristine pavement.
The video model's creators – Saman Motamed (Netflix/Sofia University), William Harvey (Netflix), Benjamin Klein (Netflix), Luc Van Gool (Sofia University), Zhuoning Yuan (Netflix), and Ta-Ying Cheng (Netflix) – describe VOID in a preprint paper [PDF] as "a video object removal framework designed to perform physically-plausible inpainting in these complex scenarios."
It can remove objects and model how remaining objects would behave in the absence of removed objects. So given a scene of a person jumping into a pool and splashing water on the ground, VOID could remove that person and generate video that would make the pool appear undisturbed, with no splash in the pool or on the ground.
VOID isn't limited to Netflix productions alone. The company has made its model available on Hugging Face, where anyone can install it.
There are other tools for altering video, such as Runway, Generative Omnimatte, DiffuEraser, ROSE, MiniMax-Remover, and ProPainter. The Netflix boffins, however, claim VOID outperforms these alternatives substantially. Based on a survey of 25 people across multiple scenarios, VOID was preferred 64.8 percent of the time, with Runway coming in a distant second at 18.4 percent.
"Through extensive evaluations against inpainting and text-guided video model baselines on synthetic and real-world data, we show that VOID excels at modeling complex dynamics which can follow on from object removal," the authors claim.
Whether the world really needs more convincing video manipulation is another question. ®