November 25, 2023

Meta AI Announces "Emu Video", A Text-To-Video AI Tool

Meta AI announces a new AI model that generates video from a text description.

by Jim Clyde Monge

In the fast-paced world of AI, while text-to-image models have been rapidly advancing, text-to-video AI tools have lagged behind. There are a few video generator tools available, but only Runway’s Gen2 and Pika Labs have managed to produce truly compelling results.

Today, Meta AI announced its own AI video generator, Emu Video, and it looks amazing.

What is Emu Video?

Emu Video extends Emu, Meta's image generation model, to text-to-video generation. It is built on diffusion models, and its approach is both simple and effective.

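Meta's announcement also describes a dataset of 10 million synthesized samples, each with an input image, a task description, and a target output image, which Meta believes is the largest of its kind to date. In that announcement, this dataset was built for Emu Edit, the image-editing model released alongside Emu Video, rather than for Emu Video itself.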

Here's an example:

What do you think of these videos? I love how smooth the transitions between frames are. Meta has done an excellent job with this model.

How does it work?

Generating videos involves two steps:

First, generate an image conditioned on a text prompt
Then generate a video conditioned on the prompt and the generated image

According to Meta AI, this “factorized” or split approach to video generation lets them train video generation models efficiently.
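The two-step flow can be sketched in a few lines of Python. This is only an illustration of the control flow, not Meta's code: `generate_image`, `generate_video`, `Frame`, and `Video` are hypothetical placeholders standing in for the real diffusion models, and treating the generated image as the first frame of the clip is a simplifying assumption. The defaults (4 seconds, 16 fps, 512 pixels) follow the output format reported for Emu Video.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Placeholder for one image; a real model would hold pixel data."""
    prompt: str
    index: int
    size: int


@dataclass
class Video:
    frames: List[Frame]
    fps: int


def generate_image(prompt: str, size: int = 512) -> Frame:
    """Step 1 (hypothetical stand-in): text prompt -> image."""
    return Frame(prompt=prompt, index=0, size=size)


def generate_video(prompt: str, first: Frame, seconds: int = 4, fps: int = 16) -> Video:
    """Step 2 (hypothetical stand-in): (text prompt, image) -> video."""
    n_frames = seconds * fps
    # Assumption: the generated image anchors the clip as frame 0, and the
    # remaining frames are produced from both the prompt and that image.
    frames = [first] + [Frame(prompt, i, first.size) for i in range(1, n_frames)]
    return Video(frames=frames, fps=fps)


def text_to_video(prompt: str) -> Video:
    """Chain the two steps: text -> image, then (text, image) -> video."""
    image = generate_image(prompt)
    return generate_video(prompt, image)


video = text_to_video("a corgi surfing a wave")
print(len(video.frames), video.fps)
```

The point of the split is that the second step never has to invent the look of the scene from text alone; it only has to animate an image that already exists.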

The result is a 4-second, 16-fps, 512x512-pixel video.
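To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes a nominal frame count (duration times frame rate) and uncompressed 8-bit RGB frames, which is my assumption for illustration and not something Meta states.

```python
seconds, fps = 4, 16
width = height = 512

frames = seconds * fps                          # nominal number of frames in the clip
pixels_per_frame = width * height               # pixels in a single 512x512 frame
raw_rgb_bytes = frames * pixels_per_frame * 3   # assumes 3 bytes per pixel, uncompressed

print(frames, pixels_per_frame, raw_rgb_bytes / 2**20)  # last value is in MiB
```

That works out to 64 frames of 262,144 pixels each, or about 48 MiB of raw pixel data per clip before any compression.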

The researchers also said that a video can be extended beyond that length and still look decent. They demonstrated a model that generates plausible continuations of an original video conditioned on new prompts.

How does it compare to the competitors?

Meta researchers had human raters compare Emu Video's output against state-of-the-art text-to-video models across a variety of prompts, judging each result on quality and on faithfulness to the prompt. The models compared were:

Make-A-Video (MAV)
Imagen Video (Imagen)
Align Your Latents (AYL)
Reuse and Diffuse (R&D)
CogVideo (Cog)
Runway Gen2 (Gen2)
Pika Labs (Pika)

Emu Video performed well according to Meta’s own evaluation, showcasing their progress in text-to-video generation. However, this is only based on their internal testing; I can’t fully attest to these results or draw any definitive conclusions about Emu Video’s capabilities until I get hands-on experience with the tool myself.
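For readers unfamiliar with this kind of evaluation, a pairwise human comparison is usually summarized as a win rate: the share of votes in which raters preferred one model over the other, per criterion. Here is a minimal sketch of that calculation. The votes are made-up placeholders, not Meta's data, and the labels `candidate` and `baseline` as well as the `win_rates` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]], model: str) -> Dict[str, float]:
    """Return, per criterion, the fraction of votes won by `model`.

    Each vote is a (criterion, winner) pair from one rater on one prompt.
    """
    wins: Counter = Counter()
    totals: Counter = Counter()
    for criterion, winner in votes:
        totals[criterion] += 1
        if winner == model:
            wins[criterion] += 1
    return {criterion: wins[criterion] / totals[criterion] for criterion in totals}


# Placeholder votes for illustration only.
votes = [
    ("quality", "candidate"),
    ("quality", "baseline"),
    ("faithfulness", "candidate"),
    ("faithfulness", "candidate"),
]

print(win_rates(votes, "candidate"))
```

A win rate above 50% means raters preferred that model more often than its competitor on that criterion, which is why such numbers depend heavily on the prompts and raters chosen.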

How to get access

Right now, Emu Video is fundamental research, not a product you can use. Meta has released a demo website where you can check out a collection of videos generated by Emu Video.

Final Thoughts

Don’t get me wrong—the tech behind Emu Video looks seriously impressive. But as eager as I am to try these new AI tools from Meta, I know that real-world use doesn’t always live up to lab tests. I hope they release a publicly accessible tool soon.

Still, I’m thrilled Meta is pushing boundaries in AI innovation. We need companies thinking big to keep technology moving forward. At the same time, I hope Meta considers open-sourcing these tools.
