AI news
June 7, 2024

Chinese AI Video Generator "Kling" Is A Threat To OpenAI's Sora

Kling can generate high-quality AI videos up to 120 seconds long.

by Jim Clyde Monge

In one of my previous articles, I featured a Chinese AI video generator called Vidu. I called it a real Sora competitor because of how impressive the sample videos were. Today, another AI video generator tool called Kling was unveiled, and it looks even better than Vidu.

What is Kling?

Kling is a new AI video generator from Kuaishou (“quick hand”), a company from Beijing that competes with TikTok.

Kling can generate videos up to 120 seconds long at 30 frames per second and 1080p resolution, with flexible aspect ratios. According to its creators, the model has a stronger understanding of physics than competing models and can model complex motion accurately.

Here’s a fun fact: Sora reportedly requires eight NVIDIA A100 graphics processing units (GPUs) running for over three hours to produce a one-minute clip, and a single A100 costs over 10,000 USD. By that math, Kling would probably require roughly double that compute to produce a two-minute video.
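To make that back-of-envelope math concrete, here is a tiny sketch using only the figures above, under the simplifying assumption that compute scales linearly with clip length (real scaling may be worse):

```python
# Back-of-envelope estimate based on the reported Sora figures:
# 8 NVIDIA A100s running ~3 hours for a 1-minute clip,
# each A100 costing over $10,000.

gpus = 8
hours_per_minute_of_video = 3
gpu_price_usd = 10_000

# GPU-hours consumed for a 1-minute clip
gpu_hours_1min = gpus * hours_per_minute_of_video  # 24 GPU-hours

# Assuming compute scales linearly with video length,
# a 2-minute clip needs about double:
gpu_hours_2min = 2 * gpu_hours_1min  # 48 GPU-hours

# Upfront hardware cost of the 8-GPU node alone
hardware_cost_usd = gpus * gpu_price_usd  # $80,000

print(gpu_hours_1min, gpu_hours_2min, hardware_cost_usd)
```

That is 48 GPU-hours on roughly $80,000 of hardware for a single two-minute clip, which hints at why none of these models are cheap to run at scale.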

Take a look at this example video:

Prompt: A Chinese man sitting at a table, eating noodles with chopsticks

You can see how good the temporal coherence is in this example video.

Temporal coherence in AI videos refers to the ability of a video generation model to create a sequence of frames that are consistent and logically connected in terms of time.

This means that the model should be able to maintain a consistent narrative, maintain the same scene or setting, and ensure that the actions and movements of objects within the scene are coherent and plausible over time.
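One crude way to get an intuition for temporal coherence is to measure how much the pixels change from one frame to the next: smooth, coherent motion changes only a little of the image per step, while incoherent output changes a lot. Here is a minimal sketch of that idea; the `frame_coherence` function and the toy data are my own illustration, not a real evaluation metric (production benchmarks use richer measures such as optical-flow warping error):

```python
import numpy as np

def frame_coherence(frames: np.ndarray) -> np.ndarray:
    """Mean absolute pixel change between consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C), values in [0, 1].
    Returns T-1 scores; lower, steadier values suggest smoother
    (more temporally coherent) motion.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    # Average the per-pixel change over every axis except time
    return diffs.mean(axis=tuple(range(1, diffs.ndim)))

# Toy "video" 1: a bright square drifting one pixel per frame (smooth motion)
T, H, W = 10, 64, 64
smooth = np.zeros((T, H, W))
for t in range(T):
    smooth[t, 20:30, 20 + t:30 + t] = 1.0

# Toy "video" 2: pure noise every frame (no coherence at all)
noisy = np.random.default_rng(0).random((T, H, W))

print(frame_coherence(smooth).mean())  # small: few pixels change per step
print(frame_coherence(noisy).mean())   # large: most pixels change
```

The drifting square scores near zero because only a sliver of the image changes each frame, while the noise scores high; coherent generators like Kling aim to stay at the low end of this kind of measure without freezing the scene entirely.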

You can explore the website and be amazed by the examples. You can also check the example GIFs I attached below. Kling is currently open for testing on Kuaishou’s video clip app Kmovie.

How does it compare with Sora?

Actions that affect the state of the world are some of the hardest things for an AI video generator to simulate. For example, a painter can leave new strokes on a canvas that persist over time, or a man can eat a burger and leave bite marks.

Both Sora and Kling can do that.

So why not put them side by side? Here’s an example video of a person eating a hamburger:

Kling’s Prompt: A Chinese boy wearing glasses closes his eyes and enjoys a delicious cheeseburger in a fast food restaurant

Well, both results are mind-blowing. At first glance, it’s easy to be fooled into thinking these are real videos.

But looking more closely at these examples, you can tell that Sora’s result has more details on the subject and better lighting conditions.

However, Kling can output a two-minute video, which is twice the length of what Sora is capable of.

More Example Videos

I have noticed that the website has slowed down since yesterday, and some users are reporting that it’s not accessible due to a surge of concurrent access. So I attached a few examples below:

Prompt: A giant panda playing guitar by the lake
Prompt: An emperor angelfish with yellow and blue stripes swims in a rocky underwater habitat
Prompt: A man riding a horse in the Gobi Desert, with a beautiful sunset behind him, a movie-quality scene

You can also see more examples from this X thread.

How to Get Access?

Right now, Kling is not publicly available. It is reportedly accessible through the Kwaiying app for invited beta testers.

For more up-to-date news on its availability, you can check their official website for updates, though all text is in Chinese.

One Reddit user claimed that Kling will be available for everyone either later this year or next year.

No its in demo you have to be on a waiting list just like google open ai etc. From my research they’ll come out for everyone either later 2024 november-december or 2025. We don’t go past 2025 without having a model better than what we seen in the sora demo unless nuclear war or civil war or something
Image by author

Aside from the text-to-video generator, Kuaishou also released a tool that can generate a dance video from a single image of a person.

Screenshot from Kuaishou’s Kling

While there are existing apps that can create AI-generated videos, what sets Kling apart is how smoothly each frame transitions to another, giving it next-level realism. The way the clothes interact with the subject’s movement is also really good.

Final Thoughts

Overall, Kling is an impressive AI model based on the examples showcased by its creators. Is it better than Sora? In some cases, yes. But Sora was unveiled months ago and may have made improvements since then that OpenAI hasn’t announced yet.

Is it better than Google’s Veo? Yes.

Is it better than Pika Labs, RunwayML, and StableVideo? It’s waaay better.

One or two more versions, and Kling could upend the entire video content industry. The rapid advancement in AI video generation technology is astounding. With each new release, the line between real and AI-generated content blurs even further.

The public is now waiting for OpenAI to announce updates on Sora.