Imagen Video and Phenaki, the Google AIs that create fantastic videos from text

Google launches two AIs that create videos from text

The news keeps coming about artificial intelligences that create audiovisual content from a text description. Yesterday we covered AudioGen, a text-to-audio AI, and a few days earlier we talked about Make-A-Video, Meta’s AI that generates videos from text.

Which artificial intelligence is it time to talk about today? Not one, but two AI models that could compete directly with Meta’s. They are called Imagen Video and Phenaki, they were presented by Google, and both convert text to video.

Imagen Video prioritizes the quality of its videos, even if they are shorter

If we technology enthusiasts know anything, it is that Google’s experience in artificial intelligence is extensive. So the fact that it has presented two text-to-video generators does not take us by surprise. But why two? Because each model takes a different approach, and because Google can.

The first model is Imagen Video, an AI that focuses on creating high-quality videos. It is built on the same base as Imagen, Google’s text-to-image AI that was introduced a few months ago. However, Imagen Video is a refined version that adds many new elements capable of turning static images into moving ones.

Like Meta’s model, Google’s AI delivers results that aren’t perfect but are certainly impressive. Some videos can be unsettling, especially when they show faces or people moving, but it is still a big step forward.

The best part? It works like any other AI of this kind (it only requires a text description), but the image quality is better than that of Make-A-Video. According to Google’s developers, Imagen Video starts from a base video of only 16 frames at 3 fps and a resolution of 24 x 48 pixels.

Once the low-resolution base video is ready, several super-resolution AI models are run on it, bringing the end result up to a 128-frame video at 24 fps with a resolution of 1280 x 768 pixels. In other words, an HD-quality video just over 5 seconds long. Meta’s AI, by comparison, has an output resolution of 768×768 pixels.
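
The "just over 5 seconds" figure follows directly from the frame counts and frame rates quoted above. As a quick sanity check, here is a minimal sketch of that arithmetic; the helper name clip_duration is purely illustrative and is not part of any Google tool or API.

```python
from fractions import Fraction


def clip_duration(frames: int, fps: int) -> Fraction:
    """Exact playback time in seconds for a clip with a constant frame rate.

    Hypothetical helper for illustration only.
    """
    return Fraction(frames, fps)


# Figures quoted in the article: the base video and the final output.
base = clip_duration(frames=16, fps=3)
final = clip_duration(frames=128, fps=24)

# Frame count and frame rate both grow by the same factor,
# so the playback time stays the same while the motion gets smoother.
frame_factor = 128 // 16
fps_factor = 24 // 3

print(f"base duration:  {float(base):.2f} s")
print(f"final duration: {float(final):.2f} s")
print(f"frames x{frame_factor}, fps x{fps_factor}")
```

Going by these numbers, the super-resolution steps do not make the clip any longer: they add frames and pixels to the same five-odd seconds.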

Phenaki bets on long videos, but sacrifices image quality

Google’s other text-to-video AI does the opposite: it generates much longer videos, but to do so it has to sacrifice the final quality of the output image.

The other difference? Since its goal is to make much longer videos, Phenaki requires much more detailed instructions. Imagen Video does its job with a single sentence, whereas you can ask Phenaki to animate a whole paragraph describing different sequences and it will do it.

As one might expect, the consistency of the resulting images is not great. But being able to handle multiple scenes and settings (as if it were a movie) is something that leaves us speechless.

Additionally, the Phenaki development team revealed another fact: its AI model generates videos of arbitrary length. There is no maximum time limit, although the same text can produce two videos of very different durations.

According to Google, future versions of these two artificial intelligences “will be part of a growing set of tools that help artists and ordinary users create exciting ways to express their creativity.”

Is this the future of cinema? We don’t know, but time will tell. How can you try these applications? Unfortunately, these two AI models are not yet available to users, although you can watch some of the videos they have produced on their official websites.