5 Biggest Announcements at Google I/O 2024

OpenAI vs Google is the biggest beef I’ve seen by far in the AI space.

Just one day after OpenAI dropped GPT-4o, its most advanced and highly impressive AI model, Google made several huge updates to Gemini and announced brand-new AI products at Google I/O 2024.

Honestly, the nearly 2-hour conference was TMI.

Google packed in a lot of new and updated features, but here are my top five picks:

1. Project Astra

2. Imagen 3 (Text-to-image)

3. Veo (Text-to-video)

4. Gemini in Google Search

5. Gemini in Google Photos

Let’s discuss each of these features in detail.

1. Project Astra (Gemini Live)

The head of Google DeepMind, Demis Hassabis, showed off a very early version of Project Astra, a real-time, multimodal AI assistant that he hopes will become a universal assistant.

To me, this is the most interesting new product that Google unveiled today.

It is a direct competitor to the real-time voice assistant OpenAI demoed yesterday, which is powered by its new AI model, GPT-4o.

According to Google, public access will come through the Gemini app later this year. We’re evolving from just having a chatbot into being able to summon an army of AI agents who know everything about you and can work 24/7. Bots that don’t just talk with you but actually accomplish stuff on your behalf.

If this lives up to the hype, I would be very happy to use it on a daily basis.

2. Imagen 3 (Text-to-image)

It looks like Midjourney now has a strong competitor. The initial results shown at the demo look very promising. I mean just take a look at this example.

Prompt: Three women stand together laughing, with one woman slightly out of focus in the foreground. The sun is setting behind the women, creating a lens flare and a warm glow that highlights their hair and creates a bokeh effect in the background. The photography style is candid and captures a genuine moment of connection and happiness between friends. The warm light of the golden hour lends a nostalgic and intimate feel to the image.

It looks so photorealistic. Aside from better image quality, Google also improved the model's interpretation of prompts and its rendering of text inside images.

It’s funny that they had to put “unedited raw output” below the image since Google has been widely criticized for faking demo images and videos.

3. Veo (Text-to-video)

Finally, some developments in AI video generation. It's been months since OpenAI announced Sora, which put massive pressure on Google to announce a text-to-video model of its own.

Google is calling it Veo, their most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles.

The sample videos Google showed off looked really good and comparable to Sora's.

In addition, Veo supports masked editing, enabling changes to specific areas of the video when you add a mask area to your video and text prompt.

If you’re interested in knowing how Google managed to improve the overall quality and reduce the time it takes to generate videos, here are the steps:

Check out sample videos and more information on Veo here.

4. Gemini in Google Search

I wanted to highlight this because, in recent months, thousands of websites have been affected by changes to Google's search ranking algorithm. On top of that, Google introduced generative AI in search, which pushed results from various websites even farther down the page.

Gemini is already in the search function with Google’s generative search feature — but they’re taking it even further.

This is an interesting update for regular users but terrible news for small website owners. The update will roll out soon in Search Labs for English queries in the U.S.

Search also has new planning capabilities and an AI-organized results page. Check out more information about that here.

5. Gemini in Google Photos

As someone who uses Google Photos a lot and has thousands of images stored in the cloud, I find this update particularly exciting. Google is rolling out a new feature this summer called “Ask Photos”, which lets Gemini pore over your Google Photos library in response to your questions.

Sure, now it’s easier for you to look for a specific memory or recall information included in your gallery, but what about privacy? Here’s what Google has to say:

Your personal data in Google Photos is never used for ads. And people will not review your conversations and personal data in Ask Photos, except in rare cases to address abuse or harm. We also don’t train any generative AI product outside of Google Photos on this personal data, including other Gemini models and products. As always, all your data in Google Photos is protected with our industry-leading security measures.

Here’s a fun fact: over 6 billion photos are uploaded to Google Photos every day. That’s a lot.

Aside from the list above, other products were announced:

2 Million Tokens in Gemini 1.5: As a regular user, I don’t see myself exhausting a 2-million-token context window, so I’m not really excited about this update.
Gemini 1.5 Flash: This new multimodal model is a lighter-weight, faster sibling of Gemini 1.5 Pro, optimized for “narrow, high-frequency, low-latency tasks.”
Music AI Sandbox: Google’s AI music generation tools, developed in partnership with YouTube, help artists generate music and sound effects quickly.
Gemini in Workspace: Similar to how Microsoft is adding Copilot to its flagship software products, Google is rolling out Gemini 1.5 Pro in the sidebar for Docs, Sheets, Slides, Drive, and Gmail.

And oh, in case you are subscribed to Gemini Advanced, Google’s paid AI chatbot tier, it is already powered by Gemini 1.5 Pro.

Final Thoughts

In just two days, we’ve seen two of the biggest names in tech, OpenAI and Google, unveil their most powerful AI products. So, what’s going on with Apple? They’re so far behind it’s impossible for them to catch up.

But here’s the thing with what Google announced today: none of it is generally available right now. Everything is either behind a waitlist, limited to the US, or coming later this year. Will the pre-recorded demos really work the same way in real-world usage?

Google has been notorious for botched product announcements and faking demo videos. Will they deliver this time?
