AI news
May 14, 2024

OpenAI's New GPT-4o Is Mind-Blowing

GPT-4o is a free yet highly capable language model.

Jim Clyde Monge

Earlier today, thousands of AI enthusiasts eagerly tuned in to OpenAI’s highly anticipated livestream event, where the company unveiled its latest groundbreaking advancements in ChatGPT. While speculation ran rampant about the possibility of a revolutionary search feature to challenge Google’s dominance or the reveal of the much-awaited GPT-5 model, the actual announcement took a slightly different direction.

They announced GPT-4o, a new language model that's smarter, cheaper, better at coding, multimodal, and mind-blowingly fast.

OpenAI’s GPT-4o announcement video

It was a good choice for OpenAI to demo the new features live at 1x speed instead of using a pre-recorded video (yes, I am looking at you, Google).

What is GPT-4o?

First things first, the “o” in GPT-4o stands for “omni,” reflecting the model’s multimodal support for both inputs and outputs.

GPT-4o can process and generate text, audio, and images in real time. It represents a significant step towards more natural human-computer interaction, accepting any combination of text, audio, and image inputs and generating corresponding outputs.

Perhaps the most notable advancement in GPT-4o is its near-real-time responsiveness as a voice assistant.

It can respond to audio inputs in as little as 232 milliseconds on average, which is comparable to human response times in conversation. It matches the performance of GPT-4 Turbo on English text and code while showing significant improvements in non-English languages, and it is notably faster and 50% cheaper in the API.

What’s new in GPT-4o?

Here is the list of new features in GPT-4o.

1. Real-time responses

One of the coolest things is how fast it responds. When you chat with GPT-4o, it feels like talking to a real person. It can match your tone, crack jokes, and even sing in harmony.

OpenAI’s GPT-4o feature example video

This natural, speedy back-and-forth makes using the chatbot feel way more fun and engaging. But how did OpenAI pull this off?

Before GPT-4o, ChatGPT’s Voice Mode relied on a three-step process: audio was transcribed to text, then processed by GPT-3.5 or GPT-4, and finally converted back to audio. This led to slower response times (2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4) and loss of information like tone and background noise.

GPT-4o uses a single AI model trained to handle text, images, and audio all at once. This end-to-end processing allows GPT-4o to respond much faster and more naturally, picking up on nuances that previous models would miss.

2. Improved Reasoning

GPT-4o has achieved new heights in reasoning, setting a record score of 88.7% on the zero-shot CoT MMLU benchmark, which tests general knowledge. This was measured using OpenAI’s new “simple evals” library. GPT-4o also scored 87.2% on the traditional 5-shot no-CoT MMLU, another record.

OpenAI GPT-4o benchmark text evaluation

However, other AI models like Llama 3 400B are still in training and could potentially outperform GPT-4o in the future.

GPT-4o also demonstrated significant advancements in both mathematical reasoning and visual understanding.

OpenAI GPT-4o benchmark

On the M3Exam benchmark, which evaluates performance on standardized test questions from various countries, often including diagrams and figures, GPT-4o outperformed GPT-4 across all languages tested.

In terms of pure vision understanding, GPT-4o achieved state-of-the-art results on several key benchmarks, including MMMU, MathVista, and ChartQA. Notably, these evaluations were conducted in a 0-shot setting, meaning GPT-4o was not specifically trained or fine-tuned on these tasks.

3. GPT-4o is free to use

GPT-4o is going to be free to use. This is huge because if the free version of ChatGPT with the GPT-3.5 model brought in 100 million users, the smarter model GPT-4o could potentially bring in 100 million more.

Users on the Free tier will be defaulted to GPT-4o with a limit on the number of messages they can send using GPT-4o, which will vary based on current usage and demand. When unavailable, Free tier users will be switched back to GPT-3.5. — OpenAI

Honestly, it’s quite intriguing how OpenAI can offer this new and improved model for free without losing too much money. The amount of computing power required to run these language models is staggering, and GPT-4o runs remarkably fast (around 100 tokens per second).

Here are a few thoughts on why they are making it free:

  1. They are running out of training data from the internet. User-AI training data is the best source, and offering free access to the new model might allow them to get much better quality data to use.
  2. Their latest partnership with NVIDIA gave them a boost in terms of computing power and therefore allowed them to run these models more efficiently at lower costs.
  3. They are trying to win back customers (including me) who ditched ChatGPT and used better alternatives like Anthropic’s Claude.

GPT-4 Turbo vs. GPT-4o

For better context, this is how GPT-4o compares to GPT-4 Turbo. GPT-4o has the same high intelligence but is faster, cheaper, and has higher rate limits than GPT-4 Turbo:

  • Pricing: GPT-4o is 50% cheaper than GPT-4 Turbo, coming in at $5 per million input tokens and $15 per million output tokens.
  • Rate limits: GPT-4o’s rate limits are 5x higher than GPT-4 Turbo — up to 10 million tokens per minute.
  • Speed: GPT-4o is 2x as fast as GPT-4 Turbo.
  • Vision: GPT-4o outperforms GPT-4 Turbo on vision-related evals.
  • Multilingual: GPT-4o has improved support for non-English languages over GPT-4 Turbo.

GPT-4o currently has a context window of 128K tokens and a knowledge cut-off date of October 2023.

Pricing and accessibility of GPT-4o

Right now, I do not see the GPT-4o option in the free version of ChatGPT. But if you go to OpenAI Playground, the new model is now accessible.

Image by Jim Clyde Monge
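If you already have API access, you can also try the model programmatically with the official `openai` Python SDK (v1.x). This is just a minimal sketch; the prompt is a placeholder example, and the request only fires if you have an API key configured:

```python
import os

# Minimal sketch using the official `openai` Python SDK (v1.x).
# The prompt is just a placeholder example.
MODEL = "gpt-4o"  # or the dated snapshot "gpt-4o-2024-05-13"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize GPT-4o in one sentence."}
    ],
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**payload)
    print(response.choices[0].message.content)
else:
    print("Set OPENAI_API_KEY to send the request.")
```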

According to a tweet from Sam Altman, the new voice mode will be live in the coming weeks for ChatGPT Plus users.

There are currently two models, gpt-4o and gpt-4o-2024-05-13. The pricing for both GPT-4o models is as follows:

  • Input: $5.00 per 1 million tokens
  • Output: $15.00 per 1 million tokens
Pricing of GPT-4o API
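To put the per-token rates in concrete terms, here is a quick back-of-the-envelope cost calculation. The rates are the published ones above; the request sizes are made-up examples:

```python
# Published GPT-4o API rates (USD per 1 million tokens)
INPUT_RATE = 5.00
OUTPUT_RATE = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single API request in USD."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply
print(f"${request_cost(2_000, 500):.4f}")  # → $0.0175
```

At these rates, even fairly long prompts cost well under a cent per request; the output side dominates, since output tokens cost three times as much as input tokens.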

Take note that access to GPT-4, GPT-4 Turbo, and GPT-4o models via the OpenAI API is only available after you have made a successful payment of $5 or more (usage tier 1).

Final Thoughts

Overall, it was an impressive demo of GPT-4o, particularly the fact that it’s free to use and the impressively fast voice responses.

So the question now is, will it attract more users? Almost certainly yes. The new model is free to use, and the real-time voice responses are definitely worth checking out.

Is it worth the $20 upgrade, though? I can’t say yet, because I still need to do more hands-on testing to see if it’s really better than Claude Opus. Besides, Google may drop some huge updates to Gemini tomorrow at Google I/O that are more exciting than what OpenAI announced today.