AI news
April 24, 2024

Microsoft Announced Phi-3: A Tiny But Mighty Language Model

Language models could soon be running on our smartphones.

Jim Clyde Monge

The race for the best small language models (SLMs) is on! With more and more smartphone makers looking to run AI models right on your device, tech giants are battling it out to release the most powerful SLMs to the market. Microsoft just made a big move with the release of Phi-3, a new family of small but seriously impressive language models.

Wouldn’t it be great to use an AI chatbot like ChatGPT on our smartphones without needing an internet connection?

What is Phi-3?

The Phi-3 family is a set of AI models that are designed to be the most capable and cost-effective SLMs available, outperforming models of similar and larger sizes across various benchmarks in language, reasoning, coding, and math.

The Phi-3 family offers a range of models along the quality-cost curve, providing customers with practical choices when building generative AI applications.

  • Phi-3-mini is a 3.8B parameter transformer decoder model trained on 3.3 trillion tokens. It uses a context length of 4K tokens and the same tokenizer as Llama-2, with a 32K vocabulary. The model has a hidden dimension of 3072, 32 attention heads, and 32 layers. There is also a long-context version called Phi-3-mini-128K that extends the context to 128K tokens.
  • Phi-3-small is a 7B parameter model trained on 4.8T tokens. It uses the tiktoken tokenizer with a 100K vocabulary and an 8K context length. The architecture has 32 layers and a hidden size of 4096, and it uses grouped query attention plus alternating dense/block sparse attention to reduce memory usage.
  • Phi-3-medium is a 14B parameter preview model also trained on 4.8T tokens, with 40 layers, 40 heads, and 5120 embedding size.
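
As a rough sanity check on the Phi-3-mini numbers above, here is a minimal sketch that estimates the parameter count from the layer count, hidden dimension, and vocabulary size. It rests on assumptions that are not from Microsoft: a common rule of thumb of about 12 × d² weights per transformer layer (4 × d² for attention, 8 × d² for the feed-forward block), separate input and output embedding tables, and no accounting for norms or biases. The function name `estimate_decoder_params` is made up for illustration.

```python
def estimate_decoder_params(n_layers, d_model, vocab_size, tied_embeddings=False):
    """Rule-of-thumb parameter count for a decoder-only transformer.

    Assumption (not Microsoft's published breakdown): each layer holds
    about 4*d^2 attention weights and 8*d^2 feed-forward weights.
    """
    attention = 4 * d_model ** 2      # Q, K, V and output projections
    feed_forward = 8 * d_model ** 2   # assumed feed-forward budget
    embeddings = vocab_size * d_model
    if not tied_embeddings:
        embeddings *= 2               # separate input and output tables (assumed)
    return n_layers * (attention + feed_forward) + embeddings


# Figures quoted in the article for Phi-3-mini; 32K is rounded to 32,000 here.
phi3_mini = estimate_decoder_params(n_layers=32, d_model=3072, vocab_size=32_000)
print(f"Phi-3-mini estimate: {phi3_mini / 1e9:.2f}B parameters")
```

With these inputs the estimate lands at about 3.8B, which lines up with the size quoted for Phi-3-mini. The same heuristic is cruder for models that use grouped query attention, so treat it as a ballpark rather than an exact count.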

This flexibility is crucial for developers and companies looking to integrate AI capabilities into their products without breaking the bank.

These models post strong results on key benchmarks despite their relatively small size. However, a smaller model has less capacity to store facts, which limits performance on factual knowledge benchmarks like TriviaQA.

Phi-3 benchmark
Image from Microsoft

The image below compares quality (as measured by performance on the Massive Multitask Language Understanding benchmark, or MMLU) against size (in billions of active parameters) for various SLMs.

Phi-3 compared to other small language models
Image from Microsoft

The key takeaways are:

  • Microsoft’s new Phi-3 models, especially the small and medium preview versions, achieve higher MMLU benchmark scores than other models in the chart such as Mistral 7B, Gemma 7B, Llama-3-8B-Instruct, and Mixtral 8x7B.
  • The Phi-3 mini models (4k and 128k) have lower scores but are also much smaller in size compared to the other models shown.
  • In general, quality scores tend to rise with model size, but the new Phi-3 small and medium models score higher than that trend would predict for their size.

Take a look at this example of Phi-3 mini running on a mobile phone.

Phi-3 example running on device
Image from Microsoft

That’s pretty impressive, isn’t it? The fact that we can now run AI models like this directly on our smartphones is a game-changer.
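
To get a feel for why a model of this size can fit on a phone at all, here is a back-of-the-envelope sketch of how much storage the weights alone would need at different precisions. The 3.8B figure comes from the article; the bit widths are illustrative choices, and the calculation ignores everything else a running model needs (activations, the attention cache, the app itself). The helper name `weight_memory_gb` is hypothetical, and this is not a description of how Microsoft actually deployed the model.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # parameter count quoted for Phi-3-mini

# Illustrative precisions: half precision, 8-bit, and 4-bit quantization.
for bits in (16, 8, 4):
    size = weight_memory_gb(PHI3_MINI_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.1f} GB")
```

Under these assumptions, the weights shrink from roughly 7.6 GB at 16 bits to roughly 1.9 GB at 4 bits, which is a size that a modern smartphone can plausibly hold in memory.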

If you want to learn more about Phi-3, check out the whitepaper here.

Try it yourself

Phi-3 mini is currently available on Azure AI Studio and HuggingChat.

On Azure AI Studio, the Phi-3-mini 4k and 128k instruct models are already available.

Azure AI Studio running Phi-3
Image by Jim Clyde Monge

Here’s what the dashboard looks like:

Phi-3 example running on Azure AI Studio
Image by Jim Clyde Monge

The model in HuggingChat seems to be even more capable and also has the ability to search the internet, which is a fantastic feature.

Phi-3 running in HuggingChat
Image by Jim Clyde Monge

Final Thoughts

It’s pretty wild to think that we’ll soon be carrying around smartphones with built-in AI models that can rival the likes of ChatGPT. I’m incredibly excited to see how this technology evolves and what new apps and experiences it enables.

While Phi-3 and similar small language models are still limited compared to their much larger counterparts like GPT-4 in terms of knowledge, reasoning, and generation capabilities, they represent a significant step forward in on-device AI. There will likely continue to be a gap in capabilities between device-based models and cloud-based models for the foreseeable future, but that gap is closing faster than many of us expected.

Overall, I’m optimistic. I think on-device AI has the potential to make our gadgets feel a whole lot smarter. So, what do you think? Are you ready to welcome Phi-3 and other small language models into your pocket?