AI news
June 4, 2024

A Deep Dive into Gemini, GPT-4, and GPT-4o Comparison

Comparing top large language models. Which one is best?

Tetiana Tsymbal

Generative AI has proven to be a game-changer for businesses and professionals, especially in the marketing world. Large Language Models (LLMs) like the ones I’ll be discussing here can do some pretty incredible things with text — they recognize, extract, summarize, predict, and even generate new content. For marketers, this is like having a Swiss Army knife for campaign planning, content creation, and prospect data analysis.

But here’s the catch: not all LLMs are created equal. It’s easy to assume they’re all-knowing and can handle any task with ease, but that’s not quite true. Different models have unique strengths and weaknesses, with some better suited to writing or research and others to building chatbots.

As a content manager who’s been working with Gemini Advanced and GPT-4 for over a year, I’ve seen these differences firsthand. So, I wanted to share my experiences and insights in this article, comparing these two tools (with extra observations on the latest GPT-4o) across a range of aspects. Hopefully, this will give you a clearer picture of how Gen AI can be a valuable asset in your marketing toolkit and help you choose the right tool for your needs. Let’s dive in and explore what these platforms can do!

Disclaimer: My assessment of these AI models is based on my personal experience and tests. I’ve evaluated their features and outputs subjectively, according to my knowledge, goals, and expectations. Your opinions may differ, and I don’t intend for this article to invalidate them. However, I hope this information proves valuable for those just starting with Generative AI, whether for work or personal use, and helps them choose the tool that best suits their needs.

Interface and Convenience of Use

Gemini Advanced

Gemini, Google’s advanced AI chatbot (formerly Bard), makes things super easy to use and gives content creators a bunch of cool features to play with. In my opinion, notable advantages include:

  • Multiple Draft Generation: This model generates three alternative responses for each prompt, increasing the likelihood of finding an output that aligns with your precise requirements. Answers can be effortlessly regenerated if they don’t meet expectations.
  • Granular Editing Capabilities: You can highlight and regenerate specific sections of text, either using pre-set commands (e.g., “make it shorter/longer”) or by providing custom instructions.
  • Tone and Length Customization: A dedicated menu permits users to tailor replies to be more casual or professional, and to adjust the desired word count.
  • Integration with Google Workspace: Gemini facilitates efficient workflows by enabling direct export to Google Docs or insertion into Gmail.
  • Fact-Checking with Google Search: A unique feature enables the comparison of the generated text with the search engine results, highlighting areas of similarity or potential factual discrepancies.

However, there are certain limitations that content professionals should be aware of:

  • Limited Editing History: The platform currently only allows for rewriting the most recent prompt; previous inputs are not accessible for revision.
  • Single Draft View: After selecting and reworking a response, the alternative draft options are no longer available for review.

All in all, Gemini is a pretty awesome instrument for content creation. It’s flexible, accurate, and helps you save a ton of time. While its workflow takes some minor adjustment, most people will adapt quickly, and these quirks shouldn’t detract from the overall value of the tool.

GPT-4 and GPT-4o

For content creators like myself, OpenAI’s models also offer a few significant advantages, especially:

  • Comprehensive Editing History: The ability to revisit and revise any past prompt within a conversation is a major plus. You can easily switch between the different draft responses written by the model, facilitating a more iterative and refined writing process.
  • Robust Plugin Ecosystem: OpenAI’s extensive library of community-developed and partner-provided plugins significantly extends the platform’s capabilities. These tools can be invaluable for tasks like research, data analysis, and productivity optimization.

A notable limitation for users, however, is the lack of granular editing functions within the generated output. Additionally, the current inability to directly export content in various formats, particularly tables, can be cumbersome. While workarounds exist (such as copying tables as images or text), these methods are not always seamless.

For a more detailed comparison of features across discussed AI systems, please refer to the attached table.

Content Production

Factual Accuracy

While all three models generally do a pretty good job at providing accurate information, it’s important to remember that how you phrase your prompts can really influence their responses. Even a slight change in wording can lead to unexpected, inaccurate, or even harmful outputs. This prompt injection threat, along with several other vulnerabilities of large language models, is nicely explained in this article by a seasoned application security leader.

One thing I’ve noticed, however, is that when it comes to digging up specific research, statistics, or real-world examples, these LLMs can sometimes “hallucinate.” I’ve seen all three of them generate data that either doesn’t quite match the source material or even contradicts it entirely. My advice? Double-check those facts, especially when you’re dealing with unfamiliar territory. It’s a simple step that can save you from spreading misinformation.

Writing Style

It’s important to remember that even though prompts can steer each model’s writing style in a certain direction, they still have their own quirks and tendencies — kinda like how we all have our own unique narrative voice.

  • Gemini consistently delivers the most concise summaries, focusing on the core information without unnecessary embellishment. Its factual, objective language and straightforward sentence structures make it a go-to for quick, accessible content. It’s also the most responsive and flexible, allowing for easy refinement of the text without multiple regenerations.
  • GPT-4, in contrast, dives deeper, offering more nuanced explanations. Its academic tone and complex sentences are well-suited for research or analytical pieces but might require some simplification for a wider audience. GPT-4 can be more challenging to work with, as it’s not as attentive to instructions, particularly elaborate ones.

From my personal experience, Gemini produces the most natural-sounding and easily understood content. I’ve also found its vocabulary to be more diverse, which helps avoid the repetitive language that can sometimes plague GPT models. A major bonus is Gemini’s built-in suggestions for refining or expanding the output, simplifying the proofreading process.

While prompt engineering can certainly adjust the style of any model, the initial answer greatly influences the editing workload. With Gemini, I find myself spending 20–30% less time on revisions and fine-tuning the text.

Note: GPT-4o seems to produce output similar to GPT-4, potentially with slight improvements. However, I haven’t spent much time working with it yet, so I can’t draw definitive conclusions.

Creative Writing

Assessing creativity in AI models is subjective and depends on individual preferences. In my tests with fiction, poetry, and dialogue, all LLMs showed some level of creative ability, including rhyming and figurative language. However, Gemini consistently stood out for me with its intriguing plots, emotional depth, and distinctive writing style.

Ultimately, the best way to determine which model suits your creative needs is to experiment with them firsthand. I often find it helpful to use multiple models simultaneously, either to double-check each other’s output for a single task or to leverage their unique strengths for different aspects of a project.

Content Repurposing

All models demonstrated strong content modification capabilities, adeptly creating summaries, video scripts, social media posts, email sequences, etc. Gemini excels at understanding target audiences and adjusting tone, but its inability to process certain links or analyze data from a provided source correctly is a significant drawback for most users. I’ve observed instances where the model summarizes the source inaccurately, either providing completely different data or only a partial summary. While searching by title and website name yielded slightly better results, I still found them unsatisfactory.

While GPT-4 and GPT-4o effectively adapt to different formats, their overviews may occasionally lack Gemini’s conciseness. However, GPT models excel at analyzing almost any link (except for protected websites) and extracting information accurately, making them valuable tools for convenient work with various online resources.

Prompt Understanding

The language models showed similar performance in my tests and during actual work. I found they can struggle with long, complex prompts and may miss certain instructions if a user provides excessive detail. The cornerstone of achieving optimal LLM outputs is the “Iterative Prompt Development” technique: experimenting with different prompts, continuously refining them, and evaluating the results.

Personally, I often find myself tweaking my guidelines multiple times, adding or removing elements until I achieve the desired output. By adopting such a test-and-learn strategy, you’ll get to know each model’s quirks and figure out which commands and how much detail they need. Moreover, this approach gives you the flexibility to create effective prompts consistently.
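The test-and-learn loop described above can be sketched in code. This is a minimal illustration, not a real integration: `call_model` is a hypothetical stand-in for whichever chat API you actually use, and the "did it cover my points?" check is a deliberately simple placeholder for your own evaluation criteria.

```python
# Sketch of the "Iterative Prompt Development" loop: generate, evaluate,
# refine the prompt, and try again. call_model is a hypothetical stand-in
# for a real chat API (e.g. a Gemini or OpenAI client).

def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"Draft based on: {prompt}"

def refine(prompt: str, missing: list[str]) -> str:
    # Add the instructions the model missed and try again.
    return prompt + " Be sure to mention: " + ", ".join(missing)

def iterate_prompt(prompt: str, required_terms: list[str], max_rounds: int = 3) -> str:
    output = call_model(prompt)
    for _ in range(max_rounds):
        # Simple placeholder evaluation: did the draft cover every point?
        missing = [t for t in required_terms if t.lower() not in output.lower()]
        if not missing:
            break
        prompt = refine(prompt, missing)
        output = call_model(prompt)
    return output
```

In practice the evaluation step is you reading the draft, but making the loop explicit is a good way to keep track of which prompt wording finally worked.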

Translation

When it comes to translations, I found Gemini to be particularly adept at capturing cultural nuances, while the GPT models occasionally faltered. Based on my observations, LLMs still struggle with tasks requiring high levels of artistic expression, such as translating poetry.

However, the real-time translation capabilities of the new GPT-4o model on mobile devices are a standout feature. In my tests, it proved to be remarkably convenient and largely accurate in understanding and deciphering speech. This innovation could greatly benefit airlines, the hospitality sector, and customer care. As a former customer support agent, I often encountered situations where language barriers prevented us from assisting clients. Looking back, having a GPT-4o-level virtual assistant would have been a game-changer. We could have easily catered to a much wider range of consumers and markets.

Overall, I’m confident that tools like GPT-4o could significantly enhance managers’ productivity, potentially by 20–30% or more, depending on the workload and company size. I encourage you to explore GPT-4o for yourself, especially if you frequently travel or work in multilingual environments.

Additional Functions

Coding

I’m no programmer, but I turned to LLMs for help with HTML coding for a WordPress site. I put all three models to the test to see how they could lend a hand. Here are the conclusions I reached about their effectiveness for this purpose.

Firstly, getting the right prompt to make them produce exactly what you want takes a lot of trial and error. GPT-4o was the fastest and most accurate in my experience, while GPT-4 needed a bit more time to assemble the code. Gemini was the most frustrating to work with, often refusing assignments that involved code generation, with responses like “I’m a text-based AI, and that is outside of my capabilities.”

Secondly, it’s important to remember that all AI systems have common limitations and can make mistakes. For example, I’ve seen GPT-4 provide answers with errors despite clear instructions, even after completing the same tasks multiple times within the same chat. So, my advice is to always stay alert and double-check their output carefully.
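Since the advice here is to double-check generated code, even a tiny automated sanity check helps before pasting a snippet into WordPress. This is a rough sketch using only Python's standard library to catch unbalanced tags in LLM-generated HTML; it is not a full validator, and the list of void tags is abbreviated for illustration.

```python
# Minimal sanity check for LLM-generated HTML: verify that every opening
# tag has a matching closing tag. Uses only the standard library; this is
# a rough check, not a full HTML validator.
from html.parser import HTMLParser

# Abbreviated list of tags that never take a closing tag.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.errors = []  # mismatches found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def check_html(snippet: str) -> list[str]:
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(snippet)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]
```

Running the model's output through something like this won't catch wrong logic, but it does catch the most common copy-paste breakage: a tag the model forgot to close.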

Chart Creation

When given the data, all three models can create simple charts, but they differ in customization options. GPT-4 is the most limited, only allowing the chart to be saved as a non-editable image. GPT-4o offers slightly more flexibility, letting users make minor design adjustments. Gemini is the most versatile, enabling you to change diagram types, edit axis labels, and export the information to Google Sheets. In addition, all platforms show pretty good results in basic data analysis, so I believe they can be effectively applied in the accounting and finance fields.
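One workaround when a model only hands you a non-editable chart image is to ask it for the underlying numbers instead and rebuild the chart yourself. A small sketch of turning those numbers into a CSV that Google Sheets or Excel can open and chart (the column names and figures below are invented placeholders):

```python
# Turn chart data returned by a model into a CSV that Google Sheets or
# Excel can open and chart. The channel labels and click counts are
# invented placeholders for illustration.
import csv

def write_chart_csv(path: str, header: list[str], rows: list[tuple]) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

write_chart_csv(
    "campaign_metrics.csv",
    ["Channel", "Clicks"],
    [("Email", 1200), ("Social", 950), ("Search", 1430)],
)
```

From there you keep full control over chart type, labels, and styling in the spreadsheet tool, rather than depending on what the model's image export allows.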

Image Generation

I primarily use Gen AI for crafting social media visuals, and GPT-4 (Dall-E) is my current favorite. While Gemini produces several decent options that generally match my instructions, the output is limited to a square format. GPT-4 offers greater flexibility with various formats and delivers high-quality results, though not quite reaching Midjourney’s level.

However, getting the perfect image with GPT-4 frequently requires a bit of trial and error with prompt refinement. If you’re looking to get creative with visual content creation using LLMs, check out this guide. It’s packed with ideas on how to craft awesome prompts.

One thing to note is that while GPT-4 makes more diverse pictures, I find Gemini’s jpeg format more convenient for my workflow compared to GPT-4’s webp. As for GPT-4o, its performance is comparable to GPT-4.
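If webp output is the sticking point in your workflow, converting it locally is straightforward. Here's a small sketch using the third-party Pillow library (assuming it's installed; the file names are placeholders):

```python
# Convert an image (e.g. a webp download) to JPEG with Pillow.
# Requires the third-party Pillow package (pip install Pillow);
# the file names below are placeholders.
from PIL import Image

def to_jpeg(src: str, dst: str) -> None:
    # JPEG has no alpha channel, so convert to RGB first.
    Image.open(src).convert("RGB").save(dst, "JPEG")
```

A one-off conversion like `to_jpeg("generated.webp", "generated.jpg")` slots easily into a batch script if you produce visuals in volume.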

I’ve also noticed some shared challenges across all three models: intricate details can get lost in the generation process, hands often look a bit artificial, and text within images can be hard to read or simply incorrect.

Image, Video, Sheet, and PDF Analysis

I also evaluated the performance of each model in analyzing and working with different types of materials and files. From what I’ve seen, here’s my assessment:

  • Gemini: Impressively, it was able to analyze all four formats I threw at it: YouTube videos (only with captions), Excel sheets, various images, and PDFs. I could easily summarize information, extract details, and even perform calculations, which were mostly accurate.
  • GPT-4 and GPT-4o: Both of them performed well with images and PDF docs, but they weren’t able to process YouTube links or Google Sheets during my tests.

While my experience with Generative AI has been relatively extensive (except with the recent GPT-4o), it’s important to note that I can’t guarantee 100% accuracy across all scenarios. Additionally, these models are constantly evolving, so it’s likely they’ll soon handle a wider range of formats.

Wrapping Up

My journey with Gemini, GPT-4, and GPT-4o has been a learning experience, full of surprises, challenges, and “aha” moments. It’s a reminder that even as artificial intelligence becomes more sophisticated, these tools are not magic wands, and we still play a crucial role in guiding them, refining our prompts, and carefully reviewing their output. But the rewards are definitely worth the effort.

By embracing these technologies and discovering how to use them effectively, we can not only improve our productivity and efficiency but also free up time to focus on other skills and tasks that truly deserve our attention. So, don’t be afraid to experiment, explore, and find the AI instruments that best suit your individual needs. This path is undoubtedly rewarding.