Google Gemini seeks to put ChatGPT in the rearview mirror

Google Gemini is Google's new AI model powering Bard and the Pixel 8 Pro (Image credit: Google)

What you need to know

  • The AI race heats up as Google releases a new AI model (think GPT-4 vs. GPT-3) to power its AI ecosystem.
  • Google's benchmarks show Gemini outperforming GPT-4V on several performance metrics.
  • Gemini will come in three sizes: Ultra, Pro, and Nano.
  • The Pixel 8 Pro will be the first Pixel to implement Gemini through Gemini Nano.
  • Gemini Pro will be available through the Gemini API in Google AI Studio on Dec. 13 (see the rough API sketch after this list).
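
For developers wondering what that API access might look like in practice, here is a minimal sketch using Google's generative AI Python SDK. The package name, the "gemini-pro" model identifier, and the exact method calls are assumptions based on the announcement rather than details confirmed in this article, so check Google AI Studio's documentation when access opens on Dec. 13.

```python
# A minimal sketch of calling Gemini Pro once the Gemini API goes live.
# Assumes the google-generativeai package and the "gemini-pro" model name;
# both are assumptions here, not details confirmed by Google's announcement.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued through Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize Google's Gemini announcement in two sentences.")
print(response.text)
```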

With the recent upheaval at OpenAI, with the firing and then rehiring of Sam Altman, Google must have smelled blood in the water, because just a few weeks later it announced a new AI model that appears to be more powerful than GPT-4V.

Google announced Gemini as the future of its AI efforts: starting today it powers Bard, and it will soon come to the rest of Google's AI products. With three sizes of the model (Ultra, Pro, and Nano), Gemini 1.0 is built to be as ubiquitous as the rest of Google.

What is Google Gemini?

Google is calling Gemini "the most capable and general model we’ve ever built." It is the backend model that will power Google's stack of AI products, and it comes in three sizes:

  • Gemini Ultra — Google's largest and most capable model for highly complex tasks.
  • Gemini Pro — Google's best model for scaling across a wide range of tasks.
  • Gemini Nano — Google's most efficient model for on-device tasks.

Some of the performance numbers Google is touting for Gemini are pretty impressive, but if I have learned one thing in tech, it's not to trust manufacturer benchmarks. That being said, it is difficult to question Gemini's effectiveness when you see it work live. @rowancheung on X (Twitter) posted a video showing Gemini in action, and the results are nothing short of remarkable.

How does Google Gemini perform?

Google is touting Gemini as the best AI model on the planet based on the benchmarks it posted. If those numbers hold up to third-party testing, Gemini will be the top dog on the market, at least until OpenAI releases GPT-5. The nice thing about competition as the economy is currently structured is that when companies race to build the best product, consumers usually win.

Gemini should push OpenAI to keep innovating, but there have obviously been plenty of concerns about reckless research conducted without proper safety considerations, with even CEOs like Satya Nadella comparing AI to atomic energy.

"Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities."

Google

Google Gemini outperformed GPT-4V in most of the benchmarks Google showed, sometimes by more than four percentage points. The benchmark with the most interesting name of the bunch, HellaSwag, was the one where Gemini underperformed compared to GPT-4V. Take a look at the full list of benchmarks.

Capability | Benchmark      | Description                                                                          | Gemini Ultra        | GPT-4V
General    | MMLU           | Representation of questions in 57 subjects (incl. STEM, humanities, and others)      | 90.0% CoT@32*       | 86.4% 5-shot* (reported)
Reasoning  | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning                      | 83.6% 3-shot        | 83.1% 3-shot (API)
Reasoning  | DROP           | Reading comprehension (F1 score)                                                     | 82.4 variable shots | 80.9 3-shot (reported)
Reasoning  | HellaSwag      | Commonsense reasoning for everyday tasks                                             | 87.8% 10-shot*      | 95.3% 10-shot* (reported)
Math       | GSM8K          | Basic arithmetic manipulations (incl. grade-school math problems)                    | 94.4% maj1@32       | 92.0% 5-shot CoT (reported)
Math       | MATH           | Challenging math problems (incl. algebra, geometry, pre-calculus, and others)        | 53.2% 4-shot        | 52.9% 4-shot (API)
Code       | HumanEval      | Python code generation                                                               | 74.4% 0-shot (IT)*  | 67.0% 0-shot* (reported)
Code       | Natural2Code   | Python code generation; new HumanEval-like held-out dataset, not leaked on the web   | 74.9% 0-shot        | 73.9% 0-shot (API)

While these scores are impressive, they probably don't mean much to the average consumer. Google pushing Gemini Nano onto the Pixel 8 Pro is more exciting to me, as it is a model built for on-device tasks. A lot of manufacturers are beginning to add on-device AI capabilities, like NVIDIA's TensorRT-LLM, to the devices they make. For me, this is the more exciting prospect for the future of AI: true personal assistants built into our phones, with AI models we can customize to work best for our individual needs.

One of the best, and entirely plausible, future applications for these LLMs is something we have all dreamed of since the original Star Trek more than 50 years ago: a universal language translator. ChatGPT can already act as a translator, but there is a fairly long processing time to generate the translations. There are now AI models that can translate voice acting into another language while keeping the original actor's voice intact. I'm a huge fan of anime, as well as Japanese and Korean dramas, and I would love a world where I can press a button on my TV and hear the original actors' voices, just in English, in real time. As these mega-corporations compete to outdo each other in AI advancement, that reality gets closer and closer.
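To make the translator idea a bit more concrete, here is a small sketch that reuses the hypothetical Gemini Pro call from earlier to translate a single line of dialogue. The prompt wording, the translate_line helper, and the example input are all illustrative assumptions; a true real-time dubbing pipeline would also need speech recognition and voice synthesis, which a text model alone does not provide.

```python
# Sketch: using an LLM as a lightweight dialogue translator.
# Assumes the google-generativeai SDK and the "gemini-pro" model from the
# earlier example; speech-to-text and voice cloning are out of scope here.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

def translate_line(line: str, target_language: str = "English") -> str:
    """Translate one line of dialogue while asking the model to keep the tone."""
    prompt = (
        f"Translate the following line of dialogue into {target_language}. "
        f"Preserve the speaker's tone and return only the translation:\n{line}"
    )
    return model.generate_content(prompt).text

# Example: a Japanese line that might come from subtitles or a transcript.
print(translate_line("おはようございます、船長。"))
```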

Colton Stradling
Contributor

Colton is a seasoned cybersecurity professional who wants to share his love of technology with the Windows Central audience. When he isn’t assisting in defending companies from the newest zero-days or sharing his thoughts through his articles, he loves to spend time with his family and play video games on PC and Xbox. Colton focuses on buying guides, PCs, and devices and is always happy to have a conversation about emerging tech and gaming news.

  • leo lozano
    Windows Central said:
    Gemini scored over 90% on the Measuring Massive Multitask Language Understanding, being the first AI model to outperform humans.

    Google Gemini seeks to put ChatGPT in the rearview mirror : Read more
    So with absolutely no one to verify this, you just decided to parrot Google's marketing. Wasn't Barf supposed to be "amazing" too? Didn't it turn out to be a complete dud once people got their hands on it?
    83.6 vs. 83.1: is that what you call "blowing it out of the water"?
    You can see marketing BS all over this, so why are you reporting it without questioning it? Had it been Microsoft touting something like this, you wouldn't have let it past the first sentence without criticizing it and expressing doubt at the announcement, but since it's Google, "it must be true, because it's the internet."
    Reply