Google Gemini seeks to put ChatGPT in the rearview mirror

Google Gemini is Google's new AI model powering Bard and the Pixel 8 Pro (Image credit: Google)

What you need to know

  • The AI race heats up as Google releases a new AI model (think GPT-4 vs. GPT-3) to power its AI ecosystem.
  • Google's benchmarks show Gemini outperforming GPT-4V on several performance metrics.
  • Gemini will come in three sizes: Ultra, Pro, and Nano.
  • The Pixel 8 Pro will be the first Pixel to implement Gemini through Gemini Nano.
  • Gemini Pro will be available through the Gemini API in Google AI Studio on Dec. 13 (see the rough API sketch after this list).
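
For developers wondering what that API access might look like in practice, here is a minimal sketch using Google's generative AI Python SDK. The package name, the "gemini-pro" model identifier, and the exact method calls are assumptions based on the announcement rather than details confirmed in this article, so check Google AI Studio's documentation when access opens on Dec. 13.

```python
# A minimal sketch of calling Gemini Pro once the Gemini API goes live.
# Assumes the google-generativeai package and the "gemini-pro" model name;
# both are assumptions here, not details confirmed by Google's announcement.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued through Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Summarize Google's Gemini announcement in two sentences.")
print(response.text)
```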

With the recent upheaval at OpenAI, with the firing and then rehiring of Sam Altman, Google must have smelled blood in the water, because just a few weeks later it announced a new AI model that appears to be more powerful than GPT-4V.

Google announced Gemini as the future of its AI efforts: starting today it powers Bard, and it will soon come to the rest of Google's AI products. With three sizes of the model (Ultra, Pro, and Nano), Gemini 1.0 is built to be as ubiquitous as the rest of Google.

What is Google Gemini?

Google is calling Gemini "the most capable and general model we’ve ever built." It is the backend model that will power Google's stack of AI products, and it comes in three sizes:

  • Gemini Ultra — Google's largest and most capable model for highly complex tasks.
  • Gemini Pro — Google's best model for scaling across a wide range of tasks.
  • Gemini Nano — Google's most efficient model for on-device tasks.

Some of the performance numbers Google is touting for Gemini are pretty impressive, but if I have learned one thing in tech, it's not to trust manufacturer benchmarks. That being said, it is difficult to question Gemini's effectiveness when you see it work live. @rowancheung on X (Twitter) posted a video showing Gemini in action, and the results are nothing short of remarkable.

How does Google Gemini perform?

Google is touting Gemini as the best AI model on the planet based on the benchmarks it posted. If those numbers hold up to third-party testing, Gemini will be the top dog on the market, at least until OpenAI releases GPT-5. The nice thing about competition as the economy is currently structured is that when companies race to build the best product, consumers usually win.

Gemini should push OpenAI to keep innovating, but there have obviously been plenty of concerns about reckless research conducted without proper safety considerations, with even CEOs like Satya Nadella comparing AI to atomic energy.

"Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities."

Google

Google Gemini outperformed GPT-4V in most of the benchmarks Google showed, sometimes by more than four percentage points. The benchmark with the most interesting name of the bunch, HellaSwag, was the one where Gemini underperformed compared to GPT-4V. Take a look at the full list of benchmarks.

Capability | Benchmark      | Description                                                                          | Gemini Ultra        | GPT-4V
General    | MMLU           | Representation of questions in 57 subjects (incl. STEM, humanities, and others)      | 90.0% CoT@32*       | 86.4% 5-shot* (reported)
Reasoning  | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning                      | 83.6% 3-shot        | 83.1% 3-shot (API)
Reasoning  | DROP           | Reading comprehension (F1 score)                                                     | 82.4 variable shots | 80.9 3-shot (reported)
Reasoning  | HellaSwag      | Commonsense reasoning for everyday tasks                                             | 87.8% 10-shot*      | 95.3% 10-shot* (reported)
Math       | GSM8K          | Basic arithmetic manipulations (incl. grade-school math problems)                    | 94.4% maj1@32       | 92.0% 5-shot CoT (reported)
Math       | MATH           | Challenging math problems (incl. algebra, geometry, pre-calculus, and others)        | 53.2% 4-shot        | 52.9% 4-shot (API)
Code       | HumanEval      | Python code generation                                                               | 74.4% 0-shot (IT)*  | 67.0% 0-shot* (reported)
Code       | Natural2Code   | Python code generation; new HumanEval-like held-out dataset, not leaked on the web   | 74.9% 0-shot        | 73.9% 0-shot (API)

While these scores are impressive, they probably don't mean much to the average consumer. Google pushing Gemini Nano onto the Pixel 8 Pro is more exciting to me, as it is a model built for on-device tasks. A lot of manufacturers are beginning to add on-device AI capabilities, like NVIDIA's TensorRT-LLM, to the devices they make. For me, this is the more exciting prospect for the future of AI: true personal assistants built into our phones, with AI models we can customize to work best for our individual needs.

One of the best, and entirely plausible, future applications for these LLMs is something we have all dreamed of since the original Star Trek more than 50 years ago: a universal language translator. ChatGPT can already act as a translator, but there is a fairly long processing time to generate the translations. There are now AI models that can translate voice acting into another language while keeping the original actor's voice intact. I'm a huge fan of anime, as well as Japanese and Korean dramas, and I would love a world where I can press a button on my TV and hear the original actors' voices, just in English, in real time. As these mega-corporations compete to outdo each other in AI advancement, that reality gets closer and closer.
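To make the translator idea a bit more concrete, here is a small sketch that reuses the hypothetical Gemini Pro call from earlier to translate a single line of dialogue. The prompt wording, the translate_line helper, and the example input are all illustrative assumptions; a true real-time dubbing pipeline would also need speech recognition and voice synthesis, which a text model alone does not provide.

```python
# Sketch: using an LLM as a lightweight dialogue translator.
# Assumes the google-generativeai SDK and the "gemini-pro" model from the
# earlier example; speech-to-text and voice cloning are out of scope here.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

def translate_line(line: str, target_language: str = "English") -> str:
    """Translate one line of dialogue while asking the model to keep the tone."""
    prompt = (
        f"Translate the following line of dialogue into {target_language}. "
        f"Preserve the speaker's tone and return only the translation:\n{line}"
    )
    return model.generate_content(prompt).text

# Example: a Japanese line that might come from subtitles or a transcript.
print(translate_line("おはようございます、船長。"))
```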

Colton Stradling
Contributor

Colton is a seasoned cybersecurity professional who wants to share his love of technology with the Windows Central audience. When he isn’t assisting in defending companies from the newest zero-days or sharing his thoughts through his articles, he loves to spend time with his family and play video games on PC and Xbox. Colton focuses on buying guides, PCs, and devices and is always happy to have a conversation about emerging tech and gaming news.

  • leo lozano
    Windows Central said:
    Gemini scored over 90% on the Measuring Massive Multitask Language Understanding, being the first AI model to outperform humans.

    Google Gemini seeks to put ChatGPT in the rearview mirror : Read more
    So with absolutely no one to verify this, you just decided to parrot Google's marketing. Wasn't Barf supposed to be "amazing" too? Didn't it turn out to be a complete dud once people got their hands on it?
    83.6 vs. 83.1: is that what you call "blowing it out of the water"?
    You can see marketing BS all over this, so why are you reporting it without questioning it? Had it been Microsoft touting something like this, you wouldn't have let it past the first sentence without criticizing it and expressing doubt at the announcement, but since it's Google, "it must be true, because it's the internet."
    Reply