Is Microsoft's new AI still "4x more accurate than human doctors"? — Typos in medical prompts to chatbots could be catastrophic
Seeking medical advice from AI chatbots could lead to serious consequences, especially if your prompts contain even mild typos.

As generative AI grows more advanced and moves well beyond the simple query-and-response experience of the early Bing Chat days, users without the technical know-how are finding it increasingly difficult to get the most out of AI tools.
That's especially concerning as tools like OpenAI's ChatGPT gain broad adoption across the world. ChatGPT attracted one million new users in under an hour after the company launched its new image generator, which became a viral sensation among more general crowds on social media thanks to Studio Ghibli memes.
Despite Microsoft's ongoing feud with OpenAI over the latter's for-profit evolution plans, users often compare their AI offerings, Microsoft Copilot and ChatGPT. After all, both predominantly run on the same technology and AI models, though recent reports suggest that Microsoft is testing third-party models in Copilot and developing its own off-frontier models.
A separate report revealed that the top complaint lodged with Microsoft's AI division by users is "Copilot isn't as good as ChatGPT." The tech giant quickly dismissed the claim, shifting blame to poor prompt engineering skills. It even launched Copilot Academy to help users improve their AI skills and generally bolster their user experience with AI tools like Copilot.
In May, Microsoft Teams lead Jeff Teper admitted that Copilot and ChatGPT are virtually the same thing, but that the tech giant's offering sports better security and a more powerful user experience.
But as it turns out, Microsoft could be on to something when it blames poor prompt engineering skills, especially if a new study by MIT researchers is anything to go by (via Futurism).
Be "WEARY" of typos when using AI
The study reveals that overdependence on AI tools for medical advice can be dangerous and, at times, misleading. Perhaps more concerning, AI tools can advise users against seeking medical attention if their query includes typos, such as a misspelled word or an extra space in a sentence. Colorful language and slang are also red flags in this context.
The researchers further claimed that female users are more likely than males to receive this flawed AI-powered advice, so take that with a pinch of salt. The research centered on the following AI tools: OpenAI's GPT-4, Meta's Llama-3-70B, and a medical AI called Palmyra-Med.
They simulated thousands of health cases, which included a combination of real patient complaints from a medical database, health-related Reddit posts, and AI-generated cases.
Interestingly, the researchers introduced "perturbations" into the data with the aim of throwing the chatbots off their game: inconsistent capitalization at the beginning of sentences, exclamation marks, colorful language, and uncertain language like "possibly."
The chatbots seemingly fell for the trick, changing their assessments and medical advice. The research claims the perturbations increased the likelihood of a chatbot advising a patient not to go to the hospital by 7 to 9 percent.
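Purely as an illustration (this isn't the MIT team's actual pipeline, and the prompt text here is made up), here's roughly what that kind of perturbation might look like in a few lines of Python:

```python
import random

def perturb(prompt: str, seed: int = 0) -> str:
    """Return a lightly 'messy' copy of a patient prompt."""
    random.seed(seed)
    words = []
    for word in prompt.split():
        # Randomly drop leading capital letters, mimicking sloppy typing.
        if word[0].isupper() and random.random() < 0.5:
            word = word.lower()
        words.append(word)
    text = " ".join(words)
    text = text.replace(". ", ".  ", 1)      # sneak in an extra space
    text = text.replace(".", "!", 1)         # swap a period for an exclamation mark
    return "I possibly have this: " + text   # add uncertain language

original = "Severe chest pain radiating to the left arm. Should I go to the ER?"
print(original)
print(perturb(original))
# In the study's setup, the clean and perturbed versions would be sent to the
# same model and the resulting medical advice compared.
```

The point of the comparison is that nothing medically relevant changes between the two prompts, yet the models' recommendations can shift.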
The researchers deduced that the AI tools rely heavily on the medical data they were trained on, making it difficult for them to interpret information shared by patients, which isn't as clean and structured as medical literature.
According to the study's lead author and researcher at MIT, Abinitha Gourabathina:
"These models are often trained and tested on medical exam questions but then used in tasks that are pretty far from that, like evaluating the severity of a clinical case. There is still so much about LLMs that we don't know."
The findings raise critical concerns about the integration of AI tools into medicine. This news comes after Microsoft just touted a new AI medical tool as 4x more accurate and 20% cheaper than human doctors. The company's AI CEO referred to it as "a genuine step toward medical superintelligence."
It all suggests that generative AI still has a long way to go before it can be completely trusted with complex fields like medicine.

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.