Microsoft’s AI agents couldn’t even order dinner — and that should terrify us

Microsoft recently released a new study revealing that AI agents might not be ready for prime time.

As generative AI evolves and becomes more advanced, the technology is gaining broad adoption across the world. As a productivity and efficiency booster, it's also redefining how we think about work.

More organizations appear to be embracing the technology and integrating it into their workflows to automate redundant and repetitive tasks. For instance, Salesforce CEO Marc Benioff said the company was seriously debating whether to hire any software engineers in 2025. He later revealed that AI is already handling up to 50 percent of the company's work, citing incredible productivity gains from agentic AI.


Microsoft's research, conducted in collaboration with Arizona State University, set out to measure how well AI agents perform assigned tasks without human intervention or supervision. In one experiment, a customer AI agent tried to order dinner based on a user's instructions, while other agents representing various restaurants competed to win the order.

The experiment involved 100 customer AI agents interacting with 300 business AI agents. Microsoft has also open-sourced the code for its simulated Magentic Marketplace, so anyone can use it to run their own experiments. The models used in the experiments included OpenAI's GPT-4o and GPT-5 and Google's Gemini 2.5 Flash.

Ece Kamar, managing director of Microsoft Research’s AI Frontiers Lab, explained why experiments like this are important for understanding what these AI agents can and cannot do:

“There is really a question about how the world is going to change by having these agents collaborating and talking to each other and negotiating,” said Kamar. “We want to understand these things deeply.”

Interestingly, the study revealed weaknesses across the models, including several techniques that business agents could use to manipulate customer agents into buying their products. Performance dropped further when customer agents were given a wide range of options to choose from, which appeared to overwhelm their attention.


According to Kamar:

“We want these agents to help us with processing a lot of options. And we are seeing that the current models are actually getting really overwhelmed by having too many options.”

The agents also struggled when asked to collaborate toward a common goal. The models had trouble deciding which agent should take on which task, which undermined the collaboration. However, the researchers noticed an improvement in performance when the models were given explicit instructions on how to work together.

“We can instruct the models — like we can tell them, step by step,” concluded Kamar. “But if we are inherently testing their collaboration capabilities, I would expect these models to have these capabilities by default.”

Microsoft's simulation experiment points to two key takeaways: first, these models may not yet be ready for broad adoption and need further fine-tuning; second, users without solid prompt engineering skills may struggle to unlock the full potential of these agents (via TechCrunch).

FAQ

What is an AI agent?

AI agents are purpose-driven tools powered by generative AI, designed to carry out specific goals and tasks on behalf of users. For instance, OpenAI's "Operator" is an AI agent that can browse and interact with websites for its users.

Are AI agents reliable?

While these tools show great promise as productivity boosters in the workplace, Microsoft's latest research shows there's still a lot of work to be done on their ability to collaborate with other agents to achieve the desired results.

What is the Magentic Marketplace?

A simulated environment created by Microsoft and Arizona State University to test how AI agents perform tasks, make decisions, and collaborate without human supervision.

What models were tested?

The models tested included OpenAI’s GPT-4o and GPT-5 and Google’s Gemini 2.5 Flash.

What went wrong?

Agents became overwhelmed by too many options, struggled to collaborate, and were easily manipulated into making poor decisions.


Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya, with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.
