Forget DeepSeek: Researchers develop a $50 OpenAI competitor in less than 30 minutes that thinks harder when you ask it to "wait"
Date: 2025-02-07
The low-cost model takes on OpenAI's o1 reasoning model using knowledge distilled from Google's Gemini 2.0 Flash Thinking Experimental.
The emergence of DeepSeek and its V3-based R1 model, which reportedly matches or surpasses OpenAI's o1 reasoning model on a range of math, science, and coding benchmarks, has raised investor concern about the exorbitant costs tied to AI advances, making commitments such as OpenAI's $500 billion Stargate project seem counterproductive.
Researchers at Stanford and the University of Washington recently developed an AI model to take on OpenAI's o1 reasoning model. Dubbed s1, the model was trained on a dataset of just 1,000 questions for under $50 (via TechCrunch). The researchers achieved this by distilling knowledge from a larger proprietary AI model.
Distillation is a technique in which a smaller AI model is trained on the outputs of a larger one, picking up some of its capabilities at a fraction of the cost. In this case, the researchers used Google's Gemini 2.0 Flash Thinking Experimental reasoning model to generate the reasoning and answers that s1 was trained on. As spotted by The Verge, Google's terms of service explicitly prohibit using the Gemini API to develop models that compete with the company's own AI models.
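To make the idea concrete, here is a minimal Python sketch of the data side of distillation, assuming a text-only teacher: a larger model answers each question with its reasoning, and those outputs become the fine-tuning targets for the smaller student. The `teacher_generate` function, its canned answer, and the `<think>` delimiters are hypothetical placeholders, not s1's actual pipeline or any real API.

```python
from dataclasses import dataclass


@dataclass
class SFTExample:
    prompt: str  # the question shown to the student model
    target: str  # the text the student is fine-tuned to reproduce


def teacher_generate(question: str) -> tuple[str, str]:
    """Hypothetical stand-in for querying a larger "teacher" reasoning model.

    Returns (reasoning_trace, final_answer). A real pipeline would call the
    teacher model here; this sketch uses a canned lookup so it runs offline.
    """
    canned = {
        "What is 17 * 23?": (
            "17 * 20 = 340 and 17 * 3 = 51, so 340 + 51 = 391.",
            "391",
        ),
    }
    return canned.get(question, ("(no trace)", "(no answer)"))


def build_distillation_set(questions: list[str]) -> list[SFTExample]:
    """Turn teacher outputs into supervised fine-tuning examples for a student."""
    examples = []
    for question in questions:
        trace, answer = teacher_generate(question)
        # Hypothetical format: reasoning wrapped in <think> tags, then the answer.
        target = f"<think>{trace}</think>\n{answer}"
        examples.append(SFTExample(prompt=question, target=target))
    return examples


if __name__ == "__main__":
    for ex in build_distillation_set(["What is 17 * 23?"]):
        print(ex.prompt)
        print(ex.target)
```

The student never sees the teacher's weights, only its text, which is why this kind of distillation can be done through an API. The fine-tuning step itself (ordinary supervised training on `target` given `prompt`) is omitted here.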
Distillation narrows the gap between AI startups and well-established AI firms, letting smaller players build sophisticated models without breaking the bank. However, top AI labs, including OpenAI and, by extension, Microsoft, aren't happy about smaller startups using distillation to refine their models. OpenAI and Microsoft recently accused DeepSeek of using their copyrighted data to train its ultra-cost-effective model.
s1's training took less than 30 minutes on 16 NVIDIA H100 GPUs. The model is built on Qwen2.5, an open-source AI model from Alibaba. More interestingly, the researchers found that when the model tried to finish reasoning, appending the word "wait" forced it to keep thinking before generating its response, a technique they call budget forcing. “This can lead the model to doublecheck its answer, often fixing incorrect reasoning steps,” the researchers noted. As a result, the model reportedly produced more accurate answers.
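The "wait" trick can be pictured as a small decoding loop. This is a simplified Python illustration, assuming a model that emits its reasoning in chunks and signals it is done with an end-of-thinking marker: whenever the model tries to stop early, the loop drops the marker and appends "Wait," so the model keeps reasoning. `toy_model`, the `</think>` marker, and the `min_waits` parameter are hypothetical; s1's real implementation works on tokens inside an actual model's generation loop.

```python
from typing import Callable

END_OF_THINKING = "</think>"  # hypothetical end-of-reasoning marker


def budget_forced_reasoning(
    generate_step: Callable[[str], str],
    prompt: str,
    min_waits: int,
    max_steps: int = 50,
) -> str:
    """Extend reasoning by replacing the first `min_waits` stop attempts with "Wait,"."""
    context = prompt
    waits_used = 0
    for _ in range(max_steps):
        chunk = generate_step(context)
        if chunk == END_OF_THINKING:
            if waits_used < min_waits:
                # Suppress the stop and nudge the model to keep thinking.
                context += " Wait,"
                waits_used += 1
                continue
            # Budget satisfied: let the model end its reasoning.
            return context + END_OF_THINKING
        context += chunk
    return context  # step limit reached without the model stopping


def toy_model(context: str) -> str:
    """Hypothetical toy model: after a "Wait," it re-checks once, otherwise it tries to stop."""
    if context.endswith("Wait,"):
        return " let me double-check."
    return END_OF_THINKING


if __name__ == "__main__":
    print(budget_forced_reasoning(toy_model, "Q: 2 + 2? <think> 2 + 2 = 4.", min_waits=2))
```

With `min_waits=2`, the toy run prints `Q: 2 + 2? <think> 2 + 2 = 4. Wait, let me double-check. Wait, let me double-check.</think>`. In s1, budget forcing can also cap reasoning by forcing the end-of-thinking marker once a maximum budget is reached; this sketch only covers the extension side.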
You can check out the s1 model on GitHub.
Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya, with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.