Watch out, Hollywood! OpenAI's latest model generates lifelike minute-long AI videos, but it has some critical weaknesses

Sora struggles with the prompt "Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care." (Image credit: OpenAI)

What you need to know

  • OpenAI recently debuted a new AI model dubbed Sora with video generation capabilities.
  • The text-to-video model can generate up to one-minute-long videos while maintaining high quality and adherence to the user’s prompt.
  • However, Sora struggles to simulate the physics of a complex scene and understand specific instances of cause and effect.

At the beginning of the year, Microsoft's Bill Gates and OpenAI's Sam Altman touched base on the Unconfuse Me podcast. The two leaders discussed a range of topics revolving around the ChatGPT maker, including Altman's firing and rehiring, the development of GPT-5, superintelligence, and more.

Sam Altman also discussed the possibility of video capabilities shipping to the company's AI-powered chatbot since it's the top request from most users. He added that this addition would build on the already existing voice mode and image generation features. 

And now, barely a month after sharing this information, OpenAI has unveiled a new text-to-video model dubbed Sora. The AI model "can generate videos up to a minute long while maintaining visual quality and adherence to the user's prompt."


It's worth noting that the model won't be available for everyone to access immediately. OpenAI is initially limiting access to "red teamers," who will assess potential areas for harm and risk, and to a number of visual artists, designers, and filmmakers.

Additionally, this will create an avenue for seasoned professionals in the film industry to provide feedback and suggest new ways for OpenAI to advance and improve the model.

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt but also how those things exist in the physical world.

OpenAI

While the model ships with a deep understanding of language that allows it to interpret text prompts and generate life-like characters correctly, OpenAI admits that it also has its fair share of weaknesses. 

The company pointed out that the model may face challenges when trying to simulate the physics of a complex scene. It may also struggle to understand specific instances of cause and effect. According to an example provided by OpenAI to further explain this premise, "a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark."

Sora can also generate a video featuring multiple shots that "accurately persist characters and visual style." However, it may fall short when it comes to the spatial details of a prompt. For instance, it may mix up left and right, or struggle with descriptions of events that take place over time.

AI may render more professions obsolete

(Image credit: Future | Image Creator by Designer)

After the tough economic times, generative AI comes a close second among the factors negatively impacting job security. AI-powered chatbots like Microsoft Copilot and ChatGPT are already claiming jobs from journalists. We've seen multiple publications lay off some of their employees in favor of these AI chatbots, and it turned out to be a hot mess. Microsoft has introduced a new program designed to equip journalists with skills that will prepare them for a future newsroom with AI.

RELATED: AI-generated article recommends a food bank as a tourist attraction

Even AI-powered tools like Microsoft's Image Creator from Designer (formerly Bing Image Creator) are getting good at designing projects. This could potentially render architectural jobs obsolete.

Admittedly, if someone had shown me the videos generated by Sora, I wouldn't have guessed that they were AI-generated (they look that good). And while the videos are currently capped at one minute, it's only a matter of time until you can generate an entire episode of your favorite show.

OpenAI has indicated that it is working on elaborate measures to prevent instances of misinformation, hateful content, and bias before it ships the model to general availability.


Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.

  • Ben Wilson
    I'm most interested in where they expect to get the training data for Sora, at least whenever the full version is released. You likely have to guess it's YouTube, which feels strange.