A new cutting-edge AI model boasts lightning-fast image generation speeds and OpenAI Sora-style video capabilities, without requiring powerful GPUs or high-end hardware

New KOALA image generation model
(Image credit: Image Creator from Designer | Windows Central)

What you need to know

  • Korean scientists recently developed a new AI image generation model called KOALA.
  • The tool generates images faster than other tools such as Microsoft's Image Creator from Designer.
  • It relies on a technique called knowledge distillation, which the scientists used to compress an open-source image generation model called Stable Diffusion XL.
  • This way, it can generate images faster, even on old PCs with outdated GPUs.

A new AI-powered image generator is on the horizon and could potentially take on Microsoft's Image Creator from Designer (formerly Bing Image Creator), Midjourney, and OpenAI's DALL-E 3 model.

The new tool can generate images in less than two seconds, significantly faster than your average image generation tool. As spotted by Live Science, the South Korean scientists behind it used a technique called knowledge distillation to compress an open-source image generation model called Stable Diffusion XL.

How does this AI tool work?

An image of an AI robot stood in front of code generated by Bing Image Creator

(Image credit: Windows Central / Bing Image Creator)

For context, Stable Diffusion XL features up to 2.56 billion parameters. Parameters are the values a model learns while training on existing content, including images, and the more of them a model has, the more computation and memory each generated image requires. This large parameter count helps explain why generating images might take a bit of time. With knowledge distillation, however, the scientists cut the parameter count of the smallest KOALA model down to 700 million.
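To put those two parameter counts side by side, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the 2 bytes per parameter (16-bit weights) is an assumption made for illustration, and the names are hypothetical rather than taken from the KOALA release.

```python
# Parameter counts quoted in the article
SDXL_PARAMS = 2.56e9         # Stable Diffusion XL
KOALA_SMALL_PARAMS = 700e6   # smallest KOALA model

# Assumption for illustration: weights stored as 16-bit floats (2 bytes each)
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# How many times smaller the distilled model is, by parameter count
compression_ratio = SDXL_PARAMS / KOALA_SMALL_PARAMS

print(f"Parameter reduction: {compression_ratio:.2f}x")
print(f"SDXL weights:  ~{weight_memory_gb(SDXL_PARAMS):.2f} GB")
print(f"KOALA weights: ~{weight_memory_gb(KOALA_SMALL_PARAMS):.2f} GB")
```

This estimate covers the counted weights only. Working memory used during generation, and any parts of the pipeline not included in the quoted counts, come on top of it.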

As a result, the tool can generate images in under two seconds. The image generation model doesn't require high-end GPUs or sophisticated devices to run smoothly. It only requires about 8GB of RAM to generate images. Essentially, knowledge distillation transfers what the large model has learned to a smaller one, with the aim of preserving as much of the output quality as possible. This way, the smaller model is capable of generating quality images faster.
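The general idea of a small model learning from a large one can be sketched in a few lines of Python. This is a toy illustration of knowledge distillation in general, not the method the KOALA team used: the `teacher` function, the two-parameter student, and the training settings are all hypothetical stand-ins.

```python
import random


def teacher(x):
    """Stand-in for a large pretrained model (hypothetical, fixed behavior)."""
    return 3.0 * x + 2.0 + 0.05 * x ** 3


def distill(steps=2000, lr=0.05, seed=0):
    """Train a smaller student (w * x + b) to imitate the teacher's outputs."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # The training target is the teacher's output, not a human-made label
        error = (w * x + b) - teacher(x)
        # Gradient step on the squared error 0.5 * error ** 2
        w -= lr * error * x
        b -= lr * error
    return w, b


def mean_gap(w, b, n=101):
    """Average absolute difference between student and teacher on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return sum(abs((w * x + b) - teacher(x)) for x in xs) / n


w, b = distill()
print(f"student: w={w:.3f}, b={b:.3f}")
print(f"gap before training: {mean_gap(0.0, 0.0):.3f}")
print(f"gap after training:  {mean_gap(w, b):.3f}")
```

The student here has fewer terms than the teacher, so it cannot match it exactly, but it ends up close while being cheaper to evaluate. That trade-off is the point of the technique.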


According to benchmarks shared by the scientists, KOALA is significantly faster than OpenAI's DALL-E 3 and DALL-E 2 models. When prompted to generate "a picture of an astronaut reading a book under the moon on Mars," DALL-E 3 took 13.7 seconds and DALL-E 2 took 12.3 seconds, while KOALA took only 1.6 seconds.
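For a sense of scale, the quoted timings can be turned into speedup factors. The short Python sketch below uses only the three figures above; the `speedup` helper and the dictionary are illustrative names, not part of any published benchmark code.

```python
# Generation times in seconds, as quoted in the article
timings = {"DALL-E 3": 13.7, "DALL-E 2": 12.3, "KOALA": 1.6}


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


for name in ("DALL-E 3", "DALL-E 2"):
    factor = speedup(timings[name], timings["KOALA"])
    print(f"KOALA vs {name}: {factor:.1f}x faster")
```

On these numbers, KOALA comes out roughly 8.6 times faster than DALL-E 3 and 7.7 times faster than DALL-E 2 for this one prompt.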

There are five versions of KOALA. Three versions of the model generate images based on text prompts, while the remaining two versions (Ko-LLaVA) can generate both images and videos (much like OpenAI's Sora model).

The Korean scientists from the Electronics and Telecommunications Research Institute (ETRI) shared their work and findings on the open-source AI repository Hugging Face and in the arXiv database.

The scientists intend to integrate these models across existing image generation services, content production, and more.

Kevin Okemwa

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.