Microsoft quietly deleted a blog promoting training AI on pirated Harry Potter books — amid backlash over copyright concerns

The English version of the 5th installment of the Harry Potter books, The Order Of The Phoenix.
(Image credit: Getty Images, Pascal Le Segretain | Microsoft)

OpenAI CEO Sam Altman has openly admitted that it's virtually impossible to develop advanced AI models like ChatGPT without copyrighted content. Even so, he argues that copyright law doesn't categorically prohibit AI firms from using such content for training, ultimately leaning on the fair use doctrine.

More recently, following backlash from critics in a Hacker News thread, Microsoft was forced to delete a blog post it had published in November 2024 that seemingly encouraged developers to train AI models on pirated Harry Potter books.

The dataset was deleted late last week after Ars Technica reached out to Shubham Maindola, a data scientist in India with no known affiliations to Microsoft. "The dataset was marked as Public Domain by mistake," Maindola told the outlet. "There was no intention to misrepresent the licensing status of the works."

Developing generative AI is no easy feat. Top AI research labs, such as OpenAI, are quickly burning through substantial funds to maintain the hype amid rising concerns among investors about returns on their investments. The ChatGPT maker is reportedly on course to post a $14 billion loss in 2026 and to go bankrupt by the middle of next year.

Money aside, AI models rely heavily on information from the internet for training. However, reports suggest that Google, OpenAI, and Anthropic are running short of high-quality training data, which is slowing advances in AI development.

AI model training has always been a complex issue, largely because there are no clear laws preventing tech companies from using copyrighted material in the process. Many firms lean on the concept of fair use as a legal shield, arguing that their practices fall within its protections.



Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.
