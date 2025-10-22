A new study claims that LLMs can get "brain rot" due to prolonged exposure to low-quality data.

Generative AI has evolved rapidly, reaching new heights across a wide range of fields, including computing, education, and medicine. The technology has come a long way from its early days, when it was synonymous with hallucinations and outright incorrect responses to queries.

As you may know, top AI labs like Anthropic, OpenAI, Google, and more are heavily dependent on content uploaded and otherwise shared by humans on the internet to train their LLMs (large language models). Last year, a report suggested that these companies had hit a wall due to a lack of high-quality content for training, preventing them from developing advanced AI models.

And as it now seems, the same issues continue to haunt development in the AI landscape. According to a new study posted to arXiv, the preprint repository hosted by Cornell University, LLMs can get "brain rot" due to prolonged exposure to low-quality online data. The study adds that this exposure contributes heavily to a decline in the models' cognitive capabilities.

For context, this kind of internet "brain rot" refers to prolonged exposure to, and consumption of, low-quality and trivial online content. Studies show that this negatively impacts human cognitive capabilities, reasoning, and focus. The same can also be said about AI-powered models.

The researchers used two measures to identify internet junk content. The first centered on engagement, flagging short, viral posts that drew a lot of interaction. The second focused on semantic quality, flagging posts considered low-quality and rife with a clickbait writing style.
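To make the first measure concrete, here is a minimal sketch of an engagement-based junk filter. It is an illustration only, not the researchers' actual implementation: the `Post` fields, the `is_engagement_junk` helper, and both thresholds are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    likes: int
    reposts: int


# Hypothetical thresholds, chosen only for illustration (not from the study).
MAX_JUNK_WORDS = 30
MIN_JUNK_ENGAGEMENT = 500


def is_engagement_junk(post: Post) -> bool:
    """Flag a post as junk when it is both short and highly engaged."""
    word_count = len(post.text.split())
    engagement = post.likes + post.reposts
    return word_count < MAX_JUNK_WORDS and engagement > MIN_JUNK_ENGAGEMENT


posts = [
    Post("You won't BELIEVE what happened next", likes=900, reposts=300),
    Post("A long, carefully argued thread " + "word " * 60, likes=900, reposts=300),
    Post("Short note nobody saw", likes=2, reposts=0),
]
flags = [is_engagement_junk(p) for p in posts]
print(flags)  # [True, False, False]
```

Only the first post is flagged: the second is popular but long, and the third is short but unpopular.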

The researchers then used these measures to construct datasets containing varying proportions of junk and high-quality content, and used those datasets to determine the impact of low-quality content on LLMs like Llama 3 and Qwen 2.5.
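Building datasets with varying proportions of junk could look roughly like the sketch below. The `build_mix` helper, the pool contents, and the specific ratios are hypothetical and chosen for illustration; the article does not describe the exact procedure.

```python
import random


def build_mix(junk, clean, junk_ratio, size, seed=0):
    """Sample `size` documents, round(junk_ratio * size) of them from `junk`."""
    if not 0.0 <= junk_ratio <= 1.0:
        raise ValueError("junk_ratio must be between 0 and 1")
    n_junk = round(junk_ratio * size)
    n_clean = size - n_junk
    if n_junk > len(junk) or n_clean > len(clean):
        raise ValueError("not enough documents in one of the pools")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    mix = rng.sample(junk, n_junk) + rng.sample(clean, n_clean)
    rng.shuffle(mix)
    return mix


# Placeholder pools standing in for real junk and high-quality documents.
junk_pool = [f"junk-{i}" for i in range(100)]
clean_pool = [f"clean-{i}" for i in range(100)]

# Illustrative ratios, from a fully clean mix to a fully junk one.
for ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
    mix = build_mix(junk_pool, clean_pool, ratio, size=50)
    n_junk = sum(doc.startswith("junk") for doc in mix)
    print(f"junk_ratio={ratio:.1f}: {n_junk} junk, {len(mix) - n_junk} clean")
```

Holding the total size fixed while varying only the ratio means any change in model behavior can be attributed to data quality rather than data volume.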

The goal behind the study was to determine the impact on AI systems when they continuously depend on low-quality content uploaded to the web, which is seemingly flooded with short, viral, or machine-generated content.

More concerning, the study found that the accuracy of AI models trained purely on junk content fell from 74.9% to 57.2%. Their long-context comprehension was also hit, dropping from 84.4% to 52.3%. The researchers added that the models' cognitive and comprehension capabilities only worsen with greater exposure to low-quality training content, a phenomenon they referred to as a dose-response effect.
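To put those figures in perspective, the short calculation below works out each decline both in percentage points and relative to the starting score. The `drop` helper is a name introduced here for illustration; the four input numbers are the ones reported above.

```python
def drop(before, after):
    """Return (percentage-point drop, relative drop in percent), rounded to 0.1."""
    points = before - after
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


accuracy_drop = drop(74.9, 57.2)
long_context_drop = drop(84.4, 52.3)

print(accuracy_drop)      # (17.7, 23.6)
print(long_context_drop)  # (32.1, 38.0)
```

In other words, accuracy lost 17.7 points, or roughly a quarter of its original value, while long-context comprehension lost 32.1 points, or well over a third.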

The study also revealed that prolonged exposure to low-quality content negatively impacted the models' ethical consistency, prompting a "personality drift". As a result, the models were even more prone to generating incorrect responses to queries, making them less reliable.

Exposure to junk data also affected the models' thought process: they often skipped the step-by-step chain of thought, rushing through the reasoning only to generate superficial responses.

The Dead Internet Theory is turning into a reality