Google's going to scrape the entire public Internet to train its AI tools and there's nothing we can do about it

Google Bard announcement blog
(Image credit: Future)

What you need to know

  • Google's latest privacy policy came into effect on July 1 and it's going to be a little controversial. 
  • The owner of the largest search engine on the planet is now going to use all that scraping to train its AI models and we're basically going to have to live with it. 
  • Use of data to train AI models has already provided its own drama, notably from large sources like Reddit. 

Google's latest privacy policy update isn't necessarily surprising, but it does set off some alarm bells, particularly for those who already have their doubts over the AI revolution. 

As highlighted by Gizmodo, the latest statement on the search giant's privacy policy contains a key update relating to AI: 

“For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

The most recent policy prior to this only made mention of "language models" and, specifically, Google Translate. The latest update makes it clear that Google is going to be feeding anything public on the Internet into its AI tools like Bard.

Is this surprising? Not at all. Google is the gatekeeper to the Internet, especially for publishers like us and our parent company. Playing the game of getting your content to rank well in Google is exhausting, but also critical. And now all of that content is going to be fed into Google AI. All of it. 

It's certainly going to stoke the flames of debate. Recently we've seen issues on Reddit with regard to access to its API, the losers of which were basically the users of Reddit. Twitter's owner, Elon Musk, has also been vocal about scraping, claiming the recent disaster on the platform with rate limits is in response to that (even if it might not be 100% true). 

BingGPT brings the Bing Chat experience to the desktop

AI is the future but it's going to get messy.  (Image credit: Windows Central)

This move is only going to further stoke the debate and the backlash over the training of AI tools. OpenAI has already had its fair share of criticism over the data used to train its GPT models, the same ones that power Microsoft's Bing Chat. Microsoft also has a search engine, but its reach pales in comparison to that of Google Search. 

The legality will also come into question. We're in uncharted, murky waters with all this stuff. The EU already has issues with Google Bard, and quite how this will align with the territory's GDPR rules will be interesting to find out. Until it's technically illegal, maybe Google is just going to do what Google does. Which is whatever it wants. 

AI models need to be trained somehow. But Google's latest policy doesn't seem to indicate the company is willing to compensate any of the creators of that content. Everyone needs their stuff to be surfaced in Google, and it does feel kind of like Google is abusing that to its own ends. 

Buckle up, it's going to be a bumpy ride. 

Richard Devine
Managing Editor - Tech, Reviews

Richard Devine is a Managing Editor at Windows Central with over a decade of experience. A former Project Manager and long-term tech addict, he joined Mobile Nations in 2011 and has been found on Android Central and iMore as well as Windows Central. Currently, you'll find him steering the site's coverage of all manner of PC hardware and reviews. Find him on Mastodon at