Companies go after robot web crawlers

Robots have never been especially welcome on the internet (think of all the crosswalk photos waiting to stump us in CAPTCHAs), but that's truer than ever in the era of web-scraping bots.

Driving the news: The companies behind some of the world's most-visited websites, including retailers, job sites, and news publishers, are blocking these bots from accessing their sites. Deployed by AI companies such as OpenAI, web-scraping bots gather information from across the web to train the models that power AI tools like the chatbot ChatGPT.

OpenAI is reportedly on track to earn more than US$1 billion in revenue over the next year. Media companies in particular have started to argue that their content helped create that value, so they want AI firms to pay up.

Bottom line: Regulation of the AI industry has been slow to catch up with the reality of companies using material gathered from the web to train algorithms and chatbots. Big companies are taking matters into their own hands, either by heading to court or by blocking crawlers themselves. About 20% of the top 1,000 websites have already taken action.—SB
