Cloudflare, the publicly traded cloud service provider, has introduced a new free tool designed to stop bots from scraping websites hosted on its network for data to train artificial intelligence (AI) models.
In the blog post announcing the update, the Cloudflare team also released data on how its customers are handling the surge in bots that scrape content to build generative AI models.
Newsng gathered that this tool is available to all customers of the cloud service provider, even those on free plans.
Some AI providers, such as Google, OpenAI, and Apple, let website owners block the bots they use for data scraping and model training by editing the site’s robots.txt file, a text file that tells bots which pages they may access on a website.
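As a rough illustration of how such an opt-out works, the Python sketch below parses a hypothetical robots.txt file with the standard library's urllib.robotparser and checks which crawlers it permits. GPTBot, Google-Extended, and Applebot-Extended are the user-agent tokens these providers have published for AI-related crawling; the file contents, the "SomeBrowser" agent, and the example.com URL are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block three AI-related crawler tokens,
# allow everything else. Not taken from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for agent in ("GPTBot", "Google-Extended", "Applebot-Extended", "SomeBrowser"):
        print(agent, is_allowed(agent, page))
```

The check only reports what the file requests. Nothing in robots.txt stops a crawler that chooses not to consult it.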
“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog.
“We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”
Blocking through robots.txt is not always effective, because compliance is voluntary. According to studies, several AI vendors ignore bot exclusion rules to gain a competitive advantage.
Perplexity has been accused of impersonating real visitors to scrape content, while OpenAI and Anthropic have reportedly ignored robots.txt rules.
To address the problem, Cloudflare analyzed AI bot and crawler traffic to improve its automated bot detection models.
Among other things, the models consider whether an AI bot might be trying to avoid detection by mimicking the appearance and behavior of a person using a web browser.
“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes.
“Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”
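The article does not describe the internals of these models. Purely as a toy illustration of the general idea of fingerprinting automation tools, the Python sketch below scores a request from a few header signals. The marker strings, point values, and threshold are all hypothetical and bear no relation to Cloudflare's actual detection logic.

```python
from typing import Mapping

# Hypothetical substrings that common scripting tools put in User-Agent.
AUTOMATION_MARKERS = ("python-requests", "scrapy", "curl", "headlesschrome")


def bot_score(headers: Mapping[str, str]) -> int:
    """Return a toy suspicion score (0 to 5) from request headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()

    score = 0
    # A missing User-Agent, or one naming an automation tool, is the
    # strongest signal in this sketch.
    if not user_agent or any(marker in user_agent for marker in AUTOMATION_MARKERS):
        score += 3
    # Browsers normally send these headers; scripted clients often omit them.
    if "accept-language" not in normalized:
        score += 1
    if "accept" not in normalized:
        score += 1
    return score


def looks_like_bot(headers: Mapping[str, str], threshold: int = 3) -> bool:
    """Flag the request when the toy score reaches the threshold."""
    return bot_score(headers) >= threshold
```

A production system would combine many more signals than headers alone, since a bot that copies a browser's headers passes this check untouched.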
Website hosts can report suspected AI bots and crawlers through a form that Cloudflare has set up, and the company says it will keep manually adding AI bots to its blacklist over time.
We earlier reported that Meta, the social networking giant, revealed plans to train its artificial intelligence models using user data from privacy-conscious Europeans.