Despite precautions, artificial intelligence companies are mining the internet

robots.txt is ignored

According to reports, Perplexity, which describes itself as a free AI search engine, has been accused of stealing and republishing news from Forbes. It was also revealed that Perplexity ignores robots.txt, known as the Robots Exclusion Protocol, a widely accepted standard that tells crawlers which parts of a site they are allowed to access. Although the protocol has been in use since 1994 and has generally been followed, nothing obliges anyone to comply; it works entirely on a voluntary basis.
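To illustrate what "voluntary" means in practice, here is a minimal Python sketch using the standard library's `urllib.robotparser` module. The robots.txt contents and the crawler names `ExampleAIBot` and `OtherBot` are hypothetical. A well-behaved crawler runs a check like this before each request, but nothing in the protocol forces it to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a made-up AI crawler ("ExampleAIBot")
# from the whole site and keeps every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The check is advisory only: a crawler that never calls allowed()
# can still request any URL the server is willing to serve.
if __name__ == "__main__":
    print(allowed("ExampleAIBot", "https://example.com/news/story"))  # False
    print(allowed("OtherBot", "https://example.com/news/story"))      # True
    print(allowed("OtherBot", "https://example.com/private/page"))    # False
```

The server never sees this check; it only sees the requests that follow, which is why a site cannot tell from robots.txt alone whether a crawler respected it.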

According to Wired, Perplexity continues to scrape data from websites, including those of Condé Nast, Wired's own publisher. Perplexity also appears not to be the only AI company pulling content from websites despite "do not crawl" signals in robots.txt: OpenAI and Anthropic are reportedly ignoring robots.txt signals and pulling data as well. Both companies have previously said they respect the "do not crawl" instructions that websites place in their robots.txt files.

Perplexity says it respects robots.txt. However, that does not mean it does not benefit from crawlers that ignore the protocol. According to the company, the detected activity belonged to one of the crawlers it relies on. Aravind Srinivas, CEO of Perplexity, also noted that robots.txt is not legally binding and said that publishers and AI companies need to establish a new kind of relationship.

Sources

https://www.engadget.com/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html

https://www.wired.com/story/perplexity-is-a-bullshit-machine/
