I was reading this article about LLMs making bad citations. I found it pretty interesting, so I decided to try to replicate it with ChatGPT.
https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
I tried it with a document I wrote, FEP 5711. It's an enhancement proposal for ActivityPub, adding some inverse relationships for important properties.
https://codeberg.org/fediverse/fep/src/branch/main/fep/5711/fep-5711.md
Anyway, I took a paragraph out of the document and asked ChatGPT to identify the URL, publisher, publication date, and title. It failed. You can see the transcript here:
https://chatgpt.com/share/68573fa9-b340-800f-b9b4-7b74fdf0bf46
I was surprised to see that it had essentially no visibility into the FEPs. After a while, I realized that codeberg.org, the hosting service for FEPs, has ChatGPT blocked.
I understand the goal; many people don't want their code to be used by LLM code generators. But it also means that this document repository isn't visible to people who use LLMs like a search engine. Numbers vary, but afaict somewhere around 10% of people use LLMs as their primary search engine, and about 50% of people use LLMs some of the time for search.
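For anyone who wants to check whether a site's robots.txt turns away a particular crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content below is a made-up sample, not Codeberg's actual file, and the example.org URL and the `is_allowed` helper are placeholders for illustration. `GPTBot` is used as the crawler name on the assumption that it is the user agent you care about. A site can also block bots in other ways (IP ranges, challenge pages), which this check would not detect.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT Codeberg's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; swap in the page you are curious about.
url = "https://example.org/fediverse/fep"

print("GPTBot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", url))
print("Browser allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "Mozilla/5.0", url))
```

To test a live site instead, `RobotFileParser` also has `set_url()` and `read()`, which download the file before you call `can_fetch()`.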
@evan @Codeberg Addendum: My source (apart from a Matrix channel): https://blog.codeberg.org/letter-from-codeberg-software-is-about-humans.html
"Codeberg was running smoothly overall, except for one thing: Occassionally, AI scrapers crawled our site too much and caused downtimes."
@elshid @Codeberg @evan yes, absolutely agree - as long as LLM scraper bots don't follow simple rules (like honoring robots.txt files) and act like a destructive swarm on services, I'm not sorry for LLM users not finding the information.
I've seen LLM-bot swarms hitting forges, and the only way to keep the forges running for the intended audience was blocking them. Sad thing is the scrapers then try to circumvent the blocks aggressively, even dropping their IDs so the blocking gets more complicated again. It's an arms race, and legit users of LLMs are not the focus of any of the racing parties.
@elshid @Codeberg that's interesting! Also, really self-destructive.