I was reading this article about LLMs making bad citations. I found it pretty interesting, so I decided to try to replicate it with ChatGPT.
https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
I tried it with a document I wrote, FEP 5711. It's an enhancement proposal for ActivityPub, adding some inverse relationships for important properties.
https://codeberg.org/fediverse/fep/src/branch/main/fep/5711/fep-5711.md
Anyway, I took a paragraph out of the document and asked ChatGPT to identify the URL, publisher, publication date, and title. It failed. You can see the transcript here:
https://chatgpt.com/share/68573fa9-b340-800f-b9b4-7b74fdf0bf46
I was surprised to see that it had no real visibility into the FEPs. After a while, I realized that codeberg.org, the hosting service for FEPs, has ChatGPT blocked.
I understand the goal; many people don't want their code to be used by LLM code generators. But it also means that this document repository isn't visible to people who use LLMs like a search engine. Numbers vary, but afaict somewhere around 10% of people use LLMs as their primary search engine, and about 50% of people use LLMs some of the time for search.
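One way to check for this kind of block, assuming it is expressed in robots.txt (it may instead be enforced server-side, in which case this check would not show it), is to run a crawler's user agent against the site's rules. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content is a made-up sample, not Codeberg's actual file, and `is_allowed` is a hypothetical helper name. `GPTBot` is the user agent OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is NOT a copy of Codeberg's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://codeberg.org/fediverse/fep/src/branch/main/fep/5711/fep-5711.md"
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

To test a live site instead of the sample, `RobotFileParser` also has `set_url()` and `read()`, which fetch and parse the real robots.txt before `can_fetch()` is called.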
@evan @Codeberg Addendum: My source (apart from a matrix channel): https://blog.codeberg.org/letter-from-codeberg-software-is-about-humans.html
"Codeberg was running smoothly overall, except for one thing: Occasionally, AI scrapers crawled our site too much and caused downtimes."
@elshid @Codeberg that's interesting! Also, really self-destructive.