**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:22

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:22

Evan Prodromou @evan@cosocial.ca

Jun 21, 2025, 23:22

I was reading this article about LLMs making bad citations. I found it pretty interesting, so I decided to try to replicate it with ChatGPT.

https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:24

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:24

Jun 21, 2025, 23:24

Evan Prodromou @evan@cosocial.ca

I tried it with a document I wrote, FEP 5711. It's an enhancement proposal for ActivityPub, adding some inverse relationships for important properties.

https://codeberg.org/fediverse/fep/src/branch/main/fep/5711/fep-5711.md

Show thread

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:27

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:27

Jun 21, 2025, 23:27

Evan Prodromou @evan@cosocial.ca

Anyway, I took a paragraph out of the document and asked ChatGPT to identify the URL, publisher, publication date, and title. It failed. You can see the transcript here:

https://chatgpt.com/share/68573fa9-b340-800f-b9b4-7b74fdf0bf46

Show thread

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:28

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:28

Jun 21, 2025, 23:28

Evan Prodromou @evan@cosocial.ca

I was surprised to see that it had really no visibility of the FEPs. After a while, I realized that codeberg.org, the hosting service for FEPs, has ChatGPT blocked.

https://codeberg.org/robots.txt

Show thread

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:34

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:34

Jun 21, 2025, 23:34

Evan Prodromou @evan@cosocial.ca

I understand the goal; many people don't want their code to be used by LLM code generators. But it also means that this document repository isn't visible for people who use LLMs like a search engine. Numbers vary, but afaict somewhere around 10% of people use LLMs as their primary search engine, and about 50% of people use LLMs some of the time for search.

Show thread

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:39

**Evan Prodromou** @evan@cosocial.ca · Jun 21, 2025, 23:39

Jun 21, 2025, 23:39

Evan Prodromou @evan@cosocial.ca

I guess there's maybe some justification like, those people are bad, and they don't deserve nice things like Fediverse Enhancement Proposals? Or, maybe, we have to take a principled stand against LLMs by not providing any training data for them? Such that, perhaps, people disappointed by not having good results in LLMs will return to using traditional search engines like Google or Bing, which are more ethical because reasons.

Show thread

**Mirko Adam** @elshid@librem.one · 2025-06-22T01:15:32Z

Mirko Adam @elshid@librem.one

@evan No, the reason is simply the server load. The AI crawlers have so excessively crawled @Codeberg that their main service, to host a git server, was often very slow.

Jun 22, 2025, 01:15 · Tusky · · ·

**Evan Prodromou** @evan@cosocial.ca · Jun 22, 2025, 02:30

**Evan Prodromou** @evan@cosocial.ca · Jun 22, 2025, 02:30

Jun 22, 2025, 02:30

Evan Prodromou @evan@cosocial.ca

@elshid @Codeberg that's interesting! Also, really self-destructive.

**K. Ryabitsev 🍁** @monsieuricon@social.kernel.org · Jun 22, 2025, 02:34

**K. Ryabitsev 🍁** @monsieuricon@social.kernel.org · Jun 22, 2025, 02:34

Jun 22, 2025, 02:34

K. Ryabitsev 🍁 @monsieuricon@social.kernel.org

@evan @elshid @Codeberg AI crawlers descend on online resources like locust and consume then until they are dead. When they recover, they do it again.

**Mirko Adam** @elshid@librem.one · Jun 22, 2025, 02:54

**Mirko Adam** @elshid@librem.one · Jun 22, 2025, 02:54

Jun 22, 2025, 02:54

Mirko Adam @elshid@librem.one

@evan @Codeberg Addendum: My source (apart from a matrix channel): https://blog.codeberg.org/letter-from-codeberg-software-is-about-humans.html

"Codeberg was running smoothly overall, except for one thing: Occassionally, AI scrapers crawled our site too much and caused downtimes."