Ever more websites are using #Cloudflare to block #AI scrapers. Cloudflare is still a man-in-the-middle #MITM attack on the web, but I do think people should have the ability to block the AI crap. So I now have some sympathies for using Cloudflare. What if we had real gov #eID that could be used for captchas? This requires privacy-respecting services that only see the data they need, e.g. "are you a human with an eID? yes/no". There are concerns with eIDs, but they lie in the implementations, not the core idea.
@eighthave but a client-side script could automate the interaction with a real ID card, then automate scraping using this ID.
And no doubt people will rent out the use of their ID to scraping companies...
@tuxicoman sounds like a well-known problem with known solutions, for example, APIs with rate limiting, tokens, etc.
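As a rough illustration of the rate-limiting idea, here is a minimal token-bucket sketch keyed per identity. Everything in it is an assumption for illustration: the `PerIdRateLimiter` class, the `pseudonym` string (standing in for whatever opaque per-service identifier an eID scheme might hand out), and the capacity and refill numbers. It is not taken from any real eID or Cloudflare API.

```python
import time


class PerIdRateLimiter:
    """Hypothetical token-bucket limiter: one bucket per opaque ID pseudonym."""

    def __init__(self, capacity=5, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity = capacity              # maximum burst size per ID
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable clock, eases testing
        self.buckets = {}                     # pseudonym -> (tokens, last_update)

    def allow(self, pseudonym: str) -> bool:
        """Return True if this ID may make a request now, consuming one token."""
        now = self.clock()
        # An ID seen for the first time starts with a full bucket.
        tokens, updated = self.buckets.get(pseudonym, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - updated) * self.refill_per_sec)
        if tokens >= 1:
            self.buckets[pseudonym] = (tokens - 1, now)
            return True
        self.buckets[pseudonym] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = PerIdRateLimiter(capacity=2, refill_per_sec=0.5)
    for i in range(4):
        print(i, limiter.allow("pseudonym-abc"))
```

A rented or automated ID would still get through, but only at the rate any single human ID is allowed, which caps how much one ID is worth to a scraper.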
@eighthave you mean throttling usage per "detected" request origin (the same ID here)?
It looks like you want to do the same as copyright holders: control the usage of published content. DRM, in a sense.
There could be watermarks in your content that you could detect in the AI outputs, but that's fragile.
There could be legal obligations on AI companies to document the source of their materials (as we have for food and intellectual property) and show proof of rights of use.
How is it that AI companies can train on Anna's Archive and it's fine, but only for them?
@tuxicoman that's not how copyright law works. Currently everything is copyrighted once it's created, and that includes emails. Just because someone forwards someone else's email does not mean that they can grant a license for a text that someone else created.