One of the relatively little-discussed phenomena in the social critique of AI is that it is not only a centralizing technology, but also one that increases both the distance between people and the power of those who own the coordinating infrastructure. Of course, infrastructure does not directly determine outcomes, but it matters which way the playing field is tilted.
I think it's possible to look at the history of Internet infrastructure, particularly those aspects that users interact with, and see it as shaped by successive waves of centralization. Think Usenet/email (1980s), Web (1990s), Web 2.0 (2000s), Web3 (2010s) and AI (2020s). The dates here do not indicate a historical periodization (they are way too neat), but serve as a heuristic device useful for this particular purpose. So, when thinking of email, think of email in the 1980s (self-hosted) rather than contemporary Gmail (centrally hosted).
In each of these waves, processing power moved from the edges of the network into the center. This is not always a bad thing, but it affects what is possible. For one, it lowers the barrier to entry for users (the web was clearly more user-friendly than Usenet), but it also adds power to whoever controls the central infrastructure. Hosting a discussion forum on your own web server gives the host more control over the discussion than distributing a Usenet newsgroup does. Again, this is not necessarily a bad thing when it comes to, say, design, technical improvements, or moderation. Decentralization (of certain technical aspects) is not a positive value per se, and it's not incompatible with centralization (of other technical aspects). 1/3
@festal if someone claims there is no such problem, then I would like to tell them: "okay but then you don't need to bother with new training data anymore, just skip that and make a loop feeding the machine its own output back as input".
I imagine it will be like an animal trying to survive by eating its own feces. After a few cycles of that, there is not much more to gain.
@eliasr Good question. We will probably never get there. But it seems indicative that OpenAI tries to avoid training its system on its own output.