I keep turning this idea over in my head. Is there a way to ignore the corporate internet and just keep being awesome?
I think there are a couple of potentially viable paths forward, and I'm using this thread to ramble about some of my thoughts on this.
Search engines are a big part of what I'm thinking about.
You might or might not have noticed that both Google and DuckDuckGo have changed recently. What I'm specifically noticing is that in the past year or so I'm increasingly seeing "no matching pages" results, something that virtually never happened two or three years ago.
I don't know what's going on under the hood, but I think others have noticed it too, and some friends of mine have suggested this is a result of ChatGPT (and friends).
The hypothesis I've heard for why search engine results have become so much more sparse is this: now that LLMs are readily accessible, such a massive volume of utter crap is being broadcast to the internet that search engine maintainers have been forced to adjust their thresholds for which pages count as acceptable query results.
I have no idea whether this is true, but it is food for thought.
Whether or not the recent change in behavior by Google and DuckDuckGo is due to LLMs, it reminds me of past changes in search engine behavior. In earlier (but still fairly recent) years, the problem often seemed to be that search engines were returning too much of the wrong content.
Sometimes it felt, and still feels, like this unwanted content is disproportionately for-profit, which raises questions about sponsorship and motives, but there are other explanations.
The more innocuous hypothesis for why Google got worse is that there are simply so many people on the internet now that filtering through the cruft to find specific information became a harder problem.
Whatever the reason, the big players in connecting people with desired content have been falling short of what a lot of us would like to see.
I'm using the phrase "a lot of us" very carefully here. I think the vast majority of internet users have been content, if not satisfied, with the kinds and volumes of content that they've been able to access.
(There's a sidebar here about OKCupid's algorithms, which I won't get into right now.)
I don't know how large a number "a lot of us" is---those of us who have been frustrated for a while with the functionality of the corporate internet---but I'm wondering if that number might be enough to start building useful alternatives for at least some purposes.
With Google and DuckDuckGo not quite as useful as they used to be, might there be space opening up for something different? And are there enough of us willing to resist the allure of LLMs to make it happen?
There's a double challenge with LLMs: whether or not they're the reason search engines aren't working as well, they will be a massive distraction from any effort to get democratically curated tools off the ground.
Just yesterday a technically literate (if IMO excessively techno-optimistic) friend of mine remarked that ChatGPT seems to be a better search engine than search engines.
He presented a concrete example, and I get what he was talking about.
On the flip side, as it becomes harder to filter real data from statistically generated noise (i.e., LLM-based content), LLM developers are also going to have a harder time accessing robust training data.
LLMs create their own headwind to success, even as they make the web a more miserable place in general.
And here's a real challenge: any public user-curated platform that attempts to build better content filters is also a potential asset that LLM developers can tap into.
We've already seen a version of this: following the rise of OpenAI, creators became more reluctant to post public content at all.
If you believe that LLMs are effective tools for reorganizing data in useful ways, then a free, public, democratic alternative to search engines would be the last thing that people trying to resist LLMs would want to build.
Conversely, if you embrace the idea that LLMs are epistemically broken, and that it's just a matter of time before the mainstream technorati catches on to this, then there's no harm in trying.
Unfortunately, it isn't clear which of these is more true.
Stated another way, part of the question---and I don't know the answer---is whether the nightmares about LLMs beating out real content creation are a realistic picture of the future or something closer to what Lee Vinsel calls Criti-Hype (https://sts-news.medium.com/youre-doing-it-wrong-notes-on-criticism-and-technology-hype-18b08b4307e5).
Getting back to where I started this thread: if the deluge of crap content generated by LLMs has indeed effectively hobbled traditional search engines, pointing to a general decline in the usefulness of web crawlers, might that mean the best path forward is indexing websites by hand?
If so, how do we make this happen in a way that is democratic and non-exploitive?
This is where I'm looking for ideas from others.
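To seed that conversation with something concrete, here's a minimal sketch of what a hand-maintained index could look like, assuming human-written summaries and plain keyword matching. The fields, the example entry, and the search logic are all my own illustrative assumptions, not a proposal for the actual design:

```python
# A toy hand-curated index: each entry is written by a human reviewer,
# and search is a plain keyword match over the human-written text.
# Everything here (fields, entry, scoring) is illustrative.

from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    url: str
    title: str
    summary: str                      # written by a person, not scraped
    tags: list[str] = field(default_factory=list)
    reviewed_by: str = ""             # who vouches for this entry

# A hypothetical entry, just to show the shape.
ENTRIES = [
    IndexEntry(
        url="https://example.org/knitting-faq",
        title="Knitting FAQ",
        summary="A long-running, ad-free FAQ maintained by hobbyists.",
        tags=["knitting", "crafts", "faq"],
        reviewed_by="dynamic",
    ),
]

def search(query: str) -> list[IndexEntry]:
    """Return entries whose human-written text mentions every query word."""
    words = query.lower().split()
    return [
        e for e in ENTRIES
        if all(
            w in (e.title + " " + e.summary + " " + " ".join(e.tags)).lower()
            for w in words
        )
    ]

for hit in search("knitting faq"):
    print(hit.url, "-", hit.title)
```

The point of the toy is that every entry exists only because a person wrote it and signed off on it, so LLM-generated spam can't enter the index without a human vouching for it.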
I think there might be two distinct paths forward:
a) Embrace the idea of a public web while letting go of fears that the proprietary internet will steal it all, or
b) Create something that's somehow locked down, perhaps leveraging some kind of web of trust to determine who gets not just to contribute but to use the resulting tools. (I'll sketch a toy version of this just below.)
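To make option (b) a bit more concrete, here's a minimal sketch of one way a web-of-trust gate might work, assuming trust spreads outward from a few founding members through short chains of vouches. The names, the chain-length limit, and the vouching model are all hypothetical:

```python
# A toy web-of-trust check: someone is trusted if they can be reached
# from a set of founding members through a short chain of vouches.
# All names and thresholds here are hypothetical placeholders.

from collections import deque

# Who has vouched for whom (voucher -> people they vouch for).
VOUCHES = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["erin"],
}

FOUNDERS = {"alice"}    # trusted by definition
MAX_CHAIN_LENGTH = 3    # how far trust is allowed to propagate

def is_trusted(user: str) -> bool:
    """Breadth-first search outward from the founders, stopping once
    vouching chains exceed MAX_CHAIN_LENGTH."""
    frontier = deque((founder, 0) for founder in FOUNDERS)
    seen = set(FOUNDERS)
    while frontier:
        person, depth = frontier.popleft()
        if person == user:
            return True
        if depth < MAX_CHAIN_LENGTH:
            for vouched in VOUCHES.get(person, []):
                if vouched not in seen:
                    seen.add(vouched)
                    frontier.append((vouched, depth + 1))
    return False

print(is_trusted("dave"))     # True: alice -> bob -> dave
print(is_trusted("mallory"))  # False: nobody has vouched for them
```

Even in the toy version the design questions are visible: who counts as a founder, how far trust should propagate, and how a vouch gets revoked when someone turns out to be a scraper.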
I think there's a real ugliness to option b here, the idea of setting up barriers to keep out those who seek to enclose the information commons, but I don't necessarily want to write it off entirely.
I especially don't want to write it off if leaving that door open causes people who might otherwise despair to see a ray of hope.
Because we really need hope in this world.
Whether going with option (a), which we might call "freedom", or option (b), which we could perhaps call "security", there are plenty of questions to be answered.
Who does the work? Is the labor compensated or voluntary? In either case, is there a viable way to get it done that isn't exploitive?
How does such a product get off the ground?
How do we minimize inclusion of bad information sources (whether LLM or otherwise)?
@dynamic I dug deeper once, but I really don't recall. I don't think it was too hard to find.
@lwriemen
Quite possibly! I like that they are interested in integrating factual information into their indexing system.
From a quick glance, I didn't see anything about how they will be developing their index / database of websites. Have you seen anything about it?