@pointlessone @risottobias@tech.lgbt @seldo
Isn't that exactly how it works? It doesn't "give answers", it just attempts to predict criteria that you don't define explicitly. Then it "hallucinates" some largely random stuff (which is what takes the most computational resources), matches it against those criteria, and presents to you whatever statistically matched the most — that's it!
@m0xee @risottobias @seldo That's true. No magic. But there's utility. You don't throw out your hammer because it can't drive screws. You accept the tool’s limitations and use it for what it’s good for.
@m0xee @risottobias @seldo Well, let me point out that “totally random” and “statistically matched” kinda clash.
I’d say there’s a significant correlation between a question and the right answer to it. Even with all the flexibility of natural language there are only so many ways a right answer can be phrased.
So if we take those statements as true we have a decent chance of getting a right answer from an LLM. We have to take into account that LLMs are trained on the internet. That is, most questions don’t have only right answers to them in the training data. Most of the internet isn’t questions and answers either. So yeah, you’re getting a statistically plausible continuation of your prompt, not an answer to your question, and not necessarily the right answer to your question.
Extracting knowledge from an LLM is not the main useful thing about them either. There are other functions you can get from a statistical language model. For instance, summarisation: remove the less significant tokens and the repetitions and you have a summary of your input. Translation: if an LLM is trained on multiple languages, tokens with similar meanings naturally settle close to each other in the token space because they’re used in similar contexts.
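Here’s a toy illustration of that last point, assuming the sentence-transformers package and one of its multilingual models (my pick for the example, not something from this thread). The idea is only that translations of the same word end up close together in embedding space:

```python
# Rough sketch: multilingual embeddings put translations close together.
# Assumes `pip install sentence-transformers`; the model name is just one
# multilingual example chosen for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
vecs = model.encode(["cat", "Katze", "gato", "bicycle"])

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(vecs[0], vecs[1]))  # "cat" vs "Katze": high similarity
print(cos(vecs[0], vecs[3]))  # "cat" vs "bicycle": noticeably lower
```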
Hallucination can also be the desired function. An LLM with a high temperature setting (cranked-up randomness) can be used as a brainstorming tool. You have your idea, the prompt, and you want to get a bunch of related avenues to explore. They won’t be absolutely novel, but there might be something that you personally have not thought of.
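To make the temperature point concrete, here’s a tiny numpy sketch of temperature-scaled sampling (the logits are made up; it only shows that a higher temperature flattens the distribution and gives more varied picks):

```python
import numpy as np

def sample(logits, temperature, rng):
    # Divide logits by the temperature: T > 1 flattens the distribution
    # (more surprising picks), T < 1 sharpens it (safer, more repetitive).
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [4.0, 2.0, 1.0, 0.5]  # made-up scores for four candidate "next tokens"
low  = [sample(logits, 0.3, np.random.default_rng(i)) for i in range(10)]
high = [sample(logits, 2.0, np.random.default_rng(i)) for i in range(10)]
print(low)   # almost always token 0
print(high)  # a much more varied mix, the "brainstorming" regime
```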
There’s no magic here. It can’t solve all your problems. But there are useful functions in there.
@m0xEE@breloma.m0xee.net @risottobias @seldo @m0xee@librem.one I’m not entirely sure where you were going with the balls metaphor so I might be a bit off the mark commenting on it, but I’ll do it anyway. :)
Neural nets are not random. If anything, they’re extremely deterministic. LLMs have special machinery to introduce some randomness in order to give more varied output. That is, you don’t randomly take a ball out of a black box. Rather, you put a ball into one side of a black box and get a statistically correlated kind of ball out of the other side. You can put in random balls, and the output might look somewhat random in that case. The correlation might not be perfect either, but a trained NN will encode some degree of correlation.
The current generation of image generators uses the diffusion approach. They start with some noise and tweak it to be a bit closer to an image that has the desired features. This is repeated many times in small increments, and eventually the noise is transformed into an image with high degrees of “floweriness” and “cat-ness”. But again, this is a totally deterministic process. The starting noise is only there to increase variety: if you run the NN on the same noisy input, you’ll get the same image.
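If it helps, here’s a cartoon of that loop in numpy. The “denoiser” is obviously fake (a real one is a big trained network), but the structure is the point: a fixed starting noise plus a deterministic update rule means the same input always yields the same image:

```python
import numpy as np

def toy_denoise(noise, steps=50):
    # Stand-in for a diffusion model's denoiser: each step nudges the
    # image a little toward "what the prompt asks for" (here: all ones).
    img = noise.copy()
    target = np.ones_like(img)        # pretend this encodes "cat in flowers"
    for _ in range(steps):
        img += 0.1 * (target - img)   # small deterministic correction
    return img

rng = np.random.default_rng(42)       # fixed seed, so fixed starting noise
a = toy_denoise(rng.standard_normal((4, 4)))
rng = np.random.default_rng(42)       # same seed again
b = toy_denoise(rng.standard_normal((4, 4)))
print(np.allclose(a, b))              # True: same noise in, same image out
```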
The point about the red ball, I think, is correct. If an LLM sees a new token it won’t know what to do with it. There’s no statistical relation from that token to any other token, so it cannot produce that token. Even in a more practical sense, if there were only a handful of examples of some tokens in the training data, that might not be enough to build robust relationships to other tokens. Say, ChatGPT is quite good with widely used languages such as English, German, Spanish, etc., but its performance declines along with the amount of training data. You will not get good performance in languages that are poorly represented on the internet: some dead languages, or a language spoken by an Amazonian tribe of 200 people, won’t give you much.
On knowledge. This is a sticking point of most modern AI scepticism. LLMs (and neural nets in general) do not know things. That is not how NNs work. A NN that perfectly encodes factual knowledge is said to be overfit to its training data. However, that doesn’t mean there’s no correlation between input and output. Language encodes a decent amount of knowledge. There’s a significant chance the token “nice” can be found somewhere after the token “69”. This statistical correlation effectively encodes knowledge about a meme. The LLM doesn’t have any understanding/knowledge of what “nice”, “69”, or “meme” means, but it still encodes it. The same goes for image generation NNs. They don’t know what “flowery” or “cat” means, but they encode a statistical relation between those words and patterns/features in images.
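Here’s a toy version of how a pure statistic ends up looking like “knowledge”: count which token tends to follow which, nothing more (the mini corpus is made up):

```python
from collections import Counter

# A made-up scrap of "internet text".
corpus = "someone posted 69 nice and then 69 nice again and once more 69 nice".split()
bigrams = Counter(zip(corpus, corpus[1:]))

# The "knowledge" of the meme is just a count: "nice" is the most
# frequent token right after "69". No understanding involved.
after_69 = {b: n for (a, b), n in bigrams.items() if a == "69"}
print(max(after_69, key=after_69.get))  # -> nice
```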
On novel discoveries. The current generation of LLMs can’t make them, as far as I can tell. However, that is mostly a consequence of what we’ve built them to do. Maybe a meta-LLM might notice patterns in the relations between different groups of tokens and make that connection in some way? But that’s just speculation. What you’re describing is a regular grift, same as homeopathy or essential oils. In my opinion it’s a different problem. It is absolutely a problem, but it’s not the same as AI having no utility. Learn how stuff works and you’ll have a better chance of not falling for it. We’ve known what homeopathy is for decades and people still fall for it. However, unlike homeopathy, AI systems do have real utility. AI scepticism and AI grift scepticism should be two different things.
On efficiency. It’s true that huge LLMs require a lot of power. We’re unlikely to bring them down to the level of efficient sorting algorithms, for example. But work is being done to increase efficiency, and it’s progressing at a pretty good pace. GPT-3.5 is 175 billion parameters; we have open-source models of 13B parameters that come very close on some tasks. That is more than a 10x improvement in just a couple of years. People are also trying alternative architectures, like collectives of specialised models where only the relevant ones are activated, greatly reducing the required compute. It will become better. I mean, it will become more efficient. It might still require a lot of power, but it will probably be better at what we want it to do. Maybe it will even learn to fact-check its output if we really want that.
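For the “only activate the relevant models” idea, here’s a minimal mixture-of-experts-style routing sketch. The gate and the experts are stand-ins I made up; the point is only that a single top-scoring sub-model runs for a given input while the rest stay idle:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_experts = 4, 3
gate_w = rng.standard_normal((n_experts, dim))   # toy gating weights
# Stand-in "experts"; in a real model each would be a full sub-network.
experts = [lambda x, s=s: x * s for s in (0.5, 1.0, 2.0)]

def forward(x, top_k=1):
    scores = gate_w @ x                    # one score per expert
    chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
    # Only the chosen experts do any work; the rest stay idle,
    # which is where the compute savings come from.
    return sum(experts[i](x) for i in chosen) / top_k

print(forward(rng.standard_normal(dim)))
```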
@m0xEE@breloma.m0xee.net @risottobias @seldo @m0xee@librem.one BTW, I’ve touched on some of these topics before.
About distinguishing AI the technology and AI the grift: https://status.pointless.one/@pointlessone/111807312813079938
About power consumption: https://status.pointless.one/@pointlessone/111778211684985340
@pointlessone @risottobias@tech.lgbt @seldo
I mean it doesn't know any answers that we don't already know, the trick is to expand very vague criteria it's given with what people are likely to expect — no magic there 🤷