Hi, thanks for an interesting discussion.
Here is one question I think you missed: consider an AI system that is "trained" on data from the internet, and suppose that in the future the "new" content added to web pages is increasingly generated by such systems. Then, when the systems are improved by training on that newly added content, they are increasingly taking in their own earlier output as new training input.
1/?
@eliasr @reality2cast @dsearls @katherined @eze_lanza An interesting thought, Elias, one that hadn't occurred to me. Another question I have: how does AI fact-check? Eager to hear how the podcasters can enlighten us!
@hehemrin @eliasr @reality2cast @dsearls @eze_lanza That's the thing. In the case of ChatGPT, fact-checking is crowd-sourced; I don't think it could fact-check itself without human input, but I'd defer to @eze_lanza on that. It definitely has accuracy issues in its current state, significant ones, as we mentioned in the podcast. Depending on the prompt, it might make up something totally nonsensical that could seem true.
@katherined @hehemrin @eliasr @reality2cast @dsearls I can be pretty sure they have some validation checks (policies or rules), but it's almost impossible to write rules for every topic, so they need human feedback, which is gold!
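One way to picture the crowd-sourced fact-checking idea is a simple feedback-triage loop: collect thumbs-up/down votes on answers and flag poorly rated ones for human review. A minimal sketch (all names, thresholds, and data here are made up for illustration):

```python
from collections import defaultdict

def flag_for_review(feedback, min_votes=5, max_approval=0.5):
    """Aggregate thumbs-up/down votes per answer and flag answers
    with enough votes and a low approval rate for human review."""
    votes = defaultdict(lambda: [0, 0])  # answer_id -> [up, down]
    for answer_id, is_up in feedback:
        votes[answer_id][0 if is_up else 1] += 1
    flagged = []
    for answer_id, (up, down) in votes.items():
        total = up + down
        if total >= min_votes and up / total <= max_approval:
            flagged.append(answer_id)
    return flagged

# Hypothetical feedback stream: (answer_id, thumbs_up?)
feedback = ([("a1", True)] * 6        # well-reviewed answer
            + [("a2", False)] * 4     # mostly downvoted answer
            + [("a2", True)]
            + [("a3", False)] * 2)    # too few votes to judge
flagged = flag_for_review(feedback)
print(flagged)  # ['a2']
```

The point of the `min_votes` threshold is that a couple of downvotes alone shouldn't condemn an answer; the system still needs enough human input before acting.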
@eliasr @reality2cast @dsearls @katherined Hey, good point! I see it as similar to scenarios like computer vision, where you can create synthetic data to increase your dataset size (data augmentation). It's also useful when you have very little data to train your model; in that situation it can help with the training task, and it works well. An important disclaimer: models trained with synthetic data tend to have lower accuracy than those trained on real data. 1/2
@eliasr @reality2cast @dsearls @katherined
Since the models are tested on real data, you can measure how well the synthetic data improved your system. In my experience it's pretty useful, but it has to be validated, either before training the model (checking the data) or after training, manually or with a validation system. Hmm, maybe this is what ChatGPT is doing when we provide feedback?
2/2
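The data-augmentation idea above can be sketched in a few lines: generate extra synthetic training images by applying label-preserving transforms to the real ones. A toy example (images represented as plain lists of rows; the transforms and data are illustrative):

```python
def augment(images):
    """Grow a tiny image dataset with label-preserving transforms.

    Each image (a list of rows) yields three synthetic variants:
    a horizontal flip, a vertical flip, and a transpose.
    """
    out = []
    for img in images:
        out.append(img)
        out.append([row[::-1] for row in img])    # flip left-right
        out.append(img[::-1])                     # flip top-bottom
        out.append([list(r) for r in zip(*img)])  # transpose
    return out

# Two toy 3x3 "grayscale images".
data = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[0, 0, 1], [0, 1, 0], [1, 0, 0]]]
augmented = augment(data)
print(len(data), "->", len(augmented))  # 2 -> 8
```

The key property is that the transforms don't change what the image depicts, so the labels carry over to the synthetic copies for free.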
@eze_lanza @eliasr @reality2cast @dsearls Yeah, I think you are right about user feedback.
@clacke @katherined @eliasr @dsearls @reality2cast
This is sure to be a hot area of research for years to come!
@clacke @eliasr @dsearls @eze_lanza @reality2cast That's very interesting. I could see how self-training on translation might eventually create a new language or dialect. And now I want to go ask ChatGPT to create a new spoken language and see what happens. :)
@reality2cast
Will the systems still improve in that situation, or will things stagnate because of lack of real input from real human beings?
I guess that if you think there is no problem with that, then you could already construct such a feedback loop where the system eats its own output as input. Does that work, or not?
Will "original content" created by actual humans capable of original thought become increasingly rare and increasingly valuable?
2/2
@dsearls @katherined @eze_lanza
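The feedback loop in the question above can be made concrete with a toy simulation (everything here is illustrative, not how any real system is trained): a "model" that repeatedly retrains on its own output tends to over-sample its most likely words, so the rare tail of the distribution drops out a little more each generation.

```python
from collections import Counter

def retrain_on_own_output(vocab_counts, generations=5):
    """Toy self-training loop: each generation, words rarer than
    average never get emitted by the 'model', so they vanish from
    the next generation's training data. Returns vocabulary size
    per generation."""
    counts = Counter(vocab_counts)
    history = [len(counts)]
    for _ in range(generations):
        if not counts:
            break
        avg = sum(counts.values()) / len(counts)
        # Keep only words at least as frequent as the average;
        # the long tail disappears from the next training set.
        counts = Counter({w: c for w, c in counts.items() if c >= avg})
        history.append(len(counts))
    return history

# 1000 words with a long-tailed (Zipf-like) frequency profile.
start = {f"w{i}": 1000 // (i + 1) for i in range(1000)}
history = retrain_on_own_output(start)
print(history)
```

The vocabulary shrinks every generation, which is one cartoon version of the stagnation worry: without fresh human input, diversity only goes down.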