New episode is out! @dsearls and @katherined talk to @eze_lanza and Tony Mongkolsmai about #ChatGPT, generative #AI, and #opensource software.

Episode 135 - Experts Weigh in on ChatGPT. Listen here: reality2cast.com/135
#podcast #newEpisode #tech

@reality2cast

Hi, thanks for an interesting discussion.

Here is one question I think you missed: for AI systems that are "trained" on data from the internet, suppose that in the future the "new" content added to web pages is increasingly generated by such systems. Then, when the systems are improved by training on that newly added content, they are increasingly taking in their own earlier output as new training input.

1/?

@dsearls @katherined @eze_lanza

@reality2cast

Will the systems still improve in that situation, or will things stagnate for lack of real input from real human beings?

I guess that if you think there is no problem with that, then you could already construct such a feedback loop, where the system eats its own output as input. Does that work, or not?

Will "original content" created by actual humans capable of original thought become increasingly rare and increasingly valuable?

2/2

@dsearls @katherined @eze_lanza
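The feedback loop eliasr asks about can be made concrete with a toy simulation — a minimal sketch under stated assumptions: each generation's "model" just fits a Gaussian to its training data and, like a language model decoding with a tail cutoff, under-samples rare values when generating the next generation's training data. Both the Gaussian model and the 1.5σ cutoff are illustrative assumptions, not any real system.

```python
# Toy sketch: each generation trains on the previous generation's output.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=5_000)  # generation 0: human-written data

for gen in range(8):
    mu, sigma = data.mean(), data.std()            # "train" on the current web
    print(f"generation {gen}: std = {sigma:.3f}")
    samples = rng.normal(mu, sigma, size=20_000)   # model-generated content
    keep = np.abs(samples - mu) < 1.5 * sigma      # tail cutoff at decode time
    data = samples[keep][:5_000]                   # next generation's training set
```

The spread of the data collapses within a few generations (roughly a 26% shrink per step with this cutoff). Dropping the cutoff makes the decay far slower and mostly random drift, which hints that the loss of rare, tail content is what drives the degeneration.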

@eliasr @reality2cast @dsearls @katherined @eze_lanza Interesting thought, Elias, and one I hadn't considered. Another question I have: how does AI fact-check? Eager to hear what the podcasters have to say to enlighten us!

@hehemrin @eliasr @reality2cast @dsearls @eze_lanza That's the thing. In the case of ChatGPT, fact-checking is crowd-sourced; I don't think it could fact-check itself without human input, but I'd defer to @eze_lanza on that. It definitely has significant accuracy issues in its current state, as we mentioned in the podcast. Depending on the prompt, it might make up something totally nonsensical that could seem true.

@katherined @hehemrin @eliasr @reality2cast @dsearls I can be pretty sure they do have some validation checks (policies or rules), but it's almost impossible to have rules for every topic, so they need feedback from people, which is gold!

@eliasr @reality2cast @dsearls @katherined Hey, good point! I see it like the computer-vision scenario, where you can create synthetic data to increase your dataset size (data augmentation). It's also useful when you have very little data to train your model; it can help with the training task in that situation, and it works well. An important disclaimer here is that models trained with synthetic data have lower accuracy than those trained on real data. 1/2
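A minimal sketch of the augmentation idea eze_lanza describes, in plain NumPy so it stays self-contained: each real image yields a few synthetic variants (flips, small shifts, noise), inflating a small labelled dataset before training. The 28×28 random arrays and the specific transforms are stand-ins for a real dataset and pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray, n_variants: int = 4) -> list[np.ndarray]:
    """Make synthetic variants of one (H, W) grayscale image in [0, 1]."""
    variants = []
    for _ in range(n_variants):
        img = image.copy()
        if rng.random() < 0.5:
            img = np.fliplr(img)                         # mirror left-right
        img = np.roll(img, rng.integers(-2, 3), axis=1)  # small horizontal shift
        img = img + rng.normal(0.0, 0.05, img.shape)     # mild sensor-style noise
        variants.append(np.clip(img, 0.0, 1.0))
    return variants

# Tiny demo: 10 "real" images become a 50-image training set.
real_images = [rng.random((28, 28)) for _ in range(10)]
train_set = real_images + [v for img in real_images for v in augment(img)]
print(len(train_set))  # 50
```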

@eliasr @reality2cast @dsearls @katherined
Since the models are tested with real data, you can measure how well the synthetic data worked to improve your system. In my experience it's pretty useful, but the synthetic data has to be validated, either before training the model (by checking the data) or after training, manually or with a validation system. Mmmm, maybe this is what ChatGPT is doing when we provide the feedback?
2/2
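The validation step eze_lanza mentions — always testing against real data — might look like the toy sketch below: both classifiers are scored on a held-out real test set, so the numbers directly show whether the synthetic training data was good enough. The nearest-centroid "model" and the 2-D point data are illustrative stand-ins, not any real pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_real(n: int):
    """Two well-separated 'real' classes of 2-D points."""
    a = rng.normal((0.0, 0.0), 1.0, (n, 2))
    b = rng.normal((3.0, 3.0), 1.0, (n, 2))
    return np.vstack([a, b]), np.array([0] * n + [1] * n)

def train(X, y):
    """'Training' here is just one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    pred = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return (pred == y).mean()

X_test, y_test = make_real(500)                    # held-out REAL test data

X_syn, y_syn = make_real(50)
X_syn = X_syn + rng.normal(0.0, 0.8, X_syn.shape)  # noisier synthetic stand-in
print("trained on synthetic:", accuracy(train(X_syn, y_syn), X_test, y_test))

X_real, y_real = make_real(50)
print("trained on real:     ", accuracy(train(X_real, y_real), X_test, y_test))
```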

@eliasr @katherined @dsearls @eze_lanza @reality2cast This is already a crisis for translation software. Some people (is OpenAI doing this?) do research on identifying bot-generated content precisely so that bots don't end up training on their own output.
@eze_lanza @eliasr @katherined @dsearls @reality2cast Almost existential. The feedback loop of bots training on themselves and people then listening to the bots could really spin out of control.
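In its simplest form, the filtering clacke alludes to might look like the sketch below: score each crawled page with a bot-text detector and keep only low-scoring pages for training. The `detector_score` heuristic here (plain n-gram repetition) is a hypothetical stand-in, not a real detector used by OpenAI or anyone else.

```python
from collections import Counter

def detector_score(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are repeats -- a crude bot-ness proxy."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(Counter(grams)) / len(grams)

def filter_crawl(pages: list[str], threshold: float = 0.2) -> list[str]:
    """Keep only pages that look human-written enough to train on."""
    return [p for p in pages if detector_score(p) < threshold]

pages = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "great product great product great product great product great product",
]
print([round(detector_score(p), 2) for p in pages])  # [0.0, 0.75]
print(len(filter_crawl(pages)))                      # 1 -- repetitive page dropped
```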

@clacke @eliasr @dsearls @eze_lanza @reality2cast That's very interesting. I could see how self-training on translation might eventually create a new language or dialect. And now I want to go ask ChatGPT to create a new spoken language and see what happens. :)
