Fascinating report from 404 Media's Samantha Cole on a trove of leaked NVIDIA Slack messages and emails about how they're scraping millions of YouTube videos to train their own new foundation video generation model: 404media.co/nvidia-ai-scraping

Posted a few of my own notes here: simonwillison.net/2024/Aug/5/n

It's not surprising to learn that they're doing this - that's practically the industry standard right now - but it's still really interesting to see internal details of what they're collecting and why.

@simon

every now and then i feel like im taking crazy pills because i remember when aaron swartz killed himself because he was going to go to jail forever because he scraped JSTOR,

and twenty years later your manager tells you “sshhhh it’s fine just scrape all of it don’t worry the CEO said it’s fine”

@phillmv @simon He was going to go to jail not for scraping JSTOR, but for plugging his laptop into an ethernet port he found in a school library.

@zachdecook @phillmv @simon He was going to jail for not showing enough deference to his overlords.
