July 19, 2024

Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI

Proof News

Contributed by: Drex DeFord

Summary

An investigation by Proof News found that several major AI companies, including Anthropic, Apple, Nvidia, and Salesforce, trained their AI models on subtitles from 173,536 YouTube videos, despite YouTube's rules against unauthorized data harvesting. The dataset, known as YouTube Subtitles, includes transcripts from educational channels such as Khan Academy and Harvard, as well as from popular creators such as MrBeast, Marques Brownlee, and PewDiePie. Creators were neither notified nor compensated for the use of their content, which has raised concerns about consent and fair use within the creator community. AI companies argue that the data was publicly available and that its use did not directly violate YouTube's terms. Debates continue, however, over the legal risks and ethical implications of using such data without explicit permission from content creators. The case raises important questions about data use, copyright, and compensation in the age of AI.