Nvidia Scrapes Massive Amount of Data For AI

JoshB

It’s becoming increasingly apparent that the largest corporations don’t fear any repercussions from scraping data from large sources like YouTube for their own AI training. 404 Media recently reported that it had viewed internal Slack messages in which Nvidia staff went over which content to toss into their AI machine.

When asked about legal and ethical aspects of using copyrighted content to train an AI model, Nvidia defended its practice as being “in full compliance with the letter and the spirit of copyright law.” Internal conversations at Nvidia viewed by 404 Media show when employees working on the project raised questions about potential legal issues surrounding the use of datasets compiled by academics for research purposes and YouTube videos, managers told them they had clearance to use that content from the highest levels of the company.

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI - 404 Media

One response I can see to this is something we’ve already seen. Reddit entered an exclusive contract with Google that gives the search company full access to the user-generated content on Reddit’s servers to train Google’s AI. Reddit also blocked other search engines, so anyone using one of them to look for information that lives within the walls of Reddit gets no results.

There will be further siloing of these massive sites, keeping just about every search engine out and leaving the average internet user with a severely diminished experience.
