- Big tech companies are scrambling to find new data sources to train their AI systems.
- Meta considered several ways to obtain more training data, including acquiring the publisher Simon & Schuster, the Times reported.
- It also discussed pressing ahead even at the risk of lawsuits, rather than negotiating licensing agreements, the Times wrote.
Big tech companies are scrambling to discover new data sources to fuel the AI arms race.
At Meta, the issue was pressing enough that management met almost every day in March and April of last year to develop a plan, The New York Times reported.
As AI systems become more powerful, tech companies are being pushed to acquire data more aggressively, which could expose them to copyright infringement claims. For example, some suspect that OpenAI used YouTube videos to train its video generator, Sora. Mira Murati, the company's chief technology officer, denied the accusations.
The Times reported that during Meta's meetings, some attendees floated the idea of buying the publisher Simon & Schuster, which private equity firm KKR acquired for $1.62 billion in August last year. Others suggested paying $10 per book to obtain full licensing rights to new titles.
By the time of the meetings, Meta had already summarized many books, essays, and other works found online. The company hired contractors in Africa to compile summaries of fiction and nonfiction titles, some of which contained copyrighted material. "We can't afford not to take this back," a manager said during one of the meetings.
Attendees discussed whether the company could keep collecting data from potentially copyrighted sources without spending the time and money needed to negotiate licensing agreements. When lawyers raised "ethical" concerns about using others' intellectual property, they were met with silence, the Times reported.
Meta did not immediately respond to Business Insider's request for comment.
In the end, the executives at the meetings decided to rely on legal precedent: Authors Guild v. Google, a case decided by a federal appeals court in 2015. When the Supreme Court declined to hear the case, the lower court's ruling stood: Google could scan and digitize books for Google Books under the fair use doctrine. Meta's lawyers argued that the same reasoning would allow Meta to train its AI systems on such material, the newspaper reported.