In a rapidly evolving controversy at the intersection of artificial intelligence (AI) development and copyright law, Facebook parent company Meta is under scrutiny for allegedly using pirated content from Library Genesis (LibGen) to train its AI models. This situation has sparked legal actions and drawn sharp criticism from authors and publishing organizations, highlighting the complex ethical and legal challenges posed by AI’s reliance on extensive datasets.
LibGen: Pirate library under legal fire
Established by Russian activists, LibGen is a notorious online repository offering free access to millions of books and academic papers, many of them copyrighted. The site has faced multiple legal challenges over the years, including a 2015 lawsuit by academic publisher Elsevier that led to a court order to shut it down. LibGen nonetheless continued to operate through alternative domains and mirrors.
In September 2023, educational publishers — including Pearson Education, McGraw Hill, and Macmillan — filed suit, accusing LibGen of “extensive violations” of copyright law and demanding control or deletion of its domains. By December 2024, these publishers had succeeded in disabling many LibGen domains, but the pirate site found ways to keep operating.
Meta’s alleged use of pirated content
The Meta controversy began when unsealed court documents revealed that the company had used LibGen datasets to train its AI language models, including Llama 3. Internal communications indicated that Meta employees were aware of the pirated nature of these materials. Though one employee expressed ethical concerns, Meta CEO Mark Zuckerberg reportedly signed off on their use.
OpenAI CEO Sam Altman has also famously said his company could not build competitive AI models without training them on copyrighted material.
The Atlantic published a searchable database that allows authors to see whether their works were included in the LibGen data reportedly used in AI training. The move is part of a growing demand among creators for transparency and accountability in how their work is being used, with or without their permission.
Legal actions and author responses
In response to these revelations, a group of American authors — including Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey — filed a lawsuit against Meta, alleging copyright infringement. The plaintiffs argue that Meta’s use of pirated books to train its AI models violates their intellectual property rights and undermines their livelihoods. The lawsuit seeks damages and an injunction to prevent Meta from further using unauthorized materials.
The Authors Guild, a prominent organization representing writers, has been actively involved in combating digital piracy. The Guild has collaborated with publishers and the federal government to take down major piracy websites like Z-Library and has assisted in actions against LibGen, resulting in blocked U.S. domains and multimillion-dollar judgments. The Guild continues to advocate for authors’ rights and emphasizes the importance of protecting creative works from unauthorized use.
“I am furious to learn my books have been again pirated and used without my consent to train a generative AI system which is not only unethical and illegal in its current form, but something I am vehemently opposed to,” Australian novelist Holden Sheppard told The Guardian. “No consent has been obtained from any of the thousands of authors who have had our work taken, and not a single cent has been paid to any of us. Given Meta is worth literally billions, they are absolutely in a financial position to compensate authors fairly.”
Other writers were quick to respond to The Atlantic’s article on social media, filling long threads by adding their names to the list of those whose works had been stolen.
“What matters is they should’ve ASKED, and (if yes) paid the rights holder,” bestselling novelist N.K. Jemisin posted on Bluesky. “I would’ve said no.”
Other artists, other lawsuits
The Meta-LibGen case is just one among many lawsuits aimed at AI companies alleged to have scraped copyrighted content without permission. The New York Times won a legal victory when a federal judge allowed its lawsuit against OpenAI to move forward. The paper accuses OpenAI of using its journalism to train ChatGPT without authorization or payment, undermining readership and revenue. OpenAI maintains it acted within the bounds of “fair use,” but the court’s eventual ruling could redefine how news content can legally be used in AI models.
Another major case moved forward in August 2024 when a federal judge allowed key claims to proceed in a lawsuit brought by a group of artists against Stability AI and Midjourney. The lawsuit alleges that the companies used billions of copyrighted images, without permission, to train their generative AI systems. Notably, the judge allowed claims of direct copyright infringement and violations of the artists’ trademark rights to continue, signaling that courts are increasingly willing to scrutinize how AI companies source their training data.
Broader implications for AI and copyright law
This case underscores the broader tension between AI development and copyright law. AI models require vast amounts of data for training, and the use of copyrighted material without authorization raises significant legal and ethical questions. Meta and other tech firms have defended their actions by invoking the “fair use” doctrine, arguing that using publicly available data for statistical modeling of language is permissible. This defense is increasingly being challenged in court.
Industry and political reactions
The revelations have also drawn reactions from the broader creative community and political figures. Bestselling author Richard Osman urged writers to take action against Meta, emphasizing the need to protect authors’ rights in the face of technological advancements. Additionally, the involvement of politicians’ works in the pirated datasets has led to embarrassment and calls for policy reforms to better safeguard intellectual property in the digital age.
The unfolding situation involving Meta’s alleged use of pirated content from LibGen to train its AI models highlights the complex challenges at the intersection of technology, law, and creative rights. As lawsuits multiply and creators demand greater transparency, the courts will play a pivotal role in determining how intellectual property is treated in the AI era — and whether innovation can truly coexist with the rights of those who create.
Read our guide to navigating the ethical challenges of generative AI to learn more.