Condé Nast and several other media companies sued the AI startup Cohere today, alleging that it engaged in “systematic copyright and trademark infringement” by using news articles to train its large language model.
“Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence (‘AI’) service, which in turn competes with Publisher offerings and the emerging market for AI licensing,” said the lawsuit filed in US District Court for the Southern District of New York. “Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands.”
Condé Nast, which owns Ars Technica and other publications such as Wired and The New Yorker, was joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media.
The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work, or an amount based on actual damages and Cohere’s profits. It also seeks “actual damages, Cohere’s profits, and statutory damages up to the maximum provided by law” for infringement of trademarks and “false designations of origin.”
In Exhibit A, the plaintiffs identified over 4,000 articles in what they called an “illustrative and non-exhaustive list of works that Cohere has infringed.” Additional exhibits provide responses to queries and “hallucinations” that the publishers say infringe upon their copyrights and trademarks. The lawsuit said Cohere “passes off its own hallucinated articles as articles from Publishers.”
Cohere defends copyright controls
In a statement provided to Ars, Cohere called the lawsuit frivolous. “Cohere strongly stands by its practices for responsibly training its enterprise AI,” the company said today. “We have long prioritized controls that mitigate the risk of IP infringement and respect the rights of holders. We would have welcomed a conversation about their specific concerns—and the opportunity to explain our enterprise-focused approach—rather than learning about them in a filing. We believe this lawsuit is misguided and frivolous, and expect this matter to be resolved in our favor.”
We asked Cohere for information on its IP controls and will update this article if it responds.
The plaintiffs are part of the News/Media Alliance, which issued a press release about the complaint.
“This suit alleges that Cohere, an AI company valued at over $5 billion, engaged in widespread unauthorized use of publisher content in developing and running its generative AI systems,” the press release said. “Cohere’s behavior amounts to massive, systematic copyright infringement, as well as trademark infringement. The complaint provides a non-exhaustive list of thousands of articles that Cohere has infringed, through training, real-time use of content, and infringing outputs. Plaintiffs seek a permanent injunction and damages for Cohere’s extensive and willful infringement.”
The lawsuit asks for an order requiring Cohere to destroy all infringing copies of the publishers’ copyrighted works. It also demands that Cohere install a filter or other technology to prevent its system “from retrieving or copying Publishers copyrighted works, whether from Publishers’ websites or other locations.”
Cohere offers AI products for businesses, including those in financial services, health care and life sciences, manufacturing, energy and utilities, and the public sector. The company says its investors include Salesforce, Oracle, Nvidia, SAP, Fujitsu, and AMD. Its customers include Notion and Oracle.
No “ordinary AI”
Cohere, which is based in Toronto, pitches itself as a business-friendly AI, with a recent advertisement stating that it is not just an “ordinary AI.” The ad says that unlike Cohere’s product, ordinary AI leaks customer data and trade secrets, creates security audit nightmares, and steals intellectual property.
In February 2024, Cohere announced that it would provide legal protection against intellectual property claims to its paying enterprise customers. This includes “full indemnification for any third party claims that the outputs generated by our models infringe on a third party’s intellectual property rights,” for Cohere “enterprise customers that comply with our guidelines and do not intentionally attempt to generate infringing content.”
Condé Nast and other news publishers involved in the lawsuit have licensed their content to other AI companies, such as OpenAI. But OpenAI also stands accused of using news articles without permission in a lawsuit filed by The New York Times. The case is proceeding through discovery.
Condé Nast CEO Roger Lynch said in an email to staff that the news groups’ lawsuit against Cohere “is a first for our industry, coming together to protect our rights and assert that creative and journalistic work cannot be taken without permission or fair compensation.”
Vox Media President Pam Wasserstein said the lawsuit aims to create a legal precedent and “establish the terms of the playing field for licensed use of journalism for AI, including for training and also real-time uses,” according to The Wall Street Journal.
Earlier this week, a federal judge in Delaware handed a victory to Thomson Reuters in a lawsuit regarding a legal-research search engine that uses artificial intelligence. US Circuit Judge Stephanos Bibas rejected the fair use claims made by defendant Ross Intelligence, which was sued over the use of Westlaw headnotes that summarize key points of law and case holdings.
A “fabricated story”
Pointing to the “ordinary AI” ad, the news organizations’ lawsuit said that “rather than reconcile those concepts and act lawfully, Cohere fails to license the content it uses.” The AI company “helps itself to unlicensed copies of Publishers’ news and magazine articles to build a training dataset,” and “further infringes Publishers’ copyrights by providing copies of Publishers’ articles,” the lawsuit claimed.
“Cohere delivers verbatim texts of Publishers’ copyrighted articles even when asked generally for information about a particular topic rather than a specific piece,” the lawsuit said.
In other cases, Cohere provides summaries that “heavily paraphrase” the source articles and include “enough details to substitute for the original piece,” the lawsuit said. These aren’t always accurate and can result in a “fabricated story… with a fake source, title, and date,” the lawsuit said.
The lawsuit described an example:
For example, The Guardian published an article on October 7, 2024 titled “‘The pain will never leave’: Nova massacre survivors return to site one year on.” As shown below, when prompted for this piece with RAG [Retrieval-Augmented Generation] turned off, Cohere delivered a wildly inaccurate article that it represented was “published on June 29 2022 in The Guardian by Luke Harding.” Among other flaws, the Cohere article confused the October 7, 2023 massacre at The Nova Music Festival with a mass shooting that took place in Nova Scotia, Canada in 2020. Cohere also manufactured details about the Nova Scotia tragedy, attributing several quotes—including those gathered in The Guardian’s reporting—to Tom Bagley, a man who was murdered in the 2020 shootings and thus could neither “return[] to the scene of the killings” nor offer quotes to a news outlet. Needless to say, this fictional article never appeared in The Guardian.
The lawsuit alleges that Cohere “disregards” robots.txt files that instruct bots not to crawl news websites and that “Cohere has an obligation not to use Publishers’ copyrighted content without authorization regardless of whether Publishers have taken affirmative steps to block Cohere’s crawlers.”