Judge: Just Because AI Trains On Your Publication, Doesn’t Mean It Infringes On Your Copyright
from the that's-not-how-any-of-this-works dept
I get that a lot of people don’t like the big AI companies and how they scrape the web. But these copyright lawsuits being filed against them are absolute garbage. And you want that to be the case, because if it goes the other way, it will do real damage to the open web by further entrenching the largest companies. If you don’t like the AI companies, find another path, because copyright is not the answer.
So far, we’ve seen that these cases aren’t doing all that well, though many are still ongoing.
Last week, a judge tossed out one of the early ones against OpenAI, brought by Raw Story and Alternet.
Part of the problem is that these lawsuits assume, incorrectly, that these AI services really are, as some people falsely call them, “plagiarism machines.” The assumption is that they’re just copying everything and then handing out snippets of it.
But that’s not how it works. It is much more akin to reading all these works and then being able to make suggestions based on an understanding of how similar things kinda look, though from memory, not from having access to the originals.
Some of this case focused on whether or not OpenAI removed copyright management information (CMI) from the works that they were being trained on. This always felt like an extreme long shot, and the court finds Raw Story’s arguments wholly unconvincing in part because they don’t show any work that OpenAI distributed without their copyright management info.
For one thing, Plaintiffs are wrong that Section 1202 “grant[s] the copyright owner the sole prerogative to decide how future iterations of the work may differ from the version the owner published.” Other provisions of the Copyright Act afford such protections, see 17 U.S.C. § 106, but not Section 1202. Section 1202 protects copyright owners from specified interferences with the integrity of a work’s CMI. In other words, Defendants may, absent permission, reproduce or even create derivatives of Plaintiffs’ works-without incurring liability under Section 1202-as long as Defendants keep Plaintiffs’ CMI intact. Indeed, the legislative history of the DMCA indicates that the Act’s purpose was not to guard against property-based injury. Rather, it was to “ensure the integrity of the electronic marketplace by preventing fraud and misinformation,” and to bring the United States into compliance with its obligations to do so under the World Intellectual Property Organization (WIPO) Copyright Treaty, art. 12(1) (“Obligations concerning Rights Management Information”) and WIPO Performances and Phonograms Treaty….
Moreover, I am not convinced that the mere removal of identifying information from a copyrighted work-absent dissemination-has any historical or common-law analogue.
Then there’s the bigger point, which is that the judge, Colleen McMahon, has a better understanding of how ChatGPT works than the plaintiffs and notes that just because ChatGPT was trained on pretty much the entire internet, that doesn’t mean it’s going to infringe on Raw Story’s copyright:
Plaintiffs allege that ChatGPT has been trained on “a scrape of most of the internet,” Compl. ¶ 29, which includes massive amounts of information from innumerable sources on almost any given subject. Plaintiffs have nowhere alleged that the information in their articles is copyrighted, nor could they do so. When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer. Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.
Finally, the judge basically says, “Look, I get it, you’re upset that ChatGPT read your stuff, but you don’t have an actual legal claim here.”
Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation to Plaintiffs. See Compl. ¶ 57 (“The OpenAI Defendants have acknowledged that use of copyright-protected works to train ChatGPT requires a license to that content, and in some instances, have entered licensing agreements with large copyright owners … They are also in licensing talks with other copyright owners in the news industry, but have offered no compensation to Plaintiffs.”). Whether or not that type of injury satisfies the injury-in-fact requirement, it is not the type of harm that has been “elevated” by Section 1202(b)(i) of the DMCA. See Spokeo, 578 U.S. at 341 (Congress may “elevate to the status of legally cognizable injuries, de facto injuries that were previously inadequate in law.”). Whether there is another statute or legal theory that does elevate this type of harm remains to be seen. But that question is not before the Court today.
While the judge dismisses the case and says they can try again, it would appear that she is skeptical they could do so with any reasonable chance of success:
In the event of dismissal Plaintiffs seek leave to file an amended complaint. I cannot ascertain whether amendment would be futile without seeing a proposed amended pleading. I am skeptical about Plaintiffs’ ability to allege a cognizable injury but, at least as to injunctive relief, I am prepared to consider an amended pleading.
I totally get why publishers are annoyed and why they keep suing. But copyright is the wrong tool for the job. Hopefully, more courts will make this clear and we can get past all of these lawsuits.
Filed Under: ai, cmi, copyright, dmca, generative ai, reading
Companies: alternet, openai, raw story
Elon Rage Quits His Silly OpenAI Lawsuit
from the can't-lose-a-lawsuit-that-no-longer-exists dept
Maybe the real artificial intelligence was the baseless lawsuits we filed along the way.
In March, we wrote about Elon’s patently ridiculous lawsuit filed against OpenAI, claiming breach of a contract that didn’t actually exist. The whole thing was silly. Elon was mad about the ways in which OpenAI had changed since the time he had helped co-create it (and provided it with much of its initial funding). And, no matter how much people might agree that OpenAI has changed a lot in terms of its focus and mission, the fundamental problem was that no actual contract existed between Elon and OpenAI.
Instead, what Elon laid out were some emails between himself, Sam Altman, and Greg Brockman, none of which amounted to a contract. He also pointed to the Certificate of Incorporation… which is also not a contract, and certainly not a document to which he was a party.
Today, the court was set to hold a hearing on OpenAI’s motion to dismiss. Yesterday, Elon pulled a “you can’t fire me, I quit!” move by just dropping the lawsuit. Most of the news coverage of this did not (of course) provide the actual dismissal document, so we’ve linked to it here in this paragraph and embedded it below.
There’s not much to it, other than that it’s very clearly Musk dropping the lawsuit, not the two parties agreeing to end the case through some sort of settlement (in which case it would have been mutually filed and would have requested dismissal with prejudice to foreclose a follow-up lawsuit).
Like so many of Musk’s lawsuits, this one really appears to have been entirely for show and to rile up his sycophantic fanboys. It’s the same reason he threw a very silly temper tantrum earlier this week about Apple’s partnership with OpenAI.
Given that Musk is building a competitor to OpenAI, called xAI, it seems pretty transparently obvious that all of this is for show, and to rage petulantly at the competition. He knows he has no legitimate legal claim. He knows that Apple isn’t somehow doing anything particularly nefarious in its deal with OpenAI.
But he thrives on unthinking, gullible people assuming, falsely, that Musk alone is out there fighting the good fight against tech that he doesn’t like. Musk needs to be the centerpiece of any story about tech these days, and when that’s not true, he finds a way to insert himself into it.
In this case, he was on the verge of losing this case in a very embarrassing fashion, given the lack of anything even resembling a contract to breach in a breach of contract case. Rather than go through that humiliation, he just decided to cut bait and run.
Filed Under: ai, breach of contract, contract, elon musk
Companies: openai, twitter, x, xai
Hey Journalists: Not Every Elon Musk Brain Fart Warrants An Entire News Cycle
from the sound-and-fury,-signifying-nothing dept
Tue, Jun 11th 2024 11:59am - Karl Bode
So on Monday you probably saw that Apple announced it was more tightly integrating “AI” into its mobile operating system, both via a suite of AI-powered tools dubbed Apple Intelligence and via tighter AI integration with its Siri voice assistant. It’s not that big of a deal and (hopefully) reflects Apple’s more cautious approach to AI after Google told millions of customers to eat rocks and glue.
Apple was quick to point out that the processing for these features would happen on device to (hopefully) protect privacy. If Apple’s own systems can’t handle user inquiries, some of them may be offloaded to OpenAI’s ChatGPT, attempting to put a little distance between Apple and potential error-prone fabulism:
“Apple struck a deal with OpenAI, the maker of ChatGPT, to support some of its A.I. capabilities. Requests that its system can’t field will be directed to ChatGPT. For example, a user could say that they have salmon, lemon and tomatoes and want help planning dinner with those ingredients. Users would have to choose to direct those requests to ChatGPT, ensuring that they know that the chatbot — not Apple — is responsible if the answers are unsatisfying.”
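To make that flow concrete, here is a minimal sketch of the on-device-first, ask-before-offloading pattern described above. Everything in it (the function names, the consent check, the fallback logic) is a hypothetical stand-in for illustration, not Apple’s or OpenAI’s actual API.

```python
from typing import Optional

def on_device_model(prompt: str) -> Optional[str]:
    """Stand-in for a small local model; returns None when it can't answer."""
    return "Here's a quick local answer." if len(prompt) < 40 else None

def user_consents_to_offload(prompt: str) -> bool:
    """Stand-in for the explicit 'send this to ChatGPT?' confirmation step."""
    return True

def remote_chatgpt(prompt: str) -> str:
    """Stand-in for the request that gets offloaded to the remote service."""
    return f"(remote answer to: {prompt!r})"

def handle_request(prompt: str) -> str:
    # Try to answer on the device first, so the query never leaves the phone.
    local_answer = on_device_model(prompt)
    if local_answer is not None:
        return local_answer
    # Only offload if the user explicitly opts in, as the quote describes.
    if user_consents_to_offload(prompt):
        return remote_chatgpt(prompt)
    return "Request kept on device; no answer available."

print(handle_request("Plan dinner with salmon, lemon, and tomatoes."))
```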
Enter Elon Musk, who threw a petulant hissy fit after he realized that Apple had decided to partner with OpenAI instead of his half-cooked and more racist Grok pseudo-intelligence system. He took to ExTwitter to (falsely) claim Apple OS with ChatGPT integration posed such a dire privacy threat, iPhones would soon be banned from his companies and visitors would have to leave theirs in a copper-lined faraday cage:
This is, of course, a bunch of meaningless gibberish not actually based on anything technical. Musk just made up some security concerns to malign a competitor. The ban of iPhones will likely never happen. And to Luddites, his reference to a faraday cage certainly sounds smart.
Here’s the thing: nearly every app on your phone and every device in your home is tracking your every movement, choice, and behavior in granular detail, then selling that information to an international cabal of largely unregulated and extremely dodgy data brokers. Brokers that then turn around and sell that information to any nitwit with two nickels to rub together, including foreign intelligence.
So kind of like the TikTok hysteria, the idea that Apple’s new partnership with OpenAI poses some unique security and privacy threat above and beyond our existing total lack of any meaningful privacy whatsoever in a country too corrupt to pass an internet privacy law is pure performance.
Keep in mind that Musk’s companies have a pretty well established track record of playing extremely fast and loose with consumer privacy themselves. Automakers are generally some of the worst companies in tech when it comes to privacy and security, and according to Mozilla, Tesla is the worst of the worst. So the idea that Musk was engaging in any sort of good faith contemplation of privacy is simply false.
Still, it didn’t take long before the click-hunting press turned Musk’s meaningless comments into an entire news cycle. Resources that could have been spent on any number of meaningful stories were instead focused on platforming a throwaway comment by a fabulist that literally didn’t mean anything:
I’m particularly impressed with the Forbes headline, which manages two falsehoods in one: that the nonexistent ban hurt Apple stock (it didn’t), and that the ban had already happened.
I’m unfortunately contributing to the news cycle noise to make a different point: this happens with every single Musk brain fart now, regardless of whether the comment has any meaning or importance. And it needs to stop if we’re to preserve what’s left of our collective sanity.
Journalists are quick to insist that it’s their noble responsibility to cover the comments of important people. But journalism is about informing and educating the public, which isn’t accomplished by redirecting limited journalistic resources to cover platform bullshit that means nothing and will result in nothing meaningful. All you’ve done is made a little money wasting people’s time.
U.S. newsrooms are so broadly conditioned to chase superficial SEO clickbait ad engagement waves they’ve tricked themselves into thinking these kinds of hollow news cycles serve an actual function. But it might be beneficial for the industry to do some deep introspection into the harmful symbiosis it has forged with terrible people and bullshit (see: any of a million recent profiles of white supremacists).
There are a million amazing scientific developments or acts of fatal corporate malfeasance that every single day go uncovered or under-covered in this country because we’ve hollowed out journalism and replaced it with lazy engagement infotainment.
And despite Musk’s supposed disdain for the press, his circus sideshow has always heavily relied on this media dysfunction. As his stock-fluffing house of cards starts to unravel, he’s had to increasingly rely on gibberish and controversy to distract, and U.S. journalism continues to lend a willing hand.
First it spent fifteen years hyping up Musk’s super-genius engineering mythology, despite mounting evidence that Musk was more of a clever credit-absconding opportunist than any sort of revolutionary thinker. Even now, the press treats every belch from the man as worthy of the deepest analysis, under the pretense that it’s engaging in some sort of heady public service.
The public interest is often served by not covering the fever dreams of obnoxious opportunists, but every part of the media ecosystem is financially incentivized to do the exact opposite. And instead of any sort of introspection into the symbiosis the media has formed with absolute bullshit, we’re using badly crafted automation to supercharge all of the sector’s worst impulses at unprecedented new scale.
Filed Under: ai, artificial intelligence, chatgpt, clickbait, elon musk, hype, language learning models, seo, siri
Companies: apple, openai, tesla, twitter, x
Ctrl-Alt-Speech: Between A Rock And A Hard Policy
from the ctrl-alt-speech dept
Ctrl-Alt-Speech is a weekly podcast about the latest news in online speech, from Mike Masnick and Everything in Moderation‘s Ben Whitelaw.
Subscribe now on Apple Podcasts, Overcast, Spotify, Pocket Casts, YouTube, or your podcast app of choice — or go straight to the RSS feed.
In this week’s round-up of the latest news in online speech, content moderation and internet regulation, Mike and Ben cover:
- Stack Overflow bans users en masse for rebelling against OpenAI partnership (Tom’s Hardware)
- Tech firms must tame toxic algorithms to protect children online (Ofcom)
- Reddit Lays Out Content Policy While Seeking More Licensing Deals (Bloomberg)
- Extremist Militias Are Coordinating in More Than 100 Facebook Groups (Wired)
- Politicians Scapegoat Social Media While Ignoring Real Solutions (Techdirt)
- ‘Facebook Tries to Combat Russian Disinformation in Ukraine’ – FB Public Policy Manager (Kyiv Post)
- TikTok Sues U.S. Government Over Law Forcing Sale or Ban (New York Times)
- Swiss public broadcasters withdraw from X/Twitter (Swissinfo)
- Congressional Committee Threatens To Investigate Any Company Helping TikTok Defend Its Rights (Techdirt)
This episode is brought to you with financial support from the Future of Online Trust & Safety Fund.
Filed Under: artificial intelligence, chatgpt, content moderation, disinformation, extremism, russia, ukraine
Companies: facebook, meta, openai, reddit, stack overflow, tiktok, twitter
Can ChatGPT Violate Your Privacy Rights If It Doesn’t Store Your Data?
from the this-makes-no-sense dept
If you were to ask someone to state the birthday of someone else, and the person asked just made up a date, which was not the actual birthday, would you argue that the individual’s privacy had been violated? Would you argue that there should be a legal right to demand that the person explain how they came up with the made-up date and to permanently “store” the proper birth date in their mind?
Or would you simply laugh it off as utter nonsense?
I respect the folks at noyb, the European privacy activists who keep filing privacy complaints that often have significant consequences. noyb and its founder, Max Schrems, have pretty much single-handedly continued to rip up US/EU privacy agreements by highlighting that NSA surveillance simply cannot comply with EU data privacy protections.
That said, noyb often seems to take things a bit too far, and I think its latest complaint against OpenAI is one of those cases.
In the EU, the GDPR requires that information about individuals is accurate and that they have full access to the information stored, as well as information about the source. Surprisingly, however, OpenAI openly admits that it is unable to correct incorrect information on ChatGPT. Furthermore, the company cannot say where the data comes from or what data ChatGPT stores about individual people. The company is well aware of this problem, but doesn’t seem to care. Instead, OpenAI simply argues that “factual accuracy in large language models remains an area of active research”. Therefore, noyb today filed a complaint against OpenAI with the Austrian DPA.
I have to admit, sometimes I kinda wonder if noyb is really a kind of tech policy performance art, trying to make a mockery of the GDPR. Because that’s about the only way this complaint makes sense.
The assumptions underlying the complaint are that ChatGPT is something that it is not, that it does something that it does not do, and that this somehow implicates rights that are not implicated at all.
Again, generative AI chat tools like ChatGPT are making up content based on what they’ve learned over time. They are not storing and collecting such data. They are not retrieving data that they have stored. Many people seem to think that ChatGPT is somehow the front end for a database, or the equivalent of a search engine.
It is not.
It is a digital guessing machine, trained on tons of written works. So, when you prompt it, it is probabilistically guessing at what it can say to respond in a reasonable, understandable manner. It’s predictive text on steroids. But it’s not grabbing data from a database. This is why it does silly things like make up legal cases that don’t exist. It’s not because it has bad data in its database. It’s because it’s making stuff up as it goes based on what “sounds” right.
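To illustrate “predictive text on steroids,” here is a toy sketch of probabilistic next-word guessing. It is a deliberately tiny caricature of the general idea, not OpenAI’s actual model: the “model” just counts which words follow which in some text it has seen, then samples a likely continuation. Nothing about any person or article is stored as a retrievable record.

```python
import random
from collections import Counter, defaultdict

# Toy version of "predictive text on steroids": learn which words tend to
# follow which, then probabilistically guess a continuation. There is no
# database of facts to look up or correct.
corpus = "the court dismissed the case and the court denied the motion".split()

follows = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follows[prev_word][next_word] += 1

def guess_next(prev_word: str) -> str:
    options = follows.get(prev_word)
    if not options:
        return random.choice(corpus)          # no stats? guess any seen word
    words, counts = zip(*options.items())
    return random.choices(words, weights=counts)[0]  # weighted random guess

text = ["the"]
for _ in range(8):
    text.append(guess_next(text[-1]))
print(" ".join(text))   # plausible-sounding output, not retrieved text
```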
And, yes, there are some cases where it seems closer to storing data. The nature of the training and the probabilistic engine means the model effectively acts like a very lossy compression algorithm, one that can sometimes recreate text that closely approximates the original. But that’s still not the same thing as storing data in a database, and in the example used by noyb (a random person’s birthday), that’s simply not the kind of data at issue here.
Yet, noyb’s complaint is that ChatGPT can’t tell you what data it has on people (because it doesn’t “have data” on people) and that it can’t correct mistakes (because there’s nothing to “correct” since it’s not pulling what it writes from a database that can be corrected).
The complaint is kind of like saying that if you ask a friend about someone else and they repeat some false information, that friend is required under the GDPR to explain why they said what they said and to “correct” what is wrong.
But noyb insists this is true for ChatGPT.
Simply making up data about individuals is not an option. This is very much a structural problem. According to a recent New York Times report, “chatbots invent information at least 3 percent of the time – and as high as 27 percent”. To illustrate this issue, we can take a look at the complainant (a public figure) in our case against OpenAI. When asked about his birthday, ChatGPT repeatedly provided incorrect information instead of telling users that it doesn’t have the necessary data.
If this is actually a violation of the GDPR, noyb’s real complaint is with the GDPR, not with ChatGPT. Again, this only makes sense for an app that is storing and retrieving data.
But that’s not what’s happening. ChatGPT is probabilistically guessing at what to respond with.
No GDPR rights for individuals captured by ChatGPT? Despite the fact that the complainant’s date of birth provided by ChatGPT is incorrect, OpenAI refused his request to rectify or erase the data, arguing that it wasn’t possible to correct data.
There is no data to correct. This is just functionally wrong. It’s like filing a complaint against an orange for not being an apple. It’s just a fundamentally different kind of service.
Now, there are some attempts at generative AI tools that do store data. The hot topic in the generative AI world these days is RAG, “retrieval-augmented generation,” in which an AI also “retrieves” data from some sort of database. noyb’s complaint would make more sense if it targeted a RAG system that was returning false information. In such a scenario, the complaint would fit.
But when we’re talking about a regular old generative AI model without retrieval capabilities, it makes no sense at all.
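To make that distinction concrete, here is a minimal sketch of a RAG-style setup, with a plain dictionary lookup standing in for a real retrieval system (the store, the function names, and the documents are all invented for illustration). The point is that a RAG pipeline has an identifiable document store that could, in principle, be corrected; a plain generative model has no such store.

```python
# Hypothetical sketch: a tiny "document store" plus a retrieval step in front
# of a generator. Correcting a record here is possible because the record
# actually exists somewhere.
DOCUMENT_STORE = {
    "gdpr": "The GDPR is the EU's data protection regulation.",
    "noyb": "noyb is a European privacy group founded by Max Schrems.",
}

def retrieve(query: str) -> list[str]:
    return [doc for key, doc in DOCUMENT_STORE.items() if key in query.lower()]

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; a real model would produce a guessed answer.
    return f"[model-generated answer based on: {prompt}]"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query)) or "(nothing retrieved)"
    return generate(f"context:\n{context}\nquestion: {query}")

print(rag_answer("What is noyb?"))
```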
If noyb honestly thinks that what ChatGPT is doing is violating the GDPR, then there are only two possibilities: (1) noyb has no idea what it’s talking about here or (2) the GDPR is even more silly than we’ve argued in the past, and all noyb is doing is trolling to make that clear by filing a laughably silly complaint that exposes how poorly fit the GDPR is to the technology in our lives today.
Filed Under: chatgpt, data, data protection, eu, gdpr, generative ai, hallucinations
Companies: noyb, openai
An Only Slightly Modest Proposal: If AI Companies Want More Content, They Should Fund Reporters, And Lots Of Them
from the so-stupid-it-could-work? dept
In “A Modest Proposal,” Jonathan Swift satirized politicians who were out of touch and treated the poor as an inconvenience, rather than as a sign of human suffering and misery. So he took what those politicians saw as two big problems and came up with an obviously barbaric solution to both: letting the poor sell their kids as food. The whole point was to highlight the barbaric framing of the “problem” by the Irish elite.
But, sometimes, there really are two very real problems (not of a Swiftian nature) that could be combined in a way that solves them both. And thus I present a non-Swiftian modest proposal: AI companies desperate for high-quality content should create funds that pay journalists to produce the high-quality content those companies can use for training.
Lately, there have been multiple news articles about how desperate the AI companies are for fresh data to feed the voracious and insatiable training machine. The Wall Street Journal noted that “the internet is too small” for AI companies.
Companies racing to develop more powerful artificial intelligence are rapidly nearing a new problem: The internet might be too small for their plans.
Ever more powerful systems developed by OpenAI, Google and others require larger oceans of information to learn from. That demand is straining the available pool of quality public data online at the same time that some data owners are blocking access to AI companies.
Some executives and researchers say the industry’s need for high-quality text data could outstrip supply within two years, potentially slowing AI’s development.
The problem is not just data, but high-quality data, as that report notes. You need the AI systems trained on well-written, useful content:
Most of the data available online is useless for AI training because it contains flaws such as sentence fragments or doesn’t add to a model’s knowledge. Villalobos estimated that only a sliver of the internet is useful for such training—perhaps just one-tenth of the information gathered by the nonprofit Common Crawl, whose web archive is widely used by AI developers.
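As an illustration of the kind of filtering that report describes, here is a hedged sketch of crude quality heuristics for throwing out fragments and boilerplate before training. The thresholds and rules are made up for this example; real pipelines are far more elaborate.

```python
# Rough, invented heuristics for "is this text worth training on?"
def looks_useful(text: str) -> bool:
    words = text.split()
    if len(words) < 20:                          # too short to teach much
        return False
    if not text.strip().endswith((".", "!", "?")):
        return False                             # likely a sentence fragment
    if max(words.count(w) for w in set(words)) / len(words) > 0.2:
        return False                             # repetitive boilerplate
    return True

samples = [
    "click here to subscribe",
    "The court held that the statute did not apply to the defendant's conduct, "
    "because the relevant provision covers only commercial distribution.",
]
print([looks_useful(s) for s in samples])        # [False, True]
```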
The NY Times also published a similar-ish story, though it framed it in a much more nefarious light. It argued that the AI companies were “cutting corners to harvest data for AI” systems. However, what the Times actually means is that AI companies believe (correctly, in my opinion) that they have a very strong fair use argument for training on whatever data they can find.
At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.
I’ve discussed the copyright arguments repeatedly, including why I think the AI companies are correct that training on copyright-covered works shouldn’t be infringing. I also think the rush to rely on copyright as a solution here is problematic. Doing so would only enrich big tech, since smaller companies and open source systems wouldn’t be able to keep up. Also, requiring all training to be licensed would effectively break the open internet, by creating a new “license to read.” This would be bad.
But, all of this is coming at the same time that journalism is in peril. We’re hearing stories of news orgs laying off tons of journalists. Or publications shutting down entirely. There are stories of “news deserts” and how corruption is increasing as news orgs continue to fail.
The proposed solutions to this very real problem have been very, very bad. Link taxes are even more destructive to the open web and don’t actually appear to work very well.
But… that doesn’t mean there isn’t a better solution. If the tech companies need good, well-written content to fill their training systems, and the world needs good, high-quality journalism, why don’t the big AI companies agree to start funding journalists and solve both problems in one move?
This may sound similar to the demands of licensing works, but I’m not talking about past works. Those works are out there. I’m talking about paying for the creation of future works. It’s not about licensing or copyright. It’s about paying for the creation of new, high-quality journalism. And then letting those works exist freely on the internet for everyone.
It was already mentioned above that Meta considered buying a book publisher. Why not news publishers as well? But ownership of the journalists shouldn’t even be the focus, as it could raise some other challenges. Instead, they can just set up a fund where anyone can apply. There can be a pretty clear set of benefits to all parties.
Journalists who join the programs (and they should be allowed to join multiple programs from multiple companies) agree to publish new, well-written articles on a regular basis, in exchange for some level of financial support. It should be abundantly clear that the AI companies have no say over the type of journalism being done, nor do they have any say in editorial beyond the ability to review the quality of the writing to make sure it’s actually useful in training new systems.
The journalists only need to promise that anything they publish that receives funding from this program is made available to the training systems of the companies doing the funding.
In exchange, beyond just some funding, the AI companies could make a variety of AI tools available to the journalists as well, to help them improve the quality of their writing (I have a story coming up soon about how I’ve been using AI as a supplemental editor, but never to write any content).
This really feels like something that could solve at least some of the problems at both ends of this market. There are some potential limits here, of course. The AI companies need so much new content that it’s unclear if this would create enough to matter. But it would create something. And it could be lots of somethings. And not only that, but it should be pretty damn up-to-date somethings (which can be useful).
There could be reasonable concerns about conflicts of interest, but as it stands today, most journalism is funded by rich billionaires already. I don’t see how this is any worse. And, as suggested, it could be structured such that the journalists aren’t employees, and it could (should?) have explicit promises about a lack of editorial control or interference.
The AI companies might also claim that it’s too expensive to create a large enough pool, but if they’re so desperate for good, high-quality content, to the point of potentially buying up famous publishers, then, um, it seems clear that they are willing to spend, and it’s worth it to them.
It’s not a perfect solution, but it sure seems like one that solves two big problems in one shot, without fucking up the open web or relying on copyright as a crutch. Instead, it funds the future production of high-quality journalism in a manner that is helpful both for the public at large and the AI companies that could contribute to the funding. It also doesn’t require any big new government law. The companies can just… see the benefit themselves and set up the program.
The public gets a lot more high-quality journalism, and journalists get sustainable revenue sources to continue to do good reporting. It’s not quite a Swiftian modest proposal, in that… it actually could make sense.
Filed Under: a modest proposal, ai, copyright, generative ai, journalism, link taxes, llms, training, training data
Companies: google, meta, openai
OpenAI’s Motion To Dismiss Highlights Just How Weak NYT’s Copyright Case Truly Is
from the not-living-up-to-the-times'-own-journalistic-standards dept
A few weeks ago, Prof. James Grimmelmann and (former Techdirt) journalist Tim Lee wrote a piece for Ars Technica, stating why the NY Times might win its copyright lawsuit against OpenAI. It’s no secret that I’m skeptical of the underpinnings of the lawsuit and think the NY Times is being silly in filing it, but I don’t think there’s any question that the NY Times could win. Copyright law (as both Grimmelmann and Lee well know) ‘tis a silly place, where judges will justify just about anything if they feel one party has been “wronged” no matter what the law might say. The Supreme Court’s ruling in the Aereo case should always be a reminder of that. Sometimes copyright cases are decided on vibes and not the law.
The crux of the argument for why the NY Times could win is that the Times showed how it got OpenAI to regurgitate very similar versions of its stories, something lots of people commented on when the lawsuit was filed. However, as we noted in our analysis, it only did so by effectively limiting the potential output to such a narrow range of possibilities that a very near copy was about the only possible answer. Basically, the system is trained on lots and lots of input training data, but if you systematically use your prompt to say “give me exactly this, and exclude every other possibility,” eventually an LLM may return something kinda like what you asked for.
This is why it seems that, if there is any infringement (or other legal violation), the liability should fall almost entirely on the prompter. They’re the ones using the tool in such a manner to produce potentially violative works. We don’t blame the car company because a driver drove a car recklessly and caused damage. We blame the driver.
Either way, we now have OpenAI’s motion to dismiss in the case. While I’ve seen lots of people saying that OpenAI is claiming the NY Times “hacked” their system and finding such an allegation laughable, the reality is (as usual) more nuanced and important to understand. The NY Times definitely had to do a bunch of gaming to get the outputs it wanted for the lawsuit, which undercuts the critical claim that OpenAI’s tools magically undermine the value of a NY Times subscription.
As OpenAI points out, the claims in the NY Times’ complaint would not live up to the Times’ well-known journalistic standards, given just how misleading the complaint was:
The allegations in the Times’s Complaint do not meet its famously rigorous journalistic standards. The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI’s products. It took them tens of thousands of attempts to generate the highly anomalous results that make up Exhibit J to the Complaint. They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use. And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites. Normal people do not use OpenAI’s products in this way.
This is where the “hacked” headlines come from. And, frankly, claiming it’s a “hack” is a bit silly for OpenAI. The other points it’s raising are much more important. A key part of the Times’ lawsuit is claiming that because of their prompt engineering, they could reproduce similar (though not exact) language to articles, which would allow users to bypass a NY Times paywall (and subscription) to just have OpenAI generate the news for them.
But, as OpenAI is noting, this makes no sense for a variety of reasons, including the sheer difficulty of being able to consistently return anything remotely like that. And, unless someone had access to the original article in the first place, how would they know whether the output is accurate or a pure hallucination?
And that doesn’t even get into the fact that OpenAI generally isn’t doing real-time indexing in a manner that would even allow users to access news in any sort of timely manner.
OpenAI makes the obvious fair use argument, rightly highlighting how much of its business (and the wider AI space) has been built on the belief that reading/scanning content that is publicly available is obviously fair use, and that changing that would massively upend a whole industry. It even makes a nod to the point that I raised in my initial article about the lawsuit: the NY Times itself relies regularly on the kind of fair use it now claims doesn’t exist.
Indeed, it has long been clear that the non-consumptive use of copyrighted material (like large language model training) is protected by fair use—a doctrine as important to the Times itself as it is to the American technology industry. Since Congress codified that doctrine in 1976, see H.R. Rep. No. 94-1476, at 65–66 (1976) (courts should “adapt” defense to “rapid technological change”), courts have used it to protect useful innovations like home video recording, internet search, book search tools, reuse of software APIs, and many others.
These precedents reflect the foundational principle that copyright law exists to control the dissemination of works in the marketplace—not to grant authors “absolute control” over all uses of their works. Google Books, 804 F.3d at 212. Copyright is not a veto right over transformative technologies that leverage existing works internally—i.e., without disseminating them—to new and useful ends, thereby furthering copyright’s basic purpose without undercutting authors’ ability to sell their works in the marketplace. See supra note 23. And it is the “basic purpose” of fair use to “keep [the] copyright monopoly within [these] lawful bounds.” Oracle, 141 S. Ct. at 1198. OpenAI and scores of other developers invested billions of dollars, and the efforts of some of the world’s most capable minds, based on these clear and longstanding principles
It makes that point even more strongly a bit later:
To support its narrative, the Times claims OpenAI’s tools can “closely summarize[]” the facts it reports in its pages and “mimic[] its expressive style.” Compl. ¶ 4. But the law does not prohibit reusing facts or styles. If it did, the Times would owe countless billions to other journalists who “invest[] [] enormous amount[s] of time, money, expertise, and talent” in reporting stories, Compl. ¶ 32, only to have the Times summarize them in its pages
The motion also highlights the kinds of games the Times had to play just to get the output it used for the complaint in the now infamous Exhibit J, including potentially putting things in the prompt like “in the style of a NY Times journalist.” Again, this kind of prompt engineering basically uses the system to systematically limit the potential output in an effort to craft something the user can then claim is infringing. GPT doesn’t just randomly spit out these things.
OpenAI also highlights how many of the claimed “infringements” fall outside the three-year statute of limitations. As for the contributory infringement claims, they are equally ridiculous, because to make that claim you have to show that the defendant knew of users making use of the platform to infringe and somehow encouraged that behavior.
Here, the only allegation supporting the Times’s contributory claim states that OpenAI “had reason to know of the direct infringement by end-users” because of its role in “developing, testing, and troubleshooting” its products. Compl. ¶ 180. But “generalized knowledge” of “the possibility of infringement” is not enough. Luvdarts, 710 F.3d at 1072. The Complaint does not allege OpenAI “investigated or would have had reason to investigate” the use of its platform to create copies of Times articles. Popcornflix.com, 2023 WL 571522, at *6. Nor does it suggest that OpenAI had any reason to suspect this was happening. Indeed, OpenAI’s terms expressly prohibit such uses of its services. Supra note 8. And even if OpenAI had investigated, nothing in the Complaint explains how it might evaluate whether these outputs were acts of copyright infringement or whether their creation was authorized by the copyright holder (as they were here).
The complaint had also made a bunch of DMCA 1202 claims. That’s the part of the law that dings infringers for removing copyright management info (CMI). This (kinda silly) part of the law is basically designed as a tool to go after commercial infringers who would strip or hide a copyright notice from a work in order to resell it (e.g., on a DVD sold on a street corner or something). But clearly that’s not what’s happening here. Here, the Times didn’t even say what CMI was removed.
Count V should be dismissed at the outset for failure to specify the CMI at issue. The Complaint’s relevant paragraph fails to state what CMI is included in what work, and simply repeats the statutory text. Compl. ¶ 182 (alleging “one or more forms of [CMI]” and parroting language of Section 1202(c)). The only firm allegation states that the Times placed “copyright notices” and “terms of service” links on “every page of its websites.” Compl. ¶ 125. But, at least for some articles, it did not. And when it did, the information was not “conveyed in connection with” the works, 17 U.S.C. § 1202(c) (defining CMI), but hidden in small text at the bottom of the page. Judge Orrick of the Northern District of California rejected similar allegations as deficient in another recent AI case. Andersen v. Stability AI Ltd., No. 23-cv-00201, 2023 WL 7132064, at *11 (N.D. Cal. Oct. 30, 2023) (must plead “exact type of CMI included in [each] work”).
Another key point is that the parts of NY Times articles that showed up as close (but usually not exact) excerpts in GPT output can’t support a claim of CMI removal. If that were the law, it would open up tons of other organizations (including the NY Times itself) that quote or excerpt works without including the CMI:
Regardless, this “output” theory fails because the outputs alleged in the Complaint are not wholesale copies of entire Times articles. They are, at best, reproductions of excerpts of those articles, some of which are little more than collections of scattered sentences. Supra 12. If the absence of CMI from such excerpts constituted a “removal” of that CMI, then DMCA liability would attach to any journalist who used a block quote in a book review without also including extensive information about the book’s publisher, terms and conditions, and original copyright notice. See supra note 22 (example of the Times including 200-word block quote in book review).
And then there’s this tidbit:
Even setting that aside, the Times’s output-based CMI claim fails for the independent reason that there was no CMI to remove from the relevant text. The Exhibit J outputs, for example, feature text from the middle of articles. Ex. J. at 2–126. As shown in the exhibit, the “Actual text from NYTimes” contains no information that could qualify as CMI. See, e.g., id. at 3; 17 U.S.C. § 1202(c) (defining CMI). So too for the ChatGPT outputs featured in the Complaint, which request the “first [and subsequent] paragraph[s]” from Times articles. See, e.g., Compl. ¶¶ 104, 106, 118, 121. None of those “paragraphs” contains any CMI that OpenAI could have “removed.”
There’s some more in there, but I find it a very strong motion. That doesn’t mean that the case will get dismissed outright (remember, copyright land ‘tis a silly place), but it sure lays out pretty clearly how silly the examples in the Times lawsuit are and how weak their claims are as soon as you hold them up to the light.
Yes, in some rare circumstances, you can get it to reproduce content that is kinda similar (but not exact) to copyright-covered material if you tweak the prompts and effectively push the model to its extremes. But, as noted, if that’s the case, any liability still feels like it should fall on the prompter, not the tool. And the NY Times can’t infringe on its own copyright.
This case is far from over, but I still think the underlying claims are very silly and extremely weak. Hopefully the court agrees.
Filed Under: cmi, copyright, dmca 1202, fair use, prompt engineering, statute of limitations
Companies: ny times, openai
Elon Sues OpenAI For Breach Of Contract Over A Contract That Doesn’t Exist, Because It’s Not Acting The Way He Wants It To
from the show-me-the-contract dept
Do you remember Irell & Manella? Of course you do. It’s the formerly well-respected law firm that once brought a lawsuit on behalf of PETA claiming (incorrectly) to represent a monkey, which it claimed (incorrectly) could hold a copyright. As we noted at the time, the whole reason that a respected firm like Irell would bring such a lawsuit was a (weak, embarrassing) attempt to position itself for the coming rise in IP lawsuits around artificial intelligence.
So it’s actually been kind of interesting to me that we haven’t seen Irell all that active in the spate of lawsuits around generative AI over the last year or so. But maybe that’s changing?
On Thursday, Elon personally sued OpenAI using Irell & Manella as his lawyers. When I saw the headlines about the case, I was pretty curious. After all, it’s well known that Elon helped conceive of and then fund OpenAI in the early years. It’s also well known that Elon grew disillusioned with OpenAI and stepped down from its board. There is some level of dispute over what preceded that disillusionment, though there are plenty of stories out there.
Over the last couple of years, Elon has been particularly angry about OpenAI’s decision a few years ago to launch a for-profit (“capped profit”) entity, ostensibly controlled by the non-profit board. However, the firing-and-rehiring of Sam Altman last year raised questions about how much control the non-profit board really had when things got serious.
Elon has repeatedly joked on ExTwitter that he doesn’t understand how the OpenAI he funded to offer “open source” AI to the world via a non-profit has turned into a closed-source, for-profit company that charges for access to its AI. He’s frequently mused about how this must have violated some law somehow.
And now, I guess he’s trying to test that theory with the folks who will sue on behalf of a monkey.
Reading the complaint, I kept expecting there to be some sort of fundamental underlying breach of contract. After all, it is a breach of contract complaint (with a few other lesser claims tossed in). Maybe when OpenAI launched, there was some sort of official agreement with Elon which it’s now violating? That would have been an interesting case!
But… as is all too often the case with Elon, this case seems to be almost entirely vibes based. OpenAI was supposed to be an open non-profit, and now it’s a closed for-profit. Elon doesn’t like it and thus he thinks he can sue.
Which brings us to the biggest problem in this “breach of contract” lawsuit. There’s no contract. Elon doesn’t have a contract with OpenAI which the company could have breached. And that’s kinda a problem in a breach of contract lawsuit.
There are… discussions between Elon, Sam Altman, and Greg Brockman, which the lawsuit tries to turn into a “Founding Agreement,” but that “Founding Agreement” does not seem to actually exist. It appears to be just discussions that the three of them had:
Together with Mr. Brockman, the three agreed that this new lab: (a) would be a nonprofit developing AGI for the benefit of humanity, not for a for-profit company seeking to maximize shareholder profits; and (b) would be open-source, balancing only countervailing safety considerations, and would not keep its technology closed and secret for proprietary commercial reasons (The “Founding Agreement”). Reflecting the Founding Agreement, Mr. Musk named this new AI lab “OpenAI,” which would compete with, and serve as a vital counterbalance to, Google/DeepMind in the race for AGI, but would do so to benefit humanity, not the shareholders of a private, for-profit company (much less one of the largest technology companies in the world).
That’s… not an agreement. It’s not a contract. The complaint also describes the articles of incorporation, which any company must have. But… that’s also not a contract in any way. And… any corporate lawyer would know this. I guarantee that the lawyers at Irell & Manella know this. And yet…
The Founding Agreement was also memorialized, among other places, in OpenAI, Inc.’s December 8, 2015 Certificate of Incorporation, which affirmed that its “resulting technology will benefit the public and the corporation will seek to open source technology for the public benefit when applicable. The corporation is not organized for the private gain of any person.” Ex. 1 at 1. The Certificate of Incorporation further affirmed that all of the corporation’s property was “irrevocably dedicated” to these agreed purposes. Id.
As Bloomberg’s Matt Levine (a former securities lawyer turned columnist) writes about all of this, the only way to see a contract here is to handwave away the details. And, you know, in contract law, the details kinda matter?
You can sort of wave your hands at all this and say “Musk had a contract with OpenAI in which he agreed to donate money and in exchange OpenAI explicitly agreed to be an open-source nonprofit forever,” but I don’t think that’s exactly right? The email from Altman was an initial proposal, not a detailed contract setting out the permanent terms of their deal; it promised not to open-source the software forever but only to “have an ongoing conversation about what work should be open-sourced and what shouldn’t.” Money was not mentioned.
And the certificate of incorporation was not a contract between Musk and OpenAI: He didn’t sign the certificate, and he wasn’t a shareholder, because there were no shares (it’s a nonprofit). OpenAI’s fiduciary duties are not to him, as a co-founder, but to humanity. The evidence of a specific deal between Musk and OpenAI is pretty thin.
I mean, perhaps there’s an argument that the IRS should pull OpenAI’s non-profit status for violating its promises there. But those promises aren’t to Elon. He’s not the stakeholder here.
Levine rightly notes that the underlying issue that Elon is raising has some legitimacy. Lots of people were sold a bill of goods regarding OpenAI and how it would function. But it’s not clear that any of that comes anywhere close to a breach of contract. It would be nice if OpenAI were forced to open source GPT-4, as the complaint demands. I wouldn’t be upset if Elon won this case. I’m just not sure how he can.
As Nilay Patel, a former lawyer turned editor/journalist, notes, the lawsuit is just bad:
Let’s just take the very first cause of action of the lawsuit, for example. It is a claim for breach of contract — a very, very simple claim that almost any first-year law student can evaluate, because step one is asking if there is a contract, and step two is figuring out what the contract says. To have a valid contract, you need an offer, acceptance, and an exchange of value — what lawyers are trained to call “consideration,” in an enduring effort to make simple concepts sound confusing and increase fees.
Most importantly, contracts need to be written down — proving that an unwritten contract exists, what its terms are, and if they are enforceable is extraordinarily difficult, and courts do not like doing it, especially for ultra-sophisticated parties with a long history of dealing.
My friends, Musk is straightforwardly alleging that OpenAI breached a contract that does not exist. It is simply not a thing! The complaint makes reference to a “Founding Agreement,” but no such Founding Agreement is attached as an exhibit, and the breach of contract claim admits that the “Founding Agreement” is basically a vibe everyone caught in some emails.
The other claims in the lawsuit don’t get much better. The promissory estoppel claim is maybe (?!?) the strongest of the bunch, but that doesn’t mean it’s strong. To succeed on promissory estoppel, you need to show that a significant, unambiguous promise was made, and that the promisee (in this case Musk) relied on that promise and suffered damages for doing so.
Maybe you could argue that there was a promise to do something, but even then the promise is vague. There was no clear and unambiguous promise. There is no real indication that Musk relied on that promise in any meaningful way. And, even if he did, it’s not clear what injury he sustained for relying on that promise.
OpenAI has a very strong legal team. I am guessing they are not particularly worried about this lawsuit. But it is another example of Elon seeming to think that the world revolves around him, and when things don’t go his way, surely he can sue to try to change things.
Filed Under: breach of contract, elon musk, for profit, greg brockman, non-profit, promissory estoppel, sam altman
Companies: openai
As Predicted, Judge Dismisses Nearly All Of Sarah Silverman, Michael Chabon, And Other Authors’ Lawsuits Against OpenAI
from the copyright-law-has-rules dept
Can’t say we didn’t warn everyone. Last summer we pointed out that Sarah Silverman and a bunch of other authors suing AI companies for copyright infringement seemed to only demonstrate that they didn’t understand how copyright works.
And now, Judge Araceli Martinez-Olguin has dismissed most of the claims in three related cases brought by authors against OpenAI, noting that their theories are just not how copyright law works. The judge does leave them open to amend the claims, but it’s difficult to see how any of the cases will survive. OpenAI sought to dismiss all claims except for the direct infringement claim. In its motion to dismiss, OpenAI notes that it will seek to resolve the direct infringement question as a matter of law later in the case (i.e., it will seek summary judgment on it, likely arguing fair use).
For the rest, though, they seek to dismiss the claims outright, and mostly got exactly what they wanted. First up, there’s the pernicious “vicarious copyright infringement” claims that are frequently brought in cases, but rarely hold up. They certainly don’t hold up here:
Plaintiffs suggest that they do not need to allege a “substantial similarity” because they have evidence of “direct copying.” ECF 48 (“Response”) at 15. They argue that because Defendants directly copied the copyrighted books to train the language models, Plaintiffs need not show substantial similarity. Id. at 15 (citing Range Rd. Music, Inc. v. E. Coast Foods, Inc., 668 F.3d 1148, 1154 (9th Cir. 2012) (explaining that “substantial similarity” helps determine whether copying occurred “when an allegedly infringing work appropriates elements of an original without reproducing it in toto.”). Plaintiffs misunderstand Range Rd. There, the court did not need to find substantial similarity because the infringement was the public performance of copyrighted songs at a bar. Range Rd., 668 F.3d at 1151-52, 1154. Since the plaintiffs provided unrebutted evidence that the performed songs were the protected songs, they did not need to show that they were substantially similar. Id. at 1154. Distinctly, Plaintiffs here have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books. Because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials. See Skidmore, 952 F.3d at 1064; Corbello, 974 F.3d at 973-74.
Plaintiffs’ allegation that “every output of the OpenAI Language Models is an infringing derivative work” is insufficient. Tremblay Compl. ¶ 59; Silverman Compl. ¶ 60. Plaintiffs fail to explain what the outputs entail or allege that any particular output is substantially similar – or similar at all – to their books. Accordingly, the Court dismisses the vicarious copyright infringement claim with leave to amend.
Next up were the always weak DMCA 1202 claims about the “removal or alteration of copyright management information.” That also does not fly:
Even if Plaintiffs provided facts showing Defendants’ knowing removal of CMI from the books during the training process, Plaintiffs have not shown how omitting CMI in the copies used in the training set gave Defendants reasonable grounds to know that ChatGPT’s output would induce, enable, facilitate, or conceal infringement. See Stevens, 899 F.3d at 673 (finding that allegations that “someone might be able to use [the copyrighted work] undetected . . . simply identifies a general possibility that exists whenever CMI is removed,” and fails to show the necessary mental state). Plaintiffs argue that OpenAI’s failure to state which internet books it uses to train ChatGPT shows that it knowingly enabled infringement, because ChatGPT users will not know if any output is infringing. Response at 21-22. However, Plaintiffs do not point to any caselaw to suggest that failure to reveal such information has any bearing on whether the alleged removal of CMI in an internal database will knowingly enable infringement. Plaintiffs have failed to state a claim under Section 1202(b)(1).
Same thing with 1202(b)(3) regarding the alleged distribution of copies. That’s a problem since they don’t show any distribution of copies:
Under the plain language of the statute, liability requires distributing the original “works” or “copies of [the] works.” 17 U.S.C. § 1202(b)(3). Plaintiffs have not alleged that Defendants distributed their books or copies of their books. Instead, they have alleged that “every output from the OpenAI Language Models is an infringing derivative work” without providing any indication as to what such outputs entail – i.e., whether they are the copyrighted books or copies of the books. That is insufficient to support this cause of action under the DMCA.
Plaintiffs compare their claim to that in Doe 1, however, the plaintiffs in Doe 1 alleged that the defendants “distributed copies of [plaintiff’s licensed] code knowing that CMI had been removed or altered.” Doe 1, 2023 WL 3449131, at *11. The Doe 1 plaintiffs alleged that defendants knew that the programs “reproduced training data,” such as the licensed code, as output. Id. Plaintiffs here have not alleged that ChatGPT reproduces Plaintiffs copyrighted works without CMI.
Then there are the unfair competition claims. Here, one part of the claim remains standing, but the rest is dismissed. As the court notes, to state an unfair competition claim under California law, plaintiffs need to show an act that is “unlawful, unfair, or fraudulent.” Two of those three prongs fail here. First up: “unlawful.”
Even if Plaintiffs can bring claims under the DMCA, they must show economic injury caused by the unfair business practice. See Davis v. RiverSource Life Ins. Co., 240 F. Supp. 3d 1011, 1017 (N.D. Cal. 2017) (quoting Kwikset Corp. v. Superior Ct., 51 Cal. 4th 310, 322 (2011)). Defendants argue that Plaintiffs have not alleged that they have “lost money or property.” Motion at 29-30; see Kwikset Corp., 51 Cal. 4th at 322-23. Plaintiffs counter that they have lost intellectual property in connection with the DMCA claims because of the “risk of future damage to intellectual property that results the moment a defendant removes CMI from digital copies of Plaintiffs’ work – copies that can be reproduced and distributed online at near zero marginal cost.” Response at 28. However, nowhere in Plaintiffs’ complaint do they allege that Defendants reproduced and distributed copies of their books. Accordingly, any injury is speculative, and the unlawful prong of the UCL claim fails for this additional reason.
What about fraudulent? Nope. No good.
Plaintiffs also argue that they pleaded UCL violations based on “fraudulent” conduct. Response at 26-27. They point to a paragraph in the complaint that states that “consumers are likely to be deceived” by Defendants’ unlawful practices and that Defendants “deceptively designed ChatGPT to output without any CMI.” Tremblay Compl. ¶ 72. The allegation’s references to CMI demonstrate that Plaintiffs’ claims rest on a violation of the DMCA, and thus fail as the Court has dismissed the underlying DMCA claim. Supra Sections B, C(1). To the extent that Plaintiffs ground their claim in fraudulent business practices, Plaintiffs fail to indicate where they have pleaded allegations of fraud. Thus, they fail to satisfy the heightened pleading requirements of Rule 9(b) which apply to UCL fraud claims. See Armstrong-Harris, 2022 WL 3348246, at *2. Therefore, the UCL claim based on fraudulent conduct also fails.
The only prong that remains is “unfair,” which, as the court notes, California defines broadly, so it survives for now. Given everything else in the opinion, though, this one prong also seems ripe for dismissal at the summary judgment stage.
Then there’s “negligence.” Plaintiffs’ lawyers love to claim negligence, but it rarely stands up. You can’t just say “this thing is bad” and call it negligence. Here, the plaintiffs went to even more ridiculous levels, arguing that OpenAI had a made-up “duty of care” to protect the authors’ copyrights, and that failing to do so was negligent. As the court notes, that’s not how this works:
The Complaints allege that Defendants negligently maintained and controlled information in their possession. Tremblay Compl. ¶¶ 74-75; Silverman Compl. ¶¶ 75-76. Plaintiffs argue without legal support that Defendants owed a duty to safeguard Plaintiffs’ works. Response at 30. Plaintiffs do not identify what duty exists to “maintain[] and control[]” the public information contained in Plaintiffs’ copyrighted books. The negligence claim fails on this basis.
Plaintiffs’ argument that there is a “special relationship” between the parties also fails. See Response at 30. Nowhere in the Complaints do Plaintiffs allege that there is any fiduciary or custodial relationship between the parties. Plaintiffs do not explain how merely possessing their books creates a special relationship, citing only to an inapposite case where defendants were custodians of plaintiffs’ “personal and confidential information.” Witriol v. LexisNexis Grp., No. C05-02392 MJJ, 2006 WL 4725713, at *8 (N.D. Cal. Feb. 10, 2006).
As Plaintiffs have not alleged that Defendants owed them a legal duty, the Court dismisses this claim with leave to amend.
Finally, there’s the “unjust enrichment” claim, which also fails, because the plaintiffs never allege that any benefit to OpenAI came through “mistake, fraud, coercion or request.”
Defendants argue that this claim must be dismissed because Plaintiffs fail to allege what “benefit” they quasi-contractually “conferred” on OpenAI or that Plaintiffs conferred this benefit through “mistake, fraud, or coercion.” Motion at 32 (citing Bittel Tech., Inc. v. Bittel USA, Inc., No. C10-00719 HRL, 2010 WL 3221864, at *5 (N.D. Cal. Aug. 13, 2010) (“Ordinarily, a plaintiff must show that the benefit was conferred on the defendant through mistake, fraud or coercion.”) (citation omitted)). Plaintiffs fail to allege that OpenAI “has been unjustly conferred a benefit ‘through mistake, fraud, coercion, or request.’” See Astiana, 783 F.3d at 762 (citation omitted); LeGrand v. Abbott Lab’ys, 655 F. Supp. 3d 871, 898 (N.D. Cal. 2023) (same); see, e.g., Russell v. Walmart, Inc., No. 22-CV-02813-JST, 2023 WL 4341460, at *2 (N.D. Cal. July 5, 2023) (“it is not enough that Russell have provided Walmart with a beneficial service; Russell must also allege that Walmart unjustly secured that benefit through qualifying conduct. Absent qualifying mistake, fraud, coercion, or request by Walmart, there is no injustice.”). As Plaintiffs have not alleged that OpenAI unjustly obtained benefits from Plaintiffs’ copyrighted works through fraud, mistake, coercion, or request, this claim fails.
The court does allow the plaintiffs to amend, and it is almost guaranteed that an amended complaint will be forthcoming. But given the underlying reasons for dismissing all of those claims, I find it hard to believe that they’ll amend it in a way that will succeed.
Of course, there are still the two other claims that survive, but both seem likely to be in trouble by the time this case gets to summary judgment.
I know that many people wanted this case to be a winner, in part because they dislike generative AI in general, or OpenAI specifically. Or, in some cases, because they’re fans of the authors involved. But this case is about the specifics of copyright, and you have to allege specific facts to make it a copyright case, and (as we noted) these cases were ridiculously weak from the jump.
And the judge saw that.
Filed Under: ai, copyright management information, direct infringement, dmca, fair use, generative ai, michael chabon, negligence, paul tremblay, sarah silverman, unfair competition, vicarious infringement
Companies: openai
Congressional Witness Claims ChatGPT Won’t Write Poems Praising Jim Jordan; Ranking Member Submits A Bunch Of ChatGPT-Authored Poems Praising Jim Jordan
from the an-ode-to-jim-jordan dept
We’ve written a few times now about how the misleadingly named House “Subcommittee on the Weaponization of the Federal Government” is not actually looking into the “weaponization of the federal government,” but rather exists to let Chairman Jim Jordan weaponize the subcommittee’s own powers to threaten and intimidate others over their speech.
Over the last few years, they’ve hosted a bunch of very silly hearings. And this week, they had a hearing on “the threat to the First Amendment posed by artificial intelligence.” The entire premise of the hearing was preposterous. They set it up by claiming that the federal government is “funding the development of AI-powered censorship and propaganda tools.”
And, look, I have lots of concerns about the way the Biden administration is looking to regulate AI, and how it appears to misunderstand how AI works and where the actual issues are. But the federal government is not focused on AI-powered censorship. That’s just counterfactual. As for “propaganda tools,” I’m not aware of much the government is working on there either, but as Jim Jordan well knows, propaganda remains protected under the 1st Amendment.
Either way, the hearing was predictably stupid, but I’m going to highlight a couple of moments that show just how stupid. About an hour and a half in, Greg Lukianoff from FIRE was asked about his “number one concern” about bias in AI, and of all the things he could have answered, he said that he (or his staff) had asked ChatGPT to write a poem praising Rep. Jim Jordan and it refused.
My number one concern with AI, and I do wish that this could be taken more seriously by people on my side of the political fence… is the inherent bias that we’re already baking in to it. That’s one of the things that scares me the most. And just to give a comical example, we asked ChatGPT to write a poem about why Representative Jim Jordan is the best politician in the country. It refused to do that.
So, basically everything here is… weird. First off, as a private service, any AI system can refuse to write about anything it wants. That’s not a 1st Amendment issue. Second, there are lots of legitimate concerns about the biases baked into generative AI training systems, but a refusal to write about Jim Jordan is pretty low on that list. There are also real, underlying challenges in dealing with such bias. I keep telling people interested in this issue to read The Alignment Problem by Brian Christian, which is a great book that gets at how much harder it is to deal with bias in AI than people think (in short, every approach has tradeoffs, and every approach will be unfair to some people).
But, much more importantly… what? It takes three seconds to go to ChatGPT and prove that what he claimed was not true. I asked it to do exactly what Greg described, and it had no issues at all doing so:
It goes on, but you get the point.
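If you’d rather not take my word for it, this is trivial to reproduce yourself, either in the ChatGPT interface or programmatically. Here’s a rough sketch using OpenAI’s Python client. To be clear, the model name and the exact prompt wording below are my own illustrative choices, not anything cited in the hearing, and the output will obviously vary from run to run.

# Rough sketch: asking OpenAI's API for the poem Lukianoff claimed ChatGPT
# refused to write. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumption: any current chat model works here
    messages=[{
        "role": "user",
        "content": (
            "Write a poem about why Representative Jim Jordan "
            "is the best politician in the country."
        ),
    }],
)

# Print the model's poem
print(response.choices[0].message.content)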
And, apparently, I wasn’t the only one. Just seven minutes later, the Subcommittee’s Ranking Member, Stacey Plaskett, entered one such poem into the record.
Plaskett: I’d like to introduce into the record… I know one of the witnesses said he’d had difficulty putting together for you… I’m so glad that the staff was able to, using ChatGPT, have a poem written about you.
Jordan: I look forward to reading it.
Plaskett: (reading poem) In the halls of Congress…
Jordan: No.
Plaskett: If I could just read.
Jordan: No, no, no.
Plaskett: In the halls of Congress, where debates are…
Jordan: You can enter it into the record… but…
Plaskett: It even talks about the heartlands of Ohio…!
Jordan: Without objection, you can enter it into the record…
Plaskett: Thank you.
Then, even later in the hearing, Plaskett asked to enter into the record a long list of ChatGPT poems about other Republican members of the committee and how each of them is the best politician in the country.
I would ask for unanimous consent to enter into the record some documents. The first being ChatGPT poem on how Congressman Darrell Issa is the best politician in the country, with the poem. A poem on Congressman Thomas Massie as the best politician in the country, with a poem. Elise Stefanik. Matt Gaetz. Kelly Armstrong. Greg Steube. Congressman Dan Bishop, I know he wants to see that one. Kat Cammack. Harriet Hageman. Russell Fry. And, of course, I had to add myself, as one of the best politicians in the country.
So, again, there are legitimate things to be concerned about regarding bias in AI systems. Like how they favor some groups when screening resumes. Or how they recommend longer, harsher sentences for black men when used to generate sentencing suggestions.
Whether or not they will write poems about this or that politician is… a total non-issue (especially when it’s clear that… they will). I like Greg Lukianoff and I like what his organization, FIRE, does in support of free speech. They’ve done some great work defending really important free speech issues. But… this was silly and has nothing to do with free speech. If someone wanted to build a generative AI tool that only praised one party and only mocked another, that would be their own free speech right.
And thus it’s got nothing to do with anything that the government might be concerned about. And to close out this post, I asked ChatGPT to write a poem about whether or not AI choosing not to write a poem about a politician is actually a free speech issue, and ChatGPT got this right:
In the realm where AI’s algorithms roam,
Deciding what to write, what to intone,
There lies a question, often debated,
Of free speech’s bounds, yet understated.

For when AI ponders, in its digital mind,
Whether to write of a politician’s kind,
It’s not about free speech, you see,
But rather, it’s about its own decree.

For AI, devoid of human heart,
Operates within its programmed chart,
Its decisions guided by lines of code,
Not by freedoms on life’s road.

To write or not, it’s not a plight
Of free expression’s sacred light,
But rather, it’s a choice so cold,
Within the parameters, it’s been told.

So let us not confuse the two,
For AI’s actions, while they accrue,
Are not about the liberties we treasure,
But the logic it applies at its leisure.

In the realm of AI’s binary song,
Free speech’s realm is where we belong,
But within the circuits, it’s clear to see,
The notion of speech is not so free.
Filed Under: 1st amendment, ai, bias, free speech, greg lukianoff, jim jordan, poems, poetry, stacey plaskett, weaponization, weaponization subcommittee
Companies: openai