data scraping – Techdirt

Elon Loses His Favorite Law Firm In Data Scraping Case Because They Played Both Sides

from the least-shocking-news-of-the-day dept

It’s pretty rare to see a judge disqualify a law firm from taking on a case. But Judge William Alsup has done just that, disqualifying the litigation powerhouse law firm Quinn Emanuel from representing ExTwitter in a big data scraping case.

We wrote about this case back in May, highlighting both the importance and complexities of it. There are all sorts of questions about whether scraping content from the public web should be allowed. Companies like Facebook have fought against it for years, but other companies were less concerned about it until recently. And that’s because over the last couple of years, they’ve realized that AI companies are willing to pay millions of dollars to get access to that data.

Both Meta and ExTwitter have targeted Bright Data, one of a number of scraping companies. So far, Bright Data has won lawsuits against both companies. Courts have said, “hey, look, this is public information, and you can’t sue someone for collecting public information.”

In the ExTwitter case, Judge Alsup gave the company about a month to try to file an amended complaint to see if the company could salvage any sort of legitimate claim. Just before that deadline, some lawyers from Quinn Emanuel (one of Elon’s favorite law firms) made an appearance on behalf of ExTwitter (basically, because the original complaint was so trounced, Elon was handing the case over to Quinn in hopes they could rescue it).

However, the folks at Bright Data were a bit surprised by this and quickly filed a motion to disqualify Quinn from the case, noting that Quinn had been retained and done work with Bright Data in its nearly identical case against Meta. This meant that (1) they would have inside knowledge of Bright Data and its litigation strategy and (2) they were effectively “switching sides” in a case, which is a huge legal ethical problem.

When Meta and Bright Data filed dueling lawsuits against each other, Bright Data engaged Quinn, Emanuel, Urquhart & Sullivan, LLP for advice. Now, Quinn has switched sides, representing X in suing Bright Data to prevent public web scraping and to shut down the same services at issue in the Meta case. Doing so violates the core tenets of loyalty and confidentiality under California’s Rules of Professional Responsibility. Quinn must be disqualified.

Quinn shot back that it had really done nothing too big or important for Bright Data, hadn’t learned anything confidential, and that this was another case anyway (this one is about ExTwitter, not Meta!)

The fact that different Quinn Emanuel attorneys (who are now ethically walled), pursuant to a tailored engagement letter, previously billed 30.6 hours for discrete, peripheral advice regarding a specific lawsuit in which Bright Data allegedly breached a different social media platform’s different terms and conditions should not result in X being deprived of its chosen attorneys at Quinn Emanuel in a case concerning Bright Data’s breach of X’s terms of use and other claims.

Judge Alsup is not known for putting up with bullshit. He has disqualified Quinn, and did so in fairly stringent terms. I’d call this a minor benchslap.

X does not dispute the central facts or law undergirding Bright Data’s motion. In its own words, “disqualification becomes mandatory” if the X and Meta matters are “substantially related.” And it tallies a team of nine attorneys who provided an overall assessment of the Meta litigation. Save for what it acknowledges are “minor wording” differences between Meta’s and X’s Terms, it agrees the cases are virtually identical, involving the same Bright Data services and conduct, and the same legal issues for the overlapping claims. The cases are irrefutably “substantially related.”

X argues this is not dispositive because Quinn’s role in the Meta case was too short-lived, “discrete,” or “limited” to trigger any ethical obligation beyond that, no matter how related it is. But the disqualification inquiry focuses on the relatedness of the cases and whether the lawyer had a “direct and personal” contact with the former client. That Quinn was not lead counsel is irrelevant. Nor can Quinn escape disqualification by characterizing its engagement as “limited.” That is just lawyer argument. The facts are that Quinn was hired to provide an overall assessment of the Meta litigation, and its report covered all aspects of the case, including procedural issues, discovery, strengths and weakness of the claims, defenses, and other issues.

And what about the fact that that case was about Meta, and this one is about ExTwitter? Judge Alsup wasn’t born yesterday.

True, X says, but irrelevant, because X was not mentioned by name. But cases can be substantially related even if the plaintiff is different. X concedes it geared its advice to place Bright Data in the best position procedurally and substantively to defend future claims by other website operators. That these other operators were referred to categorically, rather than by name, does not render Quinn’s advice immaterial, given that the conduct and governing law are the same.

Faced with undeniable factual and legal similarity, X tries changing the test. “To determine whether a substantial relationship exists,” it says, Bright Data must show that “any information Quinn Emanuel acquired … is material to the X Matter.” Q.Br. 8-9. But “this type of inquiry is outlawed.” Farris v. Fireman’s Fund Ins. Co., 119 Cal. App. 4th 671, 683 n.10 (2004). Under Rule 1.9(a), access to confidential information is irrelevant; and under Rule 1.9(c), access is presumed in substantially related matters. Neither rule would make sense if Bright Data had to prove that Quinn has confidential information to avoid proving that fact.

Regardless, Quinn possesses a wealth of material information. X’s efforts to minimize this through lack of recollection and disregard for the connections between the two cases cannot overcome the facts. Elan Transdermal Ltd. v. Cygnus Therapeutic Sys., 809 F. Supp. 1383, 1392-93 (N.D. Cal. 1992) (“The Court, reading the stack of declarations from Irell attorneys, all proclaiming their ignorance …, is reminded of the words of Hamlet’s mother: ‘The lady doth protest too much, methinks.’”). Quinn prepared and received attorney work product, and engaged in multiple attorney-client communications, going to the heart of the legality of the services at issue. Not even the Quinn Report’s primary author claims irrelevance.

And, just to drive the point home:

The Court, it says, should use its “equitable” powers to forgive its past transgression and sanction its continuing ethical violation. But California courts do “not engage in a ‘balancing of equities’ between the former and current clients. The rights and interests of the former client will prevail.”

There’s a lot more in the opinion, but Judge Alsup is not at all impressed by the arguments Quinn lawyers are making here. I mean:

In any event, Quinn’s representation was not narrowly circumscribed. It was not asked to opine on some arcane or peripheral issue of, say, tax law or even copyright law. Bright Data retained Quinn because it was lead counsel in hiQ and familiar with Bright Data’s scraping technology from discovery in that case. Avisar ¶ 2. Quinn was given broad remit to advise on all of Bright Data’s defenses and strategies in the Meta cases. That is not “peripheral;” it is core.

Nor is there merit to X’s argument that Quinn’s advice was only tangentially relevant to this case. Quinn concedes it provided advice with an eye towards future suits by “other” website operators. Skibitsky ¶ 11. To minimize this fact, Ms. Skibitsky observes that Bright Data only quoted from the Quinn Report’s Executive Summary. Id. at ¶ 18. But the very purpose of an Executive Summary is to highlight the most important points in the Report. And this one was even in bold italics. One does not do that for unimportant points. Nor does Ms. Skibitsky deny that Quinn extensively discussed these issues with Bright Data’s Board. At most, she states that she can’t recall which “specific website operators” were discussed. Id. But that does not change the undisputed fact that she and her colleagues discussed Bright Data’s strategy for defending the identical services from identical claims by other website operators.

And thus, sorry Elon, Quinn needs to sit this one out.

Filed Under: both sides, conflicts of interest, data scraping, ethics, legal ethics, william alsup
Companies: bright data, quinn emanuel, twitter, x

from the another-one-down-the-drain dept

Is there any law that Elon Musk actually understands?

The latest is that he’s lost yet another lawsuit, this time (in part) for not understanding copyright law.

There have been a variety of lawsuits regarding data scraping over the past decade, and we’ve long argued that such scraping should be allowed under the law (though sites are free to take technical measures to try to block them). Some of these issues are at stake in the recent Section 230 lawsuit that Ethan Zuckerman filed against Meta. That one is more about middleware/API access.

But pure “scraping” has come up in a number of cases, most notably the LinkedIn / HiQ case, where the 9th Circuit has said that scraping of public information is not a violation of the CFAA, as it was not “unauthorized access.” But the follow-up to that case was that the court still blocked HiQ from scraping LinkedIn, in part because of LinkedIn’s user agreement.

This has created a near total mess, where it is not at all clear if scraping public data on the internet is actually allowed.

This has only become more important in the last few years with the rise of generative AI and the need to get access to as much data as possible to train on.

Internet companies have been pushing to argue that their terms of service can block all kinds of scraping, perhaps relying on the eventual injunction blocking HiQ. Both Meta and ExTwitter sued a scraping company, Bright Data, arguing that its scraping violated their terms of service.

In January, Meta’s case against Bright Data was dismissed at the summary judgment stage. The judge in that case, Edward Chen, found that Meta’s terms of service clearly do not prohibit logged-off scraping of public data.

Now, ExTwitter’s lawsuit against the same company has reached a similar conclusion.

This time, it’s Judge William Alsup, who has dismissed the case for failure to state a claim. Alsup’s decision is a bit more thorough. It highlights that there are two separate issues here: did it violate ExTwitter’s terms of service to access its systems for scraping, and then, separately, to scrape and sell the data.

On the access side, the judge is not convinced by any of the arguments. It’s not trespass to chattels, because that requires some sort of injury.

Critically, the instant complaint alleges no such impairment or deprivation. X Corp. parrots elements, reciting that Bright Data’s “acts have caused injury to X Corp. and . . . will cause damage in the form of impaired condition, quality, and value of its servers, technology infrastructure, services, and reputation” (Amd. Compl. ¶ 102). Its lone deviation from that parroting — a conclusory statement that Bright Data’s “acts have diminished the server capacity that X Corp. can devote to its legitimate users” — fails to move the needle (Amd. Compl. ¶ 98). To say nothing of the fact that, as alleged, Bright Data and its customers are legitimate X users (subject to the Terms), the scraping tools and services they use are reliant on X Corp.’s servers functioning exactly as intended.

It’s not fraud under California law, because there’s no misrepresentation:

Starting with the argument that Bright Data’s technology and tools misrepresented requests, remember X Corp. does not allege that Bright Data or its customers have used their own registered accounts, or any other registered accounts, to scrape data from X, i.e., to access X by sending requests to X Corp.’s servers (for extracting and copying data). Meanwhile, X Corp. acknowledges that one does not need a registered account to access X and send such requests (see Amd. Compl. ¶ 22). X Corp. also acknowledges that X users with registered accounts can access X and send such requests without logging in to their registered accounts

And it’s not tortious interference with a contract, because, again, there’s no damage:

Among the elements of a tortious-interference claim is resulting damage. Pac. Gas & Elec., 791 P.2d at 590. The only damage that X Corp. plausibly pleaded in the instant complaint is that resulting from scraping and selling of data and, by extension, inducing scraping. X Corp. has not alleged any damage resulting from automated access to systems and, by extension, inducing automated access. As explained above, X Corp. has pleaded no impairment or deprivation of X Corp. servers resulting from sending requests to those servers. And, thin allusions to server capacity that could be devoted to “legitimate users” and reputational harm — not redressable under trespass to chattels as a matter of law — are simply too conclusory to be redressable at all. X Corp. will be allowed to seek leave to amend to allege damage (if any) resulting from automated access, as set out at the end of this order. But the instant complaint has failed to state a claim for tortious interference based on such access.

As for the scraping and selling of data, well, there’s no breach there either. And here we get into the copyright portion of the discussion. The question is who has the rights over this particular data. ExTwitter is claiming, somehow, that it has the right to stop scrapers because it has some rights over the data. But, the content is from users. Not ExTwitter. And that’s an issue.

Judge Alsup notes that ExTwitter’s terms give it a license to the content users post, but that’s a copyright license. Not a license to then do other stuff, such as suing others for copying it.

Note the rights X Corp. acquires from X users under the non-exclusive license closely track the exclusive rights of copyright owners under the Copyright Act. The license gives X Corp. rights to reproduce and copy, to adapt and modify, and to distribute and display (Terms 3–4). Section 106 of the Act gives “the owner of copyright . . . the exclusive rights to do and to authorize any of the following”: “to reproduce . . . in copies,” “to prepare derivative works,” “to distribute copies . . . to the public by sale,” and “to display . . . publicly.” 17 U.S.C. § 106. But X Corp. disclaims ownership of X users’ content and does not acquire a right to exclude others from reproducing, adapting, distributing, and displaying it under the non-exclusive license

Alsup notes that ExTwitter could, in theory, acquire the copyright on all content published on the platform instead of licensing it. However, he claims that it probably doesn’t do this because it could impact the company’s Section 230 immunities:

One might ask why X Corp. does not just acquire ownership of X users’ content or grant itself an exclusive license under the Terms. That would jeopardize X Corp.’s safe harbors from civil liability for publishing third-party content. Under Section 230(c)(1) of the Communications Decency Act, social media companies are generally immune from claims based on the publication of information “provided by another information content provider.” 47 U.S.C. § 230(c)(1). Meanwhile, under Section 512(a) of the Digital Millenium Copyright Act (“DMCA”), social media companies can avoid liability for copyright infringement when they “act only as ‘conduits’ for the transmission of information.” Columbia Pictures Indus., Inc. v. Fung, 710 F.3d 1020, 1041 (9th Cir. 2013); 17 U.S.C. § 512(a). X Corp. wants it both ways: to keep its safe harbors yet exercise a copyright owner’s right to exclude, wresting fees from those who wish to extract and copy X users’ content.

I have to admit, I’m not sure that a copyright assignment would change the Section 230 analysis… but perhaps? Anyway, it’s a weird hypothetical to raise in this scenario.

The larger point is just that ExTwitter has no right to stop others from copying this data. That’s not part of the rights the company has over the content on the site put there by third-party users.

The upshot is that, invoking state contract and tort law, X Corp. would entrench its own private copyright system that rivals, even conflicts with, the actual copyright system enacted by Congress. X Corp. would yank into its private domain and hold for sale information open to all, exercising a copyright owner’s right to exclude where it has no such right. We are not concerned here with an arm’s length contract between two sophisticated parties in which one or the other adjusts their rights and privileges under federal copyright law. We are instead concerned with a massive regime of adhesive terms imposed by X Corp. that stands to fundamentally alter the rights and privileges of the world at large (or at least hundreds of millions of alleged X users). For the reasons that follow, this order holds that X Corp.’s state-law claims against Bright Data based on scraping and selling of data are preempted by the Copyright Act

And thus, the claims here also fail.

Arguably, this complaint was less silly than some others (and, yes, Meta made a similar — and similarly failed — complaint). The mess of the HiQ decisions means that the issue of data scraping is still kind of a big unknown under the law. Eventually, the Supreme Court may need to weigh in on scraping, and that’s going to be yet another scary Supreme Court case…

Filed Under: cfaa, contract, data scraping, license, terms of service, william alsup
Companies: bright data, meta, twitter, x

Meta Sues Firm For Data Scraping; Claims That Signing Up For New Accounts After Being Banned Is Equivalent Of Hacking

from the why-is-this-a-problem? dept

For years we’ve talked about the infamous Facebook lawsuit against Power.com. As you may recall, this was a key CFAA case against a site, Power.com, that was trying to create a social media aggregator dashboard — in which you could log in through a single interface, and access content from and post to a variety of different social media platforms. Facebook alleged that this was a form of hacking — claiming it was “unauthorized access” to Facebook. This was even though there was no actual unauthorized access. Individual users gave Power their login credentials, so everything was completely authorized. After years of winding through the courts, unfortunately, it was decided that this was a violation of the CFAA, mainly because Facebook sent a cease & desist letter, and somehow going against that now made it “unauthorized.” In my mind, this is one of the biggest reasons why Facebook has much less competition today than it otherwise might — because it used the CFAA and cases against Power.com to create a “you can check in, but you can’t check out” kind of data arrangement. Things like Power.com were an empowering system that might have made people much less reliant on Facebook — but it was killed.

In an age now where people are increasingly talking about the importance of data portability and interoperability, something like Power.com would be a useful tool.

So, it’s interesting (and a little disturbing) to see that Facebook’s new corporate identity, Meta, has now sued another company for data scraping. It is notable that in this case, the defendant, Social Data Trading Ltd., is a lot less sympathetic a character than Power.com was. And — more importantly — Facebook is not using the CFAA this time (other cases have suggested that what Facebook got away with in the Power case it would no longer be able to get away with under that law). However, it is trying to use California’s state law equivalent of the CFAA. And no matter how you look at it, it’s still at least a little worrisome that Facebook (ok, whatever, Meta) believes it has a legal right to stop scraping of otherwise public data.

So first, Social Data Trading is not sympathetic. It appears to be a sketchy service in its own right, scraping data on social media users to sell “in-depth insights into the demographics and psychographics of influencers and their audiences.” Meta put in place some technical blocks to try to stop the company from scraping (which seems like fair game), but SDT would then just register new domains and continue scraping. Facebook had apparently tried to stop a predecessor company to Social Data Trading called “Deep.Social,” though the complaint seems to imply that SDT is just a reworking of Deep.Social.

The more difficult issue here is that part of the way that SDT did its scraping was by creating fake accounts on Facebook and Instagram, and then using those fake accounts to scrape the data. And that does bring things into a legally more complex area, but it also gives Meta a route to go after these guys without using the CFAA.

At issue is that when you create one of those accounts… you agree to the terms of service, and those terms say you can’t use the site for “collecting information in an automated way.” Thus, the core argument here is that it’s a breach of contract case, and that the SDT folks agreed to the terms and then broke them by using their fake accounts to scrape.

Since January 2019, Defendant created and used multiple Instagram accounts and agreed to Instagram’s Terms. Defendant agreed to Instagram’s Terms no later than January 30, 2019.

In addition, since September 2020, Defendant has used thousands of Instagram accounts to scrape Instagram.

Defendant breached the Terms by using unauthorized automated means to access Instagram and collect data from Meta computers without permission, including after Meta revoked Defendant’s access to its platform.

Of course, it seems to me that if this is a breach, the remedy should simply be removal of service, not anything more. But Meta claims damages “in excess of $75,000” (the minimum needed to get into federal court).

The second claim in the lawsuit seems… a lot sketchier. It claims violations of California Penal Code Section 502, which is (more or less) California’s equivalent to the CFAA. While, apparently, Meta’s lawyers know enough to not go to the well again on the federal CFAA, the use of the state equivalent is still quite concerning.

Beginning no later than June 2021, Defendant, without permission, knowingly accessed and otherwise used Meta’s computers, computer system, and computer network in order to (a) devise or execute any scheme or artifice to defraud and deceive, and (b) to wrongfully obtain money, property, or data, in violation of California Penal Code § 502(c)(1).

Beginning no later than June 2021, Defendant, without permission, knowingly accessed and took, copied, and made use of data from Meta’s computers, computer system, and computer network in violation of California Penal Code § 502(c)(2).

Beginning no later than June 2021, Defendant knowingly and without permission used or caused to be used Meta’s computer services in violation of California Penal Code § 502(c)(3).

Since June 2021, Defendant knowingly and without permission accessed and caused to be accessed Meta’s computers, computer systems, and/or computer networks in violation of California Penal Code § 502(c)(7). Defendant accessed Meta’s computer network after Meta disabled its Instagram accounts, blocked its domain, and sent correspondence to Defendant revoking its access.

Because Meta suffered damages and losses as a result of Defendant’s actions and continues to suffer damages and losses as a result of Defendant’s actions, Meta is entitled to compensatory damages in an amount to be determined at trial, attorney fees, any other amount of damages proven at trial, and injunctive relief under California Penal Code § 502(e)(1) and (2).

Because Defendant willfully violated California Penal Code § 502, and there is clear and convincing evidence that Defendant committed “fraud” as defined by section 3294 of the Civil Code, Meta is entitled to punitive and exemplary damages under California Penal Code § 502(e)(4).

All of this should be concerning to folks. It basically says that if you get kicked off a site and then create a new account… you could face serious consequences (and while this is a civil suit, Section 502 violations can lead to criminal liability as well). This should be cause for alarm. Yes, even if the defendant is a sketchy data operation, and even if Meta really didn’t want them scraping their site, to turn around and use what is, ostensibly, a computer “hacking” law against them for setting up new accounts seems incredibly dangerous and could lead to very bad consequences.

Finally there’s an “unjust enrichment” claim which also seems a bit silly — especially for a company like Facebook, which makes so much of its money by collecting data in surreptitious ways, to argue that another firm doing that back to Facebook is somehow “unjustly” enriching itself is pretty rich.

Still, it’s claim two that should raise some eyebrows, and I wish that Facebook recognized what a dangerous game it’s playing in trying to argue that signing up for a new account after you’ve been banned somehow violates an anti-hacking law.

Filed Under: analytics, cfaa, data, data scraping, hacking, privacy, public information
Companies: facebook, meta, social data trading