Is Web Scraping Legal? Laws & Best Practices

The legal landscape for web scraping has shifted. While litigation once focused on unauthorized access, new lawsuits over AI training and technical workarounds are now shaping what counts as acceptable practice.

Disclaimer: Our work is for informational purposes and not legal advice; please get professional legal advice for specific guidance.

Web scraping is generally legal when you scrape publicly available data. However, its legality depends on how, what, and why you’re scraping.

In 2026, the EU Commission’s guidelines clarified the rules for scraping data for AI training in Europe. Developers are now required to honor machine-readable opt-outs. 1
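As a rough illustration of honoring a machine-readable opt-out, the sketch below checks a site's robots.txt rules before a page is fetched. It assumes robots.txt is one of the opt-out signals in question (the guidelines may recognize other formats), and the crawler name ExampleTrainingBot and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, used only for illustration.
USER_AGENT = "ExampleTrainingBot"


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text so the check does not need network access here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample robots.txt (an assumption): opts out the hypothetical bot, allows others.
SAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/article"))
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/article", "OtherBot"))
```

In practice the robots.txt text would be downloaded from the target site first; a check like this is a technical courtesy and does not by itself establish legal compliance.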

Publishing summaries of training data increases the risk of lawsuits over undisclosed data collection. Companies also need to keep a Traceability Log that records whether each scraped URL was checked for copyright and personal data issues.
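A Traceability Log of this kind could be kept in many ways. The sketch below is one minimal possibility, assuming a CSV file with one row per scraped URL; the field names and the helper functions are hypothetical and not taken from any regulation or official template.

```python
import csv
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class TraceRecord:
    """One log row per scraped URL (field names are illustrative assumptions)."""
    url: str
    checked_at: str
    copyright_checked: bool
    personal_data_checked: bool
    notes: str = ""


def make_record(url, copyright_checked, personal_data_checked, notes=""):
    """Build a record stamped with the current UTC time."""
    return TraceRecord(
        url=url,
        checked_at=datetime.now(timezone.utc).isoformat(),
        copyright_checked=copyright_checked,
        personal_data_checked=personal_data_checked,
        notes=notes,
    )


def write_log(records, stream):
    """Write the records as CSV with a header row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(TraceRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def unchecked(records):
    """Return URLs that are missing at least one of the two checks."""
    return [
        r.url
        for r in records
        if not (r.copyright_checked and r.personal_data_checked)
    ]
```

A helper like unchecked makes it easy to hold back any URL that has not passed both checks before its content is used.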

Web scraping can be legal when you:

For more on ethical data collection, check out our ethical & compliant web data benchmark.

Although web scraping can be legal, companies generally do not want to be scraped. If a platform can show that scraping by a bot damages its infrastructure or operations, a court may find that activity illegal.

Below, we have compiled the most significant web scraping lawsuits, most of them from the U.S. Courts sided with the scraped website in some of these cases and with the scraper in others.

Reddit vs. Perplexity AI & scraping services

Court: U.S. District Court for the Southern District of New York
Timeline: October 2025 – Present (Active Case)

Reddit sued the AI search engine Perplexity AI and three major scraping/proxy providers (SerpApi, Oxylabs, AWMProxy) for industrial-scale data collection and bypassing technical barriers. 2

Conflict:
Reddit alleges that the defendants engaged in a “bank robbery-style” scheme to steal copyrighted content. Instead of entering into licensing agreements (like OpenAI and Google), Perplexity used specialized scraping tools to bypass Reddit’s defenses.

Legal arguments:

Current status:
As of late 2025, the case is ongoing, and no final ruling has been issued.

Reddit vs. Anthropic

Court: Superior Court of California in San Francisco
Timeline: Late 2025 – Present (Active Litigation)

Reddit sued the AI startup Anthropic, accusing it of unlawfully using data from its 100 million daily users to train its AI systems.3

Unlike Google and OpenAI, who have paid licensing deals with Reddit, Anthropic allegedly declined to enter into an agreement. Reddit’s legal team argues that without a formal agreement, there are no guardrails to ensure user privacy protections.

Current status: As of late 2025, there has been no final court ruling. The case is currently in the pre-trial discovery phase. Anthropic has moved to have parts of the case dismissed, arguing that factual data is not copyrightable.

LinkedIn vs hiQ Labs Case

Court: U.S. District Court / Ninth Circuit Court of Appeals
Timeline: 2017–2022

LinkedIn sued hiQ Labs, a data analytics company, for scraping publicly available profiles to conduct a professional skill analysis.4 Several courts, including the Supreme Court, reviewed the case:

The Ninth Circuit held that scraping publicly available profiles likely does not violate the CFAA. However, breaching a website’s terms of service can still result in legal consequences: in 2022, the district court found that hiQ had breached LinkedIn’s user agreement, and those violations played a significant role in the final judgment.

Meta vs Bright Data

Court: U.S. District Court for the Northern District of California
Timeline: 2023–2024

Case Type: Civil lawsuit involving breach of contract and unauthorized data scraping

In January 2023, Meta initiated a lawsuit against Bright Data, alleging that it had illegally extracted data from Meta’s Facebook and Instagram platforms. Interestingly, Bright Data contested Meta’s claims about its data scraping rights, leading both parties to court.

The court ruled in favor of Bright Data, finding insufficient evidence to show that Bright Data had scraped nonpublic data or accessed data while logged into user accounts. In February 2024, Meta decided to drop the remaining claims against Bright Data.6

Does Meta (Facebook/Instagram) prohibit all automated data collection?

If you’ve read the Instagram terms of use, you’ve likely seen the clause stating that ‘scraping by automated means is prohibited.’

However, the legal reality is more complex. In the landmark Meta v. Bright Data (2024) case, the court ruled that if you are scraping public data while logged out, Meta’s terms do not necessarily apply because you never signed a contract by logging in.

Many websites include a warning, similar to the one in Facebook’s terms, that automated data collection or scraping is prohibited. But as seen in recent web scraping legal updates, courts are increasingly distinguishing between data behind a login wall and data available on the open web.

X Corp. (formerly Twitter) vs Bright Data

Court: U.S. District Court for the Northern District of California

Timeline: 2023–ongoing

Case Type: Unauthorized data access under computer fraud statutes, intellectual property violations

In July 2023, X Corp. filed a lawsuit against Bright Data, alleging that Bright Data violated its terms of service by scraping and selling vast amounts of data from the X platform. 7 The legal action in California was about Bright Data’s access to public data on Twitter.

The case was dismissed, and the judge ruled that X failed to plausibly allege that Bright Data had violated its user agreement. The court held that terms of service could not prevent data scraping since X Corp was not the owner of the content and therefore could not enforce its copyright.

Owning user content would invalidate X Corp’s safe harbor protection, which enables social media companies to distance themselves from copyright infringement and other crimes committed by their users. Therefore, courts again ruled in favor of a party that collected public data from a social network.

eBay vs Bidder’s Edge Case

Court: United States District Court for the Northern District of California

Timeline: 1999–2000

Case type: Civil lawsuit for trespass to chattels, in which eBay accused Bidder’s Edge of unlawfully scraping its site using automated data collection bots.

Bidder’s Edge (BE), an online price comparison website, used web scraping tools to aggregate auction listings from various platforms, including eBay, without permission. 8 eBay claimed that BE’s automated bots caused unauthorized use of its systems.

The court issued an injunction preventing Bidder’s Edge from scraping eBay content. eBay’s winning argument was that Bidder’s Edge was burdening its systems, and that others following the example of Bidder’s Edge could cause further harm.

Facebook vs Power Ventures Case

Court: U.S. District Court for the Northern District of California; later appealed to the U.S. Court of Appeals for the Ninth Circuit

Timeline: 2008–2017

Case Type: Civil lawsuit under the CFAA and California’s anti-hacking law, with Facebook alleging unauthorized access to its platform.

In 2009, Facebook sued Power Ventures for scraping content that users had uploaded to Facebook. This case set an example of web scraping being evaluated from an intellectual property standpoint. The court sided with Facebook and ordered Power Ventures to pay a monetary penalty.9

Latest regulations on web scraping by country

United States

Legal Status: The web scraping of publicly available data is considered legal.

There are no federal laws against web scraping in the United States as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. One specific law, the Better Online Ticket Sales (BOTS) Act of 2016, targets the use of bots to buy large numbers of tickets at once, with the aim of preventing black markets.10

European Union and the UK

Legal Status: In the EU and UK, web scraping of publicly available, non-personal, and non-copyrighted content is legal, but scraping personal data without a lawful basis is prohibited under GDPR.

The EU’s Directive on Copyright in the Digital Single Market aims to harmonize copyright rules across all EU countries. According to Articles 3 and 4 of this directive, which cover text and data mining, “reproduction of publicly available content” is not illegal, although Article 4 lets rightsholders reserve their rights.11 12

This legislation approaches the topic from an intellectual property perspective and, needless to say, scraping personal data without a lawful basis would still be illegal under the GDPR. Apart from that, the situation in the EU and the UK is similar to that in the US.

From a legal standpoint, one question businesses should ask themselves is whether their scraping acts harm the scraped website. If the scraping activity:

The website would have grounds to file a lawsuit against the scraper.

From an ethical standpoint, given that web scraping has many use cases and professional providers in the market, there is no shame in using it for business purposes. There are technical web scraping best practices that will ease the traffic load on the scraped website, such as:

As long as you work with a trusted web scraping provider or make sure your technical team follows these practices, you can defend your web scraping as ethical for your business purposes.
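One common way to ease the traffic load on a scraped website is to space out requests to the same host. The sketch below assumes a fixed minimum delay per host; the HostThrottle class, its parameters, and the 2-second default are hypothetical choices for illustration, not a recommended standard.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock          # injectable so the class can be tested
        self._sleep = sleep
        self._last_request = {}      # host -> time of the last permitted request

    def wait(self, url):
        """Block until the host may be contacted again; return seconds waited."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record when this request is allowed to go out.
        self._last_request[host] = now + waited
        return waited
```

A scraper would call throttle.wait(url) immediately before each request, so bursts against a single site are smoothed out while requests to other hosts are not delayed.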

Dos:

Don’ts:

Organizations for ethical web scraping

Leading web data infrastructure companies have formed associations to align their industry and stakeholders on the ethical use of web scraping. These associations are:

The legal status of data scraping depends on the type of data, its location, and the methods used to access it. Many relevant laws are being interpreted and established by courts.

For example, in the United States, courts have held that scraping publicly accessible data, without logging in or bypassing security measures, does not violate the Computer Fraud and Abuse Act (CFAA). Cases such as hiQ v. LinkedIn, Meta v. Bright Data, and Van Buren v. United States support this reading of the CFAA.

However, violating a website’s terms of service or scraping data behind login walls may still create liability. The method of access is critical, as logging in or bypassing technical barriers significantly changes the legal analysis.

FAQs

If a website’s terms of service (ToS) explicitly prohibit scraping, accessing, or collecting data from that site through automated means, doing so may constitute a violation of those terms.

For instance, in the United States, unauthorized access to a computer system can be a federal offense under the Computer Fraud and Abuse Act (CFAA). You can contact the site owner to request permission or use the official APIs to access data.

Not by itself. Courts treat terms-of-service violations as a civil contract matter, not a criminal offense. However, a violation can support breach-of-contract claims and strengthen claims under other laws, particularly after explicit notice, such as a cease-and-desist notice.

Cite this research

Gulbahar Karatas (2026) - "Is Web Scraping Legal? Laws & Best Practices". Published online at AIMultiple.com. Retrieved June 2, 2026, from: https://aimultiple.com/is-web-scraping-legal [Online Resource]

Gulbahar Karatas

Industry Analyst

Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.
