Is Web Scraping Legal? Laws & Best Practices

Legal regulations have changed in the web scraping market. While litigation once focused on unauthorized access, new lawsuits related to AI training and technical workarounds are shaping acceptable practices.

Disclaimer: Our work is for informational purposes and not legal advice; please get professional legal advice for specific guidance.

Is web scraping legal?

Web scraping is legal if you scrape publicly available data on the web. However, the legality of web scraping depends on how, what, and why you’re scraping.

In 2026, the EU Commission’s guidelines clarified the rules for scraping data for AI training in Europe. Developers are now required to honor machine-readable opt-outs. 1
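
One way a crawler might look for a machine-readable opt-out is sketched below. It assumes the site signals its reservation through a `tdm-reservation` HTTP response header with the value `1`, as proposed in the TDM Reservation Protocol; this is only one of several possible opt-out conventions, and the function name `has_tdm_opt_out` is hypothetical.

```python
from typing import Mapping


def has_tdm_opt_out(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers carry a TDM reservation flag of 1.

    Assumption: the site uses the `tdm-reservation` header convention.
    Other opt-out signals (robots.txt rules, meta tags) are not checked here.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


# Example with hand-written headers (no network access needed).
print(has_tdm_opt_out({"TDM-Reservation": "1"}))       # opt-out declared
print(has_tdm_opt_out({"Content-Type": "text/html"}))  # no signal present
```

A missing header is not proof that no opt-out exists, so a check like this would typically be combined with others rather than used on its own.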

Publishing summaries of training data increases the risk of lawsuits over undisclosed data collection. Companies also need to keep a Traceability Log that records whether each scraped URL was checked for copyright and personal data issues.
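
As a rough illustration of what such a log entry could contain, the sketch below records one line of JSON per scraped URL. The field names and the `TraceabilityRecord` class are assumptions made for this example, not a format prescribed by any regulation.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class TraceabilityRecord:
    """One log entry per scraped URL (hypothetical field names)."""
    url: str
    copyright_checked: bool
    personal_data_checked: bool
    checked_at: str  # ISO 8601 timestamp in UTC


def make_record(url: str, copyright_checked: bool,
                personal_data_checked: bool) -> TraceabilityRecord:
    """Build a record stamped with the current UTC time."""
    return TraceabilityRecord(
        url=url,
        copyright_checked=copyright_checked,
        personal_data_checked=personal_data_checked,
        checked_at=datetime.now(timezone.utc).isoformat(),
    )


def to_json_line(record: TraceabilityRecord) -> str:
    """Serialize a record as a single JSON line, ready to append to a log file."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record("https://example.com/page", True, False)
print(to_json_line(record))
```

An append-only, one-line-per-URL format keeps the log easy to audit later, since each entry can be read and verified independently.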

Web scraping can be legal when you:

Prioritize logged-out scraping: Scrape publicly available data from webpages accessible without a login, subscription, or payment.
Avoid technical circumvention: Respect the website’s terms of service, robots.txt file, and copyright laws.
Align with commercial use policies: Ensure your scraping intent (e.g., search indexing vs. AI model training) aligns with the site’s commercial use policies. Cases like Reddit v. Anthropic are currently defining new boundaries for “Fair Use” when data is explicitly scraped for AI development.
Comply with global privacy laws: Don’t collect personal or sensitive data, such as names or contact information, in a manner that violates privacy laws, including the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
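
The robots.txt part of the checklist above can be automated. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `my-bot` user agent, and the example URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the target site before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "my-bot", "https://example.com/public/page"))
print(is_allowed(ROBOTS_TXT, "my-bot", "https://example.com/private/page"))
```

Passing a robots.txt check does not settle the terms-of-service or copyright questions, which still need a separate review.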

For more on ethical data collection, check out our ethical & compliant web data benchmark.

Latest web scraping legal updates

Although web scraping can be legal, companies generally do not want to be scraped. If a platform can show that being scraped by a bot damages its infrastructure or operations, a court may find that activity illegal.

Below, we have compiled the most significant web scraping lawsuits, most of them from the U.S., including cases in which the court sided with the scraped website.

Reddit vs. Perplexity AI & scraping services

Court: U.S. District Court for the Southern District of New York
Timeline: October 2025 – Present (Active Case)

Reddit sued the AI search engine Perplexity AI and three major scraping/proxy providers (SerpApi, Oxylabs, AWMProxy) for industrial-scale data collection and bypassing technical barriers. 2

Conflict:
Reddit alleges that the defendants engaged in a “bank robbery-style” scheme to steal copyrighted content. Instead of entering into licensing agreements (like OpenAI and Google), Perplexity used specialized scraping tools to bypass Reddit’s defenses.

Legal arguments:

Indirect scraping via Google: Defendants bypassed Reddit’s own blocks by scraping Reddit’s content directly from Google Search Results (SERPs).
DMCA violations: Unlike previous “public data” cases (such as hiQ), Reddit is invoking Section 1201 of the Digital Millennium Copyright Act (DMCA). It argues that the defendants didn’t merely “access” data but purposefully bypassed “technological measures” (rate limits, CAPTCHAs, and SearchGuard).
Refusal to license: Reddit highlights that while other AI giants pay for data access, Perplexity increased its scraping volume 40-fold after receiving a cease-and-desist letter, choosing “circumvention over cooperation.”

Current status:
As of late 2025, the case is ongoing, and no final ruling has been issued.

Reddit vs. Anthropic

Court: Superior Court of California in San Francisco
Timeline: June 2025 – Present (Active Litigation)

Reddit sued the AI startup Anthropic, accusing it of unlawfully using data from Reddit’s 100 million daily users to train Anthropic’s AI systems. 3

Unlike Google and OpenAI, who have paid licensing deals with Reddit, Anthropic allegedly declined to enter into an agreement. Reddit’s legal team argues that without a formal agreement, there are no guardrails to ensure user privacy protections.

Current status: As of late 2025, there has been no final court ruling. The case is currently in the pre-trial discovery phase. Anthropic has moved to have parts of the case dismissed, arguing that factual data is not copyrightable.

LinkedIn vs hiQ Labs Case

Court: U.S. District Court / Ninth Circuit Court of Appeals
Timeline: 2017–2022

LinkedIn sued hiQ Labs, a data analytics company, for scraping publicly available profiles to conduct a professional skill analysis.4 Several courts, including the Supreme Court, reviewed the case:

The court initially sided with hiQ, ruling that scraping public data does not violate the Computer Fraud and Abuse Act (CFAA).5
In 2022, the Ninth Circuit reaffirmed this, stating that accessing publicly available data without authorization is not “unauthorized access” under CFAA.

The court ruled that LinkedIn’s actions to block hiQ were lawful. Despite CFAA considerations, breaching a website’s terms of service can result in legal consequences. hiQ’s violations of LinkedIn’s user agreement played a significant role in the final judgment.

Meta vs Bright Data

Court: U.S. District Court for the Northern District of California
Timeline: 2023–2024

Case Type: Civil lawsuit involving breach of contract and unauthorized data scraping

In January 2023, Meta initiated a lawsuit against Bright Data, alleging that it had illegally extracted data from Meta’s Facebook and Instagram platforms. Interestingly, Bright Data contested Meta’s claims about its data scraping rights, leading both parties to court.

The court ruled in favor of Bright Data, finding insufficient evidence to show that Bright Data had scraped nonpublic data or accessed data while logged into user accounts. In February 2024, Meta decided to drop the remaining claims against Bright Data.6

Does Meta (Facebook/Instagram) prohibit all automated data collection?

If you’ve read the Instagram terms of use, you’ve likely seen the clause stating that ‘scraping by automated means is prohibited.’

However, the legal reality is more complex. In the landmark Meta v. Bright Data (2024) case, the court ruled that if you are scraping public data while logged out, Meta’s terms do not necessarily apply because you never signed a contract by logging in.

Many websites, like Facebook, include an “automated data collection and scraping prohibited” warning in their terms. But as seen in recent web scraping legal updates, courts are increasingly distinguishing between data behind a login wall and data available on the open web.

X Corp. (formerly Twitter) vs Bright Data

Court: U.S. District Court for the Northern District of California

Timeline: 2023–ongoing

Case Type: Unauthorized data access under computer fraud statutes, intellectual property violations

In July 2023, X Corp. filed a lawsuit against Bright Data, alleging that Bright Data violated its terms of service by scraping and selling vast amounts of data from the X platform. 7 The legal action in California was about Bright Data’s access to public data on Twitter.

The case was dismissed, and the judge ruled that X failed to plausibly allege that Bright Data had violated its user agreement. The court held that terms of service could not prevent data scraping since X Corp was not the owner of the content and therefore could not enforce its copyright.

Owning user content would invalidate X Corp’s safe harbor protection, which enables social media companies to distance themselves from copyright infringement and other crimes committed by their users. Therefore, courts again ruled in favor of a party that collected public data from a social network.

eBay vs Bidder’s Edge Case

Court: United States District Court for the Northern District of California

Timeline: 1999–2000

Case type: Civil lawsuit for trespass to chattels, in which eBay accused Bidder’s Edge of unlawfully scraping its site using automated data collection bots.

Bidder’s Edge (BE), an online price comparison website, used web scraping tools to aggregate auction listings from various platforms, including eBay, without permission. 8 eBay claimed that BE’s automated bots caused unauthorized use of its systems.

The court issued an order preventing Bidder’s Edge from scraping eBay content again. eBay’s winning argument was that Bidder’s Edge was overloading its systems, and that others following Bidder’s Edge’s example could cause further harm to those systems.

Facebook vs Power Ventures Case

Court: U.S. District Court for the Northern District of California; later appealed to the U.S. Court of Appeals for the Ninth Circuit

Timeline: 2008–2017

Case Type: Civil lawsuit under the CFAA and California’s anti-hacking law, with Facebook alleging unauthorized access to its platform.

In 2008, Facebook sued Power Ventures for scraping content uploaded by Facebook’s users. The case set an example of web scraping being evaluated from an intellectual property standpoint. The court sided with Facebook and ordered Power Ventures to pay a financial penalty.9

Latest regulations on web scraping by country

United States

Legal Status: The web scraping of publicly available data is considered legal.

There are no federal laws against web scraping in the United States as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. One narrow exception is the Better Online Ticket Sales (BOTS) Act of 2016, which targets the use of bots to buy large numbers of tickets at once and is intended to prevent black markets.10

European Union and the UK

Legal Status: In the EU and UK, web scraping of publicly available, non-personal, and non-copyrighted content is legal, but scraping personal data without a lawful basis is prohibited under GDPR.

The EU’s Directive on Copyright in the Digital Single Market aims to harmonize copyright rules across EU countries. Under the text and data mining exceptions in Articles 3 and 4 of this directive, “reproduction of publicly available content” is not illegal.11 12

This directive approaches the topic from an intellectual property perspective; any web scraping that processes personal data without a lawful basis would still be illegal under the GDPR. Otherwise, the situation in the EU and the UK is similar to that in the US.

Dos and don’ts of legal and ethical web scraping

From a legal standpoint, one question businesses should ask themselves is whether their scraping harms the scraped website. The website would have grounds to file a lawsuit against the scraper if:

The scraping is so intense that it interrupts the services of the scraped website.
The scraped data is used to duplicate the activity or service of that website, even where no specific regulation exists.

From an ethical standpoint, given that web scraping has many use cases and professional providers in the market, there is no shame in using it for business purposes. There are technical web scraping best practices that will ease the traffic load on the scraped website, such as:

Using the website’s APIs rather than web scraping, when available.
Integrating web scrapers with proxy servers.
Using headless browsers.

As long as you find a trusted web scraper to work with or make sure your technical resources consider these, you can defend your web scraping as ethical for your business purposes.

Dos:

Scrape the data you need by defining the exact business case and customizing your web crawler technology accordingly. This will minimize your risk of exhausting the scraped website with unwanted traffic.
Always read the terms of use of the scraped website. In addition to commercial terms of use, websites also have a robots.txt file that specifies permissions for the website’s content. Your web crawling solution or technical experts should help you comply with these permissions.
Be transparent about your web scraping and be ready to explain your scraping process to assure others that your approach is legal and ethical.
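
To keep traffic to the scraped website low, a crawler can enforce a minimum delay between requests. The sketch below is one simple way to do that; the `Throttle` class and the two-second interval are illustrative assumptions, and an appropriate delay depends on the target site.

```python
import time
from typing import Optional


class Throttle:
    """Enforce a minimum delay between consecutive requests to one site."""

    def __init__(self, min_interval_seconds: float) -> None:
        self.min_interval = min_interval_seconds
        # Monotonic timestamp of the previous request, or None before the first one.
        self._last_request: Optional[float] = None

    def seconds_to_wait(self, now: float) -> float:
        """Return how long to pause before the next request may be sent."""
        if self._last_request is None:
            return 0.0
        elapsed = now - self._last_request
        return max(0.0, self.min_interval - elapsed)

    def wait(self) -> None:
        """Sleep if needed, then mark the current time as the latest request."""
        delay = self.seconds_to_wait(time.monotonic())
        if delay > 0:
            time.sleep(delay)
        self._last_request = time.monotonic()


throttle = Throttle(min_interval_seconds=2.0)
throttle.wait()  # the first call returns immediately
# ... send the request here, then call throttle.wait() before the next one
```

Calling `wait()` before every request caps the crawler at one request per interval, regardless of how fast the rest of the pipeline runs.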

Don’ts:

Do not overload the scraped website with requests that are too frequent or too extensive. Doing so also increases the likelihood that the scraped website will block your crawler.
Do not collect personally identifiable information. If robots.txt allows you to collect it, ensure that you mask the data to minimize exposure during processing.
Do not expose the scraped data to the public. Make sure that it is stored securely, like your own company data. You never know what purposes it may be used for if it is leaked.
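
Masking can be as simple as replacing identifiers before the data is stored. The sketch below handles email addresses only, swapping each one for a salted hash token; the regular expression, the `mask_emails` name, and the placeholder salt are assumptions for illustration, and other kinds of personal data would need their own handling.

```python
import hashlib
import re

# Deliberately simple pattern; it will not match every valid email address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, salt: str = "change-me") -> str:
    """Replace each email address in text with a short salted hash token."""

    def _replace(match: re.Match) -> str:
        # Lowercase first so the same address always maps to the same token.
        value = (salt + match.group(0).lower()).encode("utf-8")
        digest = hashlib.sha256(value).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_PATTERN.sub(_replace, text)


print(mask_emails("Contact jane.doe@example.com for details"))
```

Because the same address always maps to the same token, records can still be grouped after masking. A salted hash is pseudonymization rather than full anonymization, so the masked data should still be stored securely.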

Organizations for ethical web scraping

Leading web data infrastructure companies have formed associations to align their industry and stakeholders on the ethical use of web scraping. These associations are:

Alliance for Responsible Data Collection, which includes Bright Data and Common Crawl among other stakeholders.
Ethical Web Data Collection Initiative (EWDCI), which includes Oxylabs, NetNut, ProxyEmpire, Zyte, among others.

Is scraping data for AI training legal?

The legal status of data scraping depends on the type of data, its location, and the methods used to access it. Many relevant laws are being interpreted and established by courts.

For example, in the United States, courts have held that scraping publicly accessible data without requiring a login or bypassing security measures does not violate the Computer Fraud and Abuse Act (CFAA). Cases such as hiQ v. LinkedIn, Meta v. Bright Data, and Van Buren v. United States confirm that scraping public data does not breach the CFAA.

However, violating a website’s terms of service or scraping data behind login walls may still create liability. The method of access is critical, as logging in or bypassing technical barriers significantly changes the legal analysis.

FAQs

If a website’s terms of service (ToS) explicitly prohibit scraping, accessing, or collecting data from that site through automated means, doing so may constitute a violation of those terms.

For instance, in the United States, unauthorized access to a computer system can be a federal offense under the Computer Fraud and Abuse Act (CFAA). You can contact the site owner to request permission or use the official APIs to access data.

Not by itself. Courts treat terms-of-service violations as a civil contract matter, not a criminal offense. However, a violation can support breach-of-contract claims and strengthen claims under other laws, particularly after explicit notice, such as a cease-and-desist notice.

Cite this research

Gulbahar Karatas (2026) - "Is Web Scraping Legal? Laws & Best Practices". Published online at AIMultiple.com. Retrieved June 2, 2026, from: https://aimultiple.com/is-web-scraping-legal [Online Resource]

Gulbahar Karatas

Industry Analyst

Gülbahar is an AIMultiple industry analyst focused on web data collection, applications of web data and application security.
