A Comprehensive Legal Guide to Web Scraping in the US


Date of Last Update: July 20, 2023

When I first published this guide, in the summer of 2020, I wrote that the recent trend had been toward greater permissiveness with web scraping. 

That sentence isn’t as true today as it was when I first wrote it. Recent policy trends have wavered. The few jurisdictions where courts began making broad policy statements in favor of permissiveness toward web scraping in 2017 and 2019 have backtracked. Savvy plaintiffs’ lawyers have started to eschew the more sensitive and uncertain legal battleground of the CFAA and move toward more predictable state-law claims such as breach of contract, where their track record of success (when plaintiffs’ attorneys effectively lay the foundation for litigation) is essentially unbroken.

Some like to describe web scraping as a “gray area of the law.” I don’t think that’s true. The laws around web scraping are as black and white as those in any other legal domain. It’s just that few people know how to apply these laws, and there’s a total disconnect between the law related to web scraping and the social norms for how it is enforced.

The goal of this article is to provide a lay introduction to web scraping legal issues in a way that is accessible to lawyers and non-lawyers alike.

  1. Introduction

There are a few websites online that purport to answer the question of “whether web scraping is legal.” And way too many of those websites, with unwavering confidence and a complete absence of caution, provide clear and concise answers to that question that are laughably and dangerously false.

One such website claims to have a “three-part test” to determine whether web scraping is legal.

But a web scraper could follow that test and still violate a dozen state and federal laws. The blog post would be certifiable legal malpractice—except it wasn’t written by a lawyer in the first place.

With the increasing importance of data collection from privately owned websites, web scraping has grown from a niche enterprise to a bona fide industry in the last decade.

As a lawyer (and a Python programmer) who has worked with more than 50 clients in the web-scraping industry, from Y Combinator startups to Fortune 50 companies, I figured the internet was long overdue for a practitioner’s guide to web-scraping law that was grounded in the law. With that, I took the time to read every web-scraping case and scholarly journal on the subject published in the last ten years. I constantly monitor new and developing case law on web-scraping legal issues, and I review every new case published that touches on web scraping. I have also litigated this issue on behalf of both companies that engage in web scraping and companies looking to stop web scraping.

What you see here is the end product of that research and experience.

In recent months, it has occurred to me (and some of my clients) that this comprehensive legal guide to web scraping is perfect fodder for training material for chatbots, machine-learning tools, and large language models. And since this article is directed at people who often have expertise in creating such tools, I think it is prudent to exercise caution in terms of how and when we share it with the public.

Not only that, but over the last year, almost 40% of the people who have requested it have been other attorneys. When I first wrote and published this article, that happened only a handful of times.

Also, because of how fast-moving this area of law is, this article requires constant work to keep it up to date.

When we first published this guide, we had only a few clients in this space. We were looking to attract attention and share our knowledge of this field with the general public. In the three years since we published this guide, this area of law has become our primary area of practice.

In that sense, this article served its purpose. And while I still constantly monitor this field and write about it here and elsewhere, it no longer makes sense to make this particular guide available to the general public. It increasingly feels like we’re providing a free service to potential competitors and a substitute product for our main service.

If you’re not an attorney and just someone looking for information, I apologize.

As a sort of consolation prize, if you’re looking for general background on this topic, here are a couple of links you might find helpful, though they are not as comprehensive as this guide.

Here is an article summarizing the conclusion of the most important case in the history of web scraping, hiQ Labs, Inc. v. LinkedIn Corp.

Here is an article poking fun at common misconceptions in this field of law.

If you ever need specific legal advice on this topic, we can only provide that in the context of formal legal representation. If you need that service, please reach out to us through our contact form below. Otherwise, I believe those two links above are the best primers you’re going to find online.

We appreciate your understanding.

MGL Website Contact Form
