State-law interpretations of “knowledge” for online terms of use


When it comes to web scraping and the law, there are some things that almost everyone knows to be true, and there are some things that almost everyone thinks they know to be true that are actually false.

One of the things that almost everyone knows to be true is that web scraping is prohibited in nearly every commercial website's online terms of use. Yet web scraping still happens everywhere, on a massive commercial scale. That raises an obvious question: how can so many people be scraping so much online content when almost every commercial website prohibits it?

From a legal perspective, the key question is, “Are these online terms of use enforceable?”

One thing people think they know about scraping, but that is actually false, is that scraping public data is allowed while scraping logged-in data is prohibited. That supposed truism is partly an oversimplification and partly just wrong.

The public-private distinction does matter in web-scraping litigation, and litigants will shout it from the rooftops when arguing about what types of scraping should be allowed. But in reality, whether data is “logged in” or “logged out” is legally outcome-determinative in only a small fraction of web-scraping cases.

This is particularly true of online terms of use. While courts across the country interpret online terms differently in the context of web scraping, the public vs. private distinction does not explicitly factor into the analysis in any jurisdiction (though it may implicitly factor into judges’ reasoning). When it comes to enforcing online terms of use, the two factors that matter are: 1) notice and 2) assent.

The two legal questions that determine whether an online agreement is enforceable are: 1) did the user have actual or “constructive” notice of the online terms, and 2) did the user agree to those terms?

On the question of notice, the main divide is between 1) courts that are skeptical of browsewrap agreements and will find reasons not to enforce them, and 2) courts that are more willing to impute knowledge from repeated use, cease-and-desist letters, or technical blocks.

(For more general legal background and a more complete analysis of online contracting issues, go here.)

If you’re looking for a skeptical court in the context of scraping, pretty much the only place you’ll find one is California, where courts often refuse to enforce browsewrap terms unless the user had reasonably conspicuous notice and unambiguously manifested assent. In the Ninth Circuit’s Nguyen v. Barnes & Noble (2014), a footer link to the terms, even one placed near the checkout button, was not enough to charge a user with constructive notice. The court emphasized conspicuousness and the need for some connection between the notice and the user’s act.

Berman v. Freedom Financial (9th Cir. 2022) reiterated that approach with a two-part test: 1) the website must provide reasonably conspicuous notice of the terms to which the user will be bound, and 2) the user must take some action, such as clicking a button or checking a box, that unambiguously manifests assent to those terms. Poor contrast, small fonts, or ambiguous labels (such as a “Continue” button that users are not told signifies agreement) can defeat notice and assent.

In the scraping context, the most recent examples of courts declining to enforce online terms against scrapers are X Corp. v. CCDH and Meta v. Bright Data, both out of the Northern District of California. In the CCDH case, the court refused to enforce the online terms because X Corp.’s alleged damages (mostly reputational harm) were not a “reasonably foreseeable” consequence of the alleged breach (the scraping). In Meta v. Bright Data, the court declined to enforce the online terms because of nuances in how they were drafted, which in the court’s reading left the scraping at issue outside the contract’s reach. But the overall language and tenor of both opinions made clear that those courts did not want to enforce those online agreements.

But that is the exception rather than the rule. Scraping cases often involve sophisticated repeat actors rather than one-off consumers, and several courts have imputed knowledge (or found actual knowledge) in ways that differ from consumer browsewrap cases.

In Cairo v. Crossmedia (N.D. Cal. 2005), a scraper’s repeated and automated use of pages bearing “By continuing past this page… you agree” language was enough to impute knowledge and bind the scraper to a forum‑selection clause—even though no “I agree” button was clicked.

In Register.com v. Verio (2d Cir. 2004), the scraper queried the WHOIS service daily, and each response came with a notice of the terms of use. The Second Circuit held that continued access in the face of that repeated notice established knowledge of the terms and supported enforcement.

Cease-and-desist letters, a very common fact pattern for scrapers, can and often do create actual knowledge and make browsewraps enforceable. In Southwest Airlines v. BoardFirst (N.D. Tex. 2007), the court enforced a browsewrap and issued an injunction in part because the scraper had received cease-and-desist letters spelling out that using the site meant assent to the terms, which established actual knowledge. In doing so, the court articulated the “actual or constructive knowledge” test for browsewrap validity. Courts in Texas have a history of enforcing these agreements even under very dubious circumstances.

In CouponCabin LLC v. Savings.com, Inc. (N.D. Ind. 2016 & 2017), the court, ruling on a motion to dismiss, refused to declare a browsewrap unenforceable where the complaint plausibly alleged that the defendants kept scraping after being technically blocked. When one defendant later sought judgment on the pleadings, the court again held that, even if that defendant had not received a C&D directly, revoking access through measures like IP blocks can supply constructive notice that further access is “without authorization” and can support related state-law claims.

In DHI Group, Inc. v. Kent (S.D. Tex. 2017), the court denied a motion to dismiss a breach claim based on a browsewrap. Allegations that the scraper “knew or should have known” about the no-scraping terms, including because it used a similar browsewrap on its own site, were enough to plead assent. The court expressly characterized the site terms as a browsewrap and overruled the objection seeking dismissal. The “scraper-has-its-own-terms-of-use” rationale has been applied in other Texas cases as well, which essentially means that every company with a commercial website is bound by every other company’s online terms, always. (That’s not quite true, of course, but it might be close to true if you’re a web scraper litigating in front of a Texas judge.)

When the defendant is a repeat, sophisticated scraper (or a bot operator), courts are more willing to find constructive (or actual) knowledge from repeated access, on-page notices, and especially cease-and-desist correspondence. That posture is notably different from the one taken in the consumer-design cases.

But much of that analysis depends on whether the judge is sympathetic to open-internet arguments. Historically, California courts have been far more receptive to those arguments than courts in other parts of the country.