Probably around half of all internet traffic comes from bots or scraping. Even so, there isn’t much guidance on how to comply with privacy laws when scraping.
Notably, the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) impose strict compliance requirements on companies collecting, storing, and processing user data. Non-compliance can result in severe penalties, including hefty fines and legal action.
As an attorney who has advised over 100 clients on web scraping and data access issues, I find that privacy isn’t usually the first risk to mitigate. The first risk to mitigate as a scraper is being sued out of existence.
But privacy issues are an important secondary risk. The bigger and more successful you are, the more privacy laws could affect you. And the more flagrantly you violate those laws, namely by collecting and selling PII without consent, the more likely you are to face not only regulators but also class-action attorneys. That can get expensive fast!
This post outlines the key steps web-scraping companies must take to maximize compliance with key privacy laws such as GDPR and CCPA.
1. Figure Out Whether You Are Required to Comply with the Various Privacy Laws
Step one is to figure out whether you meet the threshold requirements that make you subject to the various privacy laws. For example, if you don’t collect the PII of data subjects in Europe, or process or control data in the EU, then GDPR likely doesn’t apply to you. There are threshold requirements for the CCPA, COPPA, Canadian privacy laws, the privacy laws of Virginia and Colorado, and nearly every other law. If you aren’t within the relevant thresholds, you can likely stop there.
There are also data broker laws in Texas, California, and Vermont. You need to know if those apply to you.
2. Understand What Constitutes Personal Data
GDPR Definition:
Under the GDPR, personal data includes any information that can directly or indirectly identify an individual. This includes names, IP addresses, email addresses, and location data.
CCPA Definition:
The CCPA defines personal information similarly but extends it to include household data and browsing history, making its scope even broader.
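As a rough illustration of how these definitions apply to scraped output, here is a minimal Python sketch that flags fields containing two of the identifiers named above: email addresses and IP addresses. The function names (find_pii, flag_record) and the regular expressions are assumptions made for this example. They are deliberately simple, will miss names, location data, and household data, and are a triage aid rather than a legal determination of what counts as personal data.

```python
import ipaddress
import re

# Deliberately simple patterns (assumptions for this sketch). They are meant
# for triage only and will not catch every identifier covered by GDPR or CCPA.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_pii(text: str) -> dict[str, list[str]]:
    """Return email-like and IPv4-like values found in a string."""
    emails = EMAIL_RE.findall(text)
    ips = []
    for candidate in IPV4_CANDIDATE_RE.findall(text):
        try:
            # Reject look-alikes such as version numbers ("999.1.1.1").
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        ips.append(candidate)
    return {"email": emails, "ip_address": ips}


def flag_record(record: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Return, per field, the PII-like values found (only fields with hits)."""
    flagged = {}
    for field, value in record.items():
        hits = {kind: vals for kind, vals in find_pii(str(value)).items() if vals}
        if hits:
            flagged[field] = hits
    return flagged


if __name__ == "__main__":
    scraped = {
        "bio": "Contact jane@example.com from 192.168.0.1",
        "title": "Version 999.1.1.1 released",
    }
    print(flag_record(scraped))
```

A check like this is most useful early in a pipeline, so that you know which fields bring a dataset within scope before you store or sell it.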
3. Draft and Post a Privacy Policy that is Fully Transparent and Accurately Reflects What You Actually Do
If you collect any PII, your privacy policy should clearly state:
- What data you collect
- How you collect it (including the use of web scraping tools)
- How the data is used and stored
- How users can request data deletion or opt out of collection
But perhaps most importantly, your privacy policy must be an accurate reflection of your business and what you actually do. Too many businesses copy and paste privacy policies from other companies’ sites or obtain them from online generators without doing the diligence necessary to confirm whether they are an accurate reflection of their business.
Ensure the policy is easily accessible on your website and regularly updated to reflect compliance with evolving regulations, and even more importantly, how your business evolves.
4. Emphasize Compliance When You Can
Viewed in totality, many web-scraping businesses cannot achieve 100% compliance with every state’s and every foreign government’s privacy laws. It’s just not compatible with their business model. But even if you cannot achieve 100% compliance, you can always emphasize the ways in which you do comply with those laws.
For example, under GDPR, users have rights such as:
- Right to Access: Users can request what data has been collected about them.
- Right to Rectification: Users can correct inaccurate data.
- Right to Erasure (Right to Be Forgotten): Users can request their data be deleted.
- Right to Object: Users can object to the collection and processing of their data.
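For a scraper, honoring an erasure request has a practical wrinkle: the next crawl may simply collect the same person’s data again. One possible approach, sketched below in Python, is to keep a suppression list alongside the deletion. The ErasureStore class, its method names, and the choice to key records by a hashed email address are all assumptions made for this illustration. Whether retaining even a hash of an identifier is acceptable in your situation is a question for counsel.

```python
import hashlib


def _key(email: str) -> str:
    """Normalize an email address and hash it so the raw value is not kept."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class ErasureStore:
    """Hypothetical in-memory store that honors erasure requests."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}  # scraped data keyed by hashed email
        self.suppressed: set[str] = set()   # hashed emails that must stay erased

    def add_scraped(self, email: str, data: dict) -> bool:
        """Store a scraped record unless the person has requested erasure."""
        key = _key(email)
        if key in self.suppressed:
            return False
        self.records[key] = data
        return True

    def erase(self, email: str) -> bool:
        """Delete the record and suppress re-collection. True if data existed."""
        key = _key(email)
        self.suppressed.add(key)
        return self.records.pop(key, None) is not None


if __name__ == "__main__":
    store = ErasureStore()
    store.add_scraped("Jane@Example.com", {"name": "Jane"})
    store.erase("jane@example.com")
    # A later crawl finds the same address; the record is skipped.
    print(store.add_scraped("jane@example.com", {"name": "Jane"}))
```

The same suppression check could also serve a right-to-object request, since both come down to not collecting a given person’s data going forward.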
Under CCPA, businesses must:
- Provide a “Do Not Sell My Personal Information” link on their homepage.
- Allow consumers to opt out of data collection and sales.
- Respond to data access and deletion requests within 45 days.
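To make the 45-day response window concrete, here is a minimal Python sketch of a request log that computes each due date and flags overdue requests. The PrivacyRequest class and its fields are hypothetical, and the sketch assumes the window is 45 calendar days counted from the date of receipt. Any extension the law may allow is not modeled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

RESPONSE_WINDOW_DAYS = 45  # the response window from the list above


@dataclass
class PrivacyRequest:
    """Hypothetical record of one access or deletion request."""

    request_id: str
    kind: str  # "access" or "deletion"
    received: date
    completed: Optional[date] = None

    @property
    def due(self) -> date:
        # Assumption: calendar days counted from the date of receipt.
        return self.received + timedelta(days=RESPONSE_WINDOW_DAYS)

    def is_overdue(self, today: date) -> bool:
        return self.completed is None and today > self.due


def open_requests(requests: list[PrivacyRequest]) -> list[PrivacyRequest]:
    """Return unanswered requests, most urgent first."""
    pending = [r for r in requests if r.completed is None]
    return sorted(pending, key=lambda r: r.due)


if __name__ == "__main__":
    log = [
        PrivacyRequest("r1", "deletion", date(2025, 1, 1)),
        PrivacyRequest("r2", "access", date(2025, 1, 10), completed=date(2025, 1, 20)),
    ]
    for req in open_requests(log):
        print(req.request_id, req.kind, "due", req.due.isoformat())
```

Even a simple log like this gives you a dated record of when each request arrived and when it was answered.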
It’s rare for businesses to get these requests, so the burden associated with them is usually minimal. But if you can show regulators that you comply with the laws when you can, it may go a long way to mitigating your risk if/when you do experience a conflict with regulators.
5. Get Help
The GDPR and the CCPA are among the most complicated laws in existence. This is not an area of the law where it’s likely prudent to go it alone, particularly if you’re a web scraper where the odds of compliance are stacked against you.
What’s more, class-action lawyers are coming up with creative ways to tie in various state laws to address privacy-law violations. And so it’s not just regulators that are getting worked up over privacy issues.
Conclusion
Web-scraping companies must take proactive steps to comply with the GDPR, the CCPA, and other state privacy laws. By understanding privacy laws, providing transparency, and implementing smart cost-benefit analyses, you can minimize legal risks while continuing to extract valuable data legally. And you may be able to sleep better knowing you’ve done all you can to comply with the relevant laws.