GitHub - brave/cookiecrumbler: Automatically detect cookie consent notices

Cookiecrumbler automatically detects cookie consent notices on Web pages. It is intended both to help detect cookie consent notices that we don't currently block and to identify webcompat reports that are related to cookie consent notice blocking.

Warning

This tool navigates to arbitrary URLs using a real browser and should only be run in isolated environments. Exposing it on a network may introduce Server-Side Request Forgery (SSRF) risks, as an attacker could use it to reach internal services or metadata endpoints. Always deploy behind appropriate network-level controls and never run it on a host with access to sensitive internal resources.
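
As an illustration of the kind of request the warning is about, the following minimal sketch rejects URLs that obviously target loopback, private, or link-local hosts before they are handed to the crawler. The helper names `isPrivateIPv4` and `isObviouslyInternal` are hypothetical and not part of cookiecrumbler, and the check is deliberately conservative (it refuses every IPv6 literal).

```javascript
// Hypothetical pre-flight check, not part of cookiecrumbler.
// Returns true when a URL should NOT be crawled because it is malformed,
// uses a non-HTTP(S) scheme, or names an obviously internal host.

function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                          // 0.0.0.0/8
    a === 10 ||                         // 10.0.0.0/8
    a === 127 ||                        // loopback
    (a === 169 && b === 254) ||         // link-local, incl. cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)            // 192.168.0.0/16
  );
}

function isObviouslyInternal(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable input is rejected
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return true;
  const host = url.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return true;
  if (host.startsWith('[')) return true; // IPv6 literal: refuse in this sketch
  return isPrivateIPv4(host);
}
```

A check like this only inspects the URL text. It does not cover public hostnames that resolve to internal addresses, or redirects followed by the browser, so it complements the network-level controls rather than replacing them.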

Deployment status

Cookiecrumbler is currently being developed as a Web app which will help us run crawls and/or integrate with the webcompat reporter backend. It can also be used as a library. In the future, we could even bundle it in the browser.

Setup

Local Setup

  1. Install Brave Browser.
  2. Install dependencies and set up browser profiles:

npm install
npm run setup -- /path/to/brave

Note: If browser profiles need to be updated, remove the profiles directory and run the setup again.

Docker Setup

  1. Ensure you have Docker and Docker Compose installed.
  2. Run the initial setup:

cp .env.example .env
docker compose run --rm --entrypoint ./docker_setup.sh brave

Note: If browser profiles need to be updated, run the setup command again.

Running

Local

Start the server:

npm run serve

You can customize the browser and port:

npm run serve -- /usr/bin/brave-nightly 8000

Docker

  1. Basic setup (without LLM support):
  2. With LLM support via Ollama:

docker compose --profile ollama up brave_ollama

  3. With LLM support via AWS Bedrock:

aws-vault exec cookiecrumbler-bedrock -- docker compose --profile litellm up

Note

You will need to set up the cookiecrumbler-bedrock AWS profile used by the aws-vault command in your environment.

For all setups, visit localhost:3000 in your browser.

Using as a library

import { checkPage } from 'cookiecrumbler';

const result = await checkPage({
  url: 'https://example.com',         // URL to visit
  seconds: 4,                         // delay before checking for a notice
  interactive: false,                 // show the browser while running?
  executablePath: '/path/to/binary',  // what browser to run
  adblockLists: {                     // enable/disable filter lists by component id
    'cdbbhgbmjhfnhnmgeddbliobbofkgdhe': false,
  },
  screenshot: true,                   // return images of detected notices or full page (see Screenshots section)
  markup: true,                       // return markup of detected notices
});
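
For crawl-style use, a small wrapper can run the check over several URLs in sequence. The sketch below is only an illustration: `checkPages` is a hypothetical helper, the checker function is passed in as an argument (in practice this would be `checkPage`), and the shape of each result is treated as opaque because it is not described here. It also assumes that a failed check rejects its promise.

```javascript
// Hypothetical helper, not part of cookiecrumbler.
// Runs `check` (expected to behave like checkPage: async, one options object)
// over each URL in turn and records either the result or the error.
async function checkPages(urls, check, baseOptions = {}) {
  const outcomes = [];
  for (const url of urls) {
    try {
      // Shared options come first so the per-URL `url` always wins.
      const result = await check({ ...baseOptions, url });
      outcomes.push({ url, ok: true, result });
    } catch (error) {
      // Assumption: a failed check surfaces as a rejected promise.
      outcomes.push({ url, ok: false, error: String(error) });
    }
  }
  return outcomes;
}

// Possible usage (requires the library and a browser binary):
//   import { checkPage } from 'cookiecrumbler';
//   const outcomes = await checkPages(
//     ['https://example.com'],
//     checkPage,
//     { seconds: 4, executablePath: '/path/to/binary', screenshot: false },
//   );
```

Running the pages one at a time keeps a single browser instance busy at once; whether parallel runs are safe depends on how profiles are shared, which this sketch does not assume.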

Screenshots

The screenshot parameter can be set to true, false, always, or fullPage. The behavior is summarized in the following table:

| value | element detected | no element detected |
| --- | --- | --- |
| true | 🎯 | ❌ |
| false | ❌ | ❌ |
| always | 🎯 | 📄 |
| fullPage | 📄 | 📄 |

Legend:
🎯 - Screenshot of detected element
📄 - Screenshot of full page
❌ - No screenshot
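
The table can also be read as a small decision function. The sketch below encodes it, treating empty cells as "no screenshot" and assuming that always and fullPage are passed as strings. `expectedScreenshot` is a hypothetical name used only for illustration, not a function exported by the library.

```javascript
// Hypothetical illustration of the screenshot table, not library code.
// Returns which kind of screenshot to expect for a given `screenshot`
// setting, depending on whether a consent notice element was detected.
function expectedScreenshot(screenshot, detected) {
  switch (screenshot) {
    case true:
      return detected ? 'element' : 'none';
    case 'always':       // assumption: passed as the string 'always'
      return detected ? 'element' : 'fullPage';
    case 'fullPage':     // assumption: passed as the string 'fullPage'
      return 'fullPage';
    default:
      // false, and in this sketch any unrecognized value
      return 'none';
  }
}
```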

Testing

npm run test

You can also pass a path to a different browser binary if necessary:

npm run test -- /usr/bin/brave-nightly

Generating test cases

There is a set of regression test cases in the testcases directory.

Each test case is a single self-contained HTML file. These files can be generated using a tool like nodeSavePageWE.