brave/cookiecrumbler: Automatically detect cookie consent notices
Cookiecrumbler automatically detects cookie consent notices on Web pages. It is intended to help both to detect cookie consent notices that we don't currently block and to identify webcompat reports that are related to cookie consent notice blocking.
Warning
This tool navigates to arbitrary URLs using a real browser and should only be run in isolated environments. Exposing it on a network may introduce Server-Side Request Forgery (SSRF) risks, as an attacker could use it to reach internal services or metadata endpoints. Always deploy behind appropriate network-level controls and never run it on a host with access to sensitive internal resources.
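As one extra application-level layer, strictly in addition to the network-level controls above, a server that accepts URLs from users could reject obviously internal targets before they ever reach the browser. The sketch below is hypothetical: isAllowedTarget and isObviouslyInternal are not part of cookiecrumbler. It only inspects the URL string, so it cannot catch hostnames that resolve to internal addresses, DNS rebinding, or redirects issued by the visited page.

```javascript
// Hypothetical pre-flight check, not part of cookiecrumbler. It inspects only
// the URL text: no DNS resolution, so hostnames that resolve to internal IPs,
// DNS rebinding, and in-page redirects are NOT caught. Network-level controls
// are still required.
function isObviouslyInternal(hostname) {
  const host = hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  // URL.hostname keeps IPv6 literals in brackets, e.g. "[::1]".
  if (host.startsWith('[') && host.endsWith(']')) {
    const v6 = host.slice(1, -1);
    const first = v6.split(':')[0];
    return (
      v6 === '::1' ||                                   // loopback
      v6 === '::' ||                                    // unspecified
      (first.length === 4 && /^f[cd]/.test(first)) ||   // fc00::/7 unique local
      (first.length === 4 && /^fe[89ab]/.test(first))   // fe80::/10 link-local
    );
  }

  // For http(s) URLs the WHATWG URL parser already normalizes numeric IPv4
  // forms (hex, single integer, ...) to dotted decimal.
  const parts = host.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const [a, b] = parts.map(Number);
  return (
    a === 0 ||                              // 0.0.0.0/8
    a === 10 ||                             // 10.0.0.0/8
    a === 127 ||                            // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||             // link-local, incl. 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) ||    // 172.16.0.0/12
    (a === 192 && b === 168)                // 192.168.0.0/16
  );
}

// Accept only absolute http(s) URLs whose host is not obviously internal.
function isAllowedTarget(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  return !isObviouslyInternal(url.hostname);
}
```

A denylist like this is deliberately conservative and not exhaustive (for example, IPv4-mapped IPv6 addresses are not handled), which is why it should only ever complement isolation at the network level.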
Deployment status
Cookiecrumbler is currently being developed as a Web app which will help us run crawls and/or integrate with the webcompat reporter backend. It can also be used as a library. In the future, we could even bundle it in the browser.
Setup
Local Setup
- Install Brave Browser.
- Install dependencies and set up browser profiles:
npm install
npm run setup -- /path/to/brave
Note: If browser profiles need to be updated, remove the profiles directory and run the setup again.
Docker Setup
- Ensure you have Docker and Docker Compose installed.
- Run the initial setup:
cp .env.example .env
docker compose run --rm --entrypoint ./docker_setup.sh brave
Note: If browser profiles need to be updated, run the setup command again.
Running
Local
Start the server:
npm run serve
You can customize the browser and port:
npm run serve -- /usr/bin/brave-nightly 8000
Docker
- Basic setup (without LLM support):
- With LLM support via Ollama:
docker compose --profile ollama up brave_ollama
- With LLM support via AWS Bedrock:
aws-vault exec cookiecrumbler-bedrock -- docker compose --profile litellm up
Note
You will need to set up the cookiecrumbler-bedrock AWS profile (used by aws-vault above) in your environment.
For all setups, visit http://localhost:3000 in your browser, or the port you chose if you passed one to npm run serve.
Using as a library
import { checkPage } from 'cookiecrumbler';
const result = await checkPage({
  url: 'https://example.com',          // URL to visit
  seconds: 4,                          // delay (in seconds) before checking for a notice
  interactive: false,                  // show the browser while running?
  executablePath: '/path/to/binary',   // which browser binary to run
  adblockLists: {                      // enable/disable filter lists by component id
    'cdbbhgbmjhfnhnmgeddbliobbofkgdhe': false,
  },
  screenshot: true,                    // return images of detected notices or the full page (see Screenshots section)
  markup: true,                        // return markup of detected notices
});
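The call above checks a single URL. For the crawl use case mentioned under Deployment status, a minimal sketch is to loop over URLs one at a time. This assumes (not confirmed here) that checkPage can be called repeatedly within one process; crawlSequentially is a hypothetical helper, and the check function is passed in as a parameter so the sketch does not depend on cookiecrumbler's internals.

```javascript
// Hypothetical helper, not part of cookiecrumbler. `check` is any async
// function with checkPage's calling convention (a single options object).
// URLs are processed one at a time so only one check runs at once, and a
// failure on one URL is recorded instead of aborting the whole batch.
async function crawlSequentially(urls, check, sharedOptions = {}) {
  const results = [];
  for (const url of urls) {
    try {
      // Shared options (seconds, screenshot, ...) plus the per-URL target.
      const result = await check({ ...sharedOptions, url });
      results.push({ url, ok: true, result });
    } catch (error) {
      results.push({ url, ok: false, error: String(error) });
    }
  }
  return results;
}
```

For example: `await crawlSequentially(['https://example.com'], checkPage, { seconds: 4, screenshot: false })`. Running sequentially trades speed for a predictable, single-browser resource footprint; parallelism could be added later if checkPage is known to tolerate it.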
Screenshots
The screenshot parameter can be set to true, false, always, or fullPage. The behavior of each value is summarized in the following table:
| screenshot value | element detected | no element detected |
|---|---|---|
| true | 🎯 | ❌ |
| false | ❌ | ❌ |
| always | 🎯 | 📄 |
| fullPage | 📄 | 📄 |
Legend:
🎯 - Screenshot of detected element
📄 - Screenshot of full page
❌ - No screenshot
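The table can be restated as a small lookup, which may be handy when writing code that consumes results. This is a hypothetical helper, not cookiecrumbler's implementation, and it assumes always and fullPage are passed as the strings 'always' and 'fullPage'.

```javascript
// Hypothetical helper that restates the screenshot table in code; it is not
// cookiecrumbler's implementation. Assumes 'always' and 'fullPage' are passed
// as strings. Returns 'element', 'fullPage', or null (no screenshot).
function expectedScreenshot(screenshotOption, elementDetected) {
  switch (screenshotOption) {
    case true:
      return elementDetected ? 'element' : null;
    case false:
      return null;
    case 'always':
      return elementDetected ? 'element' : 'fullPage';
    case 'fullPage':
      return 'fullPage';
    default:
      throw new Error(`unsupported screenshot value: ${String(screenshotOption)}`);
  }
}
```

The switch uses strict equality, so only the boolean true (not a truthy string) selects the first row.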
Testing
Run the test suite:
npm run test
You can also pass a path to a different browser binary if necessary:
npm run test -- /usr/bin/brave-nightly
Generating test cases
There is a set of regression test cases in the testcases directory.
Each test case is a single self-contained HTML file. These files can be generated using a tool like nodeSavePageWE.
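Before committing a generated file, it may help to confirm it really is self-contained. As a rough sketch, assuming the test case's HTML is available as a string, the hypothetical findExternalResources function below flags src attributes and CSS url(...) values that still point at http(s) or protocol-relative URLs. It is a regex heuristic, not an HTML parser, and is not part of cookiecrumbler or nodeSavePageWE.

```javascript
// Hypothetical heuristic, not part of cookiecrumbler: list resource URLs in a
// saved page that would still be fetched from the network. Inlined data: URIs
// are ignored. Regexes are not a real HTML parser, so treat hits as hints.
function findExternalResources(html) {
  const patterns = [
    // src="https://..." or src='//cdn...' on img, script, iframe, etc.
    /\bsrc\s*=\s*["']?((?:https?:)?\/\/[^"'\s>]+)/gi,
    // CSS url(https://...) in inline styles or <style> blocks
    /url\(\s*["']?((?:https?:)?\/\/[^"')\s]+)/gi,
  ];
  const hits = [];
  for (const pattern of patterns) {
    for (const match of html.matchAll(pattern)) {
      hits.push(match[1]); // the captured URL
    }
  }
  return hits;
}
```

Plain links such as `<a href="https://...">` are intentionally not flagged, since they do not load anything when the page is rendered.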