brave/sugarcoat-pipeline: CLI that implements the SugarCoat pipeline
SugarCoat is a tool that allows filter list authors to automatically patch JavaScript scripts so that their access to sensitive data is restricted according to a custom privacy policy. Check out the blog post and paper!
This repo is an implementation of the SugarCoat pipeline. It uses pagegraph-crawl to crawl a given website and generate PageGraph graphs, pagegraph-rust-cli to get JavaScript script sources that match adblock rules from the generated graphs, and sugarcoat for the actual patching of JavaScript scripts.
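The stage order described above can be modelled in a few lines. This is a hypothetical sketch, not code from the repo: `planStages` and its options object are illustrative names, and the only behaviour it encodes beyond the paragraph is the documented `-g`/`--graphs-dir-override` rule, which skips PageGraph generation.

```javascript
// Hypothetical model of the three SugarCoat pipeline stages.
// Stage names and the options object are illustrative, not the CLI's internals.
function planStages({ graphsDirOverride } = {}) {
  const stages = [];
  // Stage 1: pagegraph-crawl visits the site and writes PageGraph .graphml files.
  // With -g/--graphs-dir-override, existing graphs are used and this stage is skipped.
  if (!graphsDirOverride) {
    stages.push("pagegraph-crawl");
  }
  // Stage 2: pagegraph-rust-cli extracts script sources matching the adblock rules.
  stages.push("pagegraph-rust-cli");
  // Stage 3: sugarcoat patches those scripts according to policy.json.
  stages.push("sugarcoat");
  return stages;
}

console.log(planStages());
// [ 'pagegraph-crawl', 'pagegraph-rust-cli', 'sugarcoat' ]
console.log(planStages({ graphsDirOverride: "my-graphs" })); // hypothetical directory
// [ 'pagegraph-rust-cli', 'sugarcoat' ]
```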
You can specify which sensitive Web APIs to block access to in policy.json (example). All SugarCoat pipeline output is generated in output/ by default (can be changed via CLI argument). Patched scripts go in output/sugarcoated_scripts and the generated EasyList-style filter rules in output/sugarcoat_rules.txt.
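To post-process the results from another Node script, something like the following could read the generated rules and list the patched scripts. It is a minimal sketch that assumes the default `output/` directory and treats lines starting with `!` as comments, as in EasyList syntax; `parseRules` and `readSugarcoatOutput` are hypothetical helpers, not part of the repo.

```javascript
// Minimal sketch for inspecting SugarCoat pipeline output.
// Assumes the default output directory; pass your -o/--output value otherwise.
const fs = require("fs");
const path = require("path");

// Split rule-file text into rules, dropping blank lines and "!" comments
// (the EasyList comment syntax).
function parseRules(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("!"));
}

// Collect the generated rules and the file names of the patched scripts.
// Missing files or directories yield empty arrays instead of throwing.
function readSugarcoatOutput(outputDir = "output") {
  const rulesFile = path.join(outputDir, "sugarcoat_rules.txt");
  const scriptsDir = path.join(outputDir, "sugarcoated_scripts");
  return {
    rules: fs.existsSync(rulesFile) ? parseRules(fs.readFileSync(rulesFile, "utf8")) : [],
    scripts: fs.existsSync(scriptsDir) ? fs.readdirSync(scriptsDir) : [],
  };
}
```

For example, `readSugarcoatOutput("output").rules.length` gives the number of generated rules after a run.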
Setup
- Clone this repo:
git clone https://github.com/brave-experiments/sugarcoat-pipeline
cd sugarcoat-pipeline
- You need the Rust and Cargo toolchain set up in order to use the SugarCoat pipeline. The pagegraph-rust-cli Rust binary is built with Cargo during the post-installation phase.
- To install the NPM dependencies:
npm install
Note that the minimum Node version required is 14.18.1.
- You will also need a working PageGraph binary (an instrumented version of the Brave browser) to crawl the website you want to sugarcoat and generate the .graphml files that are then analyzed for scripts. You can build a binary by following the wiki instructions, or download one for Intel Macs from the Release page. Remember to unzip it! Alternatively, on the command line:
For Mac
# Download the latest Mac Intel zip (following the redirect), unzip it, and remove the archive
curl -L https://github.com/brave-experiments/sugarcoat-pipeline/releases/latest/download/pagegraph-mac-intel.zip -o pagegraph-mac-intel.zip
unzip pagegraph-mac-intel.zip
rm pagegraph-mac-intel.zip
- (optional) You need a local copy of at least one filter list. Copies of EasyList, EasyPrivacy, and uBlock Origin Unbreak are already included in the repo; to get the latest versions instead, download them:
curl -s https://easylist.to/easylist/easylist.txt -o easylist.txt
curl -s https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/unbreak.txt -o unbreak.txt
curl -s https://easylist.to/easylist/easyprivacy.txt -o easyprivacy.txt
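Because the pipeline requires Node 14.18.1 or newer, a small pre-flight check can fail early on older runtimes. This is a hypothetical sketch (`compareVersions` and `nodeIsSupported` are not part of the repo) that compares `process.versions.node` component by component as numbers, so that, for example, 14.9.0 is correctly treated as older than 14.18.1.

```javascript
// Hypothetical pre-flight check for the documented minimum Node version.
const MIN_NODE = "14.18.1";

// Compare dotted numeric versions; returns <0, 0 or >0 like a sort comparator.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// process.versions.node holds the running version without the leading "v".
function nodeIsSupported(version = process.versions.node) {
  return compareVersions(version, MIN_NODE) >= 0;
}

if (!nodeIsSupported()) {
  console.error(`Node ${process.versions.node} is older than the required ${MIN_NODE}`);
  process.exitCode = 1;
}
```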
Usage
npm run sugarcoat-pipeline -- -b BINARY -u URL -t SECS -l FILTER_LISTS [FILTER_LISTS ...]
Multiple filter lists are passed space-separated, e.g. -l easylist.txt unbreak.txt.
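If you drive the pipeline from another Node script, building the arguments as an array keeps each filter list a separate argument and avoids shell quoting (for example, of the spaces in the macOS binary path). `buildPipelineArgs` and `runPipeline` are hypothetical helpers; the flags are the ones listed in the Help section.

```javascript
// Sketch: invoking the pipeline from Node with the documented flags.
const { spawnSync } = require("child_process");

function buildPipelineArgs({ binary, url, secs, filterLists, output }) {
  // Everything after "--" is forwarded by npm to sugarcoat-pipeline.js.
  const args = ["run", "sugarcoat-pipeline", "--", "-b", binary, "-u", url];
  if (secs !== undefined) args.push("-t", String(secs));
  // -l accepts several space-separated lists, so each list is its own argv entry.
  args.push("-l", ...filterLists);
  if (output) args.push("-o", output);
  return args;
}

function runPipeline(options) {
  // Array arguments reach the process unchanged; no shell escaping is needed.
  return spawnSync("npm", buildPipelineArgs(options), { stdio: "inherit" });
}
```

Calling `buildPipelineArgs` with `secs: 10`, `filterLists: ["easylist.txt", "unbreak.txt"]` and `output: "output"` mirrors the example command. On Windows, `spawnSync` may need `shell: true` to locate `npm`.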
Example:
For Mac
npm run sugarcoat-pipeline -- -b pagegraph-mac-intel.app/Contents/MacOS/Brave\ Browser\ Development -t 10 -l easylist.txt unbreak.txt easyprivacy.txt -o output -u https://metacritic.com
(Note that on macOS, -b must point to the executable inside the .app bundle, not to the .app itself.)
The results are written to output/, which is created automatically.
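On macOS, the path that -b expects can be derived from the unzipped bundle. This sketch assumes the executable name used in the example command (`Brave Browser Development`); other PageGraph builds may name it differently. `macExecutableFromApp` and `findMacExecutable` are hypothetical helpers.

```javascript
// Sketch: locate the executable inside a macOS .app bundle for use with -b.
const fs = require("fs");
const path = require("path");

function macExecutableFromApp(appPath, executableName = "Brave Browser Development") {
  // A macOS bundle keeps its executable at <App>.app/Contents/MacOS/<name>.
  // path.posix is used because the result is a macOS path.
  return path.posix.join(appPath, "Contents", "MacOS", executableName);
}

// Returns the executable path if it exists, otherwise null.
function findMacExecutable(appPath, executableName) {
  const exe = macExecutableFromApp(appPath, executableName);
  return fs.existsSync(exe) ? exe : null;
}

console.log(macExecutableFromApp("pagegraph-mac-intel.app"));
// pagegraph-mac-intel.app/Contents/MacOS/Brave Browser Development
```

When this path is passed as an array element (for example to `spawnSync`), the spaces need no backslash escaping, unlike in the shell command above.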
Help
$ npm run sugarcoat-pipeline -- -h
> sugarcoat-pipeline@0.1.0 sugarcoat-pipeline
> node sugarcoat-pipeline.js "-h"
usage: sugarcoat-pipeline.js [-h] [-b BINARY] [-u URL] [-t SECS] [-d] -l FILTER_LISTS [FILTER_LISTS ...] [-p POLICY] [-o OUTPUT] [-g GRAPHS_DIR_OVERRIDE] [-k] [-r RETRIES] [-m] [-s]
SugarCoat pipeline CLI
optional arguments:
  -h, --help            show this help message and exit
  -b BINARY, --binary BINARY
                        Path to the PageGraph-enabled build of Brave
  -u URL, --url URL     The URL to record.
  -t SECS, --secs SECS  The dwell time in seconds. Default: 30 seconds
  -d, --debug           Print debugging information
  -l FILTER_LISTS [FILTER_LISTS ...], --filter-lists FILTER_LISTS [FILTER_LISTS ...]
                        Filter lists to use
  -p POLICY, --policy POLICY
                        Path to policy file. Default: policy.json
  -o OUTPUT, --output OUTPUT
                        Path to output directory. All generated files go here. Default: output
  -g GRAPHS_DIR_OVERRIDE, --graphs-dir-override GRAPHS_DIR_OVERRIDE
                        Path to graphs directory. If set, skips PageGraph generation
  -k, --keep            Do not erase intermediary files generated in output for sugarcoat
  -r RETRIES, --retries RETRIES
                        Number of times a URL is attempted to be re-crawled on failure. Default: 5
  -m, --no-minify       Do not minify generated SugarCoat script.
  -s, --keep-original-script-name
                        Keep original script name instead of setting it to be hash of contents.
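The -s flag implies that, by default, patched scripts are named after a hash of their contents. The help text does not say which hash function or file extension is used, so this sketch assumes SHA-256 and a `.js` suffix; `outputScriptName` is a hypothetical helper that only illustrates the idea.

```javascript
// Sketch of content-hash naming versus keeping the original name (-s).
// SHA-256 and the ".js" suffix are assumptions, not documented behaviour.
const crypto = require("crypto");
const path = require("path");

function outputScriptName(originalPath, source, keepOriginalName = false) {
  if (keepOriginalName) {
    // Equivalent of -s/--keep-original-script-name: reuse the original file name.
    return path.basename(originalPath);
  }
  // Identical contents always produce the same name, wherever the script came from.
  const digest = crypto.createHash("sha256").update(source).digest("hex");
  return `${digest}.js`;
}
```

One consequence of this design is that two byte-identical scripts served from different URLs would map to the same output file name.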
Feedback
Something not working? Please raise an issue.
Testing
This project uses mocha for tests. To run the full suite:
npm run test
To run a specific test (mocha's -g option only runs tests whose names match the given pattern):
npm run test -- -g "simple"
To run tests in debug mode: