brave/sugarcoat-pipeline: CLI that implements the SugarCoat pipeline

SugarCoat is a tool that lets filter list authors automatically patch JavaScript scripts so that their access to sensitive data is restricted according to a custom privacy policy. Check out the blog post and paper!

This repo implements the SugarCoat pipeline. It uses pagegraph-crawl to crawl a given website and generate PageGraph graphs, pagegraph-rust-cli to extract from those graphs the sources of JavaScript scripts that match adblock rules, and sugarcoat to do the actual patching of those scripts.

You specify which sensitive Web APIs to block access to in policy.json (see the example policy). By default, all SugarCoat pipeline output is written to output/ (change this with -o/--output): patched scripts go in output/sugarcoated_scripts, and the generated EasyList-style filter rules go in output/sugarcoat_rules.txt.
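To check what a run produced, a small Node.js script can list the patched scripts and read back the generated rules. This is a minimal sketch that assumes only the output layout described above; `summarizeOutput` is a hypothetical helper, not part of the pipeline, and skipping `!` lines assumes the rules file may contain EasyList-style comments.

```javascript
// Hypothetical helper: summarize a finished pipeline run.
// Assumes patched scripts live in <output>/sugarcoated_scripts and the
// generated rules in <output>/sugarcoat_rules.txt.
const fs = require("fs");
const path = require("path");

function summarizeOutput(outputDir = "output") {
  const scriptsDir = path.join(outputDir, "sugarcoated_scripts");
  const rulesFile = path.join(outputDir, "sugarcoat_rules.txt");

  // Patched script file names, sorted (empty if the directory is missing).
  const scripts = fs.existsSync(scriptsDir)
    ? fs.readdirSync(scriptsDir).sort()
    : [];

  // One rule per line; drop blank lines and "!" lines (EasyList comment syntax).
  const rules = fs.existsSync(rulesFile)
    ? fs
        .readFileSync(rulesFile, "utf8")
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line !== "" && !line.startsWith("!"))
    : [];

  return { scripts, rules };
}
```

For example, `summarizeOutput("output").rules.length` gives the number of generated rules after a run.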

Setup

  1. Git clone this repo:

git clone https://github.com/brave-experiments/sugarcoat-pipeline
cd sugarcoat-pipeline

  2. You need the Rust toolchain, including Cargo, set up to use the SugarCoat pipeline. The pagegraph-rust-cli Rust binary is built with Cargo as part of the post-installation phase.
  3. Install the NPM dependencies:

npm install

Note that the minimum Node version required is 14.18.1.
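Before installing, you can confirm that your Node.js meets that minimum. The following is a minimal sketch; `meetsMinimum` is a hypothetical helper that compares plain MAJOR.MINOR.PATCH versions numerically and ignores pre-release tags.

```javascript
// Hypothetical helper: check the running Node.js against the 14.18.1 minimum.
function parseVersion(v) {
  // Accepts "v14.18.1" (process.version) or "14.18.1" (process.versions.node).
  return String(v)
    .replace(/^v/, "")
    .split("-")[0]
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
}

function meetsMinimum(version, minimum = "14.18.1") {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    // Compare numerically, so 14.9.0 is correctly older than 14.18.1.
    if (x !== y) return x > y;
  }
  return true; // Versions are equal.
}

if (!meetsMinimum(process.versions.node)) {
  console.error(`Node ${process.versions.node} is too old; 14.18.1 or newer is required`);
  process.exitCode = 1;
}
```

Saved as, say, `check-node.js` (a file name chosen here for illustration), `node check-node.js` exits with status 1 on an older Node.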

  4. You also need a working PageGraph binary (an instrumented build of the Brave browser) to crawl the website you want to sugarcoat and generate the .graphml files that are then analyzed for scripts. You can build a binary by following the wiki instructions, or download a prebuilt one for Intel Macs from the Releases page and unzip it. Alternatively, from the command line:

For Mac

Download the latest Mac Intel zip (-L follows the release redirect), unzip it, and remove the archive:

curl -L https://github.com/brave-experiments/sugarcoat-pipeline/releases/latest/download/pagegraph-mac-intel.zip -o pagegraph-mac-intel.zip
unzip pagegraph-mac-intel.zip
rm pagegraph-mac-intel.zip

  5. (optional) Get fresh copies of the filter lists. The pipeline needs at least one local filter list, and the repo already includes copies; to download the latest EasyList, EasyPrivacy, or uBlock Origin Unbreak lists instead:

curl -s https://easylist.to/easylist/easylist.txt -o easylist.txt
curl -s https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/unbreak.txt -o unbreak.txt
curl -s https://easylist.to/easylist/easyprivacy.txt -o easyprivacy.txt
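As a quick sanity check that a downloaded list is not empty or an error page, you can count its rules. This sketch uses only basic Adblock Plus/EasyList syntax (`!` comments, the `[Adblock Plus ...]` header, and `##`/`#@#`/`#?#` cosmetic rules); `countRules` is hypothetical and is not the matcher the pipeline uses, which runs inside pagegraph-rust-cli.

```javascript
// Hypothetical helper: rough classification of lines in a filter list.
const fs = require("fs");

function countRules(text) {
  const counts = { network: 0, cosmetic: 0, comments: 0 };
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (line.startsWith("!") || line.startsWith("[")) {
      // "!" comments and the "[Adblock Plus x.y]" header line.
      counts.comments++;
    } else if (/#@?[?$]?#/.test(line)) {
      // Cosmetic (element-hiding) rules: ##, #@#, #?#, #@?#, #$#, #@$#.
      counts.cosmetic++;
    } else {
      // Everything else is treated as a network (URL) rule.
      counts.network++;
    }
  }
  return counts;
}

// Usage: node count-rules.js easylist.txt unbreak.txt easyprivacy.txt
for (const file of process.argv.slice(2)) {
  console.log(file, countRules(fs.readFileSync(file, "utf8")));
}
```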

Usage

npm run sugarcoat-pipeline -- -b BINARY -u URL -t SECS -l FILTER_LISTS [FILTER_LISTS ...]

Multiple filter lists can be passed to -l separated by spaces, e.g. -l easylist.txt unbreak.txt.
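If you drive the pipeline from another Node.js script, you can build the argument list programmatically. This is a sketch under assumptions: the flag names come from the -h output in this README, while `buildArgs` and `runPipeline` are hypothetical helpers. Passing an argument array (no shell) means paths containing spaces need no backslash escaping.

```javascript
// Hypothetical helpers for invoking the pipeline from Node.js.
const { spawnSync } = require("child_process");

function buildArgs({ binary, url, secs, filterLists, output, policy }) {
  if (!Array.isArray(filterLists) || filterLists.length === 0) {
    throw new Error("at least one filter list is required (-l)");
  }
  const args = ["run", "sugarcoat-pipeline", "--"];
  if (binary) args.push("-b", binary);
  if (url) args.push("-u", url);
  if (secs !== undefined) args.push("-t", String(secs)); // CLI default is 30
  // -l takes one or more values; each list is its own argv entry.
  args.push("-l", ...filterLists);
  if (policy) args.push("-p", policy);
  if (output) args.push("-o", output);
  return args;
}

function runPipeline(options) {
  // Run from the repo root. Assumes macOS/Linux, where npm is a plain executable.
  const result = spawnSync("npm", buildArgs(options), { stdio: "inherit" });
  return result.status;
}
```

For instance, `runPipeline({ binary: "pagegraph-mac-intel.app/Contents/MacOS/Brave Browser Development", url: "https://metacritic.com", secs: 10, filterLists: ["easylist.txt", "unbreak.txt"], output: "output" })` mirrors the example command below.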

Example:

For Mac

npm run sugarcoat-pipeline -- -b pagegraph-mac-intel.app/Contents/MacOS/Brave\ Browser\ Development -t 10 -l easylist.txt unbreak.txt easyprivacy.txt -o output -u https://metacritic.com

Note that on macOS, -b must point to the executable inside the .app bundle (under Contents/MacOS/), not to the .app itself.
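To avoid pointing -b at the .app by mistake, you can derive the executable path from the bundle and check it before running. A minimal sketch: the executable name "Brave Browser Development" is taken from the example command above and may differ in other PageGraph builds; `macBinaryPath` and `isExecutable` are hypothetical helpers.

```javascript
// Hypothetical helpers: locate and verify the PageGraph executable on macOS.
const fs = require("fs");
const path = require("path");

function macBinaryPath(appBundle, executable = "Brave Browser Development") {
  // The executable lives at <bundle>.app/Contents/MacOS/<name>.
  return path.join(appBundle, "Contents", "MacOS", executable);
}

function isExecutable(file) {
  try {
    // Throws if the file is missing or lacks execute permission.
    fs.accessSync(file, fs.constants.X_OK);
    return true;
  } catch {
    return false;
  }
}
```

For example, `isExecutable(macBinaryPath("pagegraph-mac-intel.app"))` should be true once the release zip has been unzipped in the repo root.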

The results are written to output/, which is created automatically.

Help

$ npm run sugarcoat-pipeline -- -h

> sugarcoat-pipeline@0.1.0 sugarcoat-pipeline
> node sugarcoat-pipeline.js "-h"

usage: sugarcoat-pipeline.js [-h] [-b BINARY] [-u URL] [-t SECS] [-d] -l FILTER_LISTS [FILTER_LISTS ...] [-p POLICY] [-o OUTPUT] [-g GRAPHS_DIR_OVERRIDE] [-k] [-r RETRIES] [-m] [-s]

SugarCoat pipeline CLI

optional arguments:
  -h, --help            show this help message and exit
  -b BINARY, --binary BINARY
                        Path to the PageGraph-enabled build of Brave
  -u URL, --url URL     The URL to record.
  -t SECS, --secs SECS  The dwell time in seconds. Default: 30 seconds
  -d, --debug           Print debugging information
  -l FILTER_LISTS [FILTER_LISTS ...], --filter-lists FILTER_LISTS [FILTER_LISTS ...]
                        Filter lists to use
  -p POLICY, --policy POLICY
                        Path to policy file. Default: policy.json
  -o OUTPUT, --output OUTPUT
                        Path to output directory. All generated files go here. Default: output
  -g GRAPHS_DIR_OVERRIDE, --graphs-dir-override GRAPHS_DIR_OVERRIDE
                        Path to graphs directory. If set, skips PageGraph generation
  -k, --keep            Do not erase intermediary files generated in output for sugarcoat
  -r RETRIES, --retries RETRIES
                        Number of times a URL is attempted to be re-crawled on failure. Default: 5
  -m, --no-minify       Do not minify generated SugarCoat script.
  -s, --keep-original-script-name
                        Keep original script name instead of setting it to be hash of contents.
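The -g option lets you rerun the analysis on graphs you already have instead of crawling again. The sketch below assumes the graphs directory contains the .graphml files PageGraph produces, and that -b and -u can be omitted when -g skips PageGraph generation; both are assumptions, and `findGraphs` and `argsForRerun` are hypothetical helpers.

```javascript
// Hypothetical helpers: rerun the pipeline on existing PageGraph output.
const fs = require("fs");
const path = require("path");

function findGraphs(dir) {
  // Collect .graphml files in the directory (non-recursive), sorted.
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".graphml"))
    .map((name) => path.join(dir, name))
    .sort();
}

function argsForRerun(graphsDir, filterLists, output = "output") {
  if (findGraphs(graphsDir).length === 0) {
    throw new Error(`no .graphml files in ${graphsDir}`);
  }
  // Assumption: with -g set, no binary (-b) or URL (-u) is needed.
  return ["run", "sugarcoat-pipeline", "--", "-g", graphsDir, "-l", ...filterLists, "-o", output];
}
```

The returned array can be passed to `spawnSync("npm", ...)` from `child_process`, the same way as for a full run.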

Feedback

Something not working? Please raise an issue.

Testing

This project uses mocha for tests.

To run in debug mode,

To run a specific test,

npm run test -- -g "simple"

To run tests in debug mode: