a2aj/canadian-case-law · Datasets at Hugging Face

Last updated: 2026-06-14

Maintainer: Access to Algorithmic Justice (A2AJ)


Dataset Summary

The A2AJ Canadian Case Law dataset provides bulk, open-access full-text decisions from Canadian courts and tribunals. Each row corresponds to a single case and contains the English and French versions of the decision where both are publicly available.

Rows also include citation network fields (cases_cited_en / cases_cited_fr, cases_citing_en / cases_citing_fr, and citing_cases_count), enabling citation analysis, precedent mapping, and network-based research across the corpus. See Citation Network below for details.

The project builds on an earlier version that was maintained by the Refugee Law Lab (RLL) and is now maintained by A2AJ, a research project co-hosted by York University's Osgoode Hall Law School and Toronto Metropolitan University's Lincoln Alexander School of Law.

The dataset is intended to support empirical legal research, legal-tech prototyping, and language-model pre-training in the public interest—especially work that advances access to justice for marginalised and low-income communities.


Dataset Structure (~ 223k cases)

Code | Court / Tribunal / Reporter | First document | Last document | Rows
SCC | Supreme Court of Canada | 1877-01-15 | 2026-06-12 | 10,884
FCA | Federal Court of Appeal | 2001-02-01 | 2026-06-11 | 7,769
BCCA | British Columbia Court of Appeal | 1999-01-04 | 2026-06-12 | 14,590
ONCA | Ontario Court of Appeal | 1998-06-08 | 2026-06-12 | 23,899
NSCA | Nova Scotia Court of Appeal | 1993-01-04 | 2026-06-09 | 4,724
YKCA | Yukon Court of Appeal | 2000-05-15 | 2026-04-27 | 275
FC | Federal Court | 2001-02-01 | 2026-06-12 | 35,598
TCC | Tax Court of Canada | 2003-01-21 | 2026-06-05 | 8,066
CMAC | Court Martial Appeal Court | 2001-01-19 | 2026-05-19 | 154
BCSC | Supreme Court of British Columbia | 2000-01-04 | 2026-06-11 | 51,854
NSSC | Nova Scotia Supreme Court | 2001-01-04 | 2026-06-12 | 9,175
NSPC | Nova Scotia Provincial Court | 2001-01-15 | 2026-05-26 | 1,599
NSFC | Nova Scotia Family Court | 2001-02-02 | 2023-11-06 | 323
NSSM | Nova Scotia Small Claims Court | 2001-08-30 | 2026-03-20 | 1,648
CHRT | Canadian Human Rights Tribunal | 2003-01-10 | 2026-06-10 | 1,151
CIRB | Canada Industrial Relations Board | 1995-12-08 | 2026-05-01 | 1,171
CITT | Canadian International Trade Tribunal | 1980-01-01 | 2026-06-10 | 5,411
CT | Competition Tribunal | 2000-02-17 | 2026-06-05 | 624
FPSLREB | Federal Public Sector Labour Relations and Employment Board | 2003-01-03 | 2026-06-10 | 3,429
OHSTC | Occupational Health and Safety Tribunal Canada | 1992-01-09 | 2025-03-06 | 811
OIC | Information Commissioner of Canada | 2019-08-26 | 2026-04-09 | 325
PSDPT | Public Service Disclosure Protection Tribunal | 2011-06-10 | 2025-05-21 | 29
RAD | Refugee Appeal Division (IRB) | 2013-02-19 | 2025-09-12 | 14,153
RPD | Refugee Protection Division (IRB) | 2002-07-16 | 2020-12-14 | 6,729
RLLR | Refugee Law Lab Reporter (RPD, IRB) | 2019-01-07 | 2024-12-13 | 927
SST | Social Security Tribunal | 2013-03-08 | 2026-05-27 | 17,729

Note: Counts are approximate and will drift as the dataset is updated.
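
Because the counts drift, it may be easier to recompute them from a loaded copy than to rely on the table. The following is a minimal sketch, assuming rows are available as dictionaries carrying the dataset field described under Data Fields (iterating over a loaded Hugging Face Dataset yields dictionaries of this kind). The function name count_rows_by_court and the sample rows are illustrative and not part of the dataset.

```python
from collections import Counter


def count_rows_by_court(rows):
    """Count rows per court/tribunal code using the `dataset` field."""
    return Counter(row["dataset"] for row in rows)


# Toy rows standing in for real records; only the `dataset` field is used.
sample_rows = [
    {"dataset": "SCC"},
    {"dataset": "FC"},
    {"dataset": "SCC"},
]

counts = count_rows_by_court(sample_rows)
for code, n in counts.most_common():
    print(f"{code}\t{n}")
```

With a real copy loaded, passing the loaded split in place of sample_rows should give per-court totals comparable to the Rows column.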

Data Fields

Field | Type | Description
dataset | string | Abbreviation identifying the court/tribunal (see above)
citation_en / citation_fr | string | Neutral citation in English / French
citation2_en / citation2_fr | string | Secondary citation(s) where available
name_en / name_fr | string | Style of cause
document_date_en / document_date_fr | datetime64[ns, UTC] | Decision date
url_en / url_fr | string | Source URL for the official online version
scraped_timestamp_en / scraped_timestamp_fr | datetime64[ns, UTC] | Timestamp when the page was scraped
unofficial_text_en / unofficial_text_fr | string | Full unofficial text of the decision
cases_cited_en / cases_cited_fr | list[string] | Neutral citations referenced in the decision text (outbound)
cases_citing_en / cases_citing_fr | list[string] | Cases in the corpus whose text cites this decision (inbound)
citing_cases_count | int64 | Number of distinct cases in the corpus that cite this decision
upstream_license | string | License terms from the source court/tribunal

Missing values are represented as empty strings ("") or NaN for the string and datetime fields, and as nulls (None / <NA>) for the citation network fields (cases_cited_*, cases_citing_*, citing_cases_count).
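
Since several different markers can signal a missing value, downstream code may find it convenient to collapse them into one. The sketch below is one possible approach for rows handled as plain dictionaries. The helper names is_missing and clean_row are hypothetical, the sample citation is made up, and treating whitespace-only strings as missing is an assumption of this sketch rather than documented behaviour. It does not cover pandas' <NA> marker, which would need pandas itself to detect.

```python
import math


def is_missing(value):
    """Return True for None, NaN, and empty (or whitespace-only) strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return False


def clean_row(row):
    """Return a copy of the row with every missing marker replaced by None."""
    return {key: (None if is_missing(value) else value) for key, value in row.items()}


# Toy row; the citation string is made up for illustration.
sample = {
    "citation_en": "2020 EXAMPLE 1",
    "citation_fr": "",
    "name_fr": float("nan"),
    "cases_cited_en": None,
    "citing_cases_count": 0,  # zero is a real value, not a missing one
}

cleaned = clean_row(sample)
print(cleaned)
```

Note that an empty list and a count of zero are deliberately left untouched, so "no citations found" stays distinguishable from "citation data not available".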

Citation Network

Case law rows include automatically extracted citation network data in the cases_cited_*, cases_citing_*, and citing_cases_count fields described under Data Fields.
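
One way to use the citation fields is to turn the outbound lists into an edge list and tally inbound citations from it. The sketch below assumes rows are dictionaries with citation_en and cases_cited_en populated as described under Data Fields. The function names citation_edges and inbound_counts are hypothetical, the citations in the sample are made up, and dropping duplicate and self-citations is a choice made for this sketch only.

```python
from collections import Counter


def citation_edges(rows, lang="en"):
    """Yield (citing, cited) pairs built from the outbound citation lists."""
    cite_key = f"citation_{lang}"
    cited_key = f"cases_cited_{lang}"
    for row in rows:
        source = row.get(cite_key)
        targets = row.get(cited_key)
        # Skip rows without a usable citation or without citation data.
        if not isinstance(source, str) or not source or targets is None:
            continue
        # Deduplicate targets and ignore self-citations (sketch-level choices).
        for target in sorted(set(targets)):
            if target and target != source:
                yield (source, target)


def inbound_counts(rows, lang="en"):
    """Count how many distinct rows in `rows` cite each decision."""
    return Counter(cited for _, cited in citation_edges(rows, lang))


# Toy rows with made-up citations.
rows = [
    {"citation_en": "2020 EXAMPLE 1",
     "cases_cited_en": ["2019 EXAMPLE 5", "2019 EXAMPLE 5", "2018 EXAMPLE 2"]},
    {"citation_en": "2021 EXAMPLE 3",
     "cases_cited_en": ["2020 EXAMPLE 1", "2019 EXAMPLE 5"]},
    {"citation_en": "2022 EXAMPLE 9", "cases_cited_en": None},
]

edges = list(citation_edges(rows))
counts = inbound_counts(rows)
print(edges)
print(counts.most_common(3))
```

Counts computed this way cover only the rows passed in, so on a subset they may differ from the corpus-wide citing_cases_count field.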

Citation network limitations:

Data Languages

Where available, rows include both English and French texts. Where only one language is published, the fields for the other language are empty.
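
Because either language may be empty, code that needs the decision text may want a fallback. A minimal sketch, assuming rows are dictionaries with the unofficial_text_en and unofficial_text_fr fields; the helper name pick_text and the sample texts are illustrative only.

```python
def pick_text(row, preferred="en"):
    """Return (language, text) for the preferred language, else the other one."""
    other = "fr" if preferred == "en" else "en"
    for lang in (preferred, other):
        text = row.get(f"unofficial_text_{lang}")
        # Only non-empty strings count as available text.
        if isinstance(text, str) and text.strip():
            return lang, text
    return None, None


# Toy rows with placeholder texts.
bilingual = {"unofficial_text_en": "English text", "unofficial_text_fr": "Texte français"}
french_only = {"unofficial_text_en": "", "unofficial_text_fr": "Texte français"}
empty = {"unofficial_text_en": "", "unofficial_text_fr": None}

print(pick_text(bilingual))
print(pick_text(french_only))
print(pick_text(empty))
```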

Data Splits

All rows are provided in a single train split.


Data Loading

from datasets import load_dataset
import pandas as pd  # pandas must be installed for to_pandas() below

# Load decisions for a specific court / tribunal (e.g. Supreme Court of Canada)
cases = load_dataset("a2aj/canadian-case-law", data_dir="SCC", split="train")

# Alternatively, load the entire corpus:
# cases = load_dataset("a2aj/canadian-case-law", split="train")

# Convert to a pandas DataFrame
df = cases.to_pandas()

df.head(5)

The dataset is also offered in Parquet format for fast local use. Files are organised in subfolders, one per court or tribunal (for example, the SCC subfolder used as data_dir above).


Dataset Creation

Curation Rationale

Building on the RLL's earlier bulk-data programme, A2AJ is collecting and sharing Canadian legal data to:

We scrape data only where the terms of service of the websites hosting the data permit it. We also obtain some additional data directly from courts and tribunals.

Source Data & Normalisation

Cases are scraped directly from the official websites of the respective courts and tribunals, or are obtained directly from the tribunals through email or other distribution processes. Where possible, text is stored verbatim with minimal normalisation (e.g. HTML → plain text, whitespace cleanup).

Personal & Sensitive Information

Court and tribunal decisions can contain sensitive personal information. Although all documents are already public, easy bulk access increases privacy risk—particularly for refugees, criminal-justice-involved persons and other marginalised groups. Users who reproduce the data from this dataset are responsible for complying with relevant privacy laws, as well as other laws relating to disseminating information such as publication bans.

Non-Official Versions & Disclaimer

The texts here are unofficial copies. For authoritative versions, consult the URLs in url_en / url_fr.

Non-Affiliation / Endorsement

A2AJ and the production of this dataset are not affiliated with, nor endorsed by, the Government of Canada, provincial courts, or the listed tribunals.


Considerations for Using the Data


Licensing Information

The code that A2AJ used to create the dataset, and any work that A2AJ has undertaken on the dataset, are subject to the open source MIT license.

Users must also comply with upstream licenses found in the upstream_license field in the dataset for each document, which reflects the licenses through which the A2AJ obtained the document. These upstream licenses may include limits on commercial use, as well as other limitations.
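
Before reusing documents, it can help to see which upstream licenses are present in a selection. The sketch below groups citations by the upstream_license field, assuming rows are dictionaries. The function name group_by_license is hypothetical, and the license strings and citations in the sample are placeholders, not actual values from the dataset.

```python
from collections import defaultdict


def group_by_license(rows):
    """Map each upstream license string to the citations it applies to."""
    groups = defaultdict(list)
    for row in rows:
        license_text = row.get("upstream_license")
        # Treat empty or non-string values as an unknown license.
        if not isinstance(license_text, str) or not license_text.strip():
            license_text = "UNKNOWN"
        groups[license_text].append(row.get("citation_en") or row.get("citation_fr"))
    return dict(groups)


# Toy rows; license strings and citations are placeholders.
sample_rows = [
    {"citation_en": "2020 EXAMPLE 1", "upstream_license": "Placeholder licence A"},
    {"citation_en": "2021 EXAMPLE 3", "upstream_license": "Placeholder licence B"},
    {"citation_en": "", "citation_fr": "2022 EXEMPLE 9", "upstream_license": "Placeholder licence A"},
    {"citation_en": "2023 EXAMPLE 4", "upstream_license": ""},
]

by_license = group_by_license(sample_rows)
for license_text, citations in by_license.items():
    print(f"{license_text}: {len(citations)} document(s)")
```

Grouping only summarises what the field says. Whether a given use is permitted still depends on reading the license terms themselves.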

The A2AJ is committed to open source methodologies, and we are actively working to obtain more permissive licenses for this data.


Warranties / Representations

While we make best efforts to ensure the completeness and accuracy of the dataset, we provide no warranties regarding completeness or accuracy. The data were collected through automated processes and may contain errors. Always verify data against the official source.


Dataset Curators


Citation

Sean Rehaag & Simon Wallace, "A2AJ Canadian Case Law" (2025), online: Hugging Face Datasets https://huggingface.co/datasets/a2aj/canadian-case-law (updated 2026).


Acknowledgements

This research output is supported in part by funding from the Law Foundation of Ontario and the Social Sciences and Humanities Research Council of Canada, by in-kind compute from the Digital Research Alliance of Canada and by administrative support from the Centre for Refugee Studies, the Refugee Law Lab, and Osgoode Hall Law School. We also thank the Canadian judiciary and tribunal staff who publish decisions in open formats.
