a2aj/canadian-case-law · Datasets at Hugging Face
Last updated: 2026-06-14
Maintainer: Access to Algorithmic Justice (A2AJ)
Dataset Summary
The A2AJ Canadian Case Law dataset provides bulk, open-access full-text decisions from Canadian courts and tribunals. Each row corresponds to a single case and contains the English and French versions of the decision where both are publicly available.
Rows also include citation network fields (cases_cited_*, cases_citing_*, and citing_cases_count), enabling citation analysis, precedent mapping, and network-based research across the corpus. See Citation Network below for details.
The project builds on an earlier version that was maintained by the Refugee Law Lab (RLL) and is now maintained by A2AJ, a research project co-hosted by York University's Osgoode Hall Law School and Toronto Metropolitan University's Lincoln Alexander School of Law.
The dataset is intended to support empirical legal research, legal-tech prototyping, and language-model pre-training in the public interest—especially work that advances access to justice for marginalised and low-income communities.
Dataset Structure (~ 223k cases)
| Code | Court / Tribunal / Reporter | First document - Last document | Rows |
|---|---|---|---|
| SCC | Supreme Court of Canada | 1877-01-15 – 2026-06-12 | 10,884 |
| FCA | Federal Court of Appeal | 2001-02-01 – 2026-06-11 | 7,769 |
| BCCA | British Columbia Court of Appeal | 1999-01-04 – 2026-06-12 | 14,590 |
| ONCA | Ontario Court of Appeal | 1998-06-08 – 2026-06-12 | 23,899 |
| NSCA | Nova Scotia Court of Appeal | 1993-01-04 – 2026-06-09 | 4,724 |
| YKCA | Yukon Court of Appeal | 2000-05-15 – 2026-04-27 | 275 |
| FC | Federal Court | 2001-02-01 – 2026-06-12 | 35,598 |
| TCC | Tax Court of Canada | 2003-01-21 – 2026-06-05 | 8,066 |
| CMAC | Court Martial Appeal Court | 2001-01-19 – 2026-05-19 | 154 |
| BCSC | Supreme Court of British Columbia | 2000-01-04 – 2026-06-11 | 51,854 |
| NSSC | Nova Scotia Supreme Court | 2001-01-04 – 2026-06-12 | 9,175 |
| NSPC | Nova Scotia Provincial Court | 2001-01-15 – 2026-05-26 | 1,599 |
| NSFC | Nova Scotia Family Court | 2001-02-02 – 2023-11-06 | 323 |
| NSSM | Nova Scotia Small Claims Court | 2001-08-30 – 2026-03-20 | 1,648 |
| CHRT | Canadian Human Rights Tribunal | 2003-01-10 – 2026-06-10 | 1,151 |
| CIRB | Canada Industrial Relations Board | 1995-12-08 – 2026-05-01 | 1,171 |
| CITT | Canadian International Trade Tribunal | 1980-01-01 – 2026-06-10 | 5,411 |
| CT | Competition Tribunal | 2000-02-17 – 2026-06-05 | 624 |
| FPSLREB | Federal Public Sector Labour Relations and Employment Board | 2003-01-03 – 2026-06-10 | 3,429 |
| OHSTC | Occupational Health and Safety Tribunal Canada | 1992-01-09 – 2025-03-06 | 811 |
| OIC | Information Commissioner of Canada | 2019-08-26 – 2026-04-09 | 325 |
| PSDPT | Public Service Disclosure Protection Tribunal | 2011-06-10 – 2025-05-21 | 29 |
| RAD | Refugee Appeal Division (IRB) | 2013-02-19 – 2025-09-12 | 14,153 |
| RPD | Refugee Protection Division (IRB) | 2002-07-16 – 2020-12-14 | 6,729 |
| RLLR | Refugee Law Lab Reporter (RPD, IRB) | 2019-01-07 – 2024-12-13 | 927 |
| SST | Social Security Tribunal | 2013-03-08 – 2026-05-27 | 17,729 |
Note: Counts are approximate and will drift as the dataset is updated.
Data Fields
| Field | Type | Description |
|---|---|---|
| dataset | string | Abbreviation identifying the court/tribunal (see above) |
| citation_en / citation_fr | string | Neutral citation in English / French |
| citation2_en / citation2_fr | string | Secondary citation(s) where available |
| name_en / name_fr | string | Style of cause |
| document_date_en / document_date_fr | datetime64[ns, UTC] | Decision date |
| url_en / url_fr | string | Source URL for the official online version |
| scraped_timestamp_en / scraped_timestamp_fr | datetime64[ns, UTC] | Timestamp when the page was scraped |
| unofficial_text_en / unofficial_text_fr | string | Full unofficial text of the decision |
| cases_cited_en / cases_cited_fr | list[string] | Neutral citations referenced in the decision text (outbound) |
| cases_citing_en / cases_citing_fr | list[string] | Cases in the corpus whose text cites this decision (inbound) |
| citing_cases_count | int64 | Number of distinct cases in the corpus that cite this decision |
| upstream_license | string | License terms from the source court/tribunal |
Missing values are represented as empty strings ("") or NaN for the string and datetime fields, and as nulls (None / <NA>) for the citation network fields (cases_cited_*, cases_citing_*, citing_cases_count).
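Because missing values take several forms, a single check can be convenient when working with rows outside pandas. The following is a minimal sketch, not part of the dataset tooling: the helper name `is_missing` and the sample rows are hypothetical, and pandas' `<NA>` marker is not covered here (inside pandas, `pd.isna` is the usual choice for scalar values).

```python
import math

def is_missing(value):
    """Return True for the missing-value markers described above."""
    if value is None:                      # nulls in the citation network fields
        return True
    if isinstance(value, str):             # empty strings in string fields
        return value == ""
    if isinstance(value, float):           # NaN in string/datetime fields
        return math.isnan(value)
    return False                           # anything else (including []) is present

# Hypothetical rows, invented for illustration only.
rows = [
    {"citation_en": "2020 SCC 5", "unofficial_text_fr": "",
     "cases_cited_en": ["2019 FCA 12"], "citing_cases_count": 3},
    {"citation_en": "2019 FCA 12", "unofficial_text_fr": float("nan"),
     "cases_cited_en": None, "citing_cases_count": None},
]

# For each row, list the fields whose value counts as missing.
missing_by_row = [sorted(k for k, v in row.items() if is_missing(v)) for row in rows]
print(missing_by_row)
```

Note that an empty list is treated as present: a decision with text but no detected citations is different from a decision with no text in that language.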
Citation Network
Case law rows include automatically extracted citation network data:
- cases_cited (outbound): Unique neutral citations (e.g., 2020 SCC 5) detected in the decision's full text, listed in order of first appearance, with self-citations excluded. Cited cases are included whether or not they appear in the corpus (e.g., citations to decisions of courts we do not cover). The field is null when there is no text in that language.
- cases_citing (inbound): Cases in the corpus whose full text cites this decision's neutral citation. Inbound citations can only be detected for decisions that themselves have a neutral citation, so this field (and citing_cases_count) is null for decisions identified only by docket-style citations (e.g., CITT/TCCE dockets) or other non-neutral citations.
- citing_cases_count: The number of distinct cases in the corpus that cite this decision. A case citing both the English and French versions of a decision is counted once.
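The relationship between the outbound and inbound fields can be illustrated by recomputing inbound counts from outbound lists. This is a rough sketch under stated assumptions, not the method used to build the dataset: the function `inbound_counts` and the sample rows are hypothetical, the French citation forms are assumed for illustration, and every sample row is assumed to have an English neutral citation.

```python
def inbound_counts(rows):
    """Count, for each row, the distinct other rows citing it in either language."""
    # Union each row's outbound citations across languages; None means "no text".
    cited = []
    for row in rows:
        refs = set(row.get("cases_cited_en") or []) | set(row.get("cases_cited_fr") or [])
        cited.append(refs)

    counts = {}
    for i, target in enumerate(rows):
        # A target can be cited by its English or its French neutral citation.
        own = {c for c in (target.get("citation_en"), target.get("citation_fr")) if c}
        # Each citing row contributes at most 1, even if it cites both versions.
        counts[target["citation_en"]] = sum(
            1 for j, refs in enumerate(cited) if j != i and own & refs
        )
    return counts

# Hypothetical rows, invented for illustration only.
rows = [
    {"citation_en": "2020 SCC 5", "citation_fr": "2020 CSC 5",
     "cases_cited_en": [], "cases_cited_fr": []},
    {"citation_en": "2021 FC 100", "citation_fr": "2021 CF 100",
     "cases_cited_en": ["2020 SCC 5"], "cases_cited_fr": ["2020 CSC 5"]},
    {"citation_en": "2022 FCA 7", "citation_fr": "2022 CAF 7",
     "cases_cited_en": ["2020 SCC 5", "2021 FC 100"], "cases_cited_fr": None},
]

counts = inbound_counts(rows)
print(counts)  # {'2020 SCC 5': 2, '2021 FC 100': 1, '2022 FCA 7': 0}
```

The second row cites the first decision in both languages but is counted once, matching the deduplication described for citing_cases_count. On a subset of the corpus, counts recomputed this way will generally be lower than the published citing_cases_count, since citing rows outside the subset are not seen.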
Citation network limitations:
- Extraction is based on pattern matching of neutral citations only. Citations using traditional reporters (e.g., [1999] 2 S.C.R. 817) are not captured, so decisions pre-dating neutral citations (generally pre-2000) are under-represented in the network, as are tribunals and courts that do not use neutral citations (e.g., CITT/TCCE).
- Inbound citation data (cases_citing, citing_cases_count) reflects only the corpus: citations from courts, tribunals, or time periods not covered by our datasets are not counted.
- Text extraction artifacts (e.g., OCR or formatting issues) may cause occasional missed or spurious citations.
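To make the first limitation concrete, here is a simplified sketch of neutral-citation pattern matching. It is not the extractor used to build the dataset: the regular expression `NEUTRAL`, the function `extract_neutral_citations`, and the sample text are assumptions chosen for illustration, and a real extractor would likely need to handle more citation forms.

```python
import re

# Assumed simplified shape of a neutral citation: year, uppercase court code, number.
NEUTRAL = re.compile(r"\b((?:18|19|20)\d{2})\s+([A-Z]{2,10})\s+(\d+)\b")

def extract_neutral_citations(text, own_citation=None):
    """Return unique neutral citations in order of first appearance, minus self-citations."""
    seen = []
    for year, court, number in NEUTRAL.findall(text):
        citation = f"{year} {court} {number}"   # normalise internal whitespace
        if citation != own_citation and citation not in seen:
            seen.append(citation)
    return seen

# Hypothetical decision text, invented for illustration only.
text = (
    "As held in Baker v. Canada, [1999] 2 S.C.R. 817, and in 2020 SCC 5 at para 10; "
    "see also 2019 FCA 12 and again 2020 SCC 5. This decision is 2021 FC 100."
)

found = extract_neutral_citations(text, own_citation="2021 FC 100")
print(found)  # ['2020 SCC 5', '2019 FCA 12']
```

The reporter citation [1999] 2 S.C.R. 817 is skipped because it does not fit the year, court code, number shape, which is why older decisions tend to be under-represented. A pattern this loose can also match unrelated text that happens to have the same shape, one source of the spurious citations mentioned above.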
Data Languages
Where available, rows include both English and French texts. Where only one language is published, the fields for the other language are empty.
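Language availability can be checked per row from the text fields. The sketch below is illustrative only: the function `language_status`, its labels, and the sample rows are hypothetical, and a field is treated as present only when it holds a non-blank string (so empty strings, None, and NaN all count as absent).

```python
from collections import Counter

def has_text(value):
    """True only for non-blank strings; NaN and None are not strings."""
    return isinstance(value, str) and value.strip() != ""

def language_status(row):
    """Classify a row by which language versions of the text are present."""
    has_en = has_text(row.get("unofficial_text_en"))
    has_fr = has_text(row.get("unofficial_text_fr"))
    if has_en and has_fr:
        return "both"
    if has_en:
        return "en_only"
    if has_fr:
        return "fr_only"
    return "none"

# Hypothetical rows, invented for illustration only.
rows = [
    {"unofficial_text_en": "Reasons for judgment ...", "unofficial_text_fr": "Motifs du jugement ..."},
    {"unofficial_text_en": "Reasons for decision ...", "unofficial_text_fr": ""},
    {"unofficial_text_en": None, "unofficial_text_fr": "Motifs de la décision ..."},
]

status_counts = Counter(language_status(row) for row in rows)
print(dict(status_counts))  # {'both': 1, 'en_only': 1, 'fr_only': 1}
```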
Data Splits
All rows are provided in a single train split.
Data Loading
from datasets import load_dataset
import pandas as pd
# load decisions for a specific court / tribunal (e.g. Supreme Court of Canada)
cases = load_dataset("a2aj/canadian-case-law", data_dir = "SCC", split="train")
## ALTERNATIVELY
## load the entire corpus
# cases = load_dataset("a2aj/canadian-case-law", split="train")
# convert to df
df = cases.to_pandas()
df.head(5)
The dataset is also offered in Parquet format for fast local use. Files are organised in subfolders named for each court or tribunal (e.g. SCC, as in the data_dir argument above).
Dataset Creation
Curation Rationale
Building on the RLL's earlier bulk-data programme, A2AJ is collecting and sharing Canadian legal data to:
- democratise access to Canadian jurisprudence;
- enable large-scale empirical legal studies; and
- support responsible AI development for the justice sector.
We scrape data only where we are permitted to do so by terms of service of websites that host the data. We also obtain some additional data directly from courts and tribunals.
Source Data & Normalisation
Cases are scraped directly from the official websites of the respective courts and tribunals, or are obtained directly from the tribunals through email or other distribution processes. Where possible, text is stored verbatim with minimal normalisation (e.g. HTML → plain text, whitespace cleanup).
Personal & Sensitive Information
Court and tribunal decisions can contain sensitive personal information. Although all documents are already public, easy bulk access increases privacy risk—particularly for refugees, criminal-justice-involved persons and other marginalised groups. Users who reproduce the data from this dataset are responsible for complying with relevant privacy laws, as well as other laws relating to disseminating information such as publication bans.
Non-Official Versions & Disclaimer
The texts here are unofficial copies. For authoritative versions, consult the URLs in url_en / url_fr.
Non-Affiliation / Endorsement
A2AJ and the production of this dataset are not affiliated with, nor endorsed by, the Government of Canada, provincial courts, or the listed tribunals.
Considerations for Using the Data
- Social Impact. Open legal data can reduce information asymmetries but also facilitate surveillance or discriminatory profiling. We encourage downstream users to collaborate with community organisations and ensure that derivative tools advance—rather than undermine—access to justice.
- Bias & Representativeness. Published decisions are not a random sample of disputes. For example, positive administrative decisions are less likely to be appealed and thus under-represented in court records. Models trained on this corpus may therefore skew negative.
- Limitations. The dataset excludes annexes, schedules and appendices that are sometimes attached as separate PDFs. Long historical gaps exist for some courts (e.g. ONCA coverage begins in 1998, per the table above). Citation network fields are extracted automatically and are approximate (see Citation Network above).
Licensing Information
The code that A2AJ used to create the dataset, and any work that A2AJ has undertaken on the dataset, are subject to the open-source MIT license.
Users must also comply with upstream licenses found in the upstream_license field in the dataset for each document, which reflects the licenses through which the A2AJ obtained the document. These upstream licenses may include limits on commercial use, as well as other limitations.
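Before reusing documents, it may help to tally the upstream_license values and keep only rows whose terms fit the intended use. This is a minimal sketch: the license strings, the `allowed` set, and the sample rows are placeholders invented for illustration, not actual values from the dataset, and deciding which licenses permit a given use remains the user's responsibility.

```python
from collections import Counter

# Hypothetical rows; the license strings are placeholders, not real dataset values.
rows = [
    {"citation_en": "2020 SCC 5", "upstream_license": "Placeholder license A (non-commercial)"},
    {"citation_en": "2021 FC 100", "upstream_license": "Placeholder license B (open)"},
    {"citation_en": "2022 FCA 7", "upstream_license": "Placeholder license B (open)"},
]

# Tally how many documents fall under each upstream license.
license_counts = Counter(row["upstream_license"] for row in rows)

# Keep only documents whose license the user has reviewed and accepted.
allowed = {"Placeholder license B (open)"}
reusable = [row["citation_en"] for row in rows if row["upstream_license"] in allowed]

print(license_counts.most_common())
print(reusable)  # ['2021 FC 100', '2022 FCA 7']
```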
The A2AJ is committed to open source methodologies, and we are actively working to obtain more permissive licenses for this data.
Warranties / Representations
While we make best efforts to ensure the completeness and accuracy of the dataset, we provide no warranties regarding completeness or accuracy. The data were collected through automated processes and may contain errors. Always verify data against the official source.
Dataset Curators
- Sean Rehaag - Co-Director, A2AJ
- Simon Wallace - Co-Director, A2AJ
- Contact: a2aj@yorku.ca
Citation
Sean Rehaag & Simon Wallace, "A2AJ Canadian Case Law" (2025), online: Hugging Face Datasets https://huggingface.co/datasets/a2aj/canadian-case-law (updated 2026).
Acknowledgements
This research output is supported in part by funding from the Law Foundation of Ontario and the Social Sciences and Humanities Research Council of Canada, by in-kind compute from the Digital Research Alliance of Canada and by administrative support from the Centre for Refugee Studies, the Refugee Law Lab, and Osgoode Hall Law School. We also thank the Canadian judiciary and tribunal staff who publish decisions in open formats.