Social Security Tribunals Bulk Decisions Dataset

Description: This is a bulk open-access dataset, available in JSON, Parquet and Hugging Face dataset formats, containing the full text of Social Security Tribunal of Canada (SST) decisions. A description of how the data is processed, along with code snippets for loading it, is available in a repository on the Refugee Law Lab GitHub.

Data: https://github.com/Refugee-Law-Lab/sst_bulk_data/blob/master/DATA/yearly

Code Repository: https://github.com/Refugee-Law-Lab/sst_bulk_data

Current Coverage: 2013-Present

Number of Decisions: ~26,500

Languages: English & French

Format: JSON, Parquet, Hugging Face Dataset

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Citation: Sean Rehaag, “SST Bulk Decisions Dataset” (2023), online: Refugee Law Laboratory https://refugeelab.ca/bulk-data/sst

Programmatic Access in Python (via Hugging Face Datasets):

from datasets import load_dataset
import pandas as pd

dataset = load_dataset("refugee-law-lab/canadian-legal-data", "SST", split="train")

# convert to dataframe
df = pd.DataFrame(dataset)
df
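
Since the dataset covers both English and French decisions, a quick tally by language can be a useful first check after loading. The following is a minimal sketch, not part of the original snippets. It assumes each row is a dictionary with a field named "language" (the field name is an assumption here, so check the actual column names first), and the helper `count_by_field` is hypothetical.

```python
from collections import Counter


def count_by_field(rows, field="language"):
    """Tally rows by the value of one field.

    `rows` is any iterable of dictionaries. The default field name
    "language" is an assumption about the dataset schema; rows that
    lack the field are counted under None.
    """
    return Counter(row.get(field) for row in rows)


# Small in-memory example; no download needed.
sample_rows = [
    {"language": "en"},
    {"language": "fr"},
    {"language": "en"},
    {},
]
counts = count_by_field(sample_rows)
```

The same function should work on anything that yields one dictionary per decision, such as the `results` list built from the yearly JSON files.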

Programmatic Access in Python (via Parquet):

import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/SST/train.parquet'

# load data
results = requests.get(url)

# convert to dataframe
df = pd.read_parquet(BytesIO(results.content))
df
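
Because the Parquet file holds the full text of every decision, it may be worth keeping a local copy rather than downloading it on every run. The following is a minimal sketch of that idea, not part of the original snippets. The helpers `fetch_bytes` and `cached_download` are hypothetical, and the sketch uses the standard library's `urllib` in place of `requests` only to stay dependency-free.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url, timeout=60):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def cached_download(url, cache_path, fetch=fetch_bytes):
    """Return cache_path as a Path, downloading from url only if the file is absent.

    `fetch` is injectable so the caching logic can be exercised
    without network access.
    """
    cache_path = Path(cache_path)
    if not cache_path.exists():
        # Create the parent folder on first use, then store the download.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_bytes(fetch(url))
    return cache_path
```

With this in place, the dataframe could be loaded with something like `pd.read_parquet(cached_download(url, 'data/sst_train.parquet'))`, where the local path is an arbitrary example. Delete the cached file to force a fresh download when the dataset is updated.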

Programmatic Access in Python (JSON via GitHub):

import pandas as pd
import requests
import json

start_year = 2013 # First year of data sought (2013+)
end_year = 2023 # Last year of data sought (2023 -)
base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/sst_bulk_data/master/DATA/YEARLY/'

# Download each yearly JSON file and collect all decisions in one list
results = []
for year in range(start_year, end_year+1):
    url = base_url + f'{year}.json'
    results.extend(requests.get(url).json())

df = pd.DataFrame(results)
df
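
The yearly loop above stops with an error if any single request fails. The following is a minimal sketch of a more defensive variant, not part of the original snippets. It assumes that a year with no published file would surface as an HTTP 404, which has not been verified against the repository. The helpers `yearly_urls`, `fetch_json` and `load_years` are hypothetical, and the standard library's `urllib` stands in for `requests` only to keep the sketch dependency-free.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

# Same base URL as in the snippet above.
BASE_URL = "https://raw.githubusercontent.com/Refugee-Law-Lab/sst_bulk_data/master/DATA/YEARLY/"


def yearly_urls(start_year, end_year, base_url=BASE_URL):
    """Return one JSON URL per year, inclusive of both endpoints."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return [f"{base_url}{year}.json" for year in range(start_year, end_year + 1)]


def fetch_json(url, timeout=30):
    """Download one URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_years(start_year, end_year, fetch=fetch_json):
    """Collect records for a range of years.

    Returns (records, missing), where `missing` lists the URLs that
    answered with HTTP 404. Any other HTTP error is re-raised.
    """
    records, missing = [], []
    for url in yearly_urls(start_year, end_year):
        try:
            records.extend(fetch(url))
        except HTTPError as err:
            if err.code != 404:
                raise
            missing.append(url)
    return records, missing
```

The returned `records` list can be passed to `pd.DataFrame` exactly as before, and `missing` shows which years, if any, were skipped.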