LegalVis: Exploring and Inferring Precedent Citations in Legal Documents

Using textual similarity to identify legal precedents: appraising machine learning models for administrative courts

Precedent is the cornerstone of the Common law system. Even in jurisdictions that follow Civil law, precedents constrain decisions when case law is sufficiently uniform. A systematic disregard of precedents makes judgments less coherent and the law less just. Relying on precedents can also make courts more efficient, and recent advances in natural language processing (NLP) and machine learning (ML) open the door to automated and reliable identification of similar cases. In this study, we investigated more than a hundred combinations of document representations and textual vectorization models to assess whether pairs of cases identified by the machine satisfy the human notion of similarity. To date, analogous models have been evaluated using tiny validation samples. We used a statistically significant sample evaluated by legal experts from an administrative court in Brazil, constituting a gold standard sample. We also propose using evaluation metric...

Automatic identification of similar judicial precedents

Anais do X Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2022)

The Brazilian Code of Civil Procedure was reformulated in 2015 and created new institutes of judicial precedents that allow the Courts of Appeal to decide similar cases based on one main case, which is considered the paradigm for similar cases that remain suspended. This mechanism aims to avoid legal uncertainty in the lower courts, but the uncertainty can shift to the Courts of Appeal, since different courts can judge similar legal matters in opposite ways. The identification of similar judicial cases is hard because Courts of Appeal work independently and the number of cases is high. We propose the use of computational intelligence techniques to automatically identify similar judicial precedents. Our hypothesis is that algorithms based on semantic approaches, such as Latent Semantic Indexing and Latent Dirichlet Allocation, perform better than those that use only a syntactic approach, such as the (Okapi) BM25 ranking function. The best-performing model is extended with named entities ...
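
As an illustration of the syntactic baseline named in this abstract, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the authors' implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant; the "+ 1" keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (invented for the example), already tokenized by whitespace.
docs = [
    "appeal on tax refund denied".split(),
    "tax refund granted on appeal".split(),
    "contract dispute over lease".split(),
]
scores = bm25_scores("tax refund".split(), docs)
# Document indices ordered from most to least relevant to the query.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

BM25 only rewards exact term overlap, which is why the abstract contrasts it with semantic models such as LSI and LDA: a case that discusses the same matter in different vocabulary would score zero here.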

Interpretable Approach in the Classification of Sequences of Legal Texts

arXiv (Cornell University), 2020

The objective of this paper is to develop predictive models to classify Brazilian legal proceedings into three possible status classes: (i) archived proceedings, (ii) active proceedings, and (iii) suspended proceedings. Solving this problem is intended to assist public and private institutions in managing large portfolios of legal proceedings, providing gains in scale and efficiency. In this paper, legal proceedings are made up of sequences of short texts called "motions." We combined several natural language processing (NLP) and machine learning techniques to solve the problem. Although working with Portuguese NLP can be challenging due to a lack of resources, our approaches performed remarkably well in the classification task, achieving a maximum accuracy of .93 and top average F1 scores of .89 (macro) and .93 (weighted). Furthermore, we could extract and interpret the patterns learned by one of our models, as well as quantify how those patterns relate to the classification task. The interpretability step is important among machine learning legal applications and gives us an exciting insight into how black-box models make decisions.
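
The abstract reports both macro and weighted F1 averages. The sketch below shows how the two averages differ on an imbalanced three-class problem. The helper names and the toy labels are assumptions made for illustration; they are not the paper's data or code.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its one-vs-rest F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1 = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1[label] = 2 * tp / denom if denom else 0.0
    return f1


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 weights classes equally; weighted F1 weights them by support."""
    f1 = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(f1)
    weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)
    return macro, weighted


# Invented, imbalanced toy labels using the three statuses from the abstract.
y_true = ["archived"] * 6 + ["active"] * 3 + ["suspended"] * 1
y_pred = ["archived"] * 6 + ["active"] * 2 + ["archived"] + ["active"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
```

In this toy case the rare "suspended" class is never predicted correctly, so the macro average drops well below the weighted one. A gap between the two figures, as in the reported .89 and .93, is generally a sign that performance is uneven across classes of different sizes.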

Automation of legal precedents retrieval: findings from a rapid literature review

Judges frequently base their reasoning on precedents. In every circumstance, courts must preserve uniformity in case law and, depending on the legal system, previous cases compel rulings. The search for methods to accurately identify similar previous cases is not new and has been a vital input, for example, to case-based reasoning (CBR) methodologies. Innovations in language processing and machine learning (ML) brought momentum to identifying precedents while providing tools for automating this task. This rapid literature review investigated how research on the identification of legal precedents has evolved. It also examined the most promising automation strategies for this task and confirmed the growing interest in using artificial intelligence for legal precedents retrieval. The findings demonstrate that no artificial intelligence solution currently stands out as the most effective at finding past similar cases. Also, existing results require validation with statistically signific...

Mining the Harvard Caselaw Access Project

SSRN Electronic Journal, 2020

This Essay illustrates how machine learning can disrupt legal scholarship through the algorithmic extraction and analysis of big data. Specifically, we utilize data from Harvard Law School's Caselaw Access Project, which has digitized all U.S. case law, to model how courts tackle two thorny questions in antitrust: the measure of market power and the balance between antitrust and regulation. We have built a machine learning platform that can analyze large datasets through algorithmic topic modeling, a form of natural language processing. The platform creates visualizations that depict how thousands of market power and antitrust-regulation cases cluster around different terms, as well as how these clusters have evolved over time. Ultimately, our project aims to push information technology firms such as Westlaw and Lexis, as well as their insurgent challengers, to keep legal research transparent and cost-effective.

Recognizing cited facts and principles in legal judgements

Artificial Intelligence and Law, 2017

In common law jurisdictions, legal professionals cite facts and legal principles from precedent cases to support their arguments before the court for their intended outcome in a current case. This practice stems from the doctrine of stare decisis, where cases that have similar facts should receive similar decisions with respect to the principles. It is essential for legal professionals to identify such facts and principles in precedent cases, though this is a highly time-intensive task. In this paper, we present studies that demonstrate that human annotators can achieve reasonable agreement on which sentences in legal judgements contain cited facts and principles (respectively, κ = 0.65 and κ = 0.95 for inter- and intra-annotator agreement). We further demonstrate that it is feasible to automatically annotate sentences containing such legal facts and principles in a supervised machine learning framework based on linguistic features, reporting per category precision and recall figures of between 0.79 and 0.89 for classifying sentences in legal judgements as cited facts, principles or neither using a Bayesian classifier, with an overall κ of 0.72 with the human-annotated gold standard.
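
The agreement figures in this abstract are kappa statistics. The abstract does not say which kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators. The function name and the toy annotations are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented sentence-level annotations using the abstract's three categories.
annotator_a = ["fact", "fact", "principle", "neither",
               "neither", "principle", "fact", "neither"]
annotator_b = ["fact", "principle", "principle", "neither",
               "neither", "principle", "fact", "fact"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 6 of 8 sentences (0.75 raw agreement), but kappa is lower because part of that agreement would be expected by chance. This correction is why kappa is usually preferred over raw agreement when judging annotation quality.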

Challenges in Machine Understanding of Legal Text

2020

The development of good models for representing legal text in order to make it suitable for machine understanding, and of models that incorporate human legal expertise into automatic tools, still poses great difficulties. In this research, we tackled the specific tasks of (a) creating a structured body of court judgments by annotating them with key markup, legal citations and legal terms and (b) classifying court judgments according to the specific legal points. We document the creation of a corpus of Malawi criminal judgments (MWCC) and highlight opportunities and challenges in constructing a machine understanding of this text. We developed a pipeline which takes scanned images of criminal court judgments and creates structured documents in TEI format containing markup such as case name, case number, parties, coram and annotations of references to laws and other court cases which can be hyperlinked. We discuss the possibility of using these annotations and the Internationa...

Solon: A Holistic Approach for Modelling, Managing and Mining Legal Sources

Algorithms

Recently there has been an exponential growth in the number of publicly available legal resources. Portals allowing users to search legal documents through keyword queries are now widespread. However, legal documents are mainly stored and offered in different sources and formats that do not facilitate semantic machine-readable techniques, thus making it difficult for legal stakeholders to acquire, modify or interlink legal knowledge. In this paper, we describe Solon, a legal document management platform. It offers advanced modelling, managing and mining functions over legal sources, so as to facilitate access to legal knowledge. It utilizes a novel method for extracting semantic representations of legal sources from unstructured formats, such as PDF and HTML text files, interlinking and enhancing them with classification features. At the same time, utilizing the structure and specific features of legal sources, it provides refined search results. Finally, it allows users to connect a...

Using Information Technology to Examine the Communication of Precedent: Initial Findings and Lessons From the CITE-IT Project

2005

The CITE-IT project employs information technologies in innovative ways to investigate the development and dissemination of precedent in the American legal system, based on a study of the issue of "regulatory takings." This manuscript describes the initial phases of this multidisciplinary project, specifically the methodologies we have developed to identify the corpus on which the study itself will build: all federal-level regulatory takings decisions following the 1978 Penn Central Supreme Court decision. While a comprehensive, clearly identified collection of decisions, pertinent to a single area of law, presents a great resource for legal scholars, defining such a collection is in fact quite challenging, and has rarely (if ever) been attempted. Using a combination of conventional research techniques and computer automation, we identified 2,780 decisions, triangulating across multiple search approaches to identify "best candidates" for the pool. By exploiting formatting patterns across these decision texts, we then automatically extracted additional data from each (e.g., formal citation, court location, date, and prior decisions cited), which we then converted to graphical form (as well as more formal metrics for further analysis). This manuscript describes these processes, as well as a review of the scholarship on precedent and citation analysis, and a summary of the history of regulatory takings. We conclude with our future research goals, including expanding this pool to include all federal cases since 1922, as well as all relevant decisions handed down by the state supreme courts.