Reference Extraction and Resolution for Legal Texts

Dealing with automatic reference extraction in the legal domain digital libraries

Keywords: references, information extraction, legal texts. Abstract: We present an application of information extraction to the legal domain. The goal is to automate the extraction of references from legal documents by analyzing their content. Information about the extracted references is stored and used by services offered in digital libraries. The work ranges from analysis of the legal domain to software implementation, together with some experiments. It is carried out in collaboration with legal experts.

Automated Detection of Reference Structures in Law

2006

Combining legal content stores from different providers usually costs considerable time, effort and money, because the links between different parts of the constituent sources within those stores are typically hard-wired. In practice, users of legal content face vendor lock-in and have to find workarounds when they want to combine their own content with content provided by others. In the BSN project we developed a parser that enables the creation of a referential structure on top of a legal content store. We empirically tested the parser's effectiveness and found an accuracy of over 95%, even for complex references.

Extraction of Legal Documents for Assistance to Lawyers

Court proceedings generate many documents, most of them as digital copies or text files. Many are legally relevant and complex, which makes them difficult to understand. They are written in natural language for lawyers, which makes them hard to process by computer and therefore hard to analyze further. These documents hold an abundance of data in unstructured form. We describe an information extraction and retrieval system that extracts data and retrieves relevant information about prior cases from a database, based on a query supplied by the user. The input is one or more keywords, and the output is a relationship between the query and all the case documents. The system combines information retrieval, information extraction and natural language processing techniques such as term frequency, inverse document frequency and cosine similarity to rank case documents against the query. The goal is to ease the work of professionals who must process large volumes of documents. The resulting gain in productivity should benefit both legal professionals and non-specialists who deal with textual and legal issues.
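
The ranking step named in the abstract above (term frequency, inverse document frequency, cosine similarity) can be illustrated with a minimal sketch. This is not the authors' system: the function names, the tokenizer, the smoothed IDF formula and the sample case texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_cases(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection carry no weight and are dropped.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_total = sum(q_counts.values()) or 1
    q_vec = {t: (c / q_total) * idf[t] for t, c in q_counts.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical case summaries, invented for this example.
cases = [
    "The tenant failed to pay rent and the landlord sought eviction.",
    "The driver was charged with negligent driving after the collision.",
    "The landlord withheld the deposit after the tenant moved out.",
]
for score, index in rank_cases("tenant eviction rent", cases):
    print(f"{score:.3f}  {cases[index]}")
```

A document that shares no term with the query scores exactly zero, so a real system would likely add stemming, stop-word handling or synonym expansion before this step.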

Problems of automatic processing and analysis of information from legal texts

2012

In this paper, problems of digitizing legal information are investigated. Conditions for extracting information from legal texts, in relation to the processing of common (non-legal) terms, are outlined. Sample results of a similarity analysis are presented. Further research aimed at the semantic analysis of legal texts is outlined.

Automatic extraction of semantics in law documents

Proceedings of the V …, 2007

In this paper we address the problem of automatically enriching legal texts with semantic annotation, an essential pre-requisite to effective indexing and retrieval of legal documents. This is done through illustration of a computational system developed for automated semantic annotation of (Italian) law texts. This tool is an incremental system using Natural Language Processing techniques to perform two tasks: i) classify law paragraphs according to their regulatory content, and ii) extract relevant text fragments corresponding to specific semantic roles that are relevant for the different types of regulatory content. The paper sketches the overall architecture of the tool and reports results of a preliminary case study on a sample of Italian law texts.

Information Extraction from Legal Documents Using Linguistic Knowledge and Ontologies

Information extraction from legal texts is an important part of a broader set of enabling tools that help users access relevant information. Existing approaches struggle with the proper treatment of certain aspects of the text. Knowledge acquisition rules, based on the linguistic treatment of specific aspects of legal documents, would be useful for improving results in this task. Additionally, domain knowledge representation can provide an even broader set of possibilities. This paper presents a model for information extraction from legal-domain texts that takes both of these aspects into account. It outlines the proposed fundamental components, describes use cases based on Brazilian law documents, and discusses the methodology, initial results and future work.

Corpus for Automatic Structuring of Legal Documents

ArXiv, 2022

In populous countries, pending legal cases have been growing exponentially. There is a need for developing techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label coming from a list of pre-defined Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper.

Automatic semantics extraction in law documents

Proceedings of the 10th …, 2005

Normative texts can be viewed as composed of formal partitions (articles, paragraphs, etc.) or of semantic units containing fragments of a regulation (provisions). Provisions can be described according to a metadata scheme consisting of provision types and their arguments. This semantic annotation of a normative text can make the retrieval of norms easier. Detecting and describing provisions according to the established metadata scheme is an analytic intellectual activity that aims to classify portions of a normative text into provision types and to extract their arguments. Automatic facilities supporting this activity are desirable. In this paper, two modules are presented that can qualify fragments of a normative text in terms of provision types and extract their arguments.

Linking European Case Law: BO-ECLI Parser, an Open Framework for the Automatic Extraction of Legal Links

International Conference on Legal Knowledge and Information Systems, 2017

In this paper we present the BO-ECLI Parser, an open framework for the extraction of legal references from case law issued by judicial authorities of European Member States. The problem of automatic legal link extraction from texts is tackled for multiple languages and jurisdictions by providing a common stack that is customizable through pluggable extensions, in order to cover the linguistic diversity and specific peculiarities of national legal citation practices. The aim is to increase the availability in the public domain of machine-readable reference metadata for case law by sharing common services, a guided methodology and efficient solutions to recurrent problems in legal reference extraction, which reduce the effort needed by national data providers to develop their own extraction solutions.

Keywords: natural language processing, legal references, case law databases, linked open data

Footnote 1: Council conclusions inviting the introduction of the European Case Law Identifier (ECLI) and a minimum set of uniform metadata for case law (CELEX:52011XG0429(01)).
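
The idea of a common stack customized through pluggable extensions, as described in the BO-ECLI abstract above, can be sketched as a small registry of extractor functions. This is not the BO-ECLI Parser's actual API: `register`, `extract_references`, `LegalReference` and both patterns are hypothetical names invented here. The ECLI pattern is a simplified reading of the identifier's five colon-separated parts, and the "Case No." pattern is a toy stand-in for a national citation style.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class LegalReference:
    source: str  # name of the extension that found the reference
    text: str    # matched citation text
    start: int   # character offsets in the input
    end: int


Extractor = Callable[[str], List[LegalReference]]
_EXTENSIONS: Dict[str, Extractor] = {}


def register(name: str):
    """Decorator that plugs an extractor into the shared pipeline (hypothetical)."""
    def decorator(func: Extractor) -> Extractor:
        _EXTENSIONS[name] = func
        return func
    return decorator


# Simplified ECLI shape: ECLI:<country>:<court>:<year>:<ordinal>.
_ECLI = re.compile(r"ECLI:[A-Z]{2}:[A-Z][A-Z0-9]{0,6}:\d{4}:[A-Z0-9]+(?:\.[A-Z0-9]+)*")
# Toy national pattern, invented for this example only.
_TOY_CASE = re.compile(r"Case No\. \d+/\d{4}")


@register("ecli")
def extract_ecli(text: str) -> List[LegalReference]:
    return [LegalReference("ecli", m.group(0), m.start(), m.end())
            for m in _ECLI.finditer(text)]


@register("toy-national")
def extract_toy_case(text: str) -> List[LegalReference]:
    return [LegalReference("toy-national", m.group(0), m.start(), m.end())
            for m in _TOY_CASE.finditer(text)]


def extract_references(text: str,
                       extensions: Optional[List[str]] = None) -> List[LegalReference]:
    """Run the selected extensions (all registered ones by default), ordered by position."""
    names = extensions if extensions is not None else sorted(_EXTENSIONS)
    found: List[LegalReference] = []
    for name in names:
        found.extend(_EXTENSIONS[name](text))
    return sorted(found, key=lambda r: (r.start, r.end))


sample = "See ECLI:NL:HR:2016:1234 and Case No. 45/2015."
for ref in extract_references(sample):
    print(ref.source, ref.text)
```

Keeping each citation style in its own registered function means a data provider could add a national pattern without touching the shared `extract_references` loop, which is the design point the sketch is meant to convey.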

Automatic Legal Document Analysis: Improving the Results of Information Extraction Processes Using an Ontology

2019

Information Extraction (IE) is a pervasive task in industry that allows structured data to be obtained automatically from natural-language documents. Current software systems for this task are able to extract a large percentage of the required information, but they do not usually focus on the quality of the extracted data. In this paper we present an approach focused on validating and improving the quality of the results of an IE system. Our proposal is based on ontologies that store domain knowledge, which we leverage to detect and resolve consistency errors in the extracted data. We have implemented our approach to run against the output of the AIS system, an IE system specialized in analyzing legal documents, and we have tested it using a real dataset. Preliminary results confirm the interest of our approach.
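
The detection half of the approach in the abstract above, checking extracted data against domain knowledge, might look roughly like the following sketch. It does not reproduce the AIS system or the authors' ontology: the dictionary-based "ontology", the class and property names, and the sample individuals are all hypothetical, and a real implementation would more likely use an OWL ontology and a reasoner. Repairing the errors is left out.

```python
from typing import Dict, List

# Toy stand-in for an ontology (hypothetical): for each class, the properties it
# allows and the class that each property value must belong to.
ONTOLOGY: Dict[str, Dict[str, str]] = {
    "Contract": {"seller": "Person", "buyer": "Person", "object": "Property"},
}

# Known individuals and their classes (invented sample data).
INDIVIDUALS: Dict[str, str] = {
    "Ana Ruiz": "Person",
    "Luis Gil": "Person",
    "Flat 3B": "Property",
}


def check_record(record_class: str, record: Dict[str, str]) -> List[str]:
    """Return a list of consistency errors for one extracted record."""
    schema = ONTOLOGY.get(record_class)
    if schema is None:
        return [f"unknown class: {record_class}"]
    errors: List[str] = []
    for prop, value in record.items():
        expected = schema.get(prop)
        if expected is None:
            errors.append(f"{prop}: not defined for {record_class}")
        elif INDIVIDUALS.get(value) != expected:
            errors.append(f"{prop}: {value!r} is not a known {expected}")
    # Properties required by the schema but absent from the extracted record.
    for prop in schema:
        if prop not in record:
            errors.append(f"{prop}: missing")
    return errors


# An extraction in which the buyer slot was filled with the property by mistake.
extracted = {"seller": "Ana Ruiz", "buyer": "Flat 3B", "object": "Flat 3B"}
for problem in check_record("Contract", extracted):
    print(problem)
```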