Integration of semantic, metadata and image search engines with a text search engine for patent retrieval

Developing Semantic Search for the Patent Domain

The patent domain is a very important source of scientific information that is currently not used to its full potential. Issues such as high numbers of patents, complicated language style and inconsistently used vocabulary make the task of searching for relevant patents extremely complex. While this is already a problem for patent professionals who have to invest a lot of time and effort into their search, it is even more problematic for academic scientists with little experience in this domain. Semantic search functionality has been demonstrated to provide large advantages for document search in other domains. As an example, the search engine GoPubMed offers advanced search functionality for the biomedical domain based on annotating documents with relevant concepts from various ontologies. In this paper, we report on our efforts to provide comparable advances for the patent domain. We introduce the patent search prototype GoPatents, and we describe the experiments that we performed during its development in the areas of term extraction, term and IPC class co-occurrence analysis, automated patent categorization, and automated annotation with ontology concepts.

A Patent Retrieval Method using Semantic Annotations

Automatic annotation of key phrases with their semantic categories can help improve the effectiveness of a variety of text-based systems, including information retrieval, summarization, and question answering. In this paper, we exploit semantic annotations for patent retrieval (i.e., patent invalidity search). We first annotated key phrases for two semantic categories, PROBLEM (e.g. "pattern matching") and SOLUTION (e.g. "dynamic programming") in a patent document, which constitute a particular technology. Semantic clusters are formed by grouping patent documents with the same PROBLEM or SOLUTION tag. A language modelling approach to information retrieval is extended to consider the semantically oriented clusters as well as document models. Our retrieval evaluation of the proposed approach using a collection of United States patent documents shows a 22% improvement over the baseline, a smoothed language modelling approach without using the semantic annotations.
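The abstract above does not give the exact retrieval formula, so the following is only a minimal sketch of one common way to fold cluster models into a language-modelling ranker: a linear interpolation of document, cluster and collection unigram models. The function names, the interpolation weights `lam_doc` and `lam_cluster`, and the toy data are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cluster_lm_score(query, doc, cluster_docs, collection_docs,
                     lam_doc=0.6, lam_cluster=0.3):
    """Query log-likelihood under a document/cluster/collection mixture.

    lam_doc and lam_cluster are hypothetical weights; the remainder
    (1 - lam_doc - lam_cluster) goes to the collection model.
    """
    p_doc = unigram(doc)
    p_clu = unigram([t for d in cluster_docs for t in d])
    p_col = unigram([t for d in collection_docs for t in d])
    lam_col = 1.0 - lam_doc - lam_cluster
    total = 0.0
    for term in query:
        p = (lam_doc * p_doc.get(term, 0.0)
             + lam_cluster * p_clu.get(term, 0.0)
             + lam_col * p_col.get(term, 0.0))
        if p == 0.0:
            # Term unseen anywhere in the collection.
            return float("-inf")
        total += math.log(p)
    return total


# Toy collection: d1 and d2 share a hypothetical SOLUTION tag, d3 does not.
d1 = ["dynamic", "programming", "alignment"]
d2 = ["dynamic", "programming", "pattern", "matching"]
d3 = ["neural", "network", "classifier"]
collection = [d1, d2, d3]
clusters = {"d1": [d1, d2], "d2": [d1, d2], "d3": [d3]}
docs = {"d1": d1, "d2": d2, "d3": d3}

query = ["pattern", "matching"]
ranking = sorted(
    docs,
    key=lambda name: cluster_lm_score(query, docs[name], clusters[name], collection),
    reverse=True,
)
```

In this toy run, d1 contains neither query term, yet it ranks above d3 because its cluster (shared with d2) does. That is the intended effect of letting semantically grouped documents lend probability mass to one another.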

Developing a Comprehensive Patent Related Information Retrieval Tool

Journal of theoretical and applied electronic commerce research, 2011

In recent years, there has been a massive growth of regulatory and related information available online. This information is distributed across many different domains, creating a problem for accessing and managing this data. This paper proposes a framework to access information across two such domains: patents and court cases. The framework is designed to boost the value of a set of patents based on information available in court cases by identifying and cross-referencing mutual information in the two domains. We test our framework by constructing a use case involving the hormone erythropoietin. A corpus of 1150 patents (including 135 closely related patents) and 30 court cases is gathered. Challenges associated with such integration and future plans are briefly discussed.

The patents retrieval prototype in the MOLTO project

Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012

This paper describes the patents retrieval prototype developed within the MOLTO project. The prototype aims to provide a multilingual natural language interface for querying the content of patent documents. The developed system is focused on the biomedical and pharmaceutical domain and includes the translation of the patent claims and abstracts into English, French and German. Aiming at the best retrieval results of the patent information and text content, patent documents are preprocessed and semantically annotated. Then, the annotations are stored and indexed in an OWLIM semantic repository, which contains a patent specific ontology and others from different domains. The prototype, accessible online at http://molto-patents.ontotext.com, presents a multilingual natural language interface to query the retrieval system. In MOLTO, the multilingualism of the queries is addressed by means of the GF Tool, which provides an easy way to build and maintain controlled language grammars for interlingual translation in limited domains. The abstract representation obtained from the GF is used to retrieve both the matched RDF instances and the list of patents semantically related to the user's search criteria. The online interface allows users to browse the retrieved patents and shows in the text the semantic annotations that explain why any particular patent has matched the user's criteria.

Patent retrieval system using document filtering techniques

2000

Existing patent retrieval systems are difficult to use by amateur users because of several drawbacks. We have developed a novel type of patent retrieval system which anybody can make use of easily, and have been using it in our laboratories since the beginning of this year. Our system uses document filtering techniques. These filtering techniques are based on a probabilistic model which searches documents relevant to the user's interest.

NLP-based Patent Information Retrieval

The paper presents an approach to information retrieval for the specific domain of patent claims. The approach rests on utilizing a predicate-argument structure representation of full-text documents for organizing a fine-tuned retrieval of semantic information. Indexing and search rely on versatile data from a thoroughly elaborated linguistic knowledge base. The procedure of automatic indexing resulting in forming the predicate-argument structure of a patent claim is presented. The predicate-argument structure is conceived as a structural unit of the indexed document. The search strategy based on mapping predicate-argument structures of the documents and query representations is described.
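As a rough illustration of matching predicate-argument structures between a query and indexed claims, the sketch below stores each structure as a predicate plus a role-to-filler mapping and scores a candidate by the fraction of query roles it fills identically. The class name `PredArgStructure`, the role labels, and the scoring rule are hypothetical. The abstract does not specify the actual representation or mapping strategy, and a real system would obtain these structures from a linguistic analyser rather than by hand.

```python
from dataclasses import dataclass, field


@dataclass
class PredArgStructure:
    """A predicate with its arguments, keyed by (hypothetical) role label."""
    predicate: str
    arguments: dict = field(default_factory=dict)


def match_score(query, candidate):
    """Fraction of query roles whose filler is identical in the candidate.

    Returns 0.0 when the predicates differ.
    """
    if query.predicate != candidate.predicate:
        return 0.0
    if not query.arguments:
        return 1.0
    hits = sum(
        1 for role, filler in query.arguments.items()
        if candidate.arguments.get(role) == filler
    )
    return hits / len(query.arguments)


def search(index, query):
    """Rank claims by their best-matching structure, dropping zero scores."""
    results = []
    for claim_id, structures in index.items():
        best = max((match_score(query, s) for s in structures), default=0.0)
        if best > 0.0:
            results.append((claim_id, best))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hand-built toy index; the claim ids and fillers are invented for the example.
index = {
    "claim1": [PredArgStructure("connect", {"agent": "shaft", "object": "rotor"})],
    "claim2": [PredArgStructure("connect", {"agent": "cable", "object": "rotor"})],
    "claim3": [PredArgStructure("heat", {"object": "rotor"})],
}
query = PredArgStructure("connect", {"agent": "shaft", "object": "rotor"})
hits = search(index, query)
```

Here `claim3` is excluded even though it mentions the same object, because its predicate differs. A plain bag-of-words match would not draw that distinction.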

A patent system ontology for facilitating retrieval of patent related information

Proceedings of the 6th International Conference on Theory and Practice of Electronic Governance - ICEGOV '12, 2012

Recent years have seen tremendous growth in research and development in science and technology, and an emphasis on obtaining Intellectual Property (IP) protection for one's innovations. Information pertaining to IP for science and technology is siloed into many diverse sources and consists of laws, regulations, patents, court litigations, scientific publications, and more. Although a great deal of legal and scientific information is now available online, the scattered distribution of the information, combined with the enormous sizes and complexities, makes any attempt to gather relevant IP-related information on a specific technology a daunting task. This paper describes a knowledge-based software framework to facilitate retrieval of patents and related information across multiple diverse and uncoordinated information sources in the US patent system. The document corpus covers issued US patents, court litigations, scientific publications, and patent file wrappers in the biomedical technology domain.

Towards content-based patent image retrieval: A framework perspective

World Patent Information, 2010

In this article, we discuss the potential benefits, the requirements and the challenges involved in patent image retrieval and subsequently, we propose a framework that encompasses advanced image analysis and indexing techniques to address the need for content-based patent image search and retrieval. The proposed framework involves the application of document image pre-processing, image feature and textual metadata extraction to effectively support content-based image retrieval in the patent domain. To evaluate the capabilities of our proposal, we implemented a patent image search engine. Results based on a series of interaction modes, comparison with existing systems and a quantitative evaluation of our engine provide evidence that image processing and indexing technologies are currently sufficiently mature to be integrated in real-world patent retrieval applications.
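The general shape of content-based image retrieval (extract a feature vector per image, index it, rank by similarity to the query image's vector) can be sketched as follows. The abstract does not name the features the framework uses, so this example substitutes a deliberately simple grey-level histogram and cosine similarity. The function names and the tiny pixel grids are assumptions made only to keep the example self-contained.

```python
import math


def histogram(image, bins=4, max_value=255):
    """Normalised grey-level histogram of an image given as rows of ints."""
    counts = [0] * bins
    for row in image:
        for pixel in row:
            idx = min(pixel * bins // (max_value + 1), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def image_search(query_image, feature_index):
    """Rank indexed figures by similarity of their features to the query."""
    q = histogram(query_image)
    scored = [(cosine(q, feat), name) for name, feat in feature_index.items()]
    return sorted(scored, reverse=True)


# Toy 2x2 "figures"; real patent drawings would need pre-processing first.
dark = [[0, 10], [20, 30]]
bright = [[250, 240], [230, 220]]
feature_index = {"fig_dark": histogram(dark), "fig_bright": histogram(bright)}

results = image_search([[5, 15], [25, 35]], feature_index)
```

Features are computed once at indexing time and only compared at query time. This keeps search cost independent of image size, and swapping in a more discriminative descriptor would change `histogram` without changing the ranking code.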

Development of an information retrieval tool for biomedical patents

Computer methods and programs in biomedicine, 2018

The volume of biomedical literature has been increasing in recent years. Patent documents have also followed this trend, being important sources of biomedical knowledge, technical details and curated data, which are put together along the granting process. The field of Biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes the search of information a challenging task. Several BioTM techniques can be applied to patents. From those, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. The pipeline was developed within @Note2, an open-source computational framework for BioTM, adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conve...

DCU@ CLEF-IP 2009: Exploring standard IR techniques on patent retrieval

2010

This paper presents the experiments and results for our participation in CLEF-IP 2009, which is newly launched this year. Our work applied standard information retrieval (IR) techniques to patent search. Different experiments tested various methods for patent retrieval, including query formulation, structured indexing, weighted fields, filtering, and relevance feedback.
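Two of the techniques listed above, weighted fields and filtering, can be illustrated with a small sketch. Everything specific here is an assumption: the field names, the weights in `FIELD_WEIGHTS`, the use of length-normalised term frequency as the per-field score, and filtering on an `ipc` code are chosen for illustration and are not taken from the CLEF-IP experiments.

```python
from collections import Counter

# Hypothetical weights: a title match counts more than a claims match.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 1.0}


def field_score(query_terms, patent, weights=FIELD_WEIGHTS):
    """Weighted sum of length-normalised term frequencies per field."""
    score = 0.0
    for field_name, weight in weights.items():
        counts = Counter(patent.get(field_name, "").lower().split())
        length = sum(counts.values())
        if length == 0:
            continue
        matches = sum(counts[term] for term in query_terms)
        score += weight * matches / length
    return score


def rank(query, patents, ipc_filter=None):
    """Return patent ids ordered by score, optionally keeping one IPC class."""
    terms = query.lower().split()
    candidates = [
        p for p in patents
        if ipc_filter is None or p.get("ipc") == ipc_filter
    ]
    ordered = sorted(candidates, key=lambda p: field_score(terms, p), reverse=True)
    return [p["id"] for p in ordered]


# Invented records for the example.
patents = [
    {"id": "A", "ipc": "G06F", "title": "image retrieval system",
     "abstract": "a system for retrieval of patent images"},
    {"id": "B", "ipc": "C10M", "title": "engine lubricant",
     "abstract": "image of a lubricant"},
]

all_ranked = rank("image retrieval", patents)
only_c10m = rank("image retrieval", patents, ipc_filter="C10M")
```

The filter runs before scoring, so it narrows the candidate set rather than reordering it. The field weights are the natural knob to tune against relevance judgments.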