Text Indexing Research Papers - Academia.edu

- Cultural Heritage, Mathematical Morphology, Region growing, Page Segmentation
- by Wolfgang Gerlach
- Applied Mathematics, Pure Mathematics, Sequence Analysis, Data Structure
- by Ana Lelescu
- Decision Making, Data Mining, Data Warehousing, Information Processing

The SPIRIT search engine provides a test bed for the development of web search technology that is specialised for access to geographical information. Major components include the user interface, a geographical ontology, maintenance and retrieval functions for a test collection of web documents, textual and spatial indexes, relevance ranking and metadata extraction. Here we summarise the functionality and interaction between these components before focusing on the design of the geo-ontology and the development of spatio-textual indexing methods. The geo-ontology supports functionality for disambiguation, query expansion, relevance ranking and metadata extraction. Geographical place names are accompanied by multiple geometric footprints and qualitative spatial relationships. Spatial indexing of documents has been integrated with text indexing through the use of spatio-textual keys in which terms are concatenated with spatial cells to which they relate. Preliminary experiments demonstrate considerable performance benefits when compared with pure text indexing and with text indexing followed by a spatial filtering stage.
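The idea of a spatio-textual key, in which a term is concatenated with a spatial cell, can be illustrated with a minimal Python sketch. The regular grid, the cell size, the `term@x:y` key format, and all names below are assumptions made for illustration; the abstract does not describe SPIRIT's actual key layout or cell scheme.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # assumed grid resolution in degrees (hypothetical value)


def cells_for_box(min_x, min_y, max_x, max_y, cell_size=CELL_SIZE):
    """Return the ids of all grid cells that a bounding box overlaps."""
    x0, x1 = int(min_x // cell_size), int(max_x // cell_size)
    y0, y1 = int(min_y // cell_size), int(max_y // cell_size)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def make_key(term, cell):
    """Concatenate a term with a cell id (hypothetical key format)."""
    return f"{term.lower()}@{cell[0]}:{cell[1]}"


class SpatioTextualIndex:
    """Inverted index whose keys combine a term and a spatial cell."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, terms, footprint):
        # Post the document under every (term, cell) pair of its footprint.
        for cell in cells_for_box(*footprint):
            for term in terms:
                self.postings[make_key(term, cell)].add(doc_id)

    def search(self, terms, query_box):
        # For each cell of the query box, keep documents matching all terms.
        result = set()
        for cell in cells_for_box(*query_box):
            per_cell = None
            for term in terms:
                docs = self.postings.get(make_key(term, cell), set())
                per_cell = docs if per_cell is None else per_cell & docs
            result |= per_cell or set()
        return sorted(result)


index = SpatioTextualIndex()
index.add("d1", ["castle", "hotel"], (-3.3, 51.4, -3.1, 51.6))
index.add("d2", ["castle"], (2.2, 48.8, 2.4, 48.9))
print(index.search(["castle"], (-3.5, 51.0, -3.0, 52.0)))
```

In this sketch a single lookup per cell handles the textual and the spatial constraint together, which is the contrast the abstract draws with text indexing followed by a separate spatial filtering stage. Matching is only cell-level, so a real system would presumably still check candidate footprints against the exact query geometry.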

- by Alia Abdelmoty
- Web search, User Interface, Geographic Information Science, Search Engine
- by Rodrigo Gonzalez
- Latin, Text Indexing

Conventional Information Retrieval Systems (IRSs), also called text indexers, deal with plain text documents or with documents that have a very elementary structure. Such systems answer queries very efficiently, but they cannot take into account the tags that mark different sections, or can do so only in a very limited way. Nowadays, by contrast, the documents in a corpus often have a rich structure: they are encoded in XML (Extensible Markup Language)[1] or in some other format that can be converted to XML fairly easily. Classical IRSs built for such corpora therefore gain nothing from this structure, and their results do not improve. In addition, several of these corpora are very large, comprising hundreds or thousands of documents that in turn contain millions or hundreds of millions of words. There is therefore a need for efficient and flexible IRSs that work with large structured corpora.

- by Arie Shoshani
- Information Retrieval, Open Source, Structured data, Text Indexing
- by Matthias Petri
- Data Structure, Suffix Tree, Data storage, Query processing
- by C. Du Mouza
- Performance Analysis, Search Algorithm, Text Indexing, Indexation
- by Daniela Florescu
- Engineering, Computer Science, Technology, Computer Networks

The information environment is seen to be one of the predominant factors for effective maintenance and inspection systems in the operation of commercial aircraft. The design issues can be stated simply as decisions on what information to present, when to present this information, and how to present this information. It is desirable that in answering these questions, the designer accounts for the cognitive abilities of humans and the demands that the task requirements generate. This paper provides a framework for information design by combining the concepts from the human factors knowledge base with the specific needs of aircraft inspection. This framework captures the interaction between the inspection task and its information requirements, leading to an analysis of the information needs of aircraft inspectors, using this framework and the cognitive control categories of Skill-Rule-Knowledge based behaviors. Based on this analysis, guidelines for information systems design have been suggested.

- by Nur Hazirah Sa'ari
- Engineering, Information Retrieval, Project Management, Semantics
- by Matthias Petri
- Data Structure, Suffix Tree, Data storage, Query processing

A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution ...
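How a text index can answer `contains` and `starts-with` over a document's text nodes can be sketched as follows. This is only an illustration under stated assumptions: a plain, uncompressed suffix array stands in for the compressed self-index mentioned in the abstract, text nodes are assumed to be non-empty and free of the separator character, and the class and method names are hypothetical.

```python
from bisect import bisect_right

SEP = "\x00"  # assumed separator that never occurs inside a text node


class TextNodeIndex:
    """Suffix array over the concatenation of all text nodes."""

    def __init__(self, nodes):
        self.text = SEP.join(nodes)
        # Start offset of every node inside the concatenated text.
        self.starts = []
        offset = 0
        for node in nodes:
            self.starts.append(offset)
            offset += len(node) + len(SEP)
        # Naive construction; adequate for a small illustration only.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _occurrences(self, pattern):
        """Return all text positions where pattern occurs."""
        text, sa, m = self.text, self.sa, len(pattern)
        lo, hi = 0, len(sa)
        while lo < hi:  # first suffix whose m-character prefix >= pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] < pattern:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, len(sa)
        while lo < hi:  # first suffix whose m-character prefix > pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return sa[first:lo]

    def _node_of(self, position):
        """Map a text position to the number of the node that holds it."""
        return bisect_right(self.starts, position) - 1

    def contains(self, pattern):
        """Node numbers whose text contains pattern."""
        return sorted({self._node_of(p) for p in self._occurrences(pattern)})

    def starts_with(self, pattern):
        """Node numbers whose text starts with pattern."""
        node_starts = set(self.starts)
        return sorted({self._node_of(p) for p in self._occurrences(pattern)
                       if p in node_starts})


index = TextNodeIndex(["text indexing", "index", "suffix tree"])
print(index.contains("index"), index.starts_with("index"))
```

Both predicates reduce to one pattern search followed by a mapping from text positions back to nodes. In this sketch `starts-with` simply adds a filter on node start offsets.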

- by Hà Nguyễn
- Computer Science, Information Retrieval, XML, Automata

We consider a set of natural language processing techniques based on finite-state technology that can be used to analyze huge amounts of text. These techniques include an advanced tokenizer, a part-of-speech tagger that can manage ambiguous streams of words, a system for conflating words by means of derivational mechanisms, and a shallow parser to extract syntactic-dependency pairs. We propose to use these techniques in order to improve the performance of standard indexing engines.

- by Fco. Mario Barcala
- Information Retrieval, Natural Language Processing, Morphology, Parsing
- by Diego Arroyuelo
- Data Structure, Text Indexing, Indexation, Lempel Ziv
- by Kashyap Dixit
- Applied Mathematics, Pure Mathematics, Sequence Analysis, Data Structure
- by Johann van Reenen
- Peer Review, Collaborative Research, Real Time, Knowledge base
- by Joachim Hasebrook
- Human Memory, Electronic Media, Human Cognition, Web Based Training
- by Rodrigo González
- Data Structure, Text Indexing, Indexation
- by ankur gupta
- Theoretical Computer Science, Data Structure, Text Indexing, Lower Bound
- by thi nguyễn
- Information Retrieval, Text Indexing, Tree Structure, Indexation
- by Taher Zaki
- Radial Basis Function, Dictionary, Similarity, Tf-Idf
- by Rodrigo Gonzalez
- Text Indexing, Indexation
- by Christos Tsalidis and +1
- Text Mining, Information Extraction, Question Answering System, Corpus linguistic

In this article we present the MIRTO platform, under development at the University Stendhal of Grenoble, and how it addresses common flaws of CALL software. This platform led to another project: the creation of a pedagogically indexed text base. We introduce here the notion of pedagogical indexation and compare the particular case of pedagogical indexation for language learning with the existing pedagogical resource description standards, before proposing leads towards implementing the former. (http://www.formatex.org/micte2005/165.pdf)

- by Mathieu Loiseau
- Computer Science, Language Teaching, Text Indexing, Indexation

Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose, or absent. While these languages are rich enough to allow for...

- by Daniela Florescu
- Engineering, Computer Science, Technology, Computer Networks
- by Anish Shrestha
- Algorithms, Computational Biology, Software, Data Structure

In this paper we describe the geographic information retrieval system developed by the Multimedia & Information Systems team for GeoCLEF 2006 and the results achieved. We detail our methods for generating and applying co-occurrence models for the purpose of place name disambiguation, and our use of named entity recognition tools and text indexing applications. The presented system is split into two stages: a batch text and geographic indexer, and a real-time query engine. The query engine takes manually crafted queries in which the text component is separated from the geographic component. Two monolingual runs were submitted for the GeoCLEF evaluation: the first was constructed from the title and description, and the second also included the narrative. We explain in detail our use of co-occurrence models for place name disambiguation using a model generated from Wikipedia. The paper concludes with a full description of future work and ways in which the system could be optimised.
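Place name disambiguation with a co-occurrence model can be sketched in a few lines of Python. This is a toy illustration, not the system described in the abstract: the scoring rule (summing raw co-occurrence counts), the function names, and the tiny hand-written training set are all assumptions, whereas the abstract's model was generated from Wikipedia.

```python
from collections import Counter


def build_model(tagged_contexts):
    """Count which words co-occur with each place referent.

    tagged_contexts: iterable of (referent, context_words) pairs.
    """
    model = {}
    for referent, words in tagged_contexts:
        model.setdefault(referent, Counter()).update(w.lower() for w in words)
    return model


def disambiguate(candidates, context, model):
    """Pick the candidate whose co-occurrence counts best match the context.

    Ties go to the earliest candidate in the list.
    """
    context = [w.lower() for w in context]

    def score(referent):
        counts = model.get(referent, Counter())
        return sum(counts[w] for w in context)

    return max(candidates, key=score)


# Invented toy contexts, for illustration only.
training = [
    ("Cambridge, UK", ["university", "england", "cam"]),
    ("Cambridge, MA", ["harvard", "mit", "massachusetts"]),
    ("Cambridge, MA", ["boston", "mit"]),
]
model = build_model(training)
print(disambiguate(["Cambridge, UK", "Cambridge, MA"], ["MIT", "campus"], model))
```

A realistic model would likely normalise the counts and fall back to a default referent when the context gives no evidence. The sketch keeps raw counts to show the principle.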

- by Daniel Vargas
- Peer Review, Collaborative Research, Real Time, Knowledge base

- by Kim Chi Nguyen
- Computer Science, Information Retrieval, XML, Automata
- by Bảo Đặng Quốc
- Cognitive Science, Information Retrieval, Graphics, Images
- by norma herrera
- Suffix Tree, Text Indexing
- by rodrigo gonzalez
- Applied Mathematics, Information Theory, Pure Mathematics, Data Structure
- by ankur gupta
- Entropy, Text Indexing, Indexation, Experimental Analysis
- by Rodrigo Rampoldi González
- Latin, Text Indexing, Binary relation
- by Wing-kai Hon
- Information Retrieval, Data Structure, Suffix Tree, Succinct Data Structures
- by Dimitrios Tsoumakos
- Cloud Computing, Open Source, Text Indexing, Indexation
- by Christos Tsalidis and +1
- Data Mining, Text Mining, Information Extraction, Question Answering System
- by Srinivasa Satti
- Computational Biology, Data Structure, Suffix Tree, Text Indexing
- by Witold Litwin
- Performance Analysis, Search Algorithm, Text Indexing, Indexation

Terminology management is a key component of many natural language processing activities such as machine translation (Langlais and Carl, 2004), text summarization and text indexation. With the rapid development of science and technology continuously increasing the number of technical terms, terminology management is certain to become of the utmost importance in more and more content-based applications.

- by Kim Nguyễn
- Computer Science, Information Retrieval, XML, Automata