Information Retrieval System Research Papers
A common class of existing information retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an "optimal" configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.
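As a rough illustration of the inverted-index caching the simulation studies, here is a minimal sketch (the toy index and cache capacity are invented for illustration; nothing here comes from FOLIO or INSPEC): an LRU cache sits in front of posting-list lookups, standing in for the memory/disk trade-off the paper measures.

```python
from collections import OrderedDict

class PostingCache:
    """LRU cache for posting lists, a stand-in for inverted-index
    caching (capacity counted in cached lists, not bytes)."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, term, index):
        if term in self.cache:
            self.cache.move_to_end(term)      # mark as recently used
            return self.cache[term]
        postings = index.get(term, [])        # simulated disk fetch
        self.cache[term] = postings
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return postings

# Toy index: term -> sorted document ids.
index = {"physics": [1, 4, 9], "computer": [2, 4], "engineering": [4, 7]}
cache = PostingCache(capacity=2)
for term in ["physics", "computer", "physics", "engineering"]:
    print(term, cache.get(term, index))
```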
For more than 20 years, many countries have been trying to set up a standardized, centralized medical record (at the regional or national level). Most have not reached this goal, essentially due to two main difficulties: patient identification and the standardization of medical records. We propose here the non-centralized management of medical records, relying on a specific procedure that gives the patient access to his distributed medical data wherever it is located. The originality of this procedure relies on new advances in technology, which make it possible to envisage access to medical records anywhere and anytime, thanks to Grid and watermarking methodologies. Of course, all existing standardised information could be more easily centralised. As a consequence, a mixed system (decentralised for unstructured data and centralised for already structured data) could be proposed.
only be answered by combining information from various articles. In this paper, a new algorithm is proposed for finding associations between related concepts present in literature. To this end, concepts are mapped to a multi-dimensional space by a Hebbian type of learning algorithm using co-occurrence data as input. The resulting concept space allows exploration of the neighborhood of a concept and finding potentially novel relationships between concepts. The obtained information retrieval system is useful for finding literature supporting hypotheses and for discovering hitherto unknown relationships between concepts. Tests on artificial data show the potential of the proposed methodology. In addition, preliminary tests on a set of Medline abstracts yield promising results.
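The abstract does not spell out its exact update rule, but a generic Hebbian-style co-occurrence embedding can be sketched as follows (the data, learning rate, dimensionality, and update are illustrative assumptions, not the paper's algorithm): vectors of co-occurring concepts are repeatedly pulled toward each other, so related concepts end up in the same neighbourhood of the space.

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_embed(n_concepts, cooccurrences, dim=2, lr=0.1, epochs=50):
    """Map concepts to a low-dimensional space so that concepts that
    co-occur end up close together (a generic Hebbian-style rule)."""
    vecs = rng.normal(size=(n_concepts, dim))
    for _ in range(epochs):
        for i, j in cooccurrences:
            # Move each vector a little toward its co-occurring partner.
            vecs[i] += lr * (vecs[j] - vecs[i])
            vecs[j] += lr * (vecs[i] - vecs[j])
    return vecs

# Concepts 0-1 and 2-3 co-occur; expect two tight neighbourhoods.
vecs = hebbian_embed(4, [(0, 1), (2, 3)] * 10)
print(vecs)
```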
Information retrieval technology has been central to the success of the Web. For semantic web documents or annotations to have an impact, they will have to be compatible with Web based indexing and retrieval technology. We discuss some of the underlying problems and issues central to extending information retrieval systems to handle annotations in semantic web languages. We also describe three prototype systems that we have implemented to explore these ideas.
In this paper, we describe a knowledge management framework that addresses the needs of multimedia analysis projects and provides a basis for information retrieval systems. The framework uses Semantic Web technologies to provide a shared knowledge environment, and active Knowledge Machines, wrapping multimedia processing tools, to exploit and/or export knowledge to this environment. This framework is able to handle a wide range of use cases, from an enhanced workspace for researchers to end-user information access. As an illustration of how the proposed framework can be used, we present a case study of music analysis.
This paper provides an overview of the ePaper project. The project aims to provide an end-to-end solution for the future mobile personalized newspaper. The ePaper aggregates content (i.e., news items) from various news providers and delivers personalized newspapers on dedicated mobile, electronic newspaper-like devices. The ePaper can provide each subscribed user with a personalized newspaper, according to the user's preferences, as well as a "standard edition" of a selected newspaper. The layout of the newspaper is adapted to the device's specifications and the user's preferences. The ePaper is expected to change the reading experience of newspapers and magazines, coupling an innovative paper-like display with novel personalization algorithms, an intuitive interface, and new methods of adapting content to the device.
The Research Triangle Park (RTP) Particulate Matter (PM) Panel Study represented a 1-year investigation of personal, residential and ambient PM mass concentrations across distances as large as 70 km in central North Carolina. One of the primary goals of this effort was to estimate ambient PM2.5 contributions to personal and indoor residential PM mass concentrations. Analyses indicated that data from the two distinct non-smoking subject populations totaling 38 individuals and 37 residences could be pooled. This resulted in nearly 800 data points for each variable. A total of 55 measurements believed to have been potentially influenced by personal or residential exposure to passive environmental tobacco smoke were not included in the analysis database. Variables to be examined included C_ig (concentration of indoor-generated PM), E_ig (personal exposure to indoor-generated PM), F_inf (ambient PM infiltration factor), and F_pex (personal exposure to PM of ambient origin factor). Daily air exchange rates (AER) were measured and statistical modeling to derive estimates of particle penetration (P) and particle deposition (k) factors was performed. Seasonality, cohort grouping, participant, or combinations of these variables were determined not to be significant influences in estimating group infiltration factors. The mean (±std) mixed model slope estimates were AER = 0.72±0.63, P = 0.72±0.21, k = 0.42±0.19, and F_inf = 0.45±0.21. These variables were then used in a number of mixed effects models having varying features of single, random or fixed intercepts and/or slopes to determine the most appropriate means of estimating ambient source contributions to personal and residential settings. The mixed model slope for F_pex (±SE) was 0.47±0.07 using the model with the highest degree of fit.
This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n-record file are: insertion, O(log n); deletion of the root, O(n^((k-1)/k)); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n^((k-t)/k))] and for nearest neighbor queries [empirically observed average running time of O(log n)]. These performances far surpass the b...
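A minimal k-d tree sketch with insertion and branch-and-bound nearest-neighbour search, matching the structure described above (an independent re-implementation for illustration, not the paper's code):

```python
import math

class Node:
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None

def insert(root, point, depth=0, k=2):
    """Insert by cycling through the k coordinates level by level."""
    if root is None:
        return Node(point)
    axis = depth % k
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1, k)
    else:
        root.right = insert(root.right, point, depth + 1, k)
    return root

def nearest(root, target, depth=0, k=2, best=None):
    """Descend toward the target, then check the far subtree only if
    the splitting plane is closer than the best match found so far."""
    if root is None:
        return best
    if best is None or math.dist(root.point, target) < math.dist(best, target):
        best = root.point
    axis = depth % k
    near, far = ((root.left, root.right) if target[axis] < root.point[axis]
                 else (root.right, root.left))
    best = nearest(near, target, depth + 1, k, best)
    if abs(target[axis] - root.point[axis]) < math.dist(best, target):
        best = nearest(far, target, depth + 1, k, best)
    return best

root = None
for p in [(3, 6), (17, 15), (13, 15), (6, 12), (9, 1), (2, 7), (10, 19)]:
    root = insert(root, p)
print(nearest(root, (9, 2)))   # -> (9, 1)
```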
This paper introduces an interactive video system and its architecture where several systems cooperate to manage the services of interactive video. Each system is specialized according to the data it handles and the functionality it performs. A system can be a database (for billing purposes) or just a video store system (to store the video data) lacking the typical features of a database or an information retrieval system to support indexing and querying of video data. Because quality of service is an important requirement for whole ...
An approach to managing the architecture of large software systems is presented. Dependencies are extracted from the code by a conventional static analysis, and shown in a tabular form known as the 'Dependency Structure Matrix' (DSM). A variety of algorithms are available to help organize the matrix in a form that reflects the architecture and highlights patterns and problematic dependencies. A hierarchical structure obtained in part by such algorithms, and in part by input from the user, then becomes the basis for 'design rules' that capture the architect's intent about which dependencies are acceptable. The design rules are applied repeatedly as the system evolves, to identify violations, and keep the code and its architecture in conformance with one another. The analysis has been implemented in a tool called LDM which has been applied in several commercial projects; in this paper, a case study application to Haystack, an information retrieval system, is described.
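A small sketch of the DSM idea, with invented module names and an invented layering rule (not the LDM tool's actual rules or the Haystack code base): extracted dependencies become a Boolean matrix, and a design rule is checked against the raw dependency list.

```python
# Dependencies as (source, target) pairs, e.g. extracted by static analysis.
deps = {("ui", "core"), ("core", "store"), ("store", "core"), ("ui", "store")}
modules = ["ui", "core", "store"]

# Dependency Structure Matrix: row depends on column.
dsm = [[int((r, c) in deps) for c in modules] for r in modules]
for name, row in zip(modules, dsm):
    print(f"{name:>6} {row}")

# Design rule (illustrative): lower layers must not depend on higher ones.
layers = {"ui": 2, "core": 1, "store": 0}
violations = [(s, t) for s, t in deps if layers[s] < layers[t]]
print("violations:", violations)   # ('store', 'core') breaks the layering
```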
An algorithm, one that is economical and fast, for generating the convex polytope of a set S of points lying in an n-dimensional Euclidean space E^n is described. In the existing brute-force method for determining the convex hull of a set of points lying in a two-dimensional space, one computes all possible straight lines joining each pair of points of S and tests whether the lines bound the given set S. This method can easily be generalized for computing the convex hull of a set S ⊂ E^n, n > 2. However, it turns out that this approach is not feasible, due to excessive computer run time, for a set of points lying in E^n when n > 3. The algorithm described in this paper avoids all the unnecessary calculations, and the convex polytope of a set S ⊂ E^n is generated by systematically computing the faces from the edges of the desired convex polytope. A numerical comparison indicates that this new approach is far superior to the existing brute force technique.
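For the two-dimensional case, the brute-force test described above can be sketched as follows (an O(n³) illustration of the baseline, not the paper's improved algorithm): a segment is a hull edge exactly when all other points lie on one side of it.

```python
from itertools import combinations

def brute_force_hull(points):
    """A segment (p, q) is a hull edge iff all remaining points lie on
    one side of the line through p and q (cross-product sign test)."""
    hull_edges = []
    for p, q in combinations(points, 2):
        sides = [(q[0]-p[0]) * (r[1]-p[1]) - (q[1]-p[1]) * (r[0]-p[0])
                 for r in points if r not in (p, q)]
        if all(s >= 0 for s in sides) or all(s <= 0 for s in sides):
            hull_edges.append((p, q))
    return hull_edges

pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
print(brute_force_hull(pts))   # the four sides of the square
```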
In this article, we propose to exploit semantic links between concepts to improve information retrieval. A general-language electronic thesaurus is used to reformulate user queries through a "cautious expansion" process upstream of a search engine. This process, transparent to the user, first exploits the notion of multiterm concepts to disambiguate the query words. It then relies on the semantic relations between concepts to broaden the query. Together, this leads to a significant improvement in the relevance of the answers returned by the engine. The technique was evaluated using the Mercure engine developed at IRIT, WordNet as the lexical database, and CLEF 2001 as the test collection.
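A minimal sketch of cautious expansion using WordNet through NLTK (requires the wordnet corpus via `nltk.download('wordnet')`). The paper's pipeline with Mercure and multiterm disambiguation is richer than this; the restriction to the first sense and two synonyms below is only an illustrative guard against topic drift.

```python
from nltk.corpus import wordnet as wn

def cautious_expand(query, max_terms=2):
    """Add at most `max_terms` synonyms per word, and only from the
    first (most frequent) sense, to limit topic drift."""
    expanded = []
    for word in query.split():
        expanded.append(word)
        senses = wn.synsets(word)
        if senses:
            synonyms = [l.replace("_", " ") for l in senses[0].lemma_names()
                        if l.lower() != word.lower()]
            expanded.extend(synonyms[:max_terms])
    return " ".join(expanded)

print(cautious_expand("car insurance"))
```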
Advances in the media and entertainment industries, for example streaming audio and digital TV, present new challenges for managing large audio-visual collections. Efficient and effective retrieval from large content collections forms an important component of the business models for content holders, and this is driving a need for research in audio-visual search and retrieval. Current content management systems support retrieval using low-level features, such as motion, colour, texture, beat and loudness. However, low-level features often have little meaning for the human users of these systems, who much prefer to identify content using high-level semantic descriptions or concepts. This creates a gap between the system and the user that must be bridged for these systems to be used effectively. The research presented in this paper describes our approach to bridging this gap in a specific content domain, sports video. Our approach is based on a number of automatic techniques for feature detection used in combination with heuristic rules determined through manual observations of sports footage. This has led to a set of models for interesting sporting events (goal segments) that have been implemented as part of an information retrieval system. The paper also presents results comparing the output of the system against manually identified goals.
Text Information Retrieval (TIR) is considered the heart of many applications, such as Document Management Systems (DMS). TIR used for a DMS requires different data structure techniques than those used in a search engine. A search engine requires special hardware (supercomputers with large memory) to run information retrieval algorithms. In this paper, a new approach is developed to make it easy
The information world is rich in documents in different formats and applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the web. Much research conducted within the field of text classification has been applied to English, Dutch, Chinese, and other languages, whereas less has been applied to Arabic. This paper addresses the automatic classification of Arabic text documents. It applies text classification to Arabic-language text documents using stemming as part of the preprocessing steps. Results showed that, when text classification was applied without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy on the two test modes, 87.79% and 88.54%. On the other hand, stemming negatively affected the accuracy: the SVM accuracy on the two test modes dropped to 84.49% and 86.35%.
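A minimal sketch of an SVM text-classification pipeline of the kind evaluated here, using scikit-learn; the four documents, the labels, and the absence of a stemmer are placeholders, not the paper's corpus or setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy Arabic documents with two invented classes.
docs = ["الاقتصاد ينمو بسرعة", "الفريق فاز بالمباراة",
        "البنك يرفع الفائدة", "اللاعب سجل هدفا"]
labels = ["economy", "sport", "economy", "sport"]

# TF-IDF features feeding a linear SVM, with no stemming applied.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["البنك خفض الفائدة"]))   # likely 'economy', by shared vocabulary
```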
Retrieval of relevant documents from a collection is a tedious task. As genetic algorithms (GA) are robust and efficient search and optimization techniques, they can be used to search the huge document search space. In this paper, a general framework for an information retrieval system is discussed. The applicability of genetic algorithms in the field of information retrieval is also
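A toy genetic algorithm for IR can be sketched as follows: individuals are query term weights, and fitness is precision among the top-ranked documents. Every document, operator, and parameter below is invented for illustration; the paper's framework is not reproduced.

```python
import random
random.seed(0)

DOCS = [{"ir": 1, "genetic": 0}, {"ir": 1, "genetic": 1}, {"ir": 0, "genetic": 1}]
RELEVANT = {1, 2}          # indices of the known relevant documents
TERMS = ["ir", "genetic"]

def fitness(weights):
    scores = [sum(w * d[t] for w, t in zip(weights, TERMS)) for d in DOCS]
    ranked = sorted(range(len(DOCS)), key=lambda i: -scores[i])
    # Precision at 2: fraction of the top two documents that are relevant.
    return len(set(ranked[:2]) & RELEVANT) / 2

pop = [[random.random() for _ in TERMS] for _ in range(10)]
for _ in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                                  # elitist selection
    children = []
    for _ in range(5):
        a, b = random.sample(parents, 2)
        child = [(x + y) / 2 for x, y in zip(a, b)]    # averaging crossover
        if random.random() < 0.3:                      # random-reset mutation
            child[random.randrange(len(child))] = random.random()
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print(dict(zip(TERMS, best)), fitness(best))
```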
As we seek both to improve public school education in high technology areas and to link libraries and classrooms on the "information superhighway," we need to understand more about children's information searching abilities. We present results of four experiments conducted on four versions of the Science Library Catalog (SLC), a Dewey decimal-based hierarchical browsing system implemented in HyperCard without a keyboard. The experiments were conducted over a 3-year period at three sites, with four databases, and with comparisons to two different keyword online catalogs. Subjects were ethnically and culturally diverse children aged 9 through 12, with 32 to 34 children participating in each experiment. Children were provided explicit instruction and reference materials for the keyword systems but not for the SLC. The number of search topics matched was comparable across all systems and all experiments; search times were comparable, though they varied among the four SLC versions and between the two keyword online public access catalogs (OPACs). The SLC overall was robust to differences in age, sex, and computer experience. One of the keyword OPACs was subject to minor effects of age and computer experience; the other was not. We found relationships between search topic and system structure, such that the most difficult topics on the SLC were those hard to locate in the hierarchy, and those most difficult on the keyword OPACs were hard to spell or required children to generate their own search terms. The SLC approach overcomes problems with several searching features that are difficult for children in typical keyword OPAC systems: typing skills, spelling, vocabulary, and Boolean logic. Results have general implications for the design of information retrieval systems for children.
Most indexing models are based on simple independent words, also known as keywords. This approach takes account neither of the context nor of the relations between words; therefore, the precision of the system is limited. In this article, we present a structured indexing model based on noun phrases to increase the precision of an Information Retrieval System (IRS). In this model, we used a grammatical parser to extract and structure noun phrases, determining the various roles of the words of a noun phrase and their syntactic relations. We represent the set of index terms of a query as a Bayesian network, which enables us to calculate the matching function between a query and a document. We carried out experiments to test this model; the positive results obtained encourage us to continue in this direction.
This paper proposes the use of two information retrieval system models: the Boolean information retrieval model and the extended Boolean (fuzzy) information retrieval model. These models differ in using Boolean queries or fuzzy weighted queries. It also proposes a way of optimizing the user query for the two models using genetic programming and fuzzy logic, and proposes using a larger set of Boolean operators (AND, OR, XOR, OF, and NOT) instead of the standard Boolean operators (AND, OR, and NOT), with weights for Boolean operators and for terms in the fuzzy model.
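A minimal sketch of extended-Boolean (fuzzy) query evaluation with the classic min/max/complement operators; the paper's weighted operators and its OF operator go beyond this.

```python
# Term weights in [0, 1] combined with fuzzy-set operators.
def AND(*xs): return min(xs)
def OR(*xs):  return max(xs)
def NOT(x):   return 1.0 - x

# Per-document fuzzy term weights (e.g. normalised tf-idf).
doc = {"retrieval": 0.9, "fuzzy": 0.4, "boolean": 0.7}

# Query: retrieval AND (fuzzy OR boolean) AND NOT noise
score = AND(doc["retrieval"],
            OR(doc["fuzzy"], doc["boolean"]),
            NOT(doc.get("noise", 0.0)))
print(score)   # min(0.9, max(0.4, 0.7), 1.0) = 0.7
```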
The World Wide Web consists of more than 50 billion pages online. It is highly dynamic [6], i.e., the web continuously introduces new capabilities and attracts many people. Due to this explosion in size, an effective information retrieval system or search engine is needed to access the information. In this paper we propose the EPOW (Effective Performance of WebCrawler) architecture. It is a software agent whose main objective is to minimize the overload on a user of locating needed information. We have designed the web crawler with the parallelization policy in mind. Since our EPOW crawler is a highly optimized system, it can download a large number of pages per second while being robust against crashes. We also propose to use data structure concepts in the implementation of the scheduler and a circular queue to improve the performance of our web crawler.
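A minimal sketch of a parallelized crawler in the spirit described: worker threads share a frontier queue managed by a scheduler. Fetching is stubbed out, and none of EPOW's actual policies are reproduced.

```python
import queue, threading

frontier = queue.Queue()          # the scheduler's work queue
seen = set()
seen_lock = threading.Lock()

def fetch(url):
    """Stub: a real crawler would download the page and extract links."""
    return [f"{url}/child{i}" for i in range(2)] if url.count("/") < 3 else []

def worker():
    while True:
        try:
            url = frontier.get(timeout=1)
        except queue.Empty:
            return                # frontier drained: worker exits
        for link in fetch(url):
            with seen_lock:
                if link not in seen:
                    seen.add(link)
                    frontier.put(link)
        frontier.task_done()

frontier.put("http://example.org")
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(seen), "pages discovered")
```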
An alternative way to tackle Information Retrieval, called Passage Retrieval, considers text fragments independently rather than assessing the global relevance of documents. In such a context, the fact that relevant information is surrounded by parts of text deviating from the interesting topic does not penalize the document. In this paper, we propose to study the impact of considering these text fragments on a document clustering process. The use of clustering in the field of Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, and hence a clustering process is likely to gather them. Previous experiments have shown that clustering the first documents retrieved in response to a user's query allows Information Retrieval systems to improve their effectiveness. In the clustering process used in these studies, documents were considered globally. Nevertheless, the assumption that a document can refer to more than one topic/concept may also have an impact on the document clustering process. Considering passages of the retrieved documents separately may allow the creation of clusters more representative of the addressed topics. Different approaches have been assessed, and results show that using text fragments in the clustering process may turn out to be actually relevant.
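A minimal sketch of passage-level clustering with scikit-learn (fixed-size word windows, TF-IDF, k-means); the window size, cluster count, and toy documents are illustrative assumptions, not the paper's setup.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def passages(doc, size=30):
    """Split a document into fixed-size word windows."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

docs = [
    "neural ranking models improve retrieval " * 10 + "football season results " * 10,
    "the championship match ended in a draw " * 10,
]

# Cluster the passages, not the whole documents.
frags = [(d, p) for d, doc in enumerate(docs) for p in passages(doc)]
frag_texts = [p for _, p in frags]
vecs = TfidfVectorizer().fit_transform(frag_texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vecs)

# A document joins every cluster one of its passages falls in.
print(list(zip([d for d, _ in frags], labels)))
```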
This project proposed a system design for retrieving Quranic texts and any knowledge derived from or citing al-Quran. The objectives were to survey websites offering access to Quranic texts, looking at their structure and linkages, and to propose a system design for retrieving Quranic texts. A total of 125 websites offering access to Quranic texts were examined. Findings revealed that the websites offer texts and translation, recitation, excerpts of exegesis, and links to other websites consisting of news, events, and related topics. A standard structure was not implemented by these websites. The proposed system design focuses on texts, translation, recitation, exegesis, al-Hadith, its topics and themes such as stories of the prophets and places mentioned in al-Quran, and a search feature.
In this research we investigate the effect of search engine brand on the evaluation of searching performance. Our research is motivated by the large amount of search traffic directed to a handful of Web search engines, even though many have similar interfaces and performance. We conducted a laboratory experiment with 32 participants using a 4 × 2 factorial design confounded in four blocks to measure the effect of four search engine brands (Google, MSN, Yahoo!, and a locally developed search engine) while controlling for the quality and presentation of search engine results. We found brand indeed played a role in the searching process. Brand effect varied in different domains. Users seemed to place a high degree of trust in major search engine brands; however, they were more engaged in the searching process when using lesser-known search engines. It appears that branding affects overall Web search at four stages: (a) search engine selection, (b) search engine results page evaluation, (c) individual link evaluation, and (d) evaluation of the landing page. We discuss the implications for search engine marketing and the design of empirical studies measuring search engine performance.
An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented. Many of the papers describing new techniques and descriptors for content-based image retrieval describe their newly proposed methods as most appropriate without giving an in-depth comparison with all methods that were proposed earlier. In this paper, we first give an overview of a large variety of features for content-based image retrieval and compare them quantitatively on four different tasks: stock photo retrieval, personal photo collection retrieval, building retrieval, and medical image retrieval. For the experiments, five different, publicly available image databases are used and the retrieval performance of the features is analysed in detail. This allows for a direct comparison of all features considered in this work and furthermore will allow a comparison of newly proposed features to these in the future. Additionally, the correlation of the features is analysed, which opens the way for a simple and intuitive method to find an initial set of suitable features for a new task. The article concludes with recommendations as to which features perform well for which type of data. Interestingly, the often used, but very simple, colour histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.
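Since the colour histogram is recommended as a simple baseline, here is a minimal numpy sketch of it (per-channel histograms compared with L1 distance; real CBIR systems typically use joint histograms and more careful metrics, and the random images below are placeholders).

```python
import numpy as np

def colour_histogram(img, bins=8):
    """img: H x W x 3 uint8 array; returns a normalised histogram."""
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    h = np.concatenate(hist).astype(float)
    return h / h.sum()

rng = np.random.default_rng(0)
query = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
collection = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]

# Rank the collection by L1 distance between histograms.
q = colour_histogram(query)
dists = [np.abs(q - colour_histogram(img)).sum() for img in collection]
print("best match:", int(np.argmin(dists)))
```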
CDS/ISIS is an Integrated Storage and Information retrieval System of the United Nations Educational, Scientific and Cultural Organization (UNESCO), which is widely used for managing bibliographical references, ensuring high-quality content. The main purpose of this paper is to present the work recently carried out by the Food and Agriculture Organization of the United Nations (FAO), in collaboration with the Associazione per la documentazione le biblioteche e gli archivi (DBA) in Italy, to make Web CDS/ISIS based applications compliant with the OAI-PMH. After a brief evaluation of some of the existing solutions, the paper describes the methodology chosen and proposes an open source, easily parametrizable plugin tool, which can be adapted to expose metadata from a general-structure CDS/ISIS database using the OAI-PMH protocol. It concludes by expressing the importance and implications of this work for the whole CDS/ISIS community and specifically for the participating centres from the AGRIS network. In addition, this work assures that semantically rich metadata for agricultural science and research publications based on the "AGRIS Application Profile" can be handled by the OAI protocol.
Latent semantic indexing is a variant of the vector space method in which a low-rank approximation to the vector space representation of the database is used. The main idea of the latent semantic indexing model is to map each document and query vector into a lower-dimensional space associated with concepts. This is done by mapping the index term vectors into that lower-dimensional space. The claim is that retrieval in the reduced space may be better than retrieval in the space of index terms. In this paper, in addition to the vector space method, latent semantic indexing is used to build a Malay-language information retrieval system. Keywords: Malay information retrieval, latent semantic indexing, information retrieval, vector space method.
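A minimal LSI sketch with numpy: a truncated SVD of a toy term-document matrix, then folding the query into the concept space. The matrix is invented for illustration; the paper's Malay collection is not used.

```python
import numpy as np

# Toy term-document count matrix (4 terms x 3 documents).
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # number of latent "concepts"
Uk, sk = U[:, :k], s[:k]                # rank-k truncation

docs = Vt[:k].T                         # each row: a document in concept space
query = np.array([1, 1, 0, 0], float)   # query uses terms 0 and 1
q = (query @ Uk) / sk                   # fold the query into concept space

sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
print(sims)                             # cosine similarity per document
```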
The constant improvement of both hardware and software related to mobile computing is enhancing the capabilities of mobile devices. Present-day mobile phones can run rich stand-alone applications as well as distributed client-server applications that access information via a web gateway. This changed environment brings new opportunities as well as constraints for mobile application developers. A move towards open source software offers several advantages for application developers and operating system vendors. The objective of this paper is to demonstrate how voice-enabled mobile applications can be deployed economically using only open source software to access information from the Web. Swar-Suchak is a voice-enabled mobile application for information retrieval in multiple languages. We describe two applications running on Swar-Suchak using the open source Android platform. By linking a mobile phone to a voice gateway, built with open source software, we are able to develop voice-enabled web applications which are accessible ubiquitously by anyone, anytime.
The construction and maintenance of a medical thesaurus is a non-trivial task, due to the inherent complexity of a proper medical terminology. We present a methodology for transaction-based anomaly detection in the process of thesaurus maintenance. Our experiences are based on lexicographic work with the MorphoSaurus lexicons, which are the basis for a mono- and cross-lingual biomedical information retrieval system. Any "edit" or "delete" actions within these lexicons that undo an action defined earlier were defined as anomalous. We identify four types of such anomalies. We also analyzed to what extent the anomalous lexicon entries had been detected by an alternative, corpus-based approach.
Due to the wide usage of mobile phones, many software developers have made use of the device as the platform for their applications. This move is especially crucial when the applications are to be made available anytime, anywhere. In this paper, the development of a mobile application for administering the final examination exercise of a Malaysian private university using mobile phones is presented.
Welcome to the first Twente Data Management Workshop (TDM). We have set ourselves two goals for the workshop series:
This article provides an overview of recent developments relating to the application of thesauri in information organisation and retrieval on the World Wide Web. It describes some recent thesaurus projects undertaken to facilitate resource description and discovery and access to wide-ranging information resources on the Internet. Types of thesauri available on the Web, thesauri integrated in databases and information retrieval systems, and multiple-thesaurus systems for cross-database searching are also discussed. Collective efforts and events in addressing the standardisation and novel applications of thesauri are briefly reviewed.
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).
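A minimal sketch of degree-centrality filtering on a toy predication graph (the predications are invented, and Semantic MEDLINE's actual condensation method is richer than this): predications are kept only when both their subject and object are well connected.

```python
from collections import Counter

predications = [("aspirin", "TREATS", "headache"),
                ("aspirin", "TREATS", "fever"),
                ("ibuprofen", "TREATS", "headache"),
                ("aspirin", "INTERACTS_WITH", "warfarin")]

# Degree centrality: how many predications each concept participates in.
degree = Counter()
for subj, _, obj in predications:
    degree[subj] += 1
    degree[obj] += 1

# Keep only predications whose subject and object are well connected.
hubs = {n for n, d in degree.items() if d >= 2}
summary = [p for p in predications if p[0] in hubs and p[2] in hubs]
print(summary)   # -> [('aspirin', 'TREATS', 'headache')]
```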
Paweł Kowalski, Marcin Fastyn, Jakub Banasiak (Instytut Slawistyki Polskiej Akademii Nauk). The use of bibliographic resources by cultural institutions, or their integration (the example of the iSybislaw system). Keywords: bibliographic database; information retrieval system; cultural institution; iSybislaw. No. 9 (11) / 2018, pp. 95-108. Paweł Kowalski holds a doctorate in the humanities in linguistics (Slavic philology) and is an assistant professor at the Institute of Slavic Studies of the Polish Academy of Sciences. He works on South Slavic languages, word formation, linguistic terminology, minority languages, and information retrieval languages.
Users need a new class of information retrieval systems to help them effectively utilize the increasingly vast selection of networked information resources becoming available on the Internet. These systems, usually called Network Information Discovery and Retrieval (NIDR) systems, must operate in a highly demanding, very large-scale distributed environment that encompasses huge numbers of autonomously managed and extremely heterogeneous resources. The design of successful NIDR systems demands a synthesis of technologies and practices from computer science, computer-communications networking, information science, librarianship, and information management. This paper discusses the range of potential functional requirements for information resource discovery and selection, issues involved in describing and classifying network resources to support discovery and selection processes, and architectural frameworks for collecting and managing the information bases involved. It also includes a survey and analysis of selected operational prototypes and production systems.
Conceptualizing the bibliographic record as text implies that it needs to be treated as such in order to fully exploit its function in information retrieval activities, which affects how access to works can be achieved. A theoretical framework is outlined, including methodological consequences in terms of how to go about teaching students of knowledge organization and users of information retrieval systems the literate activity of using the bibliographic record as a text. For knowledge organization research, this implies that providing access to texts and the works they embody is not a technical matter, but rather a literate issue.
We are working on the design of a data warehouse of documentary resources in an educational setting that integrates user modeling. In describing resources with a view to their reuse in training paths, we discuss the difficulties encountered and formulate proposals to fill gaps in the existing standards and to make certain descriptions more operational. Modeling the actors on the one hand and the document types on the other makes it possible to draw correlations in order to improve responses. Relating actors and documents is made possible by the warehouse's metadata and the meta-modeling of the data warehouse. We also develop the metadata specific to the data warehouse, which define the structural and accessibility metadata specific to the steering system. In order to best develop our contribution to the strategic information system, the meta-modeling of the data warehouse makes it possible to draw up a master plan for the construction of the data warehouse.
Motivation: As the sizes of three-dimensional (3D) protein structure databases are growing rapidly nowadays, exhaustive database searching, in which a 3D query structure is compared to each and every structure in the database, becomes inefficient. We propose a rapid 3D protein structure retrieval system named 'ProtDex2', in which we adopt the techniques used in information retrieval systems in order to perform rapid database searching without having access to every 3D structure in the database. The retrieval process is based on the inverted-file index constructed on the feature vectors of the relationships between the secondary structure elements (SSEs) of all the 3D protein structures in the database. ProtDex2 is a significant improvement, both in terms of speed and accuracy, upon its predecessor system, ProtDex.
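A minimal sketch of an inverted-file index over discretised feature keys, loosely in the spirit of ProtDex2's indexing of SSE-pair features (the keys and the voting score below are illustrative assumptions, not the system's actual feature vectors or ranking function).

```python
from collections import defaultdict

index = defaultdict(set)

def add(structure_id, features):
    """Index a structure under each of its discrete feature keys."""
    for f in features:
        index[f].add(structure_id)

# Each structure is described by discrete SSE-pair feature keys.
add("1abc", ["HH:close", "HE:far", "EE:close"])
add("2xyz", ["HH:close", "EE:far"])

def search(query_features):
    """Rank structures by the number of shared feature keys, so the
    query never has to be compared against every structure directly."""
    votes = defaultdict(int)
    for f in query_features:
        for sid in index.get(f, ()):
            votes[sid] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

print(search(["HH:close", "EE:close"]))   # -> [('1abc', 2), ('2xyz', 1)]
```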