The Effect of Collection Fusion Strategies on Information Seeking Performance in Distributed Hypermedia Digital Libraries
Related papers
Report on the TREC-8 Experiment: Searching on the Web and in Distributed Collections
Text Retrieval Engineering Conference, 2001
The Internet paradigm permits information searches to be made across wide-area networks where information is contained in web pages and/or whole document collections such as digital libraries. These new distributed information environments reveal new and challenging problems for the IR community. Consequently, in this TREC experiment we investigated two questions related to information searches on the web or in digital libraries: (1) an analysis of the impact of hyperlinks in improving retrieval performance, and (2) a study of techniques useful in selecting more appropriate text databases (the database selection problem encountered when faced with multiple collections), including an evaluation of certain merging strategies effective in producing a single ranked list to be presented to the user (the database merging problem).
Collection Profiling for Collection Fusion in Distributed Information Retrieval Systems
Lecture Notes in Computer Science, 2007
Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and score-normalization-based merging strategies are well-known approaches to solving such problems. However, such approaches only consider the content of the remote database and do not consider its retrieval performance. In this paper, we address this problem in peer-to-peer information systems and argue that the performance of each search engine should also be considered. We also propose a collection profiling strategy which can discover not only collection content but also retrieval performance. Web-based query classification and two collection fusion approaches based on the collection profiling are also introduced in this paper. Our experiments show that our merging strategies are effective in merging results in uncooperative environments.
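As a rough illustration of the score-normalization merging that the abstract above mentions, the following Python sketch rescales each collection's scores to [0, 1] and then merges them into one ranked list. It is not the paper's method: the function names, the min-max normalization choice, and the optional per-collection `weights` (a hypothetical stand-in for a profiled retrieval-performance estimate) are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Result = Tuple[str, float]  # (document id, raw score from one search engine)


def min_max_normalize(results: List[Result]) -> List[Result]:
    """Rescale one collection's raw scores to the range [0, 1]."""
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so treat every document as equally good.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge_results(
    per_collection: Dict[str, List[Result]],
    weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, str, float]]:
    """Merge per-collection result lists into a single ranked list.

    `weights` is a hypothetical per-collection multiplier (default 1.0),
    standing in for an estimate of each engine's retrieval performance.
    """
    merged: List[Tuple[str, str, float]] = []
    for name, results in per_collection.items():
        if not results:
            continue  # nothing to normalize for an empty result list
        weight = weights.get(name, 1.0) if weights else 1.0
        for doc, score in min_max_normalize(results):
            merged.append((doc, name, weight * score))
    # Python's sort is stable, so ties keep their collection order.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged


if __name__ == "__main__":
    runs = {
        "A": [("a1", 10.0), ("a2", 5.0), ("a3", 0.0)],
        "B": [("b1", 0.9), ("b2", 0.8)],
    }
    for doc, source, score in merge_results(runs, weights={"B": 0.4}):
        print(doc, source, score)
```

Min-max normalization makes scores from engines with different scales comparable, but it ignores how good each engine actually is; the weight parameter shows one simple place where a performance estimate could enter.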
Cooperative Information Retrieval in Digital Libraries
Proceedings of the 18th BCS IRSG …, 1996
In recent years, the great expansion of distributed resources through, for example, the Internet has set up a framework for the realisation of an age-old vision: gaining firsthand, immediate access to vast amounts of information. The concept of a digital library is a step towards the realisation of this vision, and can be regarded (from a computer science perspective) simply as a distributed information system. Digital libraries bring new challenges to information retrieval (IR). These challenges are primarily in the areas of distributed retrieval, distributed storage, multimedia retrieval and effective interfaces which provide rich user interactions. In this paper, we present an open agent-based hypermedia model for distributed digital libraries, but we focus mainly on a technique for using dynamic links between information sources, known as co-operative retrieval links, and the implications of this technique for the process and nature of distributed information retrieval.
Comparative Analysis of User Searching in Domain-Specific and Domain-Independent Digital Libraries
This note reports preliminary results of a comparative transaction log analysis study of user searching in two digital libraries in the United States: a large-scale digital library in the domain of US history and a domain-independent federal-level digital library that aggregates digital collections regardless of their subject scope. This investigation reveals similarities in user search behavior between the two types of digital libraries with regard to rate of collection-level search use and application of search limits, most frequently occurring search categories, etc. At the same time, notable differences are observed in the rate of fielded search and phrase search use, average search query length and frequencies, and distribution of some search categories. This study provides empirical data to support digital library developers' decision making regarding audience-based information organization in large-scale digital libraries.
Large-scale digital library user searching: What role does domain play?
2013
This poster presents findings of an exploratory comparative analysis of user search queries in three large-scale digital libraries: two domain-specific libraries functioning in the domains of STEM education and US history, and one domain-independent library. This study measured search query lengths and frequencies, and categorized search queries into ten search categories based on content analysis. Results suggest that domain-based differences (i.e., differences in user searching between digital libraries representing different domains) are more substantial than differences in user searching between domain-specific and domain-independent digital libraries. Domain-based differences in distribution of search categories between search query length and search query frequencies are statistically significant. These findings may have implications for design and evaluation of large-scale aggregations of digitized materials.
An Explanatory Study on User Behavior in Discovering Aggregated Multimedia Web Content
IEEE Access
The recent advancements in the web allow users to generate multimedia content, resulting in multimedia information proliferation. Existing search engines provide access to multimedia content via a disjoint assembly of media-specific results called verticals. However, this decentralized assembly of media content requires manual aggregation and synthesizing efforts at the user's end, hindering the information exploration process and potentially causing cognitive overload, hence demanding innovative tools to discover multimedia content. Researchers have devised numerous state-of-the-art approaches; however, analysis to confirm their efficacy has received little emphasis. This study investigates users' complex multimedia information-seeking behavior over state-of-the-art web search systems to unveil the users' information-seeking issues. Our research employs a between-subjects study and post hoc analysis strategies to analyze participants' information-seeking characteristics. The study design adopted statistical hypothesis testing to consolidate previous user behavioral studies, confirm existing strategies, and present recommended practices for future general-purpose web search engines. The participants were assigned Google and an advanced discovery search system using the same multimedia dataset to ensure the credibility of the obtained results. The primary behavioral parameters include search efforts, multimedia content exploration, search user interface (SUI), information management and presentation, and user cognition. This study uncovers several inadequacies of the search engines in meeting users' complex discovery needs, including 29.6% less user engagement, 43% system and searching dissatisfaction, and 32% less knowledge acquisition with 63.9% increased clicking effort on traditional search engines. The results confirmed previous user studies and suggest novel, statistically significant research recommendations for multimedia information exploration-related endeavors.
User Behavior Tendencies on Data Collections in a Digital Library
Lecture Notes in Computer Science, 2002
We compare the usage of a Digital Library with many different categories of collections, by examining its log files for a period of twenty months, and we conclude that the access points that the users mostly refer to, depend heavily on the type of content of the collection, the detail of the existing metadata and the target user group. We also found that most users tend to use simple query structures (e.g. only one search term) and very few and primitive operations to accomplish their request. Furthermore, as they get more experienced, they reduce the number of operations in their sessions.
THE EFFECT OF SPECIALIZED MULTIMEDIA COLLECTIONS ON WEB SEARCHING
Multimedia Web searching is a significant information activity for many people. Major Web search engines are critical resources in people's efforts to locate relevant online multimedia information. It is therefore important that we understand how searchers are utilizing these Web information systems in their quest to retrieve multimedia information to design effective Web systems in support of these information needs. In this paper, we report the results of a research study evaluating the effect of separate multimedia Web collections on individual searching behavior. The AltaVista search engine has an extensive multimedia collection and uses tabs to search specific collections. The motivating questions for this research are: (1) What are the characteristics of multimedia searching on AltaVista? and (2) What are the effects on Web searching of separate multimedia collections? The results of our research show that multimedia searching is complex relative to general Web searching. Searching specific multimedia collections has reduced the complexity of audio searching, but it has not had the same effect for image and video searching. Query length and Boolean usage rates are much higher for image searching, compared to general Web searching. We discuss the implications of the research findings for the design, development and evaluation of Web multimedia retrieval systems.
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '00, 2000
This article compares search effectiveness when using query-based Internet search (via the Google search engine), directory-based search (via Yahoo) and phrase-based query reformulation assisted search (via the Hyperindex browser) by means of a controlled, user-based experimental study. The focus was to evaluate aspects of the search process. Cognitive load was measured using a secondary digit-monitoring task to quantify the effort of the user in various search states; independent relevance judgements were employed to gauge the quality of the documents accessed during the search process. Time was monitored in various search states. Results indicated that directory-based search does not offer increased relevance over query-based search (with or without query formulation assistance), and also takes longer. Query reformulation does significantly improve the relevance of the documents through which the user must trawl versus standard query-based Internet search. However, the improvement in document relevance comes at the cost of increased search time and increased cognitive load. Keywords: navigation versus ad hoc search, monitoring user behaviour to improve search, field/empirical studies of the information seeking process, testing methodology.