Wan-Ching Wu - Academia.edu
Papers by Wan-Ching Wu
This research sought to explain online searchers' stopping behaviors when interacting with search engine result pages (SERPs) using the theories of Information Scent and Need for Cognition (NFC). Specifically, it addressed how the following affected a person's search stopping behaviors: (1) information scent level, operationalized as the number of relevant documents on the first SERP; (2) information scent pattern, operationalized as the distribution of relevant and non-relevant results on the first SERP; and (3) NFC, a person's tendency to engage in and enjoy effortful cognitive activities, as measured by the Need for Cognition scale. Two search stopping behaviors were examined: query stopping, the point at which a person decides to issue a new query, and task stopping, the point at which a person decides to end the search task. A laboratory experiment was conducted with 48 participants, who were asked to gather information for six open-ended search tas...
The paper reports results from a user survey conducted in an academic library setting that investigated readers' borrowing decision-making process. The research is motivated by adaptive decision-making theory and the Lens Model in cognitive psychology. Specifically, the research sets out to explore readers' reliance on various information sources when facing different search situations. The results show that readers adaptively use information sources in their immediate information environment, as well as those in the library setting, to learn about a title and judge its value. It is hoped that the results will support the provision of richer bibliographic information and greater connectivity among works, to facilitate user judgment and browse-based access to library collections. The study's theoretical implications for human information seeking are also discussed.
Proceedings of the American Society for Information Science and Technology, 2009
This paper reports results of an evaluation study of MAP (Multi-faceted Access to PubMed), a metadata-induced query suggestion interface for PubMed bibliographic search. A novel evaluation methodology was used to address the challenges of evaluating an IIR (Interactive Information Retrieval) system such as the MAP interface. The most significant aspect of this methodology is that, instead of using the assigned tasks common in traditional IR evaluation, it asks real users with real search requests to search with real systems in an experimental setting. Several performance measures were created, and MAP was compared with the PubMed baseline on each. MAP performed better on several of these measures, especially when the search requests had not been attempted before. This finding points to search characteristics as an important intervening variable in IIR evaluation. The advantages of and potential threats to the methodology are also discussed.
International Journal of Medical Informatics, 2013
Previous research has shown that information seekers in the biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata-based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and on the effectiveness of the interface. A real user, real search request, and real system approach was used: unlike traditional IR evaluation, which relies on assigned tasks, participants searched on requests of their own. Forty-four researchers in the Health Sciences participated in the evaluation; each carried out two of their own search requests, alternating between the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessments of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that users were more likely to change their queries when searching an unfamiliar topic, indicating an effect of familiarity on search behavior. The interface also scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. These results indicate a selective compatibility between search familiarity and search interface. One implication for system evaluation is the importance of taking task familiarity into consideration when assessing the effectiveness of interactive IR systems.
Proceedings of the American Society for Information Science and Technology, 2010
Social navigation tools were developed to guide user exploration of an information space and to inform users' decision-making processes (Dieberger, Dourish, Höök, Resnick, & Wexelblat, 2000). In the online bookstore setting, social navigation tools such as book recommendations, user tags, and customer reviews address information needs that cannot be expressed through keyword search, thereby facilitating exploratory activities that may enhance the subjective search experience. To examine whether online social navigation tools influence the affective aspects of user experience, this study applies the theory of flow to form a new evaluation methodology. The impact of social navigation tools on behavioral variance is also discussed.
Cardiac rehabilitation (CR), consisting of exercise and diet modifications, has been proven to promote a healthy lifestyle that can extend life, particularly for survivors of cardiovascular events. Nonetheless, there is long-standing concern about the underutilization of CR in general, and especially by women. The American Association of Cardiovascular and Pulmonary Rehabilitation recommends that all eligible persons be referred to and participate in a CR program. However, participation in and adherence to CR remain low. There appears to be a CR referral information gap in many instances, so the focus groups investigated four main research questions. First, what did CR mean to these former CR participants? Second, how did participants find out about CR? Third, what kind of referral information did they receive? Last, what information should prospective cardiac rehabilitation program participants receive? The poster will present the background and motivation for the stud...
Proceedings of the American Society for Information Science and Technology, 2014
The goal of the study is to understand the factors that influence people's stopping behaviors during online information search. Past research on search stopping behavior has focused primarily on stopping at the conclusion of an information-seeking task. In this study, however, we focus on two types of stopping behavior that take place during search tasks: query abandonment, the point at which a person decides to stop the current query and enter a new one, and task stopping, the point at which a person decides to stop the search task. A laboratory study was conducted with 48 participants, who were asked to complete a set of six assigned search tasks and were interviewed afterward about their experiences and search strategies. Results show that participants made query abandonment decisions based on properties of the search results, the queries, and the search tasks. Their decisions to stop a task were influenced by the content they had examined, the goal they wished to achieve, their subjective perceptions, and the constraints of the study.
Proceedings of the American Society for Information Science and Technology, 2014
Collaborative search is concerned with how people work together to address a shared information need. In this paper, we investigate two factors that may influence collaborative search behaviors: providing awareness information about collaborators' prior activities, and the orientation of the shared task. For awareness, we examined two levels: an aware condition, in which participants could see their collaborators' prior activity in the system, and a non-aware condition, in which they could see only their own history. For orientation, we compared a task with an open set of goals to a recall-oriented task that asked participants to find as many relevant documents as possible. Forty-one participants in a laboratory study used a prototype system called ResultsSpace to complete an asynchronous collaborative search task with three simulated collaborators. We developed a novel set of measures for examining collaborative behavior and found that, compared with the non-aware group, participants who were aware of their collaborators' prior actions issued more query terms in common with their collaborators and tended to avoid rating or viewing documents that collaborators had already rated. We also observed an interaction between awareness and task orientation in the number of unique documents found, suggesting that task orientation could change the direction of the effect of awareness on document space exploration.
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015
One of the most challenging aspects of designing interactive information retrieval (IIR) experiments with users is the development of search tasks. We describe an evaluation of 20 search tasks that were designed for use in IIR experiments and developed using a cognitive complexity framework from educational theory. The search tasks represent five levels of cognitive complexity and four topical domains. The tasks were evaluated in the context of a laboratory IIR experiment with 48 participants. Behavioral and self-report data were used to characterize and understand differences among tasks. Results showed that more cognitively complex tasks required significantly more search activity from participants (e.g., more queries, more clicks, and more time to complete). However, participants did not evaluate more cognitively complex tasks as more difficult and were equally satisfied with their performance across tasks. Our work makes four contributions: (1) it adds to what is known about the relationships among tasks, search behaviors, and user experience; (2) it presents a framework for task creation and evaluation; (3) it provides tasks and questionnaires that can be reused by others; and (4) it raises questions about the findings and assumptions of many recent studies that use only behavioral signals from search logs as evidence of task difficulty and searcher satisfaction, since many of our results directly contradict these findings.
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13, 2013
Aggregated search is the task of incorporating results from different search services, or verticals, into the web search results. Aggregated search coherence refers to the extent to which results from different sources focus on similar senses of a given query. Prior research investigated aggregated search coherence between images and web results. A user study showed that users are more likely to interact with the web results when the images are more consistent with the intended query sense. We build upon this work and address three outstanding research questions about aggregated search coherence: (1) Does the same "spill-over" effect generalize to verticals other than images? (2) Is the effect stronger when the vertical results include image thumbnails? and (3) From a user's perspective, what factors influence whether and when a spill-over occurs? We investigate these questions using a large-scale crowdsourcing study and a smaller-scale laboratory study. Results suggest that the spill-over effect occurs for some verticals (images, shopping, video) but not others (news), and that including thumbnails in the vertical results has little effect. Qualitative data from our laboratory study provide insights into participants' actions and thought processes when faced with (in)coherent results.
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013
Human assessments of document relevance are needed for the construction of test collections, for ad-hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. We examine the effect that threshold priming, that is, prior exposure to documents of varying degrees of relevance, has on people's calibration of relevance. Participants judged the relevance of a prologue of documents that were either highly relevant, moderately relevant, or non-relevant, followed by a common epilogue of documents of mixed relevance. We observe that participants exposed to only non-relevant documents in the prologue assigned significantly higher average relevance scores to prologue and epilogue documents than participants exposed to moderately or highly relevant documents in the prologue. We also examine how need for cognition, an individual difference measure of the extent to which a person enjoys engaging in effortful cognitive activity, affects relevance assessments. High need for cognition participants had a significantly higher level of agreement with expert assessors than low need for cognition participants did. Our findings indicate that assessors should be exposed to documents from multiple relevance levels early in the judging process, in order to calibrate their relevance thresholds in a balanced way, and that individual difference measures might be a useful way to screen assessors.
Proceedings of the 4th Information Interaction in Context Symposium, 2012
One of the most challenging aspects of designing an interactive information retrieval (IIR) study is the development of search tasks. In this paper, we present preliminary results of a study designed to evaluate a set of search tasks that were developed for use in IIR studies. We created 20 search tasks using five levels of cognitive complexity and four domains, and conducted a laboratory evaluation of these tasks with 48 undergraduate subjects. We describe preliminary results from an analysis of data from 24 subjects for 10 search tasks. Initial results show that, in general, as cognitive complexity increased, subjects issued more queries, clicked on more search results, viewed more URLs, and took more time to complete the task. Subjects' expected and experienced difficulty ratings generally increased with cognitive complexity, with some exceptions. When subjects were asked to rank tasks according to difficulty and engagement, tasks with higher cognitive complexity were rated as more difficult than tasks with lower cognitive complexity, but not necessarily as more engaging. These preliminary results suggest that behaviors and ratings are fairly consistent with the differences one might expect among the search tasks, and they provide initial evidence of the usefulness of these tasks in IIR studies.
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012
Aggregated search is the task of blending results from specialized search services, or verticals, into the web search results. While many studies have focused on aggregated search techniques, few have tried to understand how users interact with aggregated search results. This study investigates how task complexity and vertical display (the blending of vertical results into the web results) affect the use of vertical content. Twenty-nine subjects completed six search tasks of varying complexity using two aggregated search interfaces: one that blended vertical results into the web results and one that provided only indirect vertical access. Our results show that more complex tasks required significantly more interaction and that subjects completing these tasks examined more vertical results. While the amount of interaction was the same across interfaces, subjects clicked on more vertical results when these were blended into the web results. Our results also show an interaction between task complexity and vertical display: subjects clicked on more verticals when completing the more complex tasks with the interface that blended vertical results. Subjects' evaluations of the two interfaces were nearly identical, but when these evaluations were analyzed with respect to interface preference, we found a positive relationship between system evaluations and individual preferences. Subjects justified their preferences using similar rationales, and their comments illustrate how the display itself can influence judgments of information quality, especially when the vertical results might not be relevant to the search task.
Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014
The purpose of this study is to investigate the extent to which two theories, Information Scent and Need for Cognition, explain people's search behaviors when interacting with search engine results pages (SERPs). Information Scent, the perception of the value of information sources, was manipulated by varying the number and distribution of relevant results on the first SERP. Need for Cognition (NFC), a personality trait that measures the extent to which a person enjoys cognitively effortful activities, was measured with a standardized scale. A laboratory experiment was conducted with forty-eight participants, who completed six open-ended search tasks. Results showed that when interacting with SERPs containing more relevant documents, participants examined more documents and clicked deeper in the search result list. When interacting with SERPs that contained the same number of relevant results distributed across different ranks, participants were more likely to abandon their queries when relevant documents appeared later on the SERP. With respect to NFC, participants with higher NFC paginated less frequently and paid less attention to results at lower ranks than those with lower NFC. The interaction between NFC and the number of relevant results on the SERP affected the time spent searching and a participant's likelihood to reformulate, paginate, and stop. Our findings suggest evaluating system effectiveness based on the first page of results, even for tasks that require the user to view multiple documents, and varying interface features based on NFC.
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12, 2012
Although a great deal of research has been conducted on automatic techniques for determining query quality, there have been relatively few studies of how people judge query quality. This study investigated the topic through a laboratory experiment with 40 subjects. Subjects were shown eight information problems (five fact-finding and three exploratory) and asked to evaluate queries for these problems according to several quality attributes. Subjects then evaluated search engine results pages (SERPs) for each query, which were manipulated to exhibit different levels of performance. Following this, subjects re-evaluated the queries, were interviewed about their evaluation approaches, and repeated the rating procedure for two information problems. Results showed that for fact-finding information problems, longer queries received higher ratings (both initial and post-SERP), and that post-SERP query ratings were affected more by the proportion of relevant documents viewed among all documents viewed than by the ranks of the relevant documents. For exploratory information problems, subjects' ratings were highly correlated with the number of relevant documents in the SERP as well as with the proportion of relevant documents viewed. Subjects adopted several approaches when evaluating query quality, which led to different quality ratings. Finally, during the reliability check, subjects' initial evaluations were fairly stable, but their post-SERP evaluations increased significantly.
Previous research has shown that information seekers in biomedical domain need more support in fo... more Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems.
This research sought to explain online searchers' stopping behaviors when interacting with se... more This research sought to explain online searchers' stopping behaviors when interacting with search engine result pages (SERPs) using the theories of Information Scent and Need for Cognition (NFC). Specifically, the problems addressed were how: (1) information scent level, operationalized as the number of relevant documents on the first SERP, (2) information scent pattern, operationalized as the distribution of relevant and non-relevant results on the first SERP, and (3) NFC, a person's tendency to engage in and enjoy effortful cognitive activities measured by the Need for Cognition scale, impacted a person's search stopping behaviors. The two search stopping behaviors that were examined were query stopping, or the point at which a person decides to issue a new query, and task stopping, or the point at which a person decides to end the search task. A laboratory experiment was conducted with 48 participants, who were asked to gather information for six open-ended search tas...
The paper reported results from a user survey study conducted in an academic library setting that... more The paper reported results from a user survey study conducted in an academic library setting that aimed at investigating readers' borrowing decision making process. The research is motivated by adaptive decision making theory and Len's Model in cognitive psychology. Specifically, the research sets out the explore readers' reliance on various information sources when facing different search situations. The results show that readers adaptively make use of information sources in their immediate information environment and those in the library setting to learn about and judge the value of a title. It is hoped that the results will lend support to the provision of richer bibliographic information and connectivity among works to facilitate user judgment and browse-based access to library collection. The theoretical implications of the study on the development of human information seeking are also discussed.
Proceedings of the American Society for Information Science and Technology, 2009
This paper reports results of an evaluation study of MAP (Multi-faceted Access to PubMed), a meta... more This paper reports results of an evaluation study of MAP (Multi-faceted Access to PubMed), a metadata induced query suggestion interface for PubMed bibliographic search. A novel evaluation methodology was used to address the challenges involved in evaluating an IIR (Interactive Information Retrieval) system such as the MAP interface. The most significant aspect of this methodology is that, instead of using assigned tasks common in traditional IR evaluation, it asks real users with real search requests to search with real systems in an experimental setting. Several performance measures were created based on which comparisons were made between MAP and PubMed baseline. MAP was shown to perform better in several of these measures, especially when the search requests had not been attempted before. The finding pointed to search characteristics as an important intervening variable in IIR evaluation. The advantages of and potential threats to our methodology were also discussed.
International Journal of Medical Informatics, 2013
Previous research has shown that information seekers in biomedical domain need more support in fo... more Previous research has shown that information seekers in biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and the effectiveness of the interface. A real user, user search request and real system approach was used for the study. Unlike tradition IR evaluation, where assigned tasks were used, the participants were asked to search requests of their own. Forty-four researchers in Health Sciences participated in the evaluation - each conducted two research requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessment of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. Results indicate that there is a selective compatibility between search familiarity and search interface. One implication of the research for system evaluation is the importance of taking into consideration task familiarity when assessing the effectiveness of interactive IR systems.
Proceedings of the American Society for Information Science and Technology, 2010
Social navigation tools were developed with an aim to guide user exploration of an information sp... more Social navigation tools were developed with an aim to guide user exploration of an information space and to inform users' decision making processes (Dieberger, Dourish, Hook, Resnick, & Wexelblat, 2000). In the online bookstore setting, social navigation tools such as book recommendations, user tags and customer reviews address information needs not expressible with keyword search so as to facilitate exploratory activities, which may enhance subjective search experience. In order to examine whether online social navigation tools influence the affective aspects of user experience, theory of flow is applied in this study to form a new evaluation methodology. Impacts of social navigation tools on behavioral variance are also discussed.
Cardiac rehabilitation (CR), consisting of exercise and diet modifications, has been proven to promote a healthy lifestyle that can extend life, particularly for survivors of cardiovascular events. Nonetheless, there is a long-standing concern about the underutilization of CR in general and by women in particular. The American Association of Cardiovascular and Pulmonary Rehabilitation recommends that all eligible persons be referred to and participate in a CR program. However, participation in and adherence to CR remain low. In many instances there appears to be a gap in CR referral information, so focus groups were conducted to investigate four main research questions. First, what did CR mean for these former CR participants? Second, how did participants find out about CR? Third, what kind of referral information did they receive? Last, what information should prospective cardiac rehabilitation program participants receive? The poster will present the background and motivation for the stud...
Proceedings of the American Society for Information Science and Technology, 2014
The goal of the study is to understand the factors that influence people's search stopping behaviors during online information search. Past research on search stopping behavior has primarily focused on the stopping behavior that takes place at the conclusion of an information-seeking task. In this study, however, we focus on two types of stopping behaviors that take place during information search tasks: query abandonment, the point at which a person decides to stop his/her current query and enter a new one, and task stopping, the point at which a person decides to stop the search task. A laboratory study was conducted with 48 participants who were asked to complete a set of six assigned search tasks and were interviewed about their experiences and search strategies after searching. Results show that participants made query abandonment decisions based on properties of the search results, the queries and the search tasks. Their decisions to stop a task were influenced by the content they had examined, the goal they wished to achieve, their subjective perceptions, and the study constraints they faced.
Proceedings of the American Society for Information Science and Technology, 2014
Collaborative search is concerned with how people work together to address a common information need. In this paper, we investigate two factors that may influence collaborative search behaviors: providing awareness information about collaborators' prior activities, and the orientation of the shared task. For awareness, we examined two levels: an aware condition in which participants could see their collaborators' prior activity in the system, and a non-aware condition in which they could only see their own history. For orientation, we compared a task with an open set of goals to a recall-oriented task that asked participants to find as many relevant documents as possible. Forty-one participants in a laboratory study used a prototype system called ResultsSpace to complete an asynchronous collaborative search task with three simulated collaborators. We developed a novel set of measures for examining collaborative behavior and found that, compared with the non-aware group, participants who were aware of their collaborators' prior actions issued more query terms in common with their collaborators and tended to avoid rating or viewing documents that collaborators had already rated. We also observed an interaction between awareness and task orientation in terms of the number of unique documents found, suggesting that task orientation could change the direction of the effect of awareness on document space exploration.
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015
One of the most challenging aspects of designing interactive information retrieval (IIR) experiments with users is the development of search tasks. We describe an evaluation of 20 search tasks that were designed for use in IIR experiments and developed using a cognitive complexity framework from educational theory. The search tasks represent five levels of cognitive complexity and four topical domains. The tasks were evaluated in the context of a laboratory IIR experiment with 48 participants. Behavioral and self-report data were used to characterize and understand differences among tasks. Results showed that more cognitively complex tasks required significantly more search activity from participants (e.g., more queries, clicks, and time to complete). However, participants did not rate more cognitively complex tasks as more difficult and were equally satisfied with their performance across tasks. Our work makes four contributions: (1) it adds to what is known about the relationships among tasks, search behaviors and user experience; (2) it presents a framework for task creation and evaluation; (3) it provides tasks and questionnaires that others can reuse; and (4) it raises questions about the findings and assumptions of many recent studies that use only behavioral signals from search logs as evidence of task difficulty and searcher satisfaction, since many of our results directly contradict these findings.
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13, 2013
Aggregated search is the task of incorporating results from different search services, or verticals, into the web search results. Aggregated search coherence refers to the extent to which results from different sources focus on similar senses of a given query. Prior research investigated aggregated search coherence between images and web results. A user study showed that users are more likely to interact with the web results when the images are more consistent with the intended query sense. We build on this work and address three outstanding research questions about aggregated search coherence: (1) Does the same "spill-over" effect generalize to verticals other than images? (2) Is the effect stronger when the vertical results include image thumbnails? and (3) What factors influence whether and when a spill-over occurs from a user's perspective? We investigate these questions using a large-scale crowdsourcing study and a smaller-scale laboratory study. Results suggest that the spill-over effect occurs for some verticals (images, shopping, video) but not others (news), and that including thumbnails in the vertical results has little effect. Qualitative data from our laboratory study provide insights into participants' actions and thought processes when faced with (in)coherent results.
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, 2013
Human assessments of document relevance are needed for the construction of test collections, for ad-hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. We examine the effect that threshold priming, that is, seeing documents of varying degrees of relevance, has on people's calibration of relevance. Participants judged the relevance of a prologue of documents containing highly relevant, moderately relevant, or non-relevant documents, followed by a common epilogue of documents of mixed relevance. We observe that participants exposed to only non-relevant documents in the prologue assigned significantly higher average relevance scores to prologue and epilogue documents than participants exposed to moderately or highly relevant documents in the prologue. We also examine how need for cognition, an individual difference measure of the extent to which a person enjoys engaging in effortful cognitive activity, affects relevance assessments. Participants high in need for cognition had a significantly higher level of agreement with expert assessors than participants low in need for cognition. Our findings indicate that assessors should be exposed to documents from multiple relevance levels early in the judging process, in order to calibrate their relevance thresholds in a balanced way, and that individual difference measures might be a useful way to screen assessors.
Proceedings of the 4th Information Interaction in Context Symposium, 2012
One of the most challenging aspects of designing an interactive information retrieval (IIR) study is the development of search tasks. In this paper, we present preliminary results of a study designed to evaluate a set of search tasks that were developed for use in IIR studies. We created 20 search tasks using five levels of cognitive complexity and four domains, and conducted a laboratory evaluation of these tasks with 48 undergraduate subjects. We describe preliminary results from an analysis of data from 24 subjects for 10 search tasks. Initial results show that, in general, as cognitive complexity increased, subjects issued more queries, clicked on more search results, viewed more URLs and took more time to complete the task. Subjects' expected and experienced difficulty ratings of tasks generally increased as cognitive complexity increased, with some exceptions. When subjects were asked to rank tasks according to difficulty and engagement, tasks with higher cognitive complexity were ranked as more difficult than tasks with lower cognitive complexity, but not necessarily as more engaging. These preliminary results suggest that behaviors and ratings are fairly consistent with the differences one might expect among the search tasks, and they provide initial evidence of the usefulness of these tasks in IIR studies.
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012
Aggregated search is the task of blending results from specialized search services or verticals into the Web search results. While many studies have focused on aggregated search techniques, few studies have tried to better understand how users interact with aggregated search results. This study investigates how task complexity and vertical display (the blending of vertical results into the web results) affect the use of vertical content. Twenty-nine subjects completed six search tasks of varying levels of task complexity using two aggregated search interfaces: one that blended vertical results into the web results and one that only provided indirect vertical access. Our results show that more complex tasks required significantly more interaction and that subjects completing these tasks examined more vertical results. While the amount of interaction was the same between interfaces, subjects clicked on more vertical results when these were blended into the web results. Our results also show an interaction between task complexity and vertical display; subjects clicked on more verticals when completing the more complex tasks with the interface that blended vertical results. Subjects' evaluations of the two interfaces were nearly identical, but when analyzed with respect to their interface preferences, we found a positive relationship between system evaluations and individual preferences. Subjects justified their preference using similar rationales and their comments illustrate how the display itself can influence judgments of information quality, especially in cases when the vertical results might not be relevant to the search task.
Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014
The purpose of this study is to investigate the extent to which two theories, Information Scent and Need for Cognition, explain people's search behaviors when interacting with search engine results pages (SERPs). Information Scent, the perception of the value of information sources, was manipulated by varying the number and distribution of relevant results on the first SERP. Need for Cognition (NFC), a personality trait reflecting the extent to which a person enjoys cognitively effortful activities, was measured with a standardized scale. A laboratory experiment was conducted with forty-eight participants, who completed six open-ended search tasks. Results showed that when interacting with SERPs containing more relevant documents, participants examined more documents and clicked deeper in the search result list. When interacting with SERPs that contained the same number of relevant results distributed across different ranks, participants were more likely to abandon their queries when relevant documents appeared later on the SERP. With respect to NFC, participants with higher NFC paginated less frequently and paid less attention to results at lower ranks than those with lower NFC. The interaction between NFC and the number of relevant results on the SERP affected the time spent searching and a participant's likelihood to reformulate, paginate and stop. Our findings suggest evaluating system effectiveness based on the first page of results, even for tasks that require the user to view multiple documents, and varying interface features based on NFC.
Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12, 2012
Although a great deal of research has been conducted on automatic techniques for determining query quality, there have been relatively few studies of how people judge query quality. This study investigated this topic through a laboratory experiment with 40 subjects. Subjects were shown eight information problems (five fact-finding and three exploratory) and asked to evaluate queries for these problems according to several quality attributes. Subjects then evaluated search engine results pages (SERPs) for each query, which were manipulated to exhibit different levels of performance. Following this, subjects re-evaluated the queries, were interviewed about their evaluation approaches and repeated the rating procedure for two information problems. Results showed that for fact-finding information problems, longer queries received higher ratings (both initial and post-SERP), and that post-SERP query ratings were more affected by the proportion of viewed documents that were relevant than by the ranks of the relevant documents. For exploratory information problems, subjects' ratings were highly correlated with the number of relevant documents in the SERP as well as with the proportion of viewed documents that were relevant. Subjects adopted several approaches when evaluating query quality, which led to different quality ratings. Finally, during the reliability check, subjects' initial evaluations were fairly stable, but their post-SERP evaluations increased significantly.