Carlos Hurtado - Academia.edu
Papers by Carlos Hurtado
One of the challenges in image and video retrieval is the content-based retrieval of images and videos on the web. Relatively little work has been done in this area, mainly due to scalability issues. In this paper we investigate the problem by presenting tools for characterizing the visual content of specific web collections and a strategy for searching for faces on the web using visual and text information. A case study in a specific web domain is also presented.
Journal of The American Society for Information Science and Technology, 2007
In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The resulting query representation allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: boosting relevance ranking and query recommendation. Finally, we evaluate the effectiveness of our approach experimentally.
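The term vector idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical log format in which each query maps to a list of (clicked document text, click count) pairs, weights terms by raw click counts, and compares queries with cosine similarity; the paper's actual weighting scheme and clustering algorithm are not given here, so the names `query_vector` and `cosine` and the weighting are illustrative assumptions only.

```python
from collections import Counter
from math import sqrt


def query_vector(clicks):
    """Build a term vector for one query.

    clicks: list of (document_text, click_count) pairs taken from a
    hypothetical query log. Each term of a clicked document is weighted
    by how often that document was clicked (an assumed weighting).
    """
    vec = Counter()
    for text, n_clicks in clicks:
        for term in text.lower().split():
            vec[term] += n_clicks
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With vectors of this kind, two queries whose users clicked on documents with similar vocabulary score close to 1 even if the query strings share no words, which is what lets a standard document clustering algorithm be applied to queries.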
Over the past few years, there has been a great deal of research on the use of content and links of Web pages to improve the quality of Web page rankings returned by search engines. However, few formal approaches have considered the use of search engine logs to improve the rankings. In this paper we propose a ranking algorithm that uses the logs of search engines to boost their retrieval quality. The relevance of Web pages is estimated using the historical preferences of users that appear in the logs. The algorithm is based on a clustering process in which groups of semantically similar queries are identified. The proposed method is simple and has low computational cost, and our experiments show that it achieves good results.
Web usage mining is a major research area in Web mining focused on learning about Web users and their interactions with Web sites. The main challenges in Web usage mining are applying data mining techniques to Web data efficiently and discovering nontrivial user behaviour patterns. In this paper we focus on search engines, analyzing query log data and presenting several models of how users search and how they use search engine results.
In this paper we propose a method that, given a query submitted to a search engine, suggests a list of related queries. The related queries are based on previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process. The proposed method is based on a query clustering process in which groups of semantically similar queries are identified. The clustering process uses the content of historical preferences of users registered in the query log of the search engine. The method not only discovers the related queries, but also ranks them according to a relevance criterion. Finally, we show the effectiveness of the method with experiments over the query log of a search engine.
Query logs record past query sessions across a time span. A statistical model is proposed to explain the log generation process. Within a search engine's list of results, the model explains the document selection (a user's click) by taking into account both a document's position and its popularity. We show that it is possible to quantify this influence and consequently to estimate "unbiased" document popularities. Among other applications, this makes it possible to re-order the result list to match user preferences more closely and to use the logs as feedback to improve search engines.
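One simple way to picture the position correction described in this abstract is sketched below. It is not the paper's statistical model: it assumes, purely for illustration, that the probability of a click factorizes as (position bias) × (document popularity) and that the position bias values are already known. The function name `debiased_popularity` and the log format are hypothetical.

```python
from collections import defaultdict


def debiased_popularity(log, position_bias):
    """Estimate position-corrected popularity per document.

    log: iterable of (doc_id, position, clicked) impressions
         (a hypothetical log format).
    position_bias: dict mapping position -> assumed probability that a
         user examines a result shown at that position.
    Under the assumed model P(click) = bias[position] * popularity[doc],
    popularity is estimated as clicks divided by accumulated bias.
    """
    clicks = defaultdict(float)
    exposure = defaultdict(float)
    for doc_id, position, clicked in log:
        exposure[doc_id] += position_bias[position]
        clicks[doc_id] += 1.0 if clicked else 0.0
    return {d: clicks[d] / exposure[d] for d in exposure if exposure[d] > 0}
```

Under these assumptions, a document shown mostly at low positions receives credit for each click in proportion to how unlikely that position was to be examined, so documents can be compared, and the result list re-ordered, on a footing that does not simply reward whatever was already ranked first.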
© Of this edition: RA-MA 1995. TRADEMARKS: Throughout this book, RA-MA has tried to distinguish registered trademarks from descriptive terms by following the capitalization style used by the manufacturer, with no intention of infringing the trademark and solely for the benefit of its owner.