An effective statistical approach to blog post opinion retrieval (original) (raw)

Finding Opinionated Blogs Using Statistical Classifiers and Lexical Features

Third International AAAI Conference on Weblogs and …, 2009

This paper systematically exploited various lexical features for opinion analysis on blog data using a statistical learning framework. Our experimental results using the TREC Blog track data show that all the features we explored effectively represent opinion expressions, and different classification strategies have a significant impact on opinion classification performance. We also present results when combining opinion analysis with the retrieval component for the task of retrieving relevant and opinionated blogs. Compared with the best results in the TREC evaluation, our system achieves reasonable performance, but does not rely on much human knowledge or deep level linguistic analysis.

Blog mining through opinionated words

2006

Intent mining is a special kind of document analysis whose goal is to assess the attitude of the document author with respect to a given subject. Opinion mining is a kind of intent mining where the attitude is a positive or negative opinion. Most systems tackle the problem with a two step approach, an information retrieval followed by a postprocess or filter phase to identify opinionated blogs. We explored a single stage approach to opinion mining, retrieving opinionated documents ranked with a special ranking function which exploits an index enriched with opinion tags. A set of subjective words are used as tags for identifying opinionated sentences. Subjective words are marked as "opinionated" and are used in the retrieval phase to boost the rank of documents containing them. In indexing the collection, we recovered the relevant content from the blog permalink pages, exploiting HTML metadata about the generator and heuristics to remove irrelevant parts from the body. The index also contains information about the occurrence of opinionated words, extracted from an analysis of WordNet glosses. The experiments compared the precision of normal queries with respect to queries which included as constraint the proximity to an opinionated word. The results show a significant improvement in precision for both topic relevance and opinion relevance.

Integrating proximity to subjective sentences for blog opinion retrieval

2009

Abstract. Opinion finding is a challenging retrieval task, where it has been shown that it is especially difficult to improve over a strongly performing topic-relevance baseline. In this paper, we propose a novel approach for opinion finding, which takes into account the proximity of query terms to subjective sentences in a document. We adapt two stateof-the-art opinion detection techniques to identify subjective sentences from the retrieved documents.

Fdu at trec 2007: opinion retrieval of blog track

2007

This paper describes our participation in the opinion retrieval task at Blog Track 07. The system consisted of the preprocess part, the topic retrieval part and sentiment analysis part. In the topic retrieval part, we adopted pseudo-relevance feedback and a novel approach to form a modified query. In the sentiment analysis part, each blog post was given an opinion score based on the sentences contained in this post. The subjectivity of each sentence was predicted by a CME classifier. Then the blog posts were reranked based on the similarity given by the topic retrieval and the opinion score given by the sentiment analysis.

The BlogVox Opinion Retrieval System

2006

The BlogVox system retrieves opinionated blog posts specified by ad hoc queries. BlogVox was developed for the 2006 TREC blog track by the University of Maryland, Baltimore County and the Johns Hopkins University Applied Physics Laboratory using a novel system to recognize legitimate posts and discriminate against spam blogs. It also processes posts to eliminate extraneous non-content, including blog-rolls, link-rolls, advertisements and sidebars. After retrieving posts relevant to a topic query, the system processes them to produce a set of independent features estimating the likelihood that a post expresses an opinion about the topic. These are combined using an SVM-based system and integrated with the relevancy score to rank the results. We evaluate BlogVox's performance against human assessors. We also evaluate the individual splog filtering and non-content removal components of BlogVox.

Facet-based opinion retrieval from blogs

Information Processing & Management, 2010

The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback–Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjec- tive words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective.

Fusion Approach to Finding Opinions in Blogosphere

2007

In this paper, we describe a fusion approach to finding opinion about a given target in blog postings. We tackled the opinion blog retrieval task by breaking it down to two sequential subtasks: ontopic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs using combined opinion scores generated by four opinion assessment methods. Our opinion module consists of Opinion Term Module, which identify opinions based on the frequency of opinion terms (i.e., terms that only occur frequently in opinion blogs), Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification, IU Module, which uses IU (I and you) collocations, and Adjective-Verb Module, which uses computational linguistics' distribution similarity approach to learn the subjective language from training data.

Positive, Negative, or Mixed? Mining Blogs for Opinions

2009

The rich non-factual information on the blogosphere presents interesting research questions. In this paper, we present a study on analysis of blog posts for their sentiment by using a generic sentiment lexicon. In particular, we applied Support Vector Machine to classify blog posts into three categories of opinions: positive, negative and mixed. We investigated the performance difference between global topic-independent and local topic-dependent opinion classification on a collection of blogs. Our experiment shows that topic-dependent classification performs significantly better than topic-independent classification, and this result indicates high interaction between sentiment words and topic.

Improving opinionated blog retrieval effectiveness with quality measures and temporal features

World Wide Web Journal, 2014

The massive acceptance and usage of the blog communities by a significant portion of the Web users has rendered knowledge extraction from blogs a particularly important research field. One of the most interesting related problems is the issue of the opinionated retrieval, that is, the retrieval of blog entries which contain opinions about a topic. There has been a remarkable amount of work towards the improvement of the effectiveness of the opinion retrieval systems. The primary objective of these systems is to retrieve blog posts which are both relevant to a given query and contain opinions, and generate a ranked list of the retrieved documents according to the relevance and opinion scores. Although a wide variety of effective opinion retrieval methods have been proposed, to the best of our knowledge, none of them takes into consideration the issue of the importance of the retrieved opinions. In this work we introduce a ranking model which combines the existing retrieval strategies with query-independent information to enhance the ranking of the opinionated documents. More specifically, our model accounts for the influence of the blogger who authored an opinion, the reputation of the blog site which published a specific blog post, and the impact of the post itself. Furthermore, we expand the current proximity-based opinion scoring strategies by considering the physical locations of the query and opinion terms within a document. We conduct extensive experiments with the TREC Blogs08 dataset which demonstrate that the application of our methods enhances retrieval precision by a significant margin.