Blog Mining-Review and Extensions
Related papers
Development of An Opinion Blog Mining System
Current search engines apply numerous innovative search methods to retrieve documents or web pages containing the queried keywords along with their synonyms. However, a user may be interested not only in finding documents or web pages relevant to a query but also in extracting the opinions or reviews previously posted on a topic; this is called opinion mining. This research aims to develop an opinion blog mining system. In general, developing such a system involves employing models and schemes from several computer science disciplines, such as data mining, databases, social networks, and information retrieval. This research also proposes a new opinion mining method that proceeds through several phases, including crawling the reviews related to a given user search query. The system is expected to be developed using the .NET 2008 environment.
Blog mining through opinionated words
2006
Intent mining is a special kind of document analysis whose goal is to assess the attitude of the document author with respect to a given subject. Opinion mining is a kind of intent mining where the attitude is a positive or negative opinion. Most systems tackle the problem with a two-step approach: an information retrieval phase followed by a postprocessing or filtering phase to identify opinionated blogs. We explored a single-stage approach to opinion mining, retrieving opinionated documents ranked with a special ranking function that exploits an index enriched with opinion tags. A set of subjective words is used as tags for identifying opinionated sentences. Subjective words are marked as "opinionated" and are used in the retrieval phase to boost the rank of documents containing them. In indexing the collection, we recovered the relevant content from the blog permalink pages, exploiting HTML metadata about the generator and heuristics to remove irrelevant parts from the body. The index also contains information about the occurrence of opinionated words, extracted from an analysis of WordNet glosses. The experiments compared the precision of normal queries with that of queries that included proximity to an opinionated word as a constraint. The results show a significant improvement in precision for both topic relevance and opinion relevance.
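The idea of boosting documents in which a query term appears close to an opinionated word can be sketched as follows. This is a minimal illustration under stated assumptions, not the ranking function from the paper: the word list `OPINION_WORDS`, the `window` and `boost` parameters, and the function names are all hypothetical, and the paper builds its tags from WordNet glosses and stores them in the index rather than scanning documents at query time.

```python
import re

# Hypothetical toy lexicon; the paper derives its opinionated words
# from an analysis of WordNet glosses.
OPINION_WORDS = {"good", "great", "bad", "terrible", "love", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(doc, query_terms, window=3, boost=2.0):
    """Count query-term hits, weighting a hit by `boost` when an
    opinionated word occurs within `window` tokens of it."""
    tokens = tokenize(doc)
    opinion_positions = [i for i, t in enumerate(tokens) if t in OPINION_WORDS]
    total = 0.0
    for i, token in enumerate(tokens):
        if token in query_terms:
            near_opinion = any(abs(i - j) <= window for j in opinion_positions)
            total += boost if near_opinion else 1.0
    return total


def rank(docs, query, window=3, boost=2.0):
    """Return the documents that match the query, best score first."""
    terms = set(tokenize(query))
    scored = [(score(d, terms, window, boost), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]
```

With this sketch, a post such as "I love this camera, great lens." would rank above a purely factual post mentioning "camera" for the query `camera`, because the query term falls within the proximity window of an opinionated word.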
Positive, Negative, or Mixed? Mining Blogs for Opinions
2009
The rich non-factual information on the blogosphere presents interesting research questions. In this paper, we present a study on the analysis of blog posts for their sentiment using a generic sentiment lexicon. In particular, we applied a Support Vector Machine to classify blog posts into three categories of opinions: positive, negative, and mixed. We investigated the performance difference between global topic-independent and local topic-dependent opinion classification on a collection of blogs. Our experiment shows that topic-dependent classification performs significantly better than topic-independent classification, and this result indicates a high interaction between sentiment words and topic.
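As a rough illustration of how a generic sentiment lexicon can be turned into features for the three categories, the sketch below counts lexicon hits in a post. It rests on loud assumptions: the `POSITIVE` and `NEGATIVE` word sets are hypothetical toy lexicons, and a simple rule stands in for the Support Vector Machine used in the paper, which would be trained on such features rather than applying a fixed rule.

```python
import re

# Hypothetical toy lexicons; the paper uses a generic sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def lexicon_features(post):
    """Return (positive_hits, negative_hits) for a blog post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos, neg


def label(post):
    """Rule-based stand-in for a trained classifier (not the paper's SVM)."""
    pos, neg = lexicon_features(post)
    if pos and neg:
        return "mixed"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return None  # no lexicon word found; outside the three categories
```

In a topic-dependent setup of the kind the abstract describes, one such classifier would be fitted per topic instead of a single global one.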
Web opinion mining: how to extract opinions from blogs
2008
The growing popularity of Web 2.0 has produced an increasing number of documents expressing opinions on different topics. Recently, new research approaches have been defined to automatically extract such opinions from the Internet. They usually consider opinions to be expressed through adjectives, and make extensive use of either general dictionaries or experts to provide the relevant adjectives. Unfortunately, these approaches suffer from the following drawback: in a specific domain, a given adjective may either not exist or have a different meaning than in another domain. In this paper, we propose a new approach consisting of two steps. First, we automatically extract a learning dataset for a specific domain from the Internet. Second, from this learning set we extract the set of positive and negative adjectives relevant to the domain. The usefulness of our approach was demonstrated by experiments performed on real data.
Current Approaches to Data Mining Blogs
A summary of recent research on data mining blogs for various purposes is outlined below. The papers have been organized into four categories based on the approach taken to data mining, the purpose of the research, and the type of analysis provided. The four categories include articles (1) relating to tagging, classification, and folksonomy in the blogosphere; (2) mining comments and links to determine blogging community networks; (3) focusing on spatio-temporal data; and (4) extracting information regarding bloggers' identity, behavior, and/or mood. A brief conclusion, along with possibilities for future research, is also presented.
Pbm: A new dataset for blog mining
Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Researchers have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas, such as Natural Language Processing, Computational Linguistics, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, and evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, such as clustering, categorization, sentiment analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks. Although various standard datasets are available for traditional text mining tasks, there are very few datasets for the blog-mining task, and they are expensive. Blogs, a new ge...
Finding Opinionated Blogs Using Statistical Classifiers and Lexical Features
Third International AAAI Conference on Weblogs and …, 2009
This paper systematically exploited various lexical features for opinion analysis on blog data using a statistical learning framework. Our experimental results using the TREC Blog track data show that all the features we explored effectively represent opinion expressions, and different classification strategies have a significant impact on opinion classification performance. We also present results when combining opinion analysis with the retrieval component for the task of retrieving relevant and opinionated blogs. Compared with the best results in the TREC evaluation, our system achieves reasonable performance, but does not rely on much human knowledge or deep level linguistic analysis.
Annotating opinion–evaluation of blogs
2008
This chapter deals with annotating opinions on a non-specific corpus of blogs. This work is motivated by a more general aim of building a generic method for detecting opinions. In accordance with this aim, we propose a linguistic model for the description of the opinion expression phenomenon.
Automatic Sentiment Monitoring of Specific Topics in the Blogosphere
DyNaK 2010 Dynamic …, 2010
The classification of a text according to its sentiment is a task of rising relevance in many applications, including applications related to monitoring and tracking of the blogosphere. The blogosphere provides a rich source of information about products, ...
Entity Centric Opinion Mining from Blogs
24th International Conference on Computational Linguistics, 2012
With the growth of Web 2.0, people are using it as a medium to express their opinions and thoughts. With the explosion of blogs and journal-like user-generated content on the web, companies, celebrities, and politicians are concerned with mining and analyzing the discussions about them or their products. In this paper, we present a method to perform opinion mining and summarize opinions at the entity level for English blogs. We first identify various objects (named entities) which are talked about by the blogger, then we identify ...