Modeling user reputation in wikis
Related papers
Adler, B. T., de Alfaro, L.: A Content-Driven Reputation System for the Wikipedia
2007
We present a content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order. Thus, author reputation is computed solely on the basis of content evolution; user-to-user comments or ratings are not used. The author reputation we compute could be used to flag new contributions from low-reputation authors, or it could be used to allow only authors with high reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia could also provide an incentive for high-quality contributions. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by l...
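The core idea above can be sketched in a few lines: an edit's fate in later revisions feeds back into its author's reputation. This is a minimal illustration with an assumed linear update rule; the paper's actual WikiTrust formulas are more elaborate.

```python
# Minimal sketch of survival-based reputation, assuming a simple
# linear feedback rule (illustrative only; not the paper's formulas).

def update_reputation(reputation, author, survival_ratio, weight=1.0):
    """Raise reputation when an edit is preserved by later revisions,
    lower it when the edit is quickly rolled back.

    survival_ratio: fraction of the edit's text still present in
    subsequent revisions (0.0 = fully reverted, 1.0 = fully kept).
    """
    # Map survival in [0, 1] to signed feedback in [-1, +1].
    feedback = 2.0 * survival_ratio - 1.0
    reputation[author] = reputation.get(author, 0.0) + weight * feedback
    return reputation[author]

rep = {}
update_reputation(rep, "alice", 0.9)  # mostly preserved -> reputation gain
update_reputation(rep, "bob", 0.1)    # mostly reverted  -> reputation loss
```

Note that no user-to-user ratings appear anywhere: the only input is how the content itself evolved, which matches the content-driven design described in the abstract.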
WikiTrust: Content-Driven Reputation for the Wikipedia
2012
B. Thomas Adler
The Wikipedia was initially created to promote collaboration between writers before submitting their work to a peer review process, to address complaints about the speed of peer review. Ironically, the criticism most widely levied against the Wikipedia is the lack of accountability for authors and the potential to misinform readers. There is a large community around the Wikipedia project which actively fixes errors as they are discovered, but an unending stream of vandals and spammers chips away at the good will of the volunteers who maintain the project for the collective good. We suggest that vandalism detection systems can be used to help direct the volunteer effort toward changes more likely to be a problem, making more efficient use of the project's human...
Learning to Predict the Quality of Contributions to Wikipedia
Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality: vandalism, bias, and errors can be problems. Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist users in creating and maintaining quality. In this paper, we propose metrics that quantify the quality of contributions to Wikipedia through implicit feedback from the community. We then learn discriminative probabilistic models that predict the quality of a new edit using features of the changes made, the author of the edit, and the article being edited. Through estimating parameters for these models, we also gain an understanding of factors that influence quality. We advocate using edit quality predictions and information gleaned from model analysis not to place restrictions on editing, but to instead alert users to potential quality problems, and to facilitate the development of additional incentives for contributors. We evaluate the edit quality prediction models on the Spanish Wikipedia. Experiments demonstrate that the models perform better when given access to content-based features of the edit, rather than only features of the contributing user. This suggests that a user-based solution to the Wikipedia quality problem may not be sufficient.
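A discriminative probabilistic model of this kind can be pictured as a logistic function over edit features. The feature names and weights below are hypothetical placeholders; the paper learns its parameters from labeled data.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
# In the paper, such parameters are learned from community feedback.
WEIGHTS = {
    "chars_added": 0.002,        # content-based feature
    "links_added": 0.5,          # content-based feature
    "all_caps_ratio": -3.0,      # content-based feature (shouting)
    "author_is_anonymous": -1.2, # user-based feature
}
BIAS = 0.5

def edit_quality_probability(features):
    """Logistic model: P(edit is good | features)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

p_good = edit_quality_probability(
    {"chars_added": 200, "links_added": 1,
     "all_caps_ratio": 0.0, "author_is_anonymous": 0})
p_bad = edit_quality_probability(
    {"chars_added": 20, "links_added": 0,
     "all_caps_ratio": 0.8, "author_is_anonymous": 1})
```

The abstract's finding that content-based features outperform user-only features corresponds to the content-based weights carrying most of the signal in a model like this.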
A Vision for Performing Social and Economic Data Analysis using Wikipedia's Edit History
2017
In this vision paper, we suggest combining two lines of research to study the collective behavior of Wikipedia contributors. The first line of research analyzes Wikipedia's edit history to quantify the quality of individual contributions and the resulting reputation of the contributor. The second line of research surveys Wikipedia contributors to gain insights, e.g., on their personal and professional background, socioeconomic status, or motives to contribute to Wikipedia. While both lines of research are valuable on their own, we argue that the combination of both approaches could yield insights that exceed the sum of the individual parts. Linking survey data to contributor reputation and content-based quality metrics could provide a large-scale, public domain data set to perform user modeling, i.e. deducing interest profiles of user groups. User profiles can, among other applications, help to improve recommender systems. The resulting dataset can also enable a better understand...
Using Language Models to Detect Wikipedia Vandalism
This paper explores a statistical language modeling approach for detecting Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, have exposed Wikipedia articles to vandalism, defined as malicious editing intended to compromise the integrity of the content of articles. Extensive manual efforts are being made to combat vandalism and an automated approach to alleviate the laborious process is essential. This paper offers first a categorization of Wikipedia vandalism types and identifies technical challenges associated with detecting each category. In addition, this paper builds statistical language models, constructing distributions of words from the revision history of Wikipedia articles. As vandalism often involves the use of unexpected words to draw attention, the fitness (or lack thereof) of a new edit when compared with language models built from previo...
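The fitness test described above can be sketched with a smoothed unigram model: build a word distribution from an article's revision history, then score a new edit by its average log-likelihood. This is a simplified illustration (the paper's models and smoothing may differ); the example texts are made up.

```python
import math
from collections import Counter

def build_unigram_model(revision_texts, alpha=1.0):
    """Add-alpha smoothed unigram model over an article's revision history."""
    counts = Counter(w for text in revision_texts for w in text.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    def log_prob(word):
        return math.log((counts.get(word, 0) + alpha) / (total + alpha * vocab))
    return log_prob

def edit_fitness(log_prob, edit_text):
    """Average per-word log-likelihood of an edit under the article model."""
    words = edit_text.lower().split()
    if not words:
        return 0.0
    return sum(log_prob(w) for w in words) / len(words)

history = ["reputation systems help wikipedia authors",
           "wikipedia authors gain reputation for good edits"]
log_prob = build_unigram_model(history)
benign = edit_fitness(log_prob, "authors gain reputation")
vandal = edit_fitness(log_prob, "BUY CHEAP PILLS NOW")  # out-of-vocabulary words
```

Edits dominated by words the article has never used score a lower fitness, which is exactly the "unexpected words" signal the abstract describes.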
Collaborative Content-Based Method for Estimating User Reputation in Online Forums
Lecture Notes in Computer Science, 2015
Collaborative ratings of forum posts have been successfully applied in order to infer the reputations of forum users. Famous websites such as Slashdot or Stack Exchange allow their users to score messages in order to evaluate their content. These scores can be aggregated for each user in order to compute a reputation value in the forum. However, explicit rating functionalities are rarely used in many online communities such as health forums. At the same time, the textual content of the messages can reveal a lot of information regarding the trust that users have in the posted information. In this work, we propose to use these hidden expressions of trust in order to estimate user reputation in online forums.
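The explicit-rating baseline mentioned above (Slashdot- or Stack Exchange-style scoring) reduces to aggregating per-post scores by author. A minimal sketch, assuming mean-score aggregation; real forums use richer schemes.

```python
from collections import defaultdict

def aggregate_reputation(post_scores):
    """post_scores: iterable of (user, score) pairs from community ratings.
    Returns the mean score per user as a simple reputation value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, score in post_scores:
        totals[user] += score
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}

forum_rep = aggregate_reputation([("ann", 5), ("ann", 3), ("bob", -2)])
```

The paper's contribution is to replace the explicit scores in this pipeline with trust expressions mined from message text, for communities where rating buttons go unused.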
Creating, destroying, and restoring value in Wikipedia
Proceedings of the …, 2007
Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that frequent editors dominate what people see when they visit Wikipedia, and that this domination is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage. Finally, we make policy recommendations for Wikipedia and other wikis in light of these findings.
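Both quantities the abstract defines, an edit's impact and the probability that a view is damaged, are weighted by view counts rather than edit counts. A minimal sketch under assumed per-revision view logs (the revision IDs and data shapes here are hypothetical):

```python
def edit_impact(view_counts_by_revision, revisions_by_author):
    """Impact of an author = total views of the revisions they produced."""
    return {
        author: sum(view_counts_by_revision.get(rev, 0) for rev in revs)
        for author, revs in revisions_by_author.items()
    }

def damaged_view_probability(view_counts_by_revision, damaged_revisions):
    """Fraction of all article views that landed on a damaged revision."""
    total = sum(view_counts_by_revision.values())
    damaged = sum(v for rev, v in view_counts_by_revision.items()
                  if rev in damaged_revisions)
    return damaged / total if total else 0.0

views = {"r1": 100, "r2": 50, "r3": 50}
by_author = {"ann": ["r1"], "bob": ["r2", "r3"]}
impact = edit_impact(views, by_author)
p_damaged = damaged_view_probability(views, {"r2"})
```

Weighting by views is what lets the paper conclude that frequent editors dominate what readers actually see, which raw edit counts would not show.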
Automated Detection of Sockpuppet Accounts in Wikipedia
2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Wikipedia is a free Internet-based encyclopedia that is built and maintained via the open-source collaboration of a community of volunteers. Wikipedia's purpose is to benefit readers by acting as a widely accessible and free encyclopedia, a comprehensive written synopsis that contains information on all discovered branches of knowledge. The website has millions of pages that are maintained by thousands of volunteer editors. Unfortunately, given its open-editing format, Wikipedia is highly vulnerable to malicious activity, including vandalism, spam, undisclosed paid editing, etc. Malicious users often use sockpuppet accounts to circumvent a block or a ban imposed by Wikipedia administrators on the person's original account. A sockpuppet is an "online identity used for the purpose of deception." Usually, several sockpuppet accounts are controlled by a unique individual (or entity) called a puppetmaster. Currently, suspected sockpuppet accounts are manually verified by Wikipedia administrators, which makes the process slow and inefficient. The primary objective of this research is to develop an automated ML and neural-network-based system to recognize the patterns of sockpuppet accounts as early as possible and recommend suspension. We address the problem as a binary classification task and propose a set of new features to capture suspicious behavior that consider user activity and analyze the contributed content. To this end, we have focused on account-based and content-based features. Our solution was bifurcated into developing a strategy to automatically detect and categorize suspicious edits made by the same author from multiple accounts. We hypothesize that "you can hide behind the screen, but your personality can't hide." In addition to the above-mentioned method, we also account for the sequential nature of users' contributions. Therefore, we have extended our analysis with a Long Short-Term Memory (LSTM) model to track the sequential patterns of users' writing styles. Throughout the research, we strive to automate the sockpuppet account detection system and develop tools to help the Wikipedia administration maintain the quality of articles. We tested our system on a dataset we built containing 17K accounts validated as sockpuppets. Experimental results show that our approach achieves an F1 score of 0.82 and outperforms other systems proposed in the literature. We plan to deliver our research to the Wikipedia authorities to integrate it into their existing system.
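The "your personality can't hide" hypothesis rests on content-based stylometric features of the kind sketched below. The specific features are illustrative assumptions, not the paper's actual feature set, which is considerably richer.

```python
def stylometric_features(texts):
    """Toy content-based features for comparing two accounts' writing styles
    (illustrative only; the paper's feature set is much richer)."""
    joined = " ".join(texts)
    words = joined.split()
    n = len(words) or 1
    return {
        # Longer average words suggest more formal vocabulary.
        "avg_word_len": sum(len(w) for w in words) / n,
        # Vocabulary diversity: distinct words over total words.
        "type_token_ratio": len(set(w.lower() for w in words)) / n,
        # Punctuation habits are a classic stylometric signal.
        "exclaim_rate": joined.count("!") / max(len(joined), 1),
    }

feats = stylometric_features(["Hello world!", "hello again"])
```

Feature vectors like this, computed per account, can feed a binary classifier that judges whether two accounts are likely controlled by the same puppetmaster.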
2008
One of the major issues facing socially-driven content and collaborative work on the Web (such as Wikipedia) is the lack of tools to measure at large scale the evolution of content (in terms of quality and quantity), to reduce the dropout rate of active contributors, or to detect vandalism in a timely manner.