08391 Working Group Summary--Analyzing Tag Semantics Across Tagging Systems

Tag relatedness in image folksonomies

Folksonomies, networks of users, resources, and tags, allow users to easily retrieve, organize, and browse web content. However, their advantages remain limited, mainly because user-provided tags are noisy. To overcome this issue, we propose an approach for characterizing related tags in folksonomies: we use tag co-occurrence statistics and Laplacian-score-based feature selection to create an empirical co-occurrence probability distribution for each tag; we then identify related tags on the basis of the dissimilarity between their distributions. For this purpose, we introduce a variant of the Jensen-Shannon divergence that is more robust to statistical noise. We experimentally evaluate our approach using WordNet and compare it to a common tag-relatedness approach based on cosine similarity. The results show the effectiveness of our approach and its advantage over the competing method.
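
As a rough illustration of comparing tags through their co-occurrence distributions, the sketch below computes the standard Jensen-Shannon divergence between two tags. It is not the noise-robust variant the abstract introduces, and it omits the Laplacian-score feature selection; the tag names and counts are hypothetical.

```python
import math

def normalize(counts):
    """Turn raw co-occurrence counts into an empirical probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("tag has no co-occurrences")
    return {tag: c / total for tag, c in counts.items()}

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; q must be > 0 wherever p is."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)

def jensen_shannon(p, q):
    """Standard Jensen-Shannon divergence; lies in [0, 1] when using log base 2."""
    support = set(p) | set(q)
    p_full = {t: p.get(t, 0.0) for t in support}
    q_full = {t: q.get(t, 0.0) for t in support}
    # Mixture distribution m = (p + q) / 2
    m = {t: 0.5 * (p_full[t] + q_full[t]) for t in support}
    return 0.5 * kl_divergence(p_full, m) + 0.5 * kl_divergence(q_full, m)

# Hypothetical co-occurrence counts: tag -> {co-occurring tag: count}
cooc = {
    "sea": {"beach": 8, "sand": 4, "sun": 4},
    "ocean": {"beach": 6, "sand": 5, "sun": 5},
    "car": {"road": 9, "wheel": 7},
}
dists = {tag: normalize(c) for tag, c in cooc.items()}

js_sea_ocean = jensen_shannon(dists["sea"], dists["ocean"])
js_sea_car = jensen_shannon(dists["sea"], dists["car"])
print(f"sea vs ocean: {js_sea_ocean:.4f}")
print(f"sea vs car:   {js_sea_car:.4f}")
```

A lower divergence indicates more closely related tags. Tags whose co-occurrence distributions share no support reach the maximum value of 1.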

Tag Similarity in Folksonomies

Folksonomies, collections of user-contributed tags, have proved efficient in reducing the inherent semantic gap. However, user tags are noisy; thus, they need to be processed before other applications can use them. In this paper, we propose an approach for bootstrapping semantics from folksonomy tags. Our goal is to automatically identify semantically related tags. The approach creates a probability distribution for each tag based on co-occurrence statistics. The similarity between two tags is then determined by the distance between their corresponding probability distributions. For this purpose, we propose an extension of the well-known Jensen-Shannon divergence. We compared our approach to a widely used method for identifying similar tags based on the cosine measure. The evaluation shows promising results and emphasizes the advantage of our approach.
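
The cosine-based baseline mentioned above can be sketched as follows, assuming each resource carries a set of tags and that two tags co-occur when they appear on the same resource. The sample data and function names are hypothetical, and the abstract does not specify how the compared method builds its vectors.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math

def cooccurrence_counts(tagged_resources):
    """For each tag, count how often every other tag appears on the same resource."""
    counts = defaultdict(Counter)
    for tags in tagged_resources:
        # permutations yields both (a, b) and (b, a), so the counts are symmetric
        for a, b in permutations(set(tags), 2):
            counts[a][b] += 1
    return counts

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse co-occurrence vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tagged photos
photos = [
    ["sea", "beach", "sun"],
    ["ocean", "beach", "sun"],
    ["ocean", "beach"],
    ["car", "road"],
]
counts = cooccurrence_counts(photos)
sim_sea_ocean = cosine_similarity(counts["sea"], counts["ocean"])
sim_sea_car = cosine_similarity(counts["sea"], counts["car"])
print(f"sea vs ocean: {sim_sea_ocean:.4f}")
print(f"sea vs car:   {sim_sea_car:.4f}")
```

Here "sea" and "ocean" never share a resource, yet they score high because they co-occur with the same tags. Co-occurrence vectors are meant to capture this kind of second-order relatedness.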

An integrated approach to discover tag semantics

2011

Tag-based systems have become very common for online classification thanks to their intrinsic advantages, such as self-organization and rapid evolution. However, they are still affected by issues that limit their utility, mainly due to the inherent ambiguity in the semantics of tags. Synonyms, homonyms, and polysemous words, while not harmful for the casual user, strongly affect the quality of search results and the performance of tag-based recommendation systems.

Exploiting tag similarities to discover synonyms and homonyms in folksonomies

Software: Practice and Experience, 2013

Tag-based systems are widely available, thanks to their intrinsic advantages, such as self-organization, currency, and ease of use. Although they represent a precious source of semantic metadata, their utility is still limited. The inherent lexical ambiguities of tags strongly affect the extraction of structured knowledge and the quality of tag-based recommendation systems. In this paper, we propose a methodology for the analysis of tag-based systems that addresses tag synonymy and homonymy at the same time in a holistic approach. In more detail, we exploit a tripartite graph to reduce the problem of synonyms and homonyms; we apply a customized version of Tag Context Similarity to detect them, overcoming the limitations of current similarity metrics; finally, we propose the application of an overlapping clustering algorithm to detect contexts and homonymies, then evaluate its performance and introduce a methodology for the interpretation of its results.
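
To make the tripartite structure concrete, the sketch below stores a folksonomy as (user, tag, resource) triples and projects it onto tag-to-resource and tag-to-user sets. The Jaccard overlap is only a simple stand-in for illustration: it is not the customized Tag Context Similarity the abstract refers to, and all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag assignments as (user, tag, resource) triples
assignments = [
    ("alice", "apple", "photo_of_fruit"),
    ("bob", "apple", "macbook_review"),
    ("bob", "mac", "macbook_review"),
    ("carol", "fruit", "photo_of_fruit"),
    ("carol", "apple", "photo_of_fruit"),
]

def project(triples):
    """Project the tripartite structure onto tag->resources and tag->users sets."""
    tag_resources = defaultdict(set)
    tag_users = defaultdict(set)
    for user, tag, resource in triples:
        tag_resources[tag].add(resource)
        tag_users[tag].add(user)
    return tag_resources, tag_users

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

tag_resources, tag_users = project(assignments)
overlap_apple_fruit = jaccard(tag_resources["apple"], tag_resources["fruit"])
overlap_mac_fruit = jaccard(tag_resources["mac"], tag_resources["fruit"])
print(overlap_apple_fruit, overlap_mac_fruit)
```

In this toy data, "apple" is attached to both a fruit photo and a laptop review. A homonym-detection step would aim to separate contexts of this kind.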

A comparative study of Flickr tags and index terms in a general image collection

Journal of the Association for Information Science and Technology, 2010

Web 2.0 and social/collaborative tagging have altered the traditional roles of indexer and user. Traditional indexing tools and systems assume a top-down approach to indexing in which a trained professional is responsible for assigning index terms to information sources with a potential user in mind. However, in today's Web, end users create, organize, index, and search for images and other information sources through social tagging and other collaborative activities. One of the impediments to user-centered indexing had been the cost of soliciting user-generated index terms or tags. Social tagging of images such as those on Flickr, an online photo management and sharing application, presents an opportunity that can be seized by designers of indexing tools and systems to bridge the semantic gap between indexer terms and user vocabularies. Empirical research on the differences and similarities between user-generated tags and index terms based on controlled vocabularies has the potential to inform future design of image indexing tools and systems. Toward this end, a random sample of Flickr images and the tags assigned to them were content-analyzed and compared with another sample of index terms from a general image collection using established frameworks for image attributes and contents. The results show that there is a fundamental difference between the types of tags and types of index terms used. In light of this, implications for research into and design of user-centered image indexing tools and systems are discussed.

Evaluating similarity measures for emergent semantics of social tagging

2009

Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
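
As a simplified illustration of a mutual-information tag similarity, the sketch below treats each resource as one observation and measures how much knowing that one tag is present tells us about the other. It ignores the per-user aggregation schemes the abstract compares, and the resource identifiers are hypothetical.

```python
import math

def mutual_information(resources_x, resources_y, all_resources):
    """Mutual information (in bits) between the presence of two tags across resources."""
    n = len(all_resources)
    if n == 0:
        raise ValueError("no resources")
    mi = 0.0
    for in_x in (True, False):
        # Marginal probability that tag x is present (or absent)
        p_x = sum(1 for r in all_resources if (r in resources_x) == in_x) / n
        for in_y in (True, False):
            p_y = sum(1 for r in all_resources if (r in resources_y) == in_y) / n
            # Joint probability of this presence/absence combination
            joint = sum(
                1 for r in all_resources
                if (r in resources_x) == in_x and (r in resources_y) == in_y
            ) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_x * p_y))
    return mi

all_resources = {"r1", "r2", "r3", "r4"}
web = {"r1", "r2"}        # resources tagged "web"
internet = {"r1", "r2"}   # always used together with "web"
music = {"r1", "r3"}      # statistically independent of "web"

mi_web_internet = mutual_information(web, internet, all_resources)
mi_web_music = mutual_information(web, music, all_resources)
print(mi_web_internet, mi_web_music)
```

Each call scans all resources for a single pair of tags, so the cost grows quickly with the number of tags. This is only a small-scale hint at the scalability issue the abstract mentions.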

Effective Retrieval of Resources in Folksonomies Using a New Tag Similarity Measure

2011

Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. However, as tags are informally defined, continually changing, and ungoverned, it has often been criticised for lowering, rather than increasing, the efficiency of searching. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labeling and when looking for resources. These techniques work well in dense folksonomies, but they fail when tag usage exhibits a power-law distribution, as often happens in real-life folksonomies. To tackle this issue, we propose an approach that induces the creation of a dense folksonomy in a fully automatic and transparent way: when users label resources, an innovative tag similarity metric is deployed so as to enrich the chosen tag set with related tags already present in the folksonomy. The proposed metric, which represents the core of our approach, is based on the mutual reinforcement principle. Our experimental evaluation shows that the accuracy and coverage of searches achieved with our metric are higher than those achieved by applying classical metrics.

Discovering and Exploiting Semantics in Folksonomies

2007

Folksonomies are Web 2.0 platforms where users share resources with each other. Users can also assign keywords (called tags) to resources in order to categorize and organize them. Different folksonomies support numerous types of resources, such as websites (Delicious), images (Flickr), and videos (YouTube). Folksonomies are easy to use and thus attract millions of users. This ease of use, however, comes with some problems. This thesis addresses different problems of folksonomies and proposes [...]

List of Figures (excerpt):
6.11 Micro F-Measure comparison of Manhattan (Manh) and Euclidean (Eucl) distances for non-text-based features. Dark lines show the results obtained using Euclidean distance; gray lines show the results obtained using Manhattan distance. Performance for both distances is almost the same for all image features.
6.12 Micro F-Measure comparing the results of two different low-level features, Edge Histogram Descriptor (EHD) and Color Layout (CL). Both low-level image features perform almost the same.
6.13 Micro F-Measure comparison of Euclidean (Eucl) and Histogram Intersection (HI) metrics for the MPEG-7 Edge Histogram Descriptor. Recommendations based on Euclidean distance perform significantly better than those based on the histogram intersection measure.
6.14 Micro F-Measure comparison of Cosine (Cos) and Euclidean (Eucl) distances for tag/text-based features. Cosine distance performs significantly better than Euclidean distance when using tag [...]
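
For reference, the four measures named in these figure captions can be written out as below. This is a generic sketch using their textbook definitions on plain feature vectors; the vectors are hypothetical and nothing here reproduces the thesis's feature extraction or evaluation.

```python
import math

def _check(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")

def manhattan(u, v):
    """L1 distance: sum of absolute coordinate differences."""
    _check(u, v)
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """L2 distance: square root of the summed squared differences."""
    _check(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def histogram_intersection(u, v):
    """Similarity (not a distance): overlap mass of two histograms."""
    _check(u, v)
    return sum(min(a, b) for a, b in zip(u, v))

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (both must be non-zero)."""
    _check(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Two hypothetical normalized histograms
h1 = [0.5, 0.3, 0.2]
h2 = [0.2, 0.3, 0.5]
print(manhattan(h1, h2), euclidean(h1, h2))
print(histogram_intersection(h1, h2), cosine_distance(h1, h2))
```

Note that histogram intersection grows with similarity while the other three grow with dissimilarity, so a ranking based on it has to be reversed.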

Insight into social tagging: BL_Flickr2016

I. For a statistical and comparative overview, please see: Mets Õ., Kippar J. (2017) Social Tagging: Implications from Studying User Behavior and Institutional Practice. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_33

II. The following presents some additional aspects of the content of the BL Flickr tags. The sheer number of BL Flickr tags offers endless opportunities to explore them. That is why the snapshot is provided mostly as an R Markdown file, so that readers can change the syntax, experiment with different parameters, and learn R :) This part of the work is a collaboration with Jaagup Kippar and his students in the School of Digital Technologies, Tallinn University, Estonia.