Multi-Label Collective Classification
Related papers
Multi-label Collective Classification Using Adaptive Neighborhoods
2012 11th International Conference on Machine Learning and Applications, 2012
Multi-label learning in graph-based relational data has gained popularity in recent years due to the increasingly complex structures of real-world applications. Collective classification deals with the simultaneous classification of neighboring instances in relational data until a convergence criterion is reached. The rationale behind collective classification stems from the fact that an entity in a network (or relational data) is most likely influenced by the neighboring entities and can be classified accordingly, based on the class assignments of its neighbors. Although extensive work has been done on collective classification of single-labeled data, the domain of multi-labeled relational data has not been sufficiently explored. In this paper, we propose a neighborhood ranking method for multi-label classification, which can further be used in the Multi-label Collective Classification framework. We test our methods on real-world datasets and also discuss the relevance of our approach to other multi-labeled relational data. Our experimental results show that using ranking for neighborhood selection in collective classification improves the performance of the classifier.
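As a rough illustration of this general idea, the sketch below runs an ICA-style collective classification loop in which each node's features are augmented with labels aggregated from its top-k ranked neighbors. The names `graph`, `features`, `rank`, and `clf` are hypothetical placeholders, and the ranking criterion stands in for whatever adaptive neighborhood scoring the paper actually uses.

```python
# Minimal sketch of ICA-style collective classification with ranked
# neighborhood selection (illustrative only; not the paper's exact method).
# Assumes: `graph` maps node -> list of neighbors, `features[n]` is a base
# feature vector, `rank(n, m)` scores how informative neighbor m is for n,
# `labels` maps known nodes to 0/1 label vectors, and `clf` is any trained
# scikit-learn-style multi-label classifier -- all hypothetical names.
import numpy as np

def relational_features(node, labels, graph, rank, k, n_labels):
    """Aggregate current label vectors of the top-k ranked neighbors."""
    nbrs = sorted(graph[node], key=lambda m: rank(node, m), reverse=True)[:k]
    agg = np.zeros(n_labels)
    for m in nbrs:
        agg += labels.get(m, np.zeros(n_labels))
    return agg / max(len(nbrs), 1)

def ica_multilabel(graph, features, labels, clf, rank, k, n_labels, n_iter=10):
    """Iteratively re-predict multi-label assignments for unlabeled nodes."""
    unlabeled = [n for n in graph if n not in labels]
    for _ in range(n_iter):
        for n in unlabeled:
            x = np.concatenate([features[n],
                                relational_features(n, labels, graph, rank, k, n_labels)])
            labels[n] = clf.predict(x.reshape(1, -1))[0]  # 0/1 label vector
    return labels
```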
Collective Multi-Label Classification
Common approaches to multi-label classification learn independent classifiers for each category and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only well suited to problems in which categories are independent. However, in many domains labels are highly interdependent. This paper explores multi-label conditional random field (CRF) classification models that directly parameterize label co-occurrences in multi-label classification. Experiments show that the models outperform their single-label counterparts on standard text corpora. Even when multi-labels are sparse, the models improve subset classification error by as much as 40%.
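The core idea of parameterizing label co-occurrences can be illustrated with a toy scorer that combines per-label (unary) evidence with pairwise co-occurrence weights and searches over candidate label subsets; a real CRF would learn these parameters and use (approximate) inference rather than brute-force enumeration. All values below are made up for illustration.

```python
# Toy illustration of scoring candidate label sets with unary scores plus
# pairwise co-occurrence weights -- the core of directly parameterizing
# label co-occurrences. Not a full CRF: no learning, exhaustive enumeration.
import itertools
import numpy as np

def score(label_set, unary, pairwise):
    s = sum(unary[l] for l in label_set)
    s += sum(pairwise[i][j] for i, j in itertools.combinations(sorted(label_set), 2))
    return s

unary = np.array([1.2, -0.3, 0.4])          # per-label evidence from features
pairwise = np.array([[0, 0.8, -1.0],        # co-occurrence affinities (upper triangle)
                     [0, 0,    0.5],
                     [0, 0,    0  ]])
subsets = [set(c) for r in range(1, 4) for c in itertools.combinations(range(3), r)]
best = max(subsets, key=lambda s: score(s, unary, pairwise))
print(best)                                  # label subset with the highest joint score
```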
Multi-label Collective Classification Using Link Based Label Diffusion
Procedia Computer Science, 2018
Due to the growth of numerous applications of network mining, node classification has become an important task in various domains such as social networks, biological networks and communication networks. Exploiting the dependencies or correlations among the nodes in the network is a major challenge in node classification. Performing classification based on these correlations is known as collective classification. A classification problem in which each observation can have multiple target labels is referred to as multi-label classification. The correlation between these multiple target labels makes multi-label classification a difficult task. In this paper, we address the problem of multi-label collective classification, which has to consider both between-node and between-label correlations. We propose a novel method for multi-label collective classification using link-based label diffusion (MCLD), which exploits both the structural properties of the network and the label correlations among the nodes. We conducted experiments on various network datasets, evaluated the efficiency of the system using various evaluation measures, and compared the results with state-of-the-art methods.
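A generic sketch of diffusing label scores over network links is given below; it is not the exact MCLD formulation, just the standard propagation step that link-based diffusion methods build on. `A` is a hypothetical adjacency matrix and `Y0` a node-by-label seed matrix with known labels set to 1.

```python
# Minimal sketch of diffusing multi-label scores over network links
# (generic label propagation, not the exact MCLD method). `A` is an
# adjacency matrix; `Y0` holds one row per node with known label
# indicators and zeros for unlabeled nodes -- both hypothetical inputs.
import numpy as np

def diffuse_labels(A, Y0, alpha=0.85, n_iter=50):
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)            # row-normalized transition matrix
    Y = Y0.copy().astype(float)
    for _ in range(n_iter):
        Y = alpha * P @ Y + (1 - alpha) * Y0  # blend diffusion with the seeds
    return Y                                  # per-node, per-label scores
```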
Multi-label relational neighbor classification using social context features
Networked data, extracted from social media, web pages, and bibliographic databases, can contain entities of multiple classes, interconnected through different types of links. In this paper, we focus on the problem of performing multi-label classification on networked data, where the instances in the network can be assigned multiple labels. In contrast to traditional content-only classification methods, relational learning improves classification performance by leveraging the correlation of labels between linked instances. However, instances in a network can be linked for various causal reasons; hence, treating all links homogeneously can limit the performance of relational classifiers.
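One simple way to avoid treating links homogeneously is to weight the relational-neighbor vote by link type, as in the hypothetical sketch below (`edges` and `type_weight` are illustrative names, not the paper's method or API).

```python
# Weighted-vote relational neighbor sketch in which links of different
# types get different weights, illustrating why treating all links the
# same can hurt. `edges[n]` yields (neighbor, link_type) pairs, `labels`
# maps known nodes to 0/1 label vectors, and `type_weight` maps link
# types to weights -- all hypothetical names.
import numpy as np

def wvrn_multilabel(node, edges, labels, type_weight, n_labels):
    score, total = np.zeros(n_labels), 0.0
    for nbr, link_type in edges[node]:
        if nbr in labels:
            w = type_weight.get(link_type, 1.0)
            score += w * labels[nbr]            # vote of a labeled neighbor
            total += w
    return score / max(total, 1e-12)            # per-label neighbor vote
```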
Collective classification with relational dependency networks
Proceedings of the Second International Workshop on Multi-Relational Data Mining, 2003
Collective classification models exploit the dependencies in a network of objects to improve predictions. For example, in a network of web pages, the topic of a page may depend on the topics of hyperlinked pages. A relational model capable of expressing and reasoning with such dependencies should achieve superior performance to relational models that ignore such dependencies. In this paper, we present relational dependency networks (RDNs), extending recent work in dependency networks to a relational setting. RDNs are a ...
Latent linkage semantic kernels for collective classification of link data
Journal of Intelligent Information Systems, 2006
Generally, links among objects exhibit certain patterns and contain rich semantic clues that can be used to improve classification accuracy. However, many real-world link data exhibit more complex regularity. For example, there may be noisy links that carry no human editorial endorsement about semantic relationships. To effectively capture such regularity, this paper proposes latent linkage semantic kernels (LLSKs) by first introducing linkage kernels to model the local and global dependency structure of a link graph and then applying the singular value decomposition (SVD) in the kernel-induced space. For computational efficiency on large datasets, we also develop a block-based algorithm for LLSKs. A kernel-based contextual dependency network (KCDN) model is then presented to exploit the dependencies in a network of objects for collective classification. We provide experimental results showing that the KCDN model, together with LLSKs, is relatively robust on datasets with complex link regularity, and that the block-based computation method scales well with varying problem sizes.
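As a rough, simplified sketch of the kernel-plus-SVD step, the snippet below builds a co-linkage kernel from the adjacency matrix and keeps a truncated spectral factorization as latent link features; the actual LLSK construction models local and global dependency structure more carefully, and the kernel choice here is only a stand-in.

```python
# Rough sketch of deriving latent link-based features by applying SVD in a
# kernel-induced space. K = A @ A.T (co-linkage counts) is a stand-in
# kernel, not the paper's linkage kernel.
import numpy as np

def latent_link_features(A, rank=16):
    K = A @ A.T                       # nodes sharing many out-links look similar
    U, S, _ = np.linalg.svd(K)        # spectral decomposition of the kernel
    r = min(rank, len(S))
    return U[:, :r] * np.sqrt(S[:r])  # latent features usable by any classifier
```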
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013
Multi-label classification is prevalent in many real-world applications, where each example can be associated with a set of multiple labels simultaneously. The key challenge of multi-label classification comes from the large space of all possible label sets, which is exponential in the number of candidate labels. Most previous work focuses on exploiting correlations among different labels to facilitate the learning process. It is usually assumed that the label correlations are given beforehand or can be derived directly from data samples by counting their label co-occurrences. However, in many real-world multi-label classification tasks, the label correlations are not given and can be hard to learn directly from data samples within a moderate-sized training set. Heterogeneous information networks can provide abundant knowledge about relationships among different types of entities, including data samples and class labels. In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process. By mining the linkage structure of heterogeneous information networks, multiple types of relationships among different class labels and data samples can be extracted. We can then use these relationships to effectively infer the correlations among different class labels in general, as well as the dependencies among the label sets of data examples interconnected in the network. Empirical studies on real-world tasks demonstrate that the performance of multi-label classification can be effectively boosted using heterogeneous information networks.
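The kind of label correlation that can be read off a heterogeneous network, rather than counted from a small training set, can be illustrated with a simple co-linkage computation; `B` below is a hypothetical label-by-entity indicator matrix (for example, labels linked to shared concepts or users), standing in for richer meta-path statistics.

```python
# Illustrative sketch of estimating label correlations from network linkage
# instead of training-set co-occurrence counts. `B` is a hypothetical
# bipartite indicator matrix (labels x intermediate entities).
import numpy as np

def label_correlation_from_links(B):
    C = B @ B.T                                   # labels sharing linked entities
    norms = np.maximum(np.sqrt(np.diag(C)), 1e-12)
    return C / np.outer(norms, norms)             # cosine-normalized correlations
```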
Collective classification in network data
2008
Many real-world applications produce networked data such as the world-wide web (hypertext documents connected via hyperlinks), social networks (for example, people connected by friendship links), communication networks (computers connected via communication links) and biological networks (for example, protein interaction networks). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such networks.
Multi-Label Classification for Mining Big Data
2015
Mining big data requires special handling of the problem under investigation to achieve accuracy and speed at the same time. In this research, we investigate multi-label classification problems with the goal of better accuracy in a timely fashion. Label dependencies are the biggest factor influencing performance, directly and indirectly, and are what distinguishes multi-label from multi-class problems. The key objective in multi-label learning is to exploit this dependency effectively. Most current research ignores the correlations between labels or develops complex algorithms that do not scale efficiently to large datasets. Hence, the goal of our research is to propose a fundamental solution in which dependencies and correlations between labels are identified explicitly from large multi-label datasets. This is done, before any classifiers are induced, by using an association rule mining algorithm. Then the dependencies discovered in the pr...
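A minimal sketch of the association-rule step, assuming each training example is given as a set of label ids, is shown below; a full implementation would mine larger itemsets with Apriori or FP-Growth rather than only pairwise rules.

```python
# Small sketch of extracting pairwise label dependencies with support and
# confidence, the kind of association-rule statistics referred to above.
from collections import Counter
from itertools import combinations

def label_rules(label_sets, min_support=0.05, min_conf=0.6):
    n = len(label_sets)
    single, pair = Counter(), Counter()
    for labels in label_sets:
        single.update(labels)
        pair.update(frozenset(p) for p in combinations(sorted(labels), 2))
    rules = []
    for p, count in pair.items():
        a, b = tuple(p)
        if count / n < min_support:               # prune rare label pairs
            continue
        for x, y in ((a, b), (b, a)):
            conf = count / single[x]
            if conf >= min_conf:
                rules.append((x, y, conf))        # "x implies y" with confidence
    return rules
```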
Collective prediction with latent graphs
Proceedings of the 20th ACM international …, 2011
Collective classification in relational data has become an important and active research topic in the last decade. It exploits the dependencies of instances in a network to improve predictions. Related applications include hyperlinked document classification, social network analysis and collaboration network analysis. Most traditional collective classification models study the scenario in which a large number of labeled examples (labeled nodes) is available. However, in many real-world applications, labeled data are extremely difficult to obtain. For example, in network intrusion detection, there may be only a limited number of identified intrusions whereas there is a huge set of unlabeled nodes. In this situation, most of the data have no connection to labeled nodes; hence, no supervision knowledge can be obtained from the local connections. In this paper, we propose to explore various latent linkages among the nodes and judiciously integrate the linkages to generate a latent graph. This is achieved by finding a graph that maximizes the linkages among the training data with the same label and maximizes the separation among the data with different labels. The objective is further cast as an optimization problem and solved with quadratic programming. Finally, we apply label propagation on the latent graph to make predictions. Experiments show that the proposed model LNP (Latent Network Propagation) can improve the learning accuracy significantly. For instance, when there are only 10% labeled examples, the accuracies of all the comparison models are less than 63%, while that of the proposed model is 74%.
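The two-step idea can be sketched as follows: fit nonnegative weights that combine several candidate linkage graphs so that training nodes sharing a label are strongly connected, then run label propagation on the combined graph. The paper solves the weighting as a quadratic program; the nonnegative least-squares fit below is only a simple stand-in, and all names are illustrative.

```python
# Sketch of combining candidate linkage graphs into a latent graph and
# propagating labels on it. Not the paper's QP formulation: a nonnegative
# least-squares fit against label agreement is used as a stand-in.
import numpy as np
from scipy.optimize import nnls

def combine_latent_graph(graphs, train_idx, Y_train):
    target = (Y_train @ Y_train.T > 0).astype(float)        # 1 if nodes share a label
    cols = [G[np.ix_(train_idx, train_idx)].ravel() for G in graphs]
    w, _ = nnls(np.stack(cols, axis=1), target.ravel())     # nonnegative graph weights
    return sum(wi * G for wi, G in zip(w, graphs))

def propagate(W, Y0, alpha=0.9, n_iter=100):
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-normalize latent graph
    Y = Y0.copy().astype(float)
    for _ in range(n_iter):
        Y = alpha * P @ Y + (1 - alpha) * Y0                  # standard label propagation
    return Y
```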