Neeti Arora - Academia.edu

Papers by Neeti Arora

Algorithms for Clustering of Documents

Data Mining and Knowledge Engineering, 2011

There is a great need to organize large sets of documents into categories. Document clustering techniques are widely recognized as useful tools for information retrieval and for organizing web documents, and they also help users search in the appropriate direction. Researchers have developed a large variety of clustering techniques. The purpose of this paper is to present a novel survey of these techniques, which can also be used to group web and other documents into meaningful clusters. A categorization of the different clustering algorithms is also proposed.
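The abstract does not name the surveyed algorithms, so the following is only an illustrative sketch of one simple document-clustering scheme: the single-pass "leader" algorithm over bag-of-words term-count vectors compared by cosine similarity. The function names, the whitespace tokenizer and the threshold of 0.3 are assumptions for illustration, not a method taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts: lowercased, whitespace-split (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_clustering(docs, threshold=0.3):
    """Single-pass leader clustering: a document joins the first cluster whose leader
    (its first member) is at least `threshold` similar; otherwise it starts a new cluster.
    Returns clusters as lists of document indices."""
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        vec = term_vector(doc)
        for j, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[j].append(i)
                break
        else:  # no sufficiently similar leader: this document starts a new cluster
            leaders.append(vec)
            clusters.append([i])
    return clusters


docs = [
    "k means clustering of data",
    "clustering data with k means",
    "web search engine ranking",
    "ranking pages in a web search engine",
]
print(leader_clustering(docs))  # [[0, 1], [2, 3]]
```

The leader algorithm was chosen only because it is short; its result depends on document order, which is one of the trade-offs a survey of clustering techniques would weigh against partitional methods such as K-means.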

A Study on Initial Centroids Selection for Partitional Clustering Algorithms

Advances in Intelligent Systems and Computing, Jun 13, 2018

Data mining tools and techniques allow an organization to make creative decisions and then plan properly. Clustering is used to identify objects that are similar in characteristics and group them together. The K-means clustering method chooses random cluster centres (initial centroids), one for each cluster, and this is its major weakness: the performance and quality of K-means depend strongly on the initial guess of the centres. Research on clustering has therefore suggested several modifications that augment K-means with a technique for selecting the centroids. The first two authors of this paper have also developed three algorithms that, unlike K-means, do not generate the initial centres randomly and therefore produce the same set of initial centroids for the same input data: sum of distance clustering (SODC), the distance-based clustering algorithm (DBCA) and farthest distributed centroid clustering (FDCC). In this paper we present a brief survey of the research on modifying the initial centroids of the K-means clustering algorithm and describe the farthest distributed centroid clustering algorithm in further detail. The experimental results show that farthest distributed centroid clustering produces better-quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.
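To make the sensitivity to initial centroids concrete, here is a minimal sketch of standard (Lloyd's) K-means that takes its initial centroids as an argument. It is not FDCC or any of the authors' algorithms; the helper names and the toy 1-D data set are assumptions chosen so the outcome can be checked by hand.

```python
import math
import random


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's K-means starting from the given initial centroids. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (ties go to the lower index).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged, possibly to a local optimum
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances to the assigned centroids (the K-means objective)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def random_init(points, k, seed=None):
    """Classic K-means seeding: k distinct data points drawn at random."""
    return random.Random(seed).sample(points, k)


# Three well-separated 1-D groups: {0, 1}, {10, 11}, {20, 21}.
points = [(0,), (1,), (10,), (11,), (20,), (21,)]

good = kmeans(points, [(0,), (10,), (20,)])  # one seed per natural group
bad = kmeans(points, [(0,), (1,), (10,)])    # two seeds in the same group
print(sse(points, *good))  # 1.5
print(sse(points, *bad))   # 101.0 (stuck in a local optimum)
```

With the "bad" seeds the algorithm converges immediately to a partition that merges {10, 11} and {20, 21}, which is exactly the failure mode that deterministic seeding schemes such as SODC, DBCA and FDCC aim to avoid. `random_init` reproduces its choice only when the same `seed` is passed, whereas a deterministic seeding rule gives the same centroids for the same data every time.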

Sum of Distance based Algorithm for Clustering Web Data

International Journal of Computer Applications, Feb 14, 2014

Clustering is a data mining technique used to group objects that are similar in some characteristics. The criterion for checking similarity is implementation dependent. Clustering analyzes data objects without consulting a known class label or category; that is, it is an unsupervised data mining technique. K-means is a widely used clustering algorithm that chooses random cluster centers (centroids), one for each cluster. The performance of K-means depends strongly on this initial guess of the centers, and the final cluster centroids may not be optimal because the algorithm can converge to locally optimal solutions. A good choice of initial centroids is therefore important for K-means. This paper proposes a clustering algorithm that selects the initial centroids using the sum of the distances from each data object to all other data objects. The proposed algorithm produces better clustering than the K-means technique on both synthetic and real datasets.
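The abstract names the selection criterion but not the full rule that turns it into k initial centroids. The sketch below therefore computes only the criterion itself, each object's summed distance to all other objects, on hypothetical toy data; Euclidean distance and the function name are assumptions.

```python
import math


def distance_sums(points):
    """S_i = sum over all j of dist(x_i, x_j); including j = i adds 0, so this equals the
    sum to all *other* points. Needs O(n^2) distance evaluations and is deterministic."""
    return [sum(math.dist(p, q) for q in points) for p in points]


points = [(0,), (1,), (10,), (11,), (20,), (21,)]
sums = distance_sums(points)
print(sums)  # [63.0, 59.0, 41.0, 41.0, 59.0, 63.0]

# The point with the smallest sum is the medoid (the most central object);
# ties are broken by taking the lower index.
medoid = min(range(len(points)), key=sums.__getitem__)
print(points[medoid])  # (10,)
```

Because the scores involve no randomness, any selection rule built on them yields the same initial centroids for the same input, which is the property the abstract contrasts with K-means. The quadratic cost of computing all pairwise distances is the main price of this determinism on large web-data sets.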

Optimizing K-Means by Fixing Initial Cluster Centers

International Journal of Current Engineering and Technology, 2011

Data mining techniques help in business decision-making and in predicting behaviors and future trends. Clustering is a data mining technique used to group objects that are similar in some characteristics. Clustering analyzes data objects without consulting a known class label or category; that is, it is an unsupervised data mining technique. K-means is a widely used partitional clustering algorithm, but its performance depends strongly on the initial guess of the centers (centroids), and the final cluster centroids may not be optimal. A good choice of initial centroids is therefore important for K-means. By augmenting K-means with a centroid-selection technique based on the sum of the distances from each data object to all other data objects, we obtain an algorithm, Farthest Distributed Centroids Clustering (FDCC), that results in better clustering than not only the K-means partitional clustering algorithm but also the agglomerative hierarchical cluste...
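The abstract gives only the ingredients of FDCC (a sum-of-distances criterion and "farthest distributed" centroids), not the exact procedure. A minimal sketch under stated assumptions: the first seed is the point with the largest distance sum, and later seeds follow the classic farthest-first (maximin) traversal. Both choices and the function name `farthest_first_seeds` are hypothetical and should not be read as the published FDCC procedure.

```python
import math


def farthest_first_seeds(points, k):
    """Hypothetical deterministic seeding: start from the point with the largest sum of
    distances, then repeatedly add the point whose distance to its nearest already-chosen
    seed is largest (farthest-first traversal). Ties go to the lower index."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    sums = [sum(math.dist(p, q) for q in points) for p in points]
    first = max(range(len(points)), key=sums.__getitem__)
    seeds = [points[first]]
    # nearest[i] = distance from point i to its closest chosen seed
    nearest = [math.dist(p, points[first]) for p in points]
    while len(seeds) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[nxt])
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return seeds


points = [(0,), (1,), (10,), (11,), (20,), (21,)]
print(farthest_first_seeds(points, 3))  # [(0,), (21,), (10,)]
```

The returned seeds would be handed to an ordinary K-means routine as its initial centroids; on this toy data they land one per natural group. A known drawback of farthest-first selection is that it favours outliers, so on noisy data the choice of the first seed matters.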

A Distance Based Clustering Algorithm

Clustering is an unsupervised data mining technique used to identify objects that are similar in characteristics and group them together. K-means is a widely used partitional clustering algorithm, but its performance depends strongly on the initial guess of the centers (centroids), and the final cluster centroids may not be optimal. A good choice of initial centroids is therefore important for K-means. We have developed a clustering algorithm that uses a distance criterion to select a good set of initial centroids. Once some point d is selected as an initial centroid, the proposed algorithm computes the average of the data points to prevent points near d from being selected as the next initial centroids. These initial centroids are given as input to K-means, yielding a clustering algorithm that results in better clustering than the K-means partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and Hierarchical partitionin...
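The phrase "computes average of data points" is underspecified in the abstract. The sketch below assumes it means the average distance from d to all data points, used as an exclusion radius: every remaining candidate closer to d than that average is ruled out. The rule for choosing each centroid (smallest sum of distances among the remaining candidates) and the function name `average_distance_seeds` are also assumptions, not the published DBCA procedure.

```python
import math


def average_distance_seeds(points, k):
    """Hypothetical distance-based seeding: after choosing a centroid d, exclude every
    remaining candidate closer to d than d's average distance to all data points."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # Sum of distances from each point to all points (the self-distance adds 0).
    sums = [sum(math.dist(p, q) for q in points) for p in points]
    candidates = list(range(n))  # kept in index order so ties resolve deterministically
    seeds = []
    while len(seeds) < k:
        if not candidates:
            raise ValueError("all candidates were excluded before k seeds were chosen")
        # Assumed choice rule: the remaining candidate with the smallest distance sum.
        d = min(candidates, key=sums.__getitem__)
        seeds.append(points[d])
        radius = sums[d] / n  # average distance from d to all n points
        # Drop d itself and every candidate nearer to d than that average.
        candidates = [i for i in candidates
                      if i != d and math.dist(points[i], points[d]) >= radius]
    return seeds


points = [(0,), (1,), (10,), (11,), (20,), (21,)]
print(average_distance_seeds(points, 3))  # [(10,), (1,), (20,)]
```

On this toy data the exclusion step removes the neighbour of each chosen seed, so the three seeds fall in three different groups. It also shows a limitation of this reading: asking for k = 4 here raises `ValueError`, because the exclusion radii have already covered every point.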