SHUTING XU - Academia.edu
Papers by SHUTING XU
Journal of Relationship Marketing, 2008
Collecting customer information and analyzing it with data mining techniques are the primary processes of customer relationship management. One of the important issues in these processes is how to protect the trade secrets of corporations and the privacy of customers contained in the datasets collected and used for data mining. In this article, we propose a privacy-preserving data mining framework for customer relationship management that not only enables firms to protect private information but also maintains the performance and utility of the data mining analysis. We use churn prediction as a case study to show how this framework works in the real world.
Very large sparse linear systems are often encountered in scientific and engineering applications. Generally, two classes of methods are available to solve sparse linear systems. The first class is direct solution methods, represented by Gaussian elimination. The second class is iterative solution methods, of which the preconditioned Krylov subspace methods are considered ...
The condition number of a matrix is an important measure in numerical analysis and linear algebra. It measures the stability, or sensitivity, of a matrix with respect to numerical operations. However, computing the condition number of a matrix directly is very expensive in CPU time and memory, and becomes prohibitive for large matrices. We propose to ...
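The abstract is cut off before the proposed method, so the following is only a minimal sketch of the expensive direct baseline it mentions, not the authors' technique. It assumes the 2-norm condition number, kappa_2(A) = sigma_max / sigma_min, computed from a dense SVD with NumPy; the function name `condition_number_2norm` is hypothetical.

```python
import numpy as np


def condition_number_2norm(A: np.ndarray) -> float:
    """Direct 2-norm condition number: largest over smallest singular value.

    The full SVD costs roughly O(m * n * min(m, n)) time and needs the whole
    matrix in dense memory, which is why this direct approach becomes
    prohibitive for large matrices.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending order
    if s[-1] == 0.0:
        return float("inf")  # exactly singular matrix
    return float(s[0] / s[-1])
```

For example, `condition_number_2norm(np.diag([10.0, 2.0]))` gives 5.0, and for a general matrix the result agrees with `np.linalg.cond`, which uses the same SVD-based definition by default.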
Data mining technologies are now used in commercial, industrial, and governmental organizations for purposes ranging from increasing profitability to enhancing national security. The widespread application of data mining technologies has raised concerns about the trade secrets of corporations and the privacy of innocent people whose information is contained in the datasets collected and used for data mining. Data mining technologies designed for knowledge discovery across corporations, and for security purposes targeting the general population, need sufficient privacy awareness to protect corporate trade secrets and individual private information. Unfortunately, most standard data mining algorithms are not very effective at privacy protection, as they were originally developed mainly for commercial applications in which different organizations collect and own their private databases and mine them for specific commercial purposes.
Preserving privacy is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. This chapter introduces a sparsified Singular Value Decomposition (SVD) method for data distortion. It also explains in detail a few metrics that measure the difference between the distorted dataset and the original dataset and the degree of privacy protection. Experimental results on synthetic and real-world datasets show that the sparsified SVD method works well both in preserving privacy and in maintaining the utility of the datasets.
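A minimal sketch of sparsified-SVD distortion, assuming the scheme keeps the top-k singular triplets and then zeroes every entry of the truncated singular-vector matrices whose magnitude falls below a threshold eps. The function name and the parameters `k` and `eps` are illustrative assumptions, not the chapter's code.

```python
import numpy as np


def sparsified_svd_distort(A: np.ndarray, k: int, eps: float) -> np.ndarray:
    """Return a distorted copy of data matrix A (rows = objects, columns = features).

    Assumed steps: rank-k truncation of the SVD, then sparsification of the
    truncated singular-vector matrices, then reassembly of the product.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Sparsification: entries smaller than eps in magnitude are dropped,
    # which adds distortion on top of the rank truncation.
    Uk = np.where(np.abs(Uk) < eps, 0.0, Uk)
    Vtk = np.where(np.abs(Vtk) < eps, 0.0, Vtk)
    return Uk @ np.diag(sk) @ Vtk
```

With `k` equal to the full rank and `eps = 0`, the function returns the original matrix unchanged; lowering `k` or raising `eps` increases the distortion (and, in principle, the privacy protection) at some cost in utility.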
The Journal of Supercomputing, 2004
Clustering web documents is an important procedure in many web information retrieval systems. As the Internet grows rapidly and the number of information requests increases exponentially, parallel computing techniques become unavoidable in large-scale web document retrieval. We propose a parallel hybrid web document clustering algorithm that combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments tested the performance of the hybrid algorithm on three real-life web document datasets and compared the results with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the hybrid algorithm produces better clustering solutions than either parallel PDDP or parallel K-means. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
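The following is a minimal serial sketch of the hybrid idea, assuming PDDP (repeatedly splitting the cluster with the largest scatter by the sign of each point's projection onto that cluster's principal direction) supplies the initial partition that K-means then refines. The parallel distribution of work across processors described in the paper is not shown, and all function names are hypothetical.

```python
import numpy as np


def _scatter(P: np.ndarray) -> float:
    # Singletons and empty clusters cannot be split, so they are never chosen.
    if len(P) < 2:
        return -1.0
    return float(np.sum((P - P.mean(axis=0)) ** 2))


def pddp(X: np.ndarray, k: int) -> np.ndarray:
    """Principal Direction Divisive Partitioning: one label per row of X."""
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Pick the existing cluster with the largest total scatter.
        c = int(np.argmax([_scatter(X[labels == j]) for j in range(new_label)]))
        idx = np.flatnonzero(labels == c)
        centered = X[idx] - X[idx].mean(axis=0)
        # Leading right singular vector = principal direction of the cluster.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ Vt[0]
        labels[idx[proj > 0]] = new_label  # split by sign of the projection
    return labels


def hybrid_pddp_kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    """Use PDDP clusters as the starting point for standard K-means."""
    labels = pddp(X, k)
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    for _ in range(max_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids
```

The design intent is that PDDP's deterministic, SVD-based partition replaces K-means' random seeding, so K-means often needs only a few refinement passes. The sketch assumes `k` does not exceed the number of distinct points.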
International Journal of Information and Computer Security, 2008
... to maintain the advantage of data privacy and data usability of SVD, but achieve a significant ... Then data mining algorithms are used on the distorted new dataset matrix. Compared to the design of privacy preserving methods, evaluation techniques cannot be underestimated. ...
... provided a thorough review of churn prediction methodologies in (Hadden, Tiwari, Roy, & Ruta, 2007). Besides classification, other methods have also been applied to churn prediction, such as the hazard modeling approach (Jamal & Bucklin, 2006) and social network analysis ...
Knowledge and Information Systems, 2006
Privacy preservation is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset and the degree of privacy protection. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method works well both in preserving privacy and in maintaining the utility of the datasets.
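The abstract does not define its metrics. As an illustration only, the sketch below assumes three measures of the kind used in this line of work: a relative Frobenius-norm value difference, the mean change in each entry's rank within its column, and the fraction of entries whose within-column rank is unchanged. The names and exact definitions are assumptions, not taken from the paper.

```python
import numpy as np


def value_difference(A: np.ndarray, A_bar: np.ndarray) -> float:
    """Relative Frobenius-norm change between original and distorted data."""
    return float(np.linalg.norm(A - A_bar) / np.linalg.norm(A))


def _column_ranks(M: np.ndarray) -> np.ndarray:
    # Rank of each entry within its own column (0 = smallest); ties keep order.
    return np.argsort(np.argsort(M, axis=0, kind="stable"), axis=0, kind="stable")


def rank_position(A: np.ndarray, A_bar: np.ndarray) -> float:
    """Mean absolute change in within-column rank (larger = more disguised)."""
    return float(np.mean(np.abs(_column_ranks(A) - _column_ranks(A_bar))))


def rank_kept(A: np.ndarray, A_bar: np.ndarray) -> float:
    """Fraction of entries whose within-column rank is unchanged."""
    return float(np.mean(_column_ranks(A) == _column_ranks(A_bar)))
```

For a single column [1, 2, 3] distorted to [3, 2, 1], `rank_position` returns 4/3 and `rank_kept` returns 1/3, since only the middle value keeps its rank.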
Accurate information extracted from datasets is required for making reasonable decisions with data mining algorithms. Privacy preservation has become one of the top priorities in the design of data mining applications. In this paper, a novel data distortion strategy based on structural partition and the sparsified Singular Value Decomposition (SSVD) technique is proposed. Three schemes, object-based partition, feature-based partition, and hybrid partition, are defined to permit a tradeoff between privacy protection on centralized datasets and the accuracy of data mining techniques. Several privacy preservation metrics are used to examine the performance of the proposed strategies. The data utility of the three schemes is examined by binary classification with a support vector machine. The effects of the SVD rank and the SSVD threshold value on data distortion and utility are also tested. Our experimental results indicate that, compared with standard data distortion techniques, the proposed schemes are very efficient in achieving a good tradeoff between data privacy and data utility, and they offer a feasible solution, with a significant reduction in computational cost compared with SVD, for protecting sensitive information while promising high accuracy in decision making.
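A minimal sketch of structural partitioning, assuming each scheme simply applies sparsified SVD independently to each submatrix: object-based partition splits the rows, feature-based partition splits the columns, and hybrid partition splits both. The function names, the shared rank `k`, and the threshold `eps` are illustrative assumptions; the row and column groups must together cover every index.

```python
import numpy as np


def _ssvd(block: np.ndarray, k: int, eps: float) -> np.ndarray:
    """Assumed sparsified SVD of one block: rank-k truncation, then thresholding."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = min(k, len(s))  # a small block may have lower rank than k
    Uk = np.where(np.abs(U[:, :k]) < eps, 0.0, U[:, :k])
    Vtk = np.where(np.abs(Vt[:k, :]) < eps, 0.0, Vt[:k, :])
    return Uk @ np.diag(s[:k]) @ Vtk


def partitioned_ssvd(A: np.ndarray, row_parts, col_parts, k: int, eps: float) -> np.ndarray:
    """Distort each (row group, column group) submatrix independently.

    One row group + several column groups  -> feature-based partition.
    Several row groups + one column group  -> object-based partition.
    Several of both                        -> hybrid partition.
    """
    A_bar = np.empty_like(A, dtype=float)
    for rows in row_parts:
        for cols in col_parts:
            idx = np.ix_(rows, cols)
            A_bar[idx] = _ssvd(A[idx], k, eps)
    return A_bar
```

For a feature-based partition of a 6 x 4 matrix, pass `row_parts=[list(range(6))]` and `col_parts=[[0, 1], [2, 3]]`. Smaller blocks mean smaller SVDs, which is one plausible source of the cost reduction the abstract reports.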
Matrix Decomposition Techniques for Data Privacy (9781605660103): Jun Zhang, Jie Wang, Shuting Xu: Book Chapters.
Blogs are a new Internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose to include the title, body, and comments of blog pages when clustering datasets built from blog documents. In particular, we argue that the author/reader comments of blog pages may have a stronger discriminating effect in clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm using different weights for the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions, confirming our hypothesis that the author/reader comments of blog files are very useful in discriminating blog files.
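A minimal sketch of the weighted word-page matrix, assuming simple whitespace tokenization, raw term counts scaled by a per-part weight, and unit-length page columns, followed by a plain k-means on the pages. The weights (1, 1, 2), the field names, and both functions are hypothetical; the paper's preprocessing and weight values are not given in the abstract.

```python
from collections import Counter

import numpy as np


def weighted_word_page_matrix(pages, weights=(1.0, 1.0, 2.0)):
    """pages: list of dicts with 'title', 'body', 'comments' strings (assumed).

    Returns (vocabulary, matrix) with one row per word and one column per page.
    """
    w_title, w_body, w_comments = weights
    counts = []
    for page in pages:
        c = Counter()
        for part, w in (("title", w_title), ("body", w_body), ("comments", w_comments)):
            for word in page[part].lower().split():
                c[word] += w  # each occurrence counts with its part's weight
        counts.append(c)
    vocab = sorted(set().union(*counts))
    index = {word: i for i, word in enumerate(vocab)}
    M = np.zeros((len(vocab), len(pages)))
    for j, c in enumerate(counts):
        for word, value in c.items():
            M[index[word], j] = value
    # Unit-length columns so long pages do not dominate the distances.
    norms = np.linalg.norm(M, axis=0)
    M /= np.where(norms == 0, 1.0, norms)
    return vocab, M


def kmeans_columns(M: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Plain k-means on the page columns, seeded with the first k pages."""
    X = M.T  # one row per page
    centroids = X[:k].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels
```

Raising the third weight makes shared comment words pull pages together more strongly, which is the effect the experiments vary.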
Data distortion is a critical component of privacy preservation in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method works well both in preserving privacy and in maintaining the utility of the datasets.