Dimension Reduction Research Papers - Academia.edu
Abstract: Current data tend to be more complex than conventional data and therefore need dimension reduction. Dimension reduction is important in cluster analysis: it produces a dataset that is smaller in volume yet yields the same analytical results as the original representation. A clustering process needs data reduction to obtain efficient processing time and to mitigate the curse of dimensionality. This paper proposes a model for extracting multidimensional data clustering from a health database. We implemented four dimension ...
- by Ming-wei Su
- Engineering, Genetics, Physics, Chemistry
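A minimal sketch of the general pipeline the abstract above describes: reduce dimensionality first, then cluster in the reduced space. The PCA-plus-k-means combination and all parameter values below are illustrative assumptions, not the paper's four specific reduction methods.

```python
# Sketch: dimensionality reduction before clustering, assuming a PCA + k-means
# pipeline as a generic stand-in for the reduction methods the paper evaluates.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))        # stand-in for a high-dimensional health dataset

pca = PCA(n_components=10)             # reduce 100 features to 10 components
X_reduced = pca.fit_transform(X)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels[:20])
```

Clustering the 10-dimensional scores instead of the raw 100 features is what yields the processing-time gain the abstract refers to.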
A 24-dimensional model for the ‘harmonic content’ of pieces of music has proved to be remarkably robust in the retrieval of polyphonic queries from a database of polyphonic music in the presence of quite significant noise and errors in either query or database document. We have further found that higher-order (1st- to 3rd-order) models tend to work better for music retrieval than 0th-order ones owing to the richer context they capture. However, there is a serious performance cost due to the large size of such models and the present paper reports on some attempts to reduce dimensionality while retaining the general robustness of the method. We find that some simple reduced-dimensionality models, if their parameter settings are carefully chosen, do indeed perform almost as well as the full 24-dimensional versions. Furthermore, in terms of recall in the top 1000 documents retrieved, we find that a 6-dimensional 2nd-order model gives even better performance than the full model. This represents a potential 64-times reduction in model size and search-time, making it a suitable candidate for filtering a large database as the first stage of a two-stage retrieval system.
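The model-size arithmetic can be made concrete: a 2nd-order model over d symbols has up to d³ transition parameters, so shrinking the alphabet from 24 to 6 reduces the table by (24/6)³ = 64. The sketch below illustrates this with counts; mapping 24 symbols down to 6 via modulo is a hypothetical stand-in for the carefully chosen reduced alphabets the paper studies.

```python
# Sketch: counting-based 2nd-order model over harmonic symbols. The modulo
# merge of 24 symbols into 6 is a hypothetical illustration only.
from collections import defaultdict

def second_order_counts(symbols):
    """Count transitions (s[t-2], s[t-1]) -> s[t]."""
    counts = defaultdict(int)
    for a, b, c in zip(symbols, symbols[1:], symbols[2:]):
        counts[(a, b, c)] += 1
    return counts

full = list(range(24)) * 10                  # toy 24-symbol harmonic sequence
reduced = [s % 6 for s in full]              # hypothetical 24 -> 6 merge

print(len(second_order_counts(full)))        # up to 24**3 = 13824 parameters
print(len(second_order_counts(reduced)))     # up to 6**3 = 216 (64x smaller)
```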
The effective behavior of second order strain energy densities is obtained using relaxation and Γ-convergence techniques. The Cosserat theory (see e.g. (13), (14) and (2)) is recovered within a dimension reduction analysis for 3D nonlinear elastic thin domains with varying profiles. Homogeneous and inhomogeneous plate models with periodic profiles are treated.
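For orientation, the standard setting behind such a dimension-reduction analysis, written here in generic textbook form rather than the paper's exact functional, rescales a second-order energy on a thin domain:

```latex
% Generic thin-domain rescaling, assuming a plate-like domain of thickness eps.
\Omega_\varepsilon = \omega \times \left(-\tfrac{\varepsilon}{2}, \tfrac{\varepsilon}{2}\right),
\qquad
I_\varepsilon(u) = \frac{1}{\varepsilon} \int_{\Omega_\varepsilon}
  W\!\left(\nabla u, \nabla^2 u\right)\, dx,
```

and the effective (relaxed) plate energy is identified as the Γ-limit of I_ε as the thickness ε tends to 0.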
Non-negative matrix factorization (NMF) is a relatively new method of matrix decomposition which factors an m by n data matrix X into an m by k matrix W and a k by n matrix H, so that X = W * H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components or latent "parts" into which X has been decomposed. The question arises: how does one choose k? In this thesis, we assess multiple methods for estimating the number of components k in the context of NMF, and we also examine the effects of various types of normalization on this estimate. We conclude that when estimating k, it is best not to perform any normalization. If it is known or assumed that the underlying components are orthogonal or nearly so, then perhaps Velicer's MAP or Minka's Laplace-PCA method might be best to use. However, in the general case where it is unknown whether the underlying components are orthogonal or not, none of the methods for estimating k seemed obviously better than the others.
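A minimal sketch of the kind of model scan the thesis evaluates: fit NMF for a range of k on unnormalized data and inspect reconstruction error. The elbow heuristic here is one simple stand-in for the estimators compared in the thesis (e.g. Velicer's MAP, Minka's Laplace-PCA).

```python
# Sketch: scan candidate k for NMF on unnormalized, non-negative data and
# record reconstruction error; picking the "elbow" is an illustrative
# stand-in for the k-estimation methods compared in the thesis.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 40))                      # non-negative toy data matrix

for k in range(2, 11):
    model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(X)                 # X is approximated by W @ model.components_
    print(k, round(model.reconstruction_err_, 3))
```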
Non-symmetrical correspondence analysis (NSCA) is a useful tool for graphically detecting the asymmetric relationship between two categorical variables. Most of the theory associated with NSCA does not distinguish between a two-way contingency table of ordinal variables and a two-way table of nominal variables. Typically, singular value decomposition (SVD) is used in classical NSCA for dimension reduction. A bivariate moment decomposition (BMD) for ordinal variables in contingency tables using orthogonal polynomials and generalized correlations is proposed. This method not only takes into account the ordinal nature of the two categorical variables, but also permits the detection of significant association in terms of location, dispersion and higher-order components.
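For context, classical SVD-based NSCA (the baseline that BMD generalizes) decomposes the differences between column-conditional profiles and the unconditional row profile. A minimal numpy sketch, with a made-up contingency table and columns treated as the predictor variable:

```python
# Sketch of classical SVD-based NSCA; the contingency table below is made up.
import numpy as np

N = np.array([[20,  5, 10],
              [10, 15,  5],
              [ 5, 10, 20]], dtype=float)     # hypothetical contingency table

P = N / N.sum()
p_row = P.sum(axis=1)                          # row (response) margins
p_col = P.sum(axis=0)                          # column (predictor) margins

# Column-conditional profiles minus the unconditional row profile.
Pi = P / p_col - p_row[:, None]

# Weight columns by the square root of their margins, then decompose.
U, s, Vt = np.linalg.svd(Pi * np.sqrt(p_col), full_matrices=False)
print("singular values:", np.round(s, 4))      # low-rank coordinates for biplots
```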
In this paper, wavelets and fuzzy support vector machines are used to automatically detect and classify power quality (PQ) disturbances. Electric power quality is an aspect of power engineering that has been with us since the inception of power systems. The disturbances of concern include voltage sags, swells, interruptions, switching transients, impulses, flickers, harmonics, and notches. Fourier transform and wavelet analysis are utilized to denoise the digital signals, to decompose the signals, and then to obtain eight common features for the sampled PQ disturbance signals. A fuzzy support vector machine is designed and trained on points in the 8-dimensional feature space to make a decision regarding the type of the disturbance. Simulation cases illustrate the effectiveness of the approach.
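A compact sketch of the pipeline's shape, assuming PyWavelets for the decomposition and a plain SVM as a stand-in for the paper's fuzzy SVM; the per-level energy features are one common choice, not necessarily the paper's eight features.

```python
# Sketch: wavelet-energy features from a disturbance signal, classified with a
# plain SVM as a stand-in for a fuzzy SVM. The feature choice (energy per
# decomposition level) is an assumption, not the paper's exact 8 features.
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_energy_features(signal, wavelet="db4", level=7):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c**2) for c in coeffs])   # 8 values for level=7

rng = np.random.default_rng(0)
t = np.linspace(0, 0.2, 2000)
clean = np.sin(2 * np.pi * 50 * t)                    # 50 Hz reference waveform
sag = clean * np.where((t > 0.05) & (t < 0.15), 0.5, 1.0)  # toy voltage sag

X = np.array([wavelet_energy_features(s + 0.01 * rng.normal(size=t.size))
              for s in ([clean] * 20 + [sag] * 20)])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))
```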
I introduce Forecastable Component Analysis (ForeCA), a novel dimension reduction technique for temporally dependent signals. Based on a new forecastability measure, ForeCA finds an optimal transformation to separate a multivariate signal into a forecastable subspace and an orthogonal white-noise subspace. I present a provably converging algorithm with a fast eigenvector solution. Applications to financial and macro-economic data show that ForeCA can successfully discover informative structure in multivariate time series: structure that can be used for forecasting and classification.
The main methods for ForeCA are implemented in the R package ForeCA (cran.r-project.org/web/packages/ForeCA/index.html), which is publicly available on CRAN.
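The forecastability measure underlying ForeCA is based on spectral entropy: flat spectra (white noise) are unforecastable, peaked spectra are forecastable. A rough numpy/scipy sketch of the univariate measure follows; the discretized normalization is a simplification, and the R package ForeCA implements the estimator properly.

```python
# Rough sketch of ForeCA's univariate forecastability measure Omega:
# 1 minus the normalized spectral entropy, so white noise scores near 0 and a
# pure sinusoid scores near 1.
import numpy as np
from scipy.signal import periodogram

def omega(x):
    _, pxx = periodogram(x, detrend="constant")
    p = pxx[pxx > 0]
    p = p / p.sum()                          # normalized spectral density
    entropy = -(p * np.log(p)).sum()
    return 1.0 - entropy / np.log(len(p))    # 0 = white noise, 1 = perfectly forecastable

rng = np.random.default_rng(0)
noise = rng.normal(size=2048)
sine = np.sin(2 * np.pi * 0.05 * np.arange(2048))

print(round(omega(noise), 3))                # close to 0
print(round(omega(sine), 3))                 # close to 1
```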
ABSTRACT. We propose a robust test for the equality of the covariance structures in two functional samples. The test statistic has a chi-square asymptotic distribution with a known number of degrees of freedom, which depends on the level of dimension reduction needed to represent the data. A detailed analysis of the asymptotic properties is developed. Finite sample performance is examined by a simulation study and an application to egg-laying curves of fruit flies.
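A schematic of the dimension-reduction step behind tests of this kind, not the authors' exact robust statistic: project both samples onto the leading p principal components of the pooled data and compare the covariances of the score vectors, with the chi-square degrees of freedom growing with p.

```python
# Schematic only: reduce two functional samples to scores on p pooled
# principal components and compare score covariances; a real test aggregates
# these differences into a chi-square statistic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))               # sample 1: 100 curves on a 50-point grid
Y = 1.5 * rng.normal(size=(120, 50))         # sample 2: inflated covariance

pooled = np.vstack([X, Y])
pooled = pooled - pooled.mean(axis=0)
_, _, Vt = np.linalg.svd(pooled, full_matrices=False)
B = Vt[:3].T                                  # p = 3 pooled principal components

Sx = np.cov(X @ B, rowvar=False)             # 3x3 score covariances
Sy = np.cov(Y @ B, rowvar=False)
print(np.round(Sx - Sy, 2))                  # differences the test statistic aggregates
```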
Reconstruction of equations of motion from incomplete or noisy data and dimension reduction are two fundamental problems in the study of dynamical systems with many degrees of freedom. For the latter, extensive efforts have been made, but with limited success, to generalize the Zwanzig–Mori projection formalism, originally developed for Hamiltonian systems close to thermodynamic equilibrium, to general non-Hamiltonian systems lacking detailed balance. One difficulty introduced by such systems is the lack of an invariant measure, needed to define a statistical distribution. Based on a recent discovery that a non-Hamiltonian system defined by a set of stochastic differential equations can be mapped to a Hamiltonian system, we develop such a general projection formalism. In the resulting generalized Langevin equations, a set of generalized fluctuation–dissipation relations connects the memory kernel and the random noise terms, analogous to Hamiltonian systems obeying detailed balance. The absence of these relations had restricted previous applications of the generalized Langevin formalism. Results of this work may serve as the theoretical basis for further technical developments on model reconstruction with reduced degrees of freedom. We first use an analytically solvable example to illustrate the formalism and the fluctuation–dissipation relation. Our numerical test on a chemical network with end-product inhibition further demonstrates the validity of the formalism. We suggest that the formalism can find wide applications in scientific modeling. Specifically, we discuss potential applications to biological networks. In particular, the method provides a suitable framework for gaining insights into network properties such as robustness and parameter transferability.
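For reference, the textbook form of the generalized Langevin equation and the fluctuation–dissipation relation the abstract refers to, written here for a Hamiltonian system in equilibrium (the paper's contribution is the generalization beyond this setting):

```latex
% Textbook generalized Langevin equation with its fluctuation-dissipation relation.
\dot{v}(t) = -\int_0^t K(t-s)\, v(s)\, ds + \xi(t),
\qquad
\langle \xi(t)\, \xi(s) \rangle = k_B T \, K(t-s),
```

where K is the memory kernel and ξ the random noise; the paper derives analogous relations for non-Hamiltonian stochastic systems.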
A novel approach using volumetric texture and reduced-spectral features is presented for hyperspectral image classification. In this approach, the volumetric textural features were extracted by volumetric gray-level co-occurrence matrices (VGLCM). The spectral features were extracted by minimum estimated abundance covariance (MEAC)- and linear prediction (LP)-based band selection, and by semi-supervised k-means (SKM) band clustering and its variant that deletes the worst cluster (SKMd). Moreover, four feature combination schemes were designed for hyperspectral image classification using spectral and textural features. The experiments show that the proposed VGLCM method outperforms the gray-level co-occurrence matrix (GLCM) method, and the results indicate that combining spectral information with volumetric textural features leads to improved classification performance in hyperspectral imagery.
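A minimal numpy sketch of the volumetric co-occurrence counting underlying VGLCM, for one 3D offset on a quantized cube; the offset, the number of gray levels, and the contrast statistic are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: one volumetric gray-level co-occurrence matrix (VGLCM) for a single
# positive 3D offset on a quantized hyperspectral cube.
import numpy as np

def vglcm(cube, offset=(1, 1, 1), levels=8):
    """Co-occurrence counts for voxel pairs separated by a positive 3D offset."""
    q = np.floor(cube / (cube.max() + 1e-12) * levels).astype(int)
    q = q.clip(0, levels - 1)
    dz, dy, dx = offset
    a = q[:q.shape[0] - dz, :q.shape[1] - dy, :q.shape[2] - dx].ravel()
    b = q[dz:, dy:, dx:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1)
    return glcm / glcm.sum()                  # normalized joint distribution

rng = np.random.default_rng(0)
cube = rng.random((64, 64, 32))               # rows x cols x bands toy cube

P = vglcm(cube)
i, j = np.indices(P.shape)
print("contrast:", np.sum((i - j) ** 2 * P))  # one common Haralick-style feature
```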
The paper proposes a latent variable model for binary data coming from an unobserved heterogeneous population. The heterogeneity is taken into account by replacing the traditional assumption of Gaussian distributed factors with a finite mixture of multivariate Gaussians. The aim of the proposed model is twofold: it achieves dimension reduction when the data are dichotomous and, simultaneously, performs model-based clustering in the latent space. Model estimation is carried out by maximum likelihood via a generalized version of the EM algorithm. To evaluate the performance of the model, a simulation study and two real applications are presented.
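A rough two-stage stand-in for the idea (the paper instead fits everything jointly with a generalized EM): estimate continuous latent scores from the binary data, then cluster them with a finite Gaussian mixture. Applying FactorAnalysis directly to 0/1 data is itself only an approximation to a proper binary latent variable model.

```python
# Rough two-stage stand-in for the paper's jointly estimated model:
# latent scores from binary items, then a Gaussian mixture in the latent space.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(-2, 1, (150, 2)),
               rng.normal(2, 1, (150, 2))])    # two latent groups
W = rng.normal(size=(2, 12))
X = (1 / (1 + np.exp(-(Z @ W))) > rng.random((300, 12))).astype(float)  # binary items

scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)
clusters = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)
print(np.bincount(clusters))                   # recovered group sizes
```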