Yutaka Kano | Osaka University
Papers by Yutaka Kano
arXiv (Cornell University), Dec 19, 2013
We consider the problem of full information maximum likelihood (FIML) estimation in a factor analysis model when a majority of the data values are missing. The expectation-maximization (EM) algorithm is often used to find the FIML estimates, with the missing values on the observed variables included in the complete data. However, the EM algorithm has an extremely high computational cost when the number of observations is large and/or many missing values are involved. In this paper, we propose a new algorithm, based on the EM algorithm, that computes the FIML estimates efficiently. A significant improvement in computational speed is realized by not treating the missing values on the observed variables as part of the complete data. Our algorithm is applied to a real data set collected from a Web questionnaire about first impressions of people; almost 90% of the data values are missing. When there are many missing values, it is not clear whether the FIML procedure can achieve good estimation accuracy even when the number of observations is large. To investigate this, we conduct Monte Carlo simulations under a wide variety of sample sizes.
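For context, the sketch below evaluates the observed-data log-likelihood that FIML maximizes under multivariate normality, where each row contributes only the density of its observed coordinates; in factor analysis, Sigma would be structured as LL' + diag(psi). This is a plain restatement of the standard objective, not the paper's accelerated algorithm; the function name and interface are hypothetical.

```python
import numpy as np

def fiml_loglik(X, mu, Sigma):
    """Observed-data (FIML) log-likelihood under multivariate normality.

    X     : (n, p) array with np.nan marking missing entries.
    mu    : (p,) mean vector; in factor analysis Sigma = L @ L.T + np.diag(psi).
    Each row contributes the density of its observed sub-vector only.
    """
    ll = 0.0
    for x in X:
        o = ~np.isnan(x)                 # observed coordinates of this row
        d = x[o] - mu[o]
        S = Sigma[np.ix_(o, o)]          # sub-matrix for the observed pattern
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (o.sum() * np.log(2 * np.pi) + logdet
                      + d @ np.linalg.solve(S, d))
    return ll
```

In practice one would group rows by missingness pattern so the per-pattern factorizations are reused; with nearly 90% missing values and large n, that bookkeeping dominates the cost.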
IEICE technical report. Speech, 2012
Journal of Statistical Computation and Simulation, Jan 6, 2015
The authors build on the idea put forward by Shugan to infer product maps from scanning data. They demonstrate that the actual estimation procedure used by Shugan has several methodological problems and may yield unstable estimates. They propose an alternative estimation procedure, full-information maximum likelihood (FIML), which addresses the problems and yields significantly improved results. An important additional advantage of the procedure is that the parameters of the preference distribution can be estimated simultaneously with the brand coordinates. Hence, it is not necessary to assume a fixed (uniform) distribution of preferences. An empirical application is presented in which the outcomes obtained from Shugan's procedure are compared with those from the proposed procedure.
Recently, robust extensions of normal theory statistics have been proposed to permit modeling under a wider class of distributions (e.g., Taylor, 1992). Let X be a p × 1 random vector, μ a p × 1 location parameter, and V a p × p scatter matrix. Kano et al. (1993) studied inference in the elliptical class of distributions and gave a criterion for choosing a particular family within this class to best describe the data at hand when the latter exhibit serious departure from normality. In this paper, we investigate the criterion for a simple but general setup, namely, when the operating distribution is multivariate t with ν degrees of freedom and the model is also a multivariate t-distribution with α degrees of freedom. We compute the exact inefficiency of the estimators of μ and V based on that model and compare it to the one based on the multivariate normal model. Our results provide evidence for the choice of ν = 4 proposed by Lange et al. (1989). In addition, we give numerical results showing that for fixed ν, the inflation of the variance of the pseudo maximum likelihood estimator of the scatter matrix, as a function of the hypothesized degrees of freedom α, is increasing in its domain.
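For reference, a standard parameterization of this setup (notation follows common usage rather than the paper itself): the multivariate t density with location μ, scatter V, and ν degrees of freedom is

```latex
f_\nu(x) = \frac{\Gamma\!\left(\tfrac{\nu+p}{2}\right)}
                {\Gamma\!\left(\tfrac{\nu}{2}\right)\,(\nu\pi)^{p/2}\,|V|^{1/2}}
           \left\{ 1 + \frac{(x-\mu)^\top V^{-1}(x-\mu)}{\nu} \right\}^{-(\nu+p)/2},
```

and the pseudo maximum likelihood estimator maximizes \(\sum_i \log f_\alpha(x_i)\) while the data are actually generated from \(f_\nu\), which is the mismatch whose cost the paper quantifies.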
Independent component analysis (ICA; see, e.g., Hyvarinen et al., 2001) is a technique of multivariate analysis developed to separate a multivariate observed sensor vector x into unobserved source signals s mixed linearly by an unknown mixing matrix A. The noisy ICA model is a variant of the ICA model created by adding an error term n, represented as x = As + n, where x, s and n are, respectively, p-, m- and p-dimensional random vectors with zero mean vectors, and s and n are independent. In addition, the components of s are independently and nonnormally distributed while n is assumed to be normally distributed. The (noisy) ICA model is said to be separable if the mixing matrix is identifiable and the original signals can be recovered from the observed sensors. In the noise-free ICA model (i.e., n = 0), Comon (1994) and Eriksson and Koivunen (2003) proved that the model is separable if A is of full column rank and at most one component of s is normally distributed.
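A minimal simulation of the noisy model is easy to write down; the sketch below generates x = As + n with Laplace (nonnormal) sources and applies scikit-learn's FastICA as a generic separator. The dimensions, noise scale, and use of a noise-free estimator on noisy data are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_obs, p, m = 5000, 4, 3

S = rng.laplace(size=(n_obs, m))        # nonnormal independent sources, zero mean
A = rng.normal(size=(p, m))             # unknown full-column-rank mixing matrix
N = 0.1 * rng.normal(size=(n_obs, p))   # normally distributed noise term
X = S @ A.T + N                         # x = As + n, one row per observation

ica = FastICA(n_components=m, random_state=0)
S_hat = ica.fit_transform(X)            # recovered sources, up to permutation/scale
```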
Kodo Keiryogaku (The Japanese Journal of Behaviormetrics), 2002
In this rejoinder, special attention is paid to error covariances and specific factors in the comparison between SEM and traditional methods. When a factor analysis model fits poorly, it does not make sense to simply remove important variables that are inconsistent with the factor analysis model, as the discussants point out. We emphasize that a better way than removing the variables is to allow for error covariances, in order to overcome the inconsistency problem. The model with error covariances guarantees the invariance of estimation results over item selection.
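In covariance-structure terms, the suggestion amounts to relaxing the diagonality of the error covariance matrix; a generic statement (standard notation, not the rejoinder's own) is

```latex
x = \Lambda f + e, \qquad \operatorname{Cov}(x) = \Lambda \Phi \Lambda^\top + \Psi,
```

where Ψ is permitted nonzero off-diagonal entries (the error covariances) instead of being constrained to be diagonal as in the standard factor analysis model.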
Studies in Classification, Data Analysis, and Knowledge Organization, 1998
There are many causes of improper solutions in factor analysis. Identifying the potential causes of an improper solution gives very useful information on the suitability of the model for a data set. This paper studies possible causes of improper solutions in exploratory factor analysis, focusing upon (A) sampling fluctuations, (B) model underidentification, and (C) model misfit, each having several more detailed items. We then give a checklist for identifying the cause of an improper solution and suggest a method for reanalyzing the data set for each cause.
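The most common symptom on such a checklist is the Heywood case: a unique variance estimated at or below zero. A minimal, hypothetical check over fitted uniquenesses might look like this (the function name and tolerance are illustrative, not from the paper):

```python
import numpy as np

def heywood_cases(unique_variances, tol=1e-6):
    """Flag an improper (Heywood) solution: unique variances at or below zero.

    unique_variances : estimated uniquenesses from a factor analysis fit.
    Returns indices of offending variables; an empty array means the
    solution is proper in this respect.
    """
    psi = np.asarray(unique_variances, dtype=float)
    return np.flatnonzero(psi <= tol)

# Example: the third variable signals a Heywood case
print(heywood_cases([0.42, 0.31, -0.05, 0.58]))   # -> [2]
```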
Behaviormetrika, 1997
Any exploratory factor analysis model requires at least three indicators (observed variables) for each common factor to ensure model identifiability. If one performs exploratory factor analysis on a data set in which one of the common factors has only two indicators in the population, one encounters difficulties such as improper solutions and nonconvergence of the iterative process used to calculate estimates. In this paper, we first develop conditions for identifiability of the factor loadings other than the loading vector of the common factor with only two indicators. Two models for analyzing such data sets are then proposed with the help of confirmatory factor analysis and covariance structure analysis. The first model is an exploratory factor analysis model that permits correlation between unique factors; the second is a kind of confirmatory factor model with equal factor loadings. Two real data sets are analyzed to illustrate the usefulness of these models.
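To see why the equal-loadings constraint helps, consider the two-indicator sub-model in isolation (a standard identification argument, stated here under the assumption Var(f) = 1):

```latex
x_1 = \lambda f + e_1, \quad x_2 = \lambda f + e_2, \quad \operatorname{Var}(f) = 1
\;\Rightarrow\; \operatorname{Cov}(x_1, x_2) = \lambda^2 .
```

With unconstrained loadings λ₁, λ₂, only the product λ₁λ₂ is identified from this pair alone, so the equality constraint (or information borrowed from the other indicators) is what pins the loadings down.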
Journal of Educational and Behavioral Statistics, 2018
Meta-analysis plays a key role in combining studies to obtain more reliable results. In social, behavioral, and health sciences, measurement units are typically not well defined. More meaningful results can be obtained by standardizing the variables and analyzing the correlation matrix. Structural equation modeling (SEM) with the combined correlations, called meta-analytical SEM (MASEM), is a powerful tool for examining the relationships among latent constructs as well as those between the latent constructs and the manifest variables. Three classes of methods have been proposed for MASEM: (1) generalized least squares (GLS) both in combining correlations and in estimating the structural model, (2) normal-distribution-based maximum likelihood (ML) in combining the correlations and then GLS in estimating the structural model (ML-GLS), and (3) ML both in combining correlations and in estimating the structural model (ML). The current article shows that these three methods are equivalent.
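The first-stage GLS pooling step admits a compact closed form; under the usual assumption that rᵢ is the vectorized correlation matrix of study i with (estimated) covariance matrix Vᵢ, the pooled correlation vector is

```latex
\hat\rho_{\mathrm{GLS}}
  = \arg\min_{\rho}\; \sum_{i=1}^{k} (r_i - \rho)^\top V_i^{-1} (r_i - \rho)
  = \Bigl( \sum_{i=1}^{k} V_i^{-1} \Bigr)^{-1} \sum_{i=1}^{k} V_i^{-1} r_i .
```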
Journal of the Japanese Society of Computational Statistics, 2013
Since a latent trait θ cannot be directly observed in item response theory models, it is difficult to specify an item response function (IRF). Many mathematical models have been proposed, the two-parameter logistic model (2PLM) among the most common. In this article, we propose a new parametric model, namely, a finite mixture of logistic models (MLM). The MLM has different mixing weights per item and can model a plateau in the learning curve, a well-known phenomenon in education and psychology. It is also known that finite mixtures pose some problems for estimating item parameters. We therefore develop a new, useful estimation algorithm for item parameters and present simulation studies showing that this algorithm works well. When the MLM was applied to real data, we also found that it makes it possible to distinguish whether or not a plateau appears in an IRF, a capability the 2PLM lacks.
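The plateau behavior is easy to reproduce with a generic finite mixture of 2PL curves; the parameterization below is illustrative (the weights w, slopes a, and difficulties b are hypothetical names, and the article's exact form may differ).

```python
import numpy as np

def mixture_irf(theta, a, b, w):
    """Finite mixture of logistic item response functions (illustrative form).

    theta : ability value(s)
    a, b  : arrays of component slopes and difficulties
    w     : mixing weights for this item (nonnegative, summing to 1)
    A component with high difficulty and moderate weight produces the
    plateau the MLM is designed to capture.
    """
    theta = np.atleast_1d(theta)[:, None]
    comps = 1.0 / (1.0 + np.exp(-np.asarray(a) * (theta - np.asarray(b))))
    return comps @ np.asarray(w)

# Two components: the curve levels off near the first weight before rising again
print(mixture_irf([-2.0, 0.0, 2.0], a=[1.5, 1.5], b=[-1.0, 2.5], w=[0.6, 0.4]))
```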
Entropy, 2015
A path analysis method for causal systems based on generalized linear models is proposed using entropy. A practical example is introduced, and a brief explanation of the entropy coefficient of determination is given. Direct and indirect effects of explanatory variables are discussed as log odds ratios, i.e., relative information, and a method for summarizing the effects is proposed. The example dataset is re-analyzed using the method.
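As a reminder of the effect scale involved, the effect of moving an explanatory variable from x₀ to x₁ on a binary response can be expressed as a log odds ratio (a generic definition, not the paper's full decomposition):

```latex
\log \mathrm{OR}(x_1, x_0)
  = \log \frac{P(y=1 \mid x_1)\,/\,P(y=0 \mid x_1)}
              {P(y=1 \mid x_0)\,/\,P(y=0 \mid x_0)} ,
```

with total effects splitting into direct terms (intermediate variables held fixed) and indirect terms (transmitted through them).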
Kodo Keiryogaku (The Japanese Journal of Behaviormetrics), 2002
Lecture Notes in Computer Science, 2006
Causal discovery is the task of finding plausible causal relationships from statistical data [1, 2]. Such methods rely on various assumptions about the data generating process to identify it from uncontrolled observations. We have recently proposed a causal discovery method based on independent component analysis (ICA) called LiNGAM [3], showing how to completely identify the data generating process under the assumptions of linearity, non-gaussianity, and no hidden variables. In this paper, after briefly recapitulating this approach, we focus on the algorithmic problems encountered when the number of variables considered is large. Thus we extend the applicability of the method to data sets with tens of variables or more. Experiments confirm the performance of the proposed algorithms, implemented as part of the latest version of our freely available Matlab/Octave LiNGAM package.
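For orientation, the core identification step of the recapitulated ICA-LiNGAM approach can be sketched in a few lines using scikit-learn's FastICA and SciPy's Hungarian solver; this is the basic step, not the paper's large-scale algorithmic extensions, and details such as the epsilon guard are my own.

```python
import numpy as np
from sklearn.decomposition import FastICA
from scipy.optimize import linear_sum_assignment

def lingam_b_matrix(X):
    """Core ICA-LiNGAM step: estimate the connection matrix B in x = Bx + e.

    Assumes linearity, non-Gaussian errors, and no hidden confounders.
    """
    ica = FastICA(random_state=0)
    ica.fit(X)
    W = ica.components_                        # unmixing matrix, rows permuted/scaled

    # Permute rows so the diagonal of W has no (near-)zeros: minimize the
    # sum of 1/|W_ij| over row permutations with the Hungarian algorithm.
    _, perm = linear_sum_assignment(1.0 / (np.abs(W) + 1e-12))
    W = W[np.argsort(perm)]                    # reorder rows accordingly

    W = W / np.diag(W)[:, None]                # rescale rows to a unit diagonal
    return np.eye(W.shape[0]) - W              # B = I - W'
```

The returned B can then be permuted toward lower-triangularity to read off a causal ordering, which is where the combinatorial cost the paper addresses comes in.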
Communications in Statistics - Theory and Methods, 2014
We propose a new test for the equality of the mean vectors between two groups with the same number of observations in high-dimensional data. Existing tests for this problem require strong conditions on the population covariance matrix. The test proposed in this paper does not require such conditions. The test is derived in a general model; that is, the data need not be normally distributed.
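As a flavor of how such tests sidestep covariance conditions, here is a sketch of a Chen–Qin-style unbiased estimate of the squared mean difference, which drops the i = j terms instead of inverting an estimated covariance matrix; this illustrates the general idea and is not the statistic proposed in the article.

```python
import numpy as np

def cq_numerator(X, Y):
    """Chen-Qin-style unbiased estimate of ||mu_X - mu_Y||^2 (illustration only).

    X : (n1, p) sample from group 1;  Y : (n2, p) sample from group 2.
    Cross-product sums exclude the i = j terms, so each piece is unbiased
    without estimating the covariance matrix -- the point of avoiding
    strong covariance conditions in high dimensions.
    """
    n1, n2 = len(X), len(Y)
    Gx, Gy = X @ X.T, Y @ Y.T
    term_x = (Gx.sum() - np.trace(Gx)) / (n1 * (n1 - 1))
    term_y = (Gy.sum() - np.trace(Gy)) / (n2 * (n2 - 1))
    cross = 2.0 * (X @ Y.T).sum() / (n1 * n2)
    return term_x + term_y - cross
```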
In this paper, we consider the boosting method with a penalized risk functional as a regularization method. Risk consistency is established under some conditions on the penalizing parameter that controls the degree of the penalty. The condition needed to prevent overfitting is simple: the parameter converges to zero as the sample size goes to infinity. The penalizing parameter can also be changed adaptively at each boosting step.
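Schematically, the fitted function minimizes a penalized empirical risk of the generic form below (ℓ a loss, J a complexity penalty; the notation is generic rather than the paper's), with the stated condition on the penalizing parameter:

```latex
\hat F_n = \arg\min_{F} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, F(x_i)\bigr)
           + \lambda_n\, J(F),
\qquad \lambda_n \to 0 \ \text{as}\ n \to \infty .
```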
Journal of Multivariate Analysis, 1998
Based on the concentration probability of estimators about a true parameter, third-order asymptotic efficiency of the first-order bias-adjusted MLE within the class of first-order bias-adjusted estimators has been well established in a variety of probability models. In this paper we consider the class of second-order bias-adjusted Fisher consistent estimators of a structural parameter vector on the basis of an i.i.d. sample drawn from a curved exponential-type distribution, and study the asymptotic concentration probability, about a true parameter vector, of these estimators up to the fifth order. In particular, (i) we show that third-order efficient estimators are always fourth-order efficient; (ii) a necessary and sufficient condition for fifth-order efficiency is provided; and finally (iii) the MLE is shown to be fifth-order efficient.
Journal of Marketing Research, 1991
The authors build on the idea put forward by Shugan to infer product maps from scanning data. They demonstrate that the actual estimation procedure used by Shugan has several methodological problems and may yield unstable estimates. They propose an alternative estimation procedure, full-information maximum likelihood (FIML), which addresses the problems and yields significantly improved results. An important additional advantage of the procedure is that the parameters of the preference distribution can be estimated simultaneously with the brand coordinates. Hence, it is not necessary to assume a fixed (uniform) distribution of preferences. An empirical application is presented in which the outcomes obtained from Shugan's procedure are compared with those from the proposed procedure.
Journal of Multivariate Analysis, 2012
Heterogeneous data are common in the social, educational, medical and behavioral sciences. Recently, finite mixture structural equation models (SEMs) and two-level SEMs have been proposed to analyze different kinds of heterogeneous data. Due to the complexity of these two kinds of SEMs, model comparison is difficult. For instance, the computational burden in evaluating the Bayes factor is heavy, and the Deviance Information Criterion may not be appropriate for mixture SEMs. In this paper, a Bayesian criterion-based method called the L_ν measure, which involves a component related to the variability of the prediction and a component related to the discrepancy between the data and the prediction, is proposed. Moreover, a calibration distribution is introduced for formal comparison of competing models. Two simulation studies and two applications based on real data sets are presented to illustrate the satisfactory performance of the L_ν measure in model comparison.
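The two components described are visible in the standard form of such criterion-based measures; a common definition (stated as an assumption about the form, since the article's exact weighting may differ) is

```latex
L_\nu(\mathbf{y}) = \sum_{i=1}^{n} \operatorname{Var}\bigl(z_i \mid \mathbf{y}\bigr)
                  + \nu \sum_{i=1}^{n} \bigl\{ \operatorname{E}(z_i \mid \mathbf{y}) - y_i \bigr\}^2 ,
\qquad 0 < \nu < 1 ,
```

where the zᵢ are posterior predictive replicates: the first sum captures the variability of the prediction and the second the discrepancy between the data and the prediction.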
New Developments in Psychometrics, 2003