Mortaza Jamshidian - Academia.edu

Papers by Mortaza Jamshidian

Validation of the Hot and Cold Vehicle Mode Percentages for the State of Florida

In using MOBILE5a, the default values for transient (hot and cold) and stabilized mode percentages are normally used. However, they may not be applicable to regions and time periods outside of the scenarios under which they were derived. A research effort was undertaken by the Florida Department of Transportation (FDOT) and the University of Central Florida (UCF) to obtain vehicle mode percentages that are more representative of conditions in Florida. The first phase of this study was conducted using mail-back survey sheets, and the data were used to calculate mode percentages as percent vehicle miles traveled (VMT) in terms of categorizing variables (i.e., area type, facility type, time of day, etc.). This report presents the methodology and results from the second phase of the study, which were used to supplement and validate the data from the first phase. Different approaches were taken in the second phase in that the surveys involved verbally asking questions at intersections rather than using mail-back survey sheets, and the mode percentages were based on travel times to an intersection rather than entire trips. The data from the first phase were also re-evaluated. Statistical testing indicated that the results from the two phases are similar, which made it possible to aggregate the data into just peak and off-peak categories. The data from the two phases were combined to produce the final percentages that may be used for the state of Florida and are reported here.

Advances in Analysis of Mean and Covariance Structure when Data are Incomplete

Handbook of computing and statistics with applications, 2007

Application of the conjugate gradient methods in statistical computing

Postmodeling Sensitivity Analysis to Detect the Effect of Missing Data Mechanisms

Multivariate Behavioral Research, Sep 10, 2008

Incomplete or missing data is a common problem in almost all areas of empirical research. It is well known that simple and ad hoc methods such as complete case analysis or mean imputation can lead to biased and/or inefficient estimates. The method of maximum likelihood works well; however, when the missing data mechanism is not one of missing completely at random (MCAR) or missing at random (MAR), it too can result in incorrect inference. Statistical tests for MCAR have been proposed, but these are restricted to a certain class of problems. The idea of sensitivity analysis as a means to detect the missing data mechanism has been proposed in the statistics literature in conjunction with selection models where conjointly the data and missing data mechanism are modeled. Our approach is different here in that we do not model the missing data mechanism but use the data at hand to examine the sensitivity of a given model to the missing data mechanism. Our methodology is meant to raise a flag for researchers when the assumptions of MCAR (or MAR) do not hold. To our knowledge, no specific proposal for sensitivity analysis has been set forth in the area of structural equation models (SEM). This article gives a specific method for performing postmodeling sensitivity analysis using a statistical test and graphs. A simulation study is performed to assess the methodology in the context of structural equation models. This study shows success of the method, especially when the sample size is 300 or more and the percentage of missing data is 20% or more. The method is also used to study a set of real data measuring physical and social self-concepts in 463 Nigerian adolescents using a factor analysis model.
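
The core of this sensitivity idea can be conveyed with a toy computation. The sketch below is not the paper's procedure (which fits a structural equation model and uses a formal test and graphs); it only illustrates, under simulated MCAR and MAR mechanisms, how an estimate computed from completely observed cases can drift away from the full-sample estimate when missingness is not completely at random. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                   # always observed
y = 0.8 * x + 0.6 * rng.normal(size=n)   # subject to missingness

for label, p_miss in [("MCAR", np.full(n, 0.3)),
                      ("MAR ", 1.0 / (1.0 + np.exp(-2.0 * x)))]:
    miss = rng.random(n) < p_miss
    est_full = y.mean()        # full-sample estimate (known here only
                               # because this is a simulation)
    est_cc = y[~miss].mean()   # estimate from completely observed cases
    # Under MCAR the two estimates agree up to sampling error; under MAR
    # (missingness driven by x, which correlates with y) they diverge.
    print(label, round(est_full, 3), round(est_cc, 3))
```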

Using asymptotic results to obtain a confidence interval for the population median

International Journal of Mathematical Education in Science and Technology, Sep 15, 2007

Almost all introductory and intermediate level statistics textbooks include the topic of confidence interval for the population mean. Almost all these texts introduce the median as a robust measure of central tendency. Only a few of these books, however, cover inference on the population median and in particular confidence interval for the median. This may be due to the somewhat complex nature of the problem. This paper attempts to popularize a method that is conceptually and computationally simpler than the currently used methods in textbooks and has the promise of being more accessible to elementary/intermediate level statistics students. The method is conceptually simpler, because its development parallels that of obtaining a confidence interval for the mean and it involves concepts that are well-covered in elementary courses. It is computationally simple, because its major computational component is a smoothing method that is widely available in statistical software. For the latter reason, the proposed method is referred to as the Smoothing method. A simple R program is given that produces confidence intervals using the Smoothing method. Utilization of Minitab, SAS, and SPSS for this purpose is also discussed. A simulation study is performed to compare statistical properties of the proposed method to those of the two currently popular methods of Bootstrap and Binomial. Based on this limited simulation study, it is observed that the Smoothing method is at least as good as, and in some respects is superior to, the Binomial and Bootstrap methods in samples of size 30 or larger.
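
A minimal sketch of the asymptotic idea behind the Smoothing method follows, using the standard result that the sample median is approximately normal with standard error 1/(2 f(m) sqrt(n)), where f(m) is the density at the median; the density is estimated by a kernel smoother. The paper's actual R program may differ in bandwidth choice and implementation details.

```python
import numpy as np
from scipy import stats

def median_ci_smoothing(x, conf=0.95):
    """Asymptotic CI for the population median: the sample median is
    approximately normal with standard error 1 / (2 f(m) sqrt(n)),
    where f(m), the density at the median, is estimated here by a
    Gaussian kernel density estimate (the "smoothing" step)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = np.median(x)
    f_hat = stats.gaussian_kde(x)(m)[0]   # smoothed density at the median
    se = 1.0 / (2.0 * f_hat * np.sqrt(n))
    z = stats.norm.ppf(0.5 + conf / 2.0)
    return m - z * se, m + z * se

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=100)  # true median = 2 ln 2 ~ 1.386
print(median_ci_smoothing(sample))
```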

Conjugate gradient methods in confirmatory factor analysis

Computational Statistics & Data Analysis, Mar 1, 1994

SimReg: A Software Including Some New Developments in Multiple Comparison and Simultaneous Confidence Bands for Linear Regression Models

Journal of Statistical Software, 2005

The problem of simultaneous inference and multiple comparison for comparing means of k (≥ 3) populations has long been studied in the statistics literature, and methods for it are widely implemented in statistical software. To date, however, the problem of multiple comparison of regression models has not found its way into the software. It is only recently that the computational aspects of this problem have been resolved in a general setting. SimReg employs this new methodology and provides users with software for multiple comparison of several regression models. The comparisons can be among any set of pairs, and any number of predictors can be included in the model. More importantly, predictors can be constrained to their natural boundaries, if known. Computational methods for the problem of simultaneous confidence bands when predictors are constrained to intervals have also recently been addressed. SimReg utilizes this recent development to offer simultaneous confidence bands for regression models with any number of predictor variables. Again, the predictors can be constrained to their natural boundaries, which results in narrower bands compared to the case where no restriction is imposed. A by-product of these confidence bands is a new method for comparing two regression surfaces that is more informative than the usual partial F test.

An EM Algorithm for ML Factor Analysis with Missing Data

Springer eBooks, 1997

The EM algorithm is a popular method for obtaining maximum likelihood estimates. Here we propose an EM algorithm for the factor analysis model. This algorithm extends a previously proposed EM algorithm to handle problems with missing data. It is simple to implement and is the most storage efficient among its competitors. We apply our algorithm to three examples and discuss the results. For problems with a reasonable amount of missing data, it converges in reasonable time. For problems with a large amount of missing data, the EM algorithm is usually slow. For such cases we successfully apply two EM acceleration methods to our examples. Finally, we discuss different methods of obtaining standard errors and in particular we recommend a method based on central difference approximation to the derivative.
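
For readers unfamiliar with EM in this setting, below is a minimal sketch of the complete-data EM iteration for the factor analysis model Sigma = L L' + Psi (the scheme the paper extends to missing data). It is an illustrative implementation under that assumption, not the paper's missing-data algorithm.

```python
import numpy as np

def fa_em(S, k, n_iter=500):
    """Complete-data EM for the factor analysis model Sigma = L L' + Psi.
    S: p x p sample covariance, k: number of factors."""
    p = S.shape[0]
    rng = np.random.default_rng(0)
    L = rng.normal(scale=0.1, size=(p, k))   # loadings
    Psi = np.diag(S).copy()                  # unique variances (vector)
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(Psi)
        B = np.linalg.solve(Sigma, L).T       # k x p: B = L' Sigma^{-1}
        # E-step: E[z|x] = B x, Cov(z|x) = I - B L; averaged over the data:
        Ezz = np.eye(k) - B @ L + B @ S @ B.T  # average E[zz']
        SxB = S @ B.T                          # average E[xz']
        # M-step
        L = SxB @ np.linalg.inv(Ezz)
        Psi = np.diag(S - L @ SxB.T)
    return L, Psi

# Quick check on simulated data
rng = np.random.default_rng(2)
L_true = rng.normal(size=(6, 2)); Psi_true = np.full(6, 0.5)
X = rng.multivariate_normal(np.zeros(6),
                            L_true @ L_true.T + np.diag(Psi_true), 1000)
L_hat, Psi_hat = fa_em(np.cov(X, rowvar=False), k=2)
print(np.round(Psi_hat, 2))   # roughly recovers the true value 0.5
```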

A quasi-Newton method for minimum trace factor analysis

Journal of Statistical Computation and Simulation, Dec 1, 1998

In the past, several algorithms have been given to solve the minimum trace factor analysis (MTFA) and the constrained minimum trace factor analysis (CMTFA) problems. Some of these algorithms, depending on the initial value, may converge to points that are not solutions of the above problems; some converge linearly; and some are quadratically convergent but somewhat difficult to implement. In this paper we propose modified Han–Powell algorithms to solve the MTFA and CMTFA problems. The modifications deal with the problem of multiple eigenvalues. The proposed algorithms are globally convergent and their speed is locally superlinear. We also give a modified Han–Powell algorithm to solve the weighted minimum trace factor analysis (WMTFA) problem. This method is also locally superlinear and is simpler to implement as compared to methods proposed earlier. Four examples are given to show the performance of the proposed algorithms. More generally, our experience with these algorithms shows that, starting at...
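
Although the paper develops a modified Han–Powell (sequential quadratic programming) method, the MTFA problem itself can be stated compactly: choose nonnegative unique variances d with S − diag(d) positive semidefinite so that the trace of the reduced matrix is minimized. Below is a sketch of that convex formulation using the cvxpy modeling library; the library choice and default solver are assumptions of this illustration, not tools used in the paper.

```python
import numpy as np
import cvxpy as cp

def mtfa(S):
    """Minimum trace factor analysis as a semidefinite program:
    maximize the unique variances d >= 0 subject to S - diag(d)
    remaining positive semidefinite (equivalently, minimize the
    trace of the reduced covariance matrix S - diag(d))."""
    p = S.shape[0]
    d = cp.Variable(p, nonneg=True)
    objective = cp.Minimize(np.trace(S) - cp.sum(d))
    constraints = [S - cp.diag(d) >> 0]   # PSD constraint
    cp.Problem(objective, constraints).solve()
    return d.value

S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(mtfa(S))   # estimated unique variances
```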

Multiple Comparison of Several Linear Regression Models

Journal of the American Statistical Association, Jun 1, 2004

Research on multiple comparison during the past 50 years or so has focused mainly on the comparison of several population means. Several years ago, Spurrier considered the multiple comparison of several simple linear regression lines. He constructed simultaneous confidence bands for all of the contrasts of the simple linear regression lines over the entire range (−∞, ∞) when the models have the same design matrices. This article extends Spurrier's work in several directions. First, multiple linear regression models are considered and the design matrices are allowed to be different. Second, the predictor variables are either unconstrained or constrained to finite intervals. Third, the types of comparison allowed can be very flexible, including pairwise, many-one, and successive. Two simulation methods are proposed for the calculation of critical constants. The methodologies are illustrated with examples.

Strategies for Analysis of Incomplete Data

SAGE Publications Ltd eBooks, 2004

... The example above, however, does not have this property; namely, maximization of the Q-function is as difficult as that of the log-likelihood ℓ(θ, y_obs). As we said, this example was not typical and we have only used it to demonstrate ideas. ...

On algorithms for restricted maximum likelihood estimation

Computational Statistics & Data Analysis, Mar 1, 2004

This work proposes a globally convergent algorithm, based on gradient projections, for maximum likelihood estimation under linear equality and inequality restrictions (constraints) on parameters. The proposed algorithm has wide applicability, and as an important special case its application to restricted expectation-maximization (EM) problems is described. Often, a class of algorithms that we call expectation-restricted-maximization (ERM) is used to deal with constraints in the EM setting. We describe two such ERM algorithms that handle linear equality constraints, and discuss their convergence. As we explain, the assumptions for global convergence of one of the algorithms may be practically too restrictive, and as such we suggest a modification. We provide an example where the second algorithm fails. In general we argue that the gradient projection (GP) algorithm is superior to ERM algorithms in terms of simplicity of implementation and time to converge. We give an example of application of GP to parameter estimation of mixtures of normal densities where linear inequality constraints are imposed, and compare CPU times required for the algorithms discussed.
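
A minimal sketch of the gradient projection idea for linear equality constraints A theta = b: the gradient is projected onto the null space of A, so every step preserves feasibility. The paper's algorithm additionally handles inequality constraints and uses a line search to guarantee global convergence; the quadratic objective and fixed step size below are illustrative assumptions only.

```python
import numpy as np

def gp_step(theta, grad, A, step):
    """One gradient projection step for maximizing an objective subject
    to A @ theta = b: project the gradient onto the null space of A
    (projector P = I - A'(AA')^{-1}A), then move uphill."""
    P = np.eye(A.shape[1]) - A.T @ np.linalg.solve(A @ A.T, A)
    return theta + step * (P @ grad)

# Toy use: maximize the "log-likelihood" -0.5 * ||theta - c||^2
# subject to theta summing to 1 (a mixture-proportion-type constraint).
c = np.array([0.5, 0.3, 0.9])
A = np.ones((1, 3))
theta = np.array([1 / 3, 1 / 3, 1 / 3])      # feasible starting point
for _ in range(200):
    theta = gp_step(theta, c - theta, A, step=0.1)
print(theta, A @ theta)                      # stays on A theta = 1
```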

Tests of Homoscedasticity, Normality, and Missing Completely at Random for Incomplete Multivariate Data

Psychometrika, Aug 3, 2010

Test of homogeneity of covariances (or homoscedasticity) among several groups has many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). These tests of MCAR require large sample sizes n and/or large group sample sizes n_i, and they usually fail when applied to nonnormal data. Hawkins (1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when n_i are small. This paper proposes a modification of this test for complete data to improve its performance, and extends its application to tests of homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, it is shown that the statistic used in the Hawkins test, in conjunction with a nonparametric k-sample test, can be used to obtain a nonparametric test of homoscedasticity that works well for both normal and non-normal data. It is explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. Simulation studies show that the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The proposed methods use appropriate imputation to handle the missing data. Methods of multiple imputation are described, and one of the methods is employed to confirm the result of our single imputation methods. Examples are provided where multiple imputation enables one to identify a group or groups whose covariance matrices differ from the majority of other groups.

Adaptive Robust Regression by Using a Nonlinear Regression Program

Journal of Statistical Software, 1999

Robust regression procedures have received considerable attention in the mathematical statistics literature. They, however, have not received nearly as much attention from practitioners performing data analysis. A contributing factor to this may be the lack of availability of these procedures in commonly used statistical software. In this paper we propose algorithms for obtaining parameter estimates and their asymptotic standard errors when fitting regression models to data assuming normal/independent errors. The algorithms proposed can be implemented in the commonly available nonlinear regression programs. We review a number of previously proposed algorithms. As we discuss, these require special code and are difficult to implement in a nonlinear regression program. Methods of implementing the proposed algorithms in SAS-NLIN are discussed. Specifically, the two applications of regression with the t and the slash family errors are discussed in detail. SAS NLIN and S-plus instructions are given for these two examples. Minor modification of these instructions can solve other problems at hand.
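
As an illustration of the kind of iteratively reweighted scheme that can be run inside standard nonlinear regression programs, here is a minimal IRLS sketch for linear regression with t-distributed errors (a member of the normal/independent family); the weights downweight observations with large residuals. The degrees of freedom and the scale update below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def t_irls(X, y, nu=4.0, n_iter=50):
    """Iteratively reweighted least squares for linear regression with
    t-distributed errors. Each iteration forms the weights
    w_i = (nu + 1) / (nu + r_i^2 / s2), which shrink the influence of
    large residuals, then solves a weighted least-squares problem."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    s2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r**2 / s2)         # robustness weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        s2 = np.sum(w * r**2) / len(y)            # scale update
    return beta, s2

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=4, size=200)
print(t_irls(X, y)[0])   # close to (1, 2) despite heavy-tailed errors
```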

Testing equality of covariance matrices when data are incomplete

Computational Statistics & Data Analysis, May 1, 2007

In the statistics literature, a number of procedures have been proposed for testing equality of several groups' covariance matrices when data are complete, but this problem has not been considered for incomplete data in a general setting. This paper proposes statistical tests for equality of covariance matrices when data are missing. A Wald test (denoted by T1) and a likelihood ratio test (LRT, denoted by R) are developed, based on the assumption of normal populations. It is well known that for the complete data case the classic LRT and the Wald test constructed under the normality assumption perform poorly when data are not from multivariate normal distributions. As expected, this is also the case for incomplete data, and this has led us to construct a robust Wald test (denoted by T2) that performs well for both normal and non-normal data. A re-scaled LRT (denoted by R*) is also proposed. A simulation study is carried out to assess the performance of T1, T2, R, and R* in terms of closeness of their observed significance level to the nominal significance level, as well as the power of these tests. It is found that T2 performs very well for both normal and non-normal data in both small and large samples. In addition to its usual applications, we discuss the application of the proposed tests to testing whether a set of data are missing completely at random (MCAR).

T-distribution modeling using the available statistical software

Computational Statistics & Data Analysis, Jul 1, 1997

Statistical inference based on the t-distribution is less vulnerable to outliers when compared to the normal distribution. A number of authors have discussed and proposed algorithms for maximum likelihood (ML) estimation of the t-distribution. These algorithms generally require special code that is, to date, not available in commonly used statistical software. In this paper we discuss the use of the available statistical software for ML estimation of the t-distribution. More specifically, we discuss utilization of the BMDP-LE and SAS-NLIN programs for linear and nonlinear regression with t errors. BMDP-LE program instructions require specification of the t density. The problem is that the t density involves the gamma function, which is not available in the BMDP function library. We make use of the available functions in BMDP-LE to specify the t density. We show how SAS-NLIN can be used to implement a previously proposed iteratively reweighted least-squares algorithm. We also propose a direct method of using SAS-NLIN for regression estimation with t errors. The SAS-NLIN methods discussed may be implemented in any nonlinear regression program which allows iterative reweighting. The advantages and disadvantages of each method are discussed. Finally, we give a linear and a nonlinear regression example. With minor modifications, the BMDP and SAS input files given for our examples can be used to fit any linear or nonlinear regression model, assuming t distributed errors, to data.

Advances in Analysis of Mean and Covariance Structure when Data are Incomplete

Elsevier eBooks, 2007

Missing data arise in many areas of empirical research. One such area is in the context of structural equation models (SEM). A review is presented of the methodological advances in fitting data to SEM and, more generally, to mean and covariance structure models when there is missing data. This encompasses common missing data mechanisms and some widely used methods for handling missing data. The methods fall under the classifications of ad-hoc, likelihood-based, and simulation-based. Also included are the results of some of the published simulation studies. In order to encourage further research, a method is proposed for performing sensitivity analysis, which up to now has been seemingly lacking. A simulation study was done to demonstrate the method using a three-factor factor analysis model, focusing on MCAR and MNAR data. Parameter estimates from samples of all available data, in the form of box plots, are compared with parameter estimates from only the complete data. The results indicate a possible distinction for determining missing data mechanisms.

Acceleration of the EM Algorithm by using Quasi-Newton Methods

Journal of The Royal Statistical Society Series B-statistical Methodology, Sep 1, 1997

The EM algorithm is a popular method for maximum likelihood estimation. Its simplicity in many applications and desirable convergence properties make it very attractive. Its sometimes slow convergence, however, has prompted researchers to propose methods to accelerate it. We review these methods, classifying them into three groups: pure, hybrid and EM-type accelerators. We propose a new pure and a new hybrid accelerator both based on quasi-Newton methods and numerically compare these and two other quasi-Newton accelerators. For this we use examples in each of three areas: Poisson mixtures, the estimation of covariance from incomplete data and multivariate normal mixtures. In these comparisons, the new hybrid accelerator was fastest on most of the examples and often dramatically so. In some cases it accelerated the EM algorithm by factors of over 100. The new pure accelerator is very simple to implement and competed well with the other accelerators. It accelerated the EM algorithm in some cases by factors of over 50. To obtain standard errors, we propose to approximate the inverse of the observed information matrix by using auxiliary output from the new hybrid accelerator. A numerical evaluation of these approximations indicates that they may be useful at least for exploratory purposes.
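
The flavor of a "pure" accelerator can be shown in one dimension: the EM update is a fixed-point map g, and a quasi-Newton root-finder applied to g(theta) - theta = 0 typically reaches the fixed point in far fewer iterations than plain EM. The sketch below uses a two-component Poisson mixture with known rates and the secant method (the one-dimensional quasi-Newton scheme); the paper's accelerators are multiparameter analogues, so this is an illustration of the idea, not the paper's algorithm.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.concatenate([rng.poisson(1.0, 300), rng.poisson(6.0, 700)])

def em_map(pi, lam1=1.0, lam2=6.0):
    """One EM update for the mixing proportion of a two-component
    Poisson mixture with known rates: E-step responsibilities,
    then M-step average."""
    f1 = stats.poisson.pmf(x, lam1)
    f2 = stats.poisson.pmf(x, lam2)
    tau = pi * f1 / (pi * f1 + (1 - pi) * f2)
    return tau.mean()

# Plain EM iteration
pi = 0.5
for _ in range(100):
    pi = em_map(pi)

# "Pure" acceleration: solve g(pi) - pi = 0 with the secant method,
# a one-dimensional quasi-Newton scheme.
p0, p1 = 0.3, 0.5
for _ in range(10):
    r0, r1 = em_map(p0) - p0, em_map(p1) - p1
    if abs(r1 - r0) < 1e-12:
        break
    p0, p1 = p1, p1 - r1 * (p1 - p0) / (r1 - r0)

print(pi, p1)   # both approach the ML mixing proportion (about 0.3)
```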

Nonorthogonal Analysis of Variance Using Gradient Methods

Journal of the American Statistical Association, Jun 1, 1988

Because they require very little storage and can be computationally quite efficient, gradient algorithms are attractive methods for fitting large nonorthogonal analysis of variance (ANOVA) models. A coordinate-free approach is used to provide very simple definitions for a number of well-known gradient algorithms and insights into their similarities and differences. The key to finding a good algorithm is finding an algorithm metric that leads to easily computed gradients and that is as close as possible to the metric defined by the ANOVA problem. This leads to the proposal of a new class of algorithms based on a proportional subclass metric. Several new theoretical results on convergence are derived, and some empirical comparisons are made. A similar, but much briefer, treatment of analysis of covariance is given. On theoretical convergence of the methods it is shown, for example, that the Golub and Nash (1982) algorithm requires at most d + 1 iterations if all but d of the cells in the model have the same cell count, that the proportional subclass algorithm converges in one step for proportional subclass problems, and that it demands at most 2 min(a, b) − 1 iterations when fitting the two-way additive ANOVA model of size a by b. This can, for example, lead to large savings for models with many more rows than columns. For empirical comparisons a two-way ANOVA model is fitted to some artificial and nonartificial data. For the problems considered, the proportional subclass algorithm requires the fewest iterations followed by the Golub and Nash, optimized steepest descent, Hemmerle, and Yates algorithms, in that order. Some of the differences are quite substantial, involving factors of 10 or more.

Data-driven sensitivity analysis to detect missing data mechanism with applications to structural equation modelling

Journal of Statistical Computation and Simulation, Jul 1, 2013

Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because the MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method that relies on heuristic input from the researcher to test for the discrepancy of the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for parameter differences on two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids likely convergence problems with the bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right.
