Optimal sufficient dimension reduction in regressions with categorical predictors
Related papers
A selective review of sufficient dimension reduction for multivariate response regression
2022
In this paper we review sufficient dimension reduction (SDR) estimators for regressions with a multivariate response. A wide range of SDR methods are characterized as either inverse regression or forward regression SDR estimators. The inverse regression family includes pooled marginal estimators, projective resampling estimators, and distance-based estimators. Ordinary least squares, partial least squares, and semiparametric SDR estimators, on the other hand, are discussed as estimators from the forward regression family.
On consistency and sparsity for sliced inverse regression in high dimensions
The Annals of Statistics, 2018
We provide a framework to analyze the phase transition phenomenon of sliced inverse regression (SIR), a supervised dimension reduction technique introduced by Li [J. Amer. Statist. Assoc. 86 (1991) 316-342]. Under mild conditions, the asymptotic ratio ρ = lim p/n is the phase transition parameter and the SIR estimator is consistent if and only if ρ = 0. When the dimension p is greater than the sample size n, we propose a diagonal thresholding screening SIR (DT-SIR) algorithm. This method provides an estimate of the eigenspace of var(E[x|y]), the covariance matrix of the conditional expectation. The desired dimension reduction space is then obtained by multiplying the inverse of the covariance matrix on the eigenspace. Under certain sparsity assumptions on both the covariance matrix of the predictors and the loadings of the directions, we prove the consistency of DT-SIR in estimating the dimension reduction space in high-dimensional data analysis. Extensive numerical experiments demonstrate the superior performance of the proposed method compared with its competitors.
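For orientation, here is a minimal numpy sketch of the plain SIR step this abstract builds on: slice means estimate cov(E[x|y]), its leading eigenvectors estimate the eigenspace, and premultiplying by the inverse predictor covariance recovers the reduction space. The slicing rule, the toy data, and the omission of the diagonal-thresholding screening step are simplifications for illustration, not the DT-SIR procedure of the paper.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Basic sliced inverse regression: estimate the span of cov(E[X|y])
    from slice means, then map it back through the inverse covariance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Slice the response into roughly equal-sized bins.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Kernel matrix M ~ cov(E[X|y]) built from within-slice means.
    M = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Top eigenvectors of M estimate the eigenspace of var(E[x|y]);
    # premultiplying by cov(X)^{-1} gives the dimension reduction space.
    evals, evecs = np.linalg.eigh(M)
    top = evecs[:, np.argsort(evals)[::-1][:n_dirs]]
    Sigma = np.cov(Xc, rowvar=False)
    B = np.linalg.solve(Sigma, top)            # columns span the estimated space
    return B / np.linalg.norm(B, axis=0)

# Toy usage: y depends on X only through one linear combination.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
beta = np.array([1.0, 1.0, 0, 0, 0, 0])
y = np.tanh(X @ beta) + 0.1 * rng.normal(size=500)
print(sir_directions(X, y, n_dirs=1).ravel())
```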
Dimension estimation in sufficient dimension reduction: A unifying approach
Journal of Multivariate Analysis, 2011
Sufficient Dimension Reduction (SDR) in regression comprises the estimation of the dimension of the smallest (central) dimension reduction subspace and its basis elements. For SDR methods based on a kernel matrix, such as SIR and SAVE, dimension estimation is equivalent to estimating the rank of a random matrix, the sample-based estimate of the kernel. A test for the rank of a random matrix amounts to testing how many of its eigenvalues or singular values are equal to zero. We propose two tests based on the smallest eigenvalues or singular values of the estimated matrix: an asymptotic weighted chi-square test and a Wald-type asymptotic chi-square test. We also provide an asymptotic chi-square test for assessing whether elements of the left singular vectors of the random matrix are zero. Together, these methods constitute a unified approach for all SDR methods based on a kernel matrix, covering estimation of the central subspace and its dimension as well as assessment of variable contribution to the lower-dimensional predictor projections, with variable selection as a special case. A small power simulation study shows that the proposed tests and the existing tests specific to each SDR method perform similarly with respect to power and attainment of the nominal level. The importance of the choice of the number of slices as a tuning parameter is also further exhibited.
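A rank test of this kind reduces to a sequential procedure on the smallest eigenvalues of the estimated kernel matrix. The numpy sketch below only illustrates that sequential logic; the critical-value rule `crit` is a hypothetical placeholder, not the weighted chi-square or Wald-type quantiles derived in the paper.

```python
import numpy as np

def estimate_rank(M_hat, n, crit=lambda m, p: 3.84 * (p - m)):
    """Sequentially test rank(M) = m, m = 0, 1, ..., using the smallest
    eigenvalues of the estimated kernel matrix M_hat (p x p) from a sample
    of size n.  `crit` is a stand-in critical-value rule, not the paper's
    asymptotic quantiles."""
    p = M_hat.shape[0]
    evals = np.sort(np.linalg.eigvalsh(M_hat))   # ascending order
    for m in range(p):
        stat = n * evals[: p - m].sum()          # mass in the p - m smallest eigenvalues
        if stat <= crit(m, p):                   # fail to reject rank = m
            return m
    return p
```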
Matching Using Sufficient Dimension Reduction for Heterogeneity Causal Effect Estimation
arXiv, 2023
Causal inference plays an important role in understanding the underlying mechanism of the data generation process across various domains. It is challenging to estimate the average causal effect and individual causal effects from observational data with high-dimensional covariates, owing to the curse of dimensionality and the problem of data sufficiency. Existing matching methods cannot effectively estimate individual causal effects or overcome the curse of dimensionality in causal inference. To address this challenge, we prove that the reduced set obtained by sufficient dimension reduction (SDR) is a balancing score for confounding adjustment. Based on this result, we propose to use an SDR method to obtain a reduced representation of the original covariates, which is then used for matching. Specifically, a non-parametric model is used to learn the reduced set, avoiding model specification errors. Experimental results on real-world datasets show that the proposed method outperforms the compared matching methods. Moreover, an experimental analysis demonstrates that the reduced representation is sufficient to balance the treatment and control groups.
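Schematically, once an SDR method has produced a reduced representation Z of the covariates, individual effects are obtained by matching each unit to its nearest neighbor in the opposite treatment arm within the reduced space. The numpy sketch below assumes Z, the treatment indicator t, and the outcome y are already available and uses 1-nearest-neighbor Euclidean matching as a stand-in for the paper's matching method.

```python
import numpy as np

def matched_effects(Z, t, y):
    """Estimate individual causal effects by 1-NN matching on the reduced
    covariates Z (n x d), binary treatment indicator t, and outcome y."""
    effects = np.empty(len(y))
    for i in range(len(y)):
        # Candidate matches come from the opposite treatment arm.
        pool = np.flatnonzero(t != t[i])
        j = pool[np.argmin(np.linalg.norm(Z[pool] - Z[i], axis=1))]
        y1, y0 = (y[i], y[j]) if t[i] == 1 else (y[j], y[i])
        effects[i] = y1 - y0
    return effects   # averaging these gives an ATE estimate
```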
A new sliced inverse regression method for multivariate response
Computational Statistics & Data Analysis, 2014
A semiparametric regression model of a q-dimensional multivariate response y on a p-dimensional covariate x is considered. A new approach based on sliced inverse regression (SIR) is proposed for estimating the effective dimension reduction (EDR) space without requiring a prespecified parametric model. The estimated EDR space is shown to converge at the √n rate. The choice of the dimension of the EDR space is discussed. Moreover, a way to cluster components of y related to the same EDR space is provided, so that the proposed multivariate SIR method can be applied properly to each cluster instead of blindly to all components of y. The numerical performance of multivariate SIR is illustrated in a simulation study, and an application to the Minneapolis elementary schools data is also provided. Although the proposed methodology relies on SIR, it opens the door to new regression approaches with a multivariate response, which could be built similarly on other dimension reduction methods.
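The clustering idea can be conveyed with a toy rule: components of y whose individually estimated directions are nearly collinear plausibly share the same EDR space. The sketch below assumes a matrix B of unit-norm per-component direction estimates is already available and uses a greedy absolute-cosine threshold; this is an illustrative simplification, not the paper's clustering criterion.

```python
import numpy as np

def cluster_components(B, tol=0.9):
    """Greedily group response components whose estimated directions
    (columns of B, one unit-norm p-vector per component of y) are nearly
    collinear, i.e. plausibly driven by the same EDR space."""
    q = B.shape[1]
    labels = -np.ones(q, dtype=int)
    for j in range(q):
        if labels[j] >= 0:
            continue
        labels[j] = labels.max() + 1
        for k in range(j + 1, q):
            # |cosine| between direction estimates above tol => same cluster
            if labels[k] < 0 and abs(B[:, j] @ B[:, k]) >= tol:
                labels[k] = labels[j]
    return labels
```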
On expectile-assisted inverse regression estimation for sufficient dimension reduction
Journal of Statistical Planning and Inference, 2021
Moment-based sufficient dimension reduction methods such as sliced inverse regression may not work well in the presence of heteroscedasticity. We propose to first estimate the expectiles through kernel expectile regression, and then carry out dimension reduction based on random projections of the regression expectiles. Several popular inverse regression methods in the literature are extended under this general framework. The proposed expectile-assisted methods outperform existing moment-based dimension reduction methods in both numerical studies and an analysis of the Big Mac data.
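The building block here is the expectile itself: the tau-expectile is the minimizer of an asymmetrically weighted squared loss. The sketch below computes a sample expectile by iteratively reweighted least squares; the paper's kernel expectile regression and the subsequent random-projection step are not shown.

```python
import numpy as np

def expectile(y, tau=0.5, tol=1e-8, max_iter=100):
    """Sample tau-expectile via iteratively reweighted least squares:
    the tau-expectile m minimizes sum_i w_i(m) * (y_i - m)^2 with
    w_i = tau if y_i >= m and 1 - tau otherwise."""
    m = y.mean()
    for _ in range(max_iter):
        w = np.where(y < m, 1 - tau, tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# tau = 0.5 recovers the mean; tau near 1 emphasizes the upper tail.
```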
Sufficient Dimension Reduction for Average Causal Effect Estimation
arXiv, 2020
Having a large number of covariates can have a negative impact on the quality of causal effect estimation, since confounding adjustment becomes unreliable when the number of covariates is large relative to the available sample. The propensity score is a common way to deal with a large covariate set, but the accuracy of propensity score estimation (normally done by logistic regression) is also challenged by a large number of covariates. In this paper, we prove that a large covariate set can be reduced to a lower-dimensional representation which captures the complete information needed for adjustment in causal effect estimation. This theoretical result enables effective data-driven algorithms for causal effect estimation. We develop an algorithm which employs a supervised kernel dimension reduction method to search for a lower-dimensional representation of the original covariates, and then uses nearest neighbor matching in the reduced covariate space to impute the counterfactual outcomes, avoiding the problems caused by a large covariate set. The proposed algorithm is evaluated on two semi-synthetic and three real-world datasets, and the results demonstrate its effectiveness.
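In the same spirit as the matching sketch above, the following numpy sketch folds the reduction step in: project the covariates onto an estimated basis B (a stand-in for the supervised kernel dimension reduction the paper uses), impute each unit's counterfactual from its nearest neighbors in the reduced space, and average to an ATE. The basis B, the k-NN imputation rule, and the Euclidean metric are assumptions for illustration only.

```python
import numpy as np

def ate_after_reduction(X, t, y, B, k=1):
    """Project covariates onto an estimated reduction basis B (p x d),
    impute each unit's counterfactual as the mean outcome of its k
    nearest neighbors in the opposite arm, and average the differences."""
    Z = X @ B                                    # reduced covariates
    imputed = np.empty(len(y))
    for i in range(len(y)):
        pool = np.flatnonzero(t != t[i])
        nn = pool[np.argsort(np.linalg.norm(Z[pool] - Z[i], axis=1))[:k]]
        imputed[i] = y[nn].mean()                # estimated missing potential outcome
    y1 = np.where(t == 1, y, imputed)            # outcome under treatment
    y0 = np.where(t == 0, y, imputed)            # outcome under control
    return (y1 - y0).mean()
```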
Estimating the structural dimension of regressions via parametric inverse regression
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2001
A new method for estimating the dimension of a regression at the outset of an analysis is proposed. A linear subspace spanned by projections of the regressor vector X, which contains part or all of the modelling information for the regression of a vector Y on X, and its dimension are estimated by means of parametric inverse regression. Smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model. No restrictions are placed on the distribution of the regressors. The estimate of the dimension of the regression is based on optimal estimation procedures. A simulation study shows the method to be more powerful than sliced inverse regression in some situations.
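The fitting step can be sketched as follows: regress the centered predictors on a polynomial basis in y in one multivariate linear model, and map the span of the fitted coefficient matrix back through the inverse predictor covariance. The numpy code below is a sketch in that spirit, assuming a cubic basis and a known number of directions; it is not the paper's estimator of the dimension, which relies on the optimal estimation procedures mentioned in the abstract.

```python
import numpy as np

def pir_span(X, y, degree=3, n_dirs=1):
    """Parametric inverse regression sketch: fit each inverse regression
    E[X_j | y] with a polynomial in y via one multivariate linear model,
    then map the span of the fitted coefficients back through the inverse
    predictor covariance."""
    Xc = X - X.mean(axis=0)
    F = np.column_stack([y**k for k in range(1, degree + 1)])
    F = F - F.mean(axis=0)
    # Multivariate least squares: rows of C are coefficient profiles.
    C, *_ = np.linalg.lstsq(F, Xc, rcond=None)   # shape (degree, p)
    M = C.T @ C                                  # p x p, spans the inverse-regression space
    evals, evecs = np.linalg.eigh(M)
    top = evecs[:, np.argsort(evals)[::-1][:n_dirs]]
    Sigma = np.cov(Xc, rowvar=False)
    return np.linalg.solve(Sigma, top)
```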
Conditional variance estimator for sufficient dimension reduction
Bernoulli
Conditional Variance Estimation (CVE) is a novel sufficient dimension reduction (SDR) method for additive error regressions with continuous predictors and link function. It operates under the assumption that the predictors can be replaced by a lower-dimensional projection without loss of information. In contrast to the majority of moment-based sufficient dimension reduction methods, Conditional Variance Estimation is fully data driven, does not require the restrictive linearity and constant variance conditions, and is not based on inverse regression. CVE is shown to be consistent and its objective function to be uniformly convergent. CVE outperforms mean average variance estimation (MAVE), its main competitor, in several simulation settings, remains on par with it in others, and always outperforms the usual inverse regression based linear SDR methods such as sliced inverse regression.
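The flavour of the approach can be conveyed by a toy version: score a candidate direction b by a kernel-weighted estimate of E[Var(y | b'X)] and keep the direction with the smallest score. The sketch below does this by grid search over a single direction in two dimensions; the actual CVE objective and its optimization over the Stiefel manifold are considerably more involved, so this is an illustration of the idea only.

```python
import numpy as np

def local_variance_score(Xb, y, h=0.5):
    """Average kernel-weighted local variance of y given the projected
    predictor values Xb (length n): a crude estimate of E[Var(y | b'X)]."""
    d = Xb[:, None] - Xb[None, :]
    W = np.exp(-0.5 * (d / h) ** 2)
    W /= W.sum(axis=1, keepdims=True)            # row-normalized kernel weights
    mu = W @ y                                   # local means
    return np.mean(W @ (y**2) - mu**2)           # average local variance

def best_direction_2d(X, y, n_grid=180):
    """Grid-search a single direction in R^2 that minimizes the score."""
    best, b_best = np.inf, None
    for ang in np.linspace(0, np.pi, n_grid, endpoint=False):
        b = np.array([np.cos(ang), np.sin(ang)])
        s = local_variance_score(X @ b, y)
        if s < best:
            best, b_best = s, b
    return b_best
```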