The Projected Covariance Measure for assumption-lean variable significance testing

Abstract: Testing the significance of a variable or group of variables X for predicting a response Y, given additional covariates Z, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for X is non-zero. However, when the model is misspecified, the test may have poor power, for example when X is involved in complex interactions, or may lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of Y given X and Z does not depend on X. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of Y on X and Z using one half of the data, and then to estimate the expected conditional covariance between this projection and Y on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach, in terms of both maintaining Type I error control and power, compared to several existing approaches.
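
The two-stage, sample-splitting idea described in the abstract can be illustrated with a minimal Python sketch. It follows only the abstract's wording and is not the authors' implementation: the function names (`poly_features`, `fit_ls`, `pcm_sketch`), the cubic polynomial basis used as a stand-in for a flexible regression method, the studentised statistic and the one-sided normal p-value are all assumptions made for illustration, and any further refinements in the paper are not reproduced.

```python
import math
import numpy as np


def poly_features(*cols, degree=3):
    """Hypothetical stand-in basis: intercept, powers 1..degree of each
    column, and pairwise products between different columns."""
    cols = [np.asarray(c, dtype=float).ravel() for c in cols]
    feats = [np.ones_like(cols[0])]
    for c in cols:
        feats += [c ** d for d in range(1, degree + 1)]
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            feats.append(cols[i] * cols[j])
    return np.column_stack(feats)


def fit_ls(features, target):
    """Ordinary least squares coefficients for target ~ features."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef


def pcm_sketch(x, y, z, seed=0):
    """Simplified sample-splitting test of 'E[Y | X, Z] does not depend on X'.

    Returns (statistic, one-sided p-value). Assumes scalar x and z.
    """
    x, y, z = (np.asarray(v, dtype=float).ravel() for v in (x, y, z))
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    a, b = perm[: len(y) // 2], perm[len(y) // 2:]

    # Half A: regress Y on (X, Z), then remove the part explained by Z alone.
    # The remainder plays the role of the estimated projection.
    g_coef = fit_ls(poly_features(x[a], z[a]), y[a])
    g_on_a = poly_features(x[a], z[a]) @ g_coef
    m_coef = fit_ls(poly_features(z[a]), g_on_a)

    def f_hat(xv, zv):
        return poly_features(xv, zv) @ g_coef - poly_features(zv) @ m_coef

    # Half B: residualise both Y and the projection on Z, then average
    # the product of residuals (an estimate of the expected conditional
    # covariance between the projection and Y given Z).
    zb = poly_features(z[b])
    y_res = y[b] - zb @ fit_ls(zb, y[b])
    f_b = f_hat(x[b], z[b])
    f_res = f_b - zb @ fit_ls(zb, f_b)
    prod = y_res * f_res

    stat = math.sqrt(len(b)) * prod.mean() / prod.std(ddof=1)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # upper-tail N(0, 1)
    return stat, p_value
```

In this sketch the test is one-sided because the projection is fitted to be positively associated with Y, so large positive values of the statistic are the evidence against the null. In practice the polynomial basis would be replaced by whichever regression method suits the data, such as the additive models or random forests mentioned in the abstract.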

Submission history

From: Anton Rask Lundborg
[v1] Thu, 3 Nov 2022 17:55:50 UTC (130 KB)
[v2] Fri, 1 Sep 2023 06:35:53 UTC (134 KB)
[v3] Sun, 17 Sep 2023 06:55:07 UTC (134 KB)
[v4] Tue, 7 May 2024 09:37:10 UTC (137 KB)