CRAN Task View: Econometrics
Maintainer: | Achim Zeileis, Grant McDermott, Kevin Tappe |
---|---|
Contact: | Achim.Zeileis at R-project.org |
Version: | 2024-06-03 |
URL: | https://CRAN.R-project.org/view=Econometrics |
Source: | https://github.com/cran-task-views/Econometrics/ |
Contributions: | Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide. |
Citation: | Achim Zeileis, Grant McDermott, Kevin Tappe (2024). CRAN Task View: Econometrics. Version 2024-06-03. URL https://CRAN.R-project.org/view=Econometrics. |
Installation: | The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("Econometrics", coreOnly = TRUE) installs all the core packages or ctv::update.views("Econometrics") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details. |
Base R ships with a lot of functionality useful for (computational) econometrics, in particular in the stats package. This functionality is complemented by many packages on CRAN; a brief overview is given below. There is also a certain overlap between the tools for econometrics in this view and those in the task views on Finance, TimeSeries, and CausalInference.
The packages in this view can be roughly structured into the following topics. If you think that some package is missing from the list, please file an issue in the GitHub repository or contact the maintainer.
Basic linear regression
- Estimation and standard inference: Ordinary least squares (OLS) estimation for linear models is provided by lm() (from stats) and standard tests for model comparisons are available in various methods such as summary() and anova().
- Further inference and nested model comparisons: Functions analogous to the basic summary() and anova() methods that also support asymptotic tests (z instead of t tests, and Chi-squared instead of F tests) and plug-in of other covariance matrices are coeftest() and waldtest() in lmtest. (Non)linear hypothesis testing for a wide range of R packages can be implemented through the deltamethod() function of marginaleffects. This expands on older (non)linear hypothesis test functions like linearHypothesis() and deltaMethod() from car.
- Robust standard errors: HC, HAC, clustered, and bootstrap covariance matrices are available in sandwich and can be plugged into the inference functions mentioned above.
- Nonnested model comparisons: Various tests for comparing non-nested linear models are available in lmtest (encompassing test, J test, Cox test). The Vuong test for comparing other non-nested models is provided by nonnest2 (and specifically for count data regression in pscl).
- Diagnostic checking: The packages car and lmtest provide a large collection of regression diagnostics and diagnostic tests.
- Miscellaneous: Much of the above functionality is bundled together in fixest, which provides a number of in-built convenience features that users may find attractive. This includes robust standard error specification, multi-model estimation, custom hypothesis testing, etc.
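The basic workflow above can be sketched in base R. This is a minimal illustration that fits two nested models with lm(), compares them with anova(), and computes HC0 standard errors by hand. The simulated data and all object names are assumptions made for this example. In practice the covariance matrix would typically come from sandwich and be passed to coeftest() in lmtest rather than being computed manually.

```r
# Simulated data (hypothetical): heteroscedastic errors, x2 is irrelevant
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n, sd = exp(0.5 * x1))
dat <- data.frame(y, x1, x2)

# OLS estimation and standard inference
fit_small <- lm(y ~ x1, data = dat)
fit_full  <- lm(y ~ x1 + x2, data = dat)
print(summary(fit_full))

# Nested model comparison via an F test
cmp <- anova(fit_small, fit_full)
print(cmp)

# Hand-rolled HC0 sandwich covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X      <- model.matrix(fit_full)
e      <- residuals(fit_full)
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)   # each row of X scaled by its residual
vc_hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(vc_hc0))

# Classical and HC0 standard errors side by side
print(cbind(classical = sqrt(diag(vcov(fit_full))), HC0 = se_hc0))
```

HC0 is used here only because it is the simplest variant to write down; the other HC types differ in how the squared residuals are rescaled.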
Microeconometrics
- Generalized linear models (GLMs): Many standard microeconometric models belong to the family of generalized linear models and can be fitted by glm() from package stats. This includes in particular logit and probit models for modeling choice data and Poisson models for count data.
- Effects and marginal effects: Effects for typical values of regressors in GLMs and various other probabilistic regression models can be obtained and visualized using effects. Marginal effect tables and corresponding visualizations for a wide range of models can be produced with marginaleffects. Other implementations of marginal effects for certain models are in margins and mfx. Interactive visualizations of both effects and marginal effects are possible in LinRegInteractive.
- Binary responses: The standard logit and probit models (among many others) for binary responses are GLMs that can be estimated by glm() with family = binomial. Bias-reduced GLMs that are robust to complete and quasi-complete separation are provided by brglm. Discrete choice models estimated by simulated maximum likelihood are implemented in Rchoice. bife provides binary choice models with fixed effects. Heteroscedastic probit models (and other heteroscedastic GLMs) are implemented in glmx along with parametric link functions and goodness-of-link tests for GLMs.
- Count responses: The basic Poisson regression is a GLM that can be estimated by glm() with family = poisson as explained above. Negative binomial GLMs are available via glm.nb() in package MASS. Another implementation of negative binomial models is provided by aod, which also contains other models for overdispersed data. Zero-inflated and hurdle count models are provided in package pscl. A reimplementation by the same authors is currently under development in countreg on R-Forge, which also encompasses separate functions for zero-truncated regression, finite mixture models, etc.
- Multinomial responses: Multinomial models with individual-specific covariates only are available in multinom() from package nnet. An implementation with both individual- and choice-specific variables is mlogit. Generalized multinomial logit models (e.g., with random effects etc.) are in gmnl. A flexible framework of various customizable choice models (including multinomial logit and nested logit among many others) is implemented in the apollo package. The newer logitr package combines many of the features from these preceding packages and also offers some meaningful performance improvements for fast estimation of multinomial and mixed logit models. Simulated maximum likelihood estimation of mixed logit models, especially for large data sets, is available in mixl. Generalized additive models (GAMs) for multinomial responses can be fitted with the VGAM package. A Bayesian approach to multinomial probit models is provided by MNP. Various Bayesian multinomial models (including logit and probit) are available in bayesm. The package RSGHB fits various hierarchical Bayesian specifications based on direct specification of the likelihood function. Furthermore, the RprobitB package implements latent class mixed multinomial probit models for approximations of the true underlying mixing distribution.
- Ordered responses: Proportional-odds regression for ordered responses is implemented in polr() from package MASS. The package ordinal provides cumulative link models for ordered data, which encompass proportional odds models but also include more general specifications. Bayesian ordered probit models are provided by bayesm and RprobitB.
- Censored responses: Basic censored regression models (e.g., tobit models) can be fitted by survreg() in survival; a convenience interface tobit() is in package AER. Further censored regression models, including models for panel data, are provided in censReg. Censored regression models with conditional heteroscedasticity are in crch. Furthermore, hurdle models for left-censored data at zero can be estimated with mhurdle. Models for sample selection are available in sampleSelection and ssmrob using classical and robust inference, respectively. Package matchingMarkets corrects for selection bias when the sample is the result of a stable matching process (e.g., a group formation or college admissions problem).
- Truncated responses: crch for truncated (and potentially heteroscedastic) Gaussian, logistic, and t responses. Homoscedastic Gaussian responses are also available in truncreg.
- Fraction and proportion responses: Beta regression for responses in (0, 1) is in betareg and gamlss.
- Duration responses: Many classical duration models can be fitted with survival, e.g., Cox proportional hazard models with coxph() or Weibull models with survreg(). Many more refined models can be found in the Survival task view.
- High-dimensional fixed effects: Linear and generalized linear models with potentially high-dimensional fixed effects, also for multiple groups, can be fitted with fixest, using optimized parallel C++ code. Other implementations of high-dimensional fixed effects are in lfe and alpaca for linear and generalized linear models, respectively.
- Miscellaneous: Further more refined tools for microeconometrics are provided in the micEcon family of packages: Analysis with Cobb-Douglas, translog, and quadratic functions is in micEcon; the constant elasticity of substitution (CES) function is in micEconCES; the symmetric normalized quadratic profit (SNQP) function is in micEconSNQP. The almost ideal demand system (AIDS) is in micEconAids. Stochastic frontier analysis (SFA) is in frontier. Semiparametric SFA is available in semsfa and spatial SFA in ssfa. The package bayesm implements a Bayesian approach to microeconometrics and marketing. Inference for relative distributions is contained in package reldist.
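Three of the model classes above can be fitted with base R and the recommended survival package alone. The following sketch fits a logit model, a Poisson regression, and a tobit model on simulated data. The data-generating process, the true coefficients, and all object names are assumptions chosen for illustration. The tobit call mirrors the usual way of expressing left-censoring at zero with Surv().

```r
library(survival)  # recommended package shipped with R

set.seed(42)
n <- 1000
x <- rnorm(n)

# Binary response: logit model as a binomial GLM (true slope 1)
y_bin     <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))
fit_logit <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Count response: Poisson regression (true slope 0.5)
y_cnt    <- rpois(n, lambda = exp(0.2 + 0.5 * x))
fit_pois <- glm(y_cnt ~ x, family = poisson)

# Censored response: tobit model, left-censored at zero (true slope 1)
y_star <- 0.5 + 1 * x + rnorm(n)   # latent outcome
y_cens <- pmax(y_star, 0)          # observed outcome
fit_tobit <- survreg(Surv(y_cens, y_cens > 0, type = "left") ~ x,
                     dist = "gaussian")

# Slope estimates from the three models
slopes <- c(logit = unname(coef(fit_logit)["x"]),
            poisson = unname(coef(fit_pois)["x"]),
            tobit = unname(coef(fit_tobit)["x"]))
print(round(slopes, 3))
```

In the Surv() call the second argument flags uncensored observations, so values with y_cens equal to zero are treated as left-censored.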
Common research designs for causal inference
We review packages related to some common research designs for causal inference below. This section is necessarily brief and should be paired with the CausalInference task view, since there is a high degree of overlap.
Difference-in-differences and synthetic control
- Basic difference-in-differences (DiD): The canonical 2x2 DiD model (two units, two periods) can be estimated as a simple interaction between two factor variables in lm() or glm(), etc. Similarly, the equivalent two-way fixed effects (TWFE) design can be obtained using factors to control for unit and time fixed effects. However, for high-dimensional datasets TWFE is more conveniently estimated using a dedicated panel data package like fixest or plm. The former even provides a convenience i() operator for constructing and interacting factors in TWFE settings.
- Advanced DiD and TWFE corrections: Despite its long-standing popularity, recent research has uncovered various problems with (naive) TWFE; for example, severe bias in the presence of staggered treatment rollout. A cottage industry of workarounds and alternative estimators now exists to address these problems. R package implementations include: bacondecomp, did, did2s, DRDID, etwfe, fixest (via the sunab() function), and gsynth.
- Synthetic control: The original synthetic control (SC) implementation is available through Synth, while tidysynth offers a newer SC implementation with various enhancements (speed, inspection, etc.). Similarly, gsynth generalizes the original SC implementation to multiple treated units and variable treatment periods, and also supports additional estimation methods like the EM algorithm and matrix completion.
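The canonical 2x2 DiD described above can be sketched as an interaction of two factors in lm(). The simulated data, the true effect of 2, and the variable names are assumptions for illustration. Because the model is saturated, the interaction coefficient coincides with the difference of the two before/after mean differences, which the code verifies.

```r
# Simulated 2x2 design (hypothetical): group and period indicators
set.seed(123)
n       <- 2000
treated <- rep(c(0, 1), each = n / 2)
post    <- rep(c(0, 1), times = n / 2)
effect  <- 2
y <- 1 + 0.5 * treated + 1.5 * post + effect * treated * post + rnorm(n)
dat <- data.frame(y, treated = factor(treated), post = factor(post))

# DiD as the interaction between the two factors
fit_did <- lm(y ~ treated * post, data = dat)
did_hat <- unname(coef(fit_did)["treated1:post1"])

# Same quantity from the four cell means
m <- with(dat, tapply(y, list(treated, post), mean))
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(lm = did_hat, means = did_means))
```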
Instrumental variables
- Basic instrumental variables (IV) regression: Two-stage least squares (2SLS) is provided by ivreg (which separates out the dedicated 2SLS routines previously found in AER). Another implementation is available as tsls() in package sem.
- Binary responses: The LARF package estimates local average response functions for binary treatments and binary instruments.
- Panel data: Several panel data model packages (see below) provide their own dedicated IV routines for efficient estimation in the presence of high-dimensional data. These include fixest and lfe for fixed effects, and plm for first-difference, between, and multiple random effects methods.
- Miscellaneous: REndo fits linear models with endogenous regressors using various latent instrumental variable approaches. SteinIV provides semi-parametric IV estimators, including JIVE and SPS.
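To make the logic of 2SLS concrete, the following sketch carries out the two stages by hand with lm() on simulated data. It is not the interface of ivreg or any other package listed above, and the data-generating process and object names are assumptions. Note that the standard errors reported by the manual second stage are not valid, which is one reason to use a dedicated IV routine in practice.

```r
# Simulated data (hypothetical): u makes x endogenous, z is the instrument
set.seed(2024)
n <- 5000
z <- rnorm(n)
u <- rnorm(n)
x <- 0.8 * z + u + rnorm(n)
y <- 1 + 2 * x + u + rnorm(n)   # true coefficient on x is 2

# Naive OLS is biased because x is correlated with the error term
beta_ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: project the endogenous regressor on the instrument
stage1 <- lm(x ~ z)
x_hat  <- fitted(stage1)

# Stage 2: regress the outcome on the fitted values (point estimate only)
stage2    <- lm(y ~ x_hat)
beta_2sls <- unname(coef(stage2)["x_hat"])

# With one instrument, 2SLS equals the ratio of covariances
beta_wald <- cov(z, y) / cov(z, x)

print(c(OLS = beta_ols, TSLS = beta_2sls, Wald = beta_wald))
```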
Regression discontinuity design
- Regression discontinuity design (RDD) methods are implemented in rdrobust (offering robust confidence interval construction and bandwidth selection), rddensity (density discontinuity testing (also known as manipulation testing)), rdlocrand (inference under local randomization), and rdmulti (analysis with multiple cutoffs or scores).
- Tools to perform power, sample size and minimum detectable effects (MDE) calculations are available in rdpower, while RATest provides a collection of randomization tests, including a permutation test for the continuity assumption of the baseline covariates in the sharp RDD.
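The idea behind a sharp RDD estimate can be sketched with a local linear regression in base R. This is a simplified illustration with a fixed, arbitrarily chosen bandwidth and a uniform kernel. It is not the estimator implemented in rdrobust, which selects the bandwidth from the data and constructs robust confidence intervals. The simulated data, the cutoff at zero, and the true jump of 1 are assumptions.

```r
# Simulated sharp design (hypothetical): treatment switches on at x >= 0
set.seed(7)
n   <- 4000
x   <- runif(n, -1, 1)             # running variable, cutoff at 0
d   <- as.numeric(x >= 0)          # treatment indicator
tau <- 1                           # true discontinuity
y   <- 0.5 + 0.8 * x + tau * d + rnorm(n, sd = 0.5)
dat <- data.frame(y, x, d)

# Local linear fit with separate slopes on each side of the cutoff
h      <- 0.5                      # hypothetical fixed bandwidth
fit_rd <- lm(y ~ d * x, data = dat, subset = abs(x) <= h)

# The coefficient on d is the estimated jump at the cutoff
tau_hat <- unname(coef(fit_rd)["d"])
print(tau_hat)
```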
Panel data models
- Panel standard errors: A simple approach for panel data is to fit the pooling (or independence) model (e.g., via lm() or glm()) and only correct the standard errors. Different types of clustered, panel, and panel-corrected standard errors are available in sandwich (incorporating prior work from multiwayvcov), clusterSEs, pcse, clubSandwich, plm, and geepack, respectively. The latter two require estimation of the pooling/independence models via plm() and geeglm() from the respective packages (which also provide other types of models, see below).
- Linear panel models: fixest provides very efficient fixed-effect routines that scale to high-dimensional data and multiple fixed-effects. plm provides a wide range of within, between, and random-effect methods (among others) along with corrected standard errors, tests, etc. Various dynamic panel models are available in plm, with estimation based on moment conditions in pdynmc, and dynamic panel models with fixed effects in OrthoPanels. feisr provides fixed effects individual slope (FEIS) models. Panel vector autoregressions are implemented in panelvar.
- GLMs and generalized estimating equations: The aforementioned fixest supports a variety of GLM-like models in addition to linear panel models. This includes efficient fixed-effect estimation of logit, probit, Poisson, and negative binomial models. Similar functionality is provided by alpaca (which also accounts for incidental parameter problems) and pglm. penppml further extends the high-dimensional case through penalized Poisson Pseudo Maximum Likelihood (PPML) regressions, using lasso or ridge penalties. GEE models for panel data (or longitudinal data in statistical jargon) are available in geepack.
- Mixed effects models: Linear and nonlinear models for panel data (and more general multi-level data) are available in lme4 and nlme.
- Instrumental variables: fixest. See also above.
- Miscellaneous: Threshold regression and unit root tests are in pdR. The panel data approach method for program evaluation is available in pampe. Dedicated fast data preprocessing for panel data econometrics is provided by collapse.
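The within (fixed-effects) estimator that the dedicated panel packages implement efficiently can be illustrated in base R. The sketch below compares pooled OLS, least squares with unit dummies, and OLS on unit-demeaned data. The simulated panel and all object names are assumptions. The dummy-variable and demeaning approaches give the same slope, while pooled OLS is biased here because the unit effect is correlated with the regressor.

```r
# Simulated balanced panel (hypothetical): 50 units, 10 periods
set.seed(99)
n_id  <- 50
n_t   <- 10
id    <- rep(seq_len(n_id), each = n_t)
alpha <- rnorm(n_id)[id]                    # unit-specific effect
x     <- 0.5 * alpha + rnorm(n_id * n_t)    # regressor correlated with alpha
y     <- alpha + 1.5 * x + rnorm(n_id * n_t)

# Pooled OLS ignores the unit effects
beta_pool <- unname(coef(lm(y ~ x))["x"])

# Fixed effects via unit dummies (feasible only for moderate n_id)
beta_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

# Fixed effects via the within transformation (demeaning by unit)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
beta_within <- unname(coef(lm(y_dm ~ x_dm - 1))["x_dm"])

print(c(pooled = beta_pool, LSDV = beta_lsdv, within = beta_within))
```

The standard errors from the demeaned regression use the wrong degrees of freedom, so this sketch is meant for the point estimate only.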
Further regression models
- Nonlinear least squares modeling: nls() in package stats.
- Quantile regression: quantreg (including linear, nonlinear, censored, locally polynomial and additive quantile regressions).
- Generalized method of moments (GMM) and generalized empirical likelihood (GEL): gmm.
- Spatial econometric models: The Spatial view gives details about handling spatial data, along with information about (regression) modeling. In particular, spatial regression models can be fitted using spatialreg and sphet (the latter using a GMM approach). splm is a package for spatial panel models. Spatial probit models are available in spatialprobit and spatial seemingly unrelated regression (SUR) models in spsur.
- Bayesian model averaging (BMA): A comprehensive toolbox for BMA is provided by BMS including flexible prior selection, sampling, etc. A different implementation is in BMA for linear models, generalized linear models and survival models (Cox regression).
- Linear structural equation models: lavaan and sem. See also the Psychometrics task view for more details.
- Machine learning: There are several packages that combine machine learning techniques with econometric inference (especially for identifying causal effects). These include grf for causal random forests and estimation of heterogeneous treatment effects, DoubleML for double machine learning of a wide range of models from the mlr3 family, and hdm for selected high-dimensional econometric models. For a more general overview see the MachineLearning task view.
- Simultaneous equation estimation: systemfit.
- Nonparametric methods: np using kernel smoothing and NNS using partial moments.
- Linear and nonlinear mixed-effect models: nlme and lme4.
- Generalized additive models (GAMs): mgcv, gam, gamlss and VGAM.
- Design-based inference: estimatr contains fast procedures for several design-appropriate estimators with robust standard errors and confidence intervals including linear regression, instrumental variables regression, difference-in-means, among others.
- Extreme bounds analysis: ExtremeBounds.
- Miscellaneous: The packages VGAM, rms and Hmisc provide several tools for extended handling of (generalized) linear regression models.
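As a small example of nonlinear least squares with nls() from stats, the following sketch fits an exponential growth curve to simulated data. The functional form, the true parameters, and the object names are assumptions. Starting values are taken from a log-linear lm() fit, which is a common heuristic for this functional form rather than a requirement of nls().

```r
# Simulated data (hypothetical): y = a * exp(b * x) + noise
set.seed(11)
x <- seq(0, 4, length.out = 100)
y <- 2 * exp(0.5 * x) + rnorm(100, sd = 0.2)

# Starting values from a log-linear approximation (y is positive here)
start_fit <- lm(log(y) ~ x)
start <- list(a = exp(coef(start_fit)[[1]]), b = coef(start_fit)[[2]])

# Nonlinear least squares fit
fit_nls <- nls(y ~ a * exp(b * x), start = start)
est <- coef(fit_nls)
print(est)
```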
Time series data and models
- The TimeSeries task view provides much more detailed information about both basic time series infrastructure and time series models. Here, only the most important aspects relating to econometrics are briefly mentioned. Time series models for financial econometrics (e.g., GARCH, stochastic volatility models, or stochastic differential equations, etc.) are described in the Finance task view.
- Infrastructure for regularly spaced time series: The class "ts" in package stats is R’s standard class for regularly spaced time series (especially annual, quarterly, and monthly data). It can be coerced back and forth without loss of information to "zooreg" from package zoo.
- Infrastructure for irregularly spaced time series: zoo provides infrastructure for both regularly and irregularly spaced time series (the latter via the class "zoo") where the time information can be of arbitrary class. This includes daily series (typically with "Date" time index) or intra-day series (e.g., with "POSIXct" time index). An extension based on zoo geared towards time series with different kinds of time index is xts. Further packages aimed particularly at finance applications are discussed in the Finance task view.
- Classical time series models: Simple autoregressive models can be estimated with ar() and ARIMA modeling and Box-Jenkins-type analysis can be carried out with arima() (both in the stats package). An enhanced version of arima() is in forecast.
- Linear regression models: A convenience interface to lm() for estimating OLS and 2SLS models based on time series data is dynlm. Linear regression models with AR error terms can be fitted via GLS using gls() from nlme.
- Structural time series models: Standard models can be fitted with StructTS() in stats. Further packages are discussed in the TimeSeries task view.
- Filtering and decomposition: decompose() and HoltWinters() in stats. The basic function for computing filters (both rolling and autoregressive) is filter() in stats. Many extensions to these methods, in particular for forecasting and model selection, are provided in the forecast package.
- Vector autoregression: Simple models can be fitted by ar() in stats; more elaborate models are provided in package vars along with suitable diagnostics, visualizations etc. Structural smooth transition vector autoregressive models are in sstvars and panel vector autoregressions in panelvar.
- Unit root and cointegration tests: urca, tseries, CADFtest. See also pco for panel cointegration tests and plm for panel unit root tests.
- Miscellaneous:
  - tsDyn - Threshold and smooth transition models.
  - midasr - MIDAS regression and other econometric methods for mixed frequency time series data analysis.
  - gets - GEneral-To-Specific (GETS) model selection for either ARX models with log-ARCH-X errors, or a log-ARCH-X model of the log variance.
  - bimets - Econometric modeling of time series data using flexible specifications of simultaneous equation models.
  - dlsem - Distributed-lag linear structural equation models.
  - lpirfs - Local projections impulse response functions.
  - apt - Asymmetric price transmission models.
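The classical time series tools in stats mentioned above can be sketched as follows. The code simulates an AR(1) process, stores it as a monthly "ts" object, and fits it with both ar() and arima(). The AR coefficient of 0.6, the start date, and the object names are assumptions for illustration.

```r
# Simulated AR(1) series (hypothetical), stored as a monthly "ts" object
set.seed(5)
sim <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))
y   <- ts(sim, start = c(2000, 1), frequency = 12)

# Autoregression of fixed order 1 (Yule-Walker by default)
fit_ar <- ar(y, order.max = 1, aic = FALSE)
phi_ar <- as.numeric(fit_ar$ar)

# The same model as ARIMA(1, 0, 0), estimated by maximum likelihood
fit_arima <- arima(y, order = c(1, 0, 0))
phi_arima <- unname(coef(fit_arima)["ar1"])

# Twelve-step-ahead forecasts with standard errors
fc <- predict(fit_arima, n.ahead = 12)

print(c(ar = phi_ar, arima = phi_arima))
print(fc$pred)
```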
Data sets
- Textbooks and journals: Packages AER, Ecdat, and wooldridge contain comprehensive collections of data sets from various standard econometric textbooks (including Greene, Stock & Watson, Wooldridge, Baltagi, among others) as well as several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. AER and wooldridge additionally provide extensive sets of examples reproducing analyses from the textbooks/papers, illustrating various econometric methods. In pder a wide collection of data sets for “Panel Data Econometrics with R” (Croissant & Millo 2018) is available. The PoEdata package on GitHub provides the data sets from “Principles of Econometrics” (4th ed, by Hill, Griffiths, and Lim 2011).
- Penn World Table: pwt provides versions 5.6, 6.x, 7.x. Version 8.x and 9.x data are available in pwt8 and pwt9, respectively.
- Time series and forecasting data: The packages expsmooth, fma, and Mcomp are data packages with time series data from the books “Forecasting with Exponential Smoothing: The State Space Approach” (Hyndman, Koehler, Ord, Snyder, 2008, Springer) and “Forecasting: Methods and Applications” (Makridakis, Wheelwright, Hyndman, 3rd ed., 1998, Wiley) and the M-competitions, respectively.
- Empirical Research in Economics: Package erer contains functions and datasets for the book “Empirical Research in Economics: Growing up with R” (Sun 2015).
- Panel Study of Income Dynamics (PSID): psidR can build panel data sets from the Panel Study of Income Dynamics (PSID).
- World Bank data and statistics: The wbstats package provides programmatic access to the World Bank API.
Miscellaneous
- Model tables: A flexible implementation of side-by-side summary tables for a wide range of statistical models along with corresponding visualizations and data summary tables is provided in modelsummary. Other implementations as well as further utilities for integrating econometric and statistical results in scientific papers etc. are discussed in the ReproducibleResearch task view.
- Matrix manipulations: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM.
- Optimization and mathematical programming: R and many of its contributed packages provide many specialized functions for solving particular optimization problems, e.g., in regression as discussed above. Further functionality for solving more general optimization problems, e.g., likelihood maximization, is discussed in the Optimization task view.
- Bootstrap: In addition to the recommended boot package, there are some other general bootstrapping techniques available in bootstrap or simpleboot as well as some bootstrap techniques designed for time-series data, such as the maximum entropy bootstrap in meboot or tsbootstrap() from tseries. The fwildclusterboot (archived) package provides a fast wild cluster bootstrap implementation for linear regression models, especially when the number of clusters is low.
- Inequality: For measuring inequality, concentration and poverty the package ineq provides some basic tools such as Lorenz curves, Pen’s parade, the Gini coefficient, Herfindahl-Hirschman index and many more. wINEQ provides these and other inequality measures for weighted data along with bootstrapping methods.
- Structural change: R is particularly strong when dealing with structural changes and changepoints in parametric models, see strucchange and segmented.
- Exchange rate regimes: Methods for inference about exchange rate regimes, in particular in a structural change setting, are provided by fxregime.
- Global value chains: Tools and decompositions for global value chains are in gvc and decompr.
- Regression discontinuity design: A variety of methods are provided in the rdd, rdrobust, and rdlocrand packages. The rdpower package offers power calculations for regression discontinuity designs. And rdmulti implements analysis with multiple cutoffs or scores.
- Gravity models: Estimation of log-log and multiplicative gravity models is available in gravity.
- z-Tree: zTree can import data from the z-Tree software for developing and carrying out economic experiments.
- Numerical standard errors: nse implements various numerical standard errors for time series data, especially in simulation experiments with correlated outcome sequences.
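The bootstrap and inequality items above can be combined in a short sketch that bootstraps a Gini coefficient with the recommended boot package. The gini() helper is a hand-written function for this example, not the implementation in ineq, and the simulated log-normal incomes are an assumption.

```r
library(boot)  # recommended package shipped with R

# Hypothetical helper: Gini coefficient from the ordered values
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Statistic in the form boot() expects: data plus resampled indices
gini_boot <- function(data, idx) gini(data[idx])

# Simulated incomes (hypothetical)
set.seed(2023)
income <- rlnorm(500, meanlog = 10, sdlog = 0.8)

# Nonparametric bootstrap with 999 replications
bs <- boot(data = income, statistic = gini_boot, R = 999)
ci <- boot.ci(bs, type = "perc")

print(bs$t0)             # point estimate
print(ci$percent[4:5])   # percentile interval limits
```

The percentile interval is used here for simplicity; boot.ci() supports other interval types through its type argument.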
CRAN packages
Related links
- Articles: Special Volume on “Econometrics in R” in JSS (2008)
- Book: Applied Econometrics with R (Kleiber & Zeileis; 2008)
- Book: Introduction to Econometrics with R (Hanck, Arnold, Gerber, & Schmelzer; 2021)
- Book: Introduction to Econometrics with R (Oswald, Robin, & Viers; 2020)
- Book: Causal Inference: The Mixtape (Cunningham; 2021)
- Book: Hands-On Intermediate Econometrics Using R (Vinod; 2008)
- Book: Learning Microeconometrics with R (Adams; 2021)
- Book: Panel Data Econometrics with R (Croissant & Millo; 2018)
- Book: Principles of Econometrics with R (Colonescu; 2016)
- Book: Spatial Econometrics (Kelejian & Piras; 2017)
- Book: Statistical Inference via Data Science (Ismay & Kim; 2022)
- Book: The Effect (Huntington-Klein; 2022)
- Book: Using R for Introductory Econometrics (Heiss; 2019)
- Course: Applied Empirical Methods (Goldsmith-Pinkham; 2021)
- Course: Data Science for Economists (McDermott; 2021)
- Course: Econometrics In-Class Labs (Ransom; 2021)
- Course: Introduction to Econometrics (Rubin; 2021)
- Course: PhD Econometrics (Rubin; 2022)
- Course: Program Evaluation for Public Service (Heiss; 2022)
- Course: Statistical Rethinking (McElreath; 2022)
- Website: Stata2R
Other resources
- CRAN Task View: CausalInference
- CRAN Task View: Finance
- CRAN Task View: MachineLearning
- CRAN Task View: Optimization
- CRAN Task View: Psychometrics
- CRAN Task View: ReproducibleResearch
- CRAN Task View: Spatial
- CRAN Task View: Survival
- CRAN Task View: TimeSeries
- R-Forge Project: countreg
- GitHub Project: PoEdata