Working Paper
Inference with Arbitrary Clustering
Fabrizio Colella, Rafael Lalive, Seyhun Orcan Sakalli, Mathias Thoenig
IZA Discussion Paper n. 12584 - [pdf, latest version]
Abstract
Analyses of spatial or network data are now very common. Nevertheless, statistical inference is challenging since unobserved heterogeneity can be correlated across neighboring observational units. We develop an estimator for the variance-covariance matrix (VCV) of OLS and 2SLS that allows for arbitrary dependence of the errors across observations in space or network structure and across time periods. As a proof of concept, we conduct Monte Carlo simulations in a geospatial setting based on U.S. metropolitan areas. Tests based on our VCV estimator asymptotically reject the null hypothesis at the correct rate, whereas conventional inference methods, e.g., those without clusters or with clusters based on administrative units, reject the null hypothesis too often. We also provide simulations in a network setting based on the IDEAS structure of coauthorship and real-life data on scientific performance. The Monte Carlo results again show that our estimator yields inference at the correct significance level even in moderately sized samples and that it dominates other commonly used approaches to inference in networks. We provide guidance to the applied researcher with respect to (i) whether or not to include potentially correlated regressors and (ii) the choice of cluster bandwidth. Finally, we provide a companion statistical package (acreg) enabling users to adjust the standard errors of OLS and 2SLS coefficients to account for arbitrary dependence.
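For orientation, a VCV estimator that allows for arbitrary dependence is commonly written in sandwich form. The notation below (X for the regressor matrix, \hat{u} for the vector of OLS residuals, S for an N x N weighting matrix) is chosen here for illustration and is an assumption, not necessarily the paper's own; the discussion paper gives the exact definition and the 2SLS analogue.

```latex
\[
\widehat{V}(\hat{\beta})
  = (X'X)^{-1} \, X' \bigl( S \circ \hat{u}\hat{u}' \bigr) X \, (X'X)^{-1},
\qquad
s_{ij} \in [0,1],
\]
```

Here \circ denotes the element-wise product, and s_{ij} = 0 whenever the errors of observations i and j are assumed to be uncorrelated. In this notation, S equal to the identity matrix gives the familiar heteroskedasticity-robust estimator, and a block-diagonal S of ones gives the usual cluster-robust estimator (ignoring finite-sample corrections).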
Stata Package
To install the package, please type the following in the Stata command window:
ssc install acreg
Alternatively, you can download the package from this website by typing:
net install acreg, from(https://acregstata.weebly.com/uploads/2/9/1/6/29167217) replace
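After either command, a quick check that Stata can find the package may be useful. The sketch below uses only built-in Stata commands (which, help) and assumes the installation above completed without error.

```stata
* Report where acreg.ado was installed (errors out if the command is not found)
which acreg

* Open the help file shipped with the package
help acreg
```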
To erase the package, please type the following in the Stata command window:
findfile acreg.ado
erase `r(fn)'
findfile acreg.sthlp
erase `r(fn)'
findfile acregpackcheck.ado
erase `r(fn)'
F. A. Q.
Examples
If you use the acreg package, please cite the following:
Colella, Fabrizio; Lalive, Rafael; Sakalli, Seyhun Orcan; Thoenig, Mathias. (2019) Inference with Arbitrary Clustering, IZA Discussion Paper n. 12584