Code: Group clustering with panel estimations
Clustering
N - Number of observations; M - Number of clusters; K - Number of regressors (rank)
Clustering Fixed Effects (FE) estimations
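\[\frac{N-K}{(N-K)-(M-1)} \cdot (X'X)^{-1} \left( \sum_{j=1}^M u_{j}'u_{j} \right) (X'X)^{-1} \cdot \frac{M}{M-1} \cdot \frac{N-1}{N-K}\]
where $u_{j} = e_{j}'X_{j}$ is the $1 \times K$ row vector built from the residuals $e_{j}$ and the regressors $X_{j}$ of cluster $j$.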
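As a rough illustration of how large these finite-sample corrections are, assume $N = 800$, $M = 20$ and $K = 3$. These values are chosen to resemble the simulation below; whether $K$ should count only the three slope regressors is an assumption made here. The scalar factors multiplying the sandwich in the FE formula are then

\[\frac{N-K}{(N-K)-(M-1)} = \frac{797}{778} \approx 1.024, \qquad \frac{M}{M-1} = \frac{20}{19} \approx 1.053, \qquad \frac{N-1}{N-K} = \frac{799}{797} \approx 1.003\]

Their product is about $1.081$, so under these assumptions the variance is scaled up by roughly 8% and the standard errors by roughly $\sqrt{1.081} \approx 1.04$.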
Clustering First Difference (FD) estimations
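\[(X'X)^{-1} \left( \sum_{j=1}^M u_{j}'u_{j} \right) (X'X)^{-1} \cdot \frac{M}{M-1} \cdot \frac{N-1}{N-K}\]
where $u_{j} = e_{j}'X_{j}$ is the $1 \times K$ row vector built from the residuals $e_{j}$ and the regressors $X_{j}$ of cluster $j$.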
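The first-difference formula can be computed by hand in a few lines of base R. This is a minimal sketch, not how `lfe` computes it internally: `cluster_vcov` is a hypothetical helper name, and the toy data (50 clusters of 10 observations with a shared cluster shock) are made up for illustration.

```r
# Cluster-robust covariance for OLS, following the formula
# (X'X)^{-1} [sum_j u_j' u_j] (X'X)^{-1} * M/(M-1) * (N-1)/(N-K).
# `cluster_vcov` is a hypothetical helper, not part of lfe.
cluster_vcov <- function(X, e, cluster) {
  X <- as.matrix(X)
  N <- nrow(X)                    # number of observations
  K <- qr(X)$rank                 # number of regressors (rank)
  M <- length(unique(cluster))    # number of clusters
  bread <- solve(crossprod(X))    # (X'X)^{-1}
  # X * e scales row i of X by residual e_i; rowsum() then adds these rows
  # within each cluster, so row j of u is u_j = e_j' X_j
  u <- rowsum(X * e, group = cluster)
  meat <- crossprod(u)            # sum_j u_j' u_j
  bread %*% meat %*% bread * (M / (M - 1)) * ((N - 1) / (N - K))
}

# Toy data (assumed for illustration): errors share a cluster-level shock
set.seed(1)
n_cl <- 50
cl <- rep(seq_len(n_cl), each = 10)
x <- rnorm(length(cl))
y <- 1 + 2 * x + rnorm(n_cl)[cl] + rnorm(length(cl))

fit <- lm(y ~ x)
X <- model.matrix(fit)
V <- cluster_vcov(X, resid(fit), cl)
se_cluster <- sqrt(diag(V))       # cluster-robust standard errors
```

A useful sanity check on the sketch: if every observation is its own cluster, then $M = N$ and the correction collapses to $N/(N-K)$, which is the usual heteroskedasticity-robust (HC1) estimator.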
Create a data frame with a group structure
# packages
library(dplyr)
library(lfe)
# data structure
N <- 100 # number of individuals (N * Y observations in total)
Y <- 8   # number of years
G <- 20  # number of groups
d <- data.frame(
  id = rep(1:N, each = Y),             # individual id
  year = rep(1:Y, N) + 2000,           # calendar year
  gid = rep(1:G, each = Y * (N / G))   # group id
)
# covariates
d$x1 <- rnorm(N * Y, 0, 1)
d$x2 <- rnorm(N * Y, 0, 1)
# error term
d$e <- rnorm(N * Y, 0, 12)
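# group treatment: one random 0/1 draw per group-year, merged back onto individuals
g <- unique(d[, c("gid", "year")])
g$treat <- sample(x = c(0, 1), size = G * Y, replace = TRUE)
d <- merge(d, g, by = c("gid", "year"), all.x = TRUE, sort = FALSE)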
# coefficients and outcomes
coef.x1 <- 1
coef.x2 <- 2
coef.treat <- 3
d$y<-coef.x1*d$x1 + coef.x2*d$x2 + coef.treat*d$treat + d$e
# first differences within each individual
a <- lapply(
  split(d, d$id),
  function(s) {
    # merge() does not guarantee row order, so sort by year before lagging
    s <- s[order(s$year), ]
    for (v in c("y", "x1", "x2", "treat")) {
      s[, paste0("d.", v)] <- s[, v] - dplyr::lag(s[, v])
    }
    s
  }
)
d <- do.call("rbind", a)
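First differences are only meaningful when the rows of each individual are in year order, and the first year of each individual has no lag to subtract. A minimal base-R sketch on a hypothetical toy panel (the data frame `toy` and its values are made up for illustration) shows both points:

```r
# Toy panel whose rows are deliberately out of year order
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2003, 2001, 2002, 2002, 2003, 2001),
  y    = c(5, 1, 2, 7, 10, 4)
)

# Sort by individual and year first, so each difference is y_t - y_{t-1}
toy <- toy[order(toy$id, toy$year), ]

# ave() applies the function within each id; the first year has no lag, hence NA
toy$d.y <- ave(toy$y, toy$id, FUN = function(v) c(NA, diff(v)))
print(toy)
```

Each individual contributes exactly one `NA`, which is why a first-difference regression on a panel with 100 individuals drops 100 rows.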
Fixed effects model
# felm formula parts: outcome ~ regressors | fixed effects | IV (0 = none) | cluster variable
f<-"y ~ treat + x1 + x2 | id + year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)
##
## Call:
## felm(formula = f, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -35.504  -7.736  -0.216   7.756  34.279
##
## Coefficients:
##       Estimate Cluster s.e. t value Pr(>|t|)
## treat   3.9349       0.6036   6.519 3.03e-06 ***
## x1      1.3643       0.3622   3.766  0.00131 **
## x2      2.0491       0.4017   5.101 6.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.06 on 690 degrees of freedom
## Multiple R-squared(full model): 0.1785 Adjusted R-squared: 0.04869
## Multiple R-squared(proj model): 0.06721 Adjusted R-squared: -0.08015
## F-statistic(full model, *iid*):1.375 on 109 and 690 DF, p-value: 0.01061
## F-statistic(proj model): 33.14 on 3 and 19 DF, p-value: 9.426e-08
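The fixed effects in the model above are projected out rather than estimated as dummies. For the simpler one-way case (individual effects only, which is an assumption of this sketch and not the two-way model fitted above), the equivalence between dummy variables and the within transformation can be checked in base R. The data frame `p` and its parameters are hypothetical.

```r
# Toy one-way panel: 30 individuals, 5 periods, regressor correlated with the unit effect
set.seed(42)
n_id <- 30
n_t <- 5
p <- data.frame(id = rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)                         # individual-specific intercepts
p$x <- rnorm(nrow(p)) + alpha[p$id]
p$y <- 2 * p$x + alpha[p$id] + rnorm(nrow(p))

# (a) least squares with one dummy per individual
b_lsdv <- coef(lm(y ~ x + factor(id), data = p))["x"]

# (b) within transformation: subtract individual means, then OLS without intercept
p$x_w <- p$x - ave(p$x, p$id)
p$y_w <- p$y - ave(p$y, p$id)
b_within <- coef(lm(y_w ~ x_w - 1, data = p))["x_w"]

# The two slope estimates agree up to numerical tolerance
print(c(lsdv = unname(b_lsdv), within = unname(b_within)))
```

Only the point estimates coincide here; the naive standard errors from regression (b) use the wrong degrees of freedom because they ignore the absorbed individual means, which is one reason the FE formula carries an extra correction factor.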
First difference model
# no id fixed effect here: differencing already removes the individual effect
f<-"d.y ~ d.treat + d.x1 + d.x2 | year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)
##
## Call:
## felm(formula = f, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -50.121 -11.719  -0.356  11.053  62.594
##
## Coefficients:
##         Estimate Cluster s.e. t value Pr(>|t|)
## d.treat   4.5580       0.8220   5.545 4.19e-08 ***
## d.x1      1.2221       0.3635   3.362 0.000816 ***
## d.x2      2.0939       0.6486   3.228 0.001304 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.21 on 690 degrees of freedom
## (100 observations deleted due to missingness)
## Multiple R-squared(full model): 0.08241 Adjusted R-squared: 0.07044
## Multiple R-squared(proj model): 0.0779 Adjusted R-squared: 0.06587
## F-statistic(full model, *iid*):6.885 on 9 and 690 DF, p-value: 1.573e-09
## F-statistic(proj model): 18.18 on 3 and 19 DF, p-value: 8.244e-06
Links and References
Angrist, J. D. & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.