Code: Group clustering with panel estimations

March 20, 2020

Clustering

N - Number of observations; M - Number of clusters; K - Number of regressors (rank)

Clustering Fixed Effects (FE) estimations

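\[\frac{(N-K)}{(N-K)-(M-1)} \cdot (X'X)^{-1} \, \sum_{j=1}^M u_{j}'u_{j} \, (X'X)^{-1} \cdot \frac{M}{M-1} \frac{N-1}{N-K}\]

where $u_{j} = e_{j}' X_{j}$ is the $1 \times K$ score of cluster $j$, with $X_{j}$ the regressor rows and $e_{j}$ the residuals of the observations in that cluster.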

Clustering First Difference (FD) estimations

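\[(X'X)^{-1} \, \sum_{j=1}^M u_{j}'u_{j} \, (X'X)^{-1} \cdot \frac{M}{M-1} \frac{N-1}{N-K}\]

where $u_{j} = e_{j}' X_{j}$ is the $1 \times K$ score of cluster $j$, with $X_{j}$ the regressor rows and $e_{j}$ the residuals of the observations in that cluster.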
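
The first-difference formula can be traced step by step in base R. The following is a minimal sketch on simulated data, not the internal computation of `felm`: the data, the cluster structure (`gid`), and all variable names are assumptions made for illustration, and the model contains only an intercept and two regressors with no absorbed fixed effects.

```r
set.seed(1)

# Simulated data (hypothetical): M clusters with n_j observations each
M   <- 20
n_j <- 10
N   <- M * n_j
gid <- rep(1:M, each = n_j)
x1  <- rnorm(N)
x2  <- rnorm(N)
# A cluster-level shock makes the errors correlated within clusters
y   <- 1 * x1 + 2 * x2 + rnorm(M)[gid] + rnorm(N)

# Regressor matrix including the intercept; K is its number of columns
X <- cbind(const = 1, x1, x2)
K <- ncol(X)

# OLS residuals
e <- lm.fit(X, y)$residuals

# "Bread": (X'X)^{-1}
XtX_inv <- solve(crossprod(X))

# Row j of U is u_j = e_j' X_j, the sum of e_i * x_i over cluster j
U <- rowsum(X * e, group = gid)

# "Meat": sum over clusters of u_j' u_j
meat <- crossprod(U)

# Small-sample adjustment M/(M-1) * (N-1)/(N-K)
adj <- M / (M - 1) * (N - 1) / (N - K)

# Clustered covariance matrix and standard errors
V  <- XtX_inv %*% meat %*% XtX_inv * adj
se <- sqrt(diag(V))
print(se)
```

`rowsum()` collapses the observation-level scores to one row per cluster, so `crossprod(U)` is the sum of the outer products $u_{j}'u_{j}$ without an explicit loop.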

Create a data frame with a group structure

# packages
library(dplyr)
library(lfe)
# data structure
N<-100 #number of individuals (each observed in Y years)
Y<-8 #number of years
G<-20 #number of groups
d <- data.frame(
  id = rep(1:N, each = Y), # individual id
  year = rep(1:Y, N) + 2000,
  gid = rep(1:G, each = Y * (N/G)) #group id
)
# covariates
d$x1 <- rnorm(N * Y, 0, 1)
d$x2 <- rnorm(N * Y, 0, 1)
# error term
d$e <- rnorm(N * Y, 0, 12)
# group treatment
g<-unique(d[,c("gid","year")])
g$treat<-sample(x=c(0,1),size=G * Y,replace=T)
d<-merge(d,g,by=c("gid","year"),all.x=T,sort=F)
# coefficients and outcomes
coef.x1 <- 1
coef.x2 <- 2
coef.treat <- 3
d$y<-coef.x1*d$x1 + coef.x2*d$x2 + coef.treat*d$treat +   d$e
# first differences
a<-lapply(
  split(d,d$id),
  function(s){
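    # sort by year so that lag() refers to the previous year
    # (merge(..., sort=F) does not guarantee the row order)
    s<-s[order(s$year),]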
    n<-c("y","x1","x2","treat")
    for (v in n){
      # print(v)
      s[,paste0("d.",v)]<-s[,v]-dplyr::lag(s[,v])
    }
    return(s)
  }
)
d<-do.call("rbind",a)
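
As a cross-check on the differencing step, the same within-individual first difference can be written in base R with `ave()`. This is a sketch on a toy panel with made-up values; the data frame `toy` and its columns are hypothetical and not part of the simulation above.

```r
# Toy panel (hypothetical values): 3 individuals observed over 4 years
toy <- data.frame(
  id   = rep(1:3, each = 4),
  year = rep(2001:2004, times = 3),
  y    = c(1, 3, 6, 10,  2, 2, 5, 4,  0, 1, 1, 3)
)

# Sort by individual and year so that differences follow the time order
toy <- toy[order(toy$id, toy$year), ]

# First difference within each individual;
# the first year of every individual has no predecessor and becomes NA
toy$d.y <- ave(toy$y, toy$id, FUN = function(v) c(NA, diff(v)))

print(toy)
```

Each individual loses exactly one observation, so a panel with 100 individuals yields 100 missing values in the differenced variables.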

Fixed effects model

f<-"y ~ treat + x1 + x2  | id + year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)

##
## Call:
##    felm(formula = f, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -35.504  -7.736  -0.216   7.756  34.279
##
## Coefficients:
##       Estimate Cluster s.e. t value Pr(>|t|)
## treat   3.9349       0.6036   6.519 3.03e-06 ***
## x1      1.3643       0.3622   3.766  0.00131 **
## x2      2.0491       0.4017   5.101 6.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.06 on 690 degrees of freedom
## Multiple R-squared(full model): 0.1785   Adjusted R-squared: 0.04869
## Multiple R-squared(proj model): 0.06721   Adjusted R-squared: -0.08015
## F-statistic(full model, *iid*):1.375 on 109 and 690 DF, p-value: 0.01061
## F-statistic(proj model): 33.14 on 3 and 19 DF, p-value: 9.426e-08
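
The role of the `id + year` part of the formula can be illustrated without `lfe`. The sketch below assumes a small balanced panel with simulated data (all names and values are hypothetical) and shows that removing individual and year means by hand gives the same slope as a regression with explicit dummies. It reproduces only the point estimate, not the clustered standard errors, and the simple demeaning formula holds only for balanced panels.

```r
set.seed(2)

# Small balanced panel (hypothetical): 30 individuals, 5 years
n_id   <- 30
n_year <- 5
p <- data.frame(
  id   = rep(1:n_id, each = n_year),
  year = rep(1:n_year, times = n_id)
)
p$x <- rnorm(nrow(p))
# Outcome with individual effects, year effects, and noise
p$y <- 1.5 * p$x + rnorm(n_id)[p$id] + rnorm(n_year)[p$year] + rnorm(nrow(p))

# Two-way within transformation for a balanced panel:
# subtract individual and year means, add back the overall mean
within2 <- function(v, id, year) v - ave(v, id) - ave(v, year) + mean(v)
y_w <- within2(p$y, p$id, p$year)
x_w <- within2(p$x, p$id, p$year)

# Slope from the demeaned regression (no intercept needed after demeaning)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Slope from the regression with explicit individual and year dummies
b_dummy <- coef(lm(y ~ x + factor(id) + factor(year), data = p))[["x"]]

print(all.equal(b_within, b_dummy))
```

With many individuals the dummy regression becomes slow, which is the practical reason to project the fixed effects out instead of estimating them.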

First difference model

f<-"d.y ~ d.treat + d.x1 + d.x2  | year | 0 | gid"
f<-as.formula(f)
e<-felm(formula=f,data=d)
summary(e)

##
## Call:
##    felm(formula = f, data = d)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -50.121 -11.719  -0.356  11.053  62.594
##
## Coefficients:
##         Estimate Cluster s.e. t value Pr(>|t|)
## d.treat   4.5580       0.8220   5.545 4.19e-08 ***
## d.x1      1.2221       0.3635   3.362 0.000816 ***
## d.x2      2.0939       0.6486   3.228 0.001304 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.21 on 690 degrees of freedom
##   (100 observations deleted due to missingness)
## Multiple R-squared(full model): 0.08241   Adjusted R-squared: 0.07044
## Multiple R-squared(proj model): 0.0779   Adjusted R-squared: 0.06587
## F-statistic(full model, *iid*):6.885 on 9 and 690 DF, p-value: 1.573e-09
## F-statistic(proj model): 18.18 on 3 and 19 DF, p-value: 8.244e-06

Links and References

Angrist, J. D. & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.


Elías Cisneros
