CESER: An R Package to Compute Cluster Estimated Standard Errors

Diogo Ferrari

(1) Overview

Introduction

A common problem in regression analysis that requires correcting the estimated standard errors of the regression coefficients is correlation between the residuals of observations that share some observed grouping feature. For instance, people who live in the same city, state, or country can behave more similarly than people randomly sampled from different cities, states, or countries. The example extends to any data in which some observations have shared characteristics or belong to the same collective entity or institutional setting. For instance, people from the same school, patients from the same hospital, or groups of the same gender or race can behave more similarly than people across those groups. The within-group correlation can be caused by unobserved shared characteristics of the observations in the groups, such as unobserved school-specific educational policies or unobserved patterns of behavior of doctors in different hospitals.

Non-zero within-group correlations violate a common assumption of classical multivariate regression models, namely that the residuals are independent, or simply uncorrelated. If one mistakenly assumes the residuals are independent or uncorrelated, the estimated standard errors of the regression coefficients will typically be biased downward, which leads to narrower estimated confidence intervals and therefore a higher chance of rejecting the hypothesis that the coefficients are zero. This can mislead researchers into being overconfident that their working hypothesis of a non-zero effect is true. We can see that easily with a simple example.

Suppose we estimate the following population regression model:

\[ y = X\beta + \varepsilon \]

where X ∈ (1, ℝ^k), β ∈ ℝ^{(k+1) × 1}, y ∈ ℝ, and the last element is the error (or deviance) term ɛ ∈ ℝ. We collect i = 1, …, n observations to estimate β, which gives the statistical equation for each i with the following residuals e:

\[ y_i = X_i\beta + e_i. \]

We usually take X as given (measured without error) and use the OLS estimator $\hat\beta$ of β, which is obtained by finding the argument that minimizes the sum of squared residuals (e) between the observed outcome (y) and the outcome if no error had occurred (Xβ):

\[ \hat\beta = \mathop{\mathrm{arg\,min}}_{\beta}\,(e^T e) = \mathop{\mathrm{arg\,min}}_{\beta}\,(y - X\beta)^T (y - X\beta). \]

Assuming X^TX is invertible, the first order condition gives the solution for that optimization problem:

\[ \hat\beta = (X^T X)^{-1} X^T y. \]
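The closed-form solution is easy to check numerically. The following is a minimal sketch on simulated data (the data and every object name are invented for this illustration) that solves the normal equations directly and compares the result with lm():

```r
# Minimal sketch: OLS via the normal equations, checked against lm().
# The data and all object names are invented for this illustration.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                              # design matrix with intercept column
beta_hat <- solve(crossprod(X), crossprod(X, y))   # solves (X'X) b = X'y

fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))
```

Solving the linear system with solve(A, b) avoids forming the explicit inverse of X^TX; lm() itself relies on a QR decomposition, so the two results agree only up to floating-point error.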

Up to this point, if we were simply computing an OLS point estimate of β using $\hat\beta$, no assumptions would be needed about the distribution of the residuals (e_i). We impose assumptions about the distribution of e to go one step further and make inferences about $\hat\beta$ and investigate its statistical properties. The distribution of our estimator $\hat\beta$, and therefore our inferences, comes from the assumptions about the distribution of e. Denote that distribution generically by f(e | θ), that is:

\[ e \sim f(e \mid \theta). \]

We can easily derive the first and second moments of $\hat\beta$:

\[ \hat\beta = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X\beta + e) = \beta + (X^T X)^{-1} X^T e \]

which gives:

\[ \mu_\beta = \mathbb{E}[\hat\beta \mid X, \theta] = \beta + (X^T X)^{-1} X^T\, \mathbb{E}[e \mid \theta] \tag{1} \]

and

\[ \Sigma_{\hat\beta} = \mathbb{V}\mathrm{ar}[\hat\beta \mid X, \theta] = (X^T X)^{-1} X^T\, \mathbb{V}\mathrm{ar}[e \mid \theta]\, X (X^T X)^{-1}. \tag{2} \]

Assumptions about f(e | θ) will give the small sample properties of the estimator $\hat\beta$. The classical assumption is that all residuals e come from the same normal distribution with mean zero, and that they are uncorrelated. That is:

\[ e \sim \mathcal{N}(0, \sigma^2 I) \tag{3} \]

If we assume that 𝔼[e | θ] = 0, as in the expression (3), then $\hat\beta$ is unbiased (𝔼[$\hat\beta$ | X, θ] = β), and its standard error is simply:

\[ \mathrm{se}(\hat\beta) = \sqrt{(X^T X)^{-1} \hat\sigma^2} \tag{4} \]

with the estimated variance of e given by []:

\[ \hat\sigma^2 = \frac{(y - X\hat\beta)^T (y - X\hat\beta)}{n - (K + 1)}. \]
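The step from equation (2) to equation (4) is short but worth writing out. As a sketch, substituting the classical assumption (3), under which the variance of e is σ²I, into equation (2) gives:

\[ \mathbb{V}\mathrm{ar}[\hat\beta \mid X, \theta] = (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} (X^T X) (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}. \]

Replacing σ² with its estimate and taking the square roots of the diagonal elements yields the coefficient standard errors in equation (4). The simplification depends entirely on the middle term being proportional to the identity matrix, which is the part that fails when residuals are correlated within groups.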

Equation (4) provides the exact confidence interval for $\hat\beta$:

\[ CI[\hat\beta] = (\hat\beta - t * \mathrm{se}(\hat\beta),\ \hat\beta + t * \mathrm{se}(\hat\beta)). \tag{5} \]

In the expression (5), the value of t comes from a t-distribution and it is given by:

\[ p(T < |t|) = 1 - \alpha. \]

The common practice is to choose α = 0.05, which gives the 95% confidence interval of $\hat\beta$.
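Equations (4) and (5) can be reproduced by hand in a few lines. The sketch below uses simulated data and invented object names, and it assumes the usual two-sided convention in which t is the 1 − α/2 quantile of a t-distribution with n − (K + 1) degrees of freedom. It then compares the results with summary() and confint() applied to an lm() fit:

```r
# Minimal sketch: classical standard errors and confidence intervals by hand.
# Simulated data; all object names are invented for this illustration.
set.seed(2)
n <- 60
x <- rnorm(n)
y <- 0.5 + 1.5 * x + rnorm(n)

X   <- cbind(1, x)
fit <- lm(y ~ x)
e   <- y - X %*% coef(fit)                      # residuals

sigma2_hat <- sum(e^2) / (n - ncol(X))          # n - (K + 1) degrees of freedom
se_hat     <- sqrt(diag(solve(crossprod(X))) * sigma2_hat)

alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - ncol(X))   # two-sided critical value (assumed)
ci     <- cbind(coef(fit) - t_crit * se_hat,
                coef(fit) + t_crit * se_hat)

all.equal(unname(se_hat), unname(summary(fit)$coefficients[, "Std. Error"]))
all.equal(unname(ci), unname(confint(fit, level = 1 - alpha)))
```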

The standard output of the lm() function to estimate linear models in R assumes the zero-mean normal distribution with uncorrelated residuals, which gives the estimated standard errors shown in equation (4) above [, , ].

The clustering problem emerges in grouped data. Consider that each observation i belongs to a group g, that there are G groups in the data, and that the error terms (e) for individual observations in the same group are correlated. Following the examples above, let us say that multiple observations come from the same schools, hospitals, or countries. It is likely that the assumption of independence of the residuals is violated because individuals of the same group probably share some unobserved characteristics that affect their behavior, which creates a non-zero correlation between the residuals within the observed groups. Then, keeping all the other assumptions of the classical regression model, the distribution of the disturbances can be more generally denoted by:

\[ e \sim \mathcal{N}(0, \Sigma). \]

In this case the standard errors of $\hat\beta$ under the assumption of independence or zero correlation of the residuals (se($\hat\beta$)) differ from the standard errors computed when the within-group correlations are taken into account (se_g($\hat\beta$)):

\[ \mathrm{se}(\hat\beta) = \sqrt{(X^T X)^{-1} \hat\sigma^2} \ne \sqrt{(X^T X)^{-1} (X^T \hat\Sigma X) (X^T X)^{-1}} = \mathrm{se}_g(\hat\beta) \]

Typically, se($\hat\beta$) < se_g($\hat\beta$). This means that assuming uncorrelated residuals produces confidence intervals of $\hat\beta$ that are narrower than the true ones, and that the researcher will be overconfident about the range of values of the linear coefficients that seem consistent with the data.
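The size of the gap can be illustrated without any simulation noise by plugging a known Σ into both formulas. The sketch below assumes a balanced design, a regressor that is constant within groups, and an equicorrelated error structure with an arbitrarily chosen within-group correlation of 0.3. All values and object names are invented for the illustration:

```r
# Minimal sketch: naive vs. cluster-aware standard errors when the true error
# covariance is known. All values below are invented for this illustration.
G      <- 10                     # number of groups
n_g    <- 20                     # observations per group
sigma2 <- 1                      # error variance
rho    <- 0.3                    # within-group error correlation (assumed)

group <- rep(seq_len(G), each = n_g)
x     <- rep(seq(-1, 1, length.out = G), each = n_g)   # group-level regressor
X     <- cbind(1, x)

# Block-diagonal Sigma: sigma2 on the diagonal, sigma2 * rho within groups,
# zero across groups.
Sigma <- sigma2 * (rho * outer(group, group, "==") + (1 - rho) * diag(G * n_g))

XtX_inv <- solve(crossprod(X))
V_naive <- XtX_inv * sigma2                            # classical formula
V_clust <- XtX_inv %*% t(X) %*% Sigma %*% X %*% XtX_inv  # sandwich formula

se_naive <- sqrt(diag(V_naive))
se_clust <- sqrt(diag(V_clust))
rbind(se_naive, se_clust)
```

In this particular design both standard errors are larger than the naive ones by the same factor, sqrt(1 + (n_g − 1)ρ), which is about 2.59 here. With regressors that vary within groups or with unbalanced groups the ratio would differ by coefficient.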

There are several approaches to deal with this problem. One is to adjust the confidence intervals. Imbens and Kolesar [] adjust the degrees of freedom of the t-distribution, producing larger values of t used to construct the confidence intervals. Another approach uses bootstrap methods [, , , ] (see also []). For lack of space, below we briefly review only two other approaches: the Cluster Robust Standard Errors (CRSE), which is widely used by practitioners, and the Cluster Estimated Standard Errors (CESE) proposed by Jackson [], whose R implementation is first presented in this paper, alongside an applied example and a brief discussion of cases in which one of these two methods, CRSE or CESE, may be preferred.

Clustered Standard Errors Corrections

Cluster Robust Standard Errors (CRSE)

The CRSE is the routine solution used by researchers to deal with the estimation of clustered standard errors in grouped data [, , , ]. If the individual-level observations are divided into groups g (e.g., schools, countries, etc.), with g = 1, …, G, we can rewrite the estimated variance of $\hat\beta$ in equation (2) as:

\[ \hat\Sigma_\beta = (X^T X)^{-1} \left[ \sum_{g=1}^{G} X_g^T \hat\Sigma_g X_g \right] (X^T X)^{-1} \tag{6} \]

The key problem is how to estimate $\hat\Sigma_g$, the variance-covariance matrix of the residuals for group g. The CRSE solution is to use the raw estimated residuals from the OLS estimates of β, and compute $\hat\Sigma_g$ using y_g and X_g, the output variable and the covariates, respectively, of observations in group g. It gives the CRSE estimator $\hat\Sigma_g^{\mathrm{CRSE}}$ as follows:

\[ \hat\Sigma_g^{\mathrm{CRSE}} = (y_g - X_g\hat\beta)(y_g - X_g\hat\beta)^T = e_g e_g^T \]

In practice, to compute the CRSE we don’t need to estimate Σ_g. We just need to compute the covariance matrix of the scores $\hat s_g = X_g^T e_g$ for each group g, and use $X_g^T \hat\Sigma_g X_g = X_g^T e_g e_g^T X_g$. The R package sandwich provides some functions to estimate clustered standard errors using the CRSE solution [], and the package clubSandwich provides many other functionalities, including some to improve performance with small samples [].
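The score-based shortcut can be sketched in base R. The code below is an illustration only: it uses simulated data and invented object names, applies no small-sample correction, and is not taken from the sandwich or clubSandwich packages. It builds the middle term of equation (6) from the group scores and checks it against the group-by-group construction from e_g e_g^T:

```r
# Minimal sketch of the CRSE sandwich in equation (6), computed from the
# group-level scores. No small-sample correction is applied. Simulated data;
# all object names are invented for this illustration.
set.seed(3)
G     <- 8
n_g   <- 12
group <- rep(seq_len(G), each = n_g)
x     <- rnorm(G * n_g)
u     <- rnorm(G)[group]                  # shared group shock -> correlated errors
y     <- 1 + 0.5 * x + u + rnorm(G * n_g)

X   <- cbind(1, x)
fit <- lm(y ~ x)
e   <- as.numeric(residuals(fit))

# Scores s_g = X_g' e_g: sum the rows of X * e within each group (G x 2 matrix).
scores <- rowsum(X * e, group)
meat   <- crossprod(scores)               # sum_g s_g s_g' = sum_g X_g' e_g e_g' X_g

bread   <- solve(crossprod(X))            # (X'X)^{-1}
V_crse  <- bread %*% meat %*% bread
se_crse <- sqrt(diag(V_crse))

# Same middle term built group by group from e_g e_g'.
meat_direct <- Reduce(`+`, lapply(split(seq_along(e), group), function(idx) {
  Xg <- X[idx, , drop = FALSE]
  t(Xg) %*% tcrossprod(e[idx]) %*% Xg
}))
all.equal(unname(meat), unname(meat_direct))
```

In applied work one would normally rely on an established implementation, such as the vcovCL() function in the sandwich package, which also offers finite-sample adjustments that this sketch omits.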

Djogbenou et al. [] demonstrate the asymptotic validity of the CRSE solution under general conditions. Its limitations include poor reliability of the estimated errors if the number of clusters is small, and sensitivity both to heterogeneity across clusters and to variability of cluster sizes. Djogbenou et al. [] provide an extensive treatment of the topic. The CRSE can be biased downward for small samples, and possibly for large samples as well, and can seriously underestimate the true standard errors in many cases [, , ]. Jackson [] also shows other conditions that lead the $\hat\Sigma_g^{\mathrm{CRSE}}$ to provide values that underestimate the true $\Sigma_\beta$, and therefore the confidence intervals of the regression coefficients. The author proposes an alternative approach to estimate Σ_g called CESE, which I discuss next.

Cluster Estimated Standard Errors (CESE)

Jackson [] proposes an approach labeled CESE to estimate the standard errors in grouped data with within-group correlation in the residuals. The approach is based on the estimated expectation of the product of the residuals. Assuming that the residuals have the same variance-covariance matrix within the groups, if we denote by σ_ig = $\sigma_g^2$ and ρ_ig = ρ_g the variance and the covariance, respectively, of the residuals within the group g, then the expectation of the product of the residuals is given by (see [] for details):

\[
\begin{aligned}
\Sigma_g = \mathbb{E}[e_g e_g^T] = {} & \sigma_g^2 (I_g - P_g) + \rho_g \Big[ \iota_g \iota_g^T - (I_g - P_g) - (P_g \iota_g \iota_g^T + \iota_g \iota_g^T P_g) \\
& + X_g (X^T X)^{-1} \Big( \sum_{g=1}^{G} X_g^T \iota_g \iota_g^T X_g \Big) (X^T X)^{-1} X_g^T \Big]
\end{aligned}
\tag{7}
\]

where ι_g is a column vector of ones of length n_g (the number of observations in group g), I_g is an n_g × n_g identity matrix, and $P_g = X_g (X^T X)^{-1} X_g^T$. Equation (7) can be rewritten concisely as:

\[ \Sigma_g = \sigma_g^2 Q_{1g} + \rho_g Q_{2g}. \tag{8} \]

The equation above explicitly shows that the expectation of the cross-product of the residuals is a function of the data, through Q_1g and Q_2g, and of the unknown variance $\sigma_g^2$ and correlation ρ_g of the residuals e_g in each group g. The CESE solution is to exploit the linear structure of equation (8) and to estimate $\sigma_g^2$ and ρ_g as if the estimated values of $e_g e_g^T$ were random deviances from their expectations. Denote that deviance by ξ. Then

\[
\begin{aligned}
e_g e_g^T &= \mathbb{E}[e_g e_g^T] + \xi \\
&= \sigma_g^2 Q_{1g} + \rho_g Q_{2g} + \xi \\
&= \Sigma_g + \xi.
\end{aligned}
\tag{9}
\]

The estimates of $\sigma_g^2$ and ρ_g are obtained using the OLS estimator. That is, if we denote $\Omega_g = (\sigma_g^2, \rho_g)^T$, q_1g (or q_2g) the vectorized diagonal and lower triangle of Q_1g (or Q_2g) stacked into an n_g(n_g + 1)/2 column vector, q_g = [q_1g, q_2g], and s_eg the corresponding elements of $e_g e_g^T$ stacked into a column vector as well, then the OLS CESE estimator $\hat\Omega_g = (\hat\sigma_g^2, \hat\rho_g)^T$ of the variance and correlation of the residuals in group g is given by

\[ \hat\Omega_g = \mathop{\mathrm{arg\,min}}_{\Omega_g}\,(s_{eg} - q_g \Omega_g)^T (s_{eg} - q_g \Omega_g). \]

As pointed out above for the OLS estimator of β, if we assume that $q_g^T q_g$ is invertible, the first order condition gives:

\[ \hat\Omega_g = (q_g^T q_g)^{-1} q_g^T s_{eg}. \tag{10} \]

We can rewrite the equation (10) as:

\[
\begin{bmatrix} \hat\sigma_g^2 \\ \hat\rho_g \end{bmatrix}
=
\begin{bmatrix} q_{1g}^T q_{1g} & q_{1g}^T q_{2g} \\ q_{2g}^T q_{1g} & q_{2g}^T q_{2g} \end{bmatrix}^{-1}
\begin{bmatrix} q_{1g}^T s_{eg} \\ q_{2g}^T s_{eg} \end{bmatrix}.
\tag{11}
\]

As explained above for the OLS estimates of β, the estimators of $\sigma_g^2$ and ρ_g do not, per se, require any assumption on ξ, unless we want to construct confidence intervals for the estimates of those parameters.
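Equations (10) and (11) amount to an ordinary least-squares solve with two regressors. The following is a minimal sketch in base R that uses simulated stand-ins for $q_g$ and $s_{eg}$. The data and the names `q_g`, `s_eg`, `omega_hat`, and `omega_lm` are illustrative assumptions, not the internals of the ceser package.

```r
# Illustrative stand-ins: q_g has two columns (q_1g, q_2g); s_eg is the response vector.
set.seed(1)
m    <- 30                                   # number of rows in this toy system
q_g  <- cbind(q1 = runif(m), q2 = runif(m))
s_eg <- as.vector(q_g %*% c(2.0, 0.5) + rnorm(m, sd = 0.1))

# Equation (10): solve the normal equations (q_g' q_g) Omega = q_g' s_eg.
omega_hat <- solve(crossprod(q_g), crossprod(q_g, s_eg))
rownames(omega_hat) <- c("sigma2_g", "rho_g")

# The same estimate from a regression without intercept, as a cross-check.
omega_lm <- coef(lm(s_eg ~ 0 + q_g))
print(drop(omega_hat))
```

Solving the normal equations directly mirrors the notation of equation (10). When $q_g^T q_g$ is close to singular, a QR-based solve such as `qr.solve(q_g, s_eg)` is usually the numerically safer choice.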

The CESE is attractive when its assumptions hold and the CRSE is believed to be unreliable. Jackson [] shows that CESE produces larger standard errors for the coefficients and much more conservative confidence intervals than the CRSE, which is known to be biased downward in the cases mentioned above. CESE is also less sensitive to the number of clusters and to the heterogeneity of the clusters, which can be a problem for both CRSE and bootstrap methods.

However, it is important to note that the CESE is not a replacement for the CRSE, because the two methods are based on different parametric assumptions. The CESE requires some assumptions that can be considered stronger than those of the CRSE approach, as equations (7) to (11) indicate (see more details in []). Each approach may be better suited to different situations. One example is the CESE assumption that the residuals have the same variance-covariance matrix within the groups. For instance, if we cluster by geographic location, but individual data are observed at different points in time as in Bertrand et al. [], then the assumption of the same within-cluster residual variation is probably violated, and we would have to cluster the standard errors by time as well. Another example: when one uses fixed-effect models for the clusters, and the correlation of the residuals comes only from cluster-level effects, the cluster fixed effects explain all the variation at the cluster level, and the term ρ_g will be close to zero. In that case, the CESE may be a less appealing alternative. However, when the limitations of the CRSE discussed above are a problem, the CESE is a better choice and produces more conservative standard errors.

I implemented the CESE in R. It is available in the package named ceser. The next section presents some details of the implementation as well as an example illustrating how to use the software in practice.

Implementation and architecture

Computing the CESE

The package ceser provides a function vcovCESE() that takes the output of the function lm() (or any other that produces compatible outputs) and computes the Cluster Estimated Standard Errors (CESE). The basic structure of the function is:


R> vcovCESE(mod, cluster = NULL, type=NULL)

The parameter mod receives the output of the lm() function. The parameter cluster can receive a right-hand side R formula with the summation of the variables in the data that will be used to cluster the standard errors. For instance, if one wants to cluster the standard errors by country, one can use:


R> vcovCESE(…, cluster = ~ country, …)

To cluster by country and gender, simply use the following (note that each cluster then contains the observations for one gender within one country):


R> vcovCESE(…, cluster = ~ country + gender, …)

The parameter cluster can also receive, instead of a formula, a string vector with the name of the variables that contain the groups to cluster the standard errors. If cluster = NULL, each observation is considered its own group to cluster the standard errors.
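To make the grouping logic concrete, the sketch below uses base R only to show which clusters result from combining two variables, and what treating each observation as its own group looks like. The toy data frame and the names `d`, `cluster_id`, and `singleton_id` are hypothetical; the sketch illustrates the description above and is not taken from the ceser source code.

```r
# Toy data; the column names mirror the example in the text.
d <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  gender  = c("f", "m", "f", "f", "m", "m"),
  stringsAsFactors = FALSE
)

# Clustering by country and gender: one cluster per observed combination.
cluster_id <- interaction(d$country, d$gender, drop = TRUE)
print(table(cluster_id))

# With cluster = NULL, every observation is treated as its own group.
singleton_id <- seq_len(nrow(d))
```

In this toy example the six observations fall into four country-by-gender clusters, whereas the singleton grouping yields six groups of size one.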

The parameter type receives the procedure to use for heteroskedasticity correction. Heteroskedasticity occurs when the diagonal elements of Σ are not constant across observations. The correction can also be used to deal with underestimation of the true variance of the residuals due to leverage produced by outliers. The package includes five types of correction: type can be one of "HC0", "HC1", "HC2", "HC3", or "HC4" []. Denote by e_ic the corrected residual of observation i. Each option produces the following correction:

\[
\begin{array}{ll}
\text{HC0:} & e_{ic} = e_i \\
\text{HC1:} & e_{ic} = e_i \sqrt{\dfrac{n}{n - k}} \\
\text{HC2:} & e_{ic} = e_i \, \dfrac{1}{\sqrt{1 - h_{ii}}} \\
\text{HC3:} & e_{ic} = e_i \, \dfrac{1}{1 - h_{ii}} \\
\text{HC4:} & e_{ic} = e_i \, \dfrac{1}{\sqrt{(1 - h_{ii})^{\delta_i}}}
\end{array}
\]

where k is the number of covariates, h_ii is the i-th diagonal element of the matrix $P = X(X^T X)^{-1} X^T$, and $\delta_i = \min(4, h_{ii} n / k)$.
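The five corrections can be computed by hand with base R, which may help when checking what each option does to the residuals. The sketch below is an illustration on the built-in mtcars data, not the package's internal code. It assumes that k counts all estimated coefficients, including the intercept; the object names are hypothetical.

```r
# Fit any linear model; mtcars ships with base R.
mod <- lm(mpg ~ wt + hp, data = mtcars)

e <- residuals(mod)          # raw residuals e_i
h <- hatvalues(mod)          # h_ii, the diagonal of P = X (X'X)^{-1} X'
n <- length(e)               # number of observations
k <- length(coef(mod))       # assumption: k counts all estimated coefficients
delta <- pmin(4, h * n / k)  # HC4 exponent delta_i

# Corrected residuals e_ic under each option, one column per type.
e_c <- cbind(
  HC0 = e,
  HC1 = e * sqrt(n / (n - k)),
  HC2 = e / sqrt(1 - h),
  HC3 = e / (1 - h),
  HC4 = e / sqrt((1 - h)^delta)
)
print(head(round(e_c, 3)))
```

Because 0 < h_ii < 1, HC3 inflates each residual at least as much as HC2, and observations with high leverage are inflated the most.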

The estimation also corrects for cases in which $\rho_g > \sigma_g^2$. Following Jackson [], we use $\hat{\sigma}_g^2 = \hat{\rho}_g + 0.02$ in those cases.
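The adjustment rule is simple enough to state as a small function. The helper below is a hypothetical sketch of the rule as described in the text (the name `adjust_sigma2` and its arguments are assumptions), applied to the estimates of a single cluster.

```r
# Hypothetical helper: apply the adjustment described in the text to one cluster.
adjust_sigma2 <- function(sigma2_hat, rho_hat, eps = 0.02) {
  if (rho_hat > sigma2_hat) rho_hat + eps else sigma2_hat
}

print(adjust_sigma2(sigma2_hat = 1.50, rho_hat = 0.40))  # variance estimate kept
print(adjust_sigma2(sigma2_hat = 0.30, rho_hat = 0.40))  # variance estimate replaced
```

The rule only changes the variance estimate when the estimated within-cluster covariance exceeds it, presumably because such a pair would imply an inadmissible within-cluster correlation above one.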

Example with application

In applied regression analyses, the practitioner is usually interested in estimating the linear coefficients and their standard errors to evaluate whether the confidence interval of the point estimates of the coefficients includes the null value. This means that the two quantities of interest are $\hat{\beta}$ and se($\hat{\beta}$).

In this section, we compare the standard output of the lm() function with the standard errors of the linear coefficients produced by the CRSE, as computed by the widely used R package sandwich [], and those produced by the ceser package, which contains my implementation of the CESE method proposed by Jackson []. As discussed in the previous section, in general the CESE should be more conservative, produce larger estimates of the standard errors, and result in wider confidence intervals.

To illustrate how to use the ceser package, and to compare the three estimates of the standard errors (raw, CRSE, and CESE), we use the data set dcese provided with the ceser package. The data set was used in Jackson [] and comes from Elgie et al. []. It contains information on 310 observations (i = 1, …, 310) across 51 countries (g = 1, …, 51). The outcome variable is the effective number of legislative parties (enep). The explanatory variables are: the number of presidential candidates (enpc); a measure of presidential power (fapres); the proximity of presidential and legislative elections (proximity); the effective number of ethnic groups (eneg); the log of average district magnitudes (logmag); an interaction term between the number of presidential candidates and presidential power (enpcfapres = enpc × fapres); and another interaction term between the log of the district magnitude and the number of ethnic groups (logmag_eneg = logmag × eneg). Elgie et al. [] present regression analyses showing a strong relationship between enep and fapres, enpc, and their interaction. The effective number of legislative parties increases with the number of presidential candidates, but decreases with presidential power. The interaction term has a positive coefficient, implying that the negative association between the number of legislative parties and presidential power attenuates as the number of candidates increases. They use a variety of standard error corrections, including CRSE. We reproduce their study here, and include the estimation of the standard errors using CESE as in Jackson [].

Let us start with the functions that provide the variance-covariance matrix of the estimated coefficients $\hat{\beta}$. For all the examples below, we use the HC3 correction. Table 1 below also uses HC1 for comparison. We begin by loading the package and the data:


R> library(ceser)
R> data(dcese)

Table 1

Comparing raw standard errors, CRSE, and CESE.

Standard errors

COVARIATE	ESTIMATE	RAW	CRSE_HC1	CRSE_HC3	CESE
(Intercept)	2.7043	0.5848	0.4886	0.6135	1.2641
enpc	0.3040	0.1889	0.2517	0.3050	0.3542
fapres	–0.6118	0.1654	0.2038	0.2727	0.3784
enpcfapres	0.2078	0.0604	0.0826	0.1039	0.1127
proximity	–0.0224	0.2786	0.2544	0.3208	0.3738
eneg	–0.0657	0.1479	0.1415	0.1659	0.1986
logmag	–0.1815	0.2463	0.4387	0.5229	0.4737
logmag_eneg	0.3605	0.1334	0.2883	0.3473	0.2463

Confidence intervals

COVARIATE	ESTIMATE	RAW	CRSE_HC1	CRSE_HC3	CESE
(Intercept)	2.7043	(1.558, 3.85)	(1.747, 3.662)	(1.502, 3.907)	(0.227, 5.182)
enpc	0.3040	(–0.066, 0.674)	(–0.189, 0.797)	(–0.294, 0.902)	(–0.39, 0.998)
fapres	–0.6118	(–0.936, –0.288)	(–1.011, –0.212)	(–1.146, –0.077)	(–1.354, 0.13)
enpcfapres	0.2078	(0.089, 0.326)	(0.046, 0.37)	(0.004, 0.411)	(–0.013, 0.429)
proximity	–0.0224	(–0.568, 0.524)	(–0.521, 0.476)	(–0.651, 0.606)	(–0.755, 0.71)
eneg	–0.0657	(–0.356, 0.224)	(–0.343, 0.212)	(–0.391, 0.259)	(–0.455, 0.324)
logmag	–0.1815	(–0.664, 0.301)	(–1.041, 0.678)	(–1.206, 0.843)	(–1.11, 0.747)
logmag_eneg	0.3605	(0.099, 0.622)	(–0.205, 0.926)	(–0.32, 1.041)	(–0.122, 0.843)

Before estimating the linear model, we need to sort the data by the cluster variables (this is necessary to estimate the CESE using the ceser package, but not to estimate the CRSE using the sandwich package). In our example, we cluster the data by country. Hence:


R> dcese = dcese[order(dcese$country), ]

Estimate the linear model using the lm() function.


R> mod = lm(enep ~ enpc + fapres + enpcfapres
                 + proximity + eneg + logmag
                 + logmag_eneg, data=dcese)

The estimated raw values of the variance covariance matrix obtained by running the standard R function from the stats package [] are:


R> vcov(mod)

	(Intercept)	enpc	fapres	enpcfapres	proximity
(Intercept)	0.34193	–0.080109	–0.06498717	0.0227605	–0.0416369
enpc	–0.08011	0.035697	0.02401318	–0.0102825	0.0059204
fapres	–0.06499	0.024013	0.02734250	–0.0090018	–0.0004345
enpcfapres	0.02276	–0.010283	–0.00900179	0.0036430	–0.0014388
proximity	–0.04164	0.005920	–0.00043452	–0.0014388	0.0776196
eneg	–0.03580	–0.001477	–0.00251785	0.0007025	–0.0039084
logmag	–0.05448	–0.006981	0.00017420	0.0021400	–0.0023836
logmag_eneg	0.02532	0.001833	–0.00007513	–0.0007721	–0.0009086
	eneg	logmag	logmag_eneg
(Intercept)	–0.0358050	–0.0544826	0.02532042
enpc	–0.0014768	–0.0069809	0.00183259
fapres	–0.0025179	0.0001742	–0.00007513
enpcfapres	0.0007025	0.0021400	–0.00077214
proximity	–0.0039084	–0.0023836	–0.00090860
eneg	0.0218856	0.0222887	–0.01190289
logmag	0.0222887	0.0606796	–0.02995518
logmag_eneg	–0.0119029	–0.0299552	0.01778317

The CRSE, using countries as the grouping variable, obtained using the vcovCL() function of the sandwich package [] are:


R> library(sandwich)
R> vcovCL(mod, cluster = ~country, type = "HC3")

	(Intercept)	enpc	fapres	enpcfapres	proximity
(Intercept)	0.376409	–0.0929549	–0.06620	0.022499	–0.0315432
enpc	–0.092955	0.0930327	0.05081	–0.026847	0.0000196
fapres	–0.066198	0.0508080	0.07437	–0.024184	–0.0177849
enpcfapres	0.022499	–0.0268474	–0.02418	0.010785	0.0020836
proximity	–0.031543	0.0000196	–0.01778	0.002084	0.1029317
eneg	0.001905	–0.0165885	–0.02183	0.007097	–0.0200007
logmag	–0.030573	–0.0642203	–0.04945	0.022924	–0.0285040
logmag_eneg	–0.002075	0.0124010	0.02094	–0.007229	0.0317879
	eneg	logmag	logmag_eneg
(Intercept)	0.001905	–0.03057	–0.002075
enpc	–0.016589	–0.06422	0.012401
fapres	–0.021832	–0.04945	0.020940
enpcfapres	0.007097	0.02292	–0.007229
proximity	–0.020001	–0.02850	0.031788
eneg	0.027519	0.06041	–0.039241
logmag	0.060413	0.27344	–0.158061
logmag_eneg	–0.039241	–0.15806	0.120629

In a similar fashion, the CESE are obtained by simply running the function vcovCESE() of the ceser package:


R> vcovCESE(mod, cluster = ~country, type = "HC3")

	(Intercept)	enpc	fapres	enpcfapres	proximity
(Intercept)	1.59804	–0.3565890	–0.326045	0.0928614	–0.086959
enpc	–0.35659	0.1254735	0.104834	–0.0354704	–0.003333
fapres	–0.32604	0.1048342	0.143206	–0.0389794	–0.017879
enpcfapres	0.09286	–0.0354704	–0.038979	0.0126978	0.003218
proximity	–0.08696	–0.0033328	–0.017879	0.0032179	0.139695
eneg	–0.08737	0.0028258	–0.007081	0.0010940	–0.005680
logmag	–0.22422	0.0009845	0.006688	0.0038080	0.009776
logmag_eneg	0.08381	–0.0058250	–0.011500	0.0008569	0.004472
	eneg	logmag	logmag_eneg
(Intercept)	–0.087372	–0.2242235	0.0838093
enpc	0.002826	0.0009845	–0.0058250
fapres	–0.007081	0.0066880	–0.0115004
enpcfapres	0.001094	0.0038080	0.0008569
proximity	–0.005680	0.0097761	0.0044718
eneg	0.039433	0.0481561	–0.0231003
logmag	0.048156	0.2244237	–0.1048418
logmag_eneg	–0.023100	–0.1048418	0.0606626

Note that the estimated standard errors are ordered as expected: the raw standard errors are smaller than the CRSE, which in turn are smaller than the CESE for almost all coefficients. The standard errors for each method are:


R> sqrt(diag(vcov(mod)))

(Intercept)	enpc	fapres	enpcfapres	proximity
0.58475	0.18894	0.16536	0.06036	0.27860
eneg	logmag	logmag_eneg
0.14794	0.24633	0.13335


R> sqrt(diag(vcovCL(mod, cluster = ~country, type = "HC3")))

(Intercept)	enpc	fapres	enpcfapres	proximity
0.6135	0.3050	0.2727	0.1039	0.3208
eneg	logmag	logmag_eneg
0.1659	0.5229	0.3473


R> sqrt(diag(vcovCESE(mod, cluster = ~country, type = "HC3")))

(Intercept)	enpc	fapres	enpcfapres	proximity
1.2641	0.3542	0.3784	0.1127	0.3738
eneg	logmag	logmag_eneg
0.1986	0.4737	0.2463
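To check the ordering against the printed numbers, the sketch below copies the three sets of standard errors shown above into a data frame and compares them. Only base R is used; the object and column names are illustrative.

```r
# Standard errors copied from the three outputs above.
se <- data.frame(
  term = c("(Intercept)", "enpc", "fapres", "enpcfapres",
           "proximity", "eneg", "logmag", "logmag_eneg"),
  raw  = c(0.58475, 0.18894, 0.16536, 0.06036, 0.27860, 0.14794, 0.24633, 0.13335),
  crse = c(0.6135, 0.3050, 0.2727, 0.1039, 0.3208, 0.1659, 0.5229, 0.3473),
  cese = c(1.2641, 0.3542, 0.3784, 0.1127, 0.3738, 0.1986, 0.4737, 0.2463),
  stringsAsFactors = FALSE
)

# Ratios above 1 mean the method in the numerator is the more conservative one.
se$crse_over_raw  <- round(se$crse / se$raw, 2)
se$cese_over_crse <- round(se$cese / se$crse, 2)
print(se)

# Coefficients for which the CESE is smaller than the CRSE (HC3).
print(se$term[se$cese < se$crse])
```

With these numbers, the CRSE exceed the raw standard errors for every coefficient, and the CESE exceed the CRSE for all coefficients except logmag and logmag_eneg.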

Summary tables with the raw standard errors, CRSE, and CESE are easy to produce. The package lmtest is especially useful for that purpose: the ceser package integrates with its coeftest() function, which can be used to create summary tables with the different standard errors. The raw estimates are:


R> summary(mod)
Call:
lm (formula = enep ~ enpc + fapres + enpcfapres + proximity +
                eneg + logmag + logmag_eneg, data = dcese)
Residuals:

Min	1Q	Median	3Q	Max
–3.559	–0.819	–0.361	0.377	9.039


Coefficients:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	2.7043	0.5848	4.62	0.0000056	***
enpc	0.3040	0.1889	1.61	0.10871
fapres	–0.6118	0.1654	–3.70	0.00026	***
enpcfapres	0.2078	0.0604	3.44	0.00066	***
proximity	–0.0224	0.2786	–0.08	0.93589
eneg	–0.0657	0.1479	–0.44	0.65748
logmag	–0.1815	0.2463	–0.74	0.46193
logmag_eneg	0.3605	0.1334	2.70	0.00727	**
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Residual standard error: 1.65 on 291 degrees of freedom
Multiple R-squared: 0.378, Adjusted R-squared: 0.363
F-statistic: 25.3 on 7 and 291 DF, p-value: <0.0000000000000002

We can obtain the summary with CRSE by country by running:


R> library(lmtest)
R> coeftest(mod, vcov = vcovCL, cluster = ~ country, type = "HC3")


t test of coefficients:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	2.7043	0.6135	4.41	0.000015	***
enpc	0.3040	0.3050	1.00	0.320
fapres	–0.6118	0.2727	–2.24	0.026	*
enpcfapres	0.2078	0.1039	2.00	0.046	*
proximity	–0.0224	0.3208	–0.07	0.944
eneg	–0.0657	0.1659	–0.40	0.693
logmag	–0.1815	0.5229	–0.35	0.729
logmag_eneg	0.3605	0.3473	1.04	0.300
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Similarly, to use the CESE instead of the CRSE, simply run:


R> coeftest(mod, vcov = vcovCESE, cluster = ~ country, type = "HC3")


t test of coefficients:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	2.7043	1.2641	2.14	0.033	*
enpc	0.3040	0.3542	0.86	0.391
fapres	–0.6118	0.3784	–1.62	0.107
enpcfapres	0.2078	0.1127	1.84	0.066	.
proximity	–0.0224	0.3738	–0.06	0.952
eneg	–0.0657	0.1986	–0.33	0.741
logmag	–0.1815	0.4737	–0.38	0.702
logmag_eneg	0.3605	0.2463	1.46	0.144
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 1 shows how the confidence intervals differ for the different estimates of the standard error of the coefficients. The CRSE are shown with both the HC1 and HC3 adjustments to the residuals. We can see that the CESE is more conservative, particularly for two covariates: fapres (presidential power) and enpcfapres [the interaction between the number of presidential candidates (enpc) and presidential power (fapres)]. For them, the null value is consistent with the data when the CESE is used, but not if the other standard errors are adopted for the computation of the confidence intervals.
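The intervals for these two covariates can be recomputed from the estimates and standard errors in Table 1. The sketch below assumes a normal critical value (about 1.96); this is inferred from the reported intervals, which it matches up to rounding, and is not stated in the text. The object names are illustrative.

```r
# Estimates and standard errors copied from Table 1.
est <- c(fapres = -0.6118, enpcfapres = 0.2078)
se  <- rbind(
  RAW      = c(0.1654, 0.0604),
  CRSE_HC3 = c(0.2727, 0.1039),
  CESE     = c(0.3784, 0.1127)
)
colnames(se) <- names(est)

z <- qnorm(0.975)  # assumption: normal critical value, about 1.96

# One row per covariate and method, with a flag for intervals that include zero.
ci <- do.call(rbind, lapply(names(est), function(term) {
  lower <- est[[term]] - z * se[, term]
  upper <- est[[term]] + z * se[, term]
  data.frame(term = term, method = rownames(se),
             lower = round(lower, 3), upper = round(upper, 3),
             covers_zero = lower < 0 & upper > 0,
             row.names = NULL, stringsAsFactors = FALSE)
}))
print(ci)
```

Under this assumption, only the CESE intervals for fapres and enpcfapres include zero, which is the pattern described in the paragraph above.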

The user should note that the performance of the estimation is not yet optimized to handle large data sets. There are two reasons for the suboptimal performance. The first is that the current implementation uses only high-level functions in R. The second is that the software avoids storing large matrices by using nested loops during the estimation. Future versions of the package will implement the core functions in C++ and provide compiled code with the package to improve the performance. Nevertheless, the package is fully functional, and the performance tests show that, on average, the standard errors take 5.3 seconds to compute for a data set with 1000 observations, 85 seconds with 3000 observations, and around 5.5 minutes with 5000 observations.

Quality control

The package has been thoroughly quality checked and tested. The package structure successfully passes all CRAN R CMD checks and all continuous integration checks implemented in Travis, including checks to build the package on Windows, Linux, and macOS. The results of the checks can be found on the Travis website: https://travis-ci.org/github/DiogoFerrari/ceser.

(2) Availability

Operating system

CESER is written in R (>=2.1) and runs on any operating system that supports the R statistical software. R can be obtained freely from https://www.r-project.org/.

Programming language

R Statistical Software 2.1 or higher.

Additional system requirements

There are no additional requirements.

Dependencies

The package depends on the following R packages: magrittr, purrr, dplyr, tibble, lmtest.

List of contributors

Diogo Ferrari, Department of Political Science, University of California, Riverside
John E. Jackson, Department of Political Science, University of Michigan, Ann Arbor

Software location

Archive

Name: Cluster Estimated Standard Errors in R (CESER)

Persistent identifier: 10.5281/zenodo.4107151

Licence: MIT

Publisher: Diogo Ferrari

Version published: v1.0.0

Date published: 10/19/2020

Code repository

Name: ceser

Identifier: https://doi.org/10.5281/zenodo.4107151

Licence: MIT

Date published: 10/19/2020

Language

English

(3) Reuse potential

The adoption of methods that deal with clustered standard errors is ubiquitous in the social sciences. Currently available packages in R only provide traditional ways (CRSE) to estimate regression models with clustered standard errors, as discussed above. The CESER package provides an easy-to-use implementation of a new method, namely CESE, as proposed in Jackson []. It is important to note that the method implemented in our package is not bound to any specific subfield. The package is of direct interest to any researcher using regression models.

The Cluster Estimated Standard Errors in R (CESER) package is fully compatible with other R packages widely used to compute regression models in economics, psychology, political science, sociology, and many other disciplines. Those packages include the built-in R module stats to compute linear models, as well as some extensions such as glm, lmtest, and lme4. Researchers using those packages can seamlessly use our package to deal with clustered standard errors. The CESER package is well documented and contains working examples for copy-and-paste experimentation. Moreover, code examples are provided at the package author's personal website, including a code vignette explaining the package usage. As presented in the paper, the output of the main estimation function follows the standard R format and can be manipulated by popular external packages for data visualization and reports, including tidyverse, kable, pipe computing, and ggplot2. Hence, our package can easily be reused or extended.

There are three main options for those interested in extending or contributing to the package, whose source code is openly available in the package's GitHub repository. First, users can open a ticket requesting extensions or suggesting changes. Second, they can make changes to their local version of the code and open a pull request for software extension or modification using the GitHub website. Finally, users are welcome to e-mail the principal author and request further enhancements.

[B1] Bertrand M, Duflo E, Mullainathan S. How much should we trust differences-in-differences estimates? The Quarterly journal of economics, 2004; 119(1): 249–275. DOI: https://doi.org/10.1162/003355304772839588

[B2] Cameron AC, Gelbach JB, Miller DL. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 2008; 90(3): 414–427. DOI: https://doi.org/10.1162/rest.90.3.414

[B3] Chambers JM. Linear models. In Chambers JM, Hastie TJ (Eds.), Statistical Models in S, chapter 4. Wadsworth Brooks/Cole; 1992.

[B4] Djogbenou AA, MacKinnon JG, Nielsen MØ. Asymptotic theory and wild bootstrap inference with clustered errors. Journal of Econometrics, 2019; 212(2): 393–412. DOI: https://doi.org/10.1016/j.jeconom.2019.04.035

[B5] Eicker F. Limit theorems for regressions with unequal and dependent errors. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967; 1: 59–82.

[B6] Elgie R, Bucur C, Dolez B, Laurent A. Proximity, candidates, and presidential power: How directly elected presidents shape the legislative party system. Political Research Quarterly, 2014; 67(3): 467–477. DOI: https://doi.org/10.1177/1065912914530514

[B7] Esarey J, Menger A. Practical and effective approaches to dealing with clustered data. Political Science Research and Methods, 2018; 1–19. DOI: https://doi.org/10.1017/psrm.2017.42

[B8] Greene WH. Econometric analysis. Upper Saddle River, NJ: Pearson Prentice Hall; 2012.

[B9] Harden JJ. A bootstrap method for conducting statistical inference with clustered data. State Politics & Policy Quarterly, 2011; 11(2): 223–246. DOI: https://doi.org/10.1177/1532440011406233

[B10] Hayes AF, Cai L. Using heteroskedasticity-consistent standard error estimators in ols regression: An introduction and software implementation. Behavior research methods, 2007; 39(4): 709–722. DOI: https://doi.org/10.3758/BF03192961

[B11] Imbens GW, Kolesar M. Robust standard errors in small samples: Some practical advice. Review of Economics and Statistics, 2016; 98(4): 701–712. DOI: https://doi.org/10.1162/REST_a_00552

[B12] Jackson J. Corrected standard errors with clustered data. Political Analysis, 2020; 28(3): 318–339. DOI: https://doi.org/10.1017/pan.2019.38

[B13] Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika, 1986; 73(1): 13–22. DOI: https://doi.org/10.1093/biomet/73.1.13

[B14] MacKinnon JG. How cluster-robust inference is changing applied econometrics. Canadian Journal of Economics, 2019; 52(3): 851–881. DOI: https://doi.org/10.1111/caje.12388

[B15] MacKinnon JG, Webb MD. Wild bootstrap inference for wildly different cluster sizes. Journal of Applied Econometrics, 2017; 32(2): 233–254. DOI: https://doi.org/10.1002/jae.2508

[B16] Pustejovsky J. clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections. R package version 0.5.3; 2021.

[B17] Pustejovsky JE, Tipton E. Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 2018; 36(4): 672–683. DOI: https://doi.org/10.1080/07350015.2016.1247004

[B18] R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018.

[B19] Roodman D, Nielsen MØ, MacKinnon JG, Webb MD. Fast and wild: Bootstrap inference in stata using boottest. The Stata Journal, 2019; 19(1): 4–60. DOI: https://doi.org/10.1177/1536867X19830877

[B20] White HL. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 1980; 48(4): 817–838. DOI: https://doi.org/10.2307/1912934

[B21] Wilkinson GN, Rogers CE. Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 1973; 22: 392–9. DOI: https://doi.org/10.2307/2346786

[B22] Zeileis A. Econometric computing with hc and hac covariance matrix estimators; 2004. DOI: https://doi.org/10.18637/jss.v011.i10

Journal of Open Research Software

Software Metapapers
