Given that causal inference emphasizes the study of counterfactuals, we are left with a problem: we only get to see one of the two potential outcomes for any given unit. Matching is a technique that lets us estimate the average treatment effect on the treated (ATT) by comparing the outcomes of treated units with those of their “closest” untreated neighbors. In exact matching, for every treated unit \(i\) we find an untreated partner \(j\) with identical pretreatment covariates, \(X_j = X_i\), and then take the difference in their outcomes.
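Here is a minimal Python sketch of the exact-matching step just described. It assumes the covariates are discrete and hashable and that controls are drawn with replacement. The text does not fix either choice. The function names `exact_match` and `matched_differences` and the toy data are illustrative only, not taken from any library.

```python
import random


def exact_match(X, T, seed=0):
    """Pair each treated unit with one control that has the identical covariate value.

    X: covariate values (hashable, finite support assumed), one per unit.
    T: treatment indicators (1 = treated, 0 = control), one per unit.
    Returns a dict mapping treated index i -> matched control index j(i).
    Controls are sampled with replacement (an assumption, not fixed by the text).
    """
    rng = random.Random(seed)
    # Index the control units by their covariate value.
    controls_by_x = {}
    for j, (x, t) in enumerate(zip(X, T)):
        if t == 0:
            controls_by_x.setdefault(x, []).append(j)
    matches = {}
    for i, (x, t) in enumerate(zip(X, T)):
        if t != 1:
            continue
        candidates = controls_by_x.get(x)
        if not candidates:
            # No control shares this covariate value, so an exact match is impossible.
            raise ValueError(f"no control unit with X = {x!r}")
        matches[i] = rng.choice(candidates)
    return matches


def matched_differences(Y, matches):
    """Within-pair outcome differences Y_i - Y_{j(i)} for each treated unit i."""
    return {i: Y[i] - Y[j] for i, j in matches.items()}


X = ["a", "a", "b", "b", "b", "c"]
T = [1, 0, 1, 1, 0, 0]
Y = [13, 10, 25, 27, 20, 30]
m = exact_match(X, T)
print(m)                          # {0: 1, 2: 4, 3: 4}
print(matched_differences(Y, m))  # {0: 3, 2: 5, 3: 7}
```

In this toy example the control with \(X = c\) is never used. The single \(b\) control serves two treated units, so the matched controls are best counted with multiplicity.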
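Formally, we need some assumptions. Consistency: the observed outcome is the potential outcome under the treatment actually received, \(Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)\). Unconfoundedness: given the covariates, treatment assignment is independent of the untreated potential outcome, \(Y_i(0) \perp T_i \mid X_i\). Overlap: every covariate value that occurs among the treated also occurs among the controls, that is, \(\mathbb{P}\left(T_i = 0 \, | \, X_i = x\right) > 0\) whenever \(\mathbb{P}\left(X_i = x \, | \, T_i = 1\right) > 0\). This guarantees that an exact match always exists.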
Procedure: Let \(X_i\) take on a finite number of values, and let \(M_T = \{1, 2, \dots, N_T\}\) be the set of treated units. In exact matching, for each treated unit \(i \in M_T\) we find a control unit \(j(i)\) such that \(X_{j(i)} = X_i\). Let \(M_c = \{j_1, j_2, \dots, j_{N_T}\}\), with \(j_i = j(i)\), be the set of matched controls (counted with multiplicity if a control is matched more than once). Because each treated unit is paired with a control that has the same covariate value, the matched controls reproduce the covariate distribution of the treated units: \(\star: \, \mathbb{P}\left(X_i = x \, | \, T_i = 1\right) = \mathbb{P}\left(X_i = x \, | \, T_i = 0, \, M_c \right)\). If we are able to match exactly, we can recover the ATT:
\[\begin{equation*} \begin{aligned} \tau_{ATT} & = \mathbb{E}\left[Y_i(1) \, | \, T_i = 1\right] - \mathbb{E}\left[Y_i(0) \, | \, T_i = 1 \right] && \text{by definition of the ATT}\\ & = \mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \mathbb{E}\left[Y_i(0) \, | \, T_i = 1\right] && \text{by consistency}\\ & = \mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \sum_{x} \mathbb{E}\left[Y_i(0) \, | \, T_i = 1, X_i = x\right] \mathbb{P}\left(X_i = x \, | \, T_i = 1\right) && \text{by the law of total expectation}\\ & = \mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \sum_{x} \mathbb{E}\left[Y_i(0) \, | \, T_i = 0, X_i = x\right] \mathbb{P}\left(X_i = x \, | \, T_i = 1\right) && \text{by unconfoundedness}\\ & = \mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \sum_{x} \mathbb{E}\left[Y_i \, | \, T_i = 0, X_i = x\right] \mathbb{P}\left(X_i = x \, | \, T_i = 1\right) && \text{by consistency, again}\\ & = \mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \sum_{x} \mathbb{E}\left[Y_i \, | \, T_i = 0, X_i = x\right] \mathbb{P}\left(X_i = x \, | \, T_i = 0, \, M_c\right) && \text{by } \star \end{aligned} \end{equation*}\]
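The following Python sketch checks the derivation numerically. It builds a hypothetical toy population in which both potential outcomes are known, so the true ATT can be computed directly. It then compares that value with the observable expression on the fifth line, \(\mathbb{E}\left[Y_i \, | \, T_i = 1\right] - \sum_{x} \mathbb{E}\left[Y_i \, | \, T_i = 0, X_i = x\right] \mathbb{P}\left(X_i = x \, | \, T_i = 1\right)\). The data are invented. \(Y_i(0)\) is made to depend on \(x\) alone, so unconfoundedness holds by construction. Exact rational arithmetic (`fractions.Fraction`) keeps floating-point noise out of the comparison.

```python
from fractions import Fraction

# Hypothetical toy population: (x, t, y0, y1) per unit. Both potential outcomes
# are listed only so that the true ATT can be computed for comparison.
# Y(0) depends on x alone, so unconfoundedness holds by construction.
UNITS = [
    (0, 1, 10, 13),
    (0, 0, 10, 11),
    (0, 0, 10, 12),
    (1, 1, 20, 25),
    (1, 1, 20, 27),
    (1, 0, 20, 22),
]


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


def true_att(units):
    # E[Y(1) - Y(0) | T = 1]: needs both potential outcomes, so it is not observable in practice.
    return mean(y1 - y0 for _, t, y0, y1 in units if t == 1)


def observe(units):
    # Consistency: we see Y(1) for treated units and Y(0) for controls.
    return [(x, t, y1 if t == 1 else y0) for x, t, y0, y1 in units]


def att_reweighted(data):
    # E[Y | T=1] - sum_x E[Y | T=0, X=x] * P(X=x | T=1), using observed (x, t, y) only.
    treated = [(x, y) for x, t, y in data if t == 1]
    controls = [(x, y) for x, t, y in data if t == 0]
    strata = {x for x, _ in treated}
    adjustment = Fraction(0)
    for x in strata:
        p_x = Fraction(sum(1 for xt, _ in treated if xt == x), len(treated))
        # Raises ZeroDivisionError if no control has X = x (overlap violated).
        adjustment += mean(y for xc, y in controls if xc == x) * p_x
    return mean(y for _, y in treated) - adjustment


def naive_difference(data):
    # Raw difference in means that ignores X entirely.
    return mean(y for _, t, y in data if t == 1) - mean(y for _, t, y in data if t == 0)


data = observe(UNITS)
print(true_att(UNITS))          # 5
print(att_reweighted(data))     # 5
print(naive_difference(data))   # 25/3
```

The naive difference in means (25/3) overstates the effect. The treated units are concentrated in the stratum \(x = 1\), where \(Y_i(0)\) is higher. Reweighting the control means by \(\mathbb{P}\left(X_i = x \, | \, T_i = 1\right)\) removes that imbalance.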
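In other words, \(\tau_{ATT}\) is normally unobservable because it involves the counterfactual \(\mathbb{E}\left[Y_i(0) \, | \, T_i = 1\right]\). The derivation rewrites it entirely in observable terms. It becomes the mean outcome of the treated units minus the control mean within each covariate value \(x\), weighted by the share of treated units with that value. By \(\star\), this weighted sum is the mean outcome of the matched controls. Once this is established, we need only compute the difference in means between the treated units and their matched controls: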
\[\begin{equation*} \hat{\tau}_{\text{ATT, Match}} = \frac{1}{N_T} \displaystyle \sum_{i = 1}^{N_T} (Y_i - Y_{j(i)}) \end{equation*}\]
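Once the pairs are known, the estimator above is short to compute. The Python sketch below assumes the matches arrive as a dict from each treated index \(i\) to its control \(j(i)\). This is a hypothetical interface, not a library API. The sketch computes \(\hat{\tau}_{\text{ATT, Match}}\) and confirms that it equals the treated-minus-matched-control difference in means. It also checks \(\star\) by comparing covariate distributions, counting each control once per pair it appears in.

```python
import math
from collections import Counter
from fractions import Fraction


def att_match(Y, matches):
    """(1 / N_T) * sum_i (Y_i - Y_{j(i)}) over the matched pairs."""
    if not matches:
        raise ValueError("no matched treated units")
    return sum(Y[i] - Y[j] for i, j in matches.items()) / len(matches)


def difference_in_means(Y, matches):
    """Mean outcome of the treated units minus mean outcome of their matched controls."""
    treated_mean = sum(Y[i] for i in matches) / len(matches)
    control_mean = sum(Y[j] for j in matches.values()) / len(matches)
    return treated_mean - control_mean


def balance_holds(X, matches):
    """Check star: X has the same distribution among treated units and matched controls.

    Matched controls are counted with multiplicity (once per pair they appear in).
    """
    def dist(values):
        counts = Counter(values)
        n = sum(counts.values())
        return {x: Fraction(c, n) for x, c in counts.items()}

    return dist(X[i] for i in matches) == dist(X[j] for j in matches.values())


X = ["a", "a", "b", "b", "b", "c"]
Y = [13, 10, 25, 27, 20, 30]
matches = {0: 1, 2: 4, 3: 4}  # hypothetical output of an exact-matching step
print(att_match(Y, matches))                                                 # 5.0
print(math.isclose(att_match(Y, matches), difference_in_means(Y, matches)))  # True
print(balance_holds(X, matches))                                             # True
```

Averaging the pairwise differences and differencing the two group means agree because both average over the same \(N_T\) pairs.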