This assignment is designed to review the material you learned in lab. Be sure to comment your code to clarify what you are doing: not only does it help with grading, but it will also help you when you revisit the code in the future. Please use either RMarkdown or knitr to write up your assignment; both are fully compatible with R and LaTeX. If your code uses random number generation, please call set.seed(12345) first for replicability. Please post any questions on Piazza.


1) Checking Intuition

  1. Intuitively, why should we care that the OLS estimator \(\hat{\beta}\) is “BLUE?”

  2. Explain, in words (i.e., with minimal math and no equations or algebra), the logic of the proof of the Gauss-Markov Theorem. What is the goal of the Gauss-Markov Theorem, and what steps must we take to show it is true? What assumptions are necessary?


2) Math

Let \(\hat{\beta}= (\textbf{X}^{\top}\textbf{X})^{-1}\textbf{X}^{\top}Y\), and let \(\tilde{\beta} = \textbf{C}Y\), with \(\textbf{CX} = \textbf{I}\) and \(\textbf{C} = (\textbf{X}^{\top}\textbf{X})^{-1}\textbf{X}^{\top} + \textbf{D}\), such that \(\textbf{DX} = \textbf{0}_{(K+1) \times (K+1)}\). Assume all of the standard assumptions from Checking Intuition hold. Show:

  1. \(\tilde{\beta}\) is an unbiased estimator for \(\beta\)
  2. \(\mathbb{V}\left(\tilde{\beta} \,|\, \textbf{X} \right) \geq \mathbb{V}\left(\hat{\beta}\,|\, \textbf{X} \right)\)

For both proofs, explain in English why every step is a valid operation. In other words, each line of your proofs must be accompanied by an English sentence justifying it.

To get you started, your answers should look something like this:

  1. Showing \(\tilde{\beta}\) is an unbiased estimator for \(\beta\): first, substitute \(\tilde{\beta} = \textbf{C}Y\) and \(\textbf{C} = (\textbf{X}^{\top}\textbf{X})^{-1}\textbf{X}^{\top} + \textbf{D}\), which implies that \(\tilde{\beta} = \left((\textbf{X}^{\top}\textbf{X})^{-1} \textbf{X}^{\top} + \textbf{D}\right)Y\), into our equation.
    \[\begin{equation*} \begin{aligned} \mathbb{E}\left[\tilde{\beta} - \beta \,|\, \textbf{X} \right] & = \mathbb{E}\left[ \left((\textbf{X}^{\top}\textbf{X})^{-1} \textbf{X}^{\top} + \textbf{D}\right)Y - \beta \,|\, \textbf{X} \right]\\ & \vdots \\ & \text{(you fill out the rest of these steps!)} \\ & \vdots \\ & = 0 \end{aligned} \end{equation*}\]

  2. Showing \(\mathbb{V}\left(\tilde{\beta} \,|\, \textbf{X} \right) \geq \mathbb{V}\left(\hat{\beta}\,|\, \textbf{X} \right)\). This should follow the same structure and format as part 1 above.
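If you would like a numerical sanity check before writing the proofs, here is a short R sketch (all variable names are illustrative, and it assumes homoskedastic errors with \(\sigma^2 = 1\)) that constructs one valid \(\textbf{D}\) satisfying \(\textbf{DX} = 0\) and confirms that, for that choice, \(\mathbb{V}(\tilde{\beta} \,|\, \textbf{X}) - \mathbb{V}(\hat{\beta} \,|\, \textbf{X})\) is positive semi-definite:

```r
set.seed(12345)
n <- 50; k <- 3
X <- cbind(1, matrix(rnorm(n * (k - 1)), nrow = n))  # design matrix with intercept
A <- solve(t(X) %*% X) %*% t(X)                      # the OLS part of C
M <- matrix(rnorm(k * n), nrow = k)                  # an arbitrary k x n matrix
D <- M %*% (diag(n) - X %*% A)                       # the annihilator forces DX = 0
C <- A + D
max(abs(D %*% X))                                    # numerically zero, so CX = I
V_ols   <- solve(t(X) %*% X)                         # Var(beta-hat   | X) when sigma^2 = 1
V_tilde <- C %*% t(C)                                # Var(beta-tilde | X) when sigma^2 = 1
min(eigen(V_tilde - V_ols)$values)                   # non-negative: difference is PSD
```

A simulation is not a proof, of course, but playing with code like this can make it clearer which step of the algebra the assumption \(\textbf{DX} = 0\) is doing the work in.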


3) Coding in R

(This is a continuation of the assignment from Lab 5; you may reuse the same model if you wish.)

Load the gavote data from the faraway package. Create a new variable, undercount, defined as the percentage of ballots that did not result in a counted vote; compute it from the variables votes and ballots. We will use undercount as our outcome variable of interest. Choose three other variables to use as predictors, and justify those choices. If you wish to construct a new variable (such as perGore from previous weeks), you may do so as long as you explain why it is meaningful. Write out your linear model, run the regression using lm(), and interpret your results.
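To fix ideas, here is a minimal sketch of the setup (the predictors perAA, equip, and econ are only examples; you should pick and justify your own three):

```r
library(faraway)   # contains the gavote data
data(gavote)
# undercount: share of ballots on which no vote was counted
gavote$undercount <- (gavote$ballots - gavote$votes) / gavote$ballots
# Example predictors only -- substitute your own justified choices
fit <- lm(undercount ~ perAA + equip + econ, data = gavote)
summary(fit)
```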

Now let's manually calculate the standard errors for our predictors. Create the necessary vectors and matrices (don't forget that we need a column of 1's for the intercept!), and pay close attention to your data types: R stores a scalar produced by matrix operations as a \(1 \times 1\) matrix. Compare these standard errors with the ones generated by lm(). Are they equal?