This assignment reviews the material covered in lab. Comment your code to clarify what you are doing: it helps with grading, and it will help you when you revisit the code in the future. Please use either R Markdown or knitr to turn in your assignment. Both are fully compatible with R and LaTeX. If your code uses random number generation, please use set.seed(12345) for replicability. Please post any questions on Piazza.


1) Checking Intuition

Think about an outcome \(Y\) you care about for your research. Name at least two explanatory variables \(X\) that you think influence \(Y\). Discuss what each of the four least squares assumptions (the linear model, exogeneity, homoskedastic/mean-zero errors, and independent observations) means in terms of your variables of interest.


2) Math

  1. Let \(\textbf{A}\) be any \(n \times n\) matrix. Show that \(\textbf{AA} = \textbf{I}\) if and only if \((\textbf{I} - \textbf{A})(\textbf{I} + \textbf{A}) = \textbf{0}\), where \(\textbf{0}\) is the \(n \times n\) zero matrix.

  2. Let \(\textbf{A}\) be a \(2 \times 2\) matrix, such that \(\textbf{A} = \begin{bmatrix}a & b \\ c & d\end{bmatrix}\). Under what conditions is \(\textbf{AA} = \textbf{I}\)?
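
As a possible starting point for both parts, it may help to write out the products term by term, keeping the order of the factors because matrix multiplication is not commutative in general. The expansions below are only a sketch of the first step; the conclusions are left to you.

\[
(\textbf{I} - \textbf{A})(\textbf{I} + \textbf{A}) = \textbf{I}\textbf{I} + \textbf{I}\textbf{A} - \textbf{A}\textbf{I} - \textbf{A}\textbf{A} = \textbf{I} + \textbf{A} - \textbf{A} - \textbf{AA}
\]

\[
\textbf{AA} = \begin{bmatrix}a & b \\ c & d\end{bmatrix}\begin{bmatrix}a & b \\ c & d\end{bmatrix} = \begin{bmatrix}a^2 + bc & ab + bd \\ ca + dc & cb + d^2\end{bmatrix}
\]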


3) Coding in R

Load the gavote data from the faraway package. Create a new variable, undercount, equal to the percentage of ballots that did not register a vote, computed from the variables votes and ballots. We will use undercount as our outcome variable of interest. Choose three other variables to use as predictors, and justify your choices. If you wish to construct a new variable (such as perGore from last week), you may do so as long as you explain why it is meaningful. Write out your linear model and estimate it with lm(). Interpret your results.
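
The setup step might look roughly like the sketch below. The helper name add_undercount is hypothetical, and the predictors perAA, rural, and econ are placeholders chosen only to make the call complete; they are not a recommended specification. The example assumes the faraway package is installed and skips the regression otherwise.

```r
# Hypothetical helper: adds the percentage of ballots that did not
# register a vote. Assumes the data frame has columns `votes` and `ballots`.
add_undercount <- function(df) {
  df$undercount <- 100 * (df$ballots - df$votes) / df$ballots
  df
}

# Run the example only when the faraway package is available.
if (requireNamespace("faraway", quietly = TRUE)) {
  data(gavote, package = "faraway")
  gavote <- add_undercount(gavote)

  # Placeholder predictors; replace them with the three you can justify.
  fit <- lm(undercount ~ perAA + rural + econ, data = gavote)
  print(summary(fit))
}
```

Because undercount is scaled to percentage points here, each coefficient is read as a change in percentage points of undercount per one-unit change in the predictor.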

Let’s manually calculate the coefficients for our predictors. Create the necessary vectors and matrices (don’t forget that we need a column of 1’s for the intercept!); pay close attention to your data types. R will store scalars generated through matrix operations as a \(1 \times 1\) matrix. Compare these coefficients with the ones generated using lm(). Are they equal?
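
One way to organize the manual calculation is sketched below on simulated stand-in data rather than gavote, so every variable name here (x1, x2, y, X, beta_hat) is an assumption made for illustration. It solves the normal equations \((\textbf{X}'\textbf{X})\textbf{b} = \textbf{X}'\textbf{y}\) and shows how a \(1 \times 1\) matrix can be converted to a plain number.

```r
set.seed(12345)  # seed requested in the assignment instructions

# Simulated stand-in data; swap in your own outcome and predictors.
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of 1's for the intercept, then the predictors.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve the normal equations (X'X) b = X'y for b.
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# beta_hat is a 3 x 1 matrix; drop() turns it into a named vector.
beta_manual <- drop(beta_hat)
beta_lm     <- coef(lm(y ~ x1 + x2))

# The residual sum of squares comes back as a 1 x 1 matrix, not a number.
e_hat      <- y - X %*% beta_hat
rss        <- t(e_hat) %*% e_hat
rss_scalar <- drop(rss)

# Compare up to floating-point tolerance rather than with ==.
print(all.equal(unname(beta_manual), unname(beta_lm)))
```

Comparing with all.equal() instead of == is a deliberate choice: lm() uses a QR decomposition internally, so the two sets of estimates can differ in the last few digits even when they agree for all practical purposes.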