This assignment reviews the material covered in lab. Be sure to comment your code to clarify what you are doing: comments help with grading, and they will help you when you revisit the code in the future. Please use either RMarkdown or knitr to turn in your assignment; both are fully compatible with R and LaTeX. If your code uses random number generation, please use set.seed(12345) for replicability. Please post any questions on Piazza.


Consider the simple linear model:

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]
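
To build intuition for this model, it can help to simulate data from it and check that a fitted regression recovers the parameters. The sketch below is illustrative only: the sample size, the values of the intercept and slope, and the normal error distribution are all assumptions chosen for the example, not part of the assignment.

```r
# Simulate data from Y_i = beta_0 + beta_1 * X_i + epsilon_i.
# Every numeric value below is a hypothetical choice for illustration.
set.seed(12345)                        # seed requested in the instructions above

n     <- 200                           # assumed sample size
beta0 <- 2                             # assumed intercept
beta1 <- 0.5                           # assumed slope
x     <- runif(n, min = 0, max = 10)   # predictor values
eps   <- rnorm(n, mean = 0, sd = 1)    # errors: mean zero, constant variance, drawn independently of x
y     <- beta0 + beta1 * x + eps       # response generated by the linear model

fit <- lm(y ~ x)                       # ordinary least squares fit
coef(fit)                              # estimates should be close to beta0 and beta1
```

Changing how eps is generated (for example, letting its spread depend on x) is one way to see what happens when the usual assumptions about the error term fail.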

1) Checking Intuition

Think of two random variables \(X\) and \(Y\) that you are interested in for your research. Describe the relationship between those variables both conceptually and in terms of the linear model. What does each term (\(\beta_0\), \(\beta_1\), and \(\epsilon_i\)) mean in terms of this relationship? Explain what assumptions about \(\epsilon_i\) are necessary for estimation and inference with the linear model to be valid.

2) Coding in R

Load the gavote data from the faraway package using the following code:

library(faraway)          # package that provides the gavote data set
gavote <- force(gavote)   # copy the lazy-loaded data set into your workspace
  1. Create a new variable undercount by calculating the percentage of ballots that were not counted as votes, using the variables votes and ballots. Create a new variable perGore that is the percentage of votes cast for Al Gore.

  2. Explain, conceptually, what undercount and perAA measure.

  3. Plot the relationship between gore and perAA. Looking at the plot (i.e., without calculating it formally), does the correlation seem to be closest to 0, 1, or -1? Why?

  4. Calculate the correlation between gore and perAA using both cor() and the mathematical formula for correlation. Are they the same?

  5. Run a linear regression with undercount as the response and perAA as the predictor. Write out the equation for the linear model. Summarize the regression results and explain what they mean.

  6. Calculate the total sum of squares (TSS), explained sum of squares (ESS), residual sum of squares (RSS), and \(R^2\) for your linear regression model.
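
As a self-check for items 4 and 6, the sketch below computes a correlation and the sums-of-squares decomposition by hand and compares them with R's built-in results. It uses simulated data rather than gavote, and the sample size and coefficients are hypothetical values chosen only for illustration; the same steps should carry over to your own variables.

```r
# Hand calculations versus built-in functions, on simulated (not gavote) data.
set.seed(12345)
n <- 100                               # assumed sample size
x <- rnorm(n)                          # hypothetical predictor
y <- 1 + 2 * x + rnorm(n)              # hypothetical response

# Correlation from the formula:
# sum of cross-deviations over the square root of the product of squared deviations
r_manual  <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r_builtin <- cor(x, y)
all.equal(r_manual, r_builtin)         # the two should agree up to rounding error

# Sums of squares for the simple linear regression of y on x
fit <- lm(y ~ x)
tss <- sum((y - mean(y))^2)            # total sum of squares
ess <- sum((fitted(fit) - mean(y))^2)  # explained sum of squares
rss <- sum(residuals(fit)^2)           # residual sum of squares
r2  <- ess / tss                       # R^2 as the share of variation explained

all.equal(tss, ess + rss)              # decomposition: TSS = ESS + RSS
all.equal(r2, summary(fit)$r.squared)  # matches the R^2 reported by summary()
all.equal(r2, r_manual^2)              # with one predictor, R^2 equals the squared correlation
```

Using all.equal() rather than == avoids spurious mismatches caused by floating-point rounding.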