This assignment is designed to review the material covered in lab. Be sure to comment your code to clarify what you are doing. Comments not only help with grading but will also help you when you revisit the code in the future. Please use either RMarkdown or knitr to prepare and turn in your assignment; both are fully compatible with R and LaTeX. If your code uses random number generation, please use set.seed(12345) for replicability. Please post any questions on Piazza.
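As a minimal sketch of the seeding convention, assuming the seed is set once near the top of your document before any random draws, the following shows that resetting the seed reproduces the same numbers. The object names draws_a and draws_b are illustrative only.

```r
# Set the seed before any code that draws random numbers.
set.seed(12345)
draws_a <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws.
set.seed(12345)
draws_b <- rnorm(5)

# TRUE when the two sets of draws match exactly
identical(draws_a, draws_b)
```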


1) Checking Intuition

  1. Think about a relationship between an outcome \(Y\) and one or more explanatory variables \(X\) that you are interested in for your research. Assume that all or most of the standard OLS assumptions hold. Describe an example of what an influential point might look like in this context. If such an observation were in your data, briefly describe what you might do to account for it.

  2. Explain why multicollinearity is problematic in the OLS context.


2) Coding in R

(This is a continuation of the assignments from Lab 5 and Lab 6; you can reuse the same model if you wish.)

Load the gavote data from the faraway package. Create a new variable undercount by calculating the percentage of ballots that were not counted as votes, using the variables votes and ballots. We will use undercount as our outcome variable of interest. Choose three other variables to use as predictors. If you wish to construct a new variable (such as perGore from previous weeks), you may do so as long as you explain why it is meaningful. Run a linear regression using lm().
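The setup steps above might be sketched as follows. This assumes the faraway package is already installed and that undercount is expressed on a 0 to 100 percent scale. The three predictors shown (perAA, rural, econ) are an arbitrary illustration, not a recommended specification, and the object name fit is hypothetical.

```r
# Assumes the faraway package is installed: install.packages("faraway")
library(faraway)
data(gavote)

# Undercount: percentage of ballots cast that were not counted as votes
# (assumption: reported as a percentage rather than a proportion).
gavote$undercount <- 100 * (gavote$ballots - gavote$votes) / gavote$ballots

# Illustrative predictors only; substitute your own three and justify them.
fit <- lm(undercount ~ perAA + rural + econ, data = gavote)
summary(fit)
```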

Using plot(), cooks.distance(), and other functions, run regression diagnostics on your model. Interpret the results of these diagnostics and report any influential points.
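One way to practice these diagnostics before applying them to your own model is on simulated data with a deliberately planted influential point. The sketch below uses hypothetical data (not gavote), and the 4/n cutoff for Cook's distance is a common rule of thumb assumed here for illustration, not a required threshold. All object names are illustrative.

```r
set.seed(12345)

# Simulated data (hypothetical): a linear relationship plus one planted point.
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x[n] <- 6     # high leverage: far from the other x values
y[n] <- -10   # large residual: far from the line the other points imply
sim <- data.frame(x = x, y = y)

fit_sim <- lm(y ~ x, data = sim)

# Standard diagnostic plots for an lm object, arranged in a 2 x 2 grid.
par(mfrow = c(2, 2))
plot(fit_sim)
par(mfrow = c(1, 1))

# Numerical diagnostics: Cook's distance and leverage (hat values).
cd  <- cooks.distance(fit_sim)
lev <- hatvalues(fit_sim)

# Flag observations above the assumed 4/n rule-of-thumb cutoff.
flagged <- which(cd > 4 / n)
flagged

# Compare coefficients with and without the flagged observations.
fit_drop <- lm(y ~ x, data = sim[-flagged, ])
rbind(all = coef(fit_sim), dropped = coef(fit_drop))
```

Comparing the two sets of coefficients shows how much the flagged observations move the fitted line, which is the sense in which a point is influential rather than merely unusual.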