This assignment is designed to review the material covered in lab. Be
sure to comment your code to clarify what you are doing. Comments not
only help with grading but will also help you when you revisit the code
in the future. Please use either R Markdown or knitr to turn in your
assignment. Both are fully compatible with R and LaTeX. If your code
uses random number generation, please call set.seed(12345) for
replicability. Please post any questions on Piazza.
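As an illustration of what the seed buys you, the base R sketch below resets the seed before each of two draws and checks that they match. The choice of rnorm(5) is arbitrary; any random-number function behaves the same way.

```r
# Reset the seed, then draw five standard normal values
set.seed(12345)
first_draw <- rnorm(5)

# Resetting to the same seed reproduces exactly the same values
set.seed(12345)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE
```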
Think about a relationship between an outcome \(Y\) and one or more
explanatory variables \(X\) that you are interested in for your
research. Assume that all or most of the standard OLS assumptions hold.
Describe what an influential point might look like in this context. If
such an observation were in your data, briefly describe what you might
do to account for it.
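To make the idea concrete, here is a minimal simulated sketch that is not based on any particular research data. One planted observation with an extreme \(X\) value (high leverage) and a response far below the trend (large residual) pulls the fitted slope and stands out on Cook's distance. The sample size, coefficients, and the location of the planted point are arbitrary assumptions.

```r
set.seed(12345)

# Simulate a clean linear relationship: Y = 2 + 0.5 X + noise
n <- 50
x <- rnorm(n, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)

# Plant one hypothetical influential point: an extreme X value
# (high leverage) with a response far below the trend (large residual)
x_inf <- c(x, 25)
y_inf <- c(y, 0)

fit_clean <- lm(y ~ x)
fit_inf   <- lm(y_inf ~ x_inf)

# Compare the slope with and without the planted point
coef(fit_clean)["x"]
coef(fit_inf)["x_inf"]

# Cook's distance is largest for the planted point (observation n + 1)
cd <- cooks.distance(fit_inf)
which.max(cd)
```

Comparing the two slopes is one simple way to gauge how much a single observation drives the fit.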
Explain why multicollinearity is problematic in the OLS context.
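One way to see the problem is a small simulation, sketched below under arbitrary assumptions (the sample size, coefficients, and noise levels are made up). When a second predictor is nearly a copy of the first, the standard error of each coefficient inflates sharply compared with a model whose predictors are independent. The variance inflation factor for a predictor \(X_j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing \(X_j\) on the other predictors; the code computes it by hand in base R (the car package's vif() function reports the same quantity).

```r
set.seed(12345)
n <- 100
x1 <- rnorm(n)

# Case 1: the second predictor is independent of x1
x2_indep  <- rnorm(n)
y_indep   <- 1 + 2 * x1 + 2 * x2_indep + rnorm(n)
fit_indep <- lm(y_indep ~ x1 + x2_indep)

# Case 2: the second predictor is nearly a copy of x1 (severe collinearity)
x2_coll  <- x1 + rnorm(n, sd = 0.05)
y_coll   <- 1 + 2 * x1 + 2 * x2_coll + rnorm(n)
fit_coll <- lm(y_coll ~ x1 + x2_coll)

# Standard error of the x1 coefficient in each model
se_indep <- summary(fit_indep)$coefficients["x1", "Std. Error"]
se_coll  <- summary(fit_coll)$coefficients["x1", "Std. Error"]

# Variance inflation factor for x1: 1 / (1 - R^2) from regressing
# x1 on the other predictor in the model
vif_indep <- 1 / (1 - summary(lm(x1 ~ x2_indep))$r.squared)
vif_coll  <- 1 / (1 - summary(lm(x1 ~ x2_coll))$r.squared)

c(se_indep = se_indep, se_coll = se_coll)
c(vif_indep = vif_indep, vif_coll = vif_coll)
```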
(This question continues the assignment from Lab 5 and Lab 6; you can
reuse the same model if you wish.)
Load the gavote data from the faraway package. Create a new variable,
undercount, equal to the percentage of ballots that were not counted as
votes, \(100 \times (\text{ballots} - \text{votes}) / \text{ballots}\),
using the variables votes and ballots. We will use undercount as our
outcome variable of interest. Choose three other variables to use as
predictors. If you wish to construct a new variable (such as perGore
from previous weeks), you may do so as long as you explain why it is
meaningful. Run a linear regression using lm().
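A minimal sketch of this setup follows, assuming the faraway package is installed. The predictors perAA, econ, and rural are an arbitrary illustrative choice among the columns of gavote, not a recommended model; substitute your own three and justify them.

```r
# Load the Georgia 2000 voting data (requires the faraway package)
library(faraway)
data(gavote)

# undercount: percentage of ballots cast that were not counted as votes
gavote$undercount <- 100 * (gavote$ballots - gavote$votes) / gavote$ballots

# Illustrative predictor choice only; pick and justify your own three
fit <- lm(undercount ~ perAA + econ + rural, data = gavote)
summary(fit)
```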
Using plot(), cooks.distance(), and other functions (for example,
hatvalues() and rstudent()), run regression diagnostics on your model.
Interpret the results of these diagnostics and report any influential
points.
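The diagnostics could look something like the sketch below, which refits the same illustrative model (perAA, econ, and rural are assumed predictors, not a recommendation) so the block runs on its own. The cutoffs used (Cook's distance above \(4/n\), leverage above \(2p/n\), and absolute studentized residuals above 3) are common rules of thumb, not formal tests; treat flagged counties as candidates to investigate rather than points to delete.

```r
library(faraway)
data(gavote)
gavote$undercount <- 100 * (gavote$ballots - gavote$votes) / gavote$ballots
fit <- lm(undercount ~ perAA + econ + rural, data = gavote)

# Default diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Cook's distance for each county
cd <- cooks.distance(fit)

# Flag observations above the 4/n rule of thumb (a convention, not a test)
n <- nobs(fit)
influential <- which(cd > 4 / n)
gavote[influential, c("undercount", "perAA", "econ", "rural")]

# Leverage and studentized residuals as supporting evidence
lev  <- hatvalues(fit)
stud <- rstudent(fit)
p    <- length(coef(fit))
which(lev > 2 * p / n)   # leverage above twice the average (p / n)
which(abs(stud) > 3)     # unusually large studentized residuals
```

If you want the Cook's distance plot on its own, plot(fit, which = 4) draws it by observation index.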