This assignment reviews the material covered in lab. Be sure to comment your code to clarify what you are doing. Comments not only help with grading, they will also help you when you revisit the code in the future. Please submit your assignment using either RMarkdown or knitr, both of which are fully compatible with R and LaTeX. If your code uses random number generation, please call set.seed(12345) for replicability. Please post any questions on Piazza.
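
As a minimal sketch of the seeding requirement above: calling set.seed(12345) immediately before a random draw makes that draw reproducible. The object names draw_a and draw_b are illustrative only and are not part of the assignment.

```r
# Fix the seed before any random draw so the results replicate exactly.
set.seed(12345)
draw_a <- rnorm(5)

# Resetting the same seed reproduces the same sequence of draws.
set.seed(12345)
draw_b <- rnorm(5)

# The two vectors match element for element.
identical(draw_a, draw_b)
```

In an RMarkdown or knitr document, placing the set.seed() call in the first code chunk that uses randomness is usually enough, as long as the chunks are always run in order.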


1) Checking Intuition

Explain, using minimal math or algebra, the following:

  1. Hypothesis testing from a frequentist perspective. That is, what are we trying to “test,” how do we do so, and why does it matter?
  2. The difference between the \(t\)-test for a single coefficient \(\beta\) and the omnibus \(F\)-test for multiple \(\beta\) coefficients jointly.
  3. Interaction terms. What do we risk by not using them? When can over-using them be problematic for research?
  4. Bootstrapping. When might it be necessary? How does the procedure work? Are there possible hazards to bootstrapping – in other words, why don’t we bootstrap everything?

2) Coding in R

Load the newhamp dataset from the faraway R package. This dataset contains vote counts and other demographic information from 276 wards (i.e., voting districts) in the 2008 Democratic Party presidential primary in the U.S. state of New Hampshire.
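
One possible way to load and inspect the data is sketched below. It assumes the faraway package is already installed; the flag newhamp_loaded is a hypothetical helper name used only to make the sketch fail gracefully when the package is missing.

```r
# Check whether the faraway package is available (assumption: installed from CRAN).
newhamp_loaded <- requireNamespace("faraway", quietly = TRUE)

if (newhamp_loaded) {
  # Load the data frame named newhamp into the workspace.
  data(newhamp, package = "faraway")
  # Inspect variable names and types before modeling.
  str(newhamp)
} else {
  message("Package 'faraway' is not installed; run install.packages(\"faraway\") first.")
}
```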

  1. Run a linear regression where pObama is a function of votesys, povrate, pci, and population. Report and interpret each coefficient substantively. Report and explain the significance (or lack thereof) of the omnibus \(F\)-test statistic for this model.
  2. Now run the same model, but include an additional term in which votesys is interacted with povrate (you will need to recode votesys so that one level takes the value 0 and the other takes the value 1). Interpret, statistically and substantively, the differences between the coefficient estimates for votesys, povrate, and votesys \(\times\) povrate.
  3. Run a bootstrap with 5,000 iterations of your model from item 1. Create a histogram or density plot of the empirical sampling distribution of \(\beta\) for each coefficient. Report the mean, standard error, and 95% confidence interval for each coefficient, and compare them to the estimates from the original model.
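
The recoding and resampling mechanics mentioned in the items above can be sketched on simulated data. This is only an illustration under stated assumptions, not the assigned analysis: the data frame toy, its variables (group, x, y, group01), and the objects fit, boot_coefs, and boot_summary are all hypothetical, the bootstrap shown is a simple case-resampling bootstrap with percentile intervals, and the number of iterations is smaller than the 5,000 the assignment asks for.

```r
# Simulated stand-in data (NOT the newhamp data); all names are hypothetical.
set.seed(12345)
n <- 200
toy <- data.frame(
  group = factor(sample(c("A", "B"), n, replace = TRUE)),
  x     = rnorm(n)
)
toy$y <- 1 + 0.5 * toy$x + 0.3 * (toy$group == "B") + rnorm(n)

# Recode the two-level factor as a 0/1 indicator (A = 0, B = 1).
toy$group01 <- as.integer(toy$group == "B")

# Fit the model once on the full sample.
fit <- lm(y ~ x + group01, data = toy)

# Case-resampling bootstrap: redraw rows with replacement and refit each time.
n_boot <- 1000  # fewer iterations than the assignment requires, to keep the sketch quick
boot_coefs <- t(replicate(n_boot, {
  idx <- sample(nrow(toy), replace = TRUE)
  coef(lm(y ~ x + group01, data = toy[idx, ]))
}))

# Summarize the empirical sampling distribution of each coefficient.
boot_summary <- data.frame(
  estimate  = coef(fit),
  boot_mean = colMeans(boot_coefs),
  boot_se   = apply(boot_coefs, 2, sd),
  ci_lower  = apply(boot_coefs, 2, quantile, probs = 0.025),
  ci_upper  = apply(boot_coefs, 2, quantile, probs = 0.975)
)
print(boot_summary)
```

Each column of boot_coefs holds the bootstrap draws for one coefficient, so a call such as hist(boot_coefs[, "x"]) or plot(density(boot_coefs[, "x"])) would display its empirical sampling distribution. The percentile interval used here is one of several common choices.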