Normal Distribution

Definition: a continuous random variable \(Z\) follows the standard normal distribution, \(Z \sim \mathcal{N}(0,1)\), if it has the following PDF: \[\begin{equation*} \phi(z) = \displaystyle \frac{1}{\sqrt{2 \pi}} \exp \left[\frac{-z^2}{2} \right] \end{equation*}\]
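
As a quick sanity check, we can code this formula by hand and compare it against R's built-in dnorm(), which implements the same density:

# Hand-coded standard normal PDF vs. R's built-in dnorm()
phi <- function(z) (1 / sqrt(2 * pi)) * exp(-z^2 / 2)
phi(0)
## [1] 0.3989423
dnorm(0)
## [1] 0.3989423
phi(1)
## [1] 0.2419707
dnorm(1)
## [1] 0.2419707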

Likewise, the CDF of a standard normal random variable is given by: \[\begin{equation*} \Phi(z) = \displaystyle \int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} \exp \left[\frac{-t^2}{2} \right]\,dt \end{equation*}\]
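
This integral has no closed form, so R evaluates it numerically. As a sketch, we can reproduce pnorm() with base R's integrate():

# Integrating the PDF from -Inf to z should recover Phi(z) = pnorm(z)
integrate(dnorm, lower = -Inf, upper = 1)$value # numerical integral
pnorm(1)                                        # R's built-in CDF; both are ~0.8413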

The normal distribution is symmetric around its center, which gives us some useful properties. Let \(Z \sim \mathcal{N}(0,1)\):
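
We can verify the two symmetry identities, \(\phi(z) = \phi(-z)\) and \(\Phi(-z) = 1 - \Phi(z)\), numerically before plotting them:

# Symmetry of the PDF: phi(z) = phi(-z)
dnorm(1) == dnorm(-1)
## [1] TRUE
# Symmetry of the CDF: Phi(-z) = 1 - Phi(z), up to floating-point error
all.equal(pnorm(-1), 1 - pnorm(1))
## [1] TRUE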

# I'm just plotting z from -4 to 4. Remember that it actually traverses the real line.
# Use z to shift the bounds of the plot.
# This is really important! If you're looking at something centered around
# another number, you need to adjust the field of view accordingly!
z <- c(-4,4)

# I'm using the ggplot2 library to make some nicer visuals than base R.
# If you've never used ggplot2, you can install it by entering
# `install.packages("ggplot2")` into the console.
# You only need to install a package once, but need to load that library
# every new session of R. You do that by entering `library(*)` into the console.
library(ggplot2)
# We're passing `z` into ggplot() as a data.frame object. Don't worry about this too much
ggplot(data = data.frame(z), aes(z)) +
  
  # This inputs the standard normal distribution with mean 0 and standard deviation 1
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) +
  
  # Suppose z = 1. Then -z = -1. Then phi(z) = phi(1) = phi(-z) = phi(-1)
  # This code makes the point and dotted lines for phi(1)
  geom_point(aes(x = 1, y = dnorm(1)), color = "red")+
  geom_segment(aes(x = 1, xend = 1, y = 0, yend = dnorm(1)),
               color = "red",
               linetype = "dashed")+
  annotate("text", x = 1, y = dnorm(1), hjust = -0.25, label = "phi(1)", color = "red")+
  
  # and this does the same for phi(-1)
  geom_point(aes(x = -1, y = dnorm(-1)), color = "red")+
  geom_segment(aes(x = -1, xend = -1, y = 0, yend = dnorm(-1)),
               color = "red",
               linetype = "dashed")+
  annotate("text", x = -1, y = dnorm(-1), hjust = 1.25, label = "phi(-1)", color = "red")+
  
  # This horizontal line shows that the two are equal
  geom_segment(aes(x = -1, xend = 1, y = dnorm(-1), yend = dnorm(1)),
               color = "red",
               linetype = "dashed")+
  
  # This just makes it nicer to look at
  ylab("") +
  scale_y_continuous(breaks = NULL)+
  theme_bw()

# And this works for any value of z! For instance, z = +/-0.5 and z = +/-2:
# z = +/-0.5
ggplot(data = data.frame(z), aes(z)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  geom_point(aes(x = 0.5, y = dnorm(0.5)), color = "blue")+
  geom_segment(aes(x = 0.5, xend = 0.5, y = 0, yend = dnorm(0.5)),
               color = "blue",
               linetype = "dashed")+
  annotate("text", x = 0.5, y = dnorm(0.5), hjust = -0.25, label = "phi(0.5)", color = "blue")+
  geom_point(aes(x = -0.5, y = dnorm(-0.5)), color = "blue")+
  geom_segment(aes(x = -0.5, xend = -0.5, y = 0, yend = dnorm(-0.5)),
               color = "blue",
               linetype = "dashed")+
  annotate("text", x = -0.5, y = dnorm(-0.5), hjust = 1.25, label = "phi(-0.5)", color = "blue")+
  geom_segment(aes(x = -0.5, xend = 0.5, y = dnorm(-0.5), yend = dnorm(0.5)),
               color = "blue",
               linetype = "dashed")+
  scale_y_continuous(breaks = NULL)+
  theme_bw()

# and z = +/-2
ggplot(data = data.frame(z), aes(z)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  geom_point(aes(x = 2, y = dnorm(2)), color = "blue")+
  geom_segment(aes(x = 2, xend = 2, y = 0, yend = dnorm(2)),
               color = "blue",
               linetype = "dashed")+
  annotate("text", x = 2, y = dnorm(2), hjust = -0.25, label = "phi(2)", color = "blue")+
  geom_point(aes(x = -2, y = dnorm(-2)), color = "blue")+
  geom_segment(aes(x = -2, xend = -2, y = 0, yend = dnorm(-2)),
               color = "blue",
               linetype = "dashed")+
  annotate("text", x = -2, y = dnorm(-2), hjust = 1.25, label = "phi(-2)", color = "blue")+
  geom_segment(aes(x = -2, xend = 2, y = dnorm(-2), yend = dnorm(2)),
               color = "blue",
               linetype = "dashed")+
  scale_y_continuous(breaks = NULL)+
  theme_bw()


# Let's use the standard normal again
# Suppose Z = 0
ggplot(data = data.frame(z), aes(z))+
  
  # This just plots the normal PDF. You can change the mean and sd later
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1))+
  
  # This specifies area under the curve visually. Remember that this goes from -infinity to Z
  # `fill` just specifies color and `alpha` is transparency
  stat_function(fun = dnorm,
                xlim = c(-4, 0), # Remember, we want the area from the lower bound to Z = 0
                geom = "area",
                fill = "red",
                alpha = 0.5)+
  
  # And this is the right partition from Z to +infinity
  stat_function(fun = dnorm,
                xlim = c(0,4),
                geom = "area",
                fill = "blue",
                alpha = 0.5)+
  
  # And this is just to look nice
  scale_y_continuous(breaks = NULL)+
  theme_bw()

# Suppose, then, that Z = 1.
# The area up to Z = 1 is shaded red, and the area up to Z = -1 is overlaid
# in blue (so the overlap reads as purple)
ggplot(data = data.frame(z), aes(z))+
  stat_function(fun = dnorm)+
  stat_function(fun = dnorm,
                xlim = c(-4, 1),
                geom = "area",
                fill = "red",
                alpha = 0.5)+
  stat_function(fun = dnorm,
                xlim = c(-4, -1),
                geom = "area",
                fill = "blue",
                alpha = 0.5)+
  scale_y_continuous(breaks = NULL)+
  theme_bw()

# If we instead add the area below Z = 1 and the area above Z = 1,
# we get the whole sample space with area = 1
ggplot(data = data.frame(z), aes(z))+
  stat_function(fun = dnorm)+
  stat_function(fun = dnorm,
                xlim = c(-4, 1),
                geom = "area",
                fill = "red",
                alpha = 0.5)+
  stat_function(fun = dnorm,
                xlim = c(1, 4),
                geom = "area",
                fill = "blue",
                alpha = 0.5)+
  scale_y_continuous(breaks = NULL)+
  theme_bw()


Remember, if \(Z \sim \mathcal{N}(0,1)\) and \(X = \mu + \sigma Z\), then \(X \sim \mathcal{N}(\mu, \sigma^2)\); conversely, we can standardize any normal random variable by subtracting its mean and dividing by its standard deviation. A consequence is that we can derive information about any normal random variable in terms of the standard normal. To find the CDF of \(X \sim \mathcal{N}(\mu, \sigma^2)\):

  1. Note that \(X = \mu + \sigma Z \implies Z = \frac{X - \mu}{\sigma}\).
  2. Using the definition of CDF, we have: \[\begin{equation*} \begin{aligned} F_{X}(x) & = \mathbb{P}(X \leq x)\\ & = \mathbb{P}(\mu + \sigma Z \leq x)\\ & = \mathbb{P}(Z \leq \frac{x - \mu}{\sigma})\\ & = \Phi\left(\frac{x - \mu}{\sigma}\right) \end{aligned} \end{equation*}\]

Similarly, we can use this result to find the PDF: \[\begin{equation*} \begin{aligned} f_{X}(x) & = \frac{d}{dx}F_{X}(x)\\ & = \frac{d}{dx}\Phi\left(\frac{x - \mu}{\sigma}\right)\\ & = \frac{1}{\sigma}\frac{1}{\sqrt{2\pi}} \exp \left[-\frac{(x - \mu)^2}{2\sigma^2} \right]\\ \implies \mathbb{P}(a \leq X \leq b) & = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right) \end{aligned} \end{equation*}\]
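
We can confirm the CDF identity numerically. In this sketch, \(\mu = 2\) and \(\sigma = 3\) are arbitrary choices:

mu <- 2; sigma <- 3; x <- 1.3    # arbitrary values for illustration
pnorm(x, mean = mu, sd = sigma)  # F_X(x) directly
pnorm((x - mu) / sigma)          # Phi((x - mu)/sigma); both lines print the same probability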


Exercise: Using Properties of the Normal Distribution
Let \(X \sim \mathcal{N}(-5,4)\):
a) Find \(\mathbb{P}(X \leq 0)\)
b) Find \(\mathbb{P}(-7 \leq X \leq -3)\)

Solution:
a) Since \(\mu = -5\) and \(\sigma^2 = 4\), we can determine \(\mathbb{P}(X \leq 0)\) using these properties: \[\begin{equation*} \begin{aligned} \mathbb{P}(X \leq 0) & = F_{X}(x = 0)\\ & = \Phi\left(\frac{0 - \mu}{\sigma}\right)\\ & = \Phi\left(\frac{0 - (-5)}{\sqrt{4}}\right)\\ & = \Phi\left(\frac{5}{2}\right) \end{aligned} \end{equation*}\]

Then, we can use the pnorm() function in R to evaluate that quantity.

pnorm(q = 5/2, mean = 0, sd = 1) # why are we using this mean and standard deviation?
## [1] 0.9937903

And we can double-check by passing the mean and standard deviation of \(X\) to pnorm() directly:

pnorm(q = 0, mean = -5, sd = sqrt(4))
## [1] 0.9937903

If we were to plot the area under the curve, it would look something like this:

z <- c(-13, 3) # Shifting the plotting bounds to center on mu = -5

ggplot(data = data.frame(z), aes(z))+
  
  # Just drawing the black PDF curve
  stat_function(fun = dnorm,
                n = 101,
                args = list(mean = -5, sd = 2))+
  
  # Now the area under the curve
  stat_function(fun = dnorm,
                n = 101,
                args = list(mean = -5, sd = 2),
                xlim = c(-13, 0),
                geom = "area",
                fill = "red",
                alpha = 0.5)+
  ylab("")+
  theme_bw()

b) Simply plug and chug: \[\begin{equation*} \begin{aligned} \mathbb{P}(-7 \leq X \leq -3) & = \Phi\left(\frac{-3 - (-5)}{\sqrt{4}}\right) - \Phi\left(\frac{-7 - (-5)}{\sqrt{4}}\right)\\ & = \Phi\left(\frac{2}{2}\right) - \Phi\left(\frac{-2}{2}\right)\\ & = \Phi(1) - \Phi(-1) \end{aligned} \end{equation*}\]

From here, there are a number of ways to evaluate \(\Phi(1) - \Phi(-1)\). We can do it directly using pnorm():

pnorm(q = 1, mean = 0, sd = 1) - pnorm(q = -1, mean = 0, sd = 1)
## [1] 0.6826895

We can also use a property of \(\Phi(z)\): namely, that \(\Phi(z) = 1 - \Phi(-z)\), which implies \(\Phi(z) - \Phi(-z) = [1 - \Phi(-z)] - \Phi(-z) = 1 - 2\Phi(-z)\). With pnorm(), that is:

1 - 2 * pnorm(q = -1, mean = 0, sd = 1)
## [1] 0.6826895

Multiple Random Variables

Definition: Let \(X\) and \(Y\) be random variables. The joint distribution function of \(X\) and \(Y\) is the function \(F : \mathbb{R}^2 \to [0,1]\), defined by \(F(x,y) = \mathbb{P}(X \leq x, Y \leq y)\).
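
To make this concrete, here is a small simulation sketch: taking \(X\) and \(Y\) to be independent standard normals (an arbitrary choice for illustration), the joint distribution function at a point is just the share of simulated pairs that land below it in both coordinates.

set.seed(42)              # arbitrary seed for reproducibility
x <- rnorm(100000)
y <- rnorm(100000)
# Monte Carlo estimate of F(1, 0) = P(X <= 1, Y <= 0)
mean(x <= 1 & y <= 0)
# Since X and Y are independent here, this is close to pnorm(1) * pnorm(0) ~ 0.42
pnorm(1) * pnorm(0)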


Exercise: Using a Joint PDF
Let \(X\) and \(Y\) be two jointly continuous random variables with the following joint PDF: \[\begin{equation*} f_{XY}(x,y) = \begin{cases} x + cy^2 & \text{if } 0 \leq x \leq 1, 0 \leq y \leq 1\\ 0 & \text{otherwise} \end{cases} \end{equation*}\]

  1. Find the constant \(c\).
  2. Find \(\mathbb{P}(0 \leq X \leq \frac{1}{2}, 0 \leq Y \leq \frac{1}{2})\).

Solution:

  1. Remember that \(\displaystyle \int_{Y} \int_{X} f(x,y)\,dx\,dy = 1\). Since \(X\) and \(Y\) are each defined over \([0,1]\), we simply integrate the PDF \(f_{XY}(x,y) = x + cy^2\) over those bounds. \[\begin{equation*} \begin{aligned} 1 & = \displaystyle \int_{Y} \int_{X} f(x,y)\,dx\,dy\\ & = \int_{0}^{1} \int_{0}^{1} x + cy^2\,dx\,dy\\ & = \int_{0}^{1} [\frac{1}{2}x^2 + cy^2x]_{x=0}^{x=1}\,dy\\ & = \int_{0}^{1} \frac{1}{2} + cy^2\,dy\\ & = [\frac{1}{2}y + \frac{1}{3}cy^3]_{y=0}^{y=1}\\ & = \frac{1}{2} + \frac{1}{3}c\\ \implies c &= \frac{3}{2} \end{aligned} \end{equation*}\]

  2. Now, we need to change the bounds of integration to reflect the problem. Using our result from part 1 that \(c = \frac{3}{2}\), we evaluate: \[\begin{equation*} \begin{aligned} \mathbb{P}(0 \leq X \leq \frac{1}{2}, 0 \leq Y \leq \frac{1}{2}) & = \displaystyle \int_{0}^{1/2} \int_{0}^{1/2} (x + \frac{3}{2}y^2) \,dx\,dy\\ & = \int_{0}^{1/2} [\frac{1}{2}x^2 + \frac{3}{2}y^2 x]_{x=0}^{x=\frac{1}{2}}\,dy\\ & = \int_{0}^{1/2} (\frac{1}{8} + \frac{3}{4}y^2)\,dy\\ & = [\frac{1}{8}y + \frac{1}{4}y^3]_{y=0}^{y=\frac{1}{2}}\\ & = \frac{1}{8}(\frac{1}{2}) + \frac{1}{4}(\frac{1}{2})^3\\ & = \frac{3}{32} \end{aligned} \end{equation*}\]
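
Both answers are easy to double-check numerically. Here is a sketch using nested integrate() calls (the helper functions inner() and inner_half() are just for this check):

f <- function(x, y) x + (3/2) * y^2   # the joint PDF with c = 3/2
# Integrate over x for each fixed y, then over y
inner <- function(y) sapply(y, function(yy) integrate(function(x) f(x, yy), 0, 1)$value)
integrate(inner, 0, 1)$value          # total probability; should be 1
inner_half <- function(y) sapply(y, function(yy) integrate(function(x) f(x, yy), 0, 0.5)$value)
integrate(inner_half, 0, 0.5)$value   # should be 3/32 = 0.09375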


Definition: Multivariate Normal

An \(n\)-dimensional multivariate normal random vector \(\mathbf{x} = (x_1, \dots, x_n)\) has the following density function: \[\begin{equation*} f(\mathbf{x}) = \frac{1}{\sqrt{(2 \pi)^n |\Sigma|}} \exp \left[-\frac{1}{2} (\mathbf{x} - \mu)^{\top} \Sigma^{-1}(\mathbf{x} - \mu)\right] \end{equation*}\] where \(\mu\) is an \(n \times 1\) vector of means and \(\Sigma\) is an \(n \times n\) positive definite covariance matrix, whose diagonal entries are the variances \(\sigma_{x_1}^2, \dots, \sigma_{x_n}^2\).
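
The density is straightforward to evaluate in base R. Here is a minimal sketch for a hypothetical bivariate case (the mean vector, covariance matrix, and evaluation point below are all arbitrary choices):

mu <- c(0, 0)                          # arbitrary mean vector
Sigma <- matrix(c(1, 0.5,
                  0.5, 1), nrow = 2)   # arbitrary positive definite covariance
x <- c(1, -1)                          # point at which to evaluate the density
n <- length(x)
# Plugging straight into the formula above
drop(exp(-0.5 * t(x - mu) %*% solve(Sigma) %*% (x - mu)) / sqrt((2 * pi)^n * det(Sigma)))

For routine use, the dmvnorm() function in the mvtnorm package computes the same quantity.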


Definition: Conditional distributions

Let \(X\) and \(Y\) be random variables with marginal density functions \(f_{X}(x)\) and \(f_{Y}(y)\), respectively, and joint density function \(f(x,y)\). The conditional distribution of \(Y\) given \(X\) is defined, wherever \(f_{X}(x) > 0\), by: \[\begin{equation*} f(y|x) = \frac{f(x,y)}{f_{X}(x)} \end{equation*}\]


Definition: Independence
\(X\) and \(Y\) are said to be independent if \(f(x,y) = f_X(x) \cdot f_{Y}(y)\); that is, if the joint distribution is the product of the marginal distributions.


Definition: Marginal distribution

Let \(X\) and \(Y\) be random variables. Then, the marginal distribution of \(X\), \(f_{X}(x)\), is obtained by summing or integrating the joint distribution over \(Y\): \[\begin{equation*} f_{X}(x) = \begin{cases} \displaystyle \sum_{Y} f(x,y) & \text{if $Y$ is discrete}\\ \displaystyle \int_{Y} f(x,y)\, dy & \text{if $Y$ is continuous} \end{cases} \end{equation*}\]
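
For example, using the joint PDF from the exercise above, the marginal distribution of \(X\) is \[\begin{equation*} f_{X}(x) = \displaystyle \int_{0}^{1} \left(x + \frac{3}{2}y^2\right)\,dy = x + \frac{1}{2}, \quad 0 \leq x \leq 1 \end{equation*}\] and the conditional distribution of \(Y\) given \(X = x\) is then \(f(y|x) = \frac{x + \frac{3}{2}y^2}{x + \frac{1}{2}}\).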


Exercise: Using Joint, Marginal, and Conditional Distributions

Let \(X\) and \(Y\) be random variables. Prove that if \(X\) and \(Y\) are independent, then \(f(y|x) = f_{Y}(y)\).

Solution: By definition, \(f(y|x) = \frac{f(x,y)}{f_{X}(x)}\). Independence means that \(f(x,y) = f_X(x) \cdot f_{Y}(y)\), so we can substitute \(f(y|x) = \frac{f_X(x) \cdot f_{Y}(y)}{f_{X}(x)} = f_{Y}(y)\).


Summarizing Distributions

We can summarize most distributions with just a few numbers:
1) Central Tendency: where is the center of the distribution?
2) Spread: how spread out is the distribution around its center?

Definition: Expectation
Let \(X\) be a random variable with probability mass/density function \(f_{X}(x)\). Then the expectation of \(X\) is defined as:
\[\begin{equation*} \mathbb{E}[X] = \begin{cases} \displaystyle \sum_{X} x \cdot f(x) & \text{if $X$ is discrete}\\ \displaystyle \int_{X} x \cdot f(x)\, dx & \text{if $X$ is continuous} \end{cases} \end{equation*}\]
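
Both cases are easy to evaluate in R. As a sketch, take a fair six-sided die for the discrete case and the standard normal for the continuous case:

# Discrete: E[X] for a fair die
x <- 1:6
p <- rep(1/6, 6)
sum(x * p)
## [1] 3.5
# Continuous: E[Z] for Z ~ N(0,1), by numerical integration; should be (numerically) zero
integrate(function(z) z * dnorm(z), lower = -Inf, upper = Inf)$value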

Let \(X\) and \(Y\) be random variables, and \(a\) and \(b\) be constants. Then:
- \(\mathbb{E}[a] = a\)
- \(\mathbb{E}[a \cdot X] = a \cdot \mathbb{E}[X]\)
- \(\mathbb{E}[a \cdot X + b] = a \cdot \mathbb{E}[X] + b\)
- \(\mathbb{E}[a \cdot X + b \cdot Y] = a \cdot \mathbb{E}[X] + b \cdot \mathbb{E}[Y]\)
- If \(X\) and \(Y\) are independent, then \(\mathbb{E}[X \cdot Y] = \mathbb{E}[X] \cdot \mathbb{E}[Y]\)
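
The last two properties are easy to check by simulation; in this sketch, X and Y are drawn independently with arbitrarily chosen means 2 and 3:

set.seed(1)                  # arbitrary seed
x <- rnorm(100000, mean = 2)
y <- rnorm(100000, mean = 3)
mean(3 * x + 2 * y)          # linearity: should be close to 3*2 + 2*3 = 12
mean(x * y)                  # independence: close to E[X] * E[Y] = 2*3 = 6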

Law of Iterated Expectations
Let \(X\) and \(Y\) be random variables. Then, \(\mathbb{E}[X] = \mathbb{E}\left[\mathbb{E}[X | Y]\right]\).
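
A quick simulation sketch of this law, with the hypothetical setup \(Y \sim \mathcal{N}(0,1)\) and \(X | Y \sim \mathcal{N}(Y, 1)\), so that \(\mathbb{E}[X | Y] = Y\):

set.seed(2)
y <- rnorm(100000)            # draw Y first
x <- rnorm(100000, mean = y)  # then X | Y = y ~ N(y, 1)
mean(x)                       # estimates E[X]
mean(y)                       # estimates E[E[X|Y]] = E[Y]; the two agree (both ~0)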

Definition: Variance
Let \(X\) and \(Y\) be random variables, and \(a\) be a constant. The variance is defined as:
\[\begin{equation*} \mathbb{V}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \end{equation*}\]

The variance has some important properties:
- \(\mathbb{V}(X + a) = \mathbb{V}(X)\)
- \(\mathbb{V}(a \cdot X) = a^2 \cdot \mathbb{V}(X)\)
- The covariance is \(\text{Cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[X \cdot Y] - \mathbb{E}[X] \cdot \mathbb{E}[Y]\)
- \(\mathbb{V}(X \pm Y) = \mathbb{V}(X) + \mathbb{V}(Y) \pm 2 \cdot \text{Cov}(X,Y)\)
- If \(X\) and \(Y\) are independent, \(\text{Cov}(X,Y) = 0\), meaning \(\mathbb{V}(X \pm Y) = \mathbb{V}(X) + \mathbb{V}(Y)\)
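
Again, a simulation sketch: var() and cov() compute sample versions that converge to the population quantities as the sample grows.

set.seed(3)
x <- rnorm(100000, sd = 2)
y <- rnorm(100000)
var(x + 10) - var(x)             # shifting by a constant changes nothing; ~0
var(3 * x) / var(x)              # scaling: should be 3^2 = 9
var(x + y) - (var(x) + var(y))   # ~0, since Cov(X,Y) = 0 under independence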

Exercise: Deriving a Variance
Let \(X \sim \text{Bern}(p)\). We showed in class that \(\mathbb{E}[X] = p\). Find \(\mathbb{V}(X)\).

Solution: \(\mathbb{V}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2\). We know \(\mathbb{E}[X] = p\), so \(\mathbb{E}[X]^2 = p^2\). We need to find \(\mathbb{E}[X^2]\): \[\begin{equation*} \begin{aligned} \mathbb{E}[X^2] & = \mathbb{P}(X = 1) (1)^2 + \mathbb{P}(X = 0) (0)^2\\ & = \mathbb{P}(X = 1)\\ & = p \end{aligned} \end{equation*}\]

Thus, \(\mathbb{V}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = p - p^2\), which can be factored to \(p \cdot (1-p)\).
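
A simulation sketch confirms this, here with the arbitrary choice p = 0.3:

set.seed(4)
p <- 0.3
x <- rbinom(100000, size = 1, prob = p)   # Bernoulli(p) draws
var(x)                                    # should be close to p * (1 - p) = 0.21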