Random Variables

Definition: Random Variable

A random variable (often abbreviated to “R.V.”) is a function that maps from the sample space of an experiment to the real numbers. Mathematically, we express this as \(X: \Omega \to \mathbb{R}\), where \(X\) is the random variable and the numerical value for some outcome \(\omega\) is \(X(\omega)\). We focus on two types of random variable: discrete and continuous.
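To make the idea of a mapping concrete, here is a minimal sketch in R. It assumes a hypothetical experiment of flipping a coin twice; the names `omega` and `X` are illustrative and not part of the definition above.

```r
# Hypothetical experiment: flip a coin twice.
# The sample space has four outcomes.
omega <- c("HH", "HT", "TH", "TT")

# X maps each outcome to a real number: the number of heads.
X <- function(outcome) {
  sum(strsplit(outcome, "")[[1]] == "H")
}

# Apply X to every outcome in the sample space
sapply(omega, X)
```

Each outcome \(\omega\) is a string, and `X(omega)` is the number attached to it, so `X("HH")` is 2 and `X("TT")` is 0.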

Discrete Distributions

Definition: Discrete Random Variable

A random variable \(X\) is discrete if the set of values it takes with positive probability is finite or countably infinite. For instance, any variable whose values can be expressed as integers \(\mathbb{Z}\) is discrete.

What are some examples of discrete variables we might use in political science?

Definition: The Bernoulli Distribution

A random variable \(X\) has a Bernoulli distribution with parameter \(p\) if \(\mathbb{P}(X = 1) = p\) and \(\mathbb{P}(X = 0) = 1-p\). This is written as \(X \sim \text{Bern}(p)\) (spoken in English as “\(X\) is distributed Bernoulli \(p\)”). We might call such a variable a Bernoulli random variable.

In other words, \(X\) takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\), where \(p \in [0,1]\).
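One way to get a feel for a Bernoulli random variable is to simulate it. The sketch below assumes a hypothetical value of \(p = 0.3\) and uses R's rbinom() function with size = 1, since a single Bernoulli trial is a Binomial draw with one trial. The seed is arbitrary.

```r
set.seed(123)  # arbitrary seed, for reproducibility only
p <- 0.3       # hypothetical success probability

# Draw 10,000 Bernoulli(p) values
draws <- rbinom(n = 10000, size = 1, prob = p)

# Every draw is either 0 or 1
all(draws %in% c(0, 1))

# The share of 1s should be close to p
mean(draws)
```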

What are some examples of Bernoulli random variables we might use in political science?

Definition: Probability Mass Function

A Probability Mass Function (often abbreviated to “PMF”) is a function that gives the probability that a discrete random variable is exactly equal to some value. Mathematically, we express this as \(\mathbb{P}(X = x)\). A PMF has a couple of useful properties:

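  1. Non-negativity: \(\mathbb{P}(X = x) \geq 0\) for every \(x\), and \(\mathbb{P}(X = x) > 0\) only if \(x\) is one of the values that \(X\) can take (its support).
  2. Summation to 1: \(\displaystyle \sum_{j=1}^{\infty} \mathbb{P}(X = x_j) = 1\), where \(x_1, x_2, \ldots\) are the values in the support of \(X\).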
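The two properties are easy to check numerically. The sketch below assumes a hypothetical discrete random variable, the number of heads in two fair coin flips; the vectors `values` and `pmf` are illustrative names.

```r
# Hypothetical PMF: number of heads in two fair coin flips
values <- c(0, 1, 2)
pmf <- c(0.25, 0.50, 0.25)

# Property 1: no probability is negative
all(pmf >= 0)

# Property 2: the probabilities sum to 1
# (all.equal() compares numbers up to a small tolerance)
all.equal(sum(pmf), 1)
```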

Definition: Cumulative Distribution Function

The Cumulative Distribution Function (often abbreviated to “CDF”) is a function that returns the probability that a variable is less than or equal to a particular value. Mathematically, we express this as \(F_X(x) \equiv \mathbb{P}(X \leq x)\). A CDF has a few useful properties:

  1. Non-decreasing: if \(x_1 \leq x_2\), then \(\mathbb{P}(X \leq x_1) \leq \mathbb{P}(X \leq x_2)\).
  2. Convergence to 0 and 1: \(\displaystyle \lim_{x \to -\infty} \mathbb{P}(X \leq x) = 0\) and \(\displaystyle \lim_{x \to \infty} \mathbb{P}(X \leq x) = 1\).
  3. Right-continuous: for any number \(a\), \(\mathbb{P}(X \leq a) = \displaystyle \lim_{x \to a^+} \mathbb{P}(X \leq x)\).
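For a discrete random variable, the CDF is a step function built by adding up the PMF. The sketch below assumes a hypothetical fair six-sided die; `values`, `pmf`, and `cdf` are illustrative names.

```r
# Hypothetical example: a fair six-sided die
values <- 1:6
pmf <- rep(1 / 6, 6)

# F(x) = P(X <= x): add up the PMF over all values no larger than x
cdf <- function(x) sum(pmf[values <= x])

cdf(0)     # below every possible value, so the CDF is 0
cdf(3)     # P(X <= 3), which is one half
cdf(3.5)   # flat between possible values, so the same as cdf(3)
cdf(6)     # at the largest possible value, so the CDF is 1
```

The CDF never decreases as \(x\) grows, starts at 0, and ends at 1. It jumps at each possible value and takes the higher value at the jump itself, which is what right-continuity means here.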

Definition: The Bernoulli Distribution PMF and CDF

The PMF of the Bernoulli Distribution can be expressed as

\[\begin{equation*} \mathbb{P}(X = x) = f(x;p) = \begin{cases} p & \text{if } x = 1\\ 1-p & \text{if } x = 0 \end{cases} \end{equation*}\]

The CDF of the Bernoulli Distribution can be expressed as

\[\begin{equation*} \mathbb{P}(X \leq x) = F_X(x;p) = \begin{cases} 0 & \text{if } x < 0\\ 1 - p & \text{if } 0 \leq x < 1\\ 1 & \text{if } x \geq 1 \end{cases} \end{equation*}\]
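R has no separate Bernoulli functions, but a Bernoulli random variable is a Binomial random variable with a single trial, so dbinom() and pbinom() with size = 1 give its PMF and CDF. The sketch below assumes a hypothetical \(p = 0.3\).

```r
p <- 0.3  # hypothetical success probability

# PMF at x = 0 and x = 1: these are 1 - p and p
bern_pmf <- dbinom(x = c(0, 1), size = 1, prob = p)
bern_pmf

# CDF below 0, at 0, between 0 and 1, at 1, and above 1:
# these are 0, 1 - p, 1 - p, 1, and 1
bern_cdf <- pbinom(q = c(-0.5, 0, 0.5, 1, 1.5), size = 1, prob = p)
bern_cdf
```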

Definition: The Binomial Distribution

Let \(X\) be the number of successes in \(n\) independent Bernoulli trials each with success probability \(p\). Then, \(X\) follows a Binomial Distribution with parameters \(n\) and \(p\), expressed as \(X \sim \text{Bin}(n,p)\).

The PMF of the Binomial Distribution is

\[\begin{equation*} \mathbb{P}(X = x) = f(x;n,p) = {n \choose x} p^x (1-p)^{n-x} \end{equation*}\]
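To see where this PMF comes from, we can build Binomial draws out of Bernoulli trials and compare the simulated frequencies to the formula. This is a rough sketch with hypothetical values \(n = 5\) and \(p = 0.4\); the seed and the number of simulations are arbitrary.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 5         # hypothetical number of trials
p <- 0.4       # hypothetical success probability
sims <- 20000  # number of simulated experiments

# Each row is one experiment made up of n Bernoulli trials
trials <- matrix(rbinom(n = sims * n, size = 1, prob = p), ncol = n)

# The number of successes in each experiment is one Binomial draw
successes <- rowSums(trials)

# Simulated share of experiments with exactly 2 successes
simulated <- mean(successes == 2)
simulated

# The same probability from the PMF formula
formula_value <- choose(n, 2) * p^2 * (1 - p)^(n - 2)
formula_value
```

The two numbers should be close, and the gap should shrink as `sims` grows.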

What are some examples of Binomial random variables we might use in political science?

Exercise: Using the Binomial Distribution

Given that \(n = 12\) and \(p = 0.75\), use R to find the probability of seeing a) 8 successes and b) 0 successes.

Solution: we can do this in a couple of ways. First, we can plug the values into the PMF by hand.

# a) n = 12, p = 0.75, and x = 8
# We plug those values into our PMF
choose(n = 12, k = 8) * 0.75^(8) * (1 - 0.75)^(12 - 8)
## [1] 0.1935777
# b) n = 12, p = 0.75, and x = 0
choose(n = 12, k = 0) * 0.75^(0) * (1 - 0.75)^(12 - 0)
## [1] 5.960464e-08

We can also use the dbinom() function in R.

# a) n = 12, p = 0.75, and x = 8
dbinom(x = 8, size = 12, prob = 0.75)
## [1] 0.1935777
# b) n = 12, p = 0.75, and x = 0
dbinom(x = 0, size = 12, prob = 0.75)
## [1] 5.960464e-08
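The same setup also connects the PMF to the CDF. As a sketch that goes slightly beyond what the exercise asks, the probability of at most 8 successes can be found either by summing the PMF over 0 through 8 or with the pbinom() function, which returns \(\mathbb{P}(X \leq x)\).

```r
# P(X <= 8) when n = 12 and p = 0.75, by summing the PMF
by_sum <- sum(dbinom(x = 0:8, size = 12, prob = 0.75))
by_sum

# The same probability from the CDF
by_cdf <- pbinom(q = 8, size = 12, prob = 0.75)
by_cdf

# The two approaches agree up to rounding error
all.equal(by_sum, by_cdf)
```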

Intuition Check: does it make sense that \(\mathbb{P}(X = 8)\) is greater than \(\mathbb{P}(X = 0)\)? Why?

Continuous Distributions

Definition: Continuous Random Variable

A random variable \(X\) is said to be continuous if there exists a nonnegative function on the real numbers \(\mathbb{R}\), called a probability density function (often abbreviated to “PDF”) and written \(f_X\), such that, for any interval \([a,b]\): \[\begin{equation*} \mathbb{P}(a \leq X \leq b) = \int_{a}^{b} f_X(x) dx \end{equation*}\]

There are some important properties of the PDF:

  1. Geometrically, the probability of a region is the area under the PDF for that region
  2. The support of \(X\) is the set of all values \(x\) such that \(f_X(x) > 0\)
  3. The probability mass at any point is 0, or \(\mathbb{P}(X = x) = \int_{x}^{x} f_X(x)dx = 0\)

The CDF of a continuous random variable \(X\) is given by \[\begin{equation*} \mathbb{P}(X \leq x) = F_X(x) = \int_{-\infty}^{x} f_X(t) dt \end{equation*}\]
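These integrals can be approximated numerically with R's integrate() function. The sketch below assumes a hypothetical PDF, \(f_X(x) = 2x\) on \([0,1]\) and 0 elsewhere, chosen only because its CDF on that interval, \(x^2\), is easy to check by hand.

```r
# Hypothetical PDF: f(x) = 2x on [0, 1], and 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# The total area under the PDF is 1
total_area <- integrate(f, lower = 0, upper = 1)$value
total_area

# P(0.25 <= X <= 0.5) is the area under f between 0.25 and 0.5
prob_interval <- integrate(f, lower = 0.25, upper = 0.5)$value
prob_interval

# CDF at 0.5: f is 0 below 0, so we can start the integral at 0
cdf_half <- integrate(f, lower = 0, upper = 0.5)$value
cdf_half
```

By hand, the interval probability is \(0.5^2 - 0.25^2 = 0.1875\) and the CDF at 0.5 is \(0.5^2 = 0.25\), which the numerical results should match up to a small approximation error.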

What are some examples of continuous random variables we might use in political science?

Definition: The Normal Distribution

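A continuous random variable \(Z\) follows the standard normal distribution, written \(Z \sim \mathcal{N}(0,1)\), if it has the PDF:

\[\begin{equation*} \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{\frac{-z^2}{2}} \end{equation*}\]

Its CDF, written \(\Phi\), is the integral of the PDF:

\[\begin{equation*} \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{\frac{-t^2}{2}}dt \end{equation*}\]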

The standard normal distribution has a bunch of nice properties. Here are a few:

  1. It has mean \(\mu = 0\) and variance \(\sigma^2 = 1\), so its standard deviation is also \(\sigma = 1\)
  2. \(\displaystyle \int_{-\infty}^{\infty} \phi(z) dz = 1\)
  3. \(\phi(z) = \phi(-z)\), or it is symmetric around \(\mu = 0\)
  4. The CDF satisfies \(\Phi(z) = 1 - \Phi(-z)\)
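We can check these properties numerically in R, where dnorm() gives the standard normal density and pnorm() gives its CDF by default. The point `z = 1.3` below is an arbitrary choice for illustration.

```r
z <- 1.3  # arbitrary point at which to check the properties

# dnorm() matches the density formula (1 / sqrt(2 * pi)) * exp(-z^2 / 2)
all.equal(dnorm(z), exp(-z^2 / 2) / sqrt(2 * pi))

# The density integrates to 1 over the whole real line
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
total_area

# Symmetry of the density around 0
all.equal(dnorm(z), dnorm(-z))

# The CDF identity Phi(z) = 1 - Phi(-z)
all.equal(pnorm(z), 1 - pnorm(-z))
```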

What are some examples of normal random variables we might use in political science?