October 6
Normal Approximation to the Binomial Distribution
Calculating probabilities for a Binomial Random Variable ($X \sim B(n, p)$) can be computationally intensive when the number of trials ($n$) is very large.
The Normal Approximation allows us to use the Normal distribution (which is continuous) to estimate probabilities for the Binomial distribution (which is discrete) when $n$ is large.
The Condition (Rule of Thumb)
For the approximation to be reasonably accurate, the binomial distribution should be roughly symmetric (not too skewed). As a rule of thumb, we can use the normal distribution if both of the following conditions are met:
- $np \ge 5$
- $n(1-p) \ge 5$
This checks that there are at least 5 expected “successes” ($np$) and 5 expected “failures” ($n(1-p)$).
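The rule of thumb can be checked mechanically. The following is a minimal sketch; the function name `normal_approx_ok` and its `threshold` parameter are illustrative choices, not part of the notes.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Return True when both expected counts reach the threshold."""
    expected_successes = n * p          # np
    expected_failures = n * (1 - p)     # n(1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


print(normal_approx_ok(400, 0.5))  # True: both expected counts are 200
print(normal_approx_ok(20, 0.1))   # False: only 2 expected successes
```

The second call fails the check because a binomial with $n = 20$, $p = 0.1$ is strongly right-skewed.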
Finding the Normal Parameters
If the condition is met, we can approximate $X \sim B(n, p)$ with a Normal random variable $Y \sim N(\mu, \sigma^2)$, where:
- Mean ($\mu$): The mean of the normal distribution is the same as the expected value of the binomial.
\(\mu = np\)
- Standard Deviation ($\sigma$): The standard deviation of the normal distribution is the same as the standard deviation of the binomial.
\(\sigma = \sqrt{np(1-p)}\)
Putting these together (the second parameter is the variance $\sigma^2$, not the standard deviation):
\(Y \sim N(np, np(1-p))\)
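As a quick illustration of the two formulas, here is a small sketch that computes $\mu$ and $\sigma$ for a given $n$ and $p$. The helper name `normal_params` and the values $n = 100$, $p = 0.3$ are made up for the illustration.

```python
import math


def normal_params(n: int, p: float):
    """Return (mu, sigma) of the normal distribution approximating B(n, p)."""
    mu = n * p                           # mean: np
    sigma = math.sqrt(n * p * (1 - p))   # standard deviation: sqrt(np(1-p))
    return mu, sigma


mu, sigma = normal_params(100, 0.3)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 30.00, sigma = 4.58
```

Here $\sigma = \sqrt{21} \approx 4.58$, so the approximating distribution is $N(30, 21)$ in the variance notation used above.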
The Continuity Correction
This is the most critical step. We are using a continuous distribution (Normal) to model a discrete distribution (Binomial).
- In a discrete distribution, $P(X = k)$ is possible (e.g., the probability of exactly 5 successes).
- In a continuous distribution, the probability of any single exact value is zero ($P(Y = 5) = 0$).
The continuity correction bridges this gap by representing the discrete integer $k$ as a continuous interval from $k - 0.5$ to $k + 0.5$. We add or subtract 0.5 from the discrete value(s) to include the “full bar” of the binomial histogram.
Correction Rules
Let $X$ be the discrete Binomial variable and $Y$ be the continuous Normal variable.
| Discrete (Binomial) | Continuous (Normal) with Correction |
|---|---|
| $P(X = k)$ | $P(k - 0.5 \le Y \le k + 0.5)$ |
| $P(X \le k)$ | $P(Y \le k + 0.5)$ |
| $P(X < k)$ | $P(Y \le k - 0.5)$ (Same as $P(X \le k-1)$) |
| $P(X \ge k)$ | $P(Y \ge k - 0.5)$ |
| $P(X > k)$ | $P(Y \ge k + 0.5)$ (Same as $P(X \ge k+1)$) |
Mnemonic:
- To include a value (like in $\le$ or $\ge$), expand the interval outwards by 0.5.
- To exclude a value (like in $<$ or $>$), shrink the interval inwards by 0.5.
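The correction rules can be written as a simple lookup. This is only a sketch: the function name `corrected_interval` and the choice of encoding the comparison as a string are assumptions made for the illustration.

```python
import math


def corrected_interval(op: str, k: int) -> tuple:
    """Map the binomial event 'X op k' to the (low, high) bounds for Y."""
    rules = {
        "==": (k - 0.5, k + 0.5),     # full bar centred on k
        "<=": (-math.inf, k + 0.5),   # include the bar at k
        "<":  (-math.inf, k - 0.5),   # exclude the bar at k
        ">=": (k - 0.5, math.inf),    # include the bar at k
        ">":  (k + 0.5, math.inf),    # exclude the bar at k
    }
    return rules[op]


print(corrected_interval("==", 5))  # (4.5, 5.5)
print(corrected_interval(">", 5))   # (5.5, inf)
```

Each returned pair is the interval over which the normal density would be integrated in place of summing binomial bars.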
Summary of Steps
- Identify $n$ and $p$ from the binomial problem.
- Check Condition: Verify that $np \ge 5$ and $n(1-p) \ge 5$.
- Find Parameters: Calculate the mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1-p)}$.
- Apply Continuity Correction: Adjust your discrete value(s) $k$ to the continuous interval using the 0.5 correction.
- Calculate Z-Score(s): Use the adjusted value(s), $\mu$, and $\sigma$:
\(Z = \frac{Y - \mu}{\sigma}\)
- Find Probability: Use a standard Z-table to find the probability associated with your calculated z-score(s).
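The steps above can be sketched end to end in a few lines. This version assumes Python 3.8+ and uses `statistics.NormalDist` from the standard library in place of a printed Z-table; the function name `binomial_normal_approx` and its interval-style signature are illustrative, not part of the notes.

```python
import math
from statistics import NormalDist


def binomial_normal_approx(n: int, p: float, low: int, high: int) -> float:
    """Approximate P(low <= X <= high) for X ~ B(n, p), with integer bounds."""
    # Check the rule-of-thumb condition.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1-p) must both be at least 5")

    # Find the normal parameters.
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))

    # Apply the continuity correction and convert to z-scores.
    z_low = (low - 0.5 - mu) / sigma
    z_high = (high + 0.5 - mu) / sigma

    # Look up the probability (standard normal CDF instead of a Z-table).
    standard_normal = NormalDist()
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


# Hypothetical usage: P(18 <= X <= 22) for X ~ B(50, 0.4)
print(round(binomial_normal_approx(50, 0.4, 18, 22), 4))
```

Passing `low == high` gives the "exactly $k$" case, since the interval then runs from $k - 0.5$ to $k + 0.5$.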
Example
Problem: A fair coin is tossed 400 times. What is the probability of getting exactly 210 heads?
- Identify $n$ and $p$:
- $n = 400$
- $p = 0.5$ (fair coin)
- Check Condition:
- $np = 400 \times 0.5 = 200$. This is $\ge 5$.
- $n(1-p) = 400 \times (1 - 0.5) = 200$. This is $\ge 5$.
- Condition is met. We can use the approximation.
- Find Parameters:
- $\mu = np = 200$
- $\sigma = \sqrt{400 \times 0.5 \times 0.5} = \sqrt{100} = 10$
- So, $Y \sim N(200, 10^2)$.
- Apply Continuity Correction:
- We want $P(X = 210)$.
- Using the correction: $P(210 - 0.5 \le Y \le 210 + 0.5) = P(209.5 \le Y \le 210.5)$.
- Calculate Z-Scores:
- For $Y = 209.5$: $Z_1 = \frac{209.5 - 200}{10} = \frac{9.5}{10} = 0.95$
- For $Y = 210.5$: $Z_2 = \frac{210.5 - 200}{10} = \frac{10.5}{10} = 1.05$
- Find Probability:
- We need $P(0.95 \le Z \le 1.05) = P(Z \le 1.05) - P(Z \le 0.95)$.
- From Z-table: $0.8531 - 0.8289 = 0.0242$.
- Answer: The probability of getting exactly 210 heads is approximately 2.42%.
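As a sanity check on this answer, the approximation can be compared with the exact binomial probability. The sketch below is not part of the original worked example; it assumes Python 3.8+ for `math.comb` and `statistics.NormalDist`.

```python
from math import comb
from statistics import NormalDist

n, p, k = 400, 0.5, 210

# Exact binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with continuity correction: P(209.5 <= Y <= 210.5)
y = NormalDist(mu=200, sigma=10)
approx = y.cdf(k + 0.5) - y.cdf(k - 0.5)

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

With $n = 400$ and a symmetric distribution ($p = 0.5$), the two values should agree closely, which is what the rule of thumb is meant to guarantee.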