Nate's Notes

Collection of notes for various classes I've taken.

October 8

Central Limit Theorem (CLT)

The Central Limit Theorem (CLT) is one of the most fundamental concepts in statistics. It states that the sampling distribution of the sample mean ($\bar{X}$) will be approximately normally distributed, regardless of the original population’s distribution, as long as the sample size ($n$) is sufficiently large.

In simple terms: If you take many random samples from any population (even a skewed one) and calculate the mean of each sample, the histogram of those sample means will form a bell-shaped (normal) distribution.
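This is easy to check by simulation. The sketch below uses illustrative choices (an Exponential(1) population, which is right-skewed with mean and standard deviation both equal to 1; samples of size 50; 10,000 samples) and shows the sample means clustering around $\mu$ with spread close to $\sigma/\sqrt{n}$:

```python
# Simulation sketch: sample means drawn from a skewed population still form a
# tight cluster around the population mean. All numbers here are illustrative;
# Exponential(1) has mu = 1 and sigma = 1, so sigma/sqrt(50) ~ 0.141.
import random
import statistics

random.seed(0)
n = 50              # size of each sample
num_samples = 10_000

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(round(statistics.fmean(sample_means), 2))   # close to mu = 1.0
print(round(statistics.stdev(sample_means), 3))   # close to sigma/sqrt(n) ~ 0.141
```

Plotting a histogram of `sample_means` would show the bell shape directly, even though the underlying exponential population is heavily skewed.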

Key Conditions

For the CLT to apply, the following conditions are generally required:

  1. Randomization: Samples must be selected randomly.
  2. Independence: Sampled values must be independent of each other. (If sampling without replacement, the sample size $n$ should be less than 10% of the population size $N$).
  3. Large Sample Size: The sample size ($n$) must be “large enough.”
    • Rule of Thumb: $n \ge 30$ is a commonly accepted guideline if the original population is not normal.
    • If Population is Normal: If the original population is already normally distributed, the sampling distribution of $\bar{X}$ is exactly normal for any sample size $n$ (even $n=2$).

The Sampling Distribution of the Sample Mean ($\bar{X}$)

Let’s say we draw a random sample of size $n$ from a population that has a mean $\mu$ and a standard deviation $\sigma$.

The Central Limit Theorem describes the resulting distribution of all possible sample means ($\bar{X}$):

1. Mean of the Sample Means ($\mu_{\bar{X}}$)

The mean of the sampling distribution is equal to the original population mean.

\[\mu_{\bar{X}} = \mu\]

2. Standard Deviation of the Sample Means ($\sigma_{\bar{X}}$)

The standard deviation of the sampling distribution is called the Standard Error (SE). It is the population standard deviation divided by the square root of the sample size.

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\]

This formula shows that as the sample size $n$ increases, the standard error $\sigma_{\bar{X}}$ decreases. This means our sample means will be clustered more tightly around the true population mean $\mu$.
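A quick numeric check of this shrinking effect, using an illustrative $\sigma = 15$ (the same value as the apple example below):

```python
# The standard error sigma / sqrt(n) shrinks as n grows
# (quadrupling n halves the standard error).
import math

sigma = 15.0  # illustrative population standard deviation
standard_errors = {n: sigma / math.sqrt(n) for n in (9, 36, 144)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: SE = {se}")
# n =   9: SE = 5.0
# n =  36: SE = 2.5
# n = 144: SE = 1.25
```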

The Theorem (In Summary)

As $n$ gets large, the distribution of $\bar{X}$ approaches a normal distribution:

\[\bar{X} \sim N\left(\mu_{\bar{X}}, \sigma_{\bar{X}}^2\right) \quad \text{which is} \quad \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\]

Using Z-Scores with the CLT

Because the sampling distribution of $\bar{X}$ is normal, we can use a z-score to find probabilities. The z-score formula is modified to use the parameters of the sampling distribution (its mean and standard error).
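Written out, the z-score replaces the population parameters with the sampling distribution's mean and standard error:

\[z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\]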

Example

Problem: The average weight of a certain species of apple is $\mu = 150$ grams, with a standard deviation of $\sigma = 15$ grams. The distribution of weights is unknown and may be skewed.

If we take a random sample of $n = 36$ apples:

1. Describe the sampling distribution of the sample mean ($\bar{X}$).

2. What is the probability that the mean weight of the sample ($\bar{X}$) is 153 grams or less?
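Working through the numbers: the CLT applies (random sample, $n = 36 \ge 30$), so part 1 says $\bar{X}$ is approximately normal with mean $150$ and standard error $15/\sqrt{36} = 2.5$, and part 2 reduces to a normal-CDF lookup at $z = (153 - 150)/2.5 = 1.2$. A minimal sketch using only Python's standard library (`statistics.NormalDist` is available since Python 3.8):

```python
# Worked check of the apple example: mu = 150, sigma = 15, n = 36,
# so SE = 15 / sqrt(36) = 2.5 and, by the CLT, X-bar ~ N(150, 2.5^2)
# even though the population of apple weights may be skewed.
import math
from statistics import NormalDist

mu, sigma, n = 150.0, 15.0, 36
se = sigma / math.sqrt(n)        # standard error: 15 / 6 = 2.5

# Part 2: P(X-bar <= 153) via the z-score (153 - 150) / 2.5 = 1.2
z = (153 - mu) / se
p = NormalDist(mu, se).cdf(153)  # equivalently, NormalDist().cdf(z)

print(se)           # 2.5
print(round(z, 2))  # 1.2
print(round(p, 4))  # 0.8849
```

So there is roughly an 88.5% chance that the sample mean weight is 153 grams or less.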