Nate's Notes

Collection of notes for various classes I've taken.

October 8

Central Limit Theorem (CLT)

The Central Limit Theorem (CLT) is one of the most fundamental concepts in statistics. It states that the sampling distribution of the sample mean ($\bar{X}$) will be approximately normally distributed, regardless of the original population’s distribution, as long as the sample size ($n$) is sufficiently large.

In simple terms: If you take many random samples from any population (even a skewed one) and calculate the mean of each sample, the histogram of those sample means will form a bell-shaped (normal) distribution.
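This is easy to check by simulation. The sketch below uses illustrative choices (an Exponential(1) population, which is right-skewed with mean and standard deviation both equal to 1; samples of size 50; 10,000 samples) and shows the sample means clustering around $\mu$ with spread close to $\sigma/\sqrt{n}$:

```python
# Simulation sketch: sample means drawn from a skewed population still form a
# tight cluster around the population mean. All numbers here are illustrative;
# Exponential(1) has mu = 1 and sigma = 1, so sigma/sqrt(50) ~ 0.141.
import random
import statistics

random.seed(0)
n = 50              # size of each sample
num_samples = 10_000

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(round(statistics.fmean(sample_means), 2))   # close to mu = 1.0
print(round(statistics.stdev(sample_means), 3))   # close to sigma/sqrt(n) ~ 0.141
```

Plotting a histogram of `sample_means` would show the bell shape directly, even though the underlying exponential population is heavily skewed.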

Key Conditions

For the CLT to apply, the following conditions are generally required:

  1. Randomization: Samples must be selected randomly.
  2. Independence: Sampled values must be independent of each other. (If sampling without replacement, the sample size $n$ should be less than 10% of the population size $N$).
  3. Large Sample Size: The sample size ($n$) must be “large enough.”
    • Rule of Thumb: $n \ge 30$ is a commonly accepted guideline if the original population is not normal.
    • If Population is Normal: If the original population is already normally distributed, the sampling distribution of $\bar{X}$ is exactly normal for any sample size $n$ (even $n=2$).

The Sampling Distribution of the Sample Mean ($\bar{X}$)

Let’s say we draw a random sample of size $n$ from a population that has a mean $\mu$ and a standard deviation $\sigma$.

The Central Limit Theorem describes the resulting distribution of all possible sample means ($\bar{X}$):

1. Mean of the Sample Means ($\mu_{\bar{X}}$)

The mean of the sampling distribution is equal to the original population mean.

\[\mu_{\bar{X}} = \mu\]

2. Standard Deviation of the Sample Means ($\sigma_{\bar{X}}$)

The standard deviation of the sampling distribution is called the Standard Error (SE). It is the population standard deviation divided by the square root of the sample size.

\[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\]

This formula shows that as the sample size $n$ increases, the standard error $\sigma_{\bar{X}}$ decreases. This means our sample means will be clustered more tightly around the true population mean $\mu$.
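A quick numeric check of this shrinking effect, using an illustrative $\sigma = 15$ (the same value as the apple example below):

```python
# The standard error sigma / sqrt(n) shrinks as n grows
# (quadrupling n halves the standard error).
import math

sigma = 15.0  # illustrative population standard deviation
standard_errors = {n: sigma / math.sqrt(n) for n in (9, 36, 144)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: SE = {se}")
# n =   9: SE = 5.0
# n =  36: SE = 2.5
# n = 144: SE = 1.25
```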

The Theorem (In Summary)

As $n$ gets large, the distribution of $\bar{X}$ approaches a normal distribution:

\[\bar{X} \sim N\left(\mu_{\bar{X}}, \sigma_{\bar{X}}^2\right) \quad \text{which is} \quad \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\]

Using Z-Scores with the CLT

Because the sampling distribution of $\bar{X}$ is normal, we can use a z-score to find probabilities. The z-score formula is modified to use the parameters of the sampling distribution (its mean and standard error).
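Written out, the z-score replaces the population parameters with the sampling distribution's mean and standard error:

\[z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\]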

Example

Problem: The average weight of a certain species of apple is $\mu = 150$ grams, with a standard deviation of $\sigma = 15$ grams. The distribution of weights is unknown and may be skewed.

If we take a random sample of $n = 36$ apples:

1. Describe the sampling distribution of the sample mean ($\bar{X}$).

2. What is the probability that the mean weight of the sample ($\bar{X}$) is 153 grams or less?
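Working through the numbers: the CLT applies (random sample, $n = 36 \ge 30$), so part 1 says $\bar{X}$ is approximately normal with mean $150$ and standard error $15/\sqrt{36} = 2.5$, and part 2 reduces to a normal-CDF lookup at $z = (153 - 150)/2.5 = 1.2$. A minimal sketch using only Python's standard library (`statistics.NormalDist` is available since Python 3.8):

```python
# Worked check of the apple example: mu = 150, sigma = 15, n = 36,
# so SE = 15 / sqrt(36) = 2.5 and, by the CLT, X-bar ~ N(150, 2.5^2)
# even though the population of apple weights may be skewed.
import math
from statistics import NormalDist

mu, sigma, n = 150.0, 15.0, 36
se = sigma / math.sqrt(n)        # standard error: 15 / 6 = 2.5

# Part 2: P(X-bar <= 153) via the z-score (153 - 150) / 2.5 = 1.2
z = (153 - mu) / se
p = NormalDist(mu, se).cdf(153)  # equivalently, NormalDist().cdf(z)

print(se)           # 2.5
print(round(z, 2))  # 1.2
print(round(p, 4))  # 0.8849
```

So there is roughly an 88.5% chance that the sample mean weight is 153 grams or less.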