Nate's Notes

Collection of notes for various classes I've taken.

November 14

1. Confidence Intervals (CIs)

A confidence interval (CI) provides a range of plausible values for an unknown population parameter (such as the population mean $\mu$ or proportion $p$) based on data from a sample.

General Form

The basic structure for most confidence intervals is: \(\text{Point Estimate} \pm \text{Margin of Error}\), where \(\text{Margin of Error} = (\text{Critical Value}) \times (\text{Standard Error})\)


CIs for a Population Mean ($\mu$)

🔹 Case 1: Population SD ($\sigma$) is Known (Rare)

When you know the true population standard deviation $\sigma$, you use the Z-distribution (standard normal). \(\bar{x} \pm z_{\alpha/2} \left(\frac{\sigma}{\sqrt{n}}\right)\)
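
As a quick sanity check, here is a minimal Python sketch of this z-interval; the sample values and $\sigma$ are made-up numbers for illustration, and SciPy is assumed to be available:

```python
import numpy as np
from scipy import stats

# Hypothetical data: sigma is treated as a known population SD.
sigma = 4.0
x = np.array([12.1, 14.3, 13.0, 15.2, 11.8, 13.7])
n, x_bar = len(x), x.mean()

conf = 0.95
alpha = 1 - conf
z_crit = stats.norm.ppf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%

se = sigma / np.sqrt(n)                  # sigma / sqrt(n)
me = z_crit * se                         # margin of error
print(f"{conf:.0%} CI for mu: ({x_bar - me:.2f}, {x_bar + me:.2f})")
```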

🔹 Case 2: Population SD ($\sigma$) is Unknown (Common)

When $\sigma$ is unknown, you must estimate it using the sample standard deviation ($s$). This introduces more uncertainty, so we use the t-distribution. \(\bar{x} \pm t_{\alpha/2, \nu} \left(\frac{s}{\sqrt{n}}\right)\)


Interpretation of a 95% CI

This is a critical concept and a common point of confusion. The correct interpretation is about the procedure, not a single interval: if we repeatedly drew random samples and built a 95% CI from each one, about 95% of those intervals would capture the true parameter. Once a particular interval has been computed, it either contains $\mu$ or it does not, so it is not correct to say there is a 95% probability that $\mu$ lies inside it.
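
To see why, here is a small simulation sketch in Python (the population parameters $\mu$, $\sigma$ and the sample size are arbitrary made-up values): it repeatedly draws samples from an assumed normal population, builds a 95% t-interval from each, and counts how often the interval captures the true $\mu$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma = 50.0, 10.0          # assumed "true" population parameters
n, reps, conf = 25, 10_000, 0.95

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    x_bar, s = sample.mean(), sample.std(ddof=1)
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    me = t_crit * s / np.sqrt(n)
    covered += (x_bar - me <= mu <= x_bar + me)

print(f"Proportion of intervals capturing mu: {covered / reps:.3f}")  # close to 0.95
```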

Steps to Construct a CI for $\mu$ ($\sigma$ unknown)

  1. Check Conditions:
    • Random: The data comes from a random, representative sample.
    • Independent: Observations are independent (or $n < 10\%$ of the population).
    • Normal/Large: The population is approximately normal, OR the sample size is large ($n \ge 30$) so the Central Limit Theorem (CLT) applies.
  2. Compute Statistics: Calculate the sample mean $\bar{x}$ and sample standard deviation $s$.
  3. Find Critical Value: Determine the confidence level (e.g., 95% $\rightarrow \alpha=0.05 \rightarrow \alpha/2 = 0.025$). Find $t_{\alpha/2, \nu}$ using degrees of freedom $\nu = n-1$.
  4. Calculate & Conclude:
    • Compute Standard Error (SE): $SE = \frac{s}{\sqrt{n}}$
    • Compute Margin of Error (ME): $ME = t_{\alpha/2, \nu} \times SE$
    • Form the interval: $\bar{x} \pm ME$
    • State the conclusion in context.
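
Here is a minimal Python sketch of steps 2–4 for a 95% CI, using hypothetical data (step 1's conditions are assumed to hold); SciPy's `stats.t.interval` gives the same interval in one call.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; conditions from step 1 are assumed to hold.
x = np.array([4.8, 5.1, 4.9, 5.4, 5.0, 4.7, 5.2, 5.3])
n = len(x)

# Step 2: sample statistics
x_bar = x.mean()
s = x.std(ddof=1)

# Step 3: critical value t_{alpha/2, nu} with nu = n - 1
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Step 4: standard error, margin of error, interval
se = s / np.sqrt(n)
me = t_crit * se
print(f"95% CI for mu: ({x_bar - me:.3f}, {x_bar + me:.3f})")

# Same interval directly from SciPy:
print(stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se))
```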

Single-Sided vs. Double-Sided CIs

A double-sided (two-sided) CI splits $\alpha$ across both tails, using the critical value $t_{\alpha/2, \nu}$, and reports both a lower and an upper bound. A single-sided (one-sided) CI puts all of $\alpha$ in one tail, using $t_{\alpha, \nu}$, and reports only a lower bound \(\left(\bar{x} - t_{\alpha,\nu}\frac{s}{\sqrt{n}},\ \infty\right)\) or only an upper bound \(\left(-\infty,\ \bar{x} + t_{\alpha,\nu}\frac{s}{\sqrt{n}}\right)\), which is appropriate when only one direction of error matters.

2. Hypothesis Testing

A hypothesis test is a formal procedure to assess the validity of a claim (hypothesis) about a population parameter using sample data.

The Setup

  1. Null Hypothesis ($H_0$): The “status quo” or “no effect” hypothesis. It always contains an equality sign (e.g., $\mu = \mu_0$, $\mu \le \mu_0$). We assume this is true to begin the test.
  2. Alternative Hypothesis ($H_a$): The claim we are looking for evidence for. It never contains equality (e.g., $\mu \neq \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$).
  3. Significance Level ($\alpha$): The threshold for “unlikely.” It’s the probability of a Type I Error (rejecting $H_0$ when it is actually true) that we are willing to accept. Common choices are $0.05$, $0.01$.

The Test Statistic

The test statistic measures how far our sample estimate is from the null hypothesis value, in units of standard error. For a mean ($\sigma$ unknown): \(t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\) This value follows a t-distribution with $\nu = n-1$ degrees of freedom (assuming $H_0$ is true).
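
As a small Python sketch (the summary numbers below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical summary statistics for a one-sample t-test.
x_bar, s, n = 5.05, 0.24, 8   # sample mean, sample SD, sample size
mu_0 = 5.0                    # hypothesized mean under H0

t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")   # about 0.589
```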

The p-value

The p-value is the single most important concept in hypothesis testing.

Definition: The p-value is the probability, assuming the null hypothesis ($H_0$) is true, of observing a test statistic as extreme as (or more extreme than) the one computed from the sample.

The Decision

We compare the p-value to our significance level $\alpha$: if $p \le \alpha$, we reject $H_0$ in favor of $H_a$; if $p > \alpha$, we fail to reject $H_0$.

Note: We never “accept $H_0$.” We only find that we don’t have enough evidence to throw it out.
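
Continuing the hypothetical numbers from the sketch above, the two-sided p-value and the decision can be computed with SciPy:

```python
from scipy import stats

# t statistic and degrees of freedom carried over from the sketch above.
t_stat, df, alpha = 0.589, 7, 0.05

# Two-sided p-value: P(|T| >= |t_stat|) under H0.
p_value = 2 * stats.t.sf(abs(t_stat), df=df)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject H0")
```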

3. t vs. Z — When to Use Each

This is a common source of confusion. The key is to know what parameter you’re testing (mean or proportion) and whether the population standard deviation ($\sigma$) is known.

| Parameter | Population SD ($\sigma$) | Sample Size ($n$) | Test Statistic | Notes |
| --- | --- | --- | --- | --- |
| Mean ($\mu$) | Known | Any | Z | Very rare in practice |
| Mean ($\mu$) | Unknown | Small ($n < 30$) | t | Requires the population to be approximately normal |
| Mean ($\mu$) | Unknown | Large ($n \ge 30$) | t (or Z) | CLT applies; $t$ is always correct. Many texts use Z as an approximation since $t \to Z$ as $n \to \infty$, but $t$ is safer. |
| Proportion ($p$) | (N/A) | Large | Z | Requires $np \ge 10$ and $n(1-p) \ge 10$; $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ |

Key Takeaway: The t-distribution is used when we estimate $\sigma$ with $s$. It has “heavier tails” than the Z-distribution to account for the extra uncertainty from this estimation. As $n$ gets large, this extra uncertainty becomes negligible, and the $t$-distribution becomes virtually identical to the Z-distribution.
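
A quick way to see this convergence is to compare the 97.5th-percentile critical values (used for two-sided 95% inference) as $n$ grows; SciPy is assumed here:

```python
from scipy import stats

# Compare t critical values to the z critical value for 95% two-sided inference.
z_crit = stats.norm.ppf(0.975)               # about 1.960
for n in (5, 15, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>4}: t = {t_crit:.3f}   z = {z_crit:.3f}")
```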

4. Worked Examples

Example 1: 95% CI for a Mean ($\sigma$ unknown)


Example 2: One-Sample t-Test (Two-Sided)


Example 3: Sample Size Planning
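
A standard formulation for this kind of problem, assuming a z critical value and a planning value for $\sigma$, is to solve the margin-of-error equation for $n$: \(n \ge \left(\frac{z_{\alpha/2}\,\sigma}{ME}\right)^2\), then round $n$ up to the next whole number.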


Example 4: Two-Sample CI (Independent, Equal Variances Assumed)
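
A standard form for this interval, assuming two independent samples with a common variance, pools the two sample variances: \((\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\ n_1+n_2-2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\), where \(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}\).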


Example 5: Quick Checklist for Exam Answers

When writing up a hypothesis test:

  1. State $H_0$ and $H_a$ in terms of the parameter, and give the significance level $\alpha$.
  2. Check the conditions (random, independent, normal/large).
  3. Compute the test statistic and its degrees of freedom.
  4. Report the p-value.
  5. Compare the p-value to $\alpha$, state whether you reject or fail to reject $H_0$, and give the conclusion in the context of the problem.