[!def] Sampling distribution of $\hat{p}$ The sampling distribution for $\hat{p}$ based on a sample size $n$ from a population with a true proportion $p$ is nearly normal when: 1. The sample's observations are independent, e.g. are from a simple random sample 2. We expected to see at least 10 successes and 10 failures in the sample, i.e. $np \ge 10$ and $n(1-p)\ge 10$ (success-failure condition) When these conditions are met, the sampling distribution of $\hat{p}$ is nearly normal with mean $p$ and standard error $SE = \sqrt{\frac{p(1-p)}{n}}$

A confidence interval provides a range of plausible values for the parameter $p$, and when $\hat{p}$ can be modeled using a normal distribution, the confidence interval for $p$ takes the form $\hat{p} \pm z^* \times SE$

[!check] Confidence interval for a single proportion 1. Prepare: Identify $\hat{p}$ and $n$, and determine what confidence level you wish to use. 2. Check: Verify the conditions to ensure $\hat{p}$ is normal. Use $\hat{p}$ in place of $p$ to check the success-failure condition. 3. Calculate: If the conditions hold, compute $SE$ using $\hat{p}$, find $z^*$, and construct the interval. 4. Conclude: Interpret the confidence interval in the context of the problem.

[!check] Hypothesis testing for a single proportion 1. Prepare: Identify the parameter of interest, list hypotheses, identify the significance level, and identify $\hat{p}$ and $n$ 2. Check: Verify conditions to ensure $\hat{p}$ is nearly normal under $H_0$. Use the null value ($p_0$) to check the success-failure condition. 3. Calculate: If the conditions hold, compute the standard error using $p_0$ 4. Conclude: Evaluate the hypothesis test by comparing the p-value to $\alpha$, and provide a conclusion in the context of the problem.

When conditions aren't met¶

What happens when the success-failure condition fails? Or the independence condition fails? The strategy to generate the interval or p-value change. When the success-failure condition isn't met, we can simulate the null distribution of $\hat{p}$ using the null value, $p_0$. For a confidence interval when the success-failure condition isn't met, the Clopper-Pearson interval is used.

Choosing a sample size when estimating a proportion¶

Often a sample size is chosen to be large enough that the margin of error is sufficiently small that the sample is useful.

The margin of error for a sample proportion is $z^* \sqrt{\frac{p(1-p)}{n}}$.

[!example] A university is doing a survey to determine what fraction of students support a $200 per year tuition increase to pay for a new building. How big of a sample is required to ensure the margin of error is smaller than 0.04 using a 95% confidence interval? Our goal is to find the smallest value of $n$ such that the margin of error is smaller than 0.04. A 95% confidence interval corresponds to $z^* = 1.96$ $1.96 \times \sqrt{\frac{p(1-p)}{n}} \lt 0.04$

There are two unknowns: $p$ and $n$. If we have an estimate of $p$, we can use that to solve for $n$. If we don't have such an estimate, we use the worst case value 0.5 for $p$. $1.96 \times \sqrt{\frac{0.5(1-0.5)}{n}} \lt 0.04$ $1.96^2 \times \frac{0.25}{n} \lt 0.04^2$ $1.96^2 \times \frac{0.25}{0.04^2} \lt n$ $600.25 \lt n$ We would need over 600.25 (so 601) participants to ensure the sample proportion is whithin 0.04 of the true proportion within 95% confidence.