
[!def] Sampling distribution of \(\hat{p}\)
The sampling distribution of \(\hat{p}\), based on a sample of size \(n\) from a population with true proportion \(p\), is nearly normal when:
1. The sample's observations are independent, e.g. they come from a simple random sample.
2. We expect to see at least 10 successes and 10 failures in the sample, i.e. \(np \ge 10\) and \(n(1-p) \ge 10\) (the success-failure condition).

When these conditions are met, the sampling distribution of \(\hat{p}\) is nearly normal with mean \(p\) and standard error \(SE = \sqrt{\frac{p(1-p)}{n}}\).
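The numerical parts of this definition can be sketched in a few lines of Python. This is only an illustration: the function name `sampling_distribution_check` is hypothetical, and the independence condition cannot be verified from the numbers alone, so it is not checked here.

```python
from math import sqrt

def sampling_distribution_check(p, n):
    """Return (success_failure_ok, standard_error) for proportion p and sample size n.

    Hypothetical helper; independence must be judged from the study design.
    """
    # Success-failure condition: at least 10 expected successes and 10 expected failures.
    success_failure_ok = n * p >= 10 and n * (1 - p) >= 10
    # Standard error of p-hat under the normal model: sqrt(p(1-p)/n).
    se = sqrt(p * (1 - p) / n)
    return success_failure_ok, se

ok, se = sampling_distribution_check(p=0.5, n=100)
print(ok, round(se, 4))  # True 0.05
```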

A confidence interval provides a range of plausible values for the parameter \(p\). When \(\hat{p}\) can be modeled using a normal distribution, the confidence interval for \(p\) takes the form \(\hat{p} \pm z^* \times SE\).

[!check] Confidence interval for a single proportion
1. Prepare: Identify \(\hat{p}\) and \(n\), and determine what confidence level you wish to use.
2. Check: Verify the conditions to ensure \(\hat{p}\) is nearly normal. Use \(\hat{p}\) in place of \(p\) to check the success-failure condition.
3. Calculate: If the conditions hold, compute \(SE\) using \(\hat{p}\), find \(z^*\), and construct the interval.
4. Conclude: Interpret the confidence interval in the context of the problem.
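The Check and Calculate steps might look like the following sketch. The function name `proportion_ci` and the example counts (500 successes out of 1000) are made up for illustration, and \(z^*\) is taken from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Normal-approximation confidence interval for a single proportion (hypothetical helper)."""
    # Check: success-failure condition using p-hat, i.e. the observed counts.
    if successes < 10 or n - successes < 10:
        raise ValueError("success-failure condition not met")
    p_hat = successes / n
    # Calculate: SE uses p-hat because the true p is unknown.
    se = sqrt(p_hat * (1 - p_hat) / n)
    # z* leaves (1 - conf)/2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_ci(successes=500, n=1000)
print(round(low, 3), round(high, 3))  # 0.469 0.531
```

The Conclude step is left to the reader: the interval still has to be interpreted in the context of the problem.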

[!check] Hypothesis testing for a single proportion
1. Prepare: Identify the parameter of interest, list the hypotheses, identify the significance level, and identify \(\hat{p}\) and \(n\).
2. Check: Verify the conditions to ensure \(\hat{p}\) is nearly normal under \(H_0\). Use the null value \(p_0\) to check the success-failure condition.
3. Calculate: If the conditions hold, compute the standard error using \(p_0\), then compute the Z-score \(Z = \frac{\hat{p} - p_0}{SE}\) and identify the p-value.
4. Conclude: Evaluate the hypothesis test by comparing the p-value to \(\alpha\), and provide a conclusion in the context of the problem.
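A minimal sketch of the Check and Calculate steps follows, assuming a two-sided alternative \(H_A: p \ne p_0\) (the notes do not fix the direction of the test). The function name `proportion_z_test` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-sided one-proportion z-test; returns (z, p_value). Hypothetical helper."""
    # Check: success-failure condition using the null value p0.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("success-failure condition not met under H0")
    p_hat = successes / n
    # Calculate: SE uses p0 because the test assumes H0 is true.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value: area in both tails beyond |z| (an assumption of this sketch).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = proportion_z_test(successes=560, n=1000, p0=0.5)
alpha = 0.05
print(round(z, 2), p_value < alpha)  # 3.79 True
```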

When conditions aren't met

What happens when the success-failure condition fails? Or when the independence condition fails? The strategy for generating the interval or p-value changes. When the success-failure condition isn't met for a hypothesis test, we can simulate the null distribution of \(\hat{p}\) using the null value, \(p_0\). For a confidence interval when the success-failure condition isn't met, the Clopper-Pearson interval can be used.
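Both fallbacks can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the simulated p-value is two-sided (counting simulated samples at least as far from \(np_0\) as the observed count), and the Clopper-Pearson limits are found by bisection on exact binomial tail probabilities rather than through beta quantiles.

```python
import random
from math import comb

def simulated_p_value(successes, n, p0, reps=10_000, seed=0):
    """Two-sided p-value from simulating success counts under H0: p = p0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed_gap = abs(successes - n * p0)
    extreme = 0
    for _ in range(reps):
        # One simulated sample: n independent draws, each a success with probability p0.
        sim = sum(rng.random() < p0 for _ in range(n))
        # Small tolerance guards against floating-point ties.
        if abs(sim - n * p0) >= observed_gap - 1e-9:
            extreme += 1
    return extreme / reps

def clopper_pearson(successes, n, conf=0.95):
    """Exact (Clopper-Pearson) interval via bisection on binomial tail probabilities."""
    alpha = 1 - conf

    def pmf(k, p):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    def upper_tail(p):  # P(X >= successes), increasing in p
        return sum(pmf(k, p) for k in range(successes, n + 1))

    def lower_tail(p):  # P(X <= successes), decreasing in p
        return sum(pmf(k, p) for k in range(successes + 1))

    def solve(tail, increasing):
        # Find p in [0, 1] where tail(p) equals alpha / 2.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (tail(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if successes == 0 else solve(upper_tail, increasing=True)
    upper = 1.0 if successes == n else solve(lower_tail, increasing=False)
    return lower, upper

# Example with made-up data: 3 successes in 20 trials, null value 0.3 (n * p0 = 6 < 10).
print(simulated_p_value(3, 20, 0.3))
print(clopper_pearson(3, 20))
```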

Choosing a sample size when estimating a proportion

A sample size is often chosen to be large enough that the margin of error is small enough for the sample to be useful.

The margin of error for a sample proportion is \(z^* \sqrt{\frac{p(1-p)}{n}}\).

[!example] A university is doing a survey to determine what fraction of students support a $200 per year tuition increase to pay for a new building. How large a sample is required to ensure the margin of error is smaller than 0.04 using a 95% confidence interval?

Our goal is to find the smallest value of \(n\) such that the margin of error is smaller than 0.04. A 95% confidence interval corresponds to \(z^* = 1.96\):

\(1.96 \times \sqrt{\frac{p(1-p)}{n}} \lt 0.04\)

There are two unknowns: \(p\) and \(n\). If we have an estimate of \(p\), we can use it to solve for \(n\). If we don't have such an estimate, we use the worst-case value 0.5 for \(p\), which maximizes \(p(1-p)\) and therefore the margin of error.

\(1.96 \times \sqrt{\frac{0.5(1-0.5)}{n}} \lt 0.04\)

\(1.96^2 \times \frac{0.25}{n} \lt 0.04^2\)

\(1.96^2 \times \frac{0.25}{0.04^2} \lt n\)

\(600.25 \lt n\)

We would need more than 600.25 participants, so at least 601, to ensure the sample proportion is within 0.04 of the true proportion with 95% confidence.
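The same calculation can be written as a small function. This is a sketch with a hypothetical name, `required_sample_size`; it uses \(z^* = 1.96\) by default to mirror the example and takes the smallest integer strictly greater than the bound, since the margin of error must be smaller than the target.

```python
from math import floor

def required_sample_size(margin, z_star=1.96, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) < margin (hypothetical helper)."""
    # Rearranging the inequality gives n > z*^2 * p(1-p) / margin^2.
    bound = z_star**2 * p * (1 - p) / margin**2
    # Strict inequality: take the next integer above the bound.
    return floor(bound) + 1

print(required_sample_size(0.04))  # 601
```

If a prior estimate of \(p\) is available, passing it as `p` gives a smaller required sample than the worst case of 0.5.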