The sample mean \(\bar{x}\) (the mean of the observations in a sample) can be modeled using a normal distribution when certain conditions are met.

[!def] Central Limit Theorem for the sample mean
When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of \(\bar{x}\) will be nearly normal with a mean of \(\mu\) and a standard error (SE) of \(\sigma / \sqrt{n}\).

However, when the population standard deviation \(\sigma\) is not known and has to be estimated from the sample, we use the \(t\)-distribution instead of the normal distribution.
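The Central Limit Theorem statement above can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size `n = 50`, and the number of repeated samples are all assumed values, not part of the original notes.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 50              # observations per sample (assumed value)
num_samples = 5000  # number of repeated samples (assumed value)

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(sigma / math.sqrt(n), 3))
```

Even though the individual observations are skewed, the mean of the simulated sample means should land close to \(\mu\) and their standard deviation close to \(\sigma / \sqrt{n}\).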

The two conditions required for modeling \(\bar{x}\)

Independence: The sample observations must be independent, e.g. they come from a simple random sample or from a random process such as rolling a die.
Normality: When the sample is small, the observations must come from a nearly normally distributed population.

How to perform the normality check

There is no perfect way to check whether a sample comes from a normal population, so instead two rules of thumb are used:

- \(n < 30\): If the sample size \(n\) is less than 30 and there are no clear outliers in the data, we assume the data come from a nearly normal distribution.
- \(n \ge 30\): If the sample size \(n\) is at least 30 and there are no particularly extreme outliers, we assume the sampling distribution of \(\bar{x}\) is nearly normal, even if the distribution of individual observations is not.

The \(t\)-distribution

We rarely know the population standard deviation \(\sigma\) in the real world, so we usually cannot calculate the standard error of \(\bar{x}\) exactly. Instead, we estimate it by substituting the sample standard deviation \(s\) for \(\sigma\), similar to how we estimate the standard error for a sample proportion:

\[SE=\frac{\sigma}{\sqrt{n}}\approx\frac{s}{\sqrt{n}}\]

However, when the sample size is small, \(s\) is a less precise estimate of \(\sigma\), and modeling \(\bar{x}\) with the normal distribution understates the uncertainty. Instead, we use the \(t\)-distribution, which looks similar to a normal distribution but has thicker tails. It has one parameter, the degrees of freedom \(df\), and as \(df\) increases, the \(t\)-distribution approaches a normal distribution.
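As a minimal sketch of the estimate \(SE \approx s/\sqrt{n}\), the snippet below computes \(\bar{x}\), \(s\), and the estimated standard error using Python's standard library. The data values are made up purely for illustration.

```python
import math
import statistics

# Hypothetical sample of n = 8 measurements (made-up values).
sample = [4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5]

n = len(sample)
x_bar = statistics.fmean(sample)   # sample mean
s = statistics.stdev(sample)       # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)              # estimated standard error of x_bar

print(f"n = {n}, x_bar = {x_bar:.3f}, s = {s:.3f}, SE = {se:.3f}")
```

Note that `statistics.stdev` computes the sample standard deviation \(s\), whereas `statistics.pstdev` would treat the data as an entire population.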

[!def] Degrees of freedom (\(df\))
The degrees of freedom describe the shape of the \(t\)-distribution. The larger the degrees of freedom, the more closely the distribution approximates the normal model. When modeling \(\bar{x}\) using the \(t\)-distribution, use \(df=n-1\).

As with the normal distribution, we use either statistical software or a table to find tail areas under the \(t\)-distribution.
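To make the "thicker tails" concrete, the sketch below computes upper tail areas of the \(t\)-distribution for several values of \(df\) and compares them with the normal tail. The helpers `t_pdf`, `t_cdf`, and `upper_tail` are hypothetical names written for this illustration; they integrate the density numerically so that only the standard library is needed. In practice, statistical software (for example `scipy.stats.t` in Python) would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def upper_tail(x, df):
    """P(T > x) for the t-distribution with df degrees of freedom."""
    return 1 - t_cdf(x, df)

cutoff = 2.0
for df in (2, 10, 30, 1000):
    print(f"df = {df:4d}: P(T > {cutoff}) = {upper_tail(cutoff, df):.4f}")
print(f"normal   : P(Z > {cutoff}) = {1 - NormalDist().cdf(cutoff):.4f}")
```

The tail area shrinks toward the normal value as \(df\) grows, which matches the statement that the \(t\)-distribution approaches the normal distribution.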

One-sample \(t\)-confidence intervals

In the normal model, we use \(z^*\) and the standard error to determine the width of a confidence interval. The formula is similar with the \(t\)-distribution:

\[\bar{x}\pm t^*_{df}\times\frac{s}{\sqrt{n}}\]

As with the normal distribution, we use a table or software to find \(t^*_{df}\), the cutoff corresponding to the confidence level.
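The interval formula can be sketched in code as follows. This is an illustration under stated assumptions: the sample values are made up, and the helpers `t_pdf`, `t_cdf`, `t_star`, and `t_interval` are hypothetical names that find \(t^*_{df}\) by numerical integration and bisection so that the example needs only the standard library. With statistical software available, a quantile function such as `scipy.stats.t.ppf` would replace `t_star`.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence, df):
    """Cutoff t* such that `confidence` of the area lies between -t* and +t*."""
    target = 1 - (1 - confidence) / 2  # cumulative probability at +t*
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):                # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(sample, confidence=0.95):
    """One-sample t-confidence interval: x_bar +/- t* * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = t_star(confidence, n - 1) * se  # df = n - 1
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 8 measurements (made-up values).
sample = [4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5]
low, high = t_interval(sample, confidence=0.95)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

The interval is only meaningful if the independence and normality conditions have been checked first; the code does not verify them.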

Confidence interval for a single mean

Prepare: Identify \(\bar{x}\), \(s\), and \(n\), and determine what confidence level you wish to use
Check: Verify conditions to ensure \(\bar{x}\) is nearly normal
Calculate: If conditions hold, compute \(SE\), find \(t^*_{df}\), and construct the interval
Conclude: Interpret the confidence interval in the context of the problem

Hypothesis testing for a single mean

Prepare: Identify the parameter of interest, state the hypotheses, choose the significance level, and identify \(\bar{x}\), \(s\), and \(n\)
Check: Verify conditions to ensure \(\bar{x}\) is nearly normal
Calculate: If conditions hold, compute \(SE\), compute the T-score \(T=(\bar{x}-\mu_0)/SE\) (where \(\mu_0\) is the null value), and find the p-value using \(df=n-1\)
Conclude: Evaluate the hypothesis test by comparing the p-value to the significance level \(\alpha\), and provide a conclusion in the context of the problem.
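The Calculate and Conclude steps above can be sketched as follows. Several things here are assumptions made for the illustration and do not come from the notes: the sample values, the null value `mu_0 = 5.5`, the two-sided alternative, and the helper names `t_pdf`, `t_cdf`, and `one_sample_t_test`. The p-value is found by numerically integrating the \(t\) density so that only the standard library is required; a library routine such as `scipy.stats.ttest_1samp` would be the usual choice in practice.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test(sample, mu_0, alpha=0.05):
    """Two-sided test of H0: mu = mu_0 against HA: mu != mu_0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_score = (statistics.fmean(sample) - mu_0) / se
    # Two-sided p-value: area in both tails beyond |T|, with df = n - 1.
    p_value = 2 * (1 - t_cdf(abs(t_score), n - 1))
    return t_score, p_value, p_value < alpha

# Hypothetical sample and null value (made-up for illustration).
sample = [4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5]
t_score, p_value, reject = one_sample_t_test(sample, mu_0=5.5, alpha=0.05)

print(f"T = {t_score:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

For a one-sided alternative, the p-value would be the area in a single tail rather than twice that area.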