When we have more than two means to compare, we use a method called analysis of variance (ANOVA) and a new test statistic called \(F\). ANOVA uses a single hypothesis test to check whether the means across many groups are equal: - \(H_0\): The mean outcome is the same across all groups. - \(H_A\): At least one mean is different.

Three conditions must be checked before performing ANOVA: - the observations are independent within and across groups - the data within each group are nearly normal - the variability across groups is about equal

Strong evidence favouring the alternative hypothesis in ANOVA is described by unusually large differences among the group means. Assessing the variability of the group means relative to the variability among individual observations within each group is key to ANOVA's success.

ANOVA and the \(F\)-test¶

ANOVA in this context focuses on one question: is the variability in the sample means so large that it seems unlikely to be from chance alone?

The variability between many groups is called the mean square between groups (MSG), and it has an associated degrees of freedom, \(df_G = k-1\), when there are \(k\) groups. The MSG can be thought of as a scaled variance formula for means. If the null hypothesis is true, any variation in the sample means is due to chance and shouldn't be too large.

The MSG alone isn't very useful - we need a benchmark to compare it to. We compute a pooled variance estimate, called the mean square error (MSE) which has an associated degrees of freedom value \(df_E = n-k\). MSE is essentially a measure of the variability within the groups.

When the null hypothesis is true, any differences among the sample means are only due to chance, and the MSG and MSE should be about equal. The test statistic \(F\) is given by \(\(\large F = \frac{MSG}{MSE}\)\) We can use the \(F\) statistic to evaluate the hypotheses in what is called an \(F\)-test.

A p-value can be computed from the \(F\) statistic using an \(F\)-distribution, which has two parameters: \(df_1\) and \(df_2\). For the \(F\) statistic in ANOVA, \(df_1 = df_G\) and \(df_2 = df_E\).

The larger the observerd variability in the sample means (MSG) relative to the within-group observations (MSE), the larger \(F\) will be and the stronger the evidence against the null hypothesis. Since larger valuse of \(F\) represent stronger evidence against the null hypothesis, we use the upper tail of the distribution to compute a p-value.

[!def] The \(F\) statistic and the \(F\)-test Analysis of variance (ANOVA) is used to test whether the mean outcome differs across 2 or more groups. ANOVA uses a test statistic \(F\), which represents a standardized ratio of variability in the sample means relative to the variability within the groups. If \(H_0\) is true and the model conditions are satisfied, the statistic \(F\) follows an \(F\) distribution with parameters \(df_1=k-1\) and \(df_2=n-k\). The upper tail of the \(F\) distribution is used to represent the p-value.

Extras¶

Formula for MSG¶

\(\(\large MSG=\frac{1}{df_G}SSG=\frac{1}{k-1}\sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2\)\) where \(SSG\) is the sum of squares between groups and \(n_i\) is the sample size of group \(i\).