## An objective measure for finding the best line
When we talk about the "best line," we mean the line with the smallest residuals. A common practice is to choose the line that minimizes the sum of the squared residuals:

\[ e_1^2 + e_2^2 + \cdots + e_n^2 \]

This line is known as the least squares line.
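To make this concrete, here is a minimal Python sketch, using made-up data and a hypothetical helper `sum_squared_residuals`, that evaluates this criterion for candidate lines; the least squares line is the one for which the sum is smallest.

```python
import numpy as np

# Made-up example data: x is the explanatory variable, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(b0, b1):
    """Compute e_1^2 + e_2^2 + ... + e_n^2 for the line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# The least squares line minimizes this quantity; other lines do worse.
print(sum_squared_residuals(0.05, 1.99))  # near the least squares fit: small sum
print(sum_squared_residuals(1.00, 1.50))  # a worse candidate line: larger sum
```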
## Conditions for the least squares line
When fitting a least squares line, we generally require:

- Linearity: The data should show a linear trend.
- Nearly normal residuals: The residuals should be approximately normally distributed.
- Constant variability: The variability of points around the line remains roughly constant.
- Independent observations: The observations should be independent of one another.
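These conditions are usually assessed informally with plots. A minimal sketch, assuming matplotlib is available and reusing the made-up data from the earlier example (with so few points the checks are only illustrative), examines linearity and constant variability with a residual plot and near-normality with a histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data, as in the earlier sketch.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the least squares line; np.polyfit returns [slope, intercept] for deg=1.
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Residuals vs. x: look for no curved pattern (linearity) and a roughly
# even vertical spread around zero (constant variability).
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("x")
plt.ylabel("Residual")
plt.show()

# A histogram of the residuals gives an informal check of near-normality.
plt.hist(residuals, bins=5)
plt.xlabel("Residual")
plt.show()
```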
## Finding the least squares line
The slope of the least squares line can be estimated by

\[\large b_1 = \frac{s_y}{s_x}R\]

where \(s_x\) and \(s_y\) are the sample standard deviations of the explanatory and response variables, respectively, and \(R\) is the correlation between the two variables.
The point \((\bar{x}, \bar{y})\), the mean of the explanatory and response variables, is always on the least squares line.
The point-slope form of the line is given by

\[\large y - y_0 = b_1 \times (x - x_0)\]

Plugging in \((x_0, y_0) = (\bar{x}, \bar{y})\) yields the equation of the least squares line.
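A short sketch, again with the made-up data, computes the slope from these summary statistics and then the intercept by forcing the line through \((\bar{x}, \bar{y})\), checking the result against numpy's own fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

s_x, s_y = x.std(ddof=1), y.std(ddof=1)  # sample standard deviations
R = np.corrcoef(x, y)[0, 1]              # sample correlation

b1 = (s_y / s_x) * R                     # slope from the formula above
b0 = y.mean() - b1 * x.mean()            # intercept: line passes through (x_bar, y_bar)

print(b1, b0)
print(np.polyfit(x, y, deg=1))           # agrees: [slope, intercept]
```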
The slope describes the estimated difference in the \(y\) variable if the explanatory variable \(x\) for a case happened to be one unit larger. The intercept describes the average outcome of \(y\) if \(x = 0\) and the linear model is valid all the way to \(x = 0\) (which is often not the case).
## Using \(R^2\) to describe the strength of a fit
It is more common to evaluate the strength of a linear relationship between two variables using \(R^2\) rather than just \(R\). The quantity \(R^2\), called the coefficient of determination, describes the proportion of the variability in the response variable that is explained by the least squares line.
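A final sketch, with the same made-up data, shows the two equivalent ways to read \(R^2\): as the squared correlation and as the share of the variability in \(y\) that the fitted line accounts for.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

ss_res = np.sum((y - fitted) ** 2)       # variability left over in the residuals
ss_tot = np.sum((y - y.mean()) ** 2)     # total variability in y

R = np.corrcoef(x, y)[0, 1]
print(R ** 2)                            # R^2 as the squared correlation
print(1 - ss_res / ss_tot)               # same value: proportion of variability explained
```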