## An objective measure for finding the best line
When we talk about the "best line," we mean the line with the smallest residuals. A common practice is to choose the line that minimizes the sum of the squared residuals:

\[ e_1^2 + e_2^2 + \cdots + e_n^2 \]

This line is known as the least squares line.
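To make this concrete, here is a minimal Python sketch, using made-up data and a hypothetical helper `sum_squared_residuals`, that evaluates this criterion for candidate lines; the least squares line is the one for which the sum is smallest.

```python
import numpy as np

# Made-up example data: x is the explanatory variable, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(b0, b1):
    """Compute e_1^2 + e_2^2 + ... + e_n^2 for the line y = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# The least squares line minimizes this quantity; other lines do worse.
print(sum_squared_residuals(0.05, 1.99))  # near the least squares fit: small sum
print(sum_squared_residuals(1.00, 1.50))  # a worse candidate line: larger sum
```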
## Conditions for the least squares line
When fitting a least squares line, we generally require:

- Linearity: The data should show a linear trend.
- Nearly normal residuals: The residuals should be approximately normally distributed.
- Constant variability: The variability of points around the line remains roughly constant.
- Independent observations: The observations should be independent of one another.
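These conditions are usually assessed informally with plots. A minimal sketch, assuming matplotlib is available and reusing the made-up data from the earlier example (with so few points the checks are only illustrative), examines linearity and constant variability with a residual plot and near-normality with a histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data, as in the earlier sketch.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the least squares line; np.polyfit returns [slope, intercept] for deg=1.
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Residuals vs. x: look for no curved pattern (linearity) and a roughly
# even vertical spread around zero (constant variability).
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("x")
plt.ylabel("Residual")
plt.show()

# A histogram of the residuals gives an informal check of near-normality.
plt.hist(residuals, bins=5)
plt.xlabel("Residual")
plt.show()
```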
## Finding the least squares line
The slope of the least squares line can be estimated by

\[\large b_1 = \frac{s_y}{s_x}R\]

where \(s_x\) and \(s_y\) are the sample standard deviations of the explanatory and response variables, respectively, and \(R\) is the correlation between the two variables.
The point \((\bar{x}, \bar{y})\), the mean of the explanatory and response variables, is always on the least squares line.
The point-slope form of the line is given by

\[\large y - y_0 = b_1 \times (x - x_0)\]

Plugging in \((x_0, y_0) = (\bar{x}, \bar{y})\) yields the equation of the least squares line.
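A short sketch, again with the made-up data, computes the slope from these summary statistics and then the intercept by forcing the line through \((\bar{x}, \bar{y})\), checking the result against numpy's own fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

s_x, s_y = x.std(ddof=1), y.std(ddof=1)  # sample standard deviations
R = np.corrcoef(x, y)[0, 1]              # sample correlation

b1 = (s_y / s_x) * R                     # slope from the formula above
b0 = y.mean() - b1 * x.mean()            # intercept: line passes through (x_bar, y_bar)

print(b1, b0)
print(np.polyfit(x, y, deg=1))           # agrees: [slope, intercept]
```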
The slope describes the estimated difference in the \(y\) variable if the explanatory variable \(x\) for a case happened to be one unit larger. The intercept describes the average outcome of \(y\) if \(x = 0\) and the linear model is valid all the way to \(x = 0\) (which is often not the case).
## Using \(R^2\) to describe the strength of a fit
It is more common to evaluate the strength of a linear relationship between two variables using \(R^2\) rather than just \(R\). The quantity \(R^2\), called the coefficient of determination, describes the proportion of the variability in the response variable that is explained by the least squares line.
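A final sketch, with the same made-up data, shows the two equivalent ways to read \(R^2\): as the squared correlation and as the share of the variability in \(y\) that the fitted line accounts for.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x

ss_res = np.sum((y - fitted) ** 2)       # variability left over in the residuals
ss_tot = np.sum((y - y.mean()) ** 2)     # total variability in y

R = np.corrcoef(x, y)[0, 1]
print(R ** 2)                            # R^2 as the squared correlation
print(1 - ss_res / ss_tot)               # same value: proportion of variability explained
```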