We will define the set of $n$ points as follows

$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$$

And the line of best fit will be defined as

$$y = mx + b$$

Finally, to express the error of a line, just sum the error for each point. The error for a point is the squared difference between the point's $y$ coordinate and the line's estimate for the point's $y$ coordinate, $mx_i + b$. The reason for squaring it is to make it never negative. Otherwise a positive error and a negative error could cancel out. The reason for squaring instead of using an absolute value is that it makes the derivative easier. As an expression for each point it is $(y_i - (mx_i + b))^2$. The total error for the whole line can be written as

$$E(m, b) = \sum_{i=1}^{n} \left(y_i - (mx_i + b)\right)^2$$
Going forward I will be leaving the bounds off of the summations, as they will all be the same and only muddle up the notation.
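To make the error function concrete, here is a minimal Python sketch of it. It assumes the line is written as y = m*x + b; the function name `squared_error` and the sample points are made up for illustration.

```python
def squared_error(points, m, b):
    """Total squared error of the line y = m*x + b over a list of (x, y) points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical sample data that lies exactly on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5)]

print(squared_error(points, 2, 1))  # the line through every point has no error
print(squared_error(points, 0, 0))  # a flat line through the origin does worse
```

Any line other than the one through all three points gives a strictly positive total, which is what makes this a useful thing to minimize.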
The next step after finding the error function is to find the minimum.
It would be really convenient if there was one minimum and it was the only critical point on the function.
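After expanding the function (with $n$ the number of points, $m$ the slope, and $b$ the intercept),

$$E(m, b) = \sum y_i^2 - 2m\sum x_i y_i - 2b\sum y_i + m^2\sum x_i^2 + 2mb\sum x_i + nb^2$$

it is evident that the highest-degree terms in $m$ and $b$, namely $m^2\sum x_i^2$ and $nb^2$, are both quadratic, so the function is parabolic in each variable.
That means both partial derivatives are linear in $m$ and $b$, so setting them to zero gives a pair of linear equations with only one solution (provided the points do not all share the same $x$ value). The function therefore has only one critical point.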
Next is checking whether the critical point is a minimum, a maximum, or a saddle point.
As the intercept $b$ goes to positive or negative infinity, the error function must go to positive infinity, because the line moves arbitrarily far from every point.
Less obviously, but with similar logic, the same is true for the slope $m$.
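A quick numerical illustration of that growth, as a rough sketch and not a proof. It assumes the line is y = m*x + b; the data and the helper name `squared_error` are hypothetical.

```python
def squared_error(points, m, b):
    """Total squared error of the line y = m*x + b over a list of (x, y) points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Hypothetical, slightly noisy sample data.
points = [(0.0, 1.0), (1.0, 2.5), (2.0, 5.5), (3.0, 6.0)]
values = (-1000, -10, 0, 10, 1000)

# Hold the slope fixed and push the intercept far away in both directions.
errors_b = [squared_error(points, 1.0, b) for b in values]

# Hold the intercept fixed and push the slope far away in both directions.
errors_m = [squared_error(points, m, 0.0) for m in values]

print(errors_b)
print(errors_m)
```

In both lists the error is smallest for the middle value and climbs steeply toward either end.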
Having shown that the critical point is the point we are looking for, the next step is to find it.
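We will start by taking the partial derivatives of the error function $E$ with respect to the slope $m$ and the intercept $b$.

$$\frac{\partial E}{\partial m} = \sum -2x_i\left(y_i - (mx_i + b)\right)$$

$$\frac{\partial E}{\partial b} = \sum -2\left(y_i - (mx_i + b)\right)$$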
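One way to sanity-check partial derivatives like these is to compare them against a finite-difference estimate. The sketch below assumes the line is y = m*x + b and the error is the sum of squared differences; the function names, step size `h`, and sample points are all illustrative choices.

```python
def squared_error(points, m, b):
    """Total squared error of the line y = m*x + b over a list of (x, y) points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def partials(points, m, b):
    """Analytic partial derivatives of the squared error with respect to m and b."""
    dE_dm = sum(-2 * x * (y - (m * x + b)) for x, y in points)
    dE_db = sum(-2 * (y - (m * x + b)) for x, y in points)
    return dE_dm, dE_db


def numeric_partials(points, m, b, h=1e-6):
    """Central-difference estimates of the same two partial derivatives."""
    dE_dm = (squared_error(points, m + h, b) - squared_error(points, m - h, b)) / (2 * h)
    dE_db = (squared_error(points, m, b + h) - squared_error(points, m, b - h)) / (2 * h)
    return dE_dm, dE_db


# Hypothetical sample data and an arbitrary trial line.
points = [(0.0, 1.0), (1.0, 2.5), (2.0, 5.5), (3.0, 6.0)]
print(partials(points, 0.5, -1.0))
print(numeric_partials(points, 0.5, -1.0))
```

The two printed pairs should agree to several decimal places; a large gap would point to a mistake in the hand-derived formulas.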
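Now all that remains is to find where both of those equal zero. Starting with $\frac{\partial E}{\partial b}$, the partial derivative with respect to the intercept $b$, first set it to zero, then divide both sides by two. Then distribute the summation and you are left with

$$-\sum y_i + \sum mx_i + \sum b = 0$$

The slope $m$ can be factored out of its summation and the $\sum y_i$ can be moved to the other side.

$$m\sum x_i + \sum b = \sum y_i$$

For the next step remember what the bounds on the summations are. The term $\sum b$ means the sum of $b$ added once for each point, which is $nb$ for $n$ points. The term $\sum x_i$ means the total of all the $x$ coordinates. If we divide both sides by the number of points $n$, then $\sum b$ will change to $b$, and $\sum x_i$ and $\sum y_i$ will change to the average values $\bar{x}$ and $\bar{y}$. Now we have the equation

$$m\bar{x} + b = \bar{y}$$

showing that the line of best fit goes through the average point $(\bar{x}, \bar{y})$. The other equation doesn't turn out as neat, but still isn't too bad to solve. Following very similar steps with $\frac{\partial E}{\partial m}$, you can find that

$$m\sum x_i^2 + b\sum x_i = \sum x_i y_i$$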
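Changing all the summations to averages by dividing both sides by the number of points will not be as nice as last time, but it still makes the equation easier to work with. Writing a bar over a quantity for its average over all points,

$$m\,\overline{x^2} + b\,\bar{x} = \overline{xy}$$

You can then plug the other formula, $b = \bar{y} - m\bar{x}$, into that and solve for the slope $m$ to get

$$m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$

The result for $m$ can then be used in the equation $b = \bar{y} - m\bar{x}$ to find the line of best fit.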
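Putting the two results together, here is a minimal sketch of the whole procedure in Python. It assumes the line is y = m*x + b, computes the slope from the averages of x, y, x*y, and x*x, and then takes the intercept from the fact that the line passes through the average point. The names `mean` and `best_fit` and the sample data are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def best_fit(points):
    """Return (m, b) for the least-squares line y = m*x + b through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    xy_bar = mean(x * y for x, y in points)
    x2_bar = mean(x * x for x in xs)

    denominator = x2_bar - x_bar ** 2
    if denominator == 0:
        # Every point has the same x, so no single slope minimizes the error.
        raise ValueError("all x values are equal; the slope is undefined")

    m = (xy_bar - x_bar * y_bar) / denominator
    b = y_bar - m * x_bar  # the line passes through the average point
    return m, b


# Hypothetical sample data.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
m, b = best_fit(points)
print(m, b)  # roughly 0.8 and 1.5 for this data
```

The explicit check on the denominator covers the one case the derivation quietly excludes: if all the points share one x value, the data fit a vertical line and the formula divides by zero.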