Y-it's site

We will define the set of points $P$ as follows:

$$P = \{(1, 1), (2, 2), (3, 4), \ldots\}$$

And the line of best fit will be defined as

$$y = mx + b$$

Finally, to express the error of a line, just sum the error for each point. The error for a point is the squared difference between the point's $y$ coordinate and the line's estimate for the point's $x$ coordinate, $mx + b$. Squaring makes every error nonnegative; otherwise a positive error and a negative error could cancel out. The reason for squaring instead of taking the absolute value is that the square is easier to differentiate. For a single point the error is $(y - mx - b)^2$. The total error for the whole line can be written as

$$E(m, b) = \sum_{(x, y) \in P} (y - mx - b)^2$$
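As a minimal sketch, assuming the points are stored as a Python list of `(x, y)` tuples, the error function can be computed directly from this definition. The name `squared_error` is hypothetical, and only the three points written out in $P$ are used as sample data, since the rest of $P$ is elided.

```python
def squared_error(points, m, b):
    """Total error E(m, b): sum of (y - (m*x + b))**2 over all points."""
    return sum((y - m * x - b) ** 2 for x, y in points)


# Only the three points written out in P; the rest of P is not given.
points = [(1, 1), (2, 2), (3, 4)]

# The line y = x passes through (1, 1) and (2, 2) and misses (3, 4) by 1.
print(squared_error(points, 1, 0))  # prints 1
```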

Going forward, I will leave the bounds off the summations, since they are all the same and would only muddle the notation. The next step after finding the error function is to find its minimum. It would be really convenient if there were exactly one minimum and it were the only critical point of the function. Expanding the function shows that its highest-degree terms in both $m$ and $b$ are quadratic, so $E$ is a quadratic function of $m$ and $b$:

$$E(m, b) = \sum \left(y^2 + m^2x^2 + b^2 - 2ymx - 2yb + 2mxb\right)$$

Because $E$ is quadratic, both of its partial derivatives are linear in $m$ and $b$, so as long as the points do not all share the same $x$ coordinate, $E$ has exactly one critical point. Next is checking whether that critical point is a minimum, a maximum, or a saddle point. As $b$ goes to positive or negative infinity, the error function must go to positive infinity.

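Less obviously, but by similar logic, the same is true for $m$. More directly, $E$ is a sum of squares and can never be negative, and a quadratic that is never negative cannot have a maximum or a saddle point, so its one critical point must be the minimum.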
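The expanded form can be checked numerically. In this sketch (function names hypothetical, sample points taken from the explicit part of $P$), $m$ and $b$ are pulled out of each summation. This makes it visible that $E$ depends on the data only through a handful of sums and is a quadratic in $m$ and $b$; the $b^2$ term becomes $n b^2$ because $b^2$ is added once for each of the $n$ points.

```python
def squared_error(points, m, b):
    """E(m, b) computed directly from its definition."""
    return sum((y - m * x - b) ** 2 for x, y in points)


def squared_error_expanded(points, m, b):
    """E(m, b) from the expanded terms, with m and b factored out of each sum."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # Σy² + m²Σx² + n·b² - 2mΣxy - 2bΣy + 2mbΣx
    return syy + m * m * sxx + n * b * b - 2 * m * sxy - 2 * b * sy + 2 * m * b * sx


points = [(1, 1), (2, 2), (3, 4)]
for m, b in [(0, 0), (1, 0), (1.5, -2 / 3), (-3, 7)]:
    direct = squared_error(points, m, b)
    expanded = squared_error_expanded(points, m, b)
    # The two forms agree up to floating-point rounding.
    assert abs(direct - expanded) < 1e-9, (m, b, direct, expanded)
print("expanded form matches")
```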

Having shown that the critical point is the point we are looking for, the next step is to find it. We will start by taking the partial derivatives of the error function:

$$\frac{\partial E}{\partial m} = \sum \left(2mx^2 - 2yx + 2xb\right)$$

$$\frac{\partial E}{\partial b} = \sum \left(2b - 2y + 2mx\right)$$
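One way to sanity-check these derivatives is to compare them with a central-difference approximation of $E$. The helper names and the step size `h` below are illustrative choices, not part of the derivation. Since $E$ is quadratic, the central difference equals the exact derivative apart from floating-point rounding.

```python
def squared_error(points, m, b):
    return sum((y - m * x - b) ** 2 for x, y in points)


def gradient(points, m, b):
    """Exact partial derivatives (dE/dm, dE/db) from the formulas above."""
    de_dm = sum(2 * m * x * x - 2 * y * x + 2 * x * b for x, y in points)
    de_db = sum(2 * b - 2 * y + 2 * m * x for x, y in points)
    return de_dm, de_db


def numeric_gradient(points, m, b, h=1e-6):
    """Central-difference estimates of the same two partial derivatives."""
    de_dm = (squared_error(points, m + h, b) - squared_error(points, m - h, b)) / (2 * h)
    de_db = (squared_error(points, m, b + h) - squared_error(points, m, b - h)) / (2 * h)
    return de_dm, de_db


points = [(1, 1), (2, 2), (3, 4)]
print(gradient(points, 1, 0))          # prints (-6, -2)
print(numeric_gradient(points, 1, 0))  # close to (-6.0, -2.0)
```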

Now all that remains is to find where both of those equal zero. Starting with $\partial E / \partial b$, first set it to zero, then divide both sides by two. Then distribute the summations, and you are left with

$$0 = \sum b - \sum y + \sum mx$$

The $m$ can be factored out of the summation, and $\sum y$ can be moved to the other side:

$$\sum y = \sum b + m \sum x$$

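For the next step, remember what the bounds on the summations are. The term $\sum b$ means $b$ added once for each point, so with $n$ points it equals $nb$. The term $\sum x$ means the total of all the $x$ coordinates, and likewise $\sum y$ is the total of all the $y$ coordinates. If we divide both sides by the number of points $n$, then $\sum b$ becomes $b$, and $\sum x$ and $\sum y$ become the average $x$ value $\bar{x}$ and the average $y$ value $\bar{y}$. Now we have the equation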

$$\bar{y} = m\bar{x} + b$$

showing that the line of best fit goes through the average point $(\bar{x}, \bar{y})$. The other equation, $\partial E / \partial m = 0$, doesn't turn out as neat, but it still isn't too bad to solve. Following steps very similar to those for the last equation, you can find that

$$m \sum x^2 + b \sum x = \sum xy$$
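This equation and the earlier $\sum y = \sum b + m\sum x$ (where $\sum b = nb$ for $n$ points) form a system of two linear equations in $m$ and $b$. The route taken here converts both to averages, but as a cross-check the system can also be solved directly with Cramer's rule, which is a different technique from the one in the text. The sketch below does that; `fit_normal_equations` is a hypothetical name, and the three listed points of $P$ are used as sample data.

```python
def fit_normal_equations(points):
    """Solve n*b + m*Σx = Σy and m*Σx² + b*Σx = Σxy for (m, b) by Cramer's rule."""
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct x coordinates")
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx  # positive because the x coordinates are not all equal
    m = (n * sxy - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return m, b


points = [(1, 1), (2, 2), (3, 4)]
print(fit_normal_equations(points))  # m = 1.5, b is about -0.667
```

For those three points this gives $m = 3/2$ and $b = -2/3$.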

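Dividing both sides by the number of points to change all the summations to averages will not be as neat as last time, but it still makes the equation easier to work with. Here $\overline{x^2}$ and $\overline{xy}$ denote the averages of $x^2$ and $xy$ over all the points: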

$$m\overline{x^2} + b\bar{x} = \overline{xy}$$

Substituting $b = \bar{y} - m\bar{x}$ from the other equation and solving for $m$ gives

$$m = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}$$
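Written out, the substitution of $b = \bar{y} - m\bar{x}$ into the averaged second equation goes as follows:

$$
\begin{aligned}
m\overline{x^2} + \left(\bar{y} - m\bar{x}\right)\bar{x} &= \overline{xy} \\
m\overline{x^2} - m\bar{x}^2 &= \overline{xy} - \bar{x}\bar{y} \\
m &= \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - \bar{x}^2}
\end{aligned}
$$

The denominator $\overline{x^2} - \bar{x}^2$ is the (population) variance of the $x$ coordinates, so it is zero exactly when every point has the same $x$ coordinate, in which case there is no unique slope.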

The resulting value of $m$ can then be substituted into $\bar{y} = m\bar{x} + b$ to find $b$, which gives the line of best fit $y = mx + b$.
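Putting the two results together, a minimal sketch of the whole fit might look like this. It assumes points are `(x, y)` tuples; `fit_line` is a hypothetical name, and the sample data is again only the three points written out in $P$.

```python
def fit_line(points):
    """Line of best fit y = m*x + b, using the average-based formulas above."""
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct x coordinates")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mean_xy = sum(x * y for x, y in points) / n
    mean_xx = sum(x * x for x, _ in points) / n
    # m = (avg(xy) - avg(x)*avg(y)) / (avg(x²) - avg(x)²)
    m = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    # The line passes through the average point, so b = avg(y) - m*avg(x).
    b = mean_y - m * mean_x
    return m, b


points = [(1, 1), (2, 2), (3, 4)]
m, b = fit_line(points)
print(round(m, 3), round(b, 3))  # prints 1.5 -0.667
```

The distinct-$x$ check guards against the zero denominator, which occurs exactly when all the points are stacked on one vertical line.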