For some linear regression problems the closed-form solution, also known as the **normal equation**, is a better way to find the optimum values of theta. If you know calculus: to minimize the cost function you take the partial derivative of the cost function with respect to each value of theta, set each derivative to zero, and then solve for the values of theta. Don't worry if you are not familiar with calculus; it turns out we can derive the normal equation from these partial derivatives, and this results in the following equation:

theta = (X^{T}X)^{-1}X^{T}y

where *X* is the matrix of training samples (one row per sample) and *y* is the vector of their target values.
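The derivation is short enough to sketch in full, using *J* for the usual squared-error cost:

```latex
% Squared-error cost for linear regression, in matrix form
J(\theta) = \frac{1}{2m}\,(X\theta - y)^{T}(X\theta - y)

% Gradient with respect to theta, set to zero at the minimum
\nabla_{\theta} J = \frac{1}{m}\,X^{T}(X\theta - y) = 0

% Rearranging gives the normal equation
X^{T}X\,\theta = X^{T}y
\quad\Longrightarrow\quad
\theta = (X^{T}X)^{-1}X^{T}y
```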

To make a bit more sense of this, consider a concrete example such as building a model of house prices. Suppose we have a training set consisting of only 4 samples, that is *m = 4*, and each training sample has 4 features, x_{1} to x_{4}, as well as a bias feature x_{0} set to 1. We could tabulate this in the following way:

| x_{0} | Size (Metres^{2}) | Number of bedrooms | Age (Years) | Energy rating | Price (k$) |
|-------|-------------------|--------------------|-------------|---------------|------------|
| 1 | 850 | 5 | 25 | 3 | 780 |
| 1 | 520 | 4 | 16 | 2 | 690 |
| 1 | 580 | 3 | 40 | 4 | 320 |
| 1 | 360 | 1 | 32 | 5 | 300 |

So now we can simply plug these values into our normal equation, and write some Python code that does exactly that.
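A minimal NumPy sketch of that computation might look like the following. Note that it uses `np.linalg.pinv` (the Moore–Penrose pseudo-inverse) rather than a plain matrix inverse: with only 4 samples and 5 parameters, X^{T}X here is singular and has no ordinary inverse.

```python
import numpy as np

# Design matrix X: bias column x0 followed by the four features,
# one row per house in the table above.
X = np.array([
    [1, 850, 5, 25, 3],
    [1, 520, 4, 16, 2],
    [1, 580, 3, 40, 4],
    [1, 360, 1, 32, 5],
], dtype=float)

# Target vector y: the prices in k$.
y = np.array([780, 690, 320, 300], dtype=float)

# Normal equation: theta = (X^T X)^(-1) X^T y.
# pinv stands in for the inverse, since X^T X is not invertible
# when there are more parameters than samples.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)
```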

Let's plug these values of theta into our hypothesis function, using the first house in the table above as an example, to see how closely it predicts the actual price.
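The prediction step itself is just a dot product, h(x) = theta^{T}x. As a sketch, with placeholder theta values (not the fitted parameters from the code above):

```python
import numpy as np

# Placeholder theta, purely illustrative -- substitute the vector
# returned by the normal-equation code.
theta = np.array([10.0, 0.5, 40.0, -2.0, 5.0])

# First house from the table: bias, size, bedrooms, age, energy rating.
x = np.array([1.0, 850.0, 5.0, 25.0, 3.0])

# Linear hypothesis: h(x) = theta^T x
prediction = theta @ x
print(prediction)
```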

Here our linear hypothesis function gives us a predicted price of 686 for the first house in our table. This is considerably less than the actual price of 780. The primary reason is that with such a tiny sample size it is impossible to make a sensible prediction. Another reason is that even though we have optimum values of theta, our linear hypothesis does not accurately represent the data because, like most real-world data, it is non-linear. We will look at ways to address this issue shortly.