Consider the training set shown below:
From this training set, we can form a hypothesis function like this:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
The $\theta_i$ are the parameters. The graphs below show how the shape of the line depends on the $\theta_i$:
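To make this concrete, here is a minimal sketch of evaluating the hypothesis for a few choices of parameters (the function name and the values below are just illustrative, not taken from the graphs):

```python
# Hypothesis for univariate linear regression: a line with
# intercept theta0 and slope theta1 (example values are made up).

def h(theta0, theta1, x):
    return theta0 + theta1 * x

print(h(1.5, 0.0, 2.0))  # theta1 = 0 -> flat line at 1.5, so 1.5
print(h(0.0, 0.5, 2.0))  # theta0 = 0 -> line through the origin, slope 0.5, so 1.0
print(h(1.0, 0.5, 2.0))  # same slope shifted up by 1, so 2.0
```

Changing $\theta_0$ slides the line up and down; changing $\theta_1$ changes its slope.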
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from the x's and the actual outputs, the y's.
$$J(\theta_0, \theta_1)=\frac{1}{2m}\sum_{i=1}^m {(\hat{y_i}-y_i)}^2=\frac{1}{2m}\sum_{i=1}^m {(h_\theta(x_i)-y_i)}^2$$
To break it apart, it is $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x_i)-y_i$, or the difference between the predicted value and the actual value.
This function is otherwise called the "squared error function", or "mean squared error" (MSE). The mean is halved ($\frac{1}{2}$) as a convenience for the computation of gradient descent, as the derivative of the square term will cancel out the $\frac{1}{2}$. The following image summarizes what the cost function does:
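As a sketch, the cost function above translates directly into Python (the tiny training set here is made up for illustration):

```python
# Cost J(theta0, theta1): half the mean of the squared errors
# over m training examples (made-up data below).

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(cost(0.0, 1.0, xs, ys))  # 0.0    -- the line y = x fits perfectly
print(cost(0.0, 0.5, xs, ys))  # ~0.583 -- a worse fit has a higher cost
```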
Each error is the actual value minus the predicted value (after squaring, this is the same as the predicted value minus the actual value). Some errors are negative and others are positive, so if we simply summed them, they would cancel each other out and the total could be close to 0 even for a poor fit, giving a misleading estimate of the average error. That is why we square every error.
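For example, if one prediction overshoots by 2 and another undershoots by 2, the raw errors cancel but the squared errors do not:

$$(+2) + (-2) = 0, \qquad \frac{(+2)^2 + (-2)^2}{2} = 4$$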
The cost function estimates the average of the total error, so for high accuracy we have to choose the parameters of the hypothesis that minimize the cost function.
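As a rough, self-contained illustration (made-up data again; a coarse grid search stands in for gradient descent purely to show what "minimizing the cost" means):

```python
# Brute-force search for the (theta0, theta1) pair with the
# lowest cost J on a small made-up data set.

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(theta0, theta1):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

grid = [(t0 / 10, t1 / 10) for t0 in range(-20, 21) for t1 in range(-20, 21)]
best = min(grid, key=lambda p: cost(*p))
print(best, cost(*best))  # (0.0, 1.0) 0.0 -- the line y = x minimizes J
```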
Q. Consider the plot below of $h_\theta(x)=\theta_0+\theta_1x$. What are $\theta_0$ and $\theta_1$?
Ans. $\theta_0$ is 0.5 and $\theta_1$ is 1: the line crosses the y-axis at 0.5 and rises by 1 for each unit increase in $x$.