Welcome back. Now we talk about regression. This is the last of the three supervised learning algorithms we are going to cover. Let's start with a very quick review of high-school algebra. Consider Y = C, where C is a constant. If we plot this in X-Y coordinates, Y = C means X is out of the picture: the graph is just a horizontal line at height C. Next, consider Y = β₀ + β₁X. What is this? Plotted in the X-Y plane, it is a straight line; we can pick one at random, something like this. Here β₀ is the value of Y when X equals zero, so it sits here on the Y axis, and β₁ is the slope of the line.

Now let's look at this as a learning problem. We have data points here, and we try to use this line to approximate the function we want to learn. Then we have some errors, right? That's why we write the error as the left-hand side minus the right-hand side: ε = Y − (β₀ + β₁X). So what do we do? We have an error, so whatever we do, the error should be minimized. How can we do that? Using the data set that we have.

So now we have a problem of approximation. In this case, I have many data points, but I want to find a simple model to explain all of them. The simplest model I can write is Y = β₀. So β₀ is now a parameter I want to estimate, and all I can do is move this horizontal line along the Y axis. I can put it here: you see that for these data points the errors are small, but it is not good for those data points. If I move it up here, the problem is reversed: good for these data points, not good for those. So obviously there is an optimal line somewhere in between, where I can minimize the total error. This is approximation: I try to minimize the error ε = Y − β₀.

Now a more complicated case. I also have these data points, and obviously I cannot use a constant line to approximate them.
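To make the constant-model case concrete, here is a minimal sketch (the data values are made up for illustration). For the model Y = β₀ under squared error, the best β₀ turns out to be simply the mean of the observed Y values, which is the "optimal line somewhere in between" from the lecture:

```python
import numpy as np

# hypothetical observed Y values (illustrative only)
y = np.array([1.0, 2.0, 4.0, 5.0])

# for the constant model Y = beta0, the sum of squared errors
# sum((y - beta0)^2) is minimized when beta0 is the mean of y
beta0 = y.mean()
sse = ((y - beta0) ** 2).sum()

# any other constant line gives a larger total squared error
sse_shifted = ((y - (beta0 + 0.5)) ** 2).sum()
```

Sliding the line up or down (as in the lecture) corresponds to trying other values of β₀; every one of them yields a larger `sse` than the mean does.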
I can tilt the line this way, or I can tilt it that way. Now my error is still ε = Y − (β₀ + β₁X), and I want to minimize it. If we use this straight line, then these are the errors, right? We can add them up. So we try to minimize the total error by finding β₀ and β₁. Because we now have so many data points, we need a more efficient representation for learning.

Here is the situation in regression: we have all these data points. Of course I can find a very complicated function to fit every one of them exactly. But is it necessary? Is it a good idea? Or should we just use linear regression? Linear regression is just one line here, one linear equation. We find one linear equation, the pattern we would like to learn, to fit all of these data points, and probably it can do a good job. But if I want a more complicated one, I can do this, right? You might ask, what's wrong with that? Well, it is of course not linear anymore, and one potential problem is called overfitting. We don't want to overfit our data, because we are performing a learning task. For learning, we don't just deal with the data we have now; we want the model to work on future data we don't yet know. In most cases, a simple model works well.

Now, when we have more than one or two data points, many of them, we can use a convenient matrix representation. The previous equation can be rewritten as Y = XW + ε, where Y is a vector, X is a matrix, W contains the parameters we want to estimate, and ε is the error. Now we need to minimize the error. But the error can be positive or negative; in the previous slide we saw both. That's why we usually use either the squared error or the absolute value. In this case we use the squared error, so that the terms don't cancel each other, because positive and negative errors do cancel each other out. Minimizing the sum of squared errors is called the least squares method.
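The matrix form Y = XW + ε can be sketched in a few lines of numpy. The data below is made up for illustration: four points lying exactly on Y = 1 + 2X, so we can see the least squares solution recover β₀ = 1 and β₁ = 2. This version solves the normal equations (XᵀX)W = XᵀY directly:

```python
import numpy as np

# hypothetical data points, chosen to lie on Y = 1 + 2X
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# design matrix X: a column of ones (for beta0) and the x values (for beta1)
X = np.column_stack([np.ones_like(x), x])

# least squares: W minimizing ||Y - XW||^2 solves (X^T X) W = X^T Y
W = np.linalg.solve(X.T @ X, X.T @ y)
beta0, beta1 = W
```

With noisy data the fit would no longer be exact, but the same two lines of linear algebra still return the W that minimizes the sum of squared errors.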
This method is so popular that you can find it in almost every software package. You just need to rewrite your data into this form and pick the error formula, and then the package can find W, the set of coefficients we want to learn. That's all for the supervised learning segment. Thank you.
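As the lecture says, you rarely need to solve the normal equations by hand; here is one off-the-shelf call, using numpy's `polyfit` with degree 1 (the data values are made up for illustration):

```python
import numpy as np

# hypothetical noisy data scattered around a line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.9, 5.1, 6.8])

# degree-1 polynomial fit = linear regression by least squares;
# polyfit returns coefficients from highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
```

The same one-call convenience exists in most statistics and machine-learning libraries; only the data-layout conventions differ.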