You could end up with invalid conclusions from your analysis because of errors in model design, even if you had perfect inputs, perfect data going in. There are many reasons why the model could be incorrect: you could have errors in your model structure, you could be extrapolating, you could be doing feature selection, and there are some fancier things that we'll get to. So let's look at all five of these things in turn.

First, let's look at the question of model structure. The thing that is important to keep in mind is that most machine learning just estimates parameters to fit a predetermined model. How do you know that the model you have is appropriate? Quite often, we end up choosing a very simple model, not because we know that the simple model is going to work, but simply because it's easy. So if there's a complex nonlinear process going on in the world, and you've decided to build a simple linear model, you can fit the data to your linear model and learn the best linear fit, but it's not going to be perfect, and you'll have to decide whether that fit correctly represents anything or not.

This kind of situation becomes particularly problematic when you have to do things like extrapolation. Here's some simple two-dimensional data, where you've got a perfectly linear relationship in the range from 1 to 4. Now you want to ask: at y = 7, what is x? Well, we really don't know. Perhaps we can assume, since the data are so nice and linear between 1 and 4, that the relationship will continue to be linear from 4 on through 7. But perhaps not. Maybe the real world is extremely nonlinear, and our choice of a linear model is wrong. So extrapolation, in particular, can be very dangerous unless we have reason to believe that our model is correctly chosen.

Then there's the issue of feature selection. Did you know, for example, that taller people are more likely to grow beards? No, you say, why is that true?
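The extrapolation danger can be made concrete with a small sketch. The data points and the nonlinear "truth" below are made up for illustration: we fit a line to points that are perfectly linear on the observed range, then extrapolate well outside it, where a hypothetical nonlinear process diverges from the fitted line.

```python
import numpy as np

# Hypothetical data: perfectly linear on the observed range x = 1..4.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x - 1.0  # observed values: 1, 3, 5, 7

# Fit a straight line (degree-1 polynomial) to the observed points.
slope, intercept = np.polyfit(x, y, 1)

# Extrapolate well outside the observed range.
y_hat_at_7 = slope * 7.0 + intercept  # the linear model predicts 13

# Suppose the real-world process actually flattens out past x = 4
# (a made-up possibility the observed data cannot rule out).
def true_process(x):
    return 2.0 * x - 1.0 if x <= 4.0 else 7.0 + 0.5 * (x - 4.0)

error = abs(y_hat_at_7 - true_process(7.0))  # 13 vs 8.5: a large miss
```

Inside [1, 4] the fitted line is perfect; the error appears only where we had no data to check the linear assumption.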
Well, here's the logic: women are generally shorter than men, and women don't grow beards. So if you look at the universe of people who have beards, they're all men. Since men are generally taller than women, people with beards are generally taller than people without. Notice that this happened because a large fraction of the population, about half, don't have beards and tend to be shorter. I did a perfectly reasonable, correct analysis and came out with this factual statement, which is on the first bullet of the slide, and which is completely true, but it doesn't tell us the first thing about taller versus shorter men. And that is perhaps the much more interesting question.

Here's a related problem. Often we have aggregated data, so we analyze results for a group, and based on this analysis of group data, we ascribe results to the individual. For example, suppose we have data at the district level that tells us that districts with higher income have lower crime rates. From here, we may be tempted to infer that richer people are less likely to commit crimes. Actually, this inference doesn't follow from the aggregate data. It is one possible reason why we might have gotten the aggregate result, but there are other possibilities too, having to do with how people with particular incomes and crime rates are distributed across districts.

One particular case of this kind of fallacy is something called Simpson's paradox. Let's work through an example. Say we're worried about gender discrimination, and I have a simple matrix here of two universities, Easy University and Hard University. Let's look at the acceptance rates of these two places. Reading the first row of this table: ten men applied to Easy University and seven of them were accepted, so that's a 0.7, or 70%, acceptance rate. Five women applied and four were accepted, so we have a 0.8 acceptance rate.
So if you look at Easy University, the acceptance rate for women is greater than the acceptance rate for men. Now let's look at Hard University, which is much harder to get into. Ten men applied and only three got in, so the acceptance rate for men is 0.3. Fifteen women applied and five got in, an acceptance rate of 1/3, or about 0.33, which again is a little higher than the acceptance rate for men. So again, at Hard University, women are accepted at a slightly higher rate than men.

But now let's look at the last row, which is the aggregate data for Easy and Hard combined. If you look at both universities put together, 20 men applied and 10 were accepted, so 50%. Twenty women applied and only nine were accepted, so that's 0.45. In aggregate, women are being accepted at a lower rate than men. What's going on here?

This is known as Simpson's paradox, and it arises because the aggregate data reflects the combination of two separate ratios, where the number of applicants behind each ratio isn't the same. Many more women than men apply to Hard U, and since the acceptance rates at Hard U are so much lower than at Easy U, the overall result we're seeing reflects more women applying to Hard U, rather than discrimination against women, in this made-up example.
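The arithmetic in this example can be checked directly. Here is a minimal sketch using the applied/accepted counts from the made-up table:

```python
# (applied, accepted) counts from the made-up Easy U / Hard U example.
data = {
    "Easy U": {"men": (10, 7), "women": (5, 4)},
    "Hard U": {"men": (10, 3), "women": (15, 5)},
}

def rate(applied, accepted):
    return accepted / applied

# Within each university, women have the higher acceptance rate...
for school, groups in data.items():
    assert rate(*groups["women"]) > rate(*groups["men"]), school

# ...but in aggregate the ordering flips, because far more women than men
# apply to Hard U, where everyone's acceptance rate is low.
men_applied = sum(g["men"][0] for g in data.values())      # 20
men_accepted = sum(g["men"][1] for g in data.values())     # 10
women_applied = sum(g["women"][0] for g in data.values())  # 20
women_accepted = sum(g["women"][1] for g in data.values()) # 9

overall_men = rate(men_applied, men_accepted)        # 0.50
overall_women = rate(women_applied, women_accepted)  # 0.45
```

Changing the 15 women applying to Hard U to 5, so that the group sizes match across universities, makes the reversal disappear, which is exactly the point: the paradox comes from the unequal group sizes, not from the per-university rates.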