Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve player and lineup selection as well as in-game strategy.

A course from University of Houston System

Math behind Moneyball

36 ratings

From this lesson

Module 10

You will learn how Kelly Growth can optimize your sports betting, how regression to the mean explains the SI cover jinx and how to optimize a daily fantasy sports lineup. We close with a discussion of golf analytics.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

In this video we'll talk about a very important statistical concept called

regression towards the mean.

And then we'll apply it to sports betting.

Okay, so, Sir Francis Galton, I believe, was the first person to come up with this.

I believe he was a eugenicist, whatever that means.

Sort of like a biologist who looked at genes.

And so what he observed was the following.

Tall parents have tall kids, but

their kids are closer to average than the parents are.

Okay, and so the way you can explain this, we've talked briefly about correlation.

Correlation, which you can find with the CORREL function,

is a measure of linear association between two variables.

So it's between minus 1 and plus 1, and it's unit free.

So a correlation near plus 1 means that basically your two variables,

let's call them x and y, tend to go up and down together.

Correlation near minus 1, when x goes up, y tends to go down,

and when x is above average, y tends to be below average.

Correlation near 0 means a weak linear association.

But what you can show is if you run a regression with one independent variable, so

you predict y as a function of x, let's say.

You can say your predicted value for y,

we'll get this tied to sports in a few minutes.

How many standard deviations above average will it be?

And the answer is, it's not blowing in the wind,

it's the correlation times how many standard deviations x is above average.

So if your correlation between a parent's height and

a kid's height is 0.5, and x was the height of the father and

y is the height of the son, you'd predict the son's height as follows.

Take the number of standard deviations above average the father's height is,

cut that in half, and that's how many standard deviations above average you

would expect the son's height to be.

And since the correlation is between minus 1 and plus 1, in terms of standard deviations,

y will be closer to its mean in standard deviations than x is,

in other words, in terms of z-score.
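Written out, the rule just described looks like this. The symbols are notation introduced here for illustration, not taken from the lecture: z_x and z_y are z-scores, r is the correlation, and s_x and s_y are the standard deviations.

```latex
z_x = \frac{x - \bar{x}}{s_x}, \qquad
\hat{z}_y = \frac{\hat{y} - \bar{y}}{s_y} = r \, z_x
\quad\Longrightarrow\quad
|\hat{z}_y| = |r| \, |z_x| \le |z_x| \quad \text{since } -1 \le r \le 1 .
```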

And we'll see where this

has an application in sports in a couple of minutes.

But let's take a simple example.

Okay, so let y be let's say the daughter's height.

And x is the father's height.

And so let's suppose the mean of x is 69 inches.

Mean of y is 65 inches.

Let's suppose the standard deviation of x is 3 inches, to keep it simple.

Standard deviation of y is 2 inches.

And the correlation is 0.4.

So take a father two standard deviations above average,

and predict the daughter's height.

Okay, well, if the father is two standard deviations above average,

the correlation's 0.4.

We predict the daughter to be 2 times 0.4 or

0.8 standard deviations above average.

So you would take the average height,

which is 65 plus 0.8 times the standard deviation of the daughter's,

which is 2, and you'd predict 66.6 inches.

But the point is, the father was two standard deviations above average,

you'll predict the daughter to be 0.8 standard deviations above average.
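The father and daughter arithmetic can be checked with a minimal Python sketch. The function name `predict_via_z_score` is made up for illustration; the numbers are the ones used in the example above.

```python
def predict_via_z_score(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x using the regression-towards-the-mean rule."""
    z_x = (x - mean_x) / sd_x   # standard deviations x is above its mean
    z_y = r * z_x               # predicted standard deviations above the mean for y
    return mean_y + z_y * sd_y  # convert the z-score back to inches


# A father two standard deviations above average: 69 + 2 * 3 = 75 inches.
father_height = 69 + 2 * 3
daughter_prediction = predict_via_z_score(
    father_height, mean_x=69, sd_x=3, mean_y=65, sd_y=2, r=0.4
)
print(round(daughter_prediction, 1))  # 66.6
```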

So what does this have to do with sports?

Well, a couple of things.

The Sports Illustrated or the Madden Jinx.

Okay, so in Sports Illustrated, someone's on the cover and then they say, well,

it's a jinx.

They don't do as well after they're on the cover.

Well, of course not.

They were on the cover because they did something really good,

unless they're members of the FIFA selection committee, I guess.

But usually you're on the cover of Sports Illustrated when you do something

that's really good.

And so you can't expect in the near future you would do as well.

The Madden video game jinx, when someone's on the cover of Madden,

it's because they had a great year.

And so there's nowhere to go but down because you had such a great year,

maybe you'll get hurt.
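One way to see the cover jinx as plain regression towards the mean is a small simulation. This is only a sketch under assumptions that are not from the lecture: each player's season is modeled as fixed skill plus independent luck, both standard normal, and the "cover athletes" are the top 100 of 10,000 simulated players.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable
n_players = 10_000

# Assumed model: performance = skill + luck, with fresh luck each season.
skill = [random.gauss(0, 1) for _ in range(n_players)]
season_1 = [s + random.gauss(0, 1) for s in skill]
season_2 = [s + random.gauss(0, 1) for s in skill]

# "Cover athletes": the 100 best performers in season 1.
top = sorted(range(n_players), key=lambda i: season_1[i], reverse=True)[:100]

cover_season = mean(season_1[i] for i in top)
next_season = mean(season_2[i] for i in top)
print(f"cover season: {cover_season:.2f}, next season: {next_season:.2f}")
```

In this model the top group is still above average the next season, but noticeably less so, and nothing about being on a cover caused the drop.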

Okay, so this has a real application in sports gambling.

Let's say y is predicted wins next season for an NFL team.

And x is actual wins last season.

Okay, I don't have the Vegas stuff immediately available or

don't have the time to really do the example.

I mean, maybe we'll make it a homework or a test question.

But basically you'll see that y,

the predicted wins next season, tends to be closer to average wins,

which is eight for an NFL team, than wins last season.

Now why is that?

If a team goes 12 and 4, what does it mean?

Their players did well and their key players probably didn't get hurt.

You expect them to do worse, or they'll maybe lose people to free agency.

If a team was 4 and 12, well, probably they had a lot of injuries or

they have young players they're developing.

You expect them to win more games.

So we can sort of look at this with the NFL and

you can see why the phenomenon's called regression towards the mean.

Your prediction is closer to the mean than your independent variable.

Now there's basically a stronger tendency to regress towards the mean in the NFL than

the NBA, and I think that's because of a couple of reasons.

The NFL, they vary the schedule.

If you play poorly, you get an easier schedule.

If you play well, you get a harder schedule.

The NBA doesn't do that.

And the other thing is if you've got LeBron James,

you're going to have a good team no matter who else is on that team, or Steph Curry.

And so basically

one player can make a much bigger difference in basketball.

And in football I don't think any one player,

except maybe possibly the quarterback, could make a huge difference.

But any one great NBA player will make a big difference in

the performance of an NBA team.

Okay, so let's try and examine this.

We have data, we have 2012 wins for every NFL team and 2011 wins for every NFL team.

So we want to predict y from x.

Now there's a couple of ways you can do this, and we'll get the correlation.

But you can do what's called the trend curve.

You can graph this stuff.

Now, the column you want on the y-axis should be on the right, so

I flipped this around.

And if I do Insert > Scatter Plot, I can get a nice scatter plot of this data.

X-axis is 2011.

And so basically I can find the best fitting line here.

And this is how many wins they won last year.

So the x-axis is 2011 wins,

y-axis 2012 wins.

And you can do right-click, Add Trendline.

You can do straight line, show the equation.

And show the r square.

And you can see that's a line of positive slope, but the correlation is the square

root of the r squared, where you take the same sign as the slope of the line.

So if I take the square root of that 0.0773,

the correlation is 0.28 between the number

of wins an NFL team gets one year and the number of wins they get the next year.
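The step from R squared back to the correlation can be sketched in Python. The two input numbers are the ones read off the trendline in the lecture; `math.copysign` is used here to give the square root the same sign as the slope.

```python
import math

r_squared = 0.0773  # R squared shown on the trendline
slope = 0.262       # slope of the fitted line (positive here)

# The correlation is the square root of R squared, with the sign of the slope.
correlation = math.copysign(math.sqrt(r_squared), slope)
print(round(correlation, 3))  # 0.278
```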

So I can actually get all this stuff with Excel functions,

which I don't think we've talked about.

I have them right here.

But if I want the correlation between the 2012 wins and the 2011 wins,

it doesn't matter which you put first for correlation.

Get 0.278.

Now the slope of that regression line, we see it's 0.262.

And there's a SLOPE function, where you have to put the y column first.

Actually it would be easier if I name this stuff.

So we'll name this stuff Formulas >

Create from Selection, names in top row.

So if I take slope, there's a slope function.

And if I hit F3, the 2012 plays the role of y.

Hit F3, the 2011 plays the role of x.

And I get the slope is 0.262.

If you want the intercept, you could do an intercept function.

So you do F3 which is 2012, you gotta put the y first.

F3, the 2011 comes next.

Okay, now I can get the standard deviation of the 2011 and 2012.

Standard deviation of 2011.

And the standard deviation of 2012.

And while you're at it, you can get an r squared.

There's an RSQ function.

Doesn't matter what goes first, but we'll put 2012 first.

F3, 2011.

And you get 0.08 there, rounding off.

Now these things we don't really need here.

This would be the standard error of the regression which we talked about

from the analysis tool pack.

If you want the standard error, which is how accurate this tends to be,

you'd put 2012 first because it's what you're trying to predict, 2011 second.
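For readers working outside Excel, here is a rough Python sketch of what CORREL, SLOPE, INTERCEPT, RSQ and STEYX compute. The function names mirror the Excel ones but are defined here. The win totals are made-up numbers for illustration only, not the lecture's NFL data. As in Excel, the y values go first.

```python
import math
from statistics import mean, stdev


def correl(ys, xs):
    """Correlation between ys and xs (argument order does not matter)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(ys, xs):
    """Least-squares slope for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def intercept(ys, xs):
    """Least-squares intercept; the line passes through (mean x, mean y)."""
    return mean(ys) - slope(ys, xs) * mean(xs)


def rsq(ys, xs):
    """R squared, the square of the correlation."""
    return correl(ys, xs) ** 2


def steyx(ys, xs):
    """Standard error of the regression (n - 2 in the denominator)."""
    a, b = intercept(ys, xs), slope(ys, xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


# Hypothetical win totals for eight teams (NOT real NFL data).
last_year = [13, 10, 8, 6, 4, 9, 12, 2]
this_year = [10, 11, 7, 8, 6, 5, 9, 8]

r = correl(this_year, last_year)
b = slope(this_year, last_year)
a = intercept(this_year, last_year)
print(r, b, a, rsq(this_year, last_year), steyx(this_year, last_year))
```

A useful check on any data set is that the slope equals the correlation times the ratio of the standard deviations, `r * stdev(ys) / stdev(xs)`.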

Okay.

So let's take an NFL team that was two standard deviations above average.

I mean, that's a really good team.

Let's suppose an NFL team won 14 games last year.

Predict wins next year.

Now you could just plug 14 in here, and there's nothing wrong with that.

You would get the intercept plus the slope times 14.

Now we know the average both years is 8, so

we would predict them to only win about 10 games if they won 14 last year,

which is pretty small. But let's do that in terms of regression towards the mean.

So 14 wins is how many standard deviations above average?

Well, you take 14 minus the mean of 8.

You divide by the standard deviation of 2011.

So that's 1.83 standard deviations above average.

So you predict 2012.

You take the correlation times that.

So predict 2012 to be

0.509 standard deviations above average.

And in terms of wins, eight wins is average, and to that

you would add 0.509 times the standard deviation for 2012.

And I got 9.57 instead of 9.54.

I mean, that's virtually a wash there.

I mean, I've rounded off a little bit.

But that gives you a good feel for how regression towards the mean works.
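The two routes to the prediction are really the same calculation. The least-squares slope equals the correlation times the ratio of the two standard deviations, and the fitted line passes through the two means. A minimal sketch using only the mean of 8 and the slope of 0.262 quoted above (the variable names are illustrative):

```python
mean_wins = 8.0      # average wins in both seasons, as stated in the lecture
slope = 0.262        # slope of the fitted line from the lecture
wins_last_year = 14

# Because a least-squares line passes through (mean x, mean y),
# intercept + slope * x can be rewritten as mean_y + slope * (x - mean_x).
predicted = mean_wins + slope * (wins_last_year - mean_wins)
print(round(predicted, 2))  # 9.57
```

The six extra wins from last season shrink to about 1.57 predicted extra wins, which is the regression towards the mean.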

You'll almost always expect the sporting

team the next season to play closer to average than they played last season.

I think that's enough said about that.

Well, just one reference.

Again, that book Thinking, Fast and Slow by Kahneman.

I just never have ready his.

It has a great chapter on regression towards the mean,

with some fantastic examples.

And even Kahneman, a Nobel Prize winner,

had trouble internalizing the meaning of regression towards the mean

in his own work until he really thought about it.

So it's not an obvious concept.

But essentially what it means is when you predict a dependent variable,

your prediction from a linear regression will be closer to average,

in terms of standard deviations, than your independent variable.