Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

A course from University of Houston System

Math behind Moneyball

43 ratings

From this lesson

Module 9

You will learn how to rate NASCAR drivers and get an introduction to sports betting concepts such as the money line, prop bets, and the evaluation of betting systems.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, in this video we'll begin our discussion of sports gambling.

So what better place to start than looking at NFL point spreads.

So basically an NFL point spread is a bookmaker's prediction

about the outcome of the game from the standpoint of points.

We'll talk about the money line,

which is a prediction of who will win the game, in the next video, for the NBA.

But here are some NFL point spreads.

I've got about twenty-five hundred of them.

The visitor prediction is the spread from the standpoint of the visiting team.

So minus seven means Las Vegas was setting a line of minus seven for

the visiting team.

Which means if the visiting team did better than minus seven, that is, if they won

or lost by six or less, you'd win the bet on the visiting team.

If the visiting team, [COUGH], excuse

me, lost by eight or more, you would lose the bet on the visiting team.

And if they lost by seven, it would be a push, and

nobody would win or lose money.

And the way it works on these bets usually is this.

If you lose the bet,

having bet ten dollars, you lose 11 dollars.

And if you win the bet, you win ten dollars.

So if Vegas can get half the money on each side, then they're guaranteed to win.

Because for everybody who wins ten dollars,

there is a person who loses eleven dollars.

And so if you think about that,

if you think of that as a $21 bet, they would keep $1.

So that would be their edge there.
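The 11-to-10 arithmetic above can be written out as a small sketch. The variable names are made up for illustration, and the break-even win rate is a derived figure that the lecture does not state: it is the win probability p that solves 10p - 11(1 - p) = 0.

```python
from fractions import Fraction

RISK = 11   # dollars a losing bettor pays
WIN = 10    # dollars a winning bettor collects

# One winner and one loser on opposite sides of the same game:
# the book collects 11 from the loser and pays 10 to the winner.
book_profit = RISK - WIN                     # 1 dollar kept by the book
edge = Fraction(book_profit, RISK + WIN)     # 1/21 of the 21 dollars in play

# Win probability at which a bettor breaks even: 10*p - 11*(1 - p) = 0
break_even = Fraction(RISK, RISK + WIN)      # 11/21

print(f"book edge: {float(edge):.4f}")
print(f"break-even win rate: {float(break_even):.4f}")
```

So a bettor who wins exactly half of these bets loses money over time; the win rate has to exceed 11/21, a little over 52%, just to come out even.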

As another example, if a team was favored by, let's take a decimal here,

5.5. If the visiting team was favored by five and a half,

they would have to win by six or more for you to win a bet on the visiting team.

And if they won by five or less, or lost the game,

you would lose the bet.

And there could be no tie with the half point bet there.
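The rules for grading a bet on the visiting team can be sketched as one comparison. This is a minimal illustration, not the lecture's spreadsheet; the function name `settle_visitor_bet` is hypothetical, and the line is expressed as the predicted visitor margin (so -7 is a seven-point underdog and 5.5 is a five-and-a-half-point favorite).

```python
def settle_visitor_bet(visitor_line: float, visitor_margin: float) -> str:
    """Grade a bet on the visiting team.

    visitor_line   -- predicted visitor margin (-7 means a 7-point underdog)
    visitor_margin -- actual visitor points minus home points
    """
    if visitor_margin > visitor_line:
        return "win"
    if visitor_margin < visitor_line:
        return "lose"
    return "push"  # margin landed exactly on the line


# The cases described above.
print(settle_visitor_bet(-7, -6))    # lost by six against a -7 line
print(settle_visitor_bet(-7, -8))    # lost by eight
print(settle_visitor_bet(-7, -7))    # lost by exactly seven
print(settle_visitor_bet(5.5, 6))    # won by six as a 5.5-point favorite
print(settle_visitor_bet(5.5, 5))    # won by only five
```

Because game margins are whole numbers, a half-point line can never return "push".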

Okay, so these are sort of forecasts for games.

And so when you look at forecasts, you want to evaluate them on two quantities.

Bias, now bias often means prejudice, and

that's been a lot in the news lately, okay.

But here, bias means: on the average, are the forecasts right?

Are they not too high or too low?

And this is a good thing to check in your business, whether your forecasts on the average

are not too high and not too low.

In other words, if you look at the error, which is the actual amount

the visiting team won by

minus the predicted win margin by Vegas,

you would like that average to be zero. If the average is significantly positive,

it means the visiting team did better than expected.

If the average is significantly negative,

the visiting team did significantly worse than expected.

So here we computed the error.

The visiting team did a point worse than expected.

Here they did ten points worse.

We've also tabulated whether the visiting team covered the bet.

Basically,

if the visiting outcome is greater than the visiting prediction,

Okay then yes, they covered the bet.

If the visiting team outcome was worse than the visiting prediction,

no they didn't cover the bet.

Otherwise it's a push.

In other words, here the visiting team was a minus three point favorite, or

a three point underdog.

And that's exactly what happened.

So no money would change hands on that.

Vegas would be very, very disappointed.

Okay, so bias asks whether the average forecast is too high or too low.

Accuracy is really the standard deviation of the forecast errors.

That's how we measure it.

And when we did the NFL simulation

analysis a couple of videos ago, I forgot to mention that the standard deviation

on the NFL games is around 14 points, and you'll see that here.

Okay, so let's check for bias.

Let's average those errors.

Hopefully it's near zero, not significantly positive or negative.

And we can talk in a later video about

whether the bias is significant or not.

It turns out it's not.

Okay, if there was a bias, a bettor could exploit it.

If you average these errors,

it's about 0.03 I think or 0.04.

That means the average visiting team here did .04 points better than

the bookies predicted them to be, which doesn't seem significant, and it's not.

Now, what's a typical standard deviation [INAUDIBLE]?

The standard deviation of the forecast errors is

13 and a half points, just around the 14 that we said it would be.
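The bias and accuracy checks above amount to a mean and a standard deviation of the error column. Here is a minimal sketch under stated assumptions: the five games are made-up numbers for illustration only (not the lecture's data), the function name is hypothetical, and the sample standard deviation is used, which may or may not be the variant used in the spreadsheet.

```python
from statistics import mean, stdev


def forecast_errors(predictions, outcomes):
    """Error = actual visitor margin minus predicted visitor margin."""
    return [actual - predicted for predicted, actual in zip(predictions, outcomes)]


# Hypothetical lines and results, for illustration only.
visitor_prediction = [-7.0, 3.0, -3.0, 5.5, -1.0]
visitor_outcome = [-10.0, 2.0, -3.0, 14.0, 6.0]

errors = forecast_errors(visitor_prediction, visitor_outcome)
bias = mean(errors)        # near zero means the lines are unbiased
accuracy = stdev(errors)   # typical size of a forecast miss, in points

print(errors)
print(f"bias: {bias:.2f}, standard deviation: {accuracy:.2f}")
```

With only five invented games the numbers mean nothing by themselves; on the lecture's roughly 2,500 games the same two statistics come out at about 0.04 and 13.5 points.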

And are these errors normally distributed?

We'd like to think so.

It's a little harder in football, because, you know, most scores in football cluster

around three points or seven points.

So you're not going to see quite the random distribution of the errors that you might elsewhere.

I mean, in basketball I'm sure forecast errors about the points are normally

distributed, and in football they pretty much are too.

So how do you tell something's normally distributed?

Well, we talked a bit about the normal random variable in an earlier video.

You need two things, sort of, for something to be normal.

You need it to be symmetric, and

there's a measure called skewness that measures the symmetry of a data set

about its central value.

If you have a normal distribution, the mean, median, and the mode

should all be about the same, and

the skewness should be between plus one and minus one.

And I've named these ranges.

There's a SKEW function in Excel, so is this between plus one and minus one?

So for the error column, what's the skewness? It should be near zero.

It's minus .05.

There's great symmetry here.

There's almost no lack of symmetry.

Skewness greater than plus one or

less than minus one indicates a lack of symmetry.

Now, the first thing you need in a normal variable is symmetry, so you've got to check

that your data is symmetric about its mean, with skewness close to zero.

The other thing is, does the density

of your data, if you were to plot a frequency distribution

of your data, drop off like the normal curve?

And there's a function for that called kurtosis.

That sounds like a fatal disease.

He's got kurtosis.

Patrick Dempsey has three months to live.

Well he's off Grey's Anatomy.

Check the tabloids if you want to know why.

Okay, now kurtosis again measures how your histogram

drops off compared to the normal, and kurtosis should be near zero for a normal.

So let's say we want kurtosis to be between minus one and plus one,

then we can make this more precise [INAUDIBLE] normal.

So is it [INAUDIBLE] here between minus one and plus one?

So the function is [INAUDIBLE].

And it's very close to zero.

So these forecast errors really do seem to be normally distributed.
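The two rough normality checks above can be sketched outside a spreadsheet. The formulas below are the bias-adjusted sample skewness and sample excess kurtosis; to the best of my understanding these are what Excel's skewness and kurtosis functions compute, but treat that correspondence as an assumption. The function names and the small data set are illustrative only.

```python
from math import sqrt


def _mean_and_sd(xs):
    n = len(xs)
    m = sum(xs) / n
    s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))  # sample standard deviation
    return n, m, s


def sample_skewness(xs):
    """Bias-adjusted skewness; about 0 for symmetric data. Needs at least 3 points."""
    n, m, s = _mean_and_sd(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis; about 0 for normal data. Needs at least 4 points."""
    n, m, s = _mean_and_sd(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def looks_roughly_normal(xs, limit=1.0):
    """The lecture's rule of thumb: both measures between -1 and +1."""
    return abs(sample_skewness(xs)) < limit and abs(sample_excess_kurtosis(xs)) < limit


sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]   # illustrative data, not the NFL errors
print(round(sample_skewness(sample), 4), round(sample_excess_kurtosis(sample), 4))
print(looks_roughly_normal(sample))
```

Passing both checks is only a rule of thumb for "close enough to normal", which is how the lecture uses it.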

And we'll use that assumption in the next videos, where we have a point spread and

we figure out the chance of a team winning a game.

We will assume the mean outcome of the game is the point spread,

and the actual margin of the game

is normally distributed with the standard deviation we just saw.

And we'll be using a normal random variable just to make probabilities of

winning the game.
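As a preview of that calculation, here is a minimal sketch under the assumptions just stated: the final margin is treated as normal with mean equal to the point spread and a standard deviation of about 14 points. The function name is hypothetical, and ties are ignored because the normal distribution is continuous.

```python
from statistics import NormalDist


def visitor_win_probability(visitor_line: float, sd: float = 14.0) -> float:
    """Chance the visiting team wins outright, given its predicted margin.

    Assumes the actual margin is normal with mean visitor_line and
    standard deviation sd (about 14 points for NFL games).
    """
    margin = NormalDist(mu=visitor_line, sigma=sd)
    return 1.0 - margin.cdf(0.0)   # probability the margin is above zero


for line in (-7, -3, 0, 3, 7):
    print(line, round(visitor_win_probability(line), 3))
```

A pick'em game (line of 0) comes out at 50%, and a seven-point favorite is half a standard deviation above zero, which works out to roughly a 69% chance of winning.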

And then we can talk about the money line and show that the money line and

the point spread are consistent if you assume that the forecast errors

are normally distributed.

Okay, but there is one sort of,

perhaps, law that people have believed in over the years. I believe

Steven Levitt, who wrote the Freakonomics book, wrote an article on this.

And the book briefly talks about that.

Home underdogs seem to do pretty well

if they're big underdogs.

So let's try this out.

So a team would be a big home underdog if the visiting team, let's say, is favored

by eight points or more.

And let's take a look: do those visiting teams do worse than expected?

If the visiting team is a big favorite, okay,

I'm saying they'll do worse than expected and they won't cover the spread as much.

So let's check that, how would you check that out?

Okay, so let's figure out how many games fit this category,

the average error, which we'd expect to be negative if they don't do as well,

and let's figure out the wins and losses against the point spread.

We could do the pushes but we don't care.

Okay.

So how many games are there where the visiting team is favored by at

least eight points?

>> Meaning the prediction column is at least eight,

so I do a COUNTIF on the prediction column.

And I've put in quotes, greater than or equal to eight.

In 121 games, the visiting team was favored by at least eight points.

Not very often.

It's because the home team gets three points to begin with.

Okay, so now what's the average of the error column?

We probably should use the Function Wizard here so we don't have to think too hard.

So I could do an AVERAGEIF.

In other words, if the point spread's greater than or equal to eight,

average the error column.

So I go to AVERAGEIF; see, we use these functions all the time.

Okay, so F3. If the visiting prediction

is greater than or equal to eight, let's average the error column.

We expect this to be negative, if I do this right. Oh, and it is.

And now is that significant?

If a visiting team is favored by eight points or

more, they underperform the spread by about one and a half points.

Now what's the record?

We'll see if this is significant in a later video.

Okay.

Now, how many wins against the spread?

We could do a COUNTIFS for that.

Should pop up in a minute.

Okay, so the first criterion would be the visitor prediction is greater than or equal to eight.

And the second criterion,

you don't need the quotes in there when you use the function wizard, is

that the visitor outcome is a yes for covering.

Okay, we must have done something wrong there.

Okay, the visitor outcome if the visiting prediction is greater than or equal to eight.

Let's try it [INAUDIBLE]. Let's try it without the function wizard.

We put COUNTIFS, okay.

The first range is the visitor prediction,

greater than or equal to eight.

And the next column would be the visitor, or it should be visitor cover.

I'm sorry, that was my mistake.

The visitor cover should be a yes.

And they cover 52 times, and now if I do a COUNTIFS,

There's two criteria here.

Okay, the visitor prediction greater than or equal to eight.

And then we want the visitor cover being a no.

I don't really care about the pushes.

And there weren't any pushes, because they're all wins or losses.

So in other words, the visiting favorites,

visiting teams favored by at least eight points,

covered the spread in 51 of 121 games.

Now the question is, is that significantly less than expected?

It's less than 50%, it's 42%, and so

there we have to know something about hypothesis testing to test that.

We'll come back to that in a couple of videos, okay.

In other words, is -1.46 points

a significant underperformance by the big visiting favorites?

And is the 42% against the spread significantly less

than 50% against the spread, which you would get, probably, by a coin toss?

So we'll see that in a couple of videos. This is the type of stuff

you need to look at when you're interested in sports.
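The conditional counting and averaging steps used for the big-favorite check can be sketched without a spreadsheet. The six rows below are made-up values for illustration only (not the lecture's data), and the tuple layout is an assumption: visitor prediction, error, and whether the visitor covered.

```python
from statistics import mean

# Hypothetical rows: (visitor_prediction, error, visitor_cover), illustration only.
rows = [
    (8.0, -3.0, "no"),
    (10.5, 4.5, "yes"),
    (9.0, -12.0, "no"),
    (3.0, 1.0, "yes"),
    (-7.0, 2.0, "yes"),
    (8.5, 0.5, "yes"),
]

# Games where the visiting team is favored by at least eight points.
big_favorites = [row for row in rows if row[0] >= 8]

n_games = len(big_favorites)                                      # like COUNTIF
avg_error = mean(err for _, err, _ in big_favorites)              # like AVERAGEIF
covers = sum(1 for _, _, cover in big_favorites if cover == "yes")  # like COUNTIFS
misses = sum(1 for _, _, cover in big_favorites if cover == "no")
cover_rate = covers / (covers + misses)                           # pushes excluded

print(n_games, avg_error, covers, misses, cover_rate)
```

With the lecture's own counts, 51 covers in 121 games is 51 / 121, or about 42%. Whether that is significantly below 50% is the hypothesis-testing question left for the later videos.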