So, here we can see that we have a sample size of 148,654.

So, it is a huge sample, one that we've been collecting for three years.

We then see our standard deviation of $50,517, and this is in units of dollars,

which means that on average a person will fall roughly $50,000 above or below the mean value.

Then we have our actual mean of

$74,768, and let's compare this to the median, which is $71,427.

So, we can see that the mean is about $3,000 more than the median and again,

that's because of our right skew that we saw.
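
To make the skew point concrete: in the summary above, the gap is 74,768 minus 71,427, which is 3,341 dollars. The sketch below uses a small made-up list of salaries (an assumption for illustration, not the San Francisco data) to show how one large value on the right pulls the mean above the median.

```python
from statistics import mean, median

# Hypothetical salaries in dollars (NOT the San Francisco data):
# most values sit close together, with one very large value on the right.
salaries = [40_000, 55_000, 60_000, 70_000, 75_000, 80_000, 300_000]

avg = mean(salaries)    # pulled upward by the 300,000 value
mid = median(salaries)  # middle value of the sorted list

print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,.0f}")
print("mean > median:", avg > mid)
```

With a right-skewed list like this one, the mean lands well above the median, which is the same pattern we see in the salary summary.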

We then have our maximum.

So, the greatest value in our data was $567,595.

So, somebody's making a lot of money in San Francisco.

We then can look at our quartiles.

So, with the third and the first quartile, again,

we could compute the IQR between these two to get another measure of our spread.

Finally, we have the minimum, which is negative 618.1, and

this is a little perplexing because somebody should not have to pay their employer money.

So, this is most likely due to an input error, or maybe

there is some sort of tax regulation in San Francisco behind it.

There's something underlying this minimum value, and I'm not quite sure what it is.

But that's always something that comes about when you're doing these numerical summaries:

you want to raise questions of why something might be the case.
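
One way to follow up on a question like this is to pull out the rows that look impossible and inspect them. This is only a sketch: the list below is made up for illustration, and the only value taken from the summary is the minimum of negative 618.1.

```python
# Hypothetical total-pay values in dollars. Only -618.1 comes from the
# lecture's summary; every other number is invented for illustration.
total_pay = [52_300.0, 64_000.0, -618.1, 98_450.5, 0.0]

# Collect (position, value) pairs for anything below zero,
# since a negative pay amount is worth a closer look.
suspicious = [(i, pay) for i, pay in enumerate(total_pay) if pay < 0]

print("minimum:", min(total_pay))
print("negative entries:", suspicious)
```

Flagging the rows does not tell us why they are negative. It just gives us something specific to go and ask about.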

For our final example,

we were looking at exam scores.

So, here we saw a left-skewed distribution.

We said it was centered at about 80 points and had a spread from

about 15 to 100, and we said that there would be many outliers below 50.

So, now that we have a left-skewed distribution,

what we would expect to see is that the mean will be less than the median.

So, let's see if our numerical summaries line up with that guess.

So, here we have our numerical summaries for the exam scores.

It looks a little bit different than the last two, again

just because the output of whatever software you're using

will look a little bit different.

So, we again have our five-number summary of the min,

the 25th percentile Q1, the median,

the 75th percentile Q3, and the maximum, and then on top of that we have the mean,

the standard deviation, and n, the sample size.
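
As a rough sketch of how a table like this could be produced, here is one way to do it with Python's standard `statistics` module. The nine scores are made up, chosen only so that the five-number summary matches the one on the slide; the mean and standard deviation will not match the lecture's. Quartile conventions also differ between software packages, so the `method="inclusive"` choice here is an assumption.

```python
from statistics import mean, quantiles, stdev

# Hypothetical exam scores (NOT the lecture's data).
scores = [78, 100, 55, 87, 14, 72, 90, 68, 80]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
# Other software may use a different quartile rule and give
# slightly different values on the same data.
q1, med, q3 = quantiles(scores, n=4, method="inclusive")

summary = {
    "min": min(scores),
    "Q1": q1,
    "median": med,
    "Q3": q3,
    "max": max(scores),
    "mean": mean(scores),
    "sd": stdev(scores),   # sample standard deviation
    "n": len(scores),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```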

So, for this one,

we see our median is 78 and our mean is 76.3.

So, our guess on the previous slide that the mean would be less than the median

was correct and that is all because of that left skew that we had.

We have outliers on the lower end, and so they're pulling the mean towards them.

The median is what we call a robust estimate of the center,

meaning it's not strongly influenced by outliers.
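
To see what robust means in practice, the sketch below adds a single very low score to a small made-up list of exam scores (an assumption for illustration, not the lecture's data) and compares how far the mean and the median move.

```python
from statistics import mean, median

# Hypothetical exam scores (NOT the lecture's data).
scores = [70, 75, 78, 80, 85, 88, 92]
with_outlier = scores + [5]  # add one very low score

mean_shift = mean(scores) - mean(with_outlier)
median_shift = median(scores) - median(with_outlier)

print(f"mean:   {mean(scores):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(scores)} -> {median(with_outlier)}")
print(f"mean dropped by {mean_shift:.2f}, median by {median_shift}")
```

In this example the one low score moves the mean by several points, while the median barely changes.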

We then have a standard deviation of 14.4, meaning that on average

a user's score for this exam was about 14.4 points away from the mean, and again,

we could calculate the IQR as Q3 minus Q1.

So, for this one,

we take Q3 minus Q1 to get the IQR.

We would end up doing 87 minus 68, and we'll get 19 as our IQR.

So typically, when it's left skewed or right skewed,

we want to include an IQR estimate because it's

a better way of letting the reader know where exactly our data is falling.

The range is less robust to outliers.

So, our range here would be from

14 to 100, and you're not getting a good idea of where most of the data falls,

whereas the IQR does tell us where the middle 50% of the data is.
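
The arithmetic behind that comparison can be written out using the values from the exam-score summary (minimum 14, Q1 68, Q3 87, maximum 100). The variable names are just illustrative.

```python
# Values read from the exam-score summary in the lecture.
minimum, q1, q3, maximum = 14, 68, 87, 100

data_range = maximum - minimum  # depends only on the two most extreme scores
iqr = q3 - q1                   # width of the middle 50% of the scores

print("range:", data_range)
print("IQR:  ", iqr)
```

The range is stretched by the lowest score, while the IQR ignores the extremes on both ends.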

To summarize: numerical summaries,

which we can also call summary statistics,

are something we like to use alongside our graphical representations,

so, things like histograms or box plots,

which give a first impression of what our data looks like.

So, our graphical representations are usually fairly rough, and

these numerical summaries on top of that allow for a lot more in-depth analysis.

Depending on what software you use,

you'll typically have slightly different numerical summaries.

So, some might have only the five-number summary,

whereas others will have a standard deviation, mean, and sample size on top of that.