0:00

In this video,

we will discuss methods of quantifying centers of numerical distributions,

building on the previous concepts of visualizing numerical variables.

Previously, we discussed shapes of numerical distributions.

We categorized distributions into three types in terms of skewness:

left skewed, symmetric, and right skewed.

And in terms of modality, we talked about variables that have a unimodal,

bimodal, uniform, or multimodal distribution.

0:27

Another key characteristic that is of interest is the center of

the distribution. Commonly used measures of center are the mean,

which is simply the arithmetic average;

the median, which is the midpoint of the distribution, or in other words

the 50th percentile; and the mode, which is the most frequent observation.

If these measurements are calculated from a sample,

they're called sample statistics.

Sample statistics are point estimates for the unknown population parameters.

They're unknown, since it's usually not feasible to have information

on all observations in the population.

These estimates may not be perfect, but if the sample is good,

meaning representative of the population, they're usually good guesses.

We usually use letters from the Latin alphabet when denoting sample statistics,

and letters from the Greek alphabet when denoting population parameters.

For example, the sample mean is x bar and the population mean is mu.
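
Written out, the two means share the same form and differ only in what they average over. The symbols n (sample size), N (population size), and x_i (the i-th observation) are notation assumed here for illustration; the video itself only names x bar and mu.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```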

1:25

Let's give a quick example with some simulated data.

Suppose we have exam scores from 9 students.

The mean of this distribution is simply the arithmetic average of these scores.

The mode is the most frequently observed value.

In this case, we have two students who scored 88, so the mode is 88.

However, we can see that with continuous distributions,

it may be very unlikely to observe the same exact value multiple times.

Therefore, the mode of a distribution is not always a very useful measure.
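
As a minimal sketch of the mean and mode calculations described above, the Python snippet below uses the standard-library statistics module. The nine scores are hypothetical: the actual values shown on the slide are not in this transcript, so these were made up to be consistent with the stated mode of 88.

```python
from statistics import mean, multimode

# Hypothetical exam scores for 9 students (NOT the values from the slide);
# chosen so that exactly two students scored 88.
scores = [88, 72, 93, 85, 78, 96, 87, 81, 88]

sample_mean = mean(scores)   # arithmetic average: sum of scores / number of scores
modes = multimode(scores)    # list of every value tied for most frequent

print(f"mean = {sample_mean:.2f}")
print(f"mode(s) = {modes}")
```

If every value in a data set is distinct, as is likely with continuous measurements, multimode returns all of them, which illustrates why the mode is often not a useful summary in that setting.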

The median is defined as the midpoint or the 50th percentile of the distribution.

In order to calculate the median,

we need to first sort the data in increasing order.

Then we find the midpoint of the ordered data, which in this case happens to be 87.

But what if we didn't have an exact midpoint of the distribution?

Say we have one more student who scored 100.

Now the sample size is 10 and with an even number of observations

there isn't a single value that divides the data in half.

In these cases,

the median is defined as the average of the two middle observations.

Here, we have 87 and 88 at the middle of our distribution so

the median based on this new data set would be 87.5.
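
The sort-then-pick-the-middle procedure can be sketched in Python as follows. The helper name median_by_hand and the nine scores are hypothetical; the scores are not the ones from the slide, only values made up to reproduce the medians stated above (87 with nine students, 87.5 once a score of 100 is added).

```python
from statistics import median


def median_by_hand(values):
    """Return the median by sorting and taking the middle of the ordered data."""
    ordered = sorted(values)      # step 1: sort in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value exists.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical exam scores (NOT the values from the slide).
scores = [88, 72, 93, 85, 78, 96, 87, 81, 88]
scores_with_extra = scores + [100]   # one more student who scored 100

print(median_by_hand(scores))             # 9 observations
print(median_by_hand(scores_with_extra))  # 10 observations

# The standard-library function agrees with the by-hand version.
print(median(scores), median(scores_with_extra))
```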

Learning how these values are calculated by hand can also be important for

understanding the concepts, but

we should note that calculations like these are rarely done by hand.

Instead, we often rely on computation, which makes life much easier when

working with data that has a large number of observations.

3:01

For example, let's revisit the life expectancy and income per person data.

We established before that the distribution of average life

expectancies is left skewed.

The mean is 70.51 and the median is 73.34.

The mean, indicated by the pink solid line on the plot,

is lower than the median, indicated by the orange dashed line.

This is expected based on the shape of this distribution.

Since there's a long tail to the left, the arithmetic average is being

pulled to the lower end by the observations in the lower tail.

On the other hand, in a right-skewed distribution,

like the distribution of average income per person for each country,

the mean is roughly $12,500 while the median is only $7,000.

The mean is much higher than the median because this time the longer tail is

on the right and the few countries with the very high income levels

compared to the others pull the mean up.

3:58

So to recap, in the left skewed distribution,

the mean is generally smaller than the median,

since the few low-valued observations pull the average down. In symmetric

distributions, the mean and the median are roughly equal to each other.

And in right skewed distributions, the mean is generally higher than the median

since the few high-valued observations pull the average up.
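
The recap can be checked on simulated data with a short Python sketch. The data below is randomly generated for illustration and is not the life expectancy or income data from the video; the choice of an exponential distribution for the right-skewed case and a normal distribution for the symmetric case is an assumption made here.

```python
import random
from statistics import mean, median

random.seed(1)  # fixed seed so the simulation is repeatable

# Right skewed: exponential draws have a long tail of large values.
right_skewed = [random.expovariate(1.0) for _ in range(10_000)]

# Left skewed: flipping the sign moves the long tail to the low end.
left_skewed = [-x for x in right_skewed]

# Symmetric: draws from a normal distribution centered at 0.
symmetric = [random.gauss(0, 1) for _ in range(10_000)]

for name, data in [
    ("left skewed", left_skewed),
    ("symmetric", symmetric),
    ("right skewed", right_skewed),
]:
    print(f"{name:>12}: mean = {mean(data):.3f}, median = {median(data):.3f}")
```

In this simulation the mean falls below the median for the left-skewed data, above it for the right-skewed data, and close to it for the symmetric data.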