A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 3B: Sampling Variability and Confidence Intervals

The concepts from the previous module (3A) will be extended to create 95% CIs for group comparison measures (mean differences, risk differences, etc.) based on the results from a single study.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So confidence intervals for single population

summary measures, like a single population mean or proportion,

can help give a range of possible values for some underlying truth.

But one of the ways in which these intervals become

extremely useful is when comparing populations on some outcome of interest.

So we've already seen how to estimate such comparisons.

Things like mean differences, differences in proportions, relative risks, et cetera.

So in this lecture we will actually

work on putting confidence limits on these measures.

And this will allow us to actually look at a range of possible values for the

difference between the populations we're comparing and also

ascertain whether the difference is real or not,

after accounting for the uncertainty of our estimates.

Okay, so in this next set of lectures

we're going to be considering how to estimate confidence

intervals for two population comparison measures or measures of association.

Things like a difference in means between two populations,

or a difference in proportions, a relative risk,

an odds ratio, or an incidence rate ratio.

Those are examples of such measures.

So we're just going to give an overview in this section to get us

started, and to set us up for the specifics that

we'll deal with in subsequent sections of this lecture set.

So upon completion of this first lecture section, you will be able to extend

the concept of sampling distributions to include

measures of association that compare two populations.

Extend the principles of confidence

interval estimation from single population quantities

to measures of association comparing two populations.

Appreciate that the confidence interval computations for ratios

need to be done on the natural

log scale, with the results then

transformed back to the ratio scale for presentation.

Explain the concept of the null value, the value meaning no

association for such measures of association and what its absence or

presence in a confidence interval signifies.

So just to remind you of something that we've talked

about before: frequently in public health, medicine, and science,

researchers or practitioners are interested in comparing

outcomes between two or more populations using data

collected on samples from these populations.

And such comparisons can be used to investigate

questions such as how do salaries differ between

males and females?

Males and females being two populations of interest.

How do cholesterol levels differ across weight groups?

So we might have four different weight groups.

We might have samples from the four different

populations, each constituting all persons in that weight group.

How does AZT impact the transmission of HIV from mother to child?

So we might compare the population of HIV positive

mothers given AZT to the population

not given AZT, based on a sample from each.

Or how is a drug associated with survival among patients with a disease?

We'd compare those on the drug to those who get some sort of placebo.

So it's not only important to estimate the magnitude of the

difference in the outcome of interest, which we've done extensively thus far,

but also to recognize the uncertainty in this

estimate when making conclusions about the populations under study.

The summary measures developed thus far, the things we

mentioned in the beginning, are all sample-based and hence sample statistics.

And these are subject to sampling error

just like single sample summary statistics.

So one approach to quantifying the uncertainty in our estimates

is to create confidence intervals.

But let's talk about the types of studies we'll be looking at

in this course, and that are commonly done in public health.

First, we'll just talk about types

of two-group comparisons for continuous outcomes.

One, we've actually seen an example of already.

And we'll define it and give some more examples of

it in section B, but it's what's called a paired study,

where we're ostensibly comparing two

populations through two samples, but the two samples, and hence

the two populations we're comparing, have some sort of linkage.

So for each person or observation in population one, there

is a corresponding observation in population two.

And hence our samples are constructed in the same way.

For each person or observation in the first

sample, there's a corresponding observation in the second sample.

There's another type of study, however, that we haven't encountered yet,

except to summarize. We haven't encountered

it in terms of doing confidence limits, and

we'll look at the mechanics of creating a confidence

interval for a mean difference in the unpaired situation.

And this is

where we have two populations from which

two samples were taken, or to which subjects were assigned.

And there's no linkage between the two populations.

So we might have patients who are randomly assigned to receive a

treatment, ostensibly representing the population of

all such patients given the treatment.

And we might have another

group randomly assigned to be in a control group, ostensibly

representing the population of all such patients given a control.

And there's no correspondence between each observation in this treatment

group and any one person or observation in the control group.

There's no inherent linkage.

So these groups are functionally independent.

And we'll be able to create a mean

that summarizes the experience in each group and look

at their difference

to quantify the difference in the average outcome.

But when it comes to estimating the

standard error and dealing with the uncertainty, we're

going to have to do things a little bit

differently than when we had a paired situation.

We'll do the same sort of unpaired comparisons for binary outcomes.

We're not going to do paired comparisons. They exist but they're rarely used.

So we're going to focus on unpaired. So again,

this is where we have two samples from two populations that we want to compare

and there's no link between the two samples and hence the two populations.

And the same thing with time-to-event outcomes.

We will look at the unpaired situation.

So how are we going to apply the central limit theorem, and figure out how to

get confidence limits on measures of association

that compare two populations through two samples?

Well, for things that are quantified as

differences, such as the mean difference between two groups

or a difference in proportions between two groups,

we can actually extend the basic principles of the central limit theorem

to understand and quantify the sampling

variability of these two-sample differences.

So it turns out that the difference

of two quantities whose distributions are normal

itself has a normal distribution.

So we've learned that with relatively large samples, the

distribution of a sample mean among all possible random

samples of the same size,

say samples of size n1 from the first population,

is approximately normal.

This theoretical sampling distribution is

centered around the true mean of that population. If we

have another sample of size n2 from another population, and it doesn't

have to be the same size as the first sample,

a similar result holds.

And suppose we

did a study over and over again. Each time we took

an independent sample of size n1 from the first population, computed a mean on

it, and then took an independent sample of size n2 from the second population,

and then we looked at the mean difference.

Suppose we did this study over and over again, so the second time we took

another sample of size n1 from the first population, and another of size n2 from the second,

and we got many different

estimates of the mean difference in the outcome between these two populations, based

on comparing samples of size n1 and n2.

If we looked at the distribution of the estimated

mean differences across the different iterations of our study,

we'd find that this too is normally distributed and centered at the

true mean difference.
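
To make the repeated-study idea concrete, here is a minimal Python sketch that simulates doing the same two-sample study many times and collects the estimated mean differences. Everything numeric in it is hypothetical: the population means and standard deviations, the sample sizes, and the choice to draw from normal populations are assumptions made for illustration, not values from the lecture.

```python
import random
import statistics

def simulate_mean_differences(mu1, sd1, n1, mu2, sd2, n2, n_studies, seed=0):
    """Repeat a two-sample study n_studies times and return each study's mean difference."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_studies):
        # Independent samples from the two (hypothetical) populations.
        sample1 = [rng.gauss(mu1, sd1) for _ in range(n1)]
        sample2 = [rng.gauss(mu2, sd2) for _ in range(n2)]
        diffs.append(statistics.mean(sample1) - statistics.mean(sample2))
    return diffs

# Hypothetical populations with true means 120 and 125, so the true difference is -5.
diffs = simulate_mean_differences(120, 15, 50, 125, 15, 80, n_studies=2000)
center = statistics.mean(diffs)   # should land close to the true difference of -5
spread = statistics.stdev(diffs)  # variability of the estimate across studies
```

A histogram of `diffs` should look roughly bell-shaped and be centered near the true difference, which is the behavior the lecture describes.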

And the same sort of logic applies if we were

instead summarizing binary data from samples of size, say, m1 and m2.

The difference in proportions across multiple iterations

of the same study would also be normally

distributed around the true underlying population level difference

in proportions between the two populations we're comparing.

So this is really handy.

This means, ultimately, if we want to create a 95% confidence interval

for a population mean difference based on a

single study where we have one sample of size n1 from

the first population, and another

sample of size n2 from the second population,

we can do this using the same old logic, where we add and subtract two standard

errors, or estimated standard errors, of this difference in sample means.

And I'm going to show how to estimate this

using the results from two samples from said populations.

The theoretical standard error, just to give you a head start:

what the central limit theorem tells us the

true standard error is, and this will look very familiar, is

the square root of a sum: the true variability of individual values in the first

population squared, divided by the size of the sample from that population,

plus the true variability of the individual measures in the

second population squared, divided by the size of the sample

we took from that second population. These pieces look somewhat familiar.

Think about that and we'll show how to actually estimate these parts. I

don't think it'll be a surprise how to estimate them.

And we'll show how to do this, and create

a confidence interval for the true mean difference.
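
As a sketch of how this might look in practice, the Python function below estimates the standard error of an unpaired mean difference by plugging each sample's standard deviation into the formula just described, then adds and subtracts two estimated standard errors. The function name, the plug-in of sample standard deviations for the true ones, and the toy data are assumptions for illustration. The toy samples are far too small for the large-sample logic to apply and are only there to show the arithmetic.

```python
import math
import statistics

def mean_difference_ci(sample1, sample2, multiplier=2.0):
    """Return (difference, estimated SE, (lower, upper)) for an unpaired mean difference."""
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    # Estimated SE: sqrt(s1^2 / n1 + s2^2 / n2), using sample variances
    # in place of the unknown population variances.
    se = math.sqrt(statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)

# Hypothetical toy data for two independent groups.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10, 12]
diff, se, (lower, upper) = mean_difference_ci(group1, group2)
```

The `multiplier` argument defaults to 2, matching the "two standard errors" rule of thumb for a 95% interval.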

When we do proportions, things will look similar.

We'll take our observed difference in proportions and

add and subtract two estimated standard errors of the difference in proportions,

which we'll again be able to estimate from the two samples we have.

Just for a heads up, the theoretical standard error,

the true standard error, which we can't observe, just like

we can't observe the true proportions, for the difference in proportions from samples

of size n1 and n2 is

this.

And I imagine at least the pieces of this look familiar: the square root of

p1 times 1 minus p1 over n1, plus p2 times 1 minus

p2 over n2.
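
The same recipe for a difference in proportions can be sketched in Python as follows. The function name and the counts are hypothetical, and the sample proportions are plugged in for the unknown true proportions, which is an assumption about how the estimate would be formed.

```python
import math

def proportion_difference_ci(events1, n1, events2, n2, multiplier=2.0):
    """Return (difference, estimated SE, (lower, upper)) for a difference in proportions."""
    p1 = events1 / n1
    p2 = events2 / n2
    diff = p1 - p2
    # Estimated SE: sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), using sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)

# Hypothetical counts: 20 of 100 with the outcome in group 1, 40 of 100 in group 2.
diff, se, (lower, upper) = proportion_difference_ci(20, 100, 40, 100)
```

Note that the two pieces under the square root are added, one per group, even though the estimate itself is a difference.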

I want you to notice though, in both cases where

we're comparing independent groups, and we'll talk about this

in detail in the lecture sections, the uncertainty in the difference

in our sample estimates is an additive function of the uncertainty

in each piece, the proportion in each group or the mean in each group.

So this is really nice. This extension of the

Central Limit Theorem is natural.

It gives us an easy way to estimate the

uncertainty, and the interpretation of the confidence interval

is conceptually exactly the same as when we were doing confidence intervals

for single summary measures like a single mean or proportion.

This means that for most of the studies we could do, roughly 95% of the studies,

that is, for 95% of the combinations of samples we could get just by

chance from the first population and the second, if we were to employ this method,

taking our estimated difference, either in means or

proportions, and adding and subtracting two estimated standard errors,

this interval would include the truth that we're trying to estimate, and

5% of the time, it would miss it.
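
One way to see this coverage claim in action is a small simulation. The Python sketch below repeats a hypothetical two-group study many times, builds the "estimate plus or minus two estimated standard errors" interval each time, and counts how often the interval contains the true mean difference. The population values, sample sizes, number of repetitions, and the use of normal populations are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def estimate_coverage(mu1, mu2, sd, n1, n2, n_studies=2000, seed=1):
    """Fraction of simulated studies whose +/- 2 SE interval contains the true difference."""
    rng = random.Random(seed)
    true_diff = mu1 - mu2
    hits = 0
    for _ in range(n_studies):
        s1 = [rng.gauss(mu1, sd) for _ in range(n1)]
        s2 = [rng.gauss(mu2, sd) for _ in range(n2)]
        diff = statistics.mean(s1) - statistics.mean(s2)
        se = math.sqrt(statistics.variance(s1) / n1 + statistics.variance(s2) / n2)
        if diff - 2 * se <= true_diff <= diff + 2 * se:
            hits += 1
    return hits / n_studies

# Hypothetical populations with a true mean difference of -4.
coverage = estimate_coverage(mu1=130, mu2=134, sd=12, n1=100, n2=100)
```

With samples this large, `coverage` should come out close to 0.95, though the exact value depends on the random seed.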

So again, sampling distributions of

differences are roughly normally distributed

and centered at the true difference for large samples, and there are

corrections we can make for small samples, which the computer can

handle. The important thing is that the concept is exactly the same.

Ratios are a bit different.

But once we get over a little hurdle,

and we've already talked about some of the quirks

of ratios and their scaling, which are going to

play into how we compute the confidence intervals

as well,

once we get over that minor quirk, they're

relatively easy to handle in terms of sampling distribution.

So just to remind you, the thing with ratios is

that ratios have to be positive the way we've defined them,

because we're comparing positive quantities,

these being either probabilities or risks in two

groups, or incidence rates, both of which are positive.

So the ratios

will always be 0 or greater.

And so the range of possible values for

a ratio is between zero and positive infinity,

theoretically.

So, ratios can't be negative.

But we've seen that when we're comparing two groups, say we're

comparing group one to group two by a ratio, if the group on

top has a lower value of the outcome measure,

whether it be a proportion or incidence rate, than the

group on the bottom, the range of possible values for

that association on the ratio scale is between zero and one.

If the group on top has a greater value than the group on the bottom, the range of

possible values for that association on the ratio scale is

one all the way up to, theoretically,

positive infinity.

So we've seen that there's an imbalance in the ranges: it's a much more compressed

range for associations where the first sample has a smaller value than the second.

So it turns out when we take things to the

log scale, if we take the natural log of values

between zero and one, the theoretical natural log

of zero is all the way out at negative infinity,

and the natural log of one is zero. So we take something

that on the original scale is tightly constrained between zero and one, and on the

log scale it can range over basically the entire number line below zero.

If we do the same thing for those associations in which the first

group has a larger value than the second group, and we map this to the log scale,

then one becomes zero, and the log of infinity is still infinity.

So the range of possible values for a log ratio where the first

group has a larger value than the second is zero to positive infinity.

By taking things to the log scale, we've made equal the

ranges for the two types of association.

Additionally, if you think about it, on the

natural log scale, ratios are expressed as differences,

so for example, if I have something like this, p1 hat over p2 hat,

and I take the log of that,

this is equivalent to taking the log

of the first thing minus the log of the second thing.

And it turns out these differences, as we showed before, have

an equal range of possibilities when the first

is smaller than the second versus when the second is smaller than the first.
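
A quick numerical check of these two points, as a small Python sketch. The proportions used are hypothetical values chosen only to illustrate the arithmetic.

```python
import math

# A ratio of 0.5 (first group has half the risk) and a ratio of 2 (first group
# has double the risk) are equally far from "no difference" on the log scale.
log_half = math.log(0.5)
log_double = math.log(2.0)

# The log of a ratio equals the difference of the logs.
p1_hat, p2_hat = 0.10, 0.25  # hypothetical sample proportions
log_of_ratio = math.log(p1_hat / p2_hat)
difference_of_logs = math.log(p1_hat) - math.log(p2_hat)

symmetric = math.isclose(log_half, -log_double)
same_value = math.isclose(log_of_ratio, difference_of_logs)
```

Both `symmetric` and `same_value` evaluate to true, which is why ratios behave like differences once they are moved to the log scale.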

So what does this all mean? Well, the ultimate thing that this tells

us is that when we're doing a study with binary outcomes and sample sizes n1 and n2,

if we were to repeatedly do the study over and over

again, randomly and independently taking samples of size n1 and

n2 from population one and population two, and get estimates of

the proportions, or we could also say incidence rates here,

over and over again, and then compute relative

risks based on these different samples we got repeatedly,

if we then plotted the log of these ratios, these

relative risks from the different studies we had, these would be

normally distributed. Sorry for the drawing here.

The histogram of these estimates would be normally

distributed and on average would equal the log

of the true ratio we were trying to estimate.

So it's business as usual.

As long as we put these things on the log scale,

the same holds if we replace these proportions with incidence rates

and replace this relative risk with the incidence rate ratio.

Same idea holds true.

So as it turns out, the sampling distribution for

the natural log of a ratio is normally distributed and

centered at the natural log of the true population value

of the ratio being estimated.

So by all the same logic we used to

figure out how we could use the theoretical result

when we're only dealing with results from one study

to actually quantify something about the uncertainty of our estimate,

the story is that it's business as usual; the same principles apply.

Most of the studies we could do would

yield an estimated log ratio that fell

within plus or minus two standard errors of

the log of the true ratio we wanted to estimate.

And so if we take an interval where we add and subtract

two standard errors to the log of our estimated ratio, for 95% of the

studies we could do, taking samples from the populations randomly,

this interval would include the true value of the log

ratio, the log of the ratio that we're trying to estimate.

So let's think about two things here. What we're going to need to know,

as before: well, we won't know the true standard error, so we're

going to have to learn how to estimate these from a single study.

And we'll show that there are slightly different formulas for

the log of the relative risk and

the log of an odds ratio when we're dealing with binary data,

each of these based on the counts in the respective two by

two table representing our data results, and

then there's a separate formula also for

the log of an incidence rate ratio.

But they're all pretty straightforward to compute.

And so we'll delve into that in lecture sections C, D, and E.
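
The formulas themselves are left to the later lecture sections. Purely as a preview, the Python sketch below uses the standard large-sample textbook formulas for these three standard errors; the course may present them differently. The function names and the cell labels (a and b for events and non-events in group 1, c and d for group 2) are assumptions for illustration, and the counts are hypothetical.

```python
import math

def se_log_relative_risk(a, b, c, d):
    """Estimated SE of log(relative risk) from 2x2 counts (group 1: a events, b non-events)."""
    return math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

def se_log_odds_ratio(a, b, c, d):
    """Estimated SE of log(odds ratio) from the four cell counts of a 2x2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def se_log_incidence_rate_ratio(events1, events2):
    """Estimated SE of log(incidence rate ratio) from the event counts in each group."""
    return math.sqrt(1 / events1 + 1 / events2)

# Hypothetical 2x2 table: 10 of 100 with the outcome in group 1, 20 of 100 in group 2.
se_rr = se_log_relative_risk(10, 90, 20, 80)
se_or = se_log_odds_ratio(10, 90, 20, 80)
se_irr = se_log_incidence_rate_ratio(10, 20)
```

All three depend only on counts, so they can be computed directly from a single study's results.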

So what we're going to end up getting is endpoints, I'll just call them a and b,

that will be the confidence interval for the log of a ratio.

And you'll say, that's no good, John, because nobody thinks on the log scale.

And I agree.

But let's just think about some things here for the moment.

If we have estimates of the confidence interval for

the log ratio, and it's a natural log, we can get

things back to the ratio scale

by antilogging, or what we call exponentiating.

So the computations will be done on the log scale.

And then the results will be exponentiated back to the ratio scale.

This is because the sampling behavior on the log scale is normally distributed.

And we can use that standard procedure to get a 95%

confidence interval by adding and subtracting

2 standard errors on the log scale.

If we wanted to do a 99 percent confidence interval, we could go slightly further.

If we wanted to do a 90% confidence interval

we could go 1.65 standard errors in either direction.

The exact same idea applies.
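
The log-then-exponentiate procedure can be sketched in Python as below. The function name is hypothetical, and the standard error passed in (0.29) is an assumed value chosen for illustration, not a number from the lecture. The multipliers 2 and 1.65 are the ones mentioned above; 2.58 is the usual normal-distribution value for 99% and is added here as an assumption.

```python
import math

# Number of standard errors to go in either direction for each confidence level.
MULTIPLIERS = {90: 1.65, 95: 2.0, 99: 2.58}

def ratio_ci(ratio_estimate, se_log_ratio, level=95):
    """Confidence interval for a ratio, computed on the log scale and exponentiated back."""
    z = MULTIPLIERS[level]
    log_estimate = math.log(ratio_estimate)
    a = log_estimate - z * se_log_ratio  # lower endpoint on the log scale
    b = log_estimate + z * se_log_ratio  # upper endpoint on the log scale
    return math.exp(a), math.exp(b)

# Hypothetical example: estimated relative risk of 0.32, assumed SE of 0.29 for its log.
lower_95, upper_95 = ratio_ci(0.32, 0.29)
lower_90, upper_90 = ratio_ci(0.32, 0.29, level=90)
lower_99, upper_99 = ratio_ci(0.32, 0.29, level=99)
```

Because the interval is symmetric on the log scale, it is not symmetric around the estimate on the ratio scale: the upper endpoint sits farther from the estimate than the lower one.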

So let's just talk a little bit about null values.

The null value for a measure of association comparing

two populations is the value of this measure of association

if both the population outcome quantities being compared by this measure are equal,

and hence there is no association between this outcome and the populations.

So, for example, if I'm comparing

continuous measures between two populations by

the means and there's no population level difference in the means,

say the mean blood pressure is exactly the same

for those who received the drug versus those that received

the placebo, then the true means are equal and

the difference in means at the population level is zero.

It indicates there's no association between average

blood pressure and treatment because the averages

are the same.

If I'm comparing proportions between two populations, and the proportions

are equal, then the difference in proportions would be zero.

So for example, if I looked at the percent

of infants contracting HIV, comparing HIV infant transmission among mothers

who got AZT during pregnancy and mothers who got a

placebo, and there was no effect of AZT, good or bad, then

at the population level, the proportions of infants contracting HIV born to those two

groups of mothers would be equal and the difference in proportions would be zero.

How about for ratios?

Well, if I'm comparing two quantities, whether they be risks,

odds,

or incidence rates, if the

two populations have the same value, there's no difference and hence no

association. These ratios will have the same numerator and denominator and the

null value would be one. On the log scale, and we'll just use a

relative risk as an example, if the true population proportions

are equal, the ratio will be equal to one.

So if p1 equals p2, the

true ratio is one, and the log of one is zero.

So if the true proportions are equal at the population level, then

the null value for the log of the ratio is zero.

And remember, we can express this as a difference

in the log of the numerator and denominator, and if

those numerator and denominators are equal, then the logs are

equal and the difference is zero, so this makes sense.

So what we're going to see and think about is if the

null value appears in the confidence interval for a measure of association.

Then, and excuse the phrasing here, but this is how it's done,

no association between

the outcome and the populations being

compared is a plausible conclusion.

It can't be ruled out.

So for example, if we look at the average blood pressure difference

between a group of those who are randomized to receive a low fat diet

and those randomized to receive a low carb diet,

say, low fat minus low carb,

and the mean difference is negative four millimeters of

mercury, indicating that in our study those on

the low fat diet had lower blood pressures,

but the confidence interval goes from negative ten millimeters of mercury

to two millimeters of mercury,

then after accounting for the sampling variability

in our estimate, we get both plausible negative

and positive values for the true mean

difference, and we can't reach a conclusion, statistically.

And included in this interval is 0, the possibility of no difference.

So we would say that after accounting for the

uncertainty, we found no statistical difference in the average blood

pressures. We may call this a non statistically significant result.

We'll get into the language of statistical significance in the next

set of lectures, but we're just laying the groundwork here.

If a null value does not appear in the confidence interval for a measure

of association, then no association is not

within the range of possible population level associations.

And hence we can rule out, and that sounds weird verbally, we can rule out

no association as a possibility and say there is evidence of an association.

So if we estimate, for example, the relative risk

of HIV transmission to infants from mothers who get AZT

versus mothers who get a placebo, and the estimated relative risk is 0.32,

then

we put confidence limits on that and it goes

from 0.18 to 0.58 when we're done with our computations.

Notice that this interval only includes values less than

one, it does not include one, and what we've found

here is that we've given a range of possible

values for the true association between AZT and infant transmission.

But after accounting for the uncertainty

in our estimate, all possibilities favor AZT and we've ostensibly ruled out one as

a possibility for the true value of the

association, and we've ruled out the idea that

AZT is not associated with HIV transmission.

We'd say there's a statistically significant finding, indicating

that after accounting for the sampling variability

in our data we found evidence

of a real protective association between AZT and HIV transmission to infants.

So again, if we do not include the null value in our confidence

interval for a measure of association, be it

a mean difference, a difference in proportions,

or a ratio-based measure, this finding

is called statistically significant, and we're

going to get into much more detail on that terminology in lectures nine and ten.
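
The null-value check itself is simple enough to sketch in a few lines of Python. The function and dictionary names are hypothetical; the two intervals are the blood pressure and AZT examples from this lecture.

```python
# Null values: 0 for differences (and for log ratios), 1 for ratios.
NULL_VALUES = {"difference": 0.0, "ratio": 1.0, "log_ratio": 0.0}

def includes_null(interval, measure_type):
    """True if the confidence interval contains the null value for this type of measure."""
    lower, upper = interval
    null_value = NULL_VALUES[measure_type]
    return lower <= null_value <= upper

# Mean blood pressure difference (low fat minus low carb): interval from -10 to 2 mmHg.
diet_includes_null = includes_null((-10, 2), "difference")

# Relative risk of HIV transmission (AZT versus placebo): interval from 0.18 to 0.58.
azt_includes_null = includes_null((0.18, 0.58), "ratio")
```

Here `diet_includes_null` is true, so no association cannot be ruled out, while `azt_includes_null` is false, which corresponds to a statistically significant result.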

So in summary what we'll be exploring in this next

set of lectures is that we can easily compute confidence intervals,

95% and other levels, for things like

mean differences between two populations, differences in proportions between two

populations, relative risks and odds ratios, and incidence rate ratios.

And we can do this using data from

two samples from the two populations being compared.

And the general rule for differences, so just

the difference in means or proportions, is to take our observed difference

and add and subtract two estimated standard errors

of the difference.

And we'll show you how to estimate these in subsequent sections.

And for ratios, what we're going to have to do is go on the log scale:

take the log of our estimated ratio,

add and subtract two estimated standard errors

of the log of our estimated ratio,

and then transform this back to the ratio scale.

So it's a little bit more labor intensive, but

ultimately we'll have computers to do this for us.

But it doesn't betray the general principles we've developed.
