A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings


From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates among more than two populations with a single test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

In this next set of two lectures, we'll look at the principle of designing studies to have certain characteristics. In Lecture 12 we're going to focus on the idea of level of precision, or margin of error: how wide we want the confidence interval around our estimate to be when we create a confidence interval for the underlying true quantity we're estimating.

So in this first section we're going to look at just the basics of precision and sample size, something I think you're already familiar with, but we'll give an overview to take us into where we're going in terms of designing studies.

After the end of this lecture section, you'll be able to explain and demonstrate empirically the relationship between sample size and the precision of an estimate. You've already seen and done this, but we'll review it briefly in a format that lends itself to our next exploration. You'll also be able to explain that, for continuous measures, the variability of individual values also impacts the precision of sample-based estimates.
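That sample size and precision relationship can be sketched numerically. Below is a minimal Python illustration using the standard error of a sample mean, SE = s/sqrt(n); the standard deviation of 7.5 days is a hypothetical value chosen just for illustration.

```python
import math

# Sketch of the sample size / precision relationship via the standard
# error of a sample mean: SE = s / sqrt(n).
s = 7.5  # hypothetical standard deviation of individual values (days)

for n in [30, 120, 480]:
    se = s / math.sqrt(n)
    print(f"n = {n:4d}: SE = {se:.2f} days, "
          f"approx. 95% margin of error = {2 * se:.2f} days")
```

Note that quadrupling the sample size only halves the standard error, so gains in precision come at an increasing cost in sample size.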

So let's suppose a researcher was doing a length of stay study like the ones we've been doing, but he or she only had access to 30 discharge records from 2012 from a large urban teaching hospital, and the researcher wants to use these data to make a statement about length of stay at the facility.

So what they got was a sample of 30

observations where the average length of stay was 6.3 days,

but there was a fair amount of person

to person variability in their length of stay values.

And so when they actually go ahead and create a confidence interval for

this, they take the mean of 6.3 days, they add and subtract a

little more than two standard errors because we only have 30 observations, and

they get a confidence interval that goes from 3.5 days to 9.1 days.
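As a sketch, the interval above can be reproduced in Python; the sample standard deviation of roughly 7.5 days is an assumption back-calculated from the reported interval, and 2.045 is the t critical value with 29 degrees of freedom (the "little more than two" mentioned above).

```python
import math

n = 30
mean_los = 6.3  # sample mean length of stay (days), from the lecture
sd_los = 7.5    # hypothetical SD, back-calculated to match the reported interval
t_29 = 2.045    # t critical value, 29 degrees of freedom

se = sd_los / math.sqrt(n)
margin = t_29 * se
lo, hi = mean_los - margin, mean_los + margin
print(f"95% CI: ({lo:.1f}, {hi:.1f}) days")  # approx. (3.5, 9.1)
```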

That's not very substantively useful, and it doesn't pin down very precisely the average length of stay at the facility.

Well, the big problem here is that their margin of error is wide. The confidence interval is wide because the two-standard-error bound is large, and that's mostly because they only have 30 persons in their sample. It also doesn't help that the variability of the individual measurements is relatively large.

So suppose you were the researcher, and you were privy to these results, and you thought about doing a larger study to estimate the confidence interval with better precision. Think about how you might go about doing that based on the information from this researcher's study.

Here's another example.

Here's summary data on charges by sex, which you've actually looked at in one of your practice problems, based on a random sample of 500 Carotid Endarterectomy procedures performed in the state of Maryland.

And we saw that the variability in the individual cost measurements

was large in terms of dollars for both the males and females.

So suppose you go ahead, like you did in your review exercise, and create a confidence interval for that mean difference of charges. Here I'm going to switch the direction, because I can; direction is arbitrary as long as we track it, so this is females to males. These numbers should look familiar to you, but we get a pretty wide confidence interval, one that includes zero.

The nagging question might be: does the fact that zero is in the confidence interval mean that the null is true, that the means are equal between males and females who had this procedure? Or did we fail to reject that null, in the sense of the interval including zero, because we didn't have enough precision to find a difference even if it existed at the population level?

And so it comes down to the idea that, again, we have a pretty wide margin of error. The two-standard-errors part here is over $825, and that comes from the fact that our standard error for the difference is based on very variable individual measures in each of the groups.

And while the sample sizes are respectable, they may be too small, given that much variability in the individual values, to estimate the quantity of interest with the precision we'd rather have.
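The lecture doesn't show the group-level summaries here, so the standard deviations and group sizes below are hypothetical, chosen only so that the two-standard-error bound lands near the quoted $825. The sketch just shows how the standard error of a difference in means combines the variability in both groups.

```python
import math

# Two-sample standard error for a difference in means:
#   SE = sqrt(s1^2/n1 + s2^2/n2)
# The SDs and group sizes are hypothetical, chosen so 2*SE is near
# the ~$825 margin quoted in the lecture.
s1, n1 = 4700, 250  # hypothetical SD of charges (females), group size
s2, n2 = 4500, 250  # hypothetical SD of charges (males), group size

se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
margin = 2 * se_diff
print(f"SE of difference: ${se_diff:,.0f}; margin of error: ${margin:,.0f}")
```

Because both SDs enter the formula, shrinking either group's variability, or growing either group's size, tightens the interval.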

How about this situation?

Researchers design a small pilot study to estimate the percentage

of patients who have a certain minor reaction to a drug.

So 30 subjects are enrolled and nine have the reaction.

Now I'm going to estimate the confidence interval by hand here, just to illustrate the margin of error. Really, given this sample size, we might want to use a computer, because it could give us the exact computations: this central limit theorem based approach may not be perfect with a sample this size, which may qualify as small. But let's do it anyway. It turns out the results we get are close to what you'd get with the exact computation, but not quite the same.

So, they estimated that 30% of the patients in their sample experienced the reaction. That's a sizable percentage, but when they went to quantify and add in their uncertainty, they got a range of possibilities for the true percentage between 13% and 47%. A 13% reaction rate is certainly not as big a deal as 47%, and this interval really does not allow them to make a very strong substantive decision about the chances of having a reaction.

Well, again, this is fueled mostly by the fact that our margin of error is so wide.

It's on the order of 17%.

We have to go plus or minus 17% around

our estimate to create a confidence interval for the truth.
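A minimal sketch of that computation, using the usual large-sample standard error for a proportion:

```python
import math

x, n = 9, 30
p_hat = x / n                            # estimated proportion: 0.30
se = math.sqrt(p_hat * (1 - p_hat) / n)  # large-sample SE for a proportion
margin = 2 * se                          # the ~17% margin from the lecture
lo, hi = p_hat - margin, p_hat + margin
print(f"95% CI: ({lo:.0%}, {hi:.0%})")   # approx. (13%, 47%)
```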

And that's mostly a function of the fact that there are only 30 patients in this study.

So maybe you see these data and you're interested in better quantifying this, because you think this drug could be really useful otherwise, but you want to know about the propensity for side effects. Can you think about how you might use the data generated from this small study to design a study that has a smaller margin of error?

Here's an interesting example, first published in a clinical trials journal in the 1980s. This was a clinical trial done on patients who had a peptic ulcer, and it compared two drugs. I'm going to call them drug A and drug B, because that's easier to pronounce than the names, though I'll try to say the names at least once. For drug A, pirenzepine, there were 30 patients total, and 23 of the 30 experienced relief from their peptic ulcer. For drug B, trithiozine, there were 31 patients enrolled, and 18 of the patients were cured, or experienced relief.

So the observed proportions, the estimated proportions of patients who experienced relief in the two groups, were 23 out of 30, or 77%, in the first group, and 18 out of 31, or 58%, in the second group.

So this is a pretty sizable difference, or risk difference: a 19% greater proportion of patients were cured or healed with drug A compared to drug B. And if you look at this on the relative scale, the relative risk is 1.32, so an individual's chances of being healed are about 32% higher with drug A versus drug B. But if you actually go and put confidence limits on this, because of the small numbers in both groups, we have a really large margin of error.

The margin of error is two times the standard error of 11.6%, or roughly plus or minus 23%.

So if you actually put confidence limits on it, not only is this not statistically significant, but the range of possibilities is huge: anywhere from an actual reduced benefit of drug A on the order of 4% to a potential increased benefit on the order of 42%.
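As a sketch, the risk difference, its approximate 95% confidence interval, and the relative risk can be computed as follows (using 1.96 standard errors, which matches the reported limits a bit more closely than exactly two):

```python
import math

x1, n1 = 23, 30   # drug A (pirenzepine): patients with relief, total
x2, n2 = 18, 31   # drug B (trithiozine): patients with relief, total
p1, p2 = x1 / n1, x2 / n2
rd = p1 - p2      # risk difference, approx. 0.19

# Large-sample SE for a difference in proportions
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = 1.96 * se
lo, hi = rd - margin, rd + margin
print(f"risk difference = {rd:.0%}, 95% CI: ({lo:.0%}, {hi:.0%})")
print(f"relative risk = {p1 / p2:.2f}")  # approx. 1.32
```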

So, wow.

This is kind of a ridiculous result

in terms of making any substantive conclusions here.

But the actual study result looked very promising for drug A.

Given that zero is in this interval, it's very unclear whether that's because the underlying truth is that there's no difference in the efficacy of the drugs, or because the precision of the estimate was so poor, given the low sample size, that even if there was a benefit of drug A compared to drug B, it could not be seen.

So, generally we'll see that we design studies with regard to a risk difference. But of course, the better our precision with regard to the risk difference, the better our precision will be with regard to our ratio-based estimates as well.

Finally, let's look at an example where a small study

is done to compare the efficacy of smoking cessation programs.

For program one, ten people enrolled and were followed for a cumulative total of 259 days after quitting. Four persons resumed smoking in this follow-up period.

For program two, 11 people enrolled and were followed for

a cumulative total of 267 days, with two persons resuming

smoking in the follow-up period. So the incidence rate ratio here is over two for program one compared to program two: the relative incidence of returning to smoking in the follow-up period is about twice as high in program one as in program two.

So if we actually do the confidence interval computation, remember, we have to take things to the log scale. It's a little harder to understand what the margin of error means on the log scale until we exponentiate back to the ratio scale to get a sense of how wide our interval is. But let's go ahead and do this with our classic computations on the log scale.

The log of 2.06 is 0.72.

The estimated standard error is the square root of 1 over the number of events

in the first group plus 1 over the number of events in the second group.

And if you do it out and ultimately exponentiate you get a confidence

interval that goes from 0.36 to 11.7. Now I should note that this is a very small-sample example in the world of time-to-event data, with few events in both groups. If you were doing this with a computer, which is how you'll do your computations for most of the rest of your life, you would get a different interval, the exact interval, which FYI is even wider: it goes from 0.30 to 22.8, with a much higher upper bound.

But this just gives us some sense of the ingredients in this computation: the reason we get such a wide interval on the log scale, and hence on the exponentiated scale, is that the margin of error is so large, and that's a function of the small number of events we have in both groups.
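A sketch of the hand computation on the log scale (the exact, computer-based interval mentioned above would differ):

```python
import math

events1, days1 = 4, 259   # program one: relapses, cumulative follow-up days
events2, days2 = 2, 267   # program two: relapses, cumulative follow-up days

irr = (events1 / days1) / (events2 / days2)  # incidence rate ratio, approx. 2.06
log_irr = math.log(irr)                      # approx. 0.72

# SE on the log scale: sqrt(1/events1 + 1/events2)
se = math.sqrt(1 / events1 + 1 / events2)
lo = math.exp(log_irr - 2 * se)
hi = math.exp(log_irr + 2 * se)
print(f"IRR = {irr:.2f}, 95% CI: ({lo:.2f}, {hi:.1f})")  # approx. (0.36, 11.7)
```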

So we can think about how, if we saw these data and had some estimates of the incidence rates in both groups, we could figure out how many people we'd need to follow to see enough events in both groups to get a reasonable margin of error.

And we'd certainly be hard pressed, given the results from a study like this and the width of this interval, whether we use the hand-computed interval or the exact one, to make a definitive conclusion that there's no difference in the underlying return to smoking between program one and program two.

The null value of one is in this ratio interval, but the interval is very wide.

So again, we have uncertainty about whether we failed to reject the null because the underlying truth is no difference in the rates of smoking return between the two programs, or because we just couldn't see a difference, even if it were there, because our precision is so poor.

So in summary, and this just rehashes an idea you're already pretty comfortable with, the focus here is on low precision in small studies: small studies lead to estimates with large margins of error, or low precision. That means large uncertainty in the estimate, which makes it difficult both to quantify the underlying population-level quantity of interest with any substantively interpretable range, and to interpret the presence of the null value in a confidence interval. We effectively fail to reject the null hypothesis, but because of the lack of precision, it's unclear what that means.

What we're going to see in the next section is how to start with a guess for what our study results will yield. Preliminary studies, like some of the ones we showed in this section, are a great way to do that, so that we can design a study, before we actually collect data, to have a certain margin of error or precision for the quantity we wish to estimate.

So the next section will show that if we come to the table with some preliminary expectations, or estimates, of what we think the outcomes of the study we have yet to perform will be, we can use that information to estimate the necessary sample size or sizes needed to give the estimates of interest a desired margin of error.