A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods


From the lesson

Module 3B: More Multiple Regression Methods

This set of lectures extends the techniques debuted in lecture set 3 to allow for multiple predictors of a time-to-event outcome using a single, multivariable regression model.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Hello and welcome to Lecture 8.

In this lecture set we are going to discuss Cox Proportional

Hazards Regression for Estimation, Adjustment and Basic Prediction.

And this will parallel what we've done with logistic and

linear regression in the previous two lecture sets.

So first, let's look at some Multiple Cox Proportional Hazards Regression, and

give some examples of it.

So hopefully by the end of this section, you'll be able to interpret the estimated

hazard ratios from multiple Cox Proportional Hazard regression models in

a scientific context, and compare the results from simple and

multiple Cox regressions to assess confounding.

So let's go back to our PBC trial data, the randomized clinical trial on

312 patients with primary biliary cirrhosis studied at the Mayo Clinic.

Patients were randomized, as we know very well by now,

to either the drug DPCA or placebo.

And patients were followed from enrollment until death or

censoring, and the follow up period was up to 12 years.

So the question we had tried to answer before is,

what is the association between treatment and patient survival?

And we had seen this before in the unadjusted or crude analysis: when we

did the Cox regression, the resulting hazard ratio of mortality for

those on the drug compared to those on the placebo was 1.06.

Slightly elevated:

a 6% higher risk in the sample for those who got the drug compared to the placebo,

but the result was not statistically significant, with the confidence interval

for the slope crossing 0 and the 95% confidence interval for

the exponentiated slope, or the hazard ratio, crossing one.
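
To make the link between the slope and the hazard ratio concrete, here is a minimal sketch. The hazard ratio of 1.06 is the one quoted in the lecture; the standard error is a made-up value, since the lecture does not give it, and the function name is only illustrative. The point is that a confidence interval for the slope that includes 0 always exponentiates to a confidence interval for the hazard ratio that includes 1.

```python
import math


def slope_to_hazard_ratio(slope, se, z=1.96):
    """Turn a Cox slope (a log hazard ratio) and its standard error into
    a hazard ratio with an approximate 95% confidence interval."""
    lower, upper = slope - z * se, slope + z * se
    return math.exp(slope), (math.exp(lower), math.exp(upper))


# 1.06 is the crude hazard ratio from the lecture.
slope = math.log(1.06)
# ASSUMPTION: this standard error is hypothetical, for illustration only.
hypothetical_se = 0.18

hr, (hr_lo, hr_hi) = slope_to_hazard_ratio(slope, hypothetical_se)

slope_lo = slope - 1.96 * hypothetical_se
slope_hi = slope + 1.96 * hypothetical_se
slope_ci_crosses_zero = slope_lo < 0 < slope_hi
hr_ci_crosses_one = hr_lo < 1 < hr_hi

print(round(hr, 2), slope_ci_crosses_zero, hr_ci_crosses_one)
```

Because exponentiation is monotone and exp(0) = 1, the two "crossing" checks always agree, whatever standard error is plugged in.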

So, it is not expected that this crude unadjusted hazard ratio will be

confounded by other patient characteristics such as age, sex, and

bilirubin at time of randomization.

Why not?

Well, think about it: this was a randomized study,

so the distribution of factors such as age, sex, and bilirubin levels at time of

randomization should be similar in the treatment and control groups,

and hence should not distort, magnify or minimize the true association.

However, we still may want to look at a multiple regression that allows for

other predictors because other patient characteristics may

add additional information about mortality, above and beyond treatment.

And these other patient characteristics could be related to each other, and

also related to mortality.

So they may confound each other's relationships, potentially,

even if it's not likely going to affect the association with the treatment.

So let me just show you the results from simple and multiple Cox regressions.

The first set of results here are unadjusted.

This is the crude association between mortality and each predictor on its own.

So this is the crude association we had between mortality and the drug.

The hazard ratio we just quoted of 1.06 and the confidence interval.

Age was handled by splitting it into quartiles.

I used quartiles to deal with the case where the association may not be linear.

The reference group is the first quartile, less than 42 years.

And you can see, at least in terms of the estimates, with older age, not taking

into account any other factors, there's an increased hazard with increasing age.

Although the confidence intervals for

the estimated hazard ratios for the three older age groups all overlap somewhat.

And only the latter two are statistically significantly different than

the reference.

But on the whole it shows evidence of an increasing mortality with increasing age.

And the overall p value,

it's no surprise because some of the differences were significant.

But the overall p value for testing whether there's any difference between

the age quartiles in terms of mortality is statistically significant.

Just one thing to think about: it is theoretically possible, and

it does happen sometimes, that none of the differences between

the other groups and the reference are statistically significant.

But the predictor as a whole is still statistically significant

because each of these associations only

compares the difference between the particular group and the reference.

And it is possible that some of the differences between these

other groups are statistically significant.

So it's interesting: in statistics, things can be not statistically significantly

different from the same reference, but

statistically significantly different from each other.

So that's why it's important with these categorical variables to do

the overall test as well, because you may catch things that you

wouldn't catch by simply looking at the results for some of the comparisons.

For bilirubin, this shows the association we estimated in lecture three,

a 16% increase in mortality per one milligram per

deciliter increase in baseline levels measured at the time of randomization.

And there's a statistically significant sex association that shows that females

have a lower risk by an estimated 39%, but there's a lot of uncertainty.

This reduction could be anywhere from

61% down to just 2%.

There's a lot of uncertainty there.
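
The percentages quoted for bilirubin and sex are just hazard ratios read on a percent scale. As a small sketch of that arithmetic (the function names are only illustrative), a 16% increase corresponds to a hazard ratio of 1.16, a 39% reduction to 0.61, and a reduction somewhere between 61% and 2% to a confidence interval of roughly 0.39 to 0.98:

```python
def hazard_ratio_from_percent_change(percent_change):
    """A +16% change in hazard is a ratio of 1.16; a -39% change is 0.61."""
    return 1.0 + percent_change / 100.0


def percent_change_from_hazard_ratio(hazard_ratio):
    """Inverse direction: a hazard ratio of 0.61 is a -39% change."""
    return (hazard_ratio - 1.0) * 100.0


# Percentages as quoted in the lecture for the unadjusted associations.
bilirubin_hr = hazard_ratio_from_percent_change(16)
sex_hr = hazard_ratio_from_percent_change(-39)
sex_hr_ci = (
    hazard_ratio_from_percent_change(-61),
    hazard_ratio_from_percent_change(-2),
)

print(round(bilirubin_hr, 2), round(sex_hr, 2),
      tuple(round(v, 2) for v in sex_hr_ci))
# prints: 1.16 0.61 (0.39, 0.98)
```

The upper limit of 0.98 sits just below 1, which is why the sex association is statistically significant even though the interval is wide.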

What happens when we actually put these all in the same model and

actually compute each estimate adjusted for the other factors?

Well, as expected there's a slight change numerically in

the estimated association with the treatment but

statistically speaking the results are still not significant and the magnitude of

the association is similar after adjustment for age, bilirubin, and sex.

And we would expect not to see much difference there because of

the randomization.

If you look at the age association, the individual comparisons between each

age group and the reference, the lowest quartile, attenuate: they get

slightly smaller than they were when we didn't adjust for the other things.

But the overall result is similar: increasing age is associated

with increasing risk of mortality, and the highest age group is still

statistically significantly different than the lowest, or reference group.

So just to remind us of who's being compared to whom here,

this hazard ratio estimate of 0.99, for

example, is the relative hazard of mortality for those persons 42

to 50 years old compared to those less than 42 years old who were in the same treatment

group, had the same bilirubin measurement at baseline, and were of the same sex.

The relationship between bilirubin and

mortality is unaltered after adjustment for treatment, age and sex.

And there's very little difference in the adjusted relationship between mortality

and sex as compared to the unadjusted.

So just to summarize the main findings in this analysis:

the relationship between mortality and treatment, or

rather the lack of a relationship once we account for statistical significance,

with only a small relative increase for those in the drug group,

was not confounded by age, sex or bilirubin levels at the time of randomization.

And we would expect that to be the case because, well, it was a randomized trial,

and so the distribution of these factors should be equivalent or

nearly equivalent between the treatment and control groups, and

should not influence or affect the overall crude association.

Age, sex, and bilirubin were statistically significant predictors of

mortality in the unadjusted comparisons.

After adjustment for each other and

treatment, all three remained statistically significant predictors with

associations very similar in magnitude to the unadjusted associations.

So the take home message here is that we could do a better job of

explaining mortality by using these three factors together.

But none of them seem to confound the relationship between mortality and

the others.
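
One way to put a number on "similar in magnitude" when comparing crude and adjusted results is the percent shift in the hazard ratio. The sketch below is an assumption-laden illustration, not the lecture's method: the 10% cutoff is a common rule of thumb rather than anything stated here, the adjusted hazard ratio is a placeholder because the lecture does not quote it, and the function names are made up. In practice the confidence intervals should be looked at as well.

```python
def percent_shift(crude_hr, adjusted_hr):
    """Percent change in the hazard ratio after adjustment."""
    return 100.0 * (adjusted_hr - crude_hr) / crude_hr


def looks_confounded(crude_hr, adjusted_hr, threshold_pct=10.0):
    """Flag a shift larger than the threshold.

    ASSUMPTION: the 10% default is a rule of thumb, not from the lecture.
    """
    return abs(percent_shift(crude_hr, adjusted_hr)) > threshold_pct


crude_treatment_hr = 1.06        # quoted in the lecture
hypothetical_adjusted_hr = 1.04  # ASSUMPTION: placeholder, not from the lecture

shift = percent_shift(crude_treatment_hr, hypothetical_adjusted_hr)
flag = looks_confounded(crude_treatment_hr, hypothetical_adjusted_hr)
print(round(shift, 1), flag)
```

With these illustrative inputs the shift is under 2% and nothing is flagged, which matches the qualitative conclusion that randomization protected the treatment comparison from confounding.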

Let's look at our predictors of infant mortality in that sample of

10,300 Nepali newborns.

And we're going to look at this as a function of gestational age, and what

we're looking at here is the mortality in the six months following birth.

So you may recall we looked at this association in lecture three and we

had categorized gestational age into five categories, such that the reference group

for the comparisons was the pre-term, or less than 36 weeks, gestational age group.

And one of the reasons we categorized this and

kept it as categorical is because we saw that there was a pretty large reduction in

mortality when jumping from premature to full term.

And then there was a slight further reduction with a little bit

longer gestational ages.

But on the whole,

the big story was about that reduction that came from being full term.

And so, instead of putting in gestational age as a linear term, which would

underestimate that jump from pre-term to full term

and then overestimate the remaining impact of additional weeks on mortality,

we decided that categorical was the way to go. And these are the results:

the relative hazard of mortality in the six months following birth for

those children who had gestational ages of 36 to 38 weeks, relative

to the reference group, was 0.41,

a 59% reduction that was statistically significant.

And similarly we got closer to a 70% reduction, 67%, for

the subsequent gestational age categories.

Although these confidence intervals all overlap with each other.

So the big story of this is that there was a large reduction in mortality for

being full-term.

What are some other potential predictors of this?

Well, this was embedded in that randomized trial of

maternal vitamin supplementation and similar to the previous trial,

there wasn't much of an impact with the treatment unadjusted, and

we wouldn't expect that to change with adjustment because it was randomized, but

this just shows the breakdown of some other potential predictors.

So we've got the treatment groups, which were roughly, there's a little bit of

fluctuation, but on the order of 33%, roughly a third of the children that were

born to mothers in each of the three vitamin treatment groups.

And here's the distribution of the gestational age categories.

You see

a little under a quarter of the sample were pre-term: 22.5% were at less than 36 weeks.

And then this shows the remaining numbers and percentages.

The sample was slightly majority female at 51.1%.

And then one of the other potential predictors is maternal parity.

So for just under a quarter of the mothers, this was the first child they had;

they had no previous children prior to the one in this study.

Another 20% had one previous child.

And then another 43% had two to four prior children and

less than 2% of the sample had more than eight previous children.

So we're going to look at these predictors.

We've already looked at gestational age but we'll look at these other ones as

well unadjusted, and then when all are adjusted for

each other in one large multiple Cox regression model.

And we'll do it with two different levels of adjustment.

So here are the unadjusted associations.

We see what we saw before with gestational age.

Here is the treatment comparison, and

you may remember from when we analyzed this in Statistical Reasoning 1, there were no

significant differences in mortality between children born to

mothers who got the vitamins, either vitamin A or beta-carotene, and those born to mothers who got the placebo.

And the overall p value for

this, the test for any difference, was greater than .05.

For the sex comparison,

there was no difference in mortality by sex:

males had 2% higher mortality in the sample, but it was not statistically significant.

And then there was an interesting association with maternal parity at

the unadjusted level.

It was a statistically significant predictor, and the reference group

was children born to mothers who had no previous children.

If you look at the unadjusted comparison of mortality for children born to mothers

who had one previous child compared to the group with no previous children,

there was a reduction in mortality.

The hazard ratio is 0.58, and it was statistically significant.

And then when we go to two to four previous children compared to the same reference,

there is still a reduction, but by less than in the previous comparison.

And then similarly, with five to eight previous children,

there's still a slight reduction compared to the reference group, but

it's a smaller reduction than the previous two.

And then when we get to the group that has eight or

more prior children, there's an increased risk,

although it's not statistically significant over the reference.

But it seems to suggest that having had some previous children is associated with

lower risk of mortality, but there's a threshold at which it becomes either

equivalent or slightly higher than the risk of not having any previous children.
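
Each of these parity comparisons comes from an indicator (dummy) variable that is 1 for children in that parity group and 0 otherwise, with "no previous children" as the reference. Here is a minimal sketch of that coding. The group labels and function names are illustrative, and placing exactly eight previous children in the 5 to 8 group is an assumption, since the lecture describes the top group both as "eight or more" and as "more than eight".

```python
# Non-reference parity groups, in the order their indicators are built.
PARITY_LEVELS = ["1", "2-4", "5-8", ">8"]


def parity_category(previous_children):
    """Map a count of previous children to its parity group label."""
    if previous_children < 0:
        raise ValueError("previous_children must be non-negative")
    if previous_children == 0:
        return "0"  # reference group: no previous children
    if previous_children == 1:
        return "1"
    if previous_children <= 4:
        return "2-4"
    if previous_children <= 8:  # ASSUMPTION: exactly 8 falls in this group
        return "5-8"
    return ">8"


def parity_indicators(previous_children):
    """Four 0/1 indicators; all zeros means the reference group."""
    category = parity_category(previous_children)
    return [1 if category == level else 0 for level in PARITY_LEVELS]


for n in (0, 1, 3, 9):
    print(n, parity_indicators(n))
```

Five groups need only four indicators because the reference group is identified by all four being zero; each slope then compares one group to that reference.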

The second model I fit here included only the predictors

gestational age, with its four categories, and maternal parity, with

another four categories, to see if those two were related to each other and whether

their relationship [INAUDIBLE] caused any sort of confounding.

You can see, if you look at these side by side, that the estimates and confidence

intervals for gestational age were nearly identical to the unadjusted version.

And similarly, slight numerical shifts in the estimates, but pretty much

the same story with maternal parity even after adjustment for gestational age.

So it doesn't appear that gestational age and

maternal parity were related, even though both were related to mortality.

And then we look at this third model, which includes all the predictors.

So if we wrote it out it's a long model, that has the log hazard of mortality,

at a given time, is equal to an intercept at that time.

Plus then we'd have four x's, for gestational age, and

then another two x's for treatment, an x for

sex, and then four more x's for maternal parity.

So this would have a lot of x's in it.

This would be our gestational age part.

And then we'd have treatment.

So, we'd have an indicator for vitamin A.

An indicator for placebo.

And so on, and so forth.

We then have the sex component, the indicator of male or

female, and then four more x's, and I'll spare you my handwriting here:

four more x's for

the maternal parity categories.

And if we wanted to actually get the values of these slopes, we

could take the logs of the hazard ratios and of their respective confidence intervals.

But the point here is there's an underlying regression model,

and the results on the exponentiated scale give us these hazard ratios.
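
Written out, the model described here would look roughly like the following. This is a sketch under assumptions: the numbering of the x's and the symbol for the time-specific intercept are not given in the lecture, and the grouping simply follows the counts stated (four x's for gestational age, two for treatment, one for sex, four for maternal parity).

```latex
\ln\big(h(t)\big) = \ln\big(h_0(t)\big)
  + \underbrace{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4}_{\text{gestational age}}
  + \underbrace{\beta_5 x_5 + \beta_6 x_6}_{\text{treatment}}
  + \underbrace{\beta_7 x_7}_{\text{sex}}
  + \underbrace{\beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10} + \beta_{11} x_{11}}_{\text{maternal parity}}
```

Here the first term on the right plays the role of the intercept at time t, each x is a 0/1 indicator, and each adjusted hazard ratio in the table is the exponentiated slope, $e^{\beta_j}$, comparing that group to its reference with the other predictors held fixed.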

And on the whole, if you look across this model, where for

example we look at the relationship between mortality and

gestational age adjusted for treatment, sex and

maternal parity, the results are pretty much similar or the same as they were when

we looked only at adjustment for maternal parity and in the unadjusted case.

So it appears there's no confounding of the relationship between gestational age

and mortality by these other factors.

Similarly with treatment, the results are almost identical after adjustment which we

would expect because of the randomization and similarly with sex and

maternal parity, there's not much change in the associations above and

beyond what we saw in the other adjustments.

So gestational age and maternal parity taken together add

more information about mortality as both are statistically significant, but

they don't appear to confound each other's association.

So in summary, gestational age and

parity were both statistically significant predictors,

both unadjusted and adjusted.

So they each had something to contribute above and

beyond the information from each other.

Whereas sex was not significant, nor was treatment.

And there was no real evidence of any confounding of these or

the other two relationships by the other factors.

So, in summary, multiple Cox regression can be used both to estimate adjusted

hazard ratios and to assess the associations between time-to-event outcomes and multiple

predictors in one model, very analogous to what we did for binary

outcomes with logistic regression and continuous outcomes with linear regression.

In the next section we'll look at making comparisons

between groups who differ by more than one predictor, using the results from

multiple Cox regression.

We'll talk a little bit about how to translate the estimated regression models

into survival curves for different groups defined by different values of x.

And then in the last section we'll look at several examples of the use of

Cox regression in the public health and medical literature.
