In this section, we'll look at several examples of the use of multiple Cox regression from the public health and medical literature. This will give you an opportunity to look at the results from simple and multiple Cox regression models presented in some published journal articles. This first one is something we've looked at before in the course, and we'll use it as a continuing example here. This was a randomized trial looking at prevention of HIV-1 infection with early antiretroviral therapy in sero-discordant couples, that is, sexual partnerships where one person was HIV positive and the other was negative. What they did was randomize the HIV-positive member of the couple either to receive aggressive therapy starting right away, or to the standard recommendation, which had been to wait to give antiretroviral therapy until the CD4 count went below a certain threshold. What they found, as you may recall, is that there was strong evidence of great efficacy of the early treatment. As of February 21st, 2011, at the end of the study, a total of 39 HIV-1 transmissions were observed. Of these, 28 were virologically linked to the infected partner, so 28 transmissions from an HIV-positive partner to their sero-discordant partner. Of the 28 linked transmissions, only one occurred in the early therapy group. So, the overall hazard ratio comparing these two groups in terms of the relative hazard of transmission was 0.04, a 96 percent lower risk of transmission for those whose HIV-positive partners were randomized to the early therapy group. This was statistically significant. So, how did they do this? Well, let's start with where they talk about the basics of the methods. They say, we used the Kaplan-Meier method to calculate event-free probabilities and person-year analysis for incidence rate for a given year.
We also used Cox regression to estimate relative risks, which are expressed as hazard ratios and 95 percent confidence intervals, and to provide adjustment for potential prognostic factors such as the infected participant's baseline CD4 count, baseline plasma HIV-1 RNA concentration, and sex. It's interesting: there's certainly more information about potential risk factors for transmission to be gleaned from using multiple Cox regression. But this was a randomized trial, so I'm suspecting that the adjusted results for the association of interest will be similar to the unadjusted results. In other words, those hazard ratio estimates, comparing the hazard of different outcomes for those randomized to the early treatment group compared to the standard group, will be similar before and after adjustment because of randomization. The same Cox analyses were performed with linked transmissions as the outcome, and with any transmissions, clinical events, and composite monitoring events. They used chi-square tests to compare the frequencies of the events between groups, but then ultimately Cox regression to compare the risk over time. A P value of less than 0.05 was considered to indicate statistical significance. So, here are the unadjusted Kaplan-Meier curves comparing the time to HIV transmission for the two groups, both for linked transmission, as we've seen, and for any transmission, which includes transmission from a person other than the primary sexual partner. Even with the inset here, where they reduced the scaling to blow up the graphs, you can hardly see the Kaplan-Meier curve for the early group, especially in the linked situation, because there were so few cases in that group. So, what they do in this table here is report results for each of these four outcomes. Again, linked transmission is the main one of interest, and then there's any transmission, clinical events, and composite events, but we'll just focus on linked transmissions.
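As a reminder of what sits behind Kaplan-Meier curves like the ones described above, here is a minimal sketch of the product-limit calculation in plain Python. The function name and the tiny data set are hypothetical and are not taken from the trial; the sketch only illustrates how event-free probabilities step down at each observed event time while censored observations simply leave the risk set.

```python
def kaplan_meier(times, events):
    """Return [(time, event_free_probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : True if the event was observed at that time, False if censored
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        current_time = pairs[i][0]
        n_events = 0
        n_leaving = 0
        # Handle everyone whose follow-up ends at this time (events and censorings).
        while i < len(pairs) and pairs[i][0] == current_time:
            n_events += 1 if pairs[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((current_time, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times (in years) and event indicators, for illustration only.
example_times = [1, 2, 2, 3, 4]
example_events = [True, False, True, True, False]
example_curve = kaplan_meier(example_times, example_events)
```

With very few events in one group, as in the early therapy arm, the curve for that group barely moves away from 1, which is why it is hard to see on the published graph.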
They actually do something more detailed than is sometimes done in other analyses. They estimate the overall hazard ratio comparing the risk of each outcome for the early treatment group compared to the later treatment group, and again we'll just look at linked transmission. They do it overall across the entire follow-up period, and that's the estimate of 0.04 that we saw in the results we talked about before, but then they also show it by year. There was only one transmission in the early therapy group out of the 28 linked transmissions, and it must have occurred in the first year of follow-up, because the hazard ratio of transmission in that first year, comparing early treatment to standard treatment and using only the data from the first year of follow-up, was 0.06. Using only the data from the second or third year, there were no transmissions in the early group, and beyond the third year, there were none either. So, the estimated hazard ratios for these other two time periods are zero. So, you might say, "Well, that's not indicative of proportional hazards, because this ratio isn't constant across the entire time period. It's 0.06 in the first year and then zero subsequently." That's mainly attributable to the fact that there was only one transmission overall in the early group. You can look at some of the other outcomes where there were more events, like their defined clinical events, which are defined in a footnote we'll look at shortly. The relative hazard of clinical events over the entire follow-up period for the early treatment group compared to the standard group was 0.59. The estimated hazard ratio for the early-versus-standard comparison using only the first year of data is 0.75, for the second year it is 0.42, and for the third year it was 0.37. So, the estimates differ, but they all show a decreased hazard for those who got early treatment compared to standard treatment.
So, even though they're numerically different, the confidence intervals tend to overlap. And even that being said, you can think of this overall computation under the proportional hazards assumption as a weighted average of these three follow-up-time-specific hazard ratios, and they're all less than one. So, it's probably okay to go ahead and just take a weighted average to get the overall association. As promised in the methods section, they then present, for each of these four outcomes (linked transmission, any transmission, clinical events, and composite events), the unadjusted and adjusted hazard ratios, not only comparing the early therapy group versus the delayed group, but also for other potential predictors of transmission: the baseline CD4 count of the HIV-positive partner at the start of the study, the baseline viral load, whether they were biologically male or female, and whether or not they adhered to condom usage 100 percent of the time during sexual activity. So, let's just focus on the linked transmissions again. Here's the overall unadjusted hazard ratio for early versus delayed; it's 0.04, as we saw before. Notice what happens when they've adjusted in what they call the multivariate analysis, which is a model that includes all of these things as predictors in the Cox model. It's the same, and that's because subjects were randomized to the two treatment groups, which minimizes the potential for an association between any of these other factors and treatment. Even if these other factors are related to the outcome, their distributions should be similar and comparable in the early therapy group and the standard or delayed therapy group. So, that's why these two things are almost identical, and you can see that that's the case for early therapy versus delayed therapy for the other three outcomes as well: whether unadjusted or adjusted, those are almost identical.
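The two pieces of arithmetic above, turning a hazard ratio into a percent change in hazard and thinking of the overall hazard ratio as a weighted average of period-specific ones, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the averaging is done on the log scale, and the equal weights are made up because the article's actual weighting (which depends on how much information each period contributes) is not given here.

```python
import math


def percent_change_in_hazard(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio (negative means lower)."""
    return (hazard_ratio - 1.0) * 100.0


def weighted_average_hazard_ratio(hazard_ratios, weights):
    """Weighted average of hazard ratios on the log scale, back-transformed."""
    total_weight = sum(weights)
    mean_log_hr = sum(
        w * math.log(hr) for hr, w in zip(hazard_ratios, weights)
    ) / total_weight
    return math.exp(mean_log_hr)


# Overall hazard ratio for linked transmission reported in the lecture.
linked_change = percent_change_in_hazard(0.04)  # about -96, i.e. 96 percent lower

# Year-specific hazard ratios for clinical events reported in the lecture.
yearly_hrs = [0.75, 0.42, 0.37]
# Hypothetical equal weights; the real weights are NOT given in the source.
hypothetical_weights = [1.0, 1.0, 1.0]
pooled_hr = weighted_average_hazard_ratio(yearly_hrs, hypothetical_weights)
```

Whatever positive weights are used, an average of this kind always lands between the smallest and largest period-specific values, so if every period-specific hazard ratio is below one, the pooled value is below one as well.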
They also report hazard ratios, unadjusted and adjusted, for these other factors as well. Just to shed a little bit more light on these things, they included a footnote for each of these tables. It was too small to see with the table, so I blew it up here, and I just wanted to come back to the clinical events for which we looked at the time-specific hazard ratios in the first table. These include stage 4 events as defined by the WHO, severe bacterial infections, and pulmonary tuberculosis.
Let's look at another example, in a policy context: movements between Medicaid and insurance exchanges. Here's the abstract; this was published in Health Affairs. It says the Affordable Care Act (so this was published before the implementation of the Affordable Care Act in the United States) will extend health insurance coverage both by expanding Medicaid eligibility and by offering premium subsidies for the purchase of private health insurance through state health insurance exchanges. But by definition, eligibility for these programs is sensitive to income and can change over time with fluctuating income and changes in family composition. The law specifies no minimum enrollment period, and subsidy levels will also change as income rises and falls. Using national survey data, they estimate that within six months, more than 35% of all adults with family incomes below 200% of the federal poverty level will experience a shift in eligibility from Medicaid to an insurance exchange, or the reverse. Within a year, 50%, or 28 million, will. To minimize the effect on continuity and quality of care, states and the federal government should adopt strategies to reduce the frequency of coverage transitions and to mitigate the disruptions caused by those transitions. So what they wanted to look at is what factors were related to such transitions.
Their data source was the Survey of Income and Program Participation, a survey conducted by the United States Census Bureau. It's administered every four months and includes detailed questions on monthly income and insurance status for each of the four prior months. The data also include the relevant federal poverty threshold for each family in each month. The 2004 survey panel contains 12 waves, covering 2004 to 2008, and that was the primary sample for this analysis. So, here are the outcomes they wanted to look at. For adults initially below the nominal 133% cutoff, the primary outcomes were the percentages of people whose incomes made them consistently eligible for Medicaid throughout the study period; people whose incomes had risen above 133% of poverty, that is, adults who would lose Medicaid eligibility but gain exchange eligibility under the Affordable Care Act; and people whose incomes had temporarily risen above 133% but subsequently dropped back below the cutoff. For adults initially above the 133% cutoff, the outcomes were the percentages of people whose incomes remained above 133% throughout the study period, so they did not experience the outcome of a change in status; people whose incomes had fallen below 133%, so they changed status and then stayed at that new status; and people whose incomes had temporarily dropped below 133% but subsequently risen back above the cutoff, which is what they call churning. So, to identify risk factors for changes in eligibility, they used Cox proportional hazards regression, using the duration of continuous eligibility for a single program as the outcome variable. They followed people until they had a change of status. For those who had a single change in status, they used the time of that change; for those who were churning, as they describe those who fluctuate back and forth, they used the first time at which eligibility changed.
Those who stayed consistent across the entire follow-up period were considered to be censored. They never had the outcome of interest because they never changed eligibility status. The analysis they used therefore identifies risk factors for switching eligibility in either direction at any point during the study period. Variables were age, sex, race, ethnicity, education, marital status, being a parent with a child younger than age 19 in the home, et cetera. All variables were defined based on the response in the first month of the survey. So here's a table where they present adjusted hazard ratios for changing status in the follow-up period. These are all adjusted for each other, and the table covers age at the start of the study, the sex of the person, marital status, parental status, et cetera. Let's just zoom in on the first two predictors here. For age in years, they show the association between the age of the person and the hazard of changing eligibility, adjusted for everything else in that larger table. The reference group for this comparison was the oldest group, 50 to 60 years old, and they looked at the relative hazard, adjusting for sex and all those other predictors, for each of the other three age groups. They saw that the younger age groups had consistently higher hazards than the oldest reference group. For example, those 19 to 29 years old had a 30% greater hazard of changing status over the follow-up period compared to 50- to 60-year-olds of comparable sex and all the other adjustment factors. For the next age group the hazard was still higher, but by 15% instead of 30%, and it was 13% higher for the third group compared to the oldest group. These were all statistically significant.
As for the sex of the person, males were more likely to have a change in status over the follow-up period, with an 11% greater hazard than females who were similar in terms of the other adjustment factors. If you want to go to the URL here and get the article, you can see the adjusted hazard ratios for the other predictors of changing status. So, how do they explain these results? Exhibit 4 was the larger table we looked at. Exhibit 4 presents the Cox proportional hazards regression results identifying predictors of income fluctuations across the 133% threshold. Hazard ratios greater than one indicate a higher likelihood of income fluctuation, and ratios less than one indicate a lower likelihood. We already know this, but it's nice that they tell their readers. Income changes were significantly more likely among younger adults and males, which we saw when parsing the results for those first two predictors, but if you went back to the table you'd also see they were more likely among married individuals. They were less likely among Black adults, less educated individuals, and adults with children in the home. Income fluctuations were significantly less likely among adults with Medicaid or Medicare coverage compared to the uninsured and those with private insurance. The strongest predictor was initial income, and if you go back and look at that, I think they're calling it the strongest because it has the largest estimated hazard ratio. Eligibility changes were most common in adults near the Medicaid-exchange divide, that is, people with incomes of greater than 100% to less than 133% of poverty, and greater than 133% to less than 150% of poverty. Changes were moderately common among adults with incomes below the poverty level and least common for adults with incomes above 150% of the poverty level.
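The outcome definition described above (time to the first change in eligibility, with people who never change treated as censored) can be sketched as a small Python function. This is a hypothetical illustration, not the authors' code: the function name, the monthly income values, and the choice to treat income strictly below 133% of poverty as Medicaid-eligible are all assumptions made for the example.

```python
def time_to_first_eligibility_change(monthly_income_pct_poverty, cutoff=133.0):
    """Return (months_of_continuous_eligibility, changed) for one person.

    monthly_income_pct_poverty : income as a percent of the poverty level,
                                 one value per month of follow-up
    changed                    : True if eligibility switched in either direction,
                                 False if the person is censored at end of follow-up
    """
    # Assumption: "below the cutoff" means strictly less than the cutoff.
    initially_below = monthly_income_pct_poverty[0] < cutoff
    for month, income in enumerate(monthly_income_pct_poverty):
        currently_below = income < cutoff
        if currently_below != initially_below:
            # First switch only; later "churning" back and forth is ignored.
            return month, True
    # Never switched: censored at the total length of follow-up.
    return len(monthly_income_pct_poverty), False


# Hypothetical people, for illustration only.
single_change = time_to_first_eligibility_change([100, 120, 140, 145])
churner = time_to_first_eligibility_change([150, 160, 120, 150])
never_changed = time_to_first_eligibility_change([100, 100, 100])
```

Each person contributes one duration and one event indicator, which is the pair of inputs a Cox regression needs regardless of whether the event is clinical or, as here, a change in program eligibility.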
This is a nice example of something we've talked about: time-to-event outcomes don't have to be clinical outcomes such as death or HIV transmission. They can certainly be anything where we can quantify the occurrence of an event and when it occurred in the defined follow-up period.
So, one more example, from the American Journal of Public Health in 2016: cardiovascular disease and neighborhood social conditions. The objective of this study was to examine the impact of neighborhood conditions resulting from racial residential segregation on cardiovascular disease risk in a socioeconomically diverse African-American sample. The study included 4,096 African-American men and women aged 21 to 93 years from the Jackson Heart Study, done in Jackson, Mississippi. They assessed neighborhood disadvantage with a composite of eight indicators from the 2000 US Census. They assessed neighborhood-level social conditions, including social cohesion, violence, and disorder, with self-reported validated scales. So here's how they report the results. Among African-American women, each standard deviation increase in neighborhood disadvantage was associated with a 25% increased risk of cardiovascular disease after covariate adjustment: a hazard ratio of 1.25 with a 95% confidence interval of 1.05 to 1.49. Risk also increased as levels of neighborhood violence and physical disorder increased, after covariate adjustment. But they did not observe significant associations among African-American men in the adjusted models. So here's a situation where they saw that the association between the outcome and the predictor differed depending on whether the sample was male or female. What they conclude here is that worse neighborhood social and economic conditions may contribute to increased risk of cardiovascular disease among African-American women. Policies directly addressing these issues may alleviate the burden of cardiovascular disease in this group.
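As a quick check on how a reported hazard ratio and its confidence interval relate to statistical significance, the following sketch backs out an approximate standard error on the log scale from the interval quoted above and computes a two-sided p-value. It assumes the interval is a standard Wald interval of the form estimate plus or minus 1.96 standard errors on the log hazard ratio scale, which the article may or may not have used exactly, and the inputs are rounded, so the result is only approximate. The function names are hypothetical.

```python
import math


def log_scale_standard_error(ci_lower, ci_upper, z_critical=1.96):
    """Approximate SE of log(HR), assuming a symmetric Wald interval on the log scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_critical)


def wald_z_and_p_value(hazard_ratio, standard_error):
    """Wald z statistic and two-sided p-value for the null hypothesis HR = 1."""
    z = math.log(hazard_ratio) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Values quoted in the lecture for neighborhood disadvantage among women.
se = log_scale_standard_error(1.05, 1.49)
z, p = wald_z_and_p_value(1.25, se)
```

Because the interval 1.05 to 1.49 excludes one, the p-value computed this way comes out below 0.05, which is consistent with the association being described as statistically significant.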
I just want to dig a little bit into the methods section to tell you how they set up these predictors measuring things like social cohesion and neighborhood disadvantage. For neighborhood disadvantage, they say the development of the neighborhood disadvantage score for the Jackson Heart Study has been described elsewhere in detail. But briefly, they used something called exploratory factor analysis, based on census tracts in the Jackson metropolitan area, to develop a composite score of socio-demographic indicators from the 2000 US Census. We haven't done factor analysis in this course, but basically it's a way of taking information from multiple inputs. They had multiple factors from the census that could get at the underlying construct of neighborhood disadvantage, and factor analysis attempts to create a scale based on these multiple factors: one or two scales that reduce a lot of information in multiple factors to one or two numbers that explain most of the information in those factors. They developed the scale by summing the standardized z-scores for each indicator. So for each indicator, they compared each value to the average across all values and turned it into a z-score by taking the difference between that value and the average and standardizing by the variability. They added these up across indicators, and the resulting scale was the sum of these z-scores for the socio-demographic indicators. How did they get at neighborhood social environment? Well, this explains the methods here, and I won't go through it because it gets into some things we haven't covered. But suffice it to say that what they did was create composite scores that were already adjusted for other factors like the age and gender of the person.
But nevertheless, they created composite scores, with higher scores measuring more cohesion of the neighborhood and lower scores measuring less cohesion, and, for violence and disorder, higher scores representing more violence and disorder and lower scores representing less. So finally, and here's where we get into Cox regression, they first examined the distribution of socioeconomic and demographic characteristics and CVD risk factors according to CVD status and tertiles of neighborhood disadvantage. So they just looked at a cross-tabulation of who developed cardiovascular disease versus who didn't, by three categories of neighborhood disadvantage. They used Poisson regression to calculate gender-specific, age-adjusted incidence rates based on tertiles of each neighborhood characteristic, and tested for trend by including the neighborhood factors in models as ordinal variables to see whether the association was continually increasing or decreasing. Here's where they get to the Cox regression. To examine the association between neighborhood characteristics and cardiovascular disease incidence, they fit Cox proportional hazards regression models to estimate adjusted hazard ratios and 95% confidence intervals. And, listen to this, on the basis of the presence of approximately linear relationships observed in descriptive analysis, they included neighborhood characteristics as continuous standardized scores. So they took something that was measured on a continuum, evaluated whether the linearity assumption of Cox regression was appropriate, and determined that it was. They also did something to make their hazard ratios comparable in terms of direction: they reverse-scored the standardized scores for social cohesion so the interpretation of the hazard ratio would be consistent with the other neighborhood variables.
Remember, an increase in the social cohesion score was good, but an increase in the other scores, violence and disorder et cetera, was bad, so they wanted to make it such that an increase in any of these represents something worse, so that they could be compared more easily without having to think about that reversed association. They fit three sequential Cox models separately for each neighborhood variable. Model 1 adjusted for age, model 2 further adjusted for socioeconomic status, and model 3 further adjusted for behavioral and biological medical factors. They also examined associations separately for the outcomes of incident coronary heart disease and stroke events using the same three sets of predictors. So here they present the results separately for men and women, which allows us to see across the board whether these associations between cardiovascular disease and neighborhood factors are different for men and women. For model 1, each of these estimates comes from a model that included the particular neighborhood scale measure and age, so they were only age adjusted. We can see that, across the board, increases in disadvantage, social cohesion (remember that social cohesion is reverse coded here, so an increase means lower social cohesion), violence, and disorder are all associated with an increase in cardiovascular disease, although the social cohesion one is not statistically significant after adjusting for age. After they additionally adjust for socioeconomic status, the estimates change slightly but the general associations still remain, and this is again in women. Then in model 3 they adjust for further things as well, with pretty much the same results as with model 2. So it doesn't look like these associations were confounded much by age or other factors that may be related to the neighborhood characteristics and cardiovascular disease, at least in women.
For men, if you go through this, you'll see that similarly there wasn't much confounding. They adjusted for age, then for age and socioeconomic status, then for age, socioeconomic status, and other characteristics, and the results remain pretty consistent for each of these predictors across the levels of adjustment for males. But for males, these things were not consistently associated with an increased risk, nor were the associations statistically significant. So this is why they report in that abstract we looked at that worse scores on these factors tend to be associated with increased cardiovascular disease for women but not for men. And they suggest targeting women in communities where there's greater neighborhood disadvantage, lower social cohesion, et cetera.
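To make the scoring steps described earlier more concrete, here is a minimal sketch of building a composite by summing standardized z-scores across indicators, and of reverse scoring a scale so that higher always means worse. It uses only the Python standard library. The indicator names and values are hypothetical, the use of the sample standard deviation is an assumption, and the actual study also involved exploratory factor analysis, which this sketch does not attempt.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean and divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]


def composite_score(indicators):
    """Sum the z-scores across indicators, giving one composite score per unit.

    indicators : dict mapping an indicator name to a list of values,
                 one value per unit (same unit order in every list)
    """
    standardized = [z_scores(values) for values in indicators.values()]
    return [sum(unit_values) for unit_values in zip(*standardized)]


def reverse_score(scores):
    """Flip the sign of standardized scores so that higher means worse."""
    return [-s for s in scores]


# Hypothetical indicators for four units, for illustration only.
hypothetical_indicators = {
    "pct_below_poverty": [10.0, 20.0, 30.0, 40.0],
    "pct_unemployed": [4.0, 6.0, 8.0, 10.0],
}
disadvantage = composite_score(hypothetical_indicators)

# Hypothetical standardized cohesion scores, reversed so that a higher
# value corresponds to lower cohesion.
low_cohesion = reverse_score([1.2, 0.3, -0.5, -1.0])
```

After reverse scoring, a hazard ratio above one for every neighborhood variable reads the same way: a one-unit increase in the score means a worse neighborhood condition and a higher hazard.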