0:05

So now we're going to move on to how we interpret genome-wide association studies.

We just talked about study design,

potential for bias and confounding, and the

role of a p-value in determining

whether we have a statistically significant association.

But p-values don't tell you about the magnitude of the effect.

And they don't tell you how to interpret the result of a GWAS.

So, let's talk about that.

0:31

To begin this discussion we need to talk about

risk and how we calculate risk because ultimately what we're

trying to do is determine, if we have a genetic

variant, what is my risk of developing a disease?

So risk is really a measure of the incidence

of disease, and this can be calculated from cohort studies.

Recall a cohort study is one where we start with

a group of people who don't have disease, we follow

them over time, and at the end of a period

of time, we count how many people actually develop the disease.

If we start with 1005 disease-free individuals, and

at the end of our time period, 105 of

them get the disease, we say that the

incidence of disease in this time period is roughly 10%.
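
As a rough illustration of the arithmetic just described, here is a minimal Python sketch. The function name `cumulative_incidence` is made up for this example; the counts are the ones quoted above.

```python
def cumulative_incidence(new_cases: int, at_risk: int) -> float:
    """Fraction of an initially disease-free cohort that develops disease."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return new_cases / at_risk

# Counts quoted in the lecture: 1005 disease-free people at the start,
# 105 of whom develop the disease during follow-up.
risk = cumulative_incidence(new_cases=105, at_risk=1005)
print(f"Incidence over the follow-up period: {risk:.1%}")
```

The key point is that the denominator is everyone who was disease-free at the start, which is only known in a cohort design.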

1:27

So let's take this back to our genome-wide association

study and again look at a single genetic variant.

I mentioned that you can look at the

entire population and get an estimate of risk.

In this case we identified a 10% risk of disease in the population in general.

But we can also break this out by genotype.

And we can ask, what is the risk of

disease in people who have a different distribution of alleles?

So two copies, one copy, or zero copies of the variant allele.

And we can do this by simply dividing the

number of people with disease by the total in that subgroup.

That gives us what we call an absolute risk of disease:

15% in the TT group, 11% in the TC group, and 9% in the CC group.

So again, these are absolute risks of

disease in people who have these different genotypes.

2:22

But typically in an association study, what we

calculate is not an absolute risk but a relative risk.

We want to know, what is my risk if I have

a certain genotype relative to people without that genotype for example.

So we calculate something called a relative risk.

And what this is, is simply the ratio of two risks.

This effectively measures the strength of an association.

It gives you some quantification of how strong this association is.

So to calculate a relative risk, it's very simple.

You take the ratio of two absolute risks.

In this case risk in the TT genotype is 15%.

In the CC genotype 9%, so that ratio gives you a relative risk of 1.7.

What this means is that, if you have the TT genotype, you

have a 1.7 fold increased risk of disease relative to people with the CC genotype.

You can do that for the other genotype group as well.

You can compare TC to CC, and in this case, you get a relative risk of 1.2.

In other words, a 1.2 fold increased risk of disease in this group

compared to the CC genotype, which we've determined to be our reference genotype.
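
The relative risk calculation above can be sketched in a few lines of Python. The absolute risks are the ones quoted in the lecture (TT 15%, CC 9%, and 11% for TC); the function and variable names are only illustrative.

```python
def relative_risk(risk_group: float, risk_reference: float) -> float:
    """Ratio of the absolute risk in one group to that in the reference group."""
    return risk_group / risk_reference

# Absolute risks by genotype, as quoted in the lecture.
absolute_risk = {"TT": 0.15, "TC": 0.11, "CC": 0.09}
reference = "CC"  # the reference genotype chosen in the lecture

rr = {
    genotype: relative_risk(risk, absolute_risk[reference])
    for genotype, risk in absolute_risk.items()
}

for genotype, value in rr.items():
    print(f"{genotype} vs {reference}: relative risk = {value:.1f}")
```

By construction the reference genotype has a relative risk of exactly 1, so every other group is read as a fold change against it.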

3:43

So, that's what we can do for a cohort study.

But now let's move to case control studies.

Recall in a case control study, you're identifying a group of cases up front.

And then you're going and identifying a group of controls.

And you fix the number of cases, you decide you

want to collect, say 500 of them, and maybe 500 controls.

So the question is, can we calculate a risk

or incidence of disease from a case control study?

4:12

Well, the answer's no, because we've actually fixed the

number of cases and fixed the number of controls.

So you can't just say that, Oh, I have 500 cases

out of a 1000, so I have a 50% incidence of disease.

That's not how it works.
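
One way to see why: the share of cases in a case control sample is set by the investigator, not by the disease. The sketch below keeps the 500 cases from the example above; the control counts other than 500 are hypothetical values chosen only for illustration.

```python
def apparent_case_fraction(n_cases: int, n_controls: int) -> float:
    """Share of sampled subjects who are cases. This is NOT an incidence."""
    return n_cases / (n_cases + n_controls)

n_cases = 500  # fixed by design, as in the lecture example

# Hypothetical alternative designs: same cases, more controls recruited.
fractions = {}
for n_controls in (500, 1000, 2000):
    fractions[n_controls] = apparent_case_fraction(n_cases, n_controls)
    print(
        f"{n_cases} cases, {n_controls} controls -> "
        f"{fractions[n_controls]:.0%} of the sample are cases"
    )
```

The fraction moves from 50% to 20% purely because more controls were recruited, which is why it cannot be read as a risk.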

4:27

But what we can calculate from a case control study is an odds.

So an odds of disease is the probability of

having disease divided by the probability of not having disease.

So if you look at your whole population, the

odds of disease is the ratio of cases to controls.

And in this case it's 500 to 500, you could think about it as 50/50.

The odds of disease is 1.0.

So an odds of 1.0 is a 50/50 chance that you would have the disease.

So that's the odds in the whole population.
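
A minimal sketch of how odds relate to counts and to probabilities, using the 500 cases and 500 controls from the example. The helper names are invented for this illustration.

```python
def odds_from_counts(n_cases: int, n_controls: int) -> float:
    """Odds of disease in a sample: cases divided by controls."""
    return n_cases / n_controls

def odds_from_probability(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1 - p)

def probability_from_odds(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)

whole_sample_odds = odds_from_counts(n_cases=500, n_controls=500)
print(f"Odds of disease in the whole sample: {whole_sample_odds:.1f}")
print(f"Equivalent probability: {probability_from_odds(whole_sample_odds):.0%}")
```

Odds and probability carry the same information, but they are on different scales, so an odds of 1.0 corresponds to a probability of one half rather than to certainty.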

But, we can also break it down by genotype

and ask, what is the odds of disease in people

with different genotypes, people with two copies, one copy,

or no copies of the allele?

And when we do this, we can get an odds of disease for each genotype group.

In this case, the odds of disease if you have a TT genotype is 1.5.

1.3 if you have one copy.

And then 0.7 if you have no copies.

So odds above one indicate an increased likelihood of disease.

Odds below one, decreased likelihood.

So these are the odds of disease.

Very similar to the absolute risk of

disease that we calculated for a cohort study.

And similar to a cohort study, where we calculate a relative risk, now

we're going to calculate a relative odds in what we call an odds ratio.

So an odds ratio is the ratio of two odds.

So I just transposed those odds over here from the previous slide.

And, we're going to look at the ratio of two odds.

We're going to compare people with the TT genotype to people

with the CC genotype to get an odds ratio of 2.1.

And similarly, we'll do that for the TC genotype compared to the CC genotype.

And here we get an odds ratio of 1.9.

And you interpret these in a similar fashion to how

you interpreted relative risks, but instead of risk, we say odds.

2.1 fold increased odds of disease in people

with a TT genotype compared to the reference.

And in this case, a 1.9 fold increased odds of

disease in people with a TC genotype compared to the reference.
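
The odds ratios above follow directly from the genotype-specific odds quoted in the lecture (1.5, 1.3 and 0.7). A short Python sketch, with illustrative names:

```python
def odds_ratio(odds_group: float, odds_reference: float) -> float:
    """Ratio of the odds of disease in one group to the odds in the reference group."""
    return odds_group / odds_reference

# Odds of disease by genotype, as quoted in the lecture.
odds = {"TT": 1.5, "TC": 1.3, "CC": 0.7}
reference = "CC"

ors = {g: odds_ratio(o, odds[reference]) for g, o in odds.items()}

for genotype, value in ors.items():
    print(f"{genotype} vs {reference}: odds ratio = {value:.1f}")
```

The structure mirrors the relative risk calculation for a cohort study; only the input quantity changes from a risk to an odds.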

6:44

So, one thing that people who don't do a lot of epidemiological

research tend to do is, they use odds ratios and relative risks interchangeably.

They think they're the same thing.

But they're not.

As you saw, they're calculated slightly differently.

And actually, they're interpreted slightly differently.

7:05

And in some cases, when you have, for example, a rare disease,

an uncommon disease, they can actually be pretty good approximations of one another.

But when you start talking about more common diseases, more

prevalent diseases, odds ratios tend to overestimate a relative risk.

So if you look at this graph here, what I'm showing is that

as the magnitude of an odds ratio goes up and also as the

prevalence of a disease goes up, indicated by these different lines here, the percent

by which the odds ratio overestimates the relative risk also goes up.

So when you have a very common disease, something that affects, say, 50% of

the population, an odds ratio tends to greatly overestimate the true relative risk.

And it's really the relative risk that we're trying to get at.

Odds ratios are in some cases a poor

approximation of it, in some cases a good approximation.

But what we really want to get at is the risk.

But sometimes we're left only with an odds and left to interpret that.
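
The pattern in that graph can be reproduced with a little algebra. If the risk in the reference group is p0, then the odds ratio and relative risk are linked by RR = OR / (1 - p0 + p0 * OR). The sketch below assumes p0 is known and uses it as a stand-in for how common the disease is; the specific p0 and odds ratio values are hypothetical and are not read off the lecture's figure.

```python
def rr_from_or(odds_ratio: float, baseline_risk: float) -> float:
    """Relative risk implied by an odds ratio, given the reference-group risk p0."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

def percent_overestimate(odds_ratio: float, baseline_risk: float) -> float:
    """Percent by which the odds ratio exceeds the implied relative risk."""
    rr = rr_from_or(odds_ratio, baseline_risk)
    return 100 * (odds_ratio - rr) / rr

# Hypothetical grid: rare, moderately common, and very common disease.
for p0 in (0.01, 0.10, 0.50):
    for or_value in (1.5, 2.0, 3.0):
        rr = rr_from_or(or_value, p0)
        over = percent_overestimate(or_value, p0)
        print(f"p0={p0:.2f}  OR={or_value:.1f}  RR={rr:.2f}  overestimate={over:.0f}%")
```

Working through the algebra, the overestimate simplifies to 100 * p0 * (OR - 1) percent, so it grows with both the baseline risk and the size of the odds ratio, and it shrinks toward zero for a rare disease.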

Okay, so we know we're going to measure the magnitude of an

effect with a relative risk or an odds ratio, depending on our study design.

But another thing that we need to think

about in these studies is the external validity.

So internal validity was more thinking about confounding and bias and p-values.

How valid is this study internally?

Is it sound?

But now we're going to ask a different question.

How generalizable are these results to people outside of my study population?

So in that case we need to ask, well how well did our

study population represent the population to which

these findings are going to be applied?

8:55

In terms of genetic association studies, and GWAS in particular, most of the

GWAS that have been done to date have been done in populations of European ancestry.

And so the question really is, are

these results generalizable to other ethnic groups?

So there was a study that came out last year, 2013, which asked this very question.

They asked, among a large number of SNPs that were found to be associated in

European populations, were they also associated with

that same disease in populations of different ancestries?

And they measured the proportion of SNPs from

those European populations that were also found to be associated in these groups.

And what they found was, in general 70% or more of these SNPs

indeed showed an association in these other ethnic groups as well.

So that's good news.

So what this study tells you is that while a specific genetic association might

be generalizable to these populations in terms of an overall effect, the

exact risk estimate that you apply to them in terms of the relative risk,

for example, or the absolute risk, may in fact be different.

The risk estimates tend to be stronger in these European populations

than they are in these other ethnic groups.

Okay, so your question here is, a relative

risk can be measured directly from which study designs?

Is it case control, cohort, or both?

The answer is b.

Relative risks can be calculated directly

from cohort studies, not case control studies.

In case control studies, we calculate odds ratios which are not exactly the same.

As we showed, odds ratios tend to overestimate relative risks,

and these two measures should not be used interchangeably

unless you have a rare disease.
