0:24

As background,

let's first revisit inverse probability of treatment weighted estimation.

And so here we'll think about just estimating the expected value of

the potential outcome under treatment, meaning,

the mean outcome if everybody was treated.

So we're just going to focus on this one potential outcome just for illustration.

But if you want to estimate the other potential outcome,

the equation would look very similar except the weights would be different.

So as a reminder, if we wanted to estimate this mean of the potential

outcomes if everybody had been treated, we could do it as follows.

Where, first, I want to note that the denominator involves the propensity score,

so we're looking at treated subjects.

And so our denominator is going to involve the propensity score because we weight by

one over the probability of that group's treatment.

And that group happens to be the treated group.

And I wrote it as a function of X.

So pi(X), just to sort of reiterate or

remind you of the idea that the propensity score does depend on the X's.

1:34

A, here, is just an indicator variable.

It's just treatment, and we're thinking of it as binary yes or no.

So A equals 1 if treated, 0 otherwise.

And so putting that right next to the Y is just guaranteeing

that the sum is only going to include values from treated people.

1:54

So control people are going to have A equals 0, and so

their values aren't going to get counted.

And remember, we're trying to estimate the mean of Y, if everybody had been treated.

So we sum over all n subjects, but we pick off only the treated.

But then we weight by inverse of the propensity score.

So essentially, you could think of this as a sample mean of Y in the pseudo-population,

that is, the weighted population in which there's no confounding.

And so that's the standard sort of IPTW estimator for

the mean of the potential outcome for treatment.

And so if the propensity score is correctly specified,

then this estimator is unbiased.

2:37

So correctly specifying means we got this model right so

that the true probability of treatment, given X, is actually equal to pi(X).

Okay, so that's what we mean by correctly specified.

If we get the model right, then this is an unbiased estimator.
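As a rough sketch of that estimator in code (the function name is mine, and I'm assuming the propensity scores `pi_hat` have already been estimated, for example by logistic regression):

```python
import numpy as np

def iptw_mean_treated(Y, A, pi_hat):
    """IPTW estimate of E[Y^1], the mean outcome if everybody were treated.

    Y: observed outcomes; A: binary treatment indicator (1 = treated);
    pi_hat: estimated propensity scores pi(X) = P(A = 1 | X).
    """
    Y, A, pi_hat = map(np.asarray, (Y, A, pi_hat))
    # A * Y picks off the treated subjects' outcomes; dividing by pi(X)
    # reweights them to stand in for the whole pseudo-population.
    return np.mean(A * Y / pi_hat)
```

For example, with two treated subjects whose propensity scores are all 0.5, each treated outcome gets weight 2 and the controls contribute 0 to the sum before the division by n.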

Now let's imagine a different approach

to estimating this mean of the potential outcome.

So again, we're going to just focus on

estimating the mean of this one potential outcome, but the same kind of idea would

apply if you were going to try to estimate the other one.

So you could use some kind of outcome regression model.

So we haven't yet actually done that in this course, but

this is something you could do.

And so what we'll do is we'll specify a model that we'll call m1(X).

The 1 here is just indicating that it's among treated subjects.

So it's an outcome model restricting to the subset of

patients who were actually treated.

So you'll notice that this model, it looks like a standard kind of regression model.

It's the expected value of Y, conditional on A = 1, so

among treated subjects, and also on your confounders, X.

So this is just some model, right, it's some model for the mean given A and X.

But if we actually wanted the mean of the potential outcomes,

what you would have to do is you would have to take this conditional mean m1(X),

and basically, average over the distribution of the confounders.

You'd have to kind of integrate out the X's.

Or in this case, what we're doing is averaging over an empirical distribution of the X's.

So here's an estimator of the expected value of Y1, and I'll give you some intuition for it.

So the first part of this says: if you are in

the treated group, then okay, use your value of Y.

4:24

So we're taking a sum over n subjects, we're going to take an average, so

we're dividing by n.

Well, if you're in the treated group we're just going to use your value of Y.

Because remember, by our consistency assumption, if you're treated,

Y is actually equal to your Y1, to the potential outcome.

So if you're A = 1, we're just going to use your value of Y.

However, if you were in the control group,

this 1 - A here is going to pick off those control group members, right?

Because if they're in the control group, A is equal to 0,

which means 1 - A will be equal to 1.

So it's identifying those individuals.

5:17

So these are people who were not actually treated, right?

Because I've picked them off here.

So these are people who weren't actually treated.

But now I'm going to apply this regression model from those who were treated to this

other population.

What I'm trying to do, essentially, is predict

what their value of Y would have been had they been in the treatment group.

So that's what this is doing.

5:41

So if you combine these two, what we're doing is we're either,

if you're in the treatment group, we use your value of Y.

If you're in the control group,

we use the value of Y that we think you would have had.

In other words, our best guess at it, the mean from this model,

if you had actually been treated contrary to fact.

Add those up, divide by n, and that's a valid estimate of the mean

potential outcome, as long as you have unconfoundedness

given X, that is, you've controlled for confounding.

6:12

So if the outcome model is correctly specified, this is an unbiased estimator.

The outcome model being correctly specified, again, means that the expected

value of Y given A = 1 and X is actually equal to whatever our model is m1(X).

So we have some model there, maybe it's a regression model or something.

We have to get that model right, but if we do, then this is a valid estimator.
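A minimal sketch of this plug-in approach (the names are illustrative; `m1_hat` holds the fitted model's predictions m1(X) for every subject, treated and control alike):

```python
import numpy as np

def regression_mean_treated(Y, A, m1_hat):
    """Outcome-regression estimate of E[Y^1]: use the observed Y for
    treated subjects and the model's prediction m1(X) for controls,
    then average over all n subjects."""
    Y, A, m1_hat = map(np.asarray, (Y, A, m1_hat))
    # A * Y keeps treated outcomes; (1 - A) * m1_hat fills in the
    # predicted treated outcome for everyone who was not treated.
    return np.mean(A * Y + (1 - A) * m1_hat)
```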

6:38

So we've seen two different ways that you could estimate this

mean potential outcome.

So one using a kind of regression model,

where you then average over the distribution of the X's,

and the other using inverse probability of treatment weighting.

Doubly robust estimators are going to, essentially, try to use both of those.

7:09

We'll spend some time on this, since these kinds of estimators are becoming more popular.

So the goal is really to introduce the main ideas and

get at understanding the concepts.

That will make it easier then, if you want to implement these in practice,

to take the next step and learn more about it.

So a doubly robust estimator is an estimator that would be unbiased if either

this propensity score model is correct or the outcome regression model is correct.

But you don't actually have to get both of them right.

7:47

And what we'll see here is that there's one part that

looks like the standard IPTW kind of estimator.

And then there's a subpart that I'm going to call an augmentation.

But this involves a regression type model here.

And then there's some other stuff that we need to just kind of make it work,

and I'll show you how it works in a minute.

But you see there's this regression kind of part on the right-hand side,

this IPTW part on the left-hand side.

And if you put this all together,

you'll end up with something that has this doubly robustness kind of property.

And we're going to explore that in a minute and see how it works.
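Since the slide itself isn't in the transcript, here's a sketch assuming the estimator is the standard AIPTW form for E[Y^1] (function and variable names are mine):

```python
import numpy as np

def aipw_mean_treated(Y, A, pi_hat, m1_hat):
    """Doubly robust (AIPTW) estimate of E[Y^1]: the standard IPTW
    term minus an augmentation built from the outcome model m1(X)."""
    Y, A, pi_hat, m1_hat = map(np.asarray, (Y, A, pi_hat, m1_hat))
    iptw_part = A * Y / pi_hat                     # IPTW part (left-hand side)
    augmentation = (A - pi_hat) / pi_hat * m1_hat  # regression part (right-hand side)
    return np.mean(iptw_part - augmentation)
```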

So let's imagine first that the propensity score is correctly specified,

but the outcome model is not.

So our outcome model is wrong, so this m1 is wrong.

And by being wrong, what we mean is that the expected value of Y

given A equal 1 and X, does not equal m1(X).

So m1(X) is some model.

We got it wrong, and so the true conditional expectation doesn't line up with m1(X).

But the propensity score is correctly specified, we'll assume for now.

Which means that the expected value of A given X, or in other words,

the probability that A equals 1 given X is equal to pi(X).

9:09

So we've got the propensity score right.

And what I'm going to do now is just kind of walk through the intuition.

This is not a formal proof, but

just trying to sort of give you the intuition as why this does actually work.

So let's think about what this is estimating; we have a sum over n.

9:34

What this is estimating is really the expected value of the stuff that's inside

these curly brackets.

So out here we have an average, a sum, then we divide by n.

So that's a sample average.

So as the sample size grows, that becomes an expectation.

So what we really are interested in is,

is the expectation on the inside there equal to the thing we want?

And remember, the thing we want is the expected value of Y1.

10:17

The expectation of A, given X, is equal to the propensity score.

Which means that you could think of this whole part here as

having an expectation of 0.

So essentially, you would expect the part that I'm putting in brackets here,

that part should go away in expectation.

So if you imagine you are averaging this over a large sample size,

10:40

the expected value of A should be equal to pi.

And so you would expect that difference, if you averaged it, to get very small and,

in fact, become 0.

So you expect this part on the right to go away if the propensity

score model is correctly specified.

Let me say one more thing about that.

And if that goes away what are you left with?

Well you're left with this part.

11:06

And if the propensity score model was right we already said that that part is

a valid estimator, that's just our standard IPTW.

So if we get the regression model wrong,

we're still going to be fine as far as this estimator goes,

because this part will go away, that'll get small, that'll be 0.

And this part is a valid estimator of the expected value of Y1.
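Written out (assuming the estimator is the standard AIPTW form; this is the same heuristic argument, not a formal proof):

```latex
\hat{E}[Y^1] = \frac{1}{n}\sum_{i=1}^{n}
  \left\{ \frac{A_i Y_i}{\pi(X_i)}
  - \frac{A_i-\pi(X_i)}{\pi(X_i)}\,m_1(X_i) \right\}
% If the propensity score is correct, E[A \mid X] = \pi(X), so by
% iterated expectations the augmentation has mean zero:
E\!\left[\frac{A-\pi(X)}{\pi(X)}\,m_1(X)\right]
  = E\!\left[\frac{E[A \mid X]-\pi(X)}{\pi(X)}\,m_1(X)\right] = 0
% What survives is the IPTW term, which is unbiased for E[Y^1]
% when \pi is correct, even if m_1 is wrong.
```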

11:29

So now let's kind of flip things around and

say what if the propensity score model was wrong, but the outcome model was correct?

So remember, we have models,

which means we're sort of doing the best we can to get them right.

But we don't really know if either of them is right or if both are right.

11:49

So it would be nice if we had robustness, where we could get one of them wrong,

potentially.

So here, we're going to imagine the propensity score is wrong and

the outcome model is correct.

So if the propensity score model was wrong, what that means is,

the expected value of A given X, is not going to equal pi(X).

Okay, but if the outcome model is correct,

what that means is if you take the expectation of Y,

conditional on A = 1 and X, that should equal m1(X).

That's just as background.

What I'm doing first then,

is you'll see this equal sign here is going from this step to this step.

I just rearranged some terms with some algebra, and

that will just make things easier to see.

So from the top line to the bottom line here,

I rearranged some terms, but otherwise, it's equivalent.

So now I want to look at this lower equation.

So I rearranged things for a reason,

because it makes it a little easier to see.

So the first thing to note is that now, if the outcome model is correctly specified,

then the part that I bracketed here should go to 0, right?

Because the expected value of Y, conditional on A = 1 and X, should be m1(X).

So that difference should go to 0.

13:09

Now, it's being multiplied by something, and there's something in the denominator.

But those things are just going to converge to some constant essentially.

That's not going to blow up or something, and make the product not go to 0.

So this is not a formal proof, it's just based on intuition.

So if we got the outcome model right, the part in brackets there should go to 0.

13:52

Averaged over all n subjects, this is just the expected value of Y given A equal 1 and

X, averaged over the distribution of X.

Marginalizing out X, that is just the expected value of Y1.

So if we get the regression model right,

then this estimator should be fine as well.
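The rearrangement described here, written out (again assuming the standard AIPTW form, as a sketch rather than a proof):

```latex
% Rearranging the same estimator with simple algebra:
\frac{A\,Y}{\pi(X)} - \frac{A-\pi(X)}{\pi(X)}\,m_1(X)
  \;=\; m_1(X) + \frac{A}{\pi(X)}\bigl(Y - m_1(X)\bigr)
% If the outcome model is correct, E[Y - m_1(X) \mid A=1, X] = 0, so the
% second term has mean zero even when \pi(X) is wrong; what's left is
\frac{1}{n}\sum_{i=1}^{n} m_1(X_i) \;\longrightarrow\;
  E\bigl[m_1(X)\bigr] = E\bigl[E[Y \mid A=1, X]\bigr] = E[Y^1]
% where the last equality uses unconfoundedness given X.
```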

14:34

So you might see these kinds of terms augmented IPTW or AIPTW,

and the estimator I just showed you is an example of one of them.

And there are a lot of these kinds of estimators.

So a lot of this comes from semiparametric theory, and you can use that to

identify the best kinds of these estimators, meaning most efficient.

So there's theory that says what

the sort of most efficient versions of these would be.

So that's beyond the sort of scope of this video, but just to make you aware.

And in general, besides having this doubly robust property, which is obviously

a nice property because you get to specify two models and only have to get one right,

they also tend to be more efficient than regular IPTW estimators.

15:25

So they give you an extra bonus, that they tend to be more efficient,

meaning they have a smaller variance associated with them.

So these are a little more complicated to implement in practice, but

they tend to perform better.
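To see the double robustness concretely, here's a toy simulation, entirely made up for illustration: one confounder, a logistic true propensity score, a linear treated-outcome mean, and deliberately misspecified (constant) models for each scenario.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One confounder X; the true propensity score is logistic in X.
X = rng.standard_normal(n)
pi_true = 1.0 / (1.0 + np.exp(-X))
A = rng.binomial(1, pi_true)

# Potential outcome under treatment: Y^1 = 2 + X + noise, so E[Y^1] = 2.
# Controls' Y^1 is never observed, so their Y is set to 0 here.
Y = np.where(A == 1, 2.0 + X + rng.standard_normal(n), 0.0)

def aipw(Y, A, pi_hat, m1_hat):
    # IPTW term minus the augmentation term
    return np.mean(A * Y / pi_hat - (A - pi_hat) / pi_hat * m1_hat)

# Correct propensity score, badly wrong (constant) outcome model:
aipw_wrong_outcome = aipw(Y, A, pi_true, np.full(n, 5.0))

# Wrong (constant) propensity score, correct outcome model m1(X) = 2 + X:
aipw_wrong_ps = aipw(Y, A, np.full(n, 0.5), 2.0 + X)

# Both estimates should land near the true value E[Y^1] = 2, even though
# each scenario gets one of the two models wrong.
```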