This course covers the design, acquisition, and analysis of Functional Magnetic Resonance Imaging (fMRI) data. A book related to the class can be found here: https://leanpub.com/principlesoffmri


A course from Johns Hopkins University

Principles of fMRI 1



From this lesson

Week 3

This week we will discuss the General Linear Model (GLM).

- Martin Lindquist, PhD, MSc, Professor, Biostatistics, Bloomberg School of Public Health | Johns Hopkins University
- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

Hi.

In the past couple of modules, we've been talking about model building.

And we've been talking about how to get a good design matrix for the GLM.

Now in this module we'll talk about GLM estimation.

So again to recap, the standard GLM can be written in the following format.

We have y which is the fMRI data, X which is the design matrix, which we

learned how to build in an appropriate manner in the previous modules.

And we have beta, which are the regression coefficients,

and epsilon, which is the noise.

So now what we're primarily interested in is learning how we can estimate beta

from this model.

So let's, for simplicity, first assume that the individual epsilons are

i.i.d. and normally distributed with mean zero.

So, the variance-covariance matrix is the identity times sigma squared.

So here we're primarily interested in estimating beta and sigma squared.

So here the matrix X and

the vector Y are assumed to be known, and the noise is assumed to be uncorrelated.

So, at the end of the day now, in the GLM, our goal is to find the values of beta

that minimize (Y minus X beta) transpose times (Y minus X beta).

This is the sum of squared errors, or SSE.
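
Written out as a worked equation, the model and the quantity being minimized look as follows. This is a sketch of the notation only; the symbol x_i for the i-th row of X and the index i = 1, ..., N are introduced here for illustration and are not used in the lecture.

```latex
\begin{align*}
Y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I), \\
\mathrm{SSE}(\beta) &= (Y - X\beta)^\top (Y - X\beta)
  = \sum_{i=1}^{N} \bigl(Y_i - x_i^\top \beta\bigr)^2 .
\end{align*}
```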

So, to put this into context, we can think about this when we have

one column in our design matrix, and this is simple linear regression.

So, we have Y on the Y axis and X on the X axis.

And so we want to find the line that best fits this data.

So, in the least squares sense, that line is the one that makes

the distances from the points to the line in the Y direction,

so e1, e2, and e3, as small as possible, in the sense that the sum of the squares of those

residuals is minimized.

In two dimensions, when we have two explanatory variables x1 and x2,

we look for a plane and we try to find the plane that best fits this cloud of data.

When we have a design matrix with p columns,

we want to find the p-dimensional hyperplane that best fits this data.

So, the least squares criterion,

we're going to write this as q equal to this least squares.

And what we want to do is we want to find the values of beta that make this as small

as possible.

So one way to do that is to take the derivative with respect to beta and

set the result equal to 0.

This gives us the so-called normal equations,

which is X transpose X beta hat = X transpose Y.
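
The step from the least squares criterion to the normal equations can be sketched as below, writing Q for the criterion as in the lecture. It assumes only standard matrix calculus, and that X transpose X is invertible when solving for beta hat afterwards.

```latex
\begin{align*}
Q(\beta) &= (Y - X\beta)^\top (Y - X\beta)
          = Y^\top Y - 2\beta^\top X^\top Y + \beta^\top X^\top X \beta, \\
\frac{\partial Q}{\partial \beta} &= -2 X^\top Y + 2 X^\top X \beta, \\
0 &= -2 X^\top Y + 2 X^\top X \hat\beta
  \quad\Longrightarrow\quad X^\top X \hat\beta = X^\top Y .
\end{align*}
```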

So, by solving for

beta hat, we get the ordinary least squares solution, which is the following.

Beta hat is equal to (X transpose X) inverse X transpose Y.

So that's the key equation when talking about the GLM.

That's our kind of standard estimate for beta hat.

So since X and Y are both known a priori, we can find this

estimate quite easily using our design matrix and our data.
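
As a minimal sketch of that computation, the snippet below solves the normal equations with NumPy on simulated data. The function name `ols_fit`, the simulated design matrix, and the "true" coefficients are all made up for illustration; they are not from the lecture.

```python
import numpy as np

def ols_fit(X, y):
    """Solve the normal equations X'X beta = X'y for beta_hat."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(XtX, Xty)

# Simulated example: an intercept column plus one random regressor.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([2.0, 0.5])                  # hypothetical true coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)  # i.i.d. Gaussian noise

beta_hat = ols_fit(X, y)
# Cross-check against NumPy's built-in least squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

In practice a solver such as `np.linalg.lstsq` is usually preferable to solving the normal equations directly, because it behaves better when the columns of X are nearly collinear.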

So, the ordinary least squares solution has a number of interesting properties.

First, the expected value of beta hat is equal to beta.

So, it's an unbiased estimator.

So on average, beta hat gives us a good estimate of beta.

Also, the variance of beta hat is equal to

sigma squared times (X transpose X) inverse.

So, what does that mean?

Well, it turns out that according to something called the Gauss–Markov Theorem,

if you think of all the other possible linear estimators of beta

that are unbiased,

none of them is going to have a smaller variance than

the OLS solution, the ordinary least squares solution here.

So, any other linear unbiased estimate that we have will have a variance at least as large

as the variance of beta hat.

And so, we call beta hat the best linear unbiased estimator or BLUE.
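
The two properties mentioned above, unbiasedness and the variance formula, can be checked with a small simulation. This is only an illustrative sketch: the design matrix, the coefficients, sigma, and the number of simulated data sets are arbitrary choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 1.0
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
beta_true = np.array([1.0, -2.0])        # hypothetical true coefficients

# Theoretical covariance of beta_hat under i.i.d. noise: sigma^2 (X'X)^-1.
cov_theory = sigma**2 * np.linalg.inv(X.T @ X)

# Refit the same model on many independent noise draws.
n_sims = 20000
noise = sigma * rng.standard_normal((n_sims, n))
Y = X @ beta_true + noise                # each row is one simulated data set
beta_hats = np.linalg.lstsq(X, Y.T, rcond=None)[0].T   # shape (n_sims, 2)

mean_beta_hat = beta_hats.mean(axis=0)           # should be close to beta_true
cov_empirical = np.cov(beta_hats, rowvar=False)  # should be close to cov_theory
```

The simulation illustrates the two formulas but does not demonstrate the Gauss–Markov theorem itself, which compares beta hat against every other linear unbiased estimator.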

So again, to sort of recap here, if epsilon is i.i.d.,

then the ordinary least squares estimate is optimal in this best linear,

unbiased sense.

So, in this case we have the model Y is equal to X beta plus epsilon and

we get the following least squares estimate.

So if epsilon is not i.i.d., and the variance

of epsilon is equal to V sigma squared, where V is not equal to the identity, then

we have to instead use something called the generalized least squares estimate.

And that's going to be the best linear unbiased estimator.

So, in that setting,

we also have to include estimates of V when computing beta hat.

So in that case, beta hat becomes (X transpose

V inverse X) inverse X transpose V inverse Y.

So, basically what happens here is we have to include the

covariance matrix in our estimate of beta.

Now, you'll see that ordinary least squares

estimation is a special case of this, because if you put in V equal to the identity,

the estimate becomes the ordinary least squares solution.

So basically, when we have autocorrelated epsilon,

the OLS is not the optimal solution anymore and

we need to use this generalized least squares solution instead.
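
A minimal sketch of the generalized least squares estimate is shown below. It assumes V is known, which is not the case in practice, and uses an AR(1) correlation structure purely as an example; the function name `gls_fit`, the value of rho, and the simulated data are hypothetical.

```python
import numpy as np

def gls_fit(X, y, V):
    """GLS estimate (X' V^-1 X)^-1 X' V^-1 y, computed with linear solves."""
    Vinv_X = np.linalg.solve(V, X)       # V^-1 X
    Vinv_y = np.linalg.solve(V, y)       # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

rng = np.random.default_rng(2)
n, rho = 100, 0.4
X = np.column_stack([np.ones(n), np.sin(np.linspace(0, 6 * np.pi, n))])
beta_true = np.array([1.0, 3.0])         # hypothetical true coefficients

# Example AR(1) correlation matrix: V[i, j] = rho ** |i - j|.
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# Draw correlated noise with covariance V through its Cholesky factor.
chol = np.linalg.cholesky(V)
y = X @ beta_true + chol @ rng.standard_normal(n)

beta_gls = gls_fit(X, y, V)
beta_ols = gls_fit(X, y, np.eye(n))      # V = I reduces GLS to OLS
beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
```

Passing the identity for V gives the same answer as an ordinary least squares solver, which is the special case described above.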

So here's again a recap.

So we have the model here.

And this is the Generalized Least Squares Estimate.

And so that gives us the fitted values Y hat.

And using these fitted values, we can compute the residuals as follows.

And here, the capital R is what's called the residual inducing matrix.

So, it's just sort of a notation that we often use in statistics to

say that this is a way of taking the data, and moving it into the residual space.

So even if we assume that epsilon is i.i.d.,

we still need to estimate the residual variance, sigma squared.

And so our estimate of sigma squared is going to be based on the residuals.

So, it's the transpose of the residuals times the residuals, r transpose r,

divided by the trace of RV, the residual inducing matrix times V.

And so, in our setting, in the OLS setting, our sigma hat squared is

going to be r transpose r divided by capital N, which is the length of Y,

minus p, which is the number of columns in the design matrix.

So this might be familiar if you've taken a linear regression class.
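
The OLS version of these formulas can be sketched in a few lines. The data below are simulated and the coefficient values and sigma are arbitrary; the residual inducing matrix is taken to be R = I minus X (X transpose X) inverse X transpose, which is the standard form when V is the identity.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 120, 3, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + sigma * rng.standard_normal(n)

# Residual inducing matrix for OLS: R = I - X (X'X)^-1 X'.
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
r = R @ y                                # residuals: the data moved into residual space

# With V = I, trace(RV) = trace(R) = N - p.
dof = np.trace(R)
sigma2_hat = (r @ r) / (n - p)           # estimate of sigma squared
```

Forming the full N by N matrix R is fine for a small example like this, but for long time series it is cheaper to compute the residuals directly as y minus X times beta hat.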

However, if V is not equal to the identity, so

we have autocorrelated noise, things become more difficult.

And that's the focus of the next module.

>> Understanding estimation allows us to understand the GLM in a geometric way.

So let's think of Y, the dependent variable, as a vector in a space with

one dimension per observation, usually per subject.

So now imagine a data set with three subjects.

There are three dimensions in Y.

So Y is a vector in three-dimensional space.

Let's also imagine we have a model with two parameters, two regressors.

That means that the model space spans two dimensions.

And what we can see here is a frame with three dimensions for y, and

the two dimensions that define the subspace of the model.

Now, any two regressors,

all possible linear combinations of those regressors, span a plane.

And so what we see here is that the plane is the subspace that is spanned by the model.

Now, what are we doing when we fit the linear model?

The goal is to minimize the sum of squared errors, as you see here.

So now I've color coded the error vector in red, and

the original data in blue, and then the fit in green.

And what we're seeing here is when we solve for

beta, we calculate beta hat as (X transpose X) inverse X transpose Y.

And what this is doing is projecting

the data Y onto the subspace that's defined by the columns of X.

So, I like to think of the projection as shining a light source.

Imagine a light at the top of the frame you see there.

And the light's shining down and it's shining past Y.

And imagine Y as sort of a solid bar.

And it's shining down onto that plane that's defined by the model subspace.

Now, Y is going to cast a shadow that's as close as you can get to the data

in the model subspace.

And that shadow is the fit.

That's the green line, the dark line in the plane.

And that is the projection of Y, the data, onto the model subspace spanned by X.

And so it turns out this matrix here, which is also sometimes written as

X minus, so the pseudo-inverse of X, is the matrix that defines the projection onto the subspace.
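
The geometric picture can be checked numerically on a tiny example with three observations and two regressors. The numbers below are made up for illustration. One detail worth spelling out: the pseudo-inverse applied to Y gives the coefficients beta hat, and multiplying by X again gives the projection of Y onto the plane.

```python
import numpy as np

# Three observations, two regressors (values chosen for illustration only).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 0.0, 2.0])

X_pinv = np.linalg.pinv(X)   # equals (X'X)^-1 X' when X has full column rank
beta_hat = X_pinv @ y        # coefficients: [0.5, 0.5]
P = X @ X_pinv               # projection matrix onto the column space of X
y_hat = P @ y                # the fit, the "shadow" of y: [0.5, 1.0, 1.5]
e = y - y_hat                # the error vector: [0.5, -1.0, 0.5]

# The error vector is perpendicular to the model plane.
orthogonality = X.T @ e      # numerically zero
```

Projecting a second time changes nothing, so P times P equals P, and the error vector is orthogonal to every column of X. That is what makes the fit the closest point to Y in the model subspace.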

So that's the end of this module on estimation.