0:05

So what if we go back to our question asking if the explanatory variable was

manipulated?

And the answer is yes.

Data that come from studies in which the explanatory variable is manipulated

are called experimental data.

Experimental data come from studies in which groups of observations are either

pre-selected or randomly assigned to the values of an explanatory variable and

are then observed on some response variable.

There are two major types of experimental studies: true experimental studies and

quasi-experimental studies.

There are three components of a true experimental study. First,

only one explanatory variable is manipulated.

This means that all other variables

that could also be related to the response variable are held constant.

The only thing that changes is the value of the explanatory

variable that is being manipulated by the experimenter.

Second, there must be a control group to which

other values of the explanatory variable are compared on the response variable.

And third,

observations must be randomly assigned to values of the explanatory variable.

This means that every observation starts out with an equal probability of being in

each group, but is then randomly chosen to be in one group or another.
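
To make random assignment concrete, here is a minimal sketch in Python. The function name `randomly_assign`, the group labels, and the 20 numbered plants are all assumptions made up for illustration. The idea is simply to shuffle the observations and then deal them out to the groups in turn, so every observation has the same chance of landing in each group.

```python
import random

def randomly_assign(observations, groups=("treatment", "control"), seed=None):
    """Shuffle the observations, then deal them out to the groups in turn."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    shuffled = list(observations)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, observation in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(observation)
    return assignment

# Hypothetical example: 20 plants labeled 1 through 20.
plants = list(range(1, 21))
assignment = randomly_assign(plants, seed=42)
print(assignment)
```

With two groups and an even number of observations, each group ends up with the same number of observations.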

For example, an agricultural researcher might be interested

in determining the effect of a new fertilizer on plant growth.

In this study, each plant is an observation.

Fertilizer application is the explanatory variable, and

plant growth is the response variable.

2:06

After the three month period,

the researcher measures the height of each plant in both groups.

The researcher found that the plants that were fertilized grew an average

of two inches higher than the plants that were not fertilized.

As a result, the researcher then concluded that the fertilizer significantly

increased plant growth, and

recommended that farmers should be encouraged to use the fertilizer.

So you can see in this experimental study, that all other variables,

with the exception of the explanatory variable of interest,

are held constant in each group as a result of the experimental design.

Because all other factors that could affect plant growth were held constant

in this experiment, the researcher could conclude that the fertilizer

caused the plants to grow higher.
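
The comparison the researcher makes here is a difference in average height between the two groups. As a rough sketch, with plant heights in inches that are invented purely for illustration (they are not data from the study), that comparison could look like this in Python. The function name `mean_difference` is also just an assumption.

```python
from statistics import mean

def mean_difference(treatment_values, control_values):
    """Average response in the treatment group minus the control group."""
    return mean(treatment_values) - mean(control_values)

# Invented heights in inches, chosen only to illustrate the calculation.
fertilized_heights = [14.0, 15.0, 16.0, 15.0]
unfertilized_heights = [12.0, 13.0, 14.0, 13.0]

print(mean_difference(fertilized_heights, unfertilized_heights))
```

This only computes the difference in averages. Deciding whether a difference is significant takes a statistical test on top of it.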

Most of the data we work with, however, is not produced by a true experiment.

Most of the time we can't physically control all, or

even any of the other factors that might affect our response variable.

So for most studies we are not able to determine whether one variable

causes another variable.

But we are able to determine associations.

3:07

Random assignment is another way we can control for these other factors.

The idea is that if every observation in the sample has an equal probability

of being in each of the groups, and truly, randomly end up in one group or

another, then the groups end up balanced in terms of the other factors.

So if age is a factor, then the groups should have the same age variability and

this equal variability essentially controls for that factor.

And this should be the case for

any other factor. However, randomization doesn't always work the way we want it to.

In fact, randomization works best as your sample size approaches infinity.

Unfortunately, we work with finite samples, which can often be pretty small.

The smaller the sample the greater the risk that the groups will be unbalanced on

factors that could affect how the treatment affects the response variable.
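
One way to see this for yourself is a small simulation. This is only a sketch under made-up assumptions: ages are drawn uniformly between 20 and 60, observations are split into two equal groups at random, and the function name `average_age_gap` is hypothetical. It reports how far apart the two groups' mean ages are, on average, for different sample sizes.

```python
import random
from statistics import fmean

def average_age_gap(sample_size, trials=200, seed=0):
    """Average absolute gap in mean age between two randomly assigned groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Hypothetical ages, drawn uniformly between 20 and 60.
        ages = [rng.uniform(20, 60) for _ in range(sample_size)]
        rng.shuffle(ages)  # random assignment: first half vs. second half
        half = sample_size // 2
        treatment, control = ages[:half], ages[half:]
        gaps.append(abs(fmean(treatment) - fmean(control)))
    return fmean(gaps)

for n in (10, 100, 1000):
    print(n, round(average_age_gap(n), 2))
```

The gap printed for the smallest sample should be noticeably larger than the gap for the largest one, which is the imbalance risk described above.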

If part of your job as a data analyst is to evaluate data from studies with

random assignment, one of the first things you'll wanna do is to check for

any imbalances between your treatment and control groups

on key variables that could change how the treatment affects the response variable.
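
As a sketch of what such a check might look like, here is one common summary, the standardized mean difference, written in plain Python. The ages below are invented for illustration, and the function name `standardized_mean_difference` is an assumption. It is one possible balance check, not the only one.

```python
from statistics import mean, variance

def standardized_mean_difference(treatment_values, control_values):
    """Difference in group means, in units of the pooled standard deviation."""
    pooled_sd = ((variance(treatment_values) + variance(control_values)) / 2) ** 0.5
    return (mean(treatment_values) - mean(control_values)) / pooled_sd

# Invented ages for a treatment group and a control group.
treatment_ages = [34, 41, 29, 50, 45]
control_ages = [31, 38, 27, 44, 40]

smd = standardized_mean_difference(treatment_ages, control_ages)
print(round(smd, 2))
```

A value near zero suggests the groups are balanced on that variable. The further the value is from zero, the more the groups differ.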

If imbalances are identified, then those variables can be included in

the statistical model to predict the response variable, so

that they can be statistically controlled.

Statistical control is another commonly used strategy.

If we include additional explanatory variables that could affect

the association between the treatment and the response, then we could examine that

association after adjusting for the other explanatory variables.
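
Here is a minimal sketch of that idea with a single additional explanatory variable. The data are invented so that the true treatment effect is known, and the helper names `slope`, `residuals`, and `adjusted_effect` are assumptions. The approach shown removes the part of the treatment and the part of the response that age can explain, then relates what is left. This gives the same treatment coefficient as a regression of the response on both treatment and age.

```python
from statistics import fmean

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear association with x."""
    b = slope(x, y)
    a = fmean(y) - b * fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def adjusted_effect(treatment, covariate, response):
    """Treatment coefficient after adjusting for one covariate."""
    treatment_left = residuals(covariate, treatment)
    response_left = residuals(covariate, response)
    return slope(treatment_left, response_left)

# Invented data: 0 = control, 1 = treatment; the treatment group is older.
treatment = [0, 0, 0, 1, 1, 1]
age = [20, 30, 40, 40, 50, 60]
# Response built as 2 * treatment + 0.5 * age, so the true effect is 2.
response = [2 * t + 0.5 * a for t, a in zip(treatment, age)]

unadjusted = fmean(response[3:]) - fmean(response[:3])
print(unadjusted)
print(round(adjusted_effect(treatment, age, response), 6))
```

In this made-up example the unadjusted difference between groups is 12, because the treatment group is older. After adjusting for age, the estimate comes back to about 2, the effect that was built into the data.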

While these are all good strategies

for imposing as much control on a study as possible,

they're not perfect.

Nor can we possibly control for everything that could affect the association between

the treatment and response variable.

For that reason, unlike a true experiment in which we are able to hold

every other possible variable constant, we cannot determine causality.

We can only determine whether the treatment is associated with

the response variable.

Sometimes, we can't randomly assign people to a treatment or control group.

In many cases, it would be unethical to do so.

For example,

if we're conducting a study to examine the association between cocaine use and

memory processing, there's no way we could assign some participants to use cocaine.

This would be completely unethical, because

we would put our participants at significantly greater risk of harm.

The benefit of any knowledge that would be gained by the study

would certainly not outweigh that risk.

Instead, we would have to identify people who either test positive for or

self-report cocaine use and

then test for memory processing differences between users and non-users.

The manipulation of the explanatory variable is based on the fact

that our treatment and control groups are pre-selected.

In this study, cocaine users would be in our treatment group and

non-users would be in our control group.

So while it looks like an experimental design, it is missing the random

assignment piece, and we call this a quasi-experimental design.

We can increase the rigor of a quasi-experimental design

by measuring as many confounding variables as possible,

having a control group, and using a pre- and post-test design whenever possible.

A quasi-experimental design will not allow us to infer causality

between an explanatory variable and our response variable.