0:02

You've reached this specific spot in the course.

Now, one of the topics that comes up for discussion quite a bit is the P-value.

And more specifically, how do values influence the P-value?

And how does the setting of a hypothesis influence the P-value?

So I've decided to make this extra video lecture,

just to give you a bit of an intuitive feel of what happens to the P-value when

you start playing with actual data point values and

what happens to the P-value when you actually set the hypothesis.

So I'm going to show you two groups of individuals with a certain variable and

some numerical values for them, and we're going to change them up a bit, and

you'll see the difference in the P-value.

I'm going to use some spreadsheet software, it's called LibreOffice and

show it to you.

You can download it for free. Now, you can use Microsoft Excel,

you can use Numbers on Mac OS.

Any kind of spreadsheet software, but I choose LibreOffice, because it's free of

charge and you can download it for Mac OS, for Windows, for Linux, doesn't matter.

Everyone can install it and it has beautiful spreadsheet software.

So, that's what I'm going to use.

I'm going to show you these two columns and we're going to play around with

the numbers a bit, and get a P-value from those values.

1:13

Let's have a look.

As mentioned, I'm going to use the spreadsheet software that is

part of the LibreOffice suite of Office programs.

You can find it at libreoffice.org.

It's a free download and you can choose whether it is for Mac OS,

for Windows or Linux.

You can install it on any system absolutely free of charge and

the spreadsheet software that comes with it is quite powerful, and

actually can do quite a bit of statistical analysis.

I've opened up LibreOffice here and it is a normal spreadsheet program.

Some of you might be familiar with it.

If not, don't worry, this is not a lecture on how to use LibreOffice.

I want to show you two groups of patients or individuals that form

part of a research study, we see them here in group A and group B.

Imagine we have some variable that contains continuous numerical

values, a ratio-type continuous numerical variable, and

you see 13 individuals here for the first group, and

13 individuals in the second group, and we've recorded these values.

Now the beauty about the way that I've constructed these values is every time

I press, this is on a Mac.

I'm pressing Cmd+U.

These values change.

They randomly change.

I've set them up to change randomly.

The way that I've set them up though is that they do come from an underlying

population in which the parameter is normally distributed.

These values actually come from a normal distribution.
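The randomly changing columns can be imitated outside the spreadsheet. This is a minimal sketch, not the lecturer's actual sheet: the function name `draw_group`, the mean of 100 and the standard deviation of 10 are assumptions chosen only for illustration.

```python
import random

def draw_group(n=13, mu=100.0, sigma=10.0):
    """Draw n values from a normal distribution (mu and sigma are assumed, not from the lecture)."""
    return [round(random.gauss(mu, sigma), 1) for _ in range(n)]

# Each run gives a fresh set of values, much like recalculating the sheet.
group_a = draw_group()
group_b = draw_group()
print("Group A:", group_a)
print("Group B:", group_b)
```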

So even though they change at random here every time I hold down the Cmd key and

press U at the moment, we get a new set of 60 new values.

That's not the point though.

What I want to show you is us doing a Student's T-test.

So, it's a parametric test.

We are under the assumption that these values are from a population in which

this variable has a normal distribution.

We're assuming equal variances, so we're going to do Student's T-test.
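For reference, the equal-variance form of the test statistic can be written out as follows. The symbols are notation introduced here, not from the lecture: the bars are the group means, s_A and s_B the sample standard deviations, and n_A and n_B the group sizes.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2},
\qquad
\text{df} = n_A + n_B - 2
```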

Now if I go to Data down to Statistics,

look at all the statistical analysis I can do.

Some descriptive statistics.

I can do analysis of variance, correlation and regression.

And here, we have the T-test.

Very simply, it's going to ask me just to choose the variable in the first range.

We're going to do this right here, so we just choose all of them.

As I said, this is not a lecture on how to do these steps.

I just want to show you.

3:53

Let's choose these.

All the values in my group number two and

I'm just going to tell it to put the results down in that little block for me.

Don't worry about this, let's see what happens.

It tells me it's done a T-test.

I just want to expand on this a little bit.

Let's say, View.

Let's just zoom in a little bit, so you can see these values nice and clearly.

So, we saw that the first group had a mean or

an average of 99.4 or thereabouts and group two had a mean of 108.

We're going to compare these means in this parametric test and look down here.

We have one P-value and we have a second P-value.

Now this P-value, as you can see, is a one-tail test and

the second P-value here is a two-tail test.

And notice how the one-tail test P-value is exactly half of what

the two-tail test is.
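The halving can be checked numerically. The sketch below uses only the Python standard library and hypothetical measurements (not the values on screen); the helper names `pooled_t` and `upper_tail` are made up for this example, and the tail area is found by simple numerical integration rather than by whatever method the spreadsheet uses. Note that the one-tail value is half the two-tail value only when the observed difference lies in the direction stated beforehand.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def upper_tail(t, df, steps=2000):
    """P(T > |t|) for Student's t distribution, integrated with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Hypothetical measurements, invented for illustration only.
group_a = [98.2, 101.7, 95.4, 103.9, 99.8, 120.4, 97.1,
           102.6, 100.3, 96.5, 104.2, 98.9, 100.5]
group_b = [104.3, 109.8, 101.2, 112.5, 98.7, 107.4, 110.1,
           103.6, 115.2, 100.9, 108.3, 99.5, 106.5]

t, df = pooled_t(group_a, group_b)
tail = upper_tail(t, df)

# One-tail alternative fixed in advance: group A has the lower mean.
p_one_tail = tail if t < 0 else 1 - tail
# Two-tail alternative: the means differ in either direction.
p_two_tail = 2 * tail

print(f"t = {t:.3f}, df = {df}")
print(f"one-tail P = {p_one_tail:.4f}, two-tail P = {p_two_tail:.4f}")
```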

Now, let's look at this.

I'm going to hold down Cmd and I'm going to press U.

So every time I do that, we're going to have new random values and

we're going to get a new result for our P-value.

So, let's play around with it a little bit.

Let's play around.

Let's play around.

There's a beautiful one.

Now, look at this.

5:12

I get a two-tail P-value of 0.09. If my alpha value was 0.05 and

I were to write a report for publication on

these results, I would say, well,

I did not find a statistically significant difference.

But if I reported this one-tail P-value, it would be 0.04.

It is less than 0.05.

There is a statistically significant difference between these two groups.

Purely based on the way I chose the hypothesis.

And that is of course, absolutely wrong.

And that is why I emphasize so much the statement of the hypothesis.

Whether there is a difference at all, that's one alternate hypothesis, the two-tail one,

so the mean of one group can be either more or

less than the other. Or one-tail, where I already say beforehand that one group will

have a mean higher or lower than the other, and then I can go for one tail.

This must be set beforehand.

You cannot do the statistical analysis and then based on the results,

then choose which P-value you are going to take.

So you can see clearly here what an absolute difference it makes

which alternate hypothesis you set for yourself.

So for this set of values, we have a P-value of 0.09.

But if we do a one-tail test, 0.04.

This cannot be decided once the analysis has been done.
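A small simulation makes the cost of choosing the tail afterwards concrete. This is a sketch under stated assumptions: both groups are drawn from the same normal population (mean 100 and standard deviation 10 are invented), each group has 13 values, and the two critical values are the standard t-table entries for 24 degrees of freedom at alpha = 0.05. In theory, picking the one-tail test in whichever direction the data happen to point doubles the false-positive rate from about 5% to about 10%.

```python
import math
import random

T_CRIT_TWO_TAIL = 2.064  # t-table value: df = 24, alpha = 0.05, two-tail
T_CRIT_ONE_TAIL = 1.711  # t-table value: df = 24, alpha = 0.05, one-tail

def pooled_t(a, b):
    """Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 10_000
two_tail_hits = 0   # two-tail hypothesis set beforehand
post_hoc_hits = 0   # one-tail direction picked after seeing the means

for _ in range(trials):
    # No true difference: both groups come from the same population.
    a = [rng.gauss(100, 10) for _ in range(13)]
    b = [rng.gauss(100, 10) for _ in range(13)]
    t = abs(pooled_t(a, b))
    two_tail_hits += t > T_CRIT_TWO_TAIL
    post_hoc_hits += t > T_CRIT_ONE_TAIL

two_tail_rate = two_tail_hits / trials
post_hoc_rate = post_hoc_hits / trials
print(f"false positives, two-tail set beforehand: {two_tail_rate:.3f}")
print(f"false positives, tail chosen afterwards:  {post_hoc_rate:.3f}")
```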

For that reason, also I emphasize in this course that

we really want data to be made available freely.

If everyone can examine the data and understand why a certain hypothesis was

chosen, it makes reading that journal article and understanding and

believing what is written so much better. I've got another little tab down here.

It's called fixed.

I just want to show you something else, as well.

Now this time, these values are fixed.

I can't press U and have them change randomly, but look at this.

I have a two-tail P-value of 0.07 and

a one-tail of 0.03. I might have decided beforehand,

well, we are going to go for a two-tailed alternate hypothesis.

In other words, we say, we don't know the difference between the two groups.

One might be more.

Group two might have a larger mean than group one.

We said, we don't know.

It might go either way and I look through these results now and

I see, well, group 1 had a mean of 101.5, group two had a mean of 106.

So certainly, group one had a lower mean.

And I go through my data again and I notice there's this one case of 120.4,

which is quite a bit higher than the mean of 101.

And I might come up with some reasoning,

some logical argument in my head that there was something wrong with that case.

I'm not saying it's a statistical outlier.

I haven't done interquartile ranges to prove that it is.

I just think that I'm so close with my 0.07 and I wonder what was wrong with

this patient, and I can go through the file, and by some logical argument

I can decide, no, no.

This patient definitely has to be removed.

There was something wrong with that measurement.

Let's delete that patient.

We can still do it.

We still have enough patients in each group.

Look what happened.

I suddenly, from omitting that one value, have a statistically significant

difference between the two groups.
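The same effect can be reproduced with a short script. The measurements below are hypothetical, invented so that the group means match the 101.5 and 106 mentioned above and so that group one contains a single 120.4; they are not the values in the spreadsheet, so the P-values will not equal the ones on screen. The helper `two_tail_p` is a made-up name, and it finds the tail area by simple numerical integration using only the standard library.

```python
import math
from statistics import mean, variance

def two_tail_p(a, b, steps=2000):
    """Two-tail P-value of Student's t-test (equal variances assumed)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = abs(mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

    # Integrate the t density from 0 to |t| with Simpson's rule.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * (0.5 - area * h / 3)

# Hypothetical measurements, invented for illustration only.
group_a = [98.2, 101.7, 95.4, 103.9, 99.8, 120.4, 97.1,
           102.6, 100.3, 96.5, 104.2, 98.9, 100.5]
group_b = [104.3, 109.8, 101.2, 112.5, 98.7, 107.4, 110.1,
           103.6, 115.2, 100.9, 108.3, 99.5, 106.5]

p_before = two_tail_p(group_a, group_b)

# Drop the single high value from group A, as in the lecture's example.
trimmed_a = [x for x in group_a if x != 120.4]
p_after = two_tail_p(trimmed_a, group_b)

print(f"two-tail P with all values:      {p_before:.4f}")
print(f"two-tail P without the one case: {p_after:.4f}")
```

With these made-up numbers the two-tail P-value starts just above 0.05 and falls well below it once the single case is removed, which is the pattern described above.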

See how easy it is to change the P-value and to get to a desired P-value, and

I call it desired, so that it is statistically significant.

Now, I'm not showing you how to cheat.

I'm not saying that anyone in the literature does cheat,

or that anyone who does research does cheat.

The point here is that I really believe that data should be made

openly available and we should all know what went into that data analysis.

And by doing this course, you now have an understanding of why this is so

important and what must be in place for

you to trust the P-value that you do find in the literature.