Remember, we simulate the experiment under the assumption of independence.

Or, in other words, leaving things up to chance.

If the results from the simulations look like the

data, then the difference between the proportions of correct guesses

can be said to be due to chance.

If, on the other hand, results from the simulation do not

look like the data, we can conclude that the difference between the proportions

of correct guesses in the two groups was not due to

chance, but because people actually know the backs of their hands better.

So this is what our randomization distribution looks like.

The heights of the bars here represent what percent of the time, or how

many times within these 10,000 simulations, a particular simulated p-hat was achieved.
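A minimal sketch of how such a randomization distribution can be built, assuming hypothetical group sizes and success counts (the transcript doesn't give the actual study counts): pool all outcomes, shuffle them into two new "decks" of the original sizes as if group membership had no effect, and record the simulated difference in proportions each time.

```python
import random

# Hypothetical data (illustrative only; actual study counts not given):
# 1 = correct guess, 0 = incorrect guess.
back_results = [1, 1, 1, 1, 1, 0]   # back-of-the-hand group
palm_results = [1, 1, 0, 0, 0, 0]   # palm-of-the-hand group

def simulate_diff(back, palm):
    """Shuffle all outcomes into two new decks of the original sizes,
    as if group membership had no effect (the null hypothesis), and
    return the simulated difference in success proportions."""
    pooled = back + palm
    random.shuffle(pooled)
    sim_back = pooled[:len(back)]
    sim_palm = pooled[len(back):]
    return sum(sim_back) / len(back) - sum(sim_palm) / len(palm)

# 10,000 simulated differences form the randomization distribution,
# centered near zero because the shuffling assumes independence.
random.seed(1)
sim_diffs = [simulate_diff(back_results, palm_results) for _ in range(10_000)]
```

Plotting a histogram of `sim_diffs` would reproduce the kind of bar chart described here, with bar heights showing how often each simulated difference occurred.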

Remember, the definition of the p-value is the probability of an

observed or more extreme outcome, given that the null hypothesis is true.

And when we think about the observed outcome, we want to

think about what the success rate was in the

back-of-the-hand group, and what the

success rate was in the palm-of-the-hand group.

And we want to take the

difference between these two, because that's

going to be the point estimate that corresponds to

the parameter in our hypotheses, but is based on our sample data.

The difference between the two proportions comes out

to be roughly 33%, so the p-value is

calculated as the percentage of simulations that are more

than 33% away from the center of the distribution.

And the center of the distribution is always at zero because,

remember, we're assuming that the null hypothesis is

true and we're leaving things up to chance

when we shuffle the outcomes into the two decks.
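This p-value calculation can be sketched in a few lines: count the share of simulated differences that land at least 0.33 away from zero, in either direction. The simulated differences below are a stand-in distribution (hypothetical values, not the study's actual simulation results), just to make the calculation runnable.

```python
import random

# Stand-in for 10,000 simulated differences under the null hypothesis
# (hypothetical values; a real analysis would use the shuffled decks).
random.seed(2)
sim_diffs = [random.gauss(0, 0.25) for _ in range(10_000)]

observed_diff = 0.33  # roughly 33%, the observed difference in proportions

# p-value: fraction of simulations at least 0.33 away from the center (zero)
p_value = sum(abs(d) >= observed_diff for d in sim_diffs) / len(sim_diffs)
```

With the study's actual randomization distribution, this fraction comes out to about 0.16, the p-value quoted next.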

With a p-value of 0.16, or 16%, we would fail to

reject the null hypothesis and say that, no, there isn't

actually convincing evidence based on these data that people are

better at, or worse at, or that there's any difference at all

in how they recognize the backs versus the palms of their hands.