OK, so we talked about our study: 60 people telling us which website they liked best. The analysis for this is a one-sample test of proportions. That's its formal name, and you can see that I've commented that section of the code with two number signs and that name. A test of proportions is one where you're looking at the proportion of responses in each category. If you find yourself counting the subjects themselves, for example because each subject gives a single preference, then you may be doing a test of proportions. It's called one-sample because we have one sample of preferences: on a single variable, preference, we have the proportions of people who liked website A or B. In a little bit, we'll also see a two-sample test of proportions.

What you'll see me do in this code editor is, occasionally as we go, highlight a line and then click Run, or press Ctrl+Enter on my keyboard. That runs the line of code: it copies it and issues it down here in the console. Some of those lines will produce output, which we'll also look for in the console at the bottom.

So let's go ahead and get started. We're going to load prefsAB.csv, which holds the preferences for website A or B. As I highlight that line and click Run, you can see that there's no feedback other than the console showing that it executed the line. If there were errors or warnings for any line I execute, they would show down here at the bottom. Now, it's sometimes nice to view what we've just loaded, so we say View(prefsAB). When we do that, we can see that we have a column for subject and a column for pref. Pretty simple: 60 subjects, each giving us their preference for website A or B. We obviously can't tell very easily what differences might exist just by looking at the table like that, so we'll close it and keep going.
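The loading step just described can be sketched roughly as follows. This is a minimal sketch, not the course script itself: the column names Subject and Pref (and their capitalization) are assumptions based on the description above, and because the real prefsAB.csv is not included here, the sketch first writes a stand-in file with the counts mentioned later in the lecture (14 A, 46 B) to a temporary directory.

```r
# Stand-in for the course file, written to a temporary directory so the
# sketch runs on its own. Column names and row order are assumptions.
csv_path <- file.path(tempdir(), "prefsAB.csv")
standin <- data.frame(
  Subject = 1:60,
  Pref    = rep(c("A", "B"), times = c(14, 46))
)
write.csv(standin, csv_path, row.names = FALSE)

# Load the table. With the real file you would pass its own path instead.
prefsAB <- read.csv(csv_path)

# View(prefsAB) opens the spreadsheet-style viewer in RStudio.
# head() and str() are console alternatives that work anywhere.
head(prefsAB)
str(prefsAB)
```

In an interactive RStudio session, View(prefsAB) would show the same two columns in a separate tab.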
One thing that we're going to do as a matter of good practice is recode subject as what's called a categorical factor. Variables all have different types. You can think of variables as the columns we just saw, and a categorical variable, also sometimes called a nominal variable, is one that just has categories. Subject here is coded as a number, so we need to tell R to treat it as if it were a letter or a name. That way it's considered a category and not a numeric response variable.

We can then look at a summary of the data. We can see that there are 54 other levels of subject beyond the 6 shown here, for 60 in total. And there are 14 responses of A and 46 of B. Well, that would seem like people certainly preferred B. But again, the statistical question is whether the difference we see there is significant. In fact, what we're asking in a one-sample test is: is it significantly different from chance? If 30 people liked A and 30 people liked B, we'd obviously say there's no difference. How far away from that chance point do we have to get before we say there's a significant difference?

We can also plot this very simply with a plot command. Notice the dollar sign here; it tells us that we're asking for the pref column within prefsAB. And I can go back up and view the table again, just to remind you what it looked like as you get used to how R works. You can see that the subject numbers are now left-justified, like the letters, so R is treating subject as a factor. Incidentally, you can also call is.factor on prefsAB, using the dollar sign to access the subject column, and it now says TRUE. It would have said FALSE before we recoded it. When we plot the prefs, we see histogram-style bars counting how many of each there are. OK, so that's our first exploration of the data. Pretty simple. Now we're going to go on to a Pearson chi-square test.
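A rough sketch of the recoding, summary, and plot steps described above. The data frame here is a stand-in built from the counts mentioned in the lecture (60 subjects, 14 A, 46 B), and the column names Subject and Pref are assumptions. One further assumption: in recent versions of R, read.csv leaves text columns as character rather than factor, so the sketch converts Pref explicitly.

```r
# Stand-in data with the counts from the lecture; row order is arbitrary.
prefsAB <- data.frame(
  Subject = 1:60,
  Pref    = rep(c("A", "B"), times = c(14, 46))
)

is.factor(prefsAB$Subject)   # FALSE: Subject is still numeric

# Recode Subject as a categorical (nominal) factor.
prefsAB$Subject <- factor(prefsAB$Subject)
# Make sure Pref is a factor too, so summary() and plot() count its levels.
prefsAB$Pref <- factor(prefsAB$Pref)

is.factor(prefsAB$Subject)   # TRUE after recoding

# Per-column summary: factor columns are summarized as counts per level.
summary(prefsAB)

# Plotting a factor draws one bar per level, with height equal to its count.
plot(prefsAB$Pref)
```

The `$` operator picks out one column of the data frame by name, which is why prefsAB$Pref can be passed directly to plot.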
This is a one-sample test of proportions. To do it, we're going to run the xtabs command, or function, on the preference column. That's what this notation means here, and we'll talk more about these notations as we go. If you're confused about anything in the way R is written, I encourage you to look it up and spend some time familiarizing yourself, because we don't have time to explain every character and every notation that we encounter. But I'll do my best as we go. We also give xtabs the data table that we're looking at, prefsAB. When we run that, it creates a set of counts that can be analyzed with a chi-square test, and when we execute prefs by itself, it shows us what's in there. You can see again that we have 14 As and 46 Bs.

Then we run the chi-squared test. Here's the output: chi-squared test for given probabilities. The data is in prefs. X-squared (that's an X, but it stands for the Greek letter chi) is about 17; that's the value of the statistic. The degrees of freedom are 1, and the p-value is very, very small, near zero.

Well, what does all this mean? Each statistical test has a statistic associated with it. Along with chi-squared, we'll see the t-test, the F-test, and perhaps some others. The value of that statistic is related to the distribution that the statistic is calculated over, the chi-squared distribution in this case. What's important for our purposes is just to understand what it means if you get a significant result, and how to write that up. This result is in fact significant because the p-value is less than 0.05, which is traditionally taken to be the threshold below which we would pronounce results statistically significant. We're clearly very small here, much less than 0.05. The degrees of freedom are a parameter of the different tests.
Many have just one degree of freedom; some have two, like the F-test; and those are part of what we'll report. If you're interested in how degrees of freedom are calculated, it's different for each type of test, and you can look up that level of detail online. So at this point we know there is in fact a statistically significant difference in preferences, favoring website B, the redesign, over website A, the older site, which I suppose is what we should hope for if we went to the trouble of designing a new site. How do we report the statistic? Well, let's go back to the glass board and I'll show you how we would write it.
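For reference, the chi-square steps walked through above can be sketched like this. It is a minimal sketch under stated assumptions: the data frame is a stand-in built from the counts in the lecture (14 A, 46 B), the column names Subject and Pref are assumed, and the variable name prefs follows how the counts object is referred to above. The second half recomputes the statistic by hand against the chance expectation of 30 and 30, to show where the value of about 17 and the single degree of freedom come from.

```r
# Stand-in data with the counts from the lecture; names are assumptions.
prefsAB <- data.frame(
  Subject = factor(1:60),
  Pref    = factor(rep(c("A", "B"), times = c(14, 46)))
)

# Cross-tabulate: count how many subjects chose each level of Pref.
prefs <- xtabs(~ Pref, data = prefsAB)
prefs

# One-sample chi-square test. With a one-dimensional table of counts and
# no probabilities supplied, chisq.test compares against equal proportions.
result <- chisq.test(prefs)
result

# The same statistic by hand.
observed <- as.vector(prefs)                         # 14 and 46
expected <- rep(sum(observed) / length(observed),    # 30 and 30 under chance
                length(observed))
chi_sq  <- sum((observed - expected)^2 / expected)   # 256/15, about 17.07
df      <- length(observed) - 1                      # 2 categories - 1 = 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
df
p_value < 0.05   # TRUE: significant at the conventional threshold
```

The hand calculation is only there to make the reported numbers less opaque; in practice the chisq.test call alone gives everything that gets reported.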