After writing our introduction to the research question, we will need to add a methods section to our final report. A methods section has three parts, in which we describe the sample, the measures, and the statistical analyses that were conducted.
In the first part, we describe our sample, which means we describe the population from which the sample was drawn. This is where we also describe our selection criteria. For example, in a study on young adult smoking, I might select participants between the ages of 18 and 25. And if I'm interested in something like nicotine dependence, I might want to select only those participants who were current smokers. These would be my selection criteria. So what the sample section does is tell us who or what the population of interest is. We also need to provide a sample size: how many observations? And then we need to provide enough descriptive information for readers to understand the population. This is important because the results of your data analyses can really only be generalized to the population that you analyzed. So it's important for readers to understand exactly what the population is, so that they know who or what to generalize to.
Then we need to talk about the measures that were included in our data analyses. We need to add definitions for the variables that were analyzed, and describe how those variables were managed. Finally, in the analysis section, we need to provide a summary of the statistical methods used and their purpose. This is the place where we talk about how the data were split into training and test data sets and/or the type of cross-validation method we used. The methods section should only include descriptions of measures and analyses that we actually intend to describe in the results section. Here's an example of a description of our data analytic sample.
"The sample included N = 435 injection drug production batches manufactured at the Chicago plant from January 1, 2015 to December 31, 2015. All batches were high-yield batches, meaning that each batch produced between 500,000 and 1 million 0.5 mg drug units." So here we describe the population, which is injection drug production batches, and we get a little more specific by indicating that they were manufactured at the Chicago plant during a period of one year, 2015, and that they were high-yield batches. We also include the sample size, so we had 435 drug production batches that we analyzed. Now, this particular example of a sample section assumes that the final report will be presented to people in this particular company, so they would understand what we mean by the Chicago plant and what types of drugs might be injection drugs. If we were to present this final report outside of our company, we would probably want to give even more detailed information. The goal here is to provide an accurate and clear description of the population from which the sample was drawn.
And here's an example of a measures section. In the first section we define the manufacturing lead time response variable. So we know that it was measured for each drug batch by calculating the number of hours between release of the batch manufacturing order and completion of product packaging. So there's no question about how we define manufacturing lead time. We also provide the units of measurement of the response variable, which is hours. It's important to provide clear and detailed definitions of your measures and how you might have managed them. For example, did you take the average of a set of variables, or did you sum a set of variables? Did you choose to bin a quantitative variable into categories? These are all important things to discuss in the measures section.
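To make the idea of defining and managing a measure concrete, here is a minimal Python sketch. It computes lead time in hours from two timestamps, as in the definition above, and then shows the kinds of data management decisions you would need to report: averaging, summing, and binning. The field names, the inspection scores, and the 24-hour and 72-hour cut points are all made up for illustration; none of them come from the study.

```python
from datetime import datetime
from statistics import mean


def lead_time_hours(released: datetime, packaged: datetime) -> float:
    """Hours between release of the batch manufacturing order
    and completion of product packaging."""
    return (packaged - released).total_seconds() / 3600.0


def bin_lead_time(hours: float) -> str:
    """Bin a quantitative lead time into categories.
    The cut points (24 h and 72 h) are hypothetical."""
    if hours < 24:
        return "short"
    if hours < 72:
        return "medium"
    return "long"


# One hypothetical batch record; all values are invented for illustration.
batch = {
    "order_released": datetime(2015, 3, 2, 8, 0),
    "packaging_completed": datetime(2015, 3, 4, 14, 30),
    "inspection_scores": [4, 5, 3],
}

hours = lead_time_hours(batch["order_released"], batch["packaging_completed"])
category = bin_lead_time(hours)                 # binned version of the measure
score_mean = mean(batch["inspection_scores"])   # average of a set of variables
score_sum = sum(batch["inspection_scores"])     # sum of a set of variables

print(hours, category, score_mean, score_sum)
```

Whichever of these choices you make, the measures section should state it explicitly, because the averaged, summed, and binned versions of a variable can lead to different results.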
The primary reason for having a clear definition of the measures is that the definition of a particular factor or characteristic can differ from one study to another. For example, someone else might define manufacturing lead time differently. How we define and manage our variables can lead to different results. This is why it's important for you to be clear about how you define your measures and how you manage the data for the analysis in your particular study. In the next section, we describe the predictors, and we need to provide definitions for all of the predictors that were included in the analysis. This list of definitions should include only those variables that were included in the data analysis. Again, it's important to define the variables clearly so that readers understand how you chose to define the variables and how you might have managed them.
Finally, we need to write about the analyses that we conducted. We usually start by describing the simplest analyses. So in this first section here, I describe how I examined the distributions of the predictors and the manufacturing lead time response variable, which was done by examining frequency tables for categorical variables and calculating the mean, standard deviation, and minimum and maximum values for quantitative variables. Next, I discuss some of the data visualization techniques I used. In this case, I examined scatter plots and box plots to get a preliminary idea of the association between each predictor and the response variable, and I used Pearson correlation and analysis of variance to test the significance of the association between each individual predictor and the manufacturing lead time response variable. Finally, I describe the multivariable analysis. In this example I used lasso regression with the least angle regression selection algorithm to identify the subset of variables that best predicted manufacturing lead time.
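The descriptive and bivariate steps described above might look something like the following Python sketch. It assumes NumPy and SciPy are available, and it runs on synthetic data: the `temperature` and `shift` predictors and all of the numbers are hypothetical stand-ins, not variables from the study. The plots are left out to keep the example short.

```python
from collections import Counter

import numpy as np
from scipy import stats

# Synthetic data for illustration only; not the study's data.
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(20, 2, n)                    # hypothetical quantitative predictor
shift = rng.choice(["day", "evening", "night"], n)    # hypothetical categorical predictor
lead_time = 40 + 1.5 * temperature + rng.normal(0, 3, n)  # response, in hours

# Frequency table for a categorical variable.
freq = Counter(shift.tolist())

# Mean, standard deviation, minimum, and maximum for a quantitative variable.
summary = {
    "mean": lead_time.mean(),
    "sd": lead_time.std(ddof=1),   # sample standard deviation
    "min": lead_time.min(),
    "max": lead_time.max(),
}

# Pearson correlation: quantitative predictor vs. quantitative response.
r, p_r = stats.pearsonr(temperature, lead_time)

# One-way analysis of variance: categorical predictor vs. quantitative response.
groups = [lead_time[shift == level] for level in np.unique(shift)]
f_stat, p_f = stats.f_oneway(*groups)

print(freq)
print(summary)
print(f"Pearson r = {r:.2f} (p = {p_r:.3g})")
print(f"ANOVA F = {f_stat:.2f} (p = {p_f:.3g})")
```

In the written report, each of these would be named along with its purpose, for example that Pearson correlation was used for quantitative predictors and analysis of variance for categorical ones.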
I describe how the data were split into training and test data sets, in this case using a random sample of 60% of the batches, which was N = 411 batches, to develop the algorithm. The test data set included the other 40% of the batches, which was N = 273 batches. The test data set was used to validate, or test, the prediction algorithm that was developed in the training data set. I also indicate that all predictor variables were standardized to have a mean equal to zero and a standard deviation equal to one prior to conducting the lasso regression analysis. Then I discuss cross-validation. In this case I used a k-fold cross-validation method, specifying ten cross-validation folds. And finally, I describe the criterion used to identify the best subset of predictor variables, and I indicate that predictive accuracy was assessed by determining the mean squared error of the training data prediction algorithm when applied to observations in the test data set. So in the analyses section, I describe each of the statistical methods and their purpose, starting from the simplest analyses and going to the most complex.
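As a rough illustration of the multivariable analysis described above, here is a minimal Python sketch using scikit-learn. The transcript does not say which software was used, so treating `LassoLarsCV` as the implementation of lasso regression with least angle regression is an assumption on my part, as are the random seeds. The data are synthetic, with eight made-up predictors, and are not the study's data.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: 300 batches, 8 hypothetical predictors,
# of which only predictors 0, 1, and 4 truly affect lead time.
rng = np.random.default_rng(42)
n_batches, n_predictors = 300, 8
X = rng.normal(size=(n_batches, n_predictors))
true_coef = np.array([5.0, -3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = 48 + X @ true_coef + rng.normal(scale=1.0, size=n_batches)

# Standardize all predictors to mean 0 and standard deviation 1
# before the lasso regression, as described in the text.
X_std = StandardScaler().fit_transform(X)

# Random 60% training / 40% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, train_size=0.6, random_state=123
)

# Lasso regression fit with least angle regression,
# with the penalty chosen by 10-fold cross-validation.
model = LassoLarsCV(cv=10).fit(X_train, y_train)

# Predictors retained by the lasso (nonzero coefficients).
selected = np.flatnonzero(model.coef_)

# Predictive accuracy: mean squared error in training and test data.
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print("Selected predictors:", selected)
print(f"Training MSE: {train_mse:.2f}, test MSE: {test_mse:.2f}")
```

Every choice visible in this sketch (the split proportion, the standardization, the number of folds, and the accuracy measure) is something the analysis section should state, so that a reader could repeat the analysis.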