Hi, my name is Yan Kou. I'm a PhD student in Dr. Avi Ma'ayan's lab at the Icahn School of Medicine at Mount Sinai. In the previous lecture I introduced the use of TopHat and Cufflinks to analyze RNA-Seq data. Once we construct the transcriptome, we will be able to apply various statistical methods toward a deeper understanding of the data. In this lecture I will show you how to use the CummeRbund R package that comes with the pipeline for the downstream analysis. To remind you, here's the pipeline of TopHat and Cufflinks. We have learned to align the raw reads to the reference genome using TopHat, and to assemble the transcriptome and identify differentially expressed genes using Cufflinks and Cuffdiff. The CummeRbund package was developed by the same research group. It directly takes the output from Cufflinks and provides functionality to visualize the transcriptome data, along with several frequently used data analysis approaches.
Here's a list of the points we will cover in this lecture. Since CummeRbund is an R package, we first need to learn some basic operations in the R statistical programming environment. Once we are familiar with the R language, we will load the CummeRbund package and learn how to evaluate the quality of the raw data. We will also perform hierarchical clustering as an example of downstream analysis. Also, to remind you, here is the data we have: knockout and wild type, at three different time points, G1, G2, and G3.
So, what is R? It's a language and environment for statistical computing and data visualization. It provides a wide variety of statistical and graphical techniques. It produces publication-quality plots and is easy to manipulate. R is an open source project, which means that you're free to develop your own libraries or packages and share them with everyone else.
There are many well-developed packages for many kinds of statistical analysis methods, and they are free to use. You can also obtain the source code and rewrite it to suit your own tasks. However, compared with well-maintained commercial software such as Matlab and Mathematica, there's no guarantee of the quality of an R package. As a beginner, I suggest you use packages that have been frequently used and widely tested, or write your own functions for better control. Fortunately for biologists, we have Bioconductor, which collects R packages for high-throughput sequencing and genomic data analysis, covering almost all well-established approaches. Several of these packages are needed for CummeRbund to be fully functional.
As with other programming languages, we can simply write R code in a text editor and execute it with R. However, this is not the best way to analyze your data with R. Several graphical user interfaces have become available in recent years, and I highly recommend RStudio. Let's first look at the RStudio workspace. This is the interface of RStudio. On the left, we have a console where we can type any R command. On the right, we have the Workspace and History tabs on the top, and the Files, Plots, and Packages tabs at the bottom. I will talk about these in a minute.
Let's first create a variable and assign a value to it. We want to give a the value 5. Hit Enter, and nothing shows up here, but if we look at the Workspace, we can see a listed up here. We can also type a in the console, and its value shows up in the console. Now let's create another variable, b, with a value of three. We can also do a simple addition of a and b and store the result in c. Now c equals eight, which is a plus b. In the Workspace we can see all of them listed here. We can also import data into this workspace, for example from a text file in your file system. With the other buttons we can delete the variables, save the workspace, or open a saved workspace.
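The variable steps just described can be sketched in a few lines of R. This is only a minimal illustration: the names a, b, and c and the values 5 and 3 come from the lecture, while the use of print() and ls() to inspect the workspace is an assumption about what is convenient in a script.

```r
# Assign values to two variables; `<-` is R's usual assignment operator
a <- 5
b <- 3

# Add them and store the result. Naming a variable `c` is allowed;
# the built-in c() function can still be called afterwards.
c <- a + b

# Typing a name at the console prints its value; print() does the same in a script
print(c)

# List the variables currently defined in the workspace
print(ls())
```

In RStudio the same variables would also appear in the Workspace pane as each line is run.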
Now if you have a bunch of code and you don't want to execute it line by line in the console, you can create a script and type it here. You can also execute individual lines: just move your cursor to the line you want to run and press Ctrl plus Enter. The line will go to the console and execute there. Now if we look at the Workspace, we can see the new values.
The History tab records all the previous commands we have executed. The good thing about the History tab is that we can select a previous command, for example c equals a plus b, and click this To Console button. The command will go to the console, and we can execute it again there. We can also select a series of commands and click To Source, and all of these commands will be pasted into the source script. This is very handy if you want to repeat a previous analysis. We can also load the history from an existing file. For example, if you worked on another project, you can import all the commands you executed for that project and integrate them into your current one.
Now let's move down to the Files tab. The Files tab lists the directories on your computer, the same as if you were browsing within your file system, and you can move around in it. The important thing about the Files tab is the working directory: R needs to know what the working directory is. That way, when it searches for data files or scripts, it will look for them under this working directory. If you want to set it from the console, you need to type the setwd function followed by the path to the folder where your data is stored. If you're using the Files tab, you can first move to the directory you want to work from, click this More button, and choose Set As Working Directory. Once you click it, RStudio will set this directory, which you can see here, as the working directory.
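For the console route to setting the working directory, a minimal sketch could look like the following. The temporary directory from tempdir() stands in for a real data folder and is purely an assumption for illustration; in practice you would pass the path to your own project folder.

```r
# Remember the current working directory so it can be restored afterwards
old_dir <- getwd()

# Point R at a different folder; tempdir() is a stand-in for your data folder
setwd(tempdir())

# Confirm where R will now look for data files and scripts
print(getwd())

# List the files R can see in the working directory
print(list.files())

# Switch back to the original directory
setwd(old_dir)
```

Clicking More and then Set As Working Directory in the Files tab issues an equivalent setwd() call for you.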
It saves you the trouble of typing these long path names. The Plots tab will show any plots you draw, and we'll see an example later. The Packages tab shows all available packages that you can load from RStudio. Most of them come with R when you install it, but if you have some customized packages, they will also show up here. You can easily load a package by clicking the check box here, which is equivalent to typing the library function. If you click it again, the package will be detached. This is a handy way to load packages instead of typing in the console or in your script.
The last tab is Help. It shows the help documentation for functions. You can search for a function from this search box and select it from a drop-down list. I want to show you an example plot, so let's search for plot. The good thing about the help documentation is that it gives examples of how to use the function. You can simply copy and paste the entire Examples section and execute it in the console. Hit Enter, and it generates the plot. This is [INAUDIBLE] plot. You can export it as an image or a PDF. Once you have a basic plot, you can export it as a PDF, open it in Illustrator or Photoshop, and make it publication quality [INAUDIBLE].
Now that we've seen examples of creating variables and a basic plot, let's look at some more basics. As a statistical programming language, R also offers the basic math operations, for example the log and the exponential, just by typing the function. Now, instead of a single value, I want to create a series of values and assign it to the variable x. I can use c, which stands for concatenate. I give it five values. Now let's see the value of x: it has five values in the array. In the console it's the same, five values. In the Workspace it shows that x is numeric. We can also check the type of the values using the mode function, and it says numeric. R also allows us to take keyboard input using the scan function.
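The package check box, the math functions, and the vector example from this part can be sketched as follows. The lecture does not say which package is ticked or which five values go into x, so the tools package (which ships with R) and the numbers below are assumptions chosen only for illustration.

```r
# Ticking a package's check box is equivalent to calling library();
# `tools` ships with R and is used here only as an example
library(tools)
print("package:tools" %in% search())   # is the package attached?

# Unticking the box detaches the package again
detach("package:tools", character.only = TRUE)

# Basic math functions: natural logarithm and exponential
print(log(10))
print(exp(1))

# c() combines several values into one vector (illustrative values)
x <- c(2, 4, 6, 8, 10)
print(x)

# mode() reports the storage type of the values
print(mode(x))
```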
For example, [INAUDIBLE] scan, and it gives me the index and I can type [INAUDIBLE], so now I have six items. But if I want to create a series of data that follows a rule, I can use the seq function. For example, I want to start from -1 and go up to 1 in steps of 0.1. Now we have exactly what we want. The seq function can also take more complicated rules. For example, I want to have 40 elements starting from -1 with a step of 0.5. Let's take a look: that's exactly what we wanted.
Another useful function is the rep function, with which we can create repetitive patterns in an array. For example, if I want an array of ten threes, I type rep(3, 10), and now z2 holds 3 repeated 10 times. Besides numbers, we can also create a character array using the rep function. I give it another value, and now we have a ten times instead of three ten times. We can also give a rule to the rep function. For example, I want to have the values one through six twice. This colon stands for one through six. Now we have 1 2 3 4 5 6 1 2 3 4 5 6. We can also put a rep function inside a rep function to create an even more complicated array. For example, here I will have each value three times, counting from one to six.
So we've seen that R has numeric and character types of variables. We can also have logical variables in R. For example, here I give a a TRUE value, and when I check the mode of a, it shows logical. So basically we have TRUE and FALSE for a logical variable. We can also create a character array, so now we have a, b, c in the c variable, and the mode of the c variable is character.
But when we get biological data, we might see, as in a Microsoft Excel file, a table that contains the data along with headers for the columns or rows indicating what the data is. In R, it's very easy to manipulate the headers of columns. For example, let's [INAUDIBLE] create a variable called birthday and give it numerical values for the year, month, and day.
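The seq and rep calls described above might look roughly like this. Only z2 is named in the lecture; the other variable names are hypothetical, and the `each` argument is used here as one way to get each value three times, which may differ from the nested call typed in the video.

```r
# A sequence from -1 up to 1 in steps of 0.1
y1 <- seq(-1, 1, by = 0.1)
print(length(y1))            # number of elements in the sequence

# 40 elements starting at -1 with a step of 0.5
y2 <- seq(-1, by = 0.5, length.out = 40)
print(y2)

# Repeat a number, then a character, ten times
z2 <- rep(3, 10)
z3 <- rep("a", 10)

# Repeat the whole pattern 1..6 twice (the colon builds 1, 2, ..., 6)
z4 <- rep(1:6, 2)

# Repeat each value three times before moving on to the next
z5 <- rep(1:6, each = 3)
print(z5)

# Logical and character values have their own modes
flag <- TRUE
letters3 <- c("a", "b", "c")
print(mode(flag))
print(mode(letters3))
```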
So now I want to give each of these three columns a header. We'll use the names function, so the names of birthday are now given by an array of characters: year, month, and day. Now if we check the names of birthday, which are the headers of birthday, we will see year, month, and day. We can also retrieve a column's value by using its header. For example, if I want to see the month of the birthday, I can retrieve it using its name, and it returns five.
Next I'm going to show you how to import your own data, do a basic plot, and export the variables from R to a text file. First let's look at the example data. We have two columns of values. The first column is the x values and the second column is the y values. We're going to make an x-y scatter plot of the data points. First we need to import the data. We're going to assign the data to a variable called data, and use the read.table function. First, we give the name of the file. Then we keep the header from the file; here T stands for TRUE. And the data are separated by tabs; this backslash t stands for tab. Hit Enter, and we can see that data is already here in the Workspace. We can click on it, and it gives us an overview on the left. Notice that this click is equivalent to calling the View function on data. If you have a huge table, we can also see the first few lines of the data by using the head function, and we can retrieve a column by using a dollar sign followed by the name of the column.
Now let's plot. Here, I just want to do a basic plot, so I give the x values and y values in the correct order, and the plot shows up under the Plots tab. Notice that the labels for the x and y axes are the defaults, and I want to change that. So let's do the plot again, and this time we specify the x label and the y label. We can also specify the color of the data points using the col parameter. Let's say dark red. Notice that the x and y labels and the color of the data points change instantly.
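Here is a rough, self-contained sketch of the naming, import, and plotting steps above. The lecture's data file is not available, so the code first writes a tiny tab-separated file to a temporary location; the file contents, the year and day of the birthday, and the axis label text are all made-up placeholders.

```r
# A named numeric vector; only the month (5) comes from the lecture
birthday <- c(1990, 5, 20)
names(birthday) <- c("year", "month", "day")
print(names(birthday))
print(birthday["month"])     # retrieve a value by its header

# Stand-in for the example data: a small tab-separated file with a header row
data_file <- tempfile(fileext = ".txt")
writeLines(c("x\ty", "1\t2.1", "2\t3.9", "3\t6.2"), data_file)

# Import: keep the header line, fields separated by tabs
data <- read.table(data_file, header = TRUE, sep = "\t")
print(head(data))            # first few rows of the table
print(data$x)                # one column, selected with the dollar sign

# Scatter plot with custom axis labels and point color.
# In RStudio this appears in the Plots tab; a non-interactive
# Rscript run writes the figure to Rplots.pdf instead.
plot(data$x, data$y, xlab = "x values", ylab = "y values", col = "darkred")
```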
If we like that, we can plot new data points. The plot function also has other parameters that we can set besides the x and y labels. Now suppose you've made some changes to the data, and you want to save the changes to another file. We'll use the write.table function. First we give the data we want to save, then we name the output file, and we want to separate the data by tabs. We also don't want to keep the quotes, so we set quote to F; otherwise R will wrap each value in quotation marks. I want to keep the headers, so I set col.names to T, and F for row.names, because we don't really have row names. Let's take a look at what's in the file. This is exactly what we expected. Now, what if we set row.names to T? What's going to happen to the output file? We can see that R automatically assigned an index to each of the rows, which is not what we want. So we set it back to F.
Instead of searching for help documentation in the Help tab, we can also use a question mark followed by the name of the function in the console. As we type that in, the help documentation will show up here on the right.
The last thing I want to show you is how to install packages. RStudio offers a handy way to do it from the user interface. We type in the name of the package, and usually it will generate a drop-down list. We can select the package from the drop-down list and click Install, and it will automatically install the package. But, for example, if you type CummeRbund, it can't find it automatically. So we have to go to Bioconductor to see how to install CummeRbund. Let's move to Bioconductor and search for CummeRbund. Click on it in the list, and here it shows you how to install the CummeRbund package.
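Going back to the export step in this part, a minimal sketch of the write.table call could look like this. The small data frame and the temporary output file are assumptions made so the example runs on its own; the argument values follow the settings described in the lecture.

```r
# Stand-in for the modified data (hypothetical values)
data <- data.frame(x = 1:3, y = c(2.1, 3.9, 6.2))

# Write to a temporary file: tab-separated, no quotation marks,
# keep the column headers, and leave out the row index
out_file <- tempfile(fileext = ".txt")
write.table(data, file = out_file, sep = "\t", quote = FALSE,
            col.names = TRUE, row.names = FALSE)

# Inspect what was written
print(readLines(out_file))

# With row.names = TRUE, R adds an index in front of every data row
indexed_file <- tempfile(fileext = ".txt")
write.table(data, file = indexed_file, sep = "\t", quote = FALSE,
            col.names = TRUE, row.names = TRUE)
print(readLines(indexed_file))

# To open the help page for a function from the console, type for example:
# ?write.table
```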
So basically, we just need to copy and paste these two lines of code and hit Enter, and it will automatically install the package. It'll first find the package and then install it. I already have CummeRbund, so I'm going to skip this step.
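The two lines copied in the video are not captured in the transcript, so the following is only a sketch of how the installation is usually done today, assuming the BiocManager package from CRAN and the Bioconductor package name cummeRbund. The do_install flag is a hypothetical guard added here so that nothing is downloaded unless you switch it on.

```r
# Bioconductor package name (note the lowercase first letter)
pkg <- "cummeRbund"

# Check whether the package is already installed
already_installed <- requireNamespace(pkg, quietly = TRUE)
print(already_installed)

# Hypothetical guard: set to TRUE to actually download and install
do_install <- FALSE

if (!already_installed && do_install) {
  # BiocManager is the installer for Bioconductor packages
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
}

# Once installed, the package is loaded like any other:
# library(cummeRbund)
```

If the package is already present, as it was for the speaker, the install branch is simply skipped.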