I wanted to give you an overview of PATRIC before we start so you can get a sense of how much data is there and how to find it. You can find anything either through the tabs or through the global search function. You'll notice from the homepage that we let you browse data from bacteria, archaea, bacteriophages, and eukaryotic hosts, some of which are used in RNA-seq, and we let you search that data. We also give you direct access to the different tools and the different data types. If you're logged in, you'll see your own data here. But let's march through the tabs and see what you can find in PATRIC.
If you click on the "Organisms" tab, you can see a list of bacterial genera, but that's not all the data that we have in PATRIC. Anything that has a genome sequence in GenBank, we have in PATRIC as well. These are the NIAID priority pathogens, so we want to provide direct links to those. But you can also go to the all bacteria page, the archaea page, the data that we have for bacteriophages, and the eukaryotic hosts; all of these are hyperlinks to those particular data types.
For data, these are special data collections that we've generated over the years that PATRIC has been funded. This shows how we analyze different data or bring it into the resource. For example, if I click on "Genomes" and scroll through here, this is just good background information for you if you have questions. What kind of data do we have? How does it come into the resource? What can we do with that type of data? Those are fun, and I really like the figures that we have with those too, because they make it easy to understand what we do. One thing I want to point out is that if you don't want to drill through the website to get your data, you can click here for the FTP server and navigate to the data you're interested in there.
Workspaces.
Well, there'll be a whole instructional video about workspaces, but every registered user in PATRIC gets their own private workspace, which is represented by the home workspace. We give you predesignated folders in it for genome groups, [inaudible] groups, and experimental group data, but you can also create your own folders and put your own private data in them. Remember, it's all free. Right now we have no limit on the amount of data that you can store there. If you look at the shared workspaces, that's where you'll find workspaces that are shared between you and other users. We talk about how to do that in the workspace video. Public workspaces hold data that's available to everybody. In fact, you will find the data for this course in the public workspace. If you have run jobs in PATRIC or have annotated genomes, this will give you direct links to tables that show you that data.
In 2015, we started providing services, and you can see that they've grown. We have a lot of different services that you can use in PATRIC. This course, for comparative genomics, concentrates on assembly, annotation, the comprehensive genome analysis service, BLAST (we touch on it, but I don't do a deep dive into the service), similar genome finder, phylogenetic tree, protein family sorter, and proteome comparison. We're planning on wrapping each of the services into subsequent courses. But for now, this is what we have, and we'll be going over those in each of the upcoming sections.
If you have questions or need more information, the Help tab will show you how to get started quickly in PATRIC. We have user guides, tutorials, common tasks that people often ask us about, webinars, any instructional videos that we've made, and workshops that we've done in the past, and for questions you would click on "Provide Feedback." But I wanted to point something out.
What we're going to be doing in this course is looking at the website and the interface and letting you launch tools and drill down into data through the website. But many of you are very comfortable with the command line, and you don't like to do all this clicking when you can do it just as easily from the command line. The CLI tutorial, if you click here, will show you where to get the PATRIC command line interface and how to use it, so that you can grab all the data or run many of the analysis services without going through the website.
Very quickly, I wanted to step through one of the organisms so you could see the data that we have for each of these genera. [inaudible] was inordinately fond of beetles; I myself am inordinately fond of Brucella. This is the landing page. It gives me taxonomic information, and you'll notice that when I mouse over each of these, a line appears underneath it, which indicates that it is a hyperlink. The taxonomic breadcrumb is at the top, and I could easily march up the taxonomy here. You'll notice that we have a number of tabs here. Each of these has unique information associated with this organism.
We provide a phylogenetic tree so you can get a sense of where the different species are within this level. These are trees that we have generated. We took what we considered to be high-quality genomes, and we did this several years ago. Your private data won't be here, and not all the genomes in this genus will be here. This is to give you a sense of where they are at either the order or the family level. One thing I wanted to point out is that you can click on a branch and then easily go to Genomes or create a group of them. We provide a page that shows you the taxonomy from GenBank as we map it here. These are the genomes that are available in Brucella. On each of the tables, we provide filters that let you dig more deeply into the data with this dynamic filter and capture the data that you're interested in.
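For those who prefer the command line mentioned above, the Brucella genome table could in principle be pulled without the website. The following is only a minimal sketch in Python that assembles such a call. It assumes the PATRIC CLI provides a `p3-all-genomes` command that accepts `--eq field,value` and repeated `--attr` options, and that genome records carry `genus`, `genome_id`, and `genome_name` fields; check all of these against the CLI tutorial before relying on them. The helper name `build_genome_query` is hypothetical.

```python
import shlex


def build_genome_query(genus, attributes):
    """Build the argument list for a genus-filtered genome listing.

    Assumption: the PATRIC CLI exposes `p3-all-genomes` with `--eq` and
    `--attr` options as described in its tutorial. This function only
    assembles the arguments; it does not run anything.
    """
    cmd = ["p3-all-genomes", "--eq", f"genus,{genus}"]
    for attr in attributes:
        # One --attr per output column requested.
        cmd += ["--attr", attr]
    return cmd


if __name__ == "__main__":
    cmd = build_genome_query("Brucella", ["genome_id", "genome_name"])
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

The printed command can be pasted into a terminal where the CLI is installed, or the list can be handed to `subprocess.run` once you have confirmed the options.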
For any genome that comes into PATRIC, whatever data is associated with it, whether it's in BioSample or BioProject or the researcher put it into the notes when they submitted that data, we parse that out to try to make it easy for you to find data that you're interested in. You can search for specific data with the keyword search. Just to give you a sense of that, if I click on a given strain, you'll notice that beyond the green bar, it shows all the data that comes with that organism. It gives you direct links to BioProject, BioSample, the assembly, the GenBank accessions, and the RefSeq accessions. Let me just take this off here, and let's see what happens if I put in something like "blood." Right now I'm looking broadly across these guys, and this will show me all the genomes that actually have that word somewhere, either in the BioSample, the BioProject, or the notes.
What we're trying to do is make it easy for you to find the data you want to compare your data to. When we used to have eight bacterial genomes, the sequence was the most important thing, but now it seems that the metadata associated with it is very important for letting people look for things that are of interest. As the collection grows bigger and bigger, we're trying to make it easy for you to find things amidst this sea of data. It could be like finding a needle in a haystack, but we're trying to make it easier. We're trying to make the needle a lot bigger in the haystack so that you can find it.
If we have AMR phenotype data, we provide that. The sequences are all the individual contigs. The features are all the individual genes. For the specialty genes, any genome that comes into PATRIC is blasted against collections that we have of virulence factors, antimicrobial resistance proteins, transporters, drug targets, and human homologs, so that you can drill down into that.
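The "blood" search described above amounts to a case-insensitive keyword match across several metadata fields. Here is a minimal sketch of that idea in Python. It is not how PATRIC implements its search; the field names (`biosample`, `bioproject`, `note`) and the toy records are hypothetical and only stand in for the parsed metadata.

```python
def matches_keyword(record, keyword, fields):
    """Return True if the keyword appears in any of the given fields."""
    needle = keyword.lower()
    return any(needle in str(record.get(field, "")).lower() for field in fields)


def search_genomes(records, keyword, fields=("biosample", "bioproject", "note")):
    """Keep the records whose metadata mention the keyword anywhere."""
    return [r for r in records if matches_keyword(r, keyword, fields)]


# Hypothetical toy records, invented purely for illustration.
RECORDS = [
    {"name": "strain A", "biosample": "isolated from human blood", "note": ""},
    {"name": "strain B", "biosample": "soil sample", "note": "environmental"},
    {"name": "strain C", "bioproject": "", "note": "Blood culture isolate"},
]

if __name__ == "__main__":
    for hit in search_genomes(RECORDS, "blood"):
        print(hit["name"])
```

Because this is a plain substring match, "blood" would also match a word like "bloodstream", which is usually what you want when scanning free-text notes.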
Some of these pages at the genus level take a little while to load, but you can click on that and see, for example, if I wanted to find from the public data all the virulence factors that have homology to those found in the virulence factor database, identified by BLASTP analysis, that are involved in that or whatever. It doesn't matter which; you can see that you can quickly drill down into that data.
Every genome that comes into PATRIC, public or private, is annotated using the RASTtk pipeline, and in the course of that, its proteins are assigned to genus-level and higher-level protein families. The Protein Families tab lets you drill down on those, and this will take a while to load because it's a huge number of genomes. We do pathways as well. Subsystems are an effort that we have with the FIG group. They are biochemists who come in and link together genes of interest. It's an amazing resource, and it takes a while to load. You'll notice you can see the overview, the subsystems, and the individual genes, all of which take a while to load. What I like to do when I get this is take it all, download it into Excel, and then manipulate the data in ways that let me see it based on the different superclasses and classes.
For some organisms we have transcriptomic data. Several years ago we went in and got microarray data that was publicly available for a number of organisms, we re-normalized it so that we could make comparisons, and then we brought it in so that people looking at a gene could see whether it's going up or down. They could look to see whether other experimenters had seen similar patterns of up or down expression. Our goal was to start bringing in RNA-seq data to supplement this. In fact, it would take over, because there's so much of it now. We also have a tab on protein interactions.
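The downloaded subsystems table mentioned above can also be summarized outside Excel. Below is a minimal Python sketch that counts rows per superclass and class. It assumes the table has been saved as comma-separated text with columns named `Superclass` and `Class`; those column names and the sample rows are hypothetical placeholders, so adjust them to match the header of your actual download.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read a comma-separated table with a header row into dictionaries."""
    return list(csv.DictReader(handle))


def count_by_class(rows):
    """Count rows for each (superclass, class) pair."""
    counts = Counter()
    for row in rows:
        counts[(row["Superclass"], row["Class"])] += 1
    return counts


# Hypothetical sample standing in for a downloaded subsystems table.
SAMPLE = """Superclass,Class,Subsystem Name,Gene
Metabolism,Amino acids,Example subsystem 1,geneA
Metabolism,Amino acids,Example subsystem 2,geneB
Metabolism,Carbohydrates,Example subsystem 3,geneC
Stress response,Oxidative stress,Example subsystem 4,geneD
"""

if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE))
    for (superclass, cls), n in count_by_class(rows).most_common():
        print(f"{superclass}\t{cls}\t{n}")
```

To use it on a real download, pass an open file handle to `load_rows` instead of the in-memory sample.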
I wanted to go back to specialty genes for a minute just to take a single gene here, and if I click on that, you'll notice the vertical green bar becomes populated with possible downstream functions. There's one last thing I haven't shown you: we provide you with data at a high level for an individual taxon, but you can also look at individual features or genomes.
First let's look at just this feature that I've chosen at random. This is the feature landing page. We have an overview that tells you a number of things. It also gives you the RefSeq locus tag, and this is an important thing to point out. If you come in with a GenBank or a RefSeq locus tag, you can use the global search to find those genes at [inaudible]. It tells you the protein families the gene is assigned to and gives more information about where its start and stop are and how big it is. This is the gene neighborhood, and there is more information if it hits any of the [inaudible] virulence factor databases described here. You'll notice that this has direct links out to specific publications that are indicative of the virulence of the gene that this one blasted to.
Note, too, that we have these other tabs here. The Genome Browser: you open up the genome browser and it shows you where that gene is, and if the RefSeq annotation is available for that organism, you'll be able to see similarities. You can drill down or out, and you can even get to the six-frame annotation with this. The Compare Region Viewer takes that gene neighborhood, looks across all of PATRIC, and asks which genomes have a neighborhood that's similar to mine. This is a pretty cool tool for looking at that. Red is always the one that you came in on, and where genes are colored the same, they are named the same. If this gene has transcriptomic data from that microarray data that came in, you would find that here, along with any protein interactions. I wanted to step up to an individual genome so you could see what that looks like.
So this is the breadcrumb that tells us where this genome is in the great scheme of life, and this is the gene on it. I could click on this, and that takes me directly to the genome, and look, even more tabs. It's all about the tabs in PATRIC. This is what every single genome landing page in PATRIC looks like, whether it's public or private. So your private data is going to look just like this, except that where it says RefSeq, everything will be zero. It gives you information about the organism and information about the annotation, and then you'll be able to march down through similar views.
We've already seen the genome browser for the feature. This is a fun one. This will give you the circular view showing you where the genes are located on the different strands, and you can actually remove strands or add in custom tracks. Sequences are the number of [inaudible] that are available in this genome, and you can get a variety of information from that, including seeing the features that are on the individual contigs as well. The features are the genes, and this will show me all the genes that are called on this genome, and also the RefSeq locus tag that corresponds to each. The specialty genes are much like what we saw at the taxon level, showing you the different databases that were blasted against and letting you filter down to them. The protein families, the pathways, the subsystems, transcriptomics, and interactions are all like the taxon landing pages.
This was just a quick overview of the data and how you can drill down into it in PATRIC. We'll be using some of these functions later on when we're looking for data to compare our private data to, and you can do more of that in the future.