>> Howdy, when your doctor orders a blood test, it comes back with fewer than 20 parameters to analyze. But modern genomics technology can measure the expression of over 20,000 human genes in a single experiment. How can a doctor process all this information, let alone make a decision on how to treat a patient? Now, imagine a doctor who measures the expression of every human gene in 50 cancer patients and 50 healthy people. The doctor gives this data to you, a computer scientist, but doesn't tell you which of the 100 data sets correspond to the 50 cancer patients. Why does the doctor hide from you which data correspond to cancer patients? Because if you can correctly separate all the data into healthy and cancer clusters in a blind experiment, then you can potentially determine whether a new patient has cancer just from that patient's gene expression data. Perhaps the diagnostic can identify cancer even before the patient shows any symptoms.
>> Revolutionary cancer diagnostics like this are being developed today, including the breast cancer diagnostic MammaPrint, which can predict a patient's likelihood of breast cancer recurrence based on the expression of just 70 genes. A related biological problem is to partition thousands of individual human genomes into clusters in order to identify a genomic basis of ethnicity. We can then start to answer questions about your genetic heritage, as well as determine whether you, like me, have any Neanderthal DNA. These two biological problems fall under the same computational framework of assigning data points to clusters, so that elements in the same cluster are similar and elements in different clusters are dissimilar. This problem may seem simple, but it quickly becomes a major challenge when we attempt to apply clustering methods to biological problems, which range from finding genes responsible for the circadian clock to clustering yeast genes in order to determine which ones are responsible for winemaking.
We hope that you will join us in this class to learn about these challenges of clustering biological data.
>> Although these instructors may appear crazy, they are not quite as mad as they look. Dr. Pavel Pevzner is a distinguished professor of computer science at the University of California San Diego and a leading authority on bioinformatics. He's dressed this way because he sometimes thinks that he's a sheriff of bioinformatics, a frontier discipline underpinning the digital revolution in biology and personalized medicine. Dr. Phillip Compeau is an assistant professor of computer science at Carnegie Mellon University. To learn why he is dressed this way, you'll need to take this course or read the textbook, Bioinformatics Algorithms: An Active Learning Approach, co-authored by the two speakers.