Hello and welcome to this course on linear regression. In this lecture, I'm going to introduce you to the uses of statistical models, and in this course I'll be showing you how to develop models and interpret their results to learn about disease. My name is Victoria and I'll be your guide. This is the second course in a specialisation on statistics for Public Health. In the previous course, you met my colleague Alex, who showed you how to take your time to work out the right questions so that you can turn them into testable hypotheses. He also taught you practical skills such as assessing the key features of a data set and summarising your data. You're going to need all of these skills to move on to regression, so if you're unfamiliar with them, I suggest you take that course first. Let's start by looking at the ways in which models are commonly used in public health research. One use of models is to help us evaluate interventions to see if they work. For example, in a clinical trial we might be interested in estimating the effect of a treatment, such as statins, on heart disease. Or, in an observational study, we may want to determine the effect of a certain exposure, such as air pollution, on asthma. In these models, our focus is on obtaining an estimate for that treatment or exposure that has been adjusted for the other variables in the model. We can also use models to help us understand the causes of disease: for example, what are the predictors of high blood pressure, anxiety or depression? In this case, we're interested in all of the regression estimates and in how each variable relates to the outcome. Finally, models can be very useful tools for prediction, as they give us a chance to intervene in order to avoid future adverse outcomes. One risk prediction model you might be aware of is the Framingham Risk Score, which is used to predict a patient's 10-year risk of having a cardiovascular event.
We can also use prediction models to help us diagnose patients. In diagnosis, one or more measurements are taken and the model is used to classify the patient as either having or not having the disease. One diagnostic test that researchers here at Imperial College London are currently working on is a breath test to detect stomach and oesophageal cancer. They're looking at how a combination of several organic compounds in the breath can be used to diagnose patients early. So I've described three common ways we use models in practice, and the statistical theory behind these models is the same for all three. I now have a question for you: do you think that the way in which a model is used in practice will alter the way we approach developing that model? The answer is yes: the purpose of the model will inform important aspects of how we develop it. While the statistical theory is the same, how we select variables and how accurately we need to model the relationships between them will depend on whether we want to use the model to evaluate an intervention, to understand a disease, or to predict a future outcome. That's why it's important to define the research question before you start developing a model. As you can see, statistical models are incredibly powerful tools for public health research, but only if they're good ones. It's very easy to develop a bad model. If we ignore missing data or make unreasonably strong assumptions about the relationships between variables, the model can give us the wrong answer, and this can waste huge amounts of resources and lead us down the wrong research track. This is why it's important to take the time to get to know your data before you start any analysis and then, after fitting a model, to check the assumptions you've made. I'm looking forward to teaching you all about linear regression and model building.
We're going to start with correlation, and then I'll introduce you to linear regression and show you how to fit models and interpret their results. The next step will be to unpick the relationships between all of our variables, which we'll do using multiple linear regression. As you can see, there's lots for you to learn, so let's get started.