We've really come along in this course, and we've learned quite a lot, so I think it's time for another case study. We're going to take a published paper and take a closer look at the research question. We're going to try to identify the study type, what type of sampling was used to get subjects to analyze, what variables were collected to answer the research question and what data types these variables were, and what the analysis was.
So here's our paper. It's "Prevalence and Characteristics of Hepatitis B Virus Coinfection among HIV-Positive Women in South Africa and Botswana", by Matthews et al., and it was published this year, 2015, in PLOS ONE. You see the reference there. Read it; there is quite good information in this paper.
Let's start with the research question, and first a bit of background. HIV is endemic in many sub-Saharan populations, unfortunately. But with the progressive availability and success of antiretroviral therapy, that's therapy for HIV, there has been a real reduction in opportunistic infections and malignancy, and those were the main causes of mortality in these patients. So now we have increased survival, and we see the emergence of previously unrecognized chronic liver disease; of special concern is viral hepatitis. These patients live a longer life now. The HIV is under control, and now we see other problems apart from the opportunistic infections and malignancy. There's also increasing evidence that coinfection, that is HIV with viral hepatitis, specifically hepatitis B, is associated with morbidity and mortality exceeding that seen with either of them alone. Also, many in Africa are vulnerable to liver disease. That's due to genetic factors, to diet, and especially to exposure to toxins, making their livers really vulnerable to infection.
Now there have been some articles in the literature, but the burden and characteristics of this coinfection, HIV and hepatitis B, are only known through small studies of disparate populations. So the aim of this piece of research was to investigate the burden and the characteristics of the coinfection. It's really epidemiological in nature. In the paper itself, the authors state that they aim to assimilate more information about the nature of hepatitis B virus and HIV coinfection in this region. That was the research question.
So, what study type? I think that's an easy question to answer. First of all, it was observational: there was no intervention, so this is not a clinical trial. And we want to know how many people are infected, and we want to describe them. Clearly, this is a cross-sectional observational study.
Let's look at the sampling. If you read it carefully, there's a bit of a mix of sampling techniques. First of all, there was clustering by clinics. They identified certain clinics, antenatal clinics and pediatric clinics, in Durban and in Kimberley, a city and a very large town in South Africa, and also a clinic in Botswana, and they looked at everyone in those clinics. So that is clustering by clinics. Then they stratified patients according to their HIV status, positive or negative. So it's clustering by clinics and then stratification by HIV status. The HIV-positive patients came from two sets of patients in South Africa and one clinic in Botswana. They then also identified another clinic in Durban, South Africa, where they identified some HIV-negative patients and included them so that the HIV-positive and HIV-negative patients could be compared. In the end, they had a cohort of 1022 women. Now, if you read the paper carefully, most of these patients formed part of previous observational studies.
This is certainly a group that has been very active.
So let's look at the variables and their data types. We have a research question, we have this cross-sectional study, and we want a bit of information about the prevalence of this coinfection. The first variable was the cohort: which clinic do the patients come from, one of the two South African clinics or the clinic in Botswana? Durban and Kimberley are nominal categorical data, and the same goes for the country, South Africa or Botswana. Those are not numerical values, even though we can count how many patients come from each. The word "Botswana" is a nominal categorical data value. HIV status is positive or negative, and those actual terms, positive or negative, are nominal categorical data as well.
Hepatitis B surface antigen. Now that is an antigen, something that sits on the surface of the virus, to which the immune system will react. That is for hepatitis B. It's either positive or negative in the patient's blood, and positive or negative is also a nominal categorical data type.
Now the HIV viral load, that's where we actually count the number of viruses. Clearly that is a numerical data type. And there can be zero viruses, so there's an absolute zero, and this is ratio-type numerical data. And if we look at the numbers involved in this viral load, we really have to see this as a continuous data type, not as a discrete data type. CD4 count, that is a count of a type of immune cell in human blood that is affected by HIV, and again it has an absolute zero value for sure. It therefore has to be seen as ratio-type numerical data. But the sheer number of cells that you can count in each individual really means that this is also treated as a continuous data type and not a discrete data type.
Now they've looked at some other variables as well in this article, but some of them become really technical and I don't want to go into them.
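To make the classification above concrete, here is a minimal sketch in plain Python of how each variable's data type decides how we summarise it: counts for nominal categorical data, a median for the numerical data. The variable names (`clinic`, `hbsag`, and so on), the `summarise` helper and the example values are all hypothetical, chosen for illustration; they are not the column names or values in the authors' spreadsheet.

```python
from collections import Counter
from statistics import median

# Data type of each variable, as classified in the discussion above.
# The keys are hypothetical names, not the authors' column headers.
VARIABLE_TYPES = {
    "clinic": "nominal categorical",
    "country": "nominal categorical",
    "hiv_status": "nominal categorical",
    "hbsag": "nominal categorical",
    "hiv_viral_load": "ratio numerical",
    "cd4_count": "ratio numerical",
}


def summarise(name, values):
    """Counts per category for nominal data, the median for numerical data."""
    if VARIABLE_TYPES[name].startswith("nominal"):
        return dict(Counter(values))
    return median(values)


# Hypothetical values, for illustration only.
print(summarise("hiv_status", ["positive", "negative", "positive"]))
print(summarise("cd4_count", [200, 350, 500]))
```

The point of the sketch is only that the data type comes first: we count categories, and we describe numerical values with a measure such as the median.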
Those other variables really are quite specialized. The ones above are the ones that we are going to concentrate on, so let's look at the data analysis. We're going to look at the sections of the results where they looked at the HIV status, so looking at the CD4 T-cell count and the viral load. Then also the section on the hepatitis B surface antigen prevalence, so how many patients were also infected with hepatitis B. And then the impact of coinfection, specifically on the CD4 count and the viral load. Let's have a look.
Let's start with the HIV status. In a total of 1022 patients, you'll note that 950 were HIV positive. Those were from the clinics that they had data on before. And then 72 were HIV negative, from a different clinic, where they identified the patients that are HIV negative and wanted to include them in this observational cross-sectional study as well.
Now let's look at the CD4 T-cell count. Remember, that's a type of immune cell in the body, attacked by the human immunodeficiency virus. And we see three plots at the bottom that you will not see in the paper itself. Now the beauty of this paper, and why it was chosen for inclusion in our discussion, is the fact that the actual data values in spreadsheet format are also available from the PLOS ONE website. You can download this and look at the data yourself. So these are the three cohorts of HIV-positive patients: on the left-hand side the patients from Durban, in the middle the data values for the patients from Kimberley, and on the right those from the clinic in Botswana. Now these are probability plots, also called QQ plots, where you plot every individual's value, here the CD4 counts, against its quantile. And you see the little red lines there. Remember we discussed this before: if the values came from a normally distributed population, the dots would follow those lines.
If we look at the CD4 counts and the population from which these individuals were taken, clearly these values are not from a normal distribution. And what does that tell us? Well, we can clearly see it visually here, because these blue dots do not follow these red lines too well. So if we want to compare these three sets of values, we really cannot make use of a parametric test, and indeed, in this article, they make use of a nonparametric test. Remember also that these are three sets of data values that we are comparing to each other. So we have to use the nonparametric version of something that would be akin to the one-way ANOVA test. And indeed, as discussed right at the end of the paper, they used a Kruskal-Wallis test, which is absolutely correct.
Now we look at the viral loads for these three cohorts of HIV-positive patients, and again you'll see you cannot use a parametric test. You cannot use ANOVA to compare these three sets of values, because clearly the data points do not come from a normally distributed population. And again they made use of the nonparametric alternative, the Kruskal-Wallis test. You can look at the results.
Let's go to the hepatitis B surface antigen prevalence. You'll see on the right-hand side there is a contingency table. So of the total of these patients, 950 were hepatitis B surface antigen negative and 72 were hepatitis B surface antigen positive. So these are now positive patients and negative patients. You can see this is a two-by-two contingency table. You see the column of HIV-negative and the column of HIV-positive patients. And on the rows, for each of HIV negative and HIV positive, you'll see the numbers that were hepatitis B surface antigen negative and those that were hepatitis B surface antigen positive. Now they made use of Fisher's exact test; we've looked at that.
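For a two-by-two table like this one, both the expected counts under independence and a two-sided Fisher's exact p-value can be worked out directly. What follows is a minimal sketch in plain Python, not the authors' code. The cell counts are hypothetical and chosen only for illustration, and the function names `expected_counts` and `fisher_exact_two_sided` are my own. The two-sided p-value here uses the common convention of summing the probabilities of all tables that are no more probable than the observed one; in practice you would normally let a statistics package do this.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts for a 2x2 table if rows and columns are independent."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. We sum the probabilities of every possible table that
    is no more probable than the observed table.
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for rounding
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only.
# Rows: HBsAg negative, HBsAg positive. Columns: HIV negative, HIV positive.
table = [[90, 880], [10, 120]]
for row in expected_counts(table):
    print([round(value, 1) for value in row])
print(fisher_exact_two_sided(table))
```

Looking at the expected counts first tells you whether the chi-square test would also have been acceptable for the table at hand.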
The authors need not have used Fisher's exact test. If we were to do the analysis ourselves, the numbers in the expected table would certainly be larger than five, so we might as well make use of the chi-square test. They decided on Fisher's exact test, which is also correct. So this is categorical data: HIV positive, HIV negative. That's nominal categorical data. Same for hepatitis B surface antigen. So we really are just looking at proportions here, and we use a statistical test for categorical data: Fisher's exact test here for this contingency table, but as I say, we could use the chi-square test as well.
Let's get to the section on the coinfection impact. The authors compare just the HIV-positive patients in South Africa, and they made two groups from those: hepatitis B negative and hepatitis B positive. And we are looking at the CD4 counts and the viral loads. So first of all, these are the CD4 counts. These are only patients from Durban, they are all HIV-positive patients, and we are just looking at two groups here, the hepatitis B negative and the hepatitis B positive. Negative on the left, positive on the right. Again, these are probability plots that are not in the paper. You can take the data that the authors have provided and construct these yourself with proper software. And yet again we see that the CD4 counts are not normally distributed data points. They do not follow those red lines. So in comparing these two groups, we would not use the t-test at all. If we look at the viral loads as well, again the underlying population is not normally distributed. So in comparing these two sets of data we have to use a nonparametric test, and that is what the authors did: they used the Mann-Whitney U test, quite correctly. So it's a wonderful paper, and you can play around with the data yourself.
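Both nonparametric tests in this case study, Kruskal-Wallis for the three cohorts and Mann-Whitney U for the two coinfection groups, work on the ranks of the pooled values rather than on the values themselves. Here is a minimal sketch of the two test statistics in plain Python. It is not the authors' analysis: the CD4 values are hypothetical, the function names are my own, and the H statistic is computed without a tie correction, so it assumes few or no tied values. The p-value helper uses the chi-square approximation with 2 degrees of freedom, which applies only when exactly three groups are compared and is rough for samples as small as these.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def kruskal_wallis_p_three_groups(h):
    """Approximate p-value for exactly three groups (chi-square, 2 df)."""
    return exp(-h / 2)


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic: the smaller of the two U values."""
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return min(u1, u2)


# Hypothetical CD4 counts, for illustration only.
cohort_a = [210, 340, 980, 455, 120]
cohort_b = [305, 610, 150, 720, 265]
cohort_c = [400, 95, 530, 880, 330]
h = kruskal_wallis_h(cohort_a, cohort_b, cohort_c)
print(h, kruskal_wallis_p_three_groups(h))
print(mann_whitney_u(cohort_a, cohort_b))
```

Because both statistics are built from ranks, a few very large CD4 counts or viral loads do not dominate the comparison the way they would in ANOVA or a t-test.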
And it really is such important data, because the antiretroviral drugs are changing the landscape and improving the lives of these patients. As the management of these patients goes forward, we have to address all the issues that come up, and hepatitis B coinfection in these patients is certainly of concern. It was very good to have this data from which to work. It's a wonderful case study. Go and read it; it's a good paper.