The concept of informed consent was developed in the context of experiments conducted on human subjects, where data would be collected prospectively after consent had been obtained. Let's look at how this translates into our world today and the kinds of things that data scientists do. First, the experiments are often not experiments in that sense: the data is collected first, and the experiment comes afterwards. And the data is often collected from people who are interacting with somebody who would like to collect that data, possibly a merchant, possibly a software vendor of some sort.
Informed, in this case, usually means something that's hidden in multiple pages of fine print. You want to use some service, and you're given a dump with lots and lots of dense legalese. And you're required to say, I accept, before you can actually use that service. There is some benefit in the law for the party who's getting this kind of consent, but it is very far from being clear, firm permission. There have been many lawsuits involving injuries in sports activities, for instance. So there's a long history of legal precedent that one can look at, and extrapolate from, to understand the benefits of getting consent for data collection and what informed really means. And from an ethical standpoint, setting aside the law, I think we can all say that there is some weirdness in claiming that somebody has been informed because they were given multiple pages of fine print that they really didn't have an opportunity to read.
The notion of voluntary is also a little questionable, because the consent is being obtained exactly at the time that the user is intending to perform a particular action, like using a software service or buying a product.
This is not something that they have had the time and opportunity to think about, and it is not something that was shown to them early in their shopping experience, when they could have compared what different vendors required and folded this into their choice of vendor, product, or service. Rather, this comes quite late in their decision-making process, after they've already decided what they really want to do. And now suddenly it is, pay me this toll or you can't go on this road, after you've already decided to drive a particular route.
To see how informed consent works in practice, think about the example of Facebook, and about a research experiment that Facebook may conduct to do, say, some psychology study. Facebook explicitly tells the user in its agreement that it may collect user data for research purposes. Certainly, it has been doing so since 2012, after its famous mood contagion experiment. And so Facebook may have met the letter of its user agreement, but it still got hammered by its users. And I think the issue here is not that there's something wrong with Facebook. Facebook actually does a very good job of thinking about issues of privacy and what users would like to do with their data. It's just that Facebook is ahead of the curve. It's a big company and it has a lot of very personal data about a lot of us. And it often gets into the crosshairs of the community before anybody else, because it's there before anybody else.
Okay, so that was informed consent with regard to data collection. But then there's a question of, what is the data actually going to be used for? So I may give data about myself to a merchant to obtain a specific service. I don't want the merchant to use this data for other purposes. I don't want them to use it to sell me other things, for instance. I may want them to use it only for the specific service that I have contracted with them for.
And I don't want the merchant to share this data with others. So I may, for instance, give them consent that disallows repurposing. What we're saying here is, you may collect this data, but you've been given permission to collect it in a particular context. That context matters, and you may not use the data in any other context or sell it to somebody else.
Now repurposing of data is not all bad; there are often business needs to do this. So my credit card company obviously needs to collect data about my purchases and about my payments. They don't strictly have to share this data with the credit reporting agency, and maybe I don't particularly like that they do. But this is something that I've got to accept as a part of the social setup. And this is something that the credit card company ought to be telling me about and that I agree to. If I'm going to be given credit, I need to participate in the ecosystem that involves credit reporting, with a separate credit reporting agency learning things about my purchases and payments from my credit card company.
Repurposing, even if it isn't a business necessity, might actually be of great societal value. I share medical data with my hospital to get better medical care, but I may not mind at all, and might in fact be very glad, if my medical data is repurposed for medical research. I might feel happy that information about my disease progression, my health, is going to help future generations beat some scourge. The thing is, the specific research questions that would be asked, the things that scientists wish to study, may not be known at the time I receive my care. The questions come afterwards. The data has already been collected, and this is called retrospective data analysis, as opposed to prospective data collection.
And the problem here is, how can we obtain a consent agreement where we've given the human subject enough information that they know what they're consenting to, and yet have the consent be broad enough to cover the range of possible research questions that one might wish to ask? So this is a balancing act: the informed consent has to carry enough information to be meaningful while remaining broad enough to be useful.
To conclude, most data of interest, most data that we'll analyze, is created by humans, is about humans, or has impacts that affect humans. When we practice data science, we have to consider this impact, and it is consideration of this impact that is the cornerstone of the ethical practice of data science.