We've discussed our first two secondary challenges, Usability, and Workflow and
Process.
Now, we move on to the third, Privacy and Security.
By now you should expect, and I hopefully won't disappoint you,
that there are analytics tools that have been developed or proposed
that can help with the management of privacy and security.
And indeed, we're going to look at a couple of those in this lesson.
First, a brief discussion of an issue that I believe will gain more and
more visibility as time goes on.
Here in the United States, we have a very tough law known by its acronym, HIPAA.
It protects the privacy of patients' data and gives patients tremendous
input into whether their data is used and how it can be used.
The question posed here by Don Detmer,
who is one of the foremost people in the field of health informatics and
one of the editors of the original IOM publication calling for
electronic health records, is whether we are giving so
much weight to privacy that we're actually depriving society of the use
of data in ways that could be of great benefit to everyone.
We don't have time to get into that question, but I want you to know that
serious people are in fact raising that question at this point in time.
If you ask patients whether the electronic health record system or
the paper medical record system being used by their physician has an overall
positive impact on quality, you'll see that a substantial majority of patients
feel that it does, either very much or somewhat, if their provider is using
an electronic health record.
But a much smaller number of patients feel that way if their provider is using
a paper record system.
That's the good news.
The bad news, I suppose,
is that patients are very concerned about the security of their data,
no matter whether it's an electronic system or a paper system.
These numbers are not that different, so I am not able to suggest that
patients attribute more security to one approach versus the other, but
it is clear that substantial numbers of patients worry about the privacy and
security of their health care data no matter how it's stored.
Health data falls into three classifications. The first is protected
health information, or PHI.
This is health data as it exists in the electronic record systems,
fully identified.
You know who the patient is, you know all about the patient,
you know their age, their address, their telephone number, and so forth and so on.
And you know all of the health information.
This data must be completely protected under the HIPAA law,
and failure to do so carries very severe penalties.
The second type of data is de-identified health information.
The Centers for Medicare and Medicaid Services here in the United States,
the federal agency that runs Medicare and Medicaid, has established criteria for
how one can go about de-identifying health information.
Once the data's de-identified, it can, at least in theory, be shared with others.
Typically, under some sort of institutional review board finding that
the proposed use of the data is justifiable.
And in some cases, this requires patient consent, although it's often done
without specific patient consent, because the data is de-identified and,
at least in theory, can't be traced back to the patient.
The third classification of health information is synthetic health
information.
This is information that is completely made up using statistical models and
other techniques.
It's clinically realistic, or at least reasonably clinically realistic.
Because it's synthetic, it's not covered at all under HIPAA.
And it can be used freely, for example, by students or
others to do prototypes of systems, or at least initial testing of systems.
You can rely on synthetic data to know that your system works as you intended it
to work.
You can't necessarily rely on it to verify that the results of your analytic tool,
for example, are in fact correct,
because the synthetic data isn't going to perfectly mirror the real world.
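To make that concrete, here is a minimal sketch of how synthetic records might be fabricated from simple statistical models. Every field name, distribution, and parameter below is invented for illustration; real generators such as the open-source Synthea project use far richer clinical models.

```python
import random

# Hypothetical, oversimplified distributions -- a real synthetic-data
# generator would fit its models to actual population statistics.
CONDITIONS = ["hypertension", "diabetes", "asthma", "none"]
CONDITION_WEIGHTS = [0.30, 0.10, 0.08, 0.52]

def synthetic_patient(patient_id: int) -> dict:
    """Fabricate one clinically plausible, but entirely fake, record."""
    age = max(0, min(100, int(random.gauss(45, 20))))
    condition = random.choices(CONDITIONS, weights=CONDITION_WEIGHTS)[0]
    # Systolic blood pressure loosely tied to age and condition, so the
    # data looks "reasonably clinically realistic" without being real.
    sbp = int(random.gauss(110 + 0.4 * age, 10))
    if condition == "hypertension":
        sbp += 15
    return {"id": patient_id, "age": age, "sex": random.choice(["F", "M"]),
            "condition": condition, "systolic_bp": sbp}

# No real patient underlies these rows, so HIPAA does not apply and the
# data can be shared freely for prototyping and initial system testing.
for record in (synthetic_patient(i) for i in range(5)):
    print(record)
```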
I just said that protected health information is subject to the HIPAA law.
What does that really mean?
Well, if you go to your physician, you will be asked to sign a HIPAA release
form, and in essence, what you're agreeing to is the use of your data for
treatment, payment, and healthcare operations.
This is often referred to as TPO.
Well, what do those letters mean?
Well, the best example of treatment is when the physician refers you to another
physician; they're allowed to send your record to that physician without
further interaction with you.
Payment is obvious.
They can use your health data for claims that they might send to your insurance
company, to Medicare, or to Medicaid to be paid.
A good example of operations is quality reporting where your data will be
aggregated with other patient data, as we discussed in module one, and
sent to Medicare, Medicaid, or an insurance company to validate that this
practice is in fact meeting some quality standards that they've established.
Probably, the primary application of de-identified health information is in
research where, generally, there's no need to know who the patient is.
The interest is in their health data and maybe some grouping of their information.
So, for example, the researchers may know which patients are male and
which are female, and which patients fall into various age groups.
But one has to be very careful here not to reveal so much demographic
information that the patient could, at least in theory, be re-identified.
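One common way to reason about when demographic detail becomes "enough to re-identify" is k-anonymity: every combination of quasi-identifiers, such as sex and age group, should describe at least k patients. The sketch below is a toy illustration of that check, with field names and a threshold assumed for the example; it is not taken from any particular de-identification tool.

```python
from collections import Counter

# Toy de-identified records: only quasi-identifiers remain.
records = [
    {"sex": "F", "age_group": "40-49"},
    {"sex": "F", "age_group": "40-49"},
    {"sex": "M", "age_group": "40-49"},
    {"sex": "M", "age_group": "80-89"},  # a group of one is risky
]

def small_groups(records, k=2):
    """Return quasi-identifier combinations shared by fewer than k patients.

    Any such group could, at least in theory, let someone who already knows
    a patient's demographics single that patient out of the data set.
    """
    counts = Counter((r["sex"], r["age_group"]) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

print(small_groups(records))  # {('M', '80-89'): 1}
```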
So, how do you actually do that?
How do you take protected health information and de-identify it?
Turns out it's not so simple.
As I said earlier, the Centers for Medicare and
Medicaid Services have defined the rules, and in fact
they define two approaches to de-identifying health information.
The first is the so-called Safe Harbor.
And they specify 18 demographic identifiers that must be removed from
the patient's record for it to be considered de-identified.
Many question whether this is actually adequate as data like genomic data is
being added to patients' records.
And if full genomic data is there, then there is really no way to
de-identify the record because everybody's genome is unique.
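As a rough sketch of the Safe Harbor idea, the code below strips a handful of the 18 identifier categories and coarsens dates to the year. The field names are hypothetical, and this is nowhere near a complete or compliant implementation; it is only meant to convey the flavor of the approach.

```python
# A few of the 18 Safe Harbor identifier categories, expressed as
# hypothetical field names; a real implementation must cover all 18.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}

def safe_harbor_deidentify(record: dict) -> dict:
    """Drop direct identifiers and reduce any date to its year alone."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # remove the identifier entirely
        if field.endswith("_date"):
            out[field] = value[:4]  # keep only the year, e.g. "1967"
        else:
            out[field] = value
    return out

patient = {"name": "Jane Doe", "phone": "555-0100",
           "birth_date": "1967-03-14", "diagnosis": "type 2 diabetes"}
print(safe_harbor_deidentify(patient))
# {'birth_date': '1967', 'diagnosis': 'type 2 diabetes'}
```

And note that, as the genomic example suggests, stripping these fields says nothing about identifiers hiding in the remaining clinical data itself.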
The other method is the so-called expert determination method,
where CMS says statistical or scientific principles can be applied
to minimize the risk that the anticipated recipient of the data, again,
typically a researcher, could, in fact, identify the individual.
So, how do I actually do that?
Well, this is an opportunity for analytics, and
NORC at the University of Chicago has developed a tool called XID.
It's really a console that an expert can sit at and
develop various scenarios that try to balance
the disclosure risk against the utility of the data.
I talked earlier about the risk of re-identification.
One of the things you can do is actually change the data itself a bit to make
re-identification even harder, particularly if you're going to disclose
some things that might potentially be used to identify the patients.
So, that gets pretty complex pretty quickly.
If you change the data enough, it becomes less and less usable.
If you leave it unchanged, the risk of disclosure can be increased.
So, this tool actually looks at the data and the proposed use of the data,
and allows the expert to play with the slider here and
achieve what they consider to be the right balance of usability and
the risk of disclosure, which is exactly what CMS is asking them to do.
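I can't show XID itself here, but the intuition behind that slider can be sketched. In the toy model below, widening the age bins plays the role of the slider: coarser bins lower the worst-case re-identification risk but also lower the utility of the data. Both measures are crude stand-ins assumed for illustration, not anything XID or CMS actually specifies.

```python
from collections import Counter

ages = [21, 23, 25, 31, 34, 37, 42, 44, 46, 48]  # toy quasi-identifier

def generalize(age, bin_width):
    """Coarsen an exact age into a bin, e.g. 34 -> '30-39' at width 10."""
    lo = (age // bin_width) * bin_width
    return f"{lo}-{lo + bin_width - 1}"

def risk_and_utility(bin_width):
    groups = Counter(generalize(a, bin_width) for a in ages)
    risk = 1 / min(groups.values())  # worst case: 1 / smallest group size
    utility = 1 / bin_width          # crude proxy: finer bins, more utility
    return risk, utility

# Sweeping bin_width is the "slider": the expert picks the setting whose
# balance of risk and utility satisfies the expert-determination standard.
# Note that a little generalization (width 5) may not reduce risk at all.
for width in (1, 5, 10, 20):
    r, u = risk_and_utility(width)
    print(f"bin width {width:2d}: re-id risk {r:.2f}, utility {u:.2f}")
```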