Now extractors typically take as input something called a salt, and a salt just
like in a salad, it kind of adds flavor to things, what it does is basically kind of
jumbles things around, so that no matter what the input distribution is, the output
distribution is still going to be indistinguishable from random. So a salt
basically, what is it? It's a non-secret string, so it's publicly known. It doesn't
matter if the adversary knows what the salt is, and it's fixed forever. The only
point is that when you chose it, you chose one at random. And then the hope is that
the funny distribution that you're trying to extract from kinda doesn't inherently
depends on the salt that you chose and hence as a result using your salt, you
will actually get a distribution that looks indistinguishable from random. So
essentially the salt, you know, you can just bang it the keyboard a couple of
times when you generate it but it just needs to be something that's random
initially but then it's fixed forever, and it's fine if the adversary knows what
it is and nevertheless the extractor is able to extract the entropy and output a
uniformly random string K. In some sense the salt is only there to defend against
adversarially bad distributions that might mess up our extractor. Okay, so now that
we have extracted a pseudo random key. Now, we might as well just use it in a KDF
that we just saw using a secure pseudo random function to expand the key
into as many bits as we need to actually secure the session. Okay, so there are
these two steps. The first one is we extract a pseudo-random key, and then once
we have a pseudo-random key we already know how to extend it into as many keys as
we need using a pseudo-random function. So the standardized way of doing this is
called HKDF. This is a KDF, a key derivation function that's built from HMAC.
And here HMAC is used both as the PRF for expanding and an extractor for extracting
the initial pseudo-random key. So let me explain how this works. So in the extract
step, we're gonna use our salt which you remember is a public value just happened to
be generated at random at the beginning of time. And we use this salt as the HMAC
key. And then the source key we're gonna use as the HMAC data. So we're kind of
using a public value as a key. And nevertheless, one can argue that HMAC has
extraction properties, such that, when we apply HMAC, the resulting key is going to
look indistinguishable from random, assuming that the source key actually has
enough entropy to it. And now that we have the pseudo random key we're simply going
to use HMAC as a PRF to generate a session key of you know as many bits as we
need for the session keys. Okay. So that basically concludes our discussion of
HKDF. And I just want you to remember that, once you obtain a source key, either
from hardware or from a key exchange protocol, the way you convert it into
session keys is not by using that sample directly. You would never use a source key
directly as a session key in a protocol. What you would do is you will run the
source key through a KDF. And the KDF would give you all the keys and output
that you need, for, the randomness, for the random keys to be used in your
protocol. And a typical KDF to use is HKDF, which is actually getting quite a
bit of traction out there. Okay. The last topic I wanna talk about in this segment
is, how do you extract keys from passwords. These are called password based
KDFs or PBKDFs. The problem here is that passwords have relatively low
entropy. In fact, we're gonna talk about passwords later on in the course when we
talk about user authentication. And so, I'm not gonna say too much here. I'll just
say passwords generally have very little entropy estimated on the order of twenty
bits of entropy, say. And as a result, there is simply not enough entropy to
generate session keys out of a password. And yet we still need to do it very
frequently. We still need to derive encryption keys and MACs and so on out of
passwords, so the question is how to do it. The first thing is, you know, for this
kind of purpose, don't use HKDF. That's not what it's designed for. What will
happen is that the derived keys will actually be vulnerable to something called
a dictionary attack, which we're gonna talk about much later in the course when
we talk about user authentication. So, the way PBKDFs defend against this low entropy
problem that results in a dictionary attack is by two means. First of all, as
before they use a salt, a public, random value that's fixed forever. But in
addition, they also use what's called a slow hash function. And let me describe
kind of the standard approach to deriving keys from passwords. This is called PKCS#5,
and in particular, the version I'll describe is what's called PBKDF1. And I
should say that this mechanism is implemented in most crypto libraries so
you shouldn't have to implement this yourself. All you would do, you know, you
would call a function, you know, derived key from password. You would give the
password in as input, and you would get a key as output. But you should be aware of
course that this key is not going to have high entropy so in fact it will be
guessable. What these PBKDFs try to do is make the guessing problem as hard as
possible. Okay. So the way they work, first of all, is, as we said, they
basically hash, the concatenation of the password and the salt. And then the hash
itself is designed to be a very slow hash function. And the way we build a slow hash
function is by taking one particular hash function, say, SHA-256, and we
iterate it many, many times, C times. So you can imagine 1000 times, perhaps even a
million times. And what do I mean by iterating it? So, well, we take the
password and the salt. And we put them inside of one input to the hash function.
And then we apply the hash function, oops, let me write it like this. And then we
apply the hash function and we get an output, and then we apply the hash
function again and we get another output. And we do this again and again and again
maybe a thousand times or a million times depending on how fast your processors are
and then finally we get the final output that we actually output as, as the key
output of this key derivation function. Now what is the point here? Iterating a
function 10,000 times or even a million times is going to take very little time on
a modern CPU, and as a result, it doesn't really affect the user's experience. The
user types in his password, it gets hashed a million times and then it gets output.
And maybe that could even take, you know a tenth of a second and the user wouldn't
even notice it. However the attacker, all he can do is he can try all the passwords
in the dictionary, because we know people tend to pick passwords in dictionaries,
and so he could just try them one by one, remember the SALT is public, so he knows
what the SALT is. And so he can just try this hash one by one. However because the hash
function is slow, each attempt is gonna take him a tenth of second. So if he needs
to run through a dictionary, you know, with, with a 200 billion passwords in it,
because the hash function is slow, this is gonna take quite awhile. And by doing
that, we slow down the dictionary attack, and we make it harder for the attacker to
get our session keys. Not impossible, just harder. That's all this is trying to do.
Okay, so this is basically what I wanted to say about password based KDFs. As I
said, this is not something you would build yourself. All crypto libraries have
an implementation of a PKCS#5 mechanism. And you would just call the
appropriate function to convert a password into a key, and then use the resulting
key. Okay, in the next segment, we're gonna see how to use symmetric encryption
in a way that allows us to search on the cipher texts.