
By now, you've seen most of the key building blocks of RNNs. But there are just two more ideas that let you build much more powerful models.

One is bidirectional RNNs, which let you, at a point in time, take information from both earlier and later in the sequence; we'll talk about those in this video. The second is deep RNNs, which you'll see in the next video.

So let's start with bidirectional RNNs.

To motivate bidirectional RNNs, let's look at this network, which you've seen a few times before in the context of named entity recognition. One of the problems with this network is that, to figure out whether the third word, Teddy, is part of a person's name, it's not enough to just look at the first part of the sentence. To tell whether y<3> should be zero or one, you need more information than just the first three words, because the first three words don't tell you whether we're talking about teddy bears or about the former US president, Teddy Roosevelt.

So this is a unidirectional, or forward-only, RNN.

And this comment I just made is true whether these cells are standard RNN blocks, GRU units, or LSTM blocks; all of these blocks process the sequence in the forward direction only. What a bidirectional RNN, or BRNN, does is fix this issue.

A bidirectional RNN works as follows. I'm going to use a simplified example with four inputs, or maybe a four-word sentence, so we have four inputs, x<1> through x<4>. This network will have a forward recurrent component, which I'm going to call a<1>, a<2>, a<3>, and a<4>, and I'm going to draw a right arrow over each to denote that this is the forward recurrent component; they'll be connected as follows. Each of these four recurrent units takes the current x as input and then feeds in to help predict ŷ<1>, ŷ<2>, ŷ<3>, and ŷ<4>.

So far I haven't done anything new; basically, we've drawn the RNN from the previous slide, but with the arrows placed in slightly funny positions. I drew the arrows in these slightly funny positions because what we're going to do is add a backward recurrent layer. So we'd have a<1> with a left arrow to denote a backward connection, then a<2> backward, a<3> backward, and a<4> backward; the left arrow denotes that it is a backward connection. We're then going to connect the network up as follows, and these backward connections will be connected to each other, going backward in time.

Notice that this network defines an acyclic graph. Given an input sequence x<1> through x<4>, the forward sequence will first compute a forward<1>, then use that to compute a forward<2>, then a forward<3>, then a forward<4>. The backward sequence, by contrast, starts by computing a backward<4> and then goes back to compute a backward<3>. Note that as you're computing these network activations, this is not backprop; this is forward prop. But this forward prop has part of the computation going from left to right and part going from right to left in this diagram. Having computed a backward<3>, you can then use those activations to compute a backward<2>, and then a backward<1>. Finally, having computed all of the activations, you can make your predictions.

So, for example, to make the predictions, your network computes something like ŷ<t> = g(Wy [a forward<t>, a backward<t>] + by), an activation function applied to Wy with both the forward activation at time t and the backward activation at time t being fed in to make the prediction at time t.
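The two-pass computation just described can be sketched in NumPy. This is a minimal illustration with basic tanh cells, not the course's own code; the weight names (Wax, Waa, Wy, and so on) and the choice of sigmoid for the output activation g are assumptions for a standard RNN cell with a binary label per word.

```python
import numpy as np

def brnn_forward(xs, Wax_f, Waa_f, ba_f, Wax_b, Waa_b, ba_b, Wy, by):
    """Forward prop for a bidirectional RNN with basic tanh cells.

    xs: list of T input vectors, each of shape (n_x,).
    Returns a list of T predictions y_hat<t>.
    """
    T = len(xs)
    n_a = ba_f.shape[0]

    # Forward recurrent component: left to right, a_fwd<1> ... a_fwd<T>.
    a_fwd, a = [], np.zeros(n_a)
    for t in range(T):
        a = np.tanh(Wax_f @ xs[t] + Waa_f @ a + ba_f)
        a_fwd.append(a)

    # Backward recurrent component: right to left, a_bwd<T> ... a_bwd<1>.
    # This is still forward prop, just moving right to left in time.
    a_bwd, a = [None] * T, np.zeros(n_a)
    for t in reversed(range(T)):
        a = np.tanh(Wax_b @ xs[t] + Waa_b @ a + ba_b)
        a_bwd[t] = a

    # Each prediction uses BOTH activations at time t:
    # y_hat<t> = g(Wy [a_fwd<t>; a_bwd<t>] + by), here g = sigmoid.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return [sigmoid(Wy @ np.concatenate([a_fwd[t], a_bwd[t]]) + by)
            for t in range(T)]
```

Note that Wy multiplies the concatenation of the forward and backward activations, so its input dimension is twice the hidden size.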

So if you look at the prediction at time step three, for example, information from x<1> can flow through a forward<1> to a forward<2>, where they're all stated in the function here, to a forward<3> and then to ŷ<3>. So information from x<1>, x<2>, and x<3> is all taken into account, while information from x<4> can flow through a backward<4> to a backward<3> to ŷ<3>. This allows the prediction at time three to take as input information from the past, information from the present, which goes into both the forward and the backward activations at this step, as well as information from the future.

So, in particular, given a phrase like "He said, Teddy Roosevelt...", to predict whether Teddy is part of a person's name, you take into account information from both the past and the future.

So this is the bidirectional recurrent neural network, and these blocks here can be not just standard RNN blocks; they can also be GRU blocks or LSTM blocks. In fact, for a lot of NLP problems, for a lot of natural language processing problems with text, a bidirectional RNN with LSTM blocks appears to be commonly used.

So if you have an NLP problem where you have the complete sentence and you're trying to label things in the sentence, a bidirectional RNN with LSTM blocks, both forward and backward, would be a pretty reasonable first thing to try.
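As one concrete way to try this, here is a minimal sketch in Keras, whose Bidirectional wrapper runs an LSTM over the sequence in both directions. This is not the course's code; the vocabulary size, sequence length, and layer dimensions below are made-up placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 10_000, 30  # placeholder sizes, not from the lecture

model = keras.Sequential([
    layers.Embedding(vocab_size, 64),
    # Bidirectional wraps the LSTM so it reads the sequence both forward
    # and backward; return_sequences=True gives one output per time step,
    # as needed for per-word labeling such as named entity recognition.
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),
    # y_hat<t>: probability that word t is part of a person's name.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Because the wrapper concatenates the forward and backward LSTM states by default, each time step's output has twice the LSTM's hidden size before the final Dense layer.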

So that's it for the bidirectional RNN. This is a modification you can make to the basic RNN architecture, or to the GRU or the LSTM, and by making this change you can have a model that uses an RNN, GRU, or LSTM and is able to make predictions anywhere, even in the middle of a sequence, by taking into account information potentially from the entire sequence.

The disadvantage of the bidirectional RNN is that you do need the entire sequence of data before you can make predictions anywhere. For example, if you're building a speech recognition system, the BRNN will let you take into account the entire speech utterance, but with this straightforward implementation, you need to wait for the person to stop talking, to get the entire utterance, before you can actually process it and make a speech recognition prediction.

So for real-time speech recognition applications, there are somewhat more complex modules used, rather than just the standard bidirectional RNN you've seen here. But for a lot of natural language processing applications, where you can get the entire sentence at the same time, the standard BRNN algorithm is actually very effective.

So that's it for BRNNs. In the next and final video for this week, let's talk about how to take all of these ideas, RNNs, LSTMs, GRUs, and their bidirectional versions, and construct deep versions of them.
