In the previous lecture, we studied the basics of a neural network.

In this lecture, we go deeper and study neural network learning based on backpropagation.

First, consider a neural network's structure versus the level of intelligence that can be accomplished.

Now, one neuron can only make a very simple, one-dimensional decision.

For more complex intelligence, we need more neurons working together, collaborating.

So, how many neurons do we need, and in what structure?

This gives you a very simple example.

Now, consider the weights that need to be trained.

In this type of structure, you can see that there is only one layer, and we have two weights, W_1 and W_2, in a single-layer structure.

With these weights, we can train the system to draw a line like the one shown, which distinguishes A from the others: B, C, and D.
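As a sketch of the single-layer case (all weight and bias values here are hypothetical, chosen only for illustration), one neuron with two weights implements one linear decision boundary:

```python
# A single neuron with two weights (and a bias) implements one linear
# decision boundary: w1*x1 + w2*x2 + b = 0.  Points on one side of the
# line are classified as A; the other side is "not A" (B, C, or D).
# The weight and bias values below are hypothetical.

def single_neuron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # step activation: fire (1) if the weighted sum crosses the threshold
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

print(single_neuron(1.0, 1.0))  # -> 1 (the class A side of the line)
print(single_neuron(0.0, 0.0))  # -> 0 (the B/C/D side)
```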

Now, adding more weights to our neural network, you can see in the structure shown that there are four weights, W_1, W_2, W_3, and W_4, arranged in a two-layer structure: one layer followed by another.

Now, what can you do with this more complex structure?

Well, you could draw something like the two lines shown here.

After sufficient training of the weights, A and D are distinguished from C and B.

Now, let's make our structure a little more sophisticated and complex.

In the neural network shown here, you can see the weights W_1 through W_8: eight weights that are going to be trained and updated so that we can perform, say, an operation like the one shown.

With these multiple weights, we have a formation of one, two, and three layers.

Suppose again that we are drawing lines.

Then, after sufficient training of the weights, we might be able to separate A and D each into its own region, distinguished from the rest of the region, which is C and B.
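A minimal sketch of why extra layers help (the line equations and threshold values are my own hypothetical choices, not from the slides): each first-layer neuron tests one line, and a later neuron combines the line tests to carve out a region:

```python
# Each first-layer neuron tests one line; a second-layer neuron
# combines the line tests.  All weight values here are hypothetical.

def step(z):
    return 1 if z > 0 else 0

def line1(x1, x2):          # first hidden neuron: tests one line
    return step(x1 + x2 - 1.5)

def line2(x1, x2):          # second hidden neuron: tests another line
    return step(x1 - x2 + 0.5)

def region(x1, x2):         # output neuron: AND of the two line tests
    h1, h2 = line1(x1, x2), line2(x1, x2)
    return step(h1 + h2 - 1.5)  # fires only if both line tests pass

print(region(1.0, 1.0))  # -> 1 (inside both half-planes)
print(region(0.0, 0.0))  # -> 0 (outside the carved-out region)
```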

The neural network has multiple layers,

as you've just seen, and the layer in the middle is the hidden layer.

Looking at the overall structure,

it is the input layer,

the hidden layer and the output layer like this.

Then, weight training is executed within this structure.

The connections that you see carry the weights that we are going to train.

Now, looking into the details of a neural network's structure: there is the input layer, where the input to the neural network comes in, and there is the output layer, where the output of the neural network goes out and is used.

In the middle, we have hidden layers.

A hidden layer contains the intelligence in a distributed fashion, using many neurons, interconnections, weights, biases, activation functions, and other techniques.
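As a sketch of how data flows through these layers (assuming sigmoid activations and arbitrary example weights, which are not from the lecture):

```python
import math

# Minimal forward pass through input -> hidden -> output layers.
# The sigmoid activation and the specific weights are assumptions
# for illustration only.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # hidden layer: each neuron weights all inputs, adds a bias, activates
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(W_hidden, b_hidden)]
    # output layer: does the same with the hidden activations
    return [sigmoid(sum(w * hi for w, hi in zip(ws, h)) + b)
            for ws, b in zip(W_out, b_out)]

# 2 inputs -> 2 hidden neurons -> 1 output (weights are arbitrary)
W_hidden = [[0.5, -0.4], [0.3, 0.8]]
b_hidden = [0.1, -0.2]
W_out = [[1.0, -1.0]]
b_out = [0.0]
print(forward([1.0, 0.5], W_hidden, b_hidden, W_out, b_out))
```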

Looking down there, multiple hidden layers can be used.

And in deep neural networks,

the word "Deep" comes from having multiple hidden layers used simultaneously.

Now, deep neural networks have multiple hidden layers.

And then comes the question, how many hidden layers are

needed to qualify to be called a deep neural network?

Well, the answer is a little bit ambiguous.

Some engineers say, "Oh, you have to have at least ten."

Some engineers say five or more.

Actually, looking into it, it's a little difficult to determine how many hidden layers are needed before a neural network qualifies as deep.

Basically, the way I look at it, it's a problem-specific question.

How deep is sufficient? If it's deep enough to solve your problem, then that's what's needed.

Therefore, the definition of "deep" is not that clear.

However, we'll go with multiple layers, and in some of the examples shown in later lectures, you will see how many layers are used for a specific operation.

Now, learning methods.

How do we train the weights of these layers

to make the neural network become intelligent?

Well, there is supervised learning and unsupervised learning.

Supervised learning uses training with labeled data.

Labeled data are data that have the desired output values already specified.

So, we have the inputs and the desired outputs corresponding to those inputs, and we can train the weights so that the network operates the way we want it to.

The other one is unsupervised learning.

This is training that uses unlabeled data.

So there are no desired output values that are used.

Other techniques include semi-supervised learning,

which is training that uses both labeled data and unlabeled data.

And then there's reinforcement learning, in which feedback is given back into the system, but no labeled data is used.

Supervised learning using backpropagation training is our next topic.

And backpropagation is used to train perceptrons and multi-layer perceptrons.

Backpropagation uses training iterations in which the error's size, as well as its variation, direction, and speed, are used to determine the update value of each weight of the neural network.

Here's an example.

The current output is what we get,

then from the labeled data,

we know what our desired output is.

So, by subtracting the desired output from the current output, we get our error value.

Then, knowing the error value, we can use this chain to go back and update the weights so that the error is minimized or eliminated.

That is why we call it backpropagation.
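The "chain" back from the error to a weight is the chain rule of calculus. Here is a minimal sketch for a single sigmoid neuron, assuming a squared-error loss and hypothetical numbers (neither is specified in the lecture):

```python
import math

# Chain back from the error to one weight, for a single sigmoid neuron
# with squared error E = 0.5 * (y - d)**2.  Loss choice and numbers are
# assumptions for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, d = 1.0, 0.5, 1.0          # input, weight, desired output
z = w * x                        # pre-activation
y = sigmoid(z)                   # current output
error = y - d                    # current output minus desired output

# chain rule: dE/dw = dE/dy * dy/dz * dz/dw
dE_dy = error
dy_dz = y * (1.0 - y)            # derivative of the sigmoid
dz_dw = x
grad_w = dE_dy * dy_dz * dz_dw
print(grad_w)                    # negative here, so w should increase
```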

Now, supervised learning uses backpropagation training.

Since each weight of the neural network is used in the calculation of various outputs, the size of each weight update has to be small in every training iteration.

Why does it have to be small?

Well, this is because a big change in one weight may disturb the weights that were already trained to match the other input-to-output relations.

So, the updates are small: we make small changes, but over many, many training iterations, so that all the weights become well-adjusted and the desired output is achieved.

Now, backpropagation uses a tool called the gradient.

This is the vector of partial derivatives of a multi-variable function.

The gradient points in the direction of

the greatest rate of increase of the multi-variable function.

Now, the magnitude of the gradient represents the slope,

the rate of change in that direction.
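A quick numerical sketch of the gradient (the function f(x, y) = x^2 + y^2 is my own example): its analytic gradient is (2x, 2y), which points in the direction of steepest increase.

```python
# Estimate the gradient of a multi-variable function numerically using
# central differences.  For f(x, y) = x**2 + y**2, the analytic gradient
# is (2x, 2y); at (1, 2) that is (2, 4).

def f(x, y):
    return x**2 + y**2

def gradient(fn, x, y, eps=1e-6):
    dfdx = (fn(x + eps, y) - fn(x - eps, y)) / (2 * eps)
    dfdy = (fn(x, y + eps) - fn(x, y - eps)) / (2 * eps)
    return dfdx, dfdy

gx, gy = gradient(f, 1.0, 2.0)
print(round(gx, 4), round(gy, 4))  # -> approximately 2.0 and 4.0
```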

The backpropagation learning algorithm operates based on the following steps.

In step one, the training pattern's input is forward-propagated through the neural network, starting from the input side and going through the hidden layers.

In step two, the neural network generates the initial output values at the output.

Then, the difference between these outputs and the desired values, together with the input values, is used to derive the gradients of the weights of the output-layer and hidden-layer neurons.

Next, the gradients of the weights are scaled down by the learning rate.

The learning rate determines the learning speed and the resolution.

We will update the weights in the opposite direction of the sign of the gradient.

In other words, if the gradient is positive, we update the weight in the negative direction.

If the gradient is negative, we update the weight in the positive direction.

Why? Because the plus-minus sign of the gradient indicates the direction of increasing error, and we want to move in the opposite direction, toward the correct answer.

That is why we update in the direction opposite to the one indicated by the gradient.

We will repeat all steps until the desired

input-to-output performance is satisfactory.
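The steps above can be sketched as a minimal training loop. Everything concrete here is an assumption for illustration: a tiny 2-2-1 sigmoid network, a squared-error objective, the OR function as the labeled training data, and an arbitrary learning rate and iteration count.

```python
import math
import random

# Minimal backpropagation loop: forward pass, error, chain-rule
# gradients, then small weight updates opposite to the gradient sign,
# repeated over many iterations.
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# labeled training data: inputs with desired outputs (OR function)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# weights: 2 inputs -> 2 hidden neurons -> 1 output (plus biases)
Wh = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
bh = [0.0, 0.0]
Wo = [random.uniform(-1, 1) for _ in range(2)]
bo = 0.0
lr = 0.5  # learning rate scales down the gradient

for _ in range(5000):
    for x, d in data:
        # step 1: forward-propagate the input through the network
        h = [sigmoid(Wh[j][0]*x[0] + Wh[j][1]*x[1] + bh[j]) for j in range(2)]
        y = sigmoid(Wo[0]*h[0] + Wo[1]*h[1] + bo)
        # step 2: error, then gradients via the chain rule
        delta_o = (y - d) * y * (1 - y)
        delta_h = [delta_o * Wo[j] * h[j] * (1 - h[j]) for j in range(2)]
        # small updates, opposite to the sign of the gradient
        for j in range(2):
            Wo[j] -= lr * delta_o * h[j]
            Wh[j][0] -= lr * delta_h[j] * x[0]
            Wh[j][1] -= lr * delta_h[j] * x[1]
            bh[j] -= lr * delta_h[j]
        bo -= lr * delta_o

for x, d in data:
    h = [sigmoid(Wh[j][0]*x[0] + Wh[j][1]*x[1] + bh[j]) for j in range(2)]
    y = sigmoid(Wo[0]*h[0] + Wo[1]*h[1] + bo)
    print(x, round(y))  # should match the OR truth table after training
```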

These are the references that I use and I recommend them to you. Thank you.