Knowing binary fractions, we've got another way of representing real values:

by binary fractions with a fixed number of digits.

For example, if we use 64 bits,

we might say that the point is always in the middle, and so,

the first 32 bits are the integer part,

and the other 32 bits are the fractional part.

In fact, you could just store a 64-bit integer A,

and think of it as the number A over 2 to the 32nd.
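As a small sketch of this idea in Python (the names `to_fixed` and `from_fixed` are mine, not from the lecture):

```python
SCALE = 1 << 32  # the point sits 32 bits from the end of the 64-bit integer

def to_fixed(x: float) -> int:
    """Round a real number to the nearest multiple of 2**-32, stored as an integer."""
    return round(x * SCALE)

def from_fixed(a: int) -> float:
    """Interpret the stored integer A as the number A / 2**32."""
    return a / SCALE

a = to_fixed(3.5)
print(from_fixed(a))  # 3.5: this value is exactly representable
```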

Let's think a bit about what properties such a data type would have.

The maximum value will be when the first part is the maximum value of int,

and the second part is all ones.

So, it'll be about 2 to the 31st,

or slightly more than two billion.

Similarly, the minimum value will be when the first part is the minimum value of int,

and the second part is all zeros.

And that will be minus 2 to the 31st,

or slightly less than minus two billion.

So, we could store any value from the range from minus

2 to the 31st up to nearly 2 to the 31st with an error of at most 2 to the minus 33rd,

as we're rounding to 32 binary digits.

Of course, we could add our numbers.

And in fact, it will be the same as adding 64-bit integers

because the point always stays 32 digits from the end.

And just as with integers,

there could also be overflow.
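To illustrate that adding two fixed-point values is plain integer addition, here is a minimal Python sketch (helper names are my own):

```python
SCALE = 1 << 32  # the binary point is 32 bits from the end

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def from_fixed(a: int) -> float:
    return a / SCALE

a, b = to_fixed(1.25), to_fixed(2.5)
s = a + b            # ordinary integer addition, no adjustment needed
print(from_fixed(s)) # 3.75: the point stayed 32 digits from the end
```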

We could also multiply, but that has some issues.

When we multiply two 64-bit integers,

the result has 128 binary digits.

So, there are 64 extra digits,

32 at the beginning and 32 at the end,

as the point must stay in the middle.

The 32 digits at the end

are not going to be a problem:

they just get rounded off. But the digits at the beginning are like overflow:

they just couldn't fit into the value this time,

and ignoring them would change the value very much.

So, we could multiply only small enough values,

or else there will be overflow.
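A sketch of this multiplication in Python, under the same 32.32 layout (names are illustrative; Python integers don't overflow, so the 64-bit limit is only noted in the comment):

```python
SCALE_BITS = 32
SCALE = 1 << SCALE_BITS

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def from_fixed(a: int) -> float:
    return a / SCALE

def mul_fixed(a: int, b: int) -> int:
    """The raw product has twice as many digits; shifting right by 32
    drops the 32 low digits (rounding down).  In real 64-bit hardware,
    the 32 high digits must be zero or the result overflows."""
    return (a * b) >> SCALE_BITS

p = mul_fixed(to_fixed(1.5), to_fixed(2.5))
print(from_fixed(p))  # 3.75
```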

Now, as there are always errors with rounding,

let's take a closer look at them.

We already know one type of error, the Absolute error.

It is just the absolute difference between the actual value, and the value we have.

Say, we are storing some real number A in our fixed-point representation.

To do that, we round that real number to the number

a-hat which has exactly 32 binary digits after the point.

And as we know, the absolute error of this rounding

will be no greater than 2 to the minus 33rd.
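We can check this bound numerically in Python, assuming the same 32.32 layout as above:

```python
SCALE = 1 << 32

def to_fixed(x: float) -> int:
    return round(x * SCALE)

x = 0.1  # not a multiple of 2**-32, so rounding loses something
err = abs(x - to_fixed(x) / SCALE)
print(err <= 2 ** -33)  # True: the error is at most half a step of 2**-32
```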

If there are values a and b,

and we store the rounded versions a-hat and b-hat,

then instead of summing a and b,

we will sum our values a-hat and b-hat.

So, what error will there be after that?

It turns out that the absolute error of the sum is not

greater than the sum of the absolute errors of the values.

So, if we have just rounded

a-hat and b-hat from a and b,

the error in each will be at most 2 to the minus 33rd,

and the total error will be at most 2 to the minus 32nd, which is not much.
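The bound on the sum's error is just the triangle inequality; a quick Python check (helper name is my own):

```python
SCALE = 1 << 32

def rnd(x: float) -> float:
    """Round x to 32 binary digits after the point."""
    return round(x * SCALE) / SCALE

a, b = 0.1, 0.2
err_a, err_b = abs(a - rnd(a)), abs(b - rnd(b))
err_sum = abs((a + b) - (rnd(a) + rnd(b)))
print(err_sum <= err_a + err_b)  # True: errors of a sum only add up
```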

But the more operations we do,

the larger error we'll accumulate.

We could also bound the absolute error of the product by subtracting

and adding a-hat times b.

Then we could group as follows:

the absolute error will be no greater than the absolute value of

b times the absolute error of a, plus the absolute value of a-hat,

which is close to the absolute value of a, times the absolute error of b.

So, the errors not only sum up,

they also get scaled by the magnitudes of the values,

and that is potentially bad.

If a is equal to a billion and is stored without error,

so a-hat is also equal to a billion,

and b is equal to one,

but is stored with an error of one billionth,

then the product will have an error of one.

That's a billion times greater than it was before multiplication.
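The same billion-times-one example, written out in Python:

```python
a, a_hat = 1e9, 1e9           # a is stored exactly
b, b_hat = 1.0, 1.0 + 1e-9    # b carries an absolute error of one billionth

err_before = abs(b - b_hat)             # about 1e-9
err_after = abs(a * b - a_hat * b_hat)  # about 1: scaled up a billion times
print(err_after)
```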

So, absolute error does not go well with products.

From that point of view, it's natural to consider another kind of error.

The relative error, which is

just the absolute error divided by the magnitude of the exact value.

So, it's not about how big the error is by itself,

it's about how big it is relative to the exact value.
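The definition is one line of Python (the function name is mine):

```python
def rel_error(exact: float, approx: float) -> float:
    """Relative error: absolute error divided by the magnitude of the exact value."""
    return abs(exact - approx) / abs(exact)

print(rel_error(1e9, 1e9 + 1))  # 1e-09: an error of 1 is tiny next to a billion
print(rel_error(1.0, 2.0))      # 1.0: the same absolute error of 1 is huge here
```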

And it turns out that this definition goes well with multiplication.

The relative error of the product is not

greater than the sum of relative errors of the factors.

In our previous example of multiplying a billion by one,

the relative error of a is zero,

and the relative error of b is one billionth.

The relative error of the product is also one billionth.

However, the relative error is not that good with sums.

Consider the values of a and b on the slide.

The relative error of the sum is one,

but the relative error of a is zero,

and that of b is one billionth.

So, the error has grown a billion times under the sum.
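The slide's exact numbers aren't in the transcript, but one illustrative pair with this behavior (values are hypothetical) shows the cancellation in Python:

```python
# a is stored exactly; b is huge and negative with a tiny relative error.
a, a_hat = 1e9, 1e9                # relative error 0
b, b_hat = -(1e9 - 1.0), -1e9      # absolute error 1, relative error ~1e-9

exact, computed = a + b, a_hat + b_hat   # 1.0 vs 0.0
rel_err_sum = abs(exact - computed) / abs(exact)
print(rel_err_sum)  # 1.0: the relative error exploded a billion times
```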

So, in fact, we want both the absolute error

and the relative error to be bounded.

Then the errors will be tractable both in addition and in multiplication.

Let's see how a fixed point data type behaves with errors.

The absolute error behaves well, at least when a value is first stored.

The problem is that the relative error depends highly on the magnitude.

When the value is about the maximum possible,

we use all 64 bits

to store actual digits.

And so, the relative error is about 2 to the minus 64th.

But when it's about the minimum possible positive value,

we use only the last digit to store something,

and all the others are just zeros.

So, the relative error will be about one half.

And for a value of average magnitude, on the order of one,

we use only half of the digits to store something,

and the other half are just zeros.
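These two extremes are easy to check: the worst-case rounding error of the 32.32 format is a fixed 2 to the minus 33rd, so the relative error is just that constant divided by the magnitude.

```python
step = 2.0 ** -33      # worst-case absolute rounding error, same for every value

big = 2.0 ** 31        # near the maximum representable value
small = 2.0 ** -32     # the minimum positive representable value

print(step / big)      # 2**-64: tiny relative error at the top of the range
print(step / small)    # 0.5: enormous relative error at the bottom
```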

So, in fact, we could do a bit better.

And we'll do that in the next video.