Now the first thing we do is look at a rug plot to see how these data points
are distributed.
And as you can see, when we just make a standard rug plot, you just see this.
Now, I just told you there were 150 instances, but that's not 150 lines.
And the reason is that there are discrete values in this data set.
In other words, there's a limited precision with which the measurements
were made.
So in order to see all of the data, we have to jitter the points, which is
shifting them slightly in the x direction, so that we can see the full breadth.
So here you can see there are clumps of points around 6 centimeters,
and around 5.5, and around 5.
We didn't see that in the unjittered plot, so that's an important demonstration
that sometimes you want to make sure your points get a little jitter,
so that you don't have this overlap.
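
To make the jitter idea concrete, here is a minimal sketch using only the Python standard library. The `jitter` helper, its `width` parameter, and the sample measurements are all made up for illustration; they are not the data or the code used in the lecture.

```python
import random

def jitter(values, width=0.04, seed=0):
    """Shift each value by a small uniform random offset in [-width, width]."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical lengths in centimeters, recorded to 0.1 cm, so values repeat exactly.
lengths = [5.0, 5.0, 5.0, 5.1, 5.5, 5.5, 5.5, 5.5, 6.0, 6.0, 6.0, 6.1]

jittered = jitter(lengths)

# Repeated values draw on top of each other in a rug plot; jittered ones do not.
print(len(lengths), "instances,", len(set(lengths)), "distinct positions before jitter")
print(len(set(jittered)), "distinct positions after jitter")
```

The offset width is a judgment call: it should be small compared to the measurement precision, so the clumps spread out without blurring into their neighbors.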
Now we can get around that by also using histograms,
which basically count the number of values, or instances, in each bin.
So we can do that with the histogram, and we can see here's our data binned up.
So there's our sepal length, and you can see there's a peak right around 6,
around 5.5 and around 5, just like we saw once we jittered the points.
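
Counting instances per bin can be sketched in a few lines of standard-library Python. The `bin_counts` helper and the sample values are hypothetical, and the lengths are stored as whole millimeters here only so that the bin arithmetic stays exact.

```python
from collections import Counter

def bin_counts(values_mm, bin_width_mm):
    """Count values per bin; each bin covers [left, left + bin_width_mm)."""
    counts = Counter((v // bin_width_mm) * bin_width_mm for v in values_mm)
    return dict(sorted(counts.items()))

# Hypothetical lengths in millimeters (integers avoid floating-point bin edges).
lengths_mm = [50, 50, 50, 51, 55, 55, 55, 55, 60, 60, 60, 61]

# Print a crude text histogram with 5 mm (0.5 cm) bins.
for left, n in bin_counts(lengths_mm, 5).items():
    print(f"{left / 10:.1f}-{(left + 5) / 10:.1f} cm: {'#' * n}")
```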
But histograms have another problem and
that is, what is the bin size we should use?
Here is one bin size and here is the same data but with a different bin size.
And you notice how spiky it's become.
What does that mean?
Is that physically important, or
is it just simply an artifact of the measurement process?
And that's a challenge with using histograms.
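
The bin size question can be illustrated with the same kind of made-up data binned two ways. This is only a sketch: `bin_counts`, the `lo` and `hi` range arguments, and the sample values are assumptions for the example, not the lecture's data.

```python
from collections import Counter

def bin_counts(values_mm, bin_width_mm, lo, hi):
    """Counts for every bin between lo and hi, including empty bins."""
    counts = Counter((v - lo) // bin_width_mm for v in values_mm)
    n_bins = (hi - lo) // bin_width_mm
    return [counts.get(k, 0) for k in range(n_bins)]

# Hypothetical lengths in millimeters (integers keep the bin arithmetic exact).
lengths_mm = [50, 50, 50, 51, 55, 55, 55, 55, 60, 60, 60, 61]

wide = bin_counts(lengths_mm, 5, lo=50, hi=65)    # 5 mm bins
narrow = bin_counts(lengths_mm, 1, lo=50, hi=65)  # 1 mm bins

print("5 mm bins:", wide)    # [4, 4, 4]
print("1 mm bins:", narrow)  # [3, 1, 0, 0, 0, 4, 0, 0, 0, 0, 3, 1, 0, 0, 0]
```

With the wide bins this sample looks flat, while the narrow bins turn it into isolated spikes separated by empty bins. Neither picture is wrong; they are the same data, which is why the choice of bin size needs care.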
We can compare histograms by using the same number of bins and just connecting the points,
and asking, is there some kind of generative model at work here?
But as you can see, this is just kind of up and down, up and down,
and it's really hard to see what is fundamentally going on.
In other words, is there some sort of physical model for
the appearance of an iris flower?
If so, we really want to try to understand that, to pull that out of the data.
You'll be doing this in your career, as you try to analyze data and
understand: what is the driving mechanism here?
Can I model it?
Because once you can model something, you have a much better
ability to understand what's going on, and how to control it, and how to use it.