This session is about data and the normal distribution.

We're going to get introduced to some concepts of the normal distribution and

see how we can apply it in different phases of a Six Sigma Project.

But before we get there,

let's see what the measure phase of the Six Sigma Project is all about.

So what happens in the measure phase?

The first thing is that you identify variables.

You identify the critical to quality characteristics and

you think about how you are going to measure these.

Then you assess the measurement systems.

The idea there is to make sure that your measurement systems are valid and

they are reliable.

They're valid in the sense that they are measuring what they are supposed to be

measuring.

They're reliable in the sense that when you use them over and

over again they give you consistent results.

They are sensitive to changes, that's what a measurement should be.

And accessible in terms of they can be understood by people who are going to be

seeing those measurements on a day-to-day basis so

that they know what's going on in the process.

So we'll go from critical to quality characteristics to

measurement systems in the measure phase.

In the measure phase, we also go on to establish the current performance on

critical to quality characteristics.

So once we've figured out what those critical to quality characteristics

are, and what the measurements are, then we need to establish current performance.

Now to establish current performance,

we use something called statistical process control.

And these are control charts that you can have for different types of data, for

discrete data, for continuous data.

And there are many different types of control charts that you can use to

establish the inherent capability of a process.

Next within the measure phase of the Six Sigma Project, you also can

establish the targets for improvement and what those targets should be.

So, there you would be looking at things like the Sigma levels of the process,

so you establish the Sigma level of the process.

But before that you do a process capability analysis.

A process capability analysis is to see how well the process is performing in

relation to customer expectation.

So in relation to the voice of the customer.

Comparing the voice of the customer with the voice of the process,

the VOC with the VOP in that sense.
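
As a rough sketch of comparing the voice of the customer with the voice of the process, the snippet below computes two commonly used capability indices, Cp and Cpk. The lecture does not say which index it has in mind, so treat this as one possible illustration. The specification limits (the VOC) and the sample of service times (the VOP) are made-up numbers, and the indices assume the measurements are roughly normally distributed.

```python
import statistics

def capability_indices(samples, lsl, usl):
    """Return (Cp, Cpk) for samples against lower/upper specification limits."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation
    cp = (usl - lsl) / (6 * sd)                    # spec width vs. process spread
    cpk = min(usl - mean, mean - lsl) / (3 * sd)   # also penalizes an off-center mean
    return cp, cpk

# Hypothetical service times in minutes (voice of the process)
service_times = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.0, 4.1]

# Hypothetical customer expectation: between 3 and 5 minutes (voice of the customer)
cp, cpk = capability_indices(service_times, lsl=3.0, usl=5.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cpk can never be larger than Cp, and the two are equal only when the process mean sits exactly in the middle of the specification limits.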

So those are the things that happen in the measure phase.

Now let's take a look at different types of data that can be used in

the measure phase.

And then we'll get to distributions of data next.

So what are the different types of data that we can use in a Six Sigma Project and

that we need to start thinking about in the measure phase?

So first is simply verbal data.

And this could be open-ended comments from people.

If you're doing a customer survey, they're telling you something about the product or

the service.

If you're doing an employee survey, they're telling you something about

the experience that they have with their supervisor or working in that company.

So here the example that you see is a statement that says,

my supervisor respects my opinions.

So these are open-ended comments that you would have coming out of any

kind of an interview or

a survey that you do of the audience that you're interested in getting data from.

Next we get into data.

Data in the sense of numeric data.

So first we have discrete variables.

And the way you can think about discrete variables is as variables where decimal

points do not matter, do not make sense in fact.

Not that they don't matter, they don't make sense.

So think about anything that has two values.

Say it's available or not available.

We think of it as a zero one situation.

Something is on time or not on time.

It's a zero one situation.

There's no 0.5, there's no 0.75.

So that's the first type of a discrete variable.

And the data that we're talking about there is attribute data of

a binary characteristic.

It is binary in the sense that there are only two possible values for it.

And if you think about what the underlying distribution is for that kind of data,

you may be familiar with this already: it's a binomial distribution.

Binary data, binomial distribution: two kinds of options, yes and

no, or good or not good.

Those kinds of data we're talking about there.
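
To make the link between binary data and the binomial distribution concrete, here is a minimal sketch. It assumes each delivery is on time independently with the same probability; the numbers (10 deliveries, probability 0.9) are invented for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'yes' outcomes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical: 10 deliveries, each on time with probability 0.9
n, p = 10, 0.9
for k in range(n + 1):
    print(f"{k:2d} on time: {binomial_pmf(k, n, p):.4f}")
```

The probabilities over all possible counts, from 0 to n, add up to one.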

Next, within the categorical data, within attribute data, we have the nominal ones.

So here we don't really have numbers for different types of categories,

but we are considering them as four different categories.

So for example, here we have how do employees commute to work?

They either walk, they come by bike or they take the train or

they drive their own car.

And those are four different types of ways of commuting to work for the employees.

Now, you can give these numbers as 1, 2, 3 and 4.

You can call them as 1, 2, 3 and 4.

But they don't really have any natural ordering.

We can't say that one is higher than the other.

So you can number them in some way, but

the numbers are not going to mean anything in terms of natural ordering.

The next category that we go to of our types of data is ordinal data.

Ordinal data is

going to have meaning in terms of one thing being higher than the other.

So when you think of any kind of customer satisfaction survey that you may be

familiar with.

Those are the things that we get in the mail, or when you go to a restaurant,

they put it on the table saying could you fill this out for us?

And you may also be getting these as employee satisfaction surveys.

Now these surveys have scales that go from extremely dissatisfied

to extremely satisfied.

Or extremely happy with this to extremely unhappy, whichever way it's ordered.

The point there is that there's going to be some meaning of that ordering.

Either one is going to mean very good and five is going to mean very bad,

or one is going to mean very bad and five is going to mean very good.

So there's going to be some kind of ordering,

some kind of natural ordering to these categories.

But remember, we're still talking about discrete categories.

And if you think about these three types of data, the binary data, the nominal,

without natural ordering, and then the ordinal, with natural ordering.

The concept here is that you are taking data that is subjective and

you're converting it to objective.

You're taking information and you're converting it into objective data,

using either a binary scale or a nominal scale or an ordinal scale.

So you can express these in terms of numbers.
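
Here is a small sketch of what that difference in ordering means once the answers are coded as numbers. The responses are invented; the point is only that for nominal codes counting is the sensible summary, while for ordinal codes a rank-based summary such as the median also makes sense.

```python
from collections import Counter
import statistics

# Nominal: the codes are only labels, so we count how often each one occurs
commute_codes = {1: "walk", 2: "bike", 3: "train", 4: "car"}
commutes = [4, 3, 4, 1, 2, 4, 3]  # hypothetical survey answers
most_common_code, _ = Counter(commutes).most_common(1)[0]
print("Most common commute:", commute_codes[most_common_code])

# Ordinal: the codes carry an order, so the median is meaningful
# 1 = extremely dissatisfied ... 5 = extremely satisfied (hypothetical answers)
satisfaction = [1, 2, 2, 3, 5, 4, 2]
print("Median satisfaction:", statistics.median(satisfaction))
```

Renumbering the commute categories would change nothing about the result, whereas reversing the satisfaction scale would change what a high median means.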

Within discrete variables, we also have something called count data.

And what is count data? It's as the name suggests.

It's counting, for example, the number of defects in a product.

If I'm looking at this clicker that I'm holding,

and I'm saying, how many defects are there in this clicker?

I can count the number of defects.

If I'm looking at defects in an application form that I get,

I'm counting the number of defects.

And again, it's going to be discrete.

I cannot find 2.5 defects.

It's going to be either 2 defects or 3 defects.

And that's why it's still a discrete distribution, but here I'm looking at

a different type of data within a discrete distribution, and it is count data.

Now what are the implications of these different types of data?

The underlying statistical frequencies,

the underlying frequencies of data will be different.

The underlying statistical distributions that you can use for

these types of data are going to be different.

And that is going to have implications in terms of how you're going to do

the analysis.

The other implications of these types of data are some will give you more

information than others.

And some will be in that sense more valuable in terms of data

collected than others.

And some will also be harder to collect than others.

So there might be some trade-offs that you're thinking about

as to which type of data we should collect.

Well, the trade-off might be that this one is simple to collect, since we're simply

asking a yes/no question, if you're talking about the binary type of data.

But we're not getting much more information than simply somebody

was happy or unhappy about something.

And we can get some more in-depth information if we move to

more of an ordinal kind of scale, with a survey that has a battery of questions,

many questions that are scaled from one to five, or one to seven.

Typically we have odd numbers in those scales.

And there you are capturing a little more information.

It's going to take more effort, it's going to cost you more, but

you're going to get more information.

You can do something with that information.

So when you are thinking about types of data, you

should be thinking about the costs and benefits of the different types of data.

Now let's take a look at the other kind of data. The opposite of

discrete data is continuous data.

So continuous data is any kind of measurement data.

And there we're basically saying that it can theoretically take an infinite

number of values.

So we can say for example that if you're talking about temperature,

depending on the level of granularity that you want to go into,

you can go up to many, many decimal places when you're talking about it

in terms of Fahrenheit or Celsius.

And when you're talking about weight of something,

depending on the level of granularity that you want to go into,

you can be talking about 2.5 pounds, 2.68 pounds, 2.697 pounds.

And then you can be thinking about it in terms of ounces if you want to get it to

be more specific.

And that's the idea of continuous data, of measurement data.

So that's the kind of data that we normally think about when we're thinking

about numerical data.

It's very useful in terms of it's a very specific measurement of something,

but nevertheless it's a measurement of one kind of characteristic.

So, if I know that a critical to quality characteristic of

a service in a restaurant is time, I can be measuring time.

But it's only going to give me information about time.

If I know that a critical to quality characteristic in a restaurant is

temperature of food,

then I can be thinking about measuring temperature of food.

That is going to be very specific, but

it's going to be only about the temperature of the food.

So measurement data gives you much more information, but

it's about a specific aspect of a product.

Now, within measurement data you can collect data that is cross-sectional,

or that is more of a time series.

And simply here what we mean is that we could be looking at things

as they are at a point in time, or we can be looking at them over time and

asking whether there is a trend, when we look at time series kind of data.

And then when we look at time series kind of data,

there are some implications in terms of what kind of analysis we can do.

So there may be specific things that we have to account for

when we're working with time series kind of data.

That is, when we're taking it from the same process over a period of time when we're

trying to measure something.

Or if we're looking at sales over time, over different months or

over different weeks, there will be some ways, in fact,

of adjusting for the autocorrelation, the obvious relationship

that is going to be there when you have many weeks of sales data or

many weeks of any kind of process data.

There's going to be some relationship between the previous week and

the next week so you need to account for that.

And that's why you need to think about time series data a little bit

differently than when you're looking at cross-sectional data.
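
One common way to quantify the relationship between one week and the next is the lag-1 autocorrelation. The lecture does not name a specific method, so the sketch below is just one illustration, and the weekly sales figures are invented.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: how strongly each value tracks the previous one."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return numer / denom

# Hypothetical weekly sales with an upward trend
weekly_sales = [100, 104, 109, 113, 118, 121, 127, 131]
print(round(lag1_autocorrelation(weekly_sales), 3))
```

A value well above zero says that neighboring weeks resemble each other, which is exactly the dependence that an analysis treating the observations as independent would ignore.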

Now, let's take this categorization and

apply it to some different types of data that we have over here.

So here you have different measurements, different things that are being measured.

And what I'd like you to do is apply the categorization

that we just saw in terms of: is it discrete or is it continuous, and

if it is discrete, which of the different types that we saw is it?

The ordinal, the nominal,

the binary, and the count data, and whether you can apply those.

So you have paint viscosity, service at drive-through, and

then you have on-time arrival or not, number of customer calls abandoned,

humidity in a paint shop, and source country for outsourced parts.

So apply those categorizations and we'll come back and

see if you were able to apply them correctly.

So, we're back to the data types that we saw before the question and

paint viscosity is something that would be a continuous measurement.

So it'd be measurement kind of data.

It's something that you might measure in units that can have decimal points.

So it's a continuous measurement data.

Service at a drive-through going from very unsatisfactory to very satisfactory,

it's categorical data but it's ordinal.

There is meaning to 1 being better than 5.

So there is an implied hierarchy in those numbers.

On-time arrival or not, something was on-time or not is obviously binary.

There are only two kinds of options there, two options there.

Number of customer calls abandoned should give you a hint just from the term, just

from the fact that it's a number of calls.

It's count kind of data.

You're counting the number of calls that were abandoned.

Humidity in a paint shop.

Again, it's going to be like viscosity that you saw earlier.

It's going to be measurement data.

Source country for outsourced parts is going to be categorical,

except it's going to be nominal.

You're going to assign a number to each of the different countries, and

you're going to say that if it's a one, it indicates that it's from the US.

If it's two, it indicates it's from Canada.

If it's three, it indicates it's from Mexico.

If it's four, it indicates that it's from China.

And there's going to be no implied hierarchy in terms of the numbers that

you're using, in fact you could use any numbers for any of those countries.

And that's what we mean by it being categorical but nominal data.
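
The answers to the exercise can be written down as a simple lookup. This is only a sketch for keeping track of them; the `DataType` names are labels chosen here, not a standard library.

```python
from enum import Enum

class DataType(Enum):
    CONTINUOUS = "continuous (measurement)"
    BINARY = "discrete, binary"
    NOMINAL = "discrete, nominal"
    ORDINAL = "discrete, ordinal"
    COUNT = "discrete, count"

# The six measurements from the exercise and the type discussed for each
classification = {
    "paint viscosity": DataType.CONTINUOUS,
    "service at drive-through": DataType.ORDINAL,
    "on-time arrival or not": DataType.BINARY,
    "number of customer calls abandoned": DataType.COUNT,
    "humidity in a paint shop": DataType.CONTINUOUS,
    "source country for outsourced parts": DataType.NOMINAL,
}

for name, dtype in classification.items():
    print(f"{name}: {dtype.value}")
```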

So, here you've seen the application of the different data types.