This course answers the questions "What is data visualization?" and "What is the power of visualization?" It also introduces core concepts such as dataset elements, data warehouses and exploratory querying, and combinations of visual variables for graphic usefulness, as well as the types of statistical graphs, tools that are essential to exploratory data analysis.

From this lesson

Statistical Graphics: Design Principles for the Most Widely Used Data Visualization Charts

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency

K. Selcuk Candan

Professor of Computer Science and Engineering, Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

In this module we're going to continue our discussion on

exploratory data analysis and introduce our first statistical graphic, the pie chart.

These statistical graphics are often used to summarize different data sets and forms,

and here we want to talk about different properties and

principles of designing pie charts.

And so a pie chart is simply a circular graph that shows

the relative contribution of a group of categories

where each slice is going to represent a different category within the data.

So for example, we can think of a dataset for this class so,

this is MCSA 578,

we may have our course and we may have a certain number of students with A, B, C,

D and F for example,

so we may have two As,

five Bs, three Cs,

one D, and zero Fs.

And we can think about how could we make

a pie chart to allow us to explore this set of categories of data,

and essentially we have our circular region and we have 2 plus 5,

plus 3, plus 1,

we have 11 samples,

and we have five different categories.

And so basically, what we're going to do,

is we're going to split this up.

So, two elevenths of our angle here,

assuming that’s two elevenths, is going to represent the As in our graph.

The next slice for B is going to be bigger,

it's five elevenths, so we have B,

we’ve got the Cs and then we have one D and we should have zero F’s.

So you can see that I've messed up my drawing here so,

we need to think about how we split up the angles,

how we draw our categories,

and what you should notice here is if I have zero Fs I can't

actually represent that in a pie chart; I can't have an angle for zero elements.
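
The split described above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the grade counts from the example (two As, five Bs, three Cs, one D, zero Fs); the function name `slice_angles` is made up for illustration.

```python
# Grade counts from the lecture's example class.
grade_counts = {"A": 2, "B": 5, "C": 3, "D": 1, "F": 0}


def slice_angles(counts):
    """Return each category's share of the full circle, in degrees."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a pie chart needs at least one sample")
    return {label: 360.0 * n / total for label, n in counts.items()}


angles = slice_angles(grade_counts)
for label, degrees in angles.items():
    print(f"{label}: {degrees:.1f} degrees")
```

The F category comes out at 0 degrees, which is the point made in the lecture: a category with no samples has no slice to draw.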

So, we have to think about what data elements we have,

how do they get represented in the pie chart,

and then if we have too many categories so,

what if I was Walmart and I wanted to make

a pie chart of all my different types of sales,

I would have thousands of categories so my pie chart would

have tons and tons of extra little pie slices.

It gets really, really hard to see really quickly.

And in that case I have to think about

how many categories are going to be good for representing with the pie chart,

and a good rule of thumb for human perception is typically

somewhere between seven and nine categories maximum that we're trying to represent.
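
One common way to stay under that limit is to keep the largest categories and merge the rest into a single slice. The lecture does not prescribe this, so treat the following as a sketch under assumptions: the function name, the default of eight slices, the "Other" label, and the sales numbers are all hypothetical.

```python
def collapse_small_categories(counts, max_slices=8, other_label="Other"):
    """Keep the largest categories and merge the remainder into one slice.

    max_slices and other_label are illustrative choices, not values
    taken from the lecture.
    """
    if len(counts) <= max_slices:
        return dict(counts)
    # Sort categories from largest to smallest count.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Reserve one slice for the merged remainder.
    kept = dict(ranked[: max_slices - 1])
    kept[other_label] = sum(n for _, n in ranked[max_slices - 1:])
    return kept


# Hypothetical sales counts for twelve made-up product categories.
sales = {f"category_{i}": 100 - 7 * i for i in range(12)}
collapsed = collapse_small_categories(sales)
print(len(sales), "->", len(collapsed), "slices")
```

The total is preserved, so the merged chart still represents the whole dataset, at the cost of hiding detail inside the "Other" slice.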

And pie charts essentially use angle, area, and arc length to encode

our values, so we're trying to set up

the regions of our pie to encode the different value structures.

The problem with pie charts though is that they can often lead to

an overestimation of small values and an underestimation of large ones.

So, for example when we talked about having

that category where nobody in the class got an F,

how can I represent that in a pie chart?

Perceptual research also shows that,

the error rate in pie charts is high when they're trying to estimate values.

So, pie charts give us a quick visual look

and show us the different distribution of categories,

and we have a quick idea that perhaps there's more of category A than of

category B, but we're not good at estimating how many are in the actual category.

So, pie charts often are presented with values as

the pie slice labels in order to tell the viewer the exact number that's in that chunk.

So for example if we have a pie chart that looks like this,

so we have categories A,

B, C and D. If the question was,

"Is Category B bigger than Category D?"

It's a little bit hard to tell due to the rotation of the angles.

So, oftentimes we may have some leader line with

some value number to tell us exactly how many are in a given category,

or we can even think about adding an interaction so users can hover over

the different pie chart pieces to learn what the elements are.
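
The value labels mentioned above can be generated directly from the counts. This is a small sketch, not tied to any plotting library; the counts for categories A to D are hypothetical (the lecture gives none for this chart), chosen so that B and D are close in size.

```python
def slice_labels(counts):
    """Build one text label per non-empty slice: name, count, and percent."""
    total = sum(counts.values())
    return [
        f"{label}: {n} ({100 * n / total:.1f}%)"
        for label, n in counts.items()
        if n > 0  # an empty category has no slice to label
    ]


# Hypothetical counts; B and D are deliberately close in size.
category_counts = {"A": 40, "B": 22, "C": 15, "D": 23}
labels = slice_labels(category_counts)
for text in labels:
    print(text)
```

With labels like these attached by leader lines, the question "Is Category B bigger than Category D?" is answered by reading the numbers rather than by comparing two rotated slices.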

Now, what's interesting though is angle is not actually the key visual cue.

So, users are not trying to decide any sort of measurement on this.

Instead, what we're looking at is really distance along a curved arc.

So, the arc length and the area within the region are the most important part.

And so it turns out that doughnut charts are just as effective

as pie charts because we're really only looking at area and arc length,

nothing to do with angle.
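
The geometry behind that claim can be sketched as follows. For a slice covering a fraction of the circle, the outer arc length is the radius times the angle, and the area is half the angle times the difference of the squared outer and inner radii. The function name and the inner radius of 0.6 are arbitrary assumptions, and this only shows that the proportions match; it says nothing about how accurately people read them.

```python
import math


def sector_measures(fraction, outer_radius=1.0, inner_radius=0.0):
    """Return (outer arc length, area) for a slice covering `fraction` of the circle.

    inner_radius=0.0 gives a pie slice; a positive value gives a doughnut slice.
    """
    theta = 2 * math.pi * fraction
    arc_length = outer_radius * theta
    area = 0.5 * theta * (outer_radius ** 2 - inner_radius ** 2)
    return arc_length, area


# Fractions from the lecture's grade example (the empty F category is omitted).
fractions = {"A": 2 / 11, "B": 5 / 11, "C": 3 / 11, "D": 1 / 11}
pie = {k: sector_measures(f) for k, f in fractions.items()}
donut = {k: sector_measures(f, inner_radius=0.6) for k, f in fractions.items()}

# Ratio of the B slice area to the A slice area in each chart.
print(pie["B"][1] / pie["A"][1], donut["B"][1] / donut["A"][1])
```

Cutting out the center scales every slice's area by the same factor and leaves the outer arc lengths unchanged, so the relative sizes of the categories are the same in both charts.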

We can also think about modifying

pie charts just slightly to draw attention to certain regions.

So, creating perhaps a larger slice,

or adding some highlights there for an exploded pie chart as well.

And typically with the exploded pie chart we may even add again

leader lines to provide more information about a particular thing.

So, providing details about a category,

summarizing this, providing more information.

And so with that, those are sort of the elements that go into designing a pie chart.

We talked before about nominal,

ordinal, interval and ratio data.

Pie charts are primarily used for

nominal or categorical data where you have some sort of counts,

so you have some nominal data and ordinal data.

So for example, the number of large, medium,

and small soft drinks you sold at a restaurant might be

a good option to represent in a pie chart.

The other thing to think about is design principles for a pie chart.

Since we're looking at angles and area,

we don't ever want to think about using 3D pie charts for example.

So, the 3D pie chart winds up trying to add volume into our visualization as

well, which has no bearing on the area representation.

So those are the general elements of designing a pie chart.