Data repositories in which cases are related to subcases are identified as hierarchical. This course covers the representation schemes of hierarchies and algorithms that enable analysis of hierarchical data, as well as provides opportunities to apply several methods of analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency School of Computing, Informatics & Decision Systems Engineering

K. Selcuk Candan

Professor of Computer Science and Engineering Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

In previous modules, we talked about hierarchical data,

methods for exploring hierarchical data through agglomerative clustering,

talking about different distance metrics,

and then we started thinking about how we might

visualize this sort of data through node-line diagrams.

We talked about some of the constraints there.

We wind up with lots of white space on the screen.

So, we started talking about space-filling diagrams, specifically treemaps.

We went through the original treemap algorithm,

where essentially all you're doing is taking your hierarchy,

and flipping your cuts from horizontal to vertical,

and looking at underlying data properties to decide on the width of these different cuts.

In this module, we want to go through a variety

of different embellishments to the treemap algorithm.

So, we're going to explain new versions of treemaps,

some of the problems,

and how people have tried to focus on correcting these.

So, if you remember from our previous modules,

on time series analysis,

on line plots, and those sort of things,

we talked about aspect ratio.

Cleveland talked about how we wanted to try to set up an aspect ratio of a plot,

such that the lines bank to sort of 45 degrees on average.

The same sort of principle applies to treemaps.

The aspect ratio for the treemap is going to drastically affect the visualization.

Here, let's take a look at our elements.

First, if we have these long,

skinny cuts in our tree,

those are visually kind of unappealing.

Those look odd, hard to interpret sometimes.

And the other problem is humans have a hard time interpreting area.

So, if I were to ask you, which one of these has a bigger area?

You probably have a hard time telling.

So, these are constraints that people were thinking about

for creating new algorithms for treemaps.

So, they started thinking about, "Well,

what if we can do sort of a cluster treemap,

or what if we can try to make the elements in the tree more square?"

So, they started developing simple recursive algorithms to

try to reduce the overall aspect ratio.

So, Bruis introduced squarified treemap.

So, we had Wattenberg's work on Visualizing the Stock Market through treemaps,

and then thinking about how we could get rid of some of

these long aspect ratios to try to

make our elements in our hierarchy more square, for example.

Now, the methods had

two major drawbacks: changes in the sets can cause discontinuities in the layout,

so treemap data is dynamic.

Large visual changes you're going to make it hard to track.

What I mean by this is, if you have

stock market data and the stock market is data is changing over time,

and you have a treemap stock market updates, treemaps most update,

this can cause things to jump around in the treemap layout discontinuity,

so it's hard to track where elements flowed to,

and if there is some sort of ordering in the data,

this doesn't constrain the ordering.

So for example, when we have our elements 1,2,3,

like this, and we were talking about the splits before.

We went along the splits in sort of an order.

Here, in clustered and squarified treemaps,

they don't care about the ordering in this direction,

because really, 1 and 2 can be switched and

the hierarchy stays the same as long as we switch all of the [inaudible] underneath there.

So, if that ordering is important,

these methods do not stick on those.

Then, Shneiderman and Wattenberg introduced an idea of

the ordered treemap to try to overcome the limitations the squarified treemap.

So, it is possible to create a layout in which items that are

next to each other in a given order remain adjacent in the treemap,

and they presented two algorithms for ordering a treemap.

So, the Ordered Treemap is first,

we start with a rectangle R,

and we're going to subdivide this.

It's gonna be subdivided first by,

we're going to pivot by size,

where the pivot is the item with the largest area.

So, we let P be the pivot,

and this is the item with the largest area in our list of items,

and if the width R is greater than or equal to the height,

we're going to divide R into four rectangles,

and then we place P in this new rectangle R. So,

P and divide the items in the list other than P into three lists to be laid out,

and we're going to recursively lay out these elements inside of our rectangles,

and make sure that we're keeping this ordered list the same.

The other algorithm was this idea of a Strip Treemap,

where we're going to scale the area of all the rectangles,

so the total area of input rectangles equals that of the layout rectangle.

We're going to create a new empty strip,

that's going to be our current strip,

then we're going to add the next rectangle to the current strip,

recompute the height based on the area of all rectangles,

and then recompute the width.

We're going to keep doing this,

try to keep the average aspect ratio of the current strip has increased.

We're going to push it back to the list of rectangles, go back to two.

And once we're able to process all the rectangles,

we're going to stop, otherwise we repeat back to step three.

So, we create these recursive elements,

trying to go from what we saw before in sort of our clustered squarified treemaps,

to ideas of how we can improve these to handle ordering,

to getting these sort of strip treemaps.

So, how do we decide if one treemap element,

one treemap visualization is better than another?

Well, in order to assess all these different treemap algorithms,

we need to have some sort of metric definition of what we mean by goodness,

how good are they?

People started thinking about goodness in terms

of the average aspect ratio of the treemap layout,

and the layout distance change function as well.

So, how much does a element move in the layout?

The goal is to have a low average aspect ratio,

so that means more square,

and a low distance change as data's updated.

So, the average aspect ratios, the unweighted average,

and the distance change is Euclidean distance change in the width,

height, and corner locations of the rectangle.

We're trying to maintain rectangle size,

so it's not suddenly shifting into a square,

and trying to maintain the position of the upper left corner of the rectangle,

so it's not jumping all over the treemap.

So, people tried looking at

all these different properties of the treemaps here and computed a bunch of statistics.

This should remind us a little bit about scagnostics,

so having metrics for diagnosing different elements here.

The slice and dice treemap is quick to do,

but it has a really high average aspect ratio compared to the squarified,

which has the lowest average aspect ratio,

but it has very little change property.

Whereas the square of phi has a high change property,

and then readability becomes tricky in some of these two.

So thinking about how readable different elements are and so forth and impending ratios.

So by computing the statistics,

we can start thinking about which treemap

might be the best for different type of data set.

We can try these and compare these with the pivot-by-middle, pivot-by-size,

more strip maps as we can start looking and

thinking about which treemap we might want to use.

The goal is also that we want to be able to show structure.

It's really important to be able to reinterpret these.

The problem is with regular borderless treemaps,

it's really difficult to interpret the hierarchical structure here.

If I look at the strip treemap,

I can sort of quickly pick out the first level.

I get those two horizontal stripes.

I know I've got three elements here, right?

Now I've got to go to this node and I can find the vertical one that cut it,

so I've got one, two, three, four,

and I have to keep going through this process to sort of mentally recreate this things,

so I can think about the hierarchical elements in the data.

So we can see that it becomes challenging to discern the structure of the hierarchy.

Especially, if we get really large ones like these guys here.

So people have thought about supplementing the treemap view,

or changing rectangles to other forms.

So people have used like cushioned tree maps where they use shading and

texture to try to help convey the structure of the hierarchy.

That's fine, this may give us some more idea of structure in the data but

remember the size of these elements corresponds to some property there.

So for example in a computer,

this will correspond to the file size.

So I can say, "Hey this might be a really big file.

Let me check out why it's taken up all this memory in this drive."

In the stock market,

it may correspond to the price of the stock.

With that, that might mean that zero value might be really important.

What if a stock value drops to zero?

What if I'm in a sports game and have a score of zero?

How do I encode that value into a node,

if I'm mapping areas how do I map to zero?

Because I want to show something because that might be important but

if I give it width in my misinterpreting this.

So thinking about how we map to zero,

one way to overcome this is to distort the classic treemap visualization.

So distorted tree map can show another attribute than a classic tree map,

so node area is no longer proportional to the attribute being visualized.

So instead we can scale this and make zero be important.

There's a lot of different implementation strategies.

For example, we use a regular treemap but we could add some epsilon to zero value items.

We could use an exponential mapping so two to the value.

So if the value is zero,

we get a size of one for the area or we can

assign some minimal screen space size to zero nodes.

But again, this can lead to sort of weird aspect ratios and other issues.

The final solution for contexts treemap was to calculate the total.

So in this particular paper,

which was from Johns Stasko's group on fund explore,

they calculated the total value,

created an additional total with respect to the context,

and split the context screen real estate among all the empty nodes.

The empty nodes were given the same level of importance.

And we can see an example here of a fund explorer.

Looking at where some of the zero node values are and trying

to again think about how we might interpret this sort of treemap.

Now, of course people also started thinking about,

"Well, I don't have to just do vertical and horizontal splits.

Why can't I lay out treemaps with other methods?"

So we had circular packing.

But what about voronoi treemaps?

So we can take our data and voronoi tessellation to the data,

and try to organize our elements into these voronoi diagrams.

But of course again, it gets tricky to extract and understand the hierarchy there.

But they look really pretty.

And so you can take a look at this work from Balzer or Deussen and Lewerentz in

2005 and see how they did voronoi treemaps for software visualization.

Take a look at their algorithm for that.

Essentially for a voronoi diagram,

we're going to let our partitions be a set of n distinct points,

and then the voronoi diagram of P is the subdivision of the plane into n cells,

one for each site and we're trying to capture this hierarchy.

So point Q is going to lie in the cell corresponding to a site.

The main algorithm for computing the voronoi diagram is Fortune's algorithm.

This would be something you would do more in a computer graphics class if you're

interested in learning more about voronoi tessellations and those sorts of things.

Take a look at Fortune's algorithm during the computer graphics class.

Mostly we mentioned the voronoi tessellation here just to give you

an idea of that treemaps don't have to just be square of phi,

we can do other sorts of elements.

The voronoi properties, essentially a point q lies in a voronoi edge

between different sites if and only if the largest empty circle

centered at q touches only p_i and p_j.

So we can start thinking about how to do these voronoi tessellations,

and how to do our splits to create this hierarchical clustering.

So there's more summaries of voronoi properties.

And by laying out these different voronoi elements,

we can try to maximize our screen space

and capture some different aesthetics that we don't have before.

What's nice is that voronoi diagrams have a linear complexity.

The problem is that not all bisectors are voronoi edges.

So it can become a little bit tricky in trying to think about how to

interpret these voronoi treemaps to get the hierarchical structure back out of those.

So while the voronoi diagram looks pretty,

it can be hard to interpret.

So people have also thought about circle packing.

So taking different levels, hierarchies,

and the zeroth level is our root node,

and then all the children from the zero level get packed in and so forth.

So we just, you're packing these circles into there.

This is maybe a little bit easier to interpret the hierarchical structure.

We can also extend this to a 3D view as we show here.

But it gets harder to read and interpret as this gets bigger and bigger and it takes up

more space and we wind up losing white space elements here as well.

So again, another option instead of doing node link or space dealing with treemap,

we can try a circle packing algorithm to fill in these treemap elements.

So I don't want people to think that they're just constrained

to node link diagrams or treemaps.

There's lots of different ways we can think about

what visual metaphors might enable us to

represent a hierarchical structure being inspired by things in nature,

thinking about taking advantage of different geometric concepts,

whether its circle packing or voronoi diagrams,

the goal of all of these things is to try to help

us visualize as much of the data as possible.

Remember with a node link diagram,

we already have sort of shape constrained by the shape of the nodes,

we're going to use circles, we can add color,

we can add size.

But again, this can cause weird elements to occur.

Imagine in our node link diagram,

it's size corresponds to the size of the file.

So is the root node supposed to be giant,

and then they get smaller as they go down.

Should we do a different method where they get bigger as they go up?

How do we use size to help add another dimension?

Should we just add color and ignore size?

How can we augment this and use

multiple visual variables to show multivariate aspects of the data?

These are the challenges we want to face and think about how we can use

these different data visualization tools in different ways to

help us explore and understand our data. Thank you.