Once you’ve collected and interpreted data, what do you do with it? In this module, you’ll learn how to take the next step: how to use data about actions in the past to make predictions about actions in the future. You’ll examine the main tools used to predict behavior, and learn how to determine which tool is right for which decision purposes. Additionally, you’ll learn the language and the frameworks for making predictions of future behavior. At the end of this module, you’ll be able to determine what kinds of predictions you can make to create future strategies, understand the most powerful techniques for predictive models including regression analysis, and be prepared to take full advantage of analytics to create effective data-driven business decisions.

Professor of Marketing, Statistics, and Education, Chairperson, Wharton Marketing Department, Vice Dean and Director, Wharton Doctoral Program, Co-Director, Wharton Customer Analytics Initiative The Wharton School

Peter Fader

Professor of Marketing and Co-Director of the Wharton Customer Analytics Initiative The Wharton School

Raghu Iyengar

Associate Professor of Marketing The Wharton School

Ron Berman

Assistant Professor of Marketing The Wharton School

Welcome back.

I hope that Raghu's content on regression and related techniques makes sense to you.

I hope it's pretty clear now how we can take a bunch of data from let's say,

period one.

Whether it's, again, past behavior or marketing activities,

competition, whatever, to predict something about period two.

Whether it's the number of purchases or whether someone stays with us or not.

It's really, really important to be able to do that.

Fortunately, those techniques are common; they're very accessible.

You don't necessarily have to have special software.

We can do it in something as simple as Microsoft Excel.

In fact, I want to talk about an example where people were doing that kind of

thing long before they had any of the computational power that we have today or

even the rich data that we have today.

I want to take you back to the late 1960s, the early 1970s.

It was the dawn of what today we would know as direct marketing.

It really was when a lot of these ideas of customer analytics were born.

It was the first time that we really had any kind of

granularity about what particular customers were doing and

a desire to know what each and every one of our customers would be doing next.

And for how long, and for how much money?

And so it became very important for companies to come up with

what we like to call KPIs, key performance indicators.

Can we look at some indicators of what people had been doing in the past in order

to make some accurate statements about what they're likely to do in the future?

And again, this is just a natural area to run something like a regression model and

indeed, regression models were used for this kind of purpose.

But it wasn't just a matter of throwing in tons and tons of data.

Part of it was that the data was limited; part of it, as I said,

is that our computational power was limited, so we had to think very carefully.

It was very, very important for us to come up with just a few measures that

would be fairly predictive of what customers would be worth in the future.

So our forefathers in direct marketing,

they basically did the kinds of things we've been talking about here.

Let's take our dataset, let's chop it into two pieces.

Let's collect some data from period one to see which elements of that period

one data would be most predictive of what people did in period two.

And again in period two, we'll be looking at how many purchases they made or

what was the dollar value of those customers?
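The two-period split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction log, the customer IDs, the cutoff date, and the helper names `split_by_period` and `period_two_outcomes` are all made up for the example and do not come from the lecture.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2023, 2, 10), 40.0),
    ("c1", date(2023, 9, 5), 25.0),
    ("c1", date(2024, 3, 1), 30.0),
    ("c2", date(2023, 6, 20), 100.0),
    ("c2", date(2024, 1, 15), 80.0),
    ("c2", date(2024, 7, 4), 120.0),
    ("c3", date(2023, 11, 30), 15.0),
]

# Assumed boundary between period one and period two
CUTOFF = date(2024, 1, 1)


def split_by_period(rows, cutoff):
    """Return (period_one_rows, period_two_rows), split at the cutoff date."""
    period_one = [r for r in rows if r[1] < cutoff]
    period_two = [r for r in rows if r[1] >= cutoff]
    return period_one, period_two


def period_two_outcomes(period_one, period_two):
    """For each customer seen in period one, count purchases and total spend in period two."""
    outcomes = {cust: {"purchases": 0, "spend": 0.0} for cust, _, _ in period_one}
    for cust, _, amount in period_two:
        if cust in outcomes:
            outcomes[cust]["purchases"] += 1
            outcomes[cust]["spend"] += amount
    return outcomes


period_one, period_two = split_by_period(transactions, CUTOFF)
outcomes = period_two_outcomes(period_one, period_two)
```

Here `outcomes` holds the two things the period-two side of the analysis looks at: how many purchases each customer made and what they were worth in dollars. Customers who bought nothing in period two stay in the table with zeros, which matters if you later want to predict who does not come back.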

And they ran lots of models to try and find out which bits of data were most

predictive, and they did it over and over again.

Lots of different data sets, lots of different products,

lots of different geographies, lots of different customer segments.

Because we wanted to find a few of those explanatory variables

that were pretty robust that time and time again would prove to be predictive.

And this is where our forefathers in direct marketing came up with the idea of

RFM: recency, frequency, monetary value.

What they found time and time again, back in the '60s and early '70s,

and what we still see to be true today here in the 21st century,

is that you can give me these three summary metrics.

You give me recency, frequency, monetary value.

You tell me the last time that someone made a purchase with me or

did some other kind of economically valuable activity.

Maybe they took a sales call, maybe they visited the website.

So they did something that suggests that they're

going to become a more valuable customer.

Generally, we're talking about a purchase.

So that's recency.

Now, tell me about frequency.

Tell me how many purchases they made or how many

economically beneficial activities they did over a set period of time.

Let's say the last year or two.

And third would be monetary value, and I think that's pretty much self-explanatory.

So when they did those economically beneficial activities, what was

the overall or the average monetary value of each and every one of them?

So if you can give me RFM,

recency, frequency, monetary value, I can make a very accurate statement

about what that customer's going to be worth in period two.
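As a rough sketch of how those three summaries could be computed from a purchase log, consider the Python below. The data, the `as_of` date, the `rfm` helper, and the choice to measure recency in days and monetary value as the average amount per purchase are all assumptions made for illustration; the lecture leaves those details open (it mentions "the overall or the average" monetary value).

```python
from datetime import date

# Hypothetical period-one purchases: (customer_id, purchase_date, amount)
purchases = [
    ("c1", date(2023, 2, 10), 40.0),
    ("c1", date(2023, 9, 5), 20.0),
    ("c2", date(2023, 6, 20), 100.0),
    ("c3", date(2023, 11, 30), 15.0),
    ("c3", date(2023, 12, 20), 45.0),
    ("c3", date(2023, 12, 28), 30.0),
]


def rfm(rows, as_of):
    """Summarize each customer by recency (days), frequency (count), and monetary value (mean amount)."""
    by_customer = {}
    for cust, when, amount in rows:
        by_customer.setdefault(cust, []).append((when, amount))

    table = {}
    for cust, items in by_customer.items():
        last_purchase = max(when for when, _ in items)
        amounts = [amount for _, amount in items]
        table[cust] = {
            "recency_days": (as_of - last_purchase).days,  # days since last purchase
            "frequency": len(items),                       # purchases in the window
            "monetary": sum(amounts) / len(amounts),       # average spend per purchase
        }
    return table


# Assumed end of the observation window
rfm_table = rfm(purchases, as_of=date(2023, 12, 31))
```

Each row of `rfm_table` is the kind of three-number summary that could then serve as the explanatory variables in a regression on period-two behavior.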

And again,

this was one of the first areas where regression analysis was used in marketing.

It was one of the first ways for folks in marketing to say, you know what?

All of that data that we've been collecting,

not really sure what to do with it, whoa, there's real value there.

We can really predict stuff.

Then we can start to change our business to take advantage of these insights about

what's likely to happen in the future, not just what happened in the past.

So I just want to put RFM out there as just one very nice example

of an application of the kinds of things that I was talking about.

And now I want to go one step further.

So we can run these regression models and we can take whatever data we have.

Again, we could start with something as simple as RFM.

We can bring in many, many more kinds of measures, much more complicated,

much more interesting.

And make statements about what's likely to happen in period two.

And again, if all you're interested in is making statements about period two,

how many purchases are going to happen in the next year, who's going to churn or not,

then regression-type models are fine.

In fact, you can do better than regression-type models.

There are different kinds of data mining techniques that might be out there.

But what happens when you want to go beyond period two?

What happens when you want to make statements about period three or

period four?

What happens if you want to talk about something like customer lifetime value?

Well, we don't want to limit our statements just to what

a particular customer's going to do over the next year.

If we want to go out there and acquire customers, if we want to figure out what's

the maximum amount that we should be willing to spend on a customer,

we can't limit ourselves just to how much they're going to pay us,

how much profit we'll get from them in the next period.

We need to project that out way into the future.

And the problem is, regression-type models are fairly limited in their

ability to do that kind of thing, and let me try to explain why.

And let's go back to the timeline that I described before.

We get all of this data in period one to make a statement about what we see

in period two.

And we run a regression model to predict sales as a function

of visits to the website, usage of social media, marketing activities,

everything under the sun.

That's great.

What happens if you want to make statements about period three?

Well if all you want to do is make statements about period three,

that's not so bad.

You'll say, wait a minute, wait a minute, wait a minute.

I have this data on period two. Instead of using period two as my dependent variable,

the thing I want to explain in my regression,

why don't I look at period two and get my explanatory variables from it?

Why don't I look at the visits to the website, the marketing touches, the RFM?

I have period two, so let me take all the period two data

to try to make a prediction about what will happen in period three.

And hey, I already ran my regression, so I have my regression coefficients,

I have all the outputs, I have everything that Raghu was talking about.

So let me just jam in my period two data into that regression and

make statements about period three.

You see?

I can predict the future and that's great.
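That reuse of the fitted coefficients can be sketched as follows. To keep the example short it uses a single explanatory variable (purchase counts) and a hand-written least-squares fit; both are simplifying assumptions, as are the made-up numbers and the helper names `fit_line` and `predict`. A real analysis would typically use several predictors, such as the RFM measures.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Apply already-fitted coefficients to new input values."""
    return [intercept + slope * xi for xi in x]


# Made-up purchase counts for five customers in each observed period
period_one_purchases = [1, 2, 3, 4, 5]
period_two_purchases = [1, 1, 2, 3, 3]

# Step 1: fit the regression with period one as input and period two as output
intercept, slope = fit_line(period_one_purchases, period_two_purchases)

# Step 2: keep the coefficients, swap in period two as the input,
# and read the output as a forecast for period three
period_three_forecast = predict(intercept, slope, period_two_purchases)
```

Note the assumption hiding in step 2: the relationship estimated between periods one and two is taken to hold unchanged between periods two and three.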

And if you want to make a statement one period out, terrific.

But what happens when you want to go to period four?

We don't have any data beyond period two.

We don't have any x variables from period three in order to predict period four,

what are we going to do there?

How far out into the future can we go?

The problem with regression-type models is that they're limited.

If you don't have any data to use as inputs into the model,

then you can't get the outputs.

And so, no matter how long your observation period might be,

you're limited as to how far into the future you can make statements.

Now in many cases this isn't a problem.

For many kinds of decisions that companies want to make, simply

being able to make statements about one, maybe two periods out is perfectly fine.

In fact, you might say that for most decisions that's perfectly adequate,

and these limitations of regression aren't going to be a problem, and I agree.

But there are times, especially when we want to ask when-type questions,

or long-run-type questions, like customer lifetime value, which I mentioned already.

It's one thing if you want to make a statement about whether this customer is going to

churn in the next period or not.

Regression models are going to be great for that kind of thing.

But if we want to ask a question instead, when will this customer churn?

If they survive through the next period, how many more periods will they survive?

Regression won't really work well when we're projecting way outside

of the range of data that we had in the first place to run the original model.

So we want to make these longer-run projections,

and I'm going to keep coming back to talk about customer lifetime value

as one very, very nice, very, very practical example

of something that we're going to want to do over a longer period of time.

And today, as firms start talking much more about customer centricity,

about figuring out who the right customers are

and being willing to invest in them because they're going to be so

worth it in the long run, we need to have some visibility into the long run.

We need to be able to make these predictive statements about the long run

in order to see if those investments are justified.

So there's much more interest than ever in being able to make statements beyond

period two.

And so I want to talk about a very different kind of modeling approach

that's not nearly as popular as regression models are.

But it's not necessarily any more complicated.

And as our view of the future goes further and further out,

it becomes more and

more important to add this other kind of modeling approach to your toolkit.