Welcome. In this video,

you will learn how to use XLMiner to perform logistic regression.

Throughout the video, I will use the medical appointment data set.

I will first demonstrate how to build a logistic regression model

with one predictor variable.

I will then show you how to build a multiple logistic regression model

with multiple predictor variables.

Finally, I will show how to use XLMiner to partition the data set and

perform cross-validation.

Here is the medical appointment data we discussed before.

To perform a logistic regression with one predictor variable,

we are going to use two columns.

Status is used as the target variable and

Lag is used as the predictor variable.

If you have XLMiner properly installed,

you should see the XLMiner ribbon when you bring up Excel.

To perform logistic regression, click Classify > Logistic Regression.

Note that all variables are listed here.

Choose Lag and move it to Selected Variables, and

set Status as the Output Variable.

At the bottom of the window, we need to specify the success class.

Since you are interested in predicting appointment cancellation,

you should check the box and select Canceled in the drop-down menu.

Also note that the default value for the cutoff probability is 0.5.
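
To make the role of the cutoff concrete, here is a minimal Python sketch of how a predicted probability could be turned into a class prediction. The function name `predict_success`, the example probabilities, and the treatment of a probability exactly equal to the cutoff are assumptions for illustration; the video does not describe how XLMiner handles ties.

```python
def predict_success(probability, cutoff=0.5):
    """Return True when the case is assigned to the success class (Canceled)."""
    # Assumption: a probability exactly equal to the cutoff counts as success;
    # the video does not say how XLMiner treats ties.
    return probability >= cutoff


# Hypothetical predicted probabilities of cancellation.
probabilities = [0.15, 0.48, 0.50, 0.73]

# Default cutoff of 0.5, then a lower cutoff that flags more appointments.
print([predict_success(p) for p in probabilities])
print([predict_success(p, cutoff=0.3) for p in probabilities])
```

Lowering the cutoff classifies more appointments as Canceled, which trades more false alarms for fewer missed cancellations.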

Click Next, and you will see a number of options.

We will skip them for now and click Finish.

This creates three new output sheets in the Excel workbook.

Let's take a closer look at the output sheet named LR Output.

At the top of the window is the Output Navigator,

which can lead you to different sections of the output.

Let's scroll down to the Regression Model section.

As we can see here, the coefficient estimates are -1.7431 and 0.01658.

This table also gives us the p-values, which are close to zero for

both coefficients,

indicating that both estimates are statistically significant.
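
As a rough illustration of how these estimates translate into predictions, the following Python sketch plugs them into the logistic function. It assumes the first estimate (-1.7431) is the intercept and the second (0.01658) is the coefficient on Lag; the function name `cancel_probability` and the example lag values are hypothetical.

```python
import math

# Coefficient estimates quoted in the video. Treating the first as the
# intercept and the second as the Lag coefficient is an assumption.
INTERCEPT = -1.7431
LAG_COEF = 0.01658


def cancel_probability(lag):
    """Logistic model: P(Canceled) = 1 / (1 + exp(-(b0 + b1 * lag)))."""
    log_odds = INTERCEPT + LAG_COEF * lag
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical lag values, used only to show how the probability changes.
for lag in (0, 30, 60):
    print(lag, round(cancel_probability(lag), 4))

# Multiplicative change in the odds of cancellation per one-unit increase in Lag.
print(math.exp(LAG_COEF))
```

Because the Lag coefficient is positive, the predicted probability of cancellation rises as Lag increases.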

There are some additional summary statistics in the table on the right.

In particular, the Multiple R-squared is 0.03179.

Note that the Multiple R-squared is a pseudo R-squared value and does not share

the same interpretation as R-squared from the linear regression model.

The table also reports a residual deviance of 7,663.

Both the pseudo R-squared and

the residual deviance can be used to compare different models.

Larger values of pseudo R-squared and

smaller values of residual deviance are preferred.
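
The sketch below shows one common way these two statistics can be computed from 0/1 outcomes and predicted probabilities. It uses a McFadden-style pseudo R-squared (one minus the ratio of residual deviance to null deviance); the video does not state which formula XLMiner uses, so this is an assumption, and the toy data and function names are hypothetical.

```python
import math


def deviance(y, p):
    """Residual deviance: -2 * log-likelihood of 0/1 outcomes y under probabilities p."""
    return -2.0 * sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def pseudo_r_squared(y, p):
    """McFadden-style pseudo R-squared: 1 - D_model / D_null (an assumed definition)."""
    base_rate = sum(y) / len(y)  # the null model predicts the overall success rate
    null_deviance = deviance(y, [base_rate] * len(y))
    return 1.0 - deviance(y, p) / null_deviance


# Hypothetical toy data: 1 = Canceled, 0 = otherwise.
y = [1, 0, 0, 1, 0, 0]
p_weak = [0.4, 0.3, 0.3, 0.4, 0.3, 0.3]    # barely better than the base rate
p_strong = [0.8, 0.1, 0.2, 0.7, 0.1, 0.2]  # separates the classes well

for name, p in (("weak", p_weak), ("strong", p_strong)):
    print(name, deviance(y, p), pseudo_r_squared(y, p))
```

In this toy comparison, the model whose probabilities track the outcomes more closely has the smaller deviance and the larger pseudo R-squared, which matches the preference stated above.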

In the lower portion of the worksheet,

a summary report on the predictive performance is given.

Since we did not partition the data, the result is based on applying

the model to the whole data set, which is used as the training data.
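
For reference, a performance summary of this kind is typically built from a confusion matrix. The Python sketch below counts the four outcome types and derives an overall error rate; the labels are hypothetical and are not taken from the appointment data, and the exact layout of XLMiner's report may differ.

```python
def confusion_counts(actual, predicted):
    """Count outcomes for 0/1 labels, where 1 is the success class (Canceled)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


def error_rate(counts):
    """Share of cases that were misclassified."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total


# Hypothetical actual and predicted labels.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 1, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts, error_rate(counts))
```

When the same data are used for both fitting and evaluation, an error rate like this tends to be optimistic compared with performance on new data.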

Building the multiple logistic regression model follows almost exactly the same steps.

First, return to the worksheet with the data.

We would like to add the gender variable to the model.

Before building the model, we first create a dummy variable for gender.

Click Transform > Transform Categorical Data > Create Dummies.

In the pop-up window, move Gender to Variables to be Factored, and click OK.

This creates a new sheet called Create Dummies.

Note that in the Data section, two additional columns are added.

The second-to-last column is Gender_F,

where the value is 1 if gender is F and 0 otherwise.

The last column is called Gender_M,

where the value is 1 if gender is equal to M and 0 otherwise.
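
The dummy coding just described can be sketched in a few lines of Python. The helper `create_dummies` and the sample gender values are hypothetical; only the column naming pattern (Gender_F, Gender_M) and the 1/0 rule come from the video.

```python
def create_dummies(values, prefix):
    """Build one 0/1 column per distinct category, named prefix_category."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }


# Hypothetical gender values for four patients.
gender = ["F", "M", "M", "F"]
dummies = create_dummies(gender, "Gender")

for name, column in dummies.items():
    print(name, column)
```

Since Gender_F and Gender_M always sum to 1 in each row, one of the two columns carries all the information, so a model with an intercept needs only one of them.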

Click Classify > Logistic Regression.

Move Lag and Gender_M to Selected Variables and set Status as the Output Variable.