The difference is that we add some additional code within the parentheses

of the seaborn regplot function.

We add order=2 to ask for a quadratic line, that is, a curve that includes the second-order polynomial term.

If we run the code for both the linear and quadratic scatterplots at the same time,

we will get a single scatterplot with both the straight linear line and

the curved quadratic line.

Now my scatterplot shows the original linear regression line in blue,

and the quadratic regression line in green.
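As a rough sketch of how the two calls might look together: the column names urbanrate and femaleemployrate and the data frame name data are assumptions, and the data here is made up for illustration, not the data set used in this lesson. Drawing both calls on the same axes gives one scatterplot with both lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data (NOT the lesson's data): a curved relationship plus noise.
rng = np.random.default_rng(0)
urbanrate = rng.uniform(10, 100, 150)
femaleemployrate = 80 - 1.1 * urbanrate + 0.008 * urbanrate**2 + rng.normal(0, 10, 150)
data = pd.DataFrame({"urbanrate": urbanrate, "femaleemployrate": femaleemployrate})

fig, ax = plt.subplots()

# Linear fit: order defaults to 1, so this draws the points and a straight line.
sns.regplot(x="urbanrate", y="femaleemployrate", data=data,
            scatter=True, color="blue", ax=ax)

# Quadratic fit: order=2 adds the second-order polynomial term.
# scatter=False avoids drawing the same points a second time.
sns.regplot(x="urbanrate", y="femaleemployrate", data=data,
            scatter=False, order=2, color="green", ax=ax)

ax.set_xlabel("Urbanization rate")
ax.set_ylabel("Female employment rate")
```

In an interactive session, calling plt.show() afterwards would display the figure.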

Notice how the quadratic line does a better job of capturing the association

at lower and higher urbanization rates.

The points at these levels are closer to the quadratic, or

second-order polynomial, curve,

meaning that the expected or predicted values are closer to the observed values.

So based on just looking at the two curves, it looks like the green quadratic

curve fits the data better than the blue straight line.
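One way to put a number on "closer" is to compare the sum of squared differences between observed and predicted values for each curve. The sketch below uses numpy.polyfit as a stand-in for the fitting step and made-up data, so the values it prints say nothing about the lesson's actual data set.

```python
import numpy as np

# Made-up stand-in data (NOT the lesson's data): a curved relationship plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(10, 100, 150)
y = 80 - 1.1 * x + 0.008 * x**2 + rng.normal(0, 10, 150)

# Least-squares fits of a straight line (degree 1) and a quadratic curve (degree 2).
linear_coefs = np.polyfit(x, y, deg=1)
quadratic_coefs = np.polyfit(x, y, deg=2)

# Residual sum of squares: how far the observed values are from the predicted ones.
rss_linear = np.sum((y - np.polyval(linear_coefs, x)) ** 2)
rss_quadratic = np.sum((y - np.polyval(quadratic_coefs, x)) ** 2)

print("Linear RSS:   ", rss_linear)
print("Quadratic RSS:", rss_quadratic)
```

A smaller residual sum of squares means the predicted values sit closer to the observed ones. Because the quadratic model contains the linear one as a special case, it can never fit worse on this measure, which is why a formal significance test is still needed.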

But we can be even more sure of this conclusion if we test to see whether

adding a second-order polynomial term to our regression model gives

us a significantly better fitting model.

I do this by simply adding another variable that is the squared value of my

explanatory x variable, x squared, to my regression model.

First, let's test my regression model for

just the linear association between urbanization rate and female employment

rate using the ols function from the statsmodels formula API library.

Note that we have centered our urban rate quantitative explanatory variable,

urbanrate_c.

Centering is especially important when testing a polynomial regression model

because it makes it considerably easier to interpret the regression coefficients.
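Centering just means subtracting the mean from every value. A minimal sketch, assuming a pandas data frame called data with a column called urbanrate (both names are assumptions, and the values are made up):

```python
import numpy as np
import pandas as pd

# Made-up stand-in values (NOT the lesson's data).
rng = np.random.default_rng(0)
data = pd.DataFrame({"urbanrate": rng.uniform(10, 100, 150)})

# Center the explanatory variable by subtracting its mean.
data["urbanrate_c"] = data["urbanrate"] - data["urbanrate"].mean()

# The centered variable should now have a mean of (essentially) zero.
print(data["urbanrate_c"].mean())
```

After centering, a value of zero corresponds to the average urban rate, so the intercept can be read as the expected response at the average urban rate.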

If we look at the results, we can see from the significant p-value and

negative parameter estimate that female employment rate

is negatively associated with urbanization rate.

So the linear association, the blue line in the scatterplot,

is statistically significant.

But the R-squared is 9%, indicating that the linear association with urban

rate is capturing only about 9% of the variability in female employment rate.
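A sketch of what fitting the linear model might look like with the ols function from statsmodels.formula.api. The column names and the made-up data are assumptions, so the p-value and R-squared it prints will not match the 9% reported for the lesson's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in data (NOT the lesson's data): a curved relationship plus noise.
rng = np.random.default_rng(0)
urbanrate = rng.uniform(10, 100, 150)
femaleemployrate = 80 - 1.1 * urbanrate + 0.008 * urbanrate**2 + rng.normal(0, 10, 150)
data = pd.DataFrame({"urbanrate": urbanrate, "femaleemployrate": femaleemployrate})

# Center the explanatory variable.
data["urbanrate_c"] = data["urbanrate"] - data["urbanrate"].mean()

# Linear model: response ~ explanatory variable.
reg1 = smf.ols("femaleemployrate ~ urbanrate_c", data=data).fit()
print(reg1.summary())

# The pieces discussed above can also be pulled out individually.
print("Slope:    ", reg1.params["urbanrate_c"])
print("p-value:  ", reg1.pvalues["urbanrate_c"])
print("R-squared:", reg1.rsquared)
```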

But what happens if we allow that straight line to curve by adding a second-order

polynomial term to that regression equation?

The Python code to do this is here.
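A minimal sketch of one way to write such a model with the statsmodels formula API, where I(...) tells the formula to square the centered variable. The column names and the made-up data are assumptions, so this is an illustration of the idea, not the lesson's exact code or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in data (NOT the lesson's data): a curved relationship plus noise.
rng = np.random.default_rng(0)
urbanrate = rng.uniform(10, 100, 150)
femaleemployrate = 80 - 1.1 * urbanrate + 0.008 * urbanrate**2 + rng.normal(0, 10, 150)
data = pd.DataFrame({"urbanrate": urbanrate, "femaleemployrate": femaleemployrate})
data["urbanrate_c"] = data["urbanrate"] - data["urbanrate"].mean()

# Linear model, kept for comparison.
reg1 = smf.ols("femaleemployrate ~ urbanrate_c", data=data).fit()

# Quadratic model: the linear term plus the squared (second-order) term.
reg2 = smf.ols("femaleemployrate ~ urbanrate_c + I(urbanrate_c**2)", data=data).fit()
print(reg2.summary())

# Compare how much variability each model captures.
print("Linear R-squared:   ", reg1.rsquared)
print("Quadratic R-squared:", reg2.rsquared)
```

The p-value on the squared term in the summary is what tells us whether adding the curve gives a significantly better fit.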