But typically, if you have an outlier like this, you don't want to look at the outlier; you just want to look at the core of the data.

So it's typical to set the y-axis limits to be roughly where the data are and just ignore the outlier.

So you can see that the time series that gets drawn has all the data connected, and you can see roughly where it shoots off to a hundred and then comes back down to roughly where it's supposed to be.

So you know that the outlier is out there somewhere, but you don't see it in the plot.
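
A rough sketch of the base-graphics plot described here. The data frame name `testdat`, the random seed, and placing the outlier at row 50 are illustrative assumptions; only the outlier value of 100 and the line-plot setup come from the description above.

```r
# Simulated stand-in for the test data: 100 standard-normal draws.
set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))

# Plant a single outlier of 100 (row 50 is an arbitrary, assumed position).
testdat$y[50] <- 100

# Line plot with the y-axis limited to roughly where most of the data are.
# Base graphics still draws the segments into and out of the outlier;
# they are simply clipped at the edge of the plotting region.
plot(testdat$x, testdat$y, type = "l", ylim = c(-3, 3))
```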

Now, if I do the equivalent plot in ggplot, I can create my ggplot with the test data and map the aesthetics to x and y.

And then I add the geom_line function to make a line plot as opposed to a scatter plot.

You can see that this just plots all the data, including the outlier.

And it's maybe not exactly the kind of plot you want to make, because the outlier is maybe not that interesting.
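
The ggplot version might look something like the sketch below. It assumes the ggplot2 package is installed and rebuilds the same simulated `testdat` (an illustrative name) so the block runs on its own.

```r
library(ggplot2)

# Same simulated test data as before, with one assumed outlier at row 50.
set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat$y[50] <- 100

# Map x and y in the aesthetics, then add a line layer.
g <- ggplot(testdat, aes(x = x, y = y))
p_default <- g + geom_line()

# With default scales, the y-axis stretches to include the outlier.
print(p_default)
```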

So if you want to do this, you have to be careful about how you do it.

First, on the left-hand side, you might think, well, I'll just change the y limits to be in the range of most of the data, between minus 3 and 3.

The issue here is that ggplot will subset the data to include only the values that are between minus 3 and 3.

And so, of course, the outlier is not included in this dataset, and you won't see that data point in this plot.

You can see this clearly: where the outlier is missing, the two lines are not connected, but then everything else is connected afterwards.
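
A sketch of this first approach, again on an assumed simulated `testdat`. With default settings, a scale limit set through `ylim()` turns values outside the range into missing values before the line is drawn, which is what produces the gap.

```r
library(ggplot2)

# Simulated test data with one assumed outlier at row 50.
set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat$y[50] <- 100

# Setting the limits on the scale censors out-of-range values.
p_ylim <- ggplot(testdat, aes(x = x, y = y)) +
  geom_line() +
  ylim(-3, 3)

# Inspect the data the line layer will actually draw:
# the outlier's y value has been replaced by NA.
built <- layer_data(p_ylim)
built[built$x == 50, c("x", "y")]

print(p_ylim)
```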

So if you want to recreate the kind of phenomenon that you saw with the base plot, you have to add this special function called coord_cartesian, which sets the y-axis limits to be minus 3 and 3.

Now you can see in the plot here that the outlier is in fact included in the dataset.

The dataset hasn't been subsetted to only include the points that are in the y-axis range.
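
A sketch of the `coord_cartesian` approach on the same assumed simulated data. Here the limits are applied to the coordinate system, so the plot zooms in without changing the data behind the line.

```r
library(ggplot2)

# Simulated test data with one assumed outlier at row 50.
set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat$y[50] <- 100

# Limits on the coordinate system zoom the view; no values are dropped.
p_zoom <- ggplot(testdat, aes(x = x, y = y)) +
  geom_line() +
  coord_cartesian(ylim = c(-3, 3))

# The outlier is still present in the data used to draw the line,
# so the line runs off the top of the panel and comes back down.
built <- layer_data(p_zoom)
built[built$x == 50, c("x", "y")]

print(p_zoom)
```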

So, I just want to go over a slightly more complex example of adding pieces to a plot, just so you can get a sense of how the different layers are added on.

And then hopefully that gets you going from there.

So here I've made the scientific question just a little bit more complex.

I want to know how the relationship between PM 2.5 and nocturnal symptoms varies by both BMI and nitrogen dioxide, or NO2.

So as NO2 or BMI values change, what does the relationship between PM 2.5 and nocturnal symptoms look like?

So one tricky thing about this is that, unlike our previous BMI variable, which is categorized into normal and overweight, the NO2 variable is continuous.

It's really the log of NO2, and it's a continuous variable.

So we can't really condition on a continuous variable when we're making plots, because then there would be an infinite number of plots.

And so we need to categorize this variable into a reasonable series of ranges.

And so what we're going to do is use the cut function for this purpose, to literally cut the data into a series of ranges.
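
A minimal sketch of cutting a continuous variable into ranges with `cut`. The vector `logno2` is simulated stand-in data, and splitting at tertiles (three equal-sized groups) is an assumption made for illustration; any reasonable set of cutpoints would work the same way.

```r
# Simulated stand-in for a continuous log-NO2 variable (hypothetical values).
set.seed(1)
logno2 <- rnorm(200, mean = 3, sd = 0.5)

# Assumed choice: cutpoints at the 0%, 33%, 67%, and 100% quantiles.
cutpoints <- quantile(logno2, probs = seq(0, 1, length.out = 4), na.rm = TRUE)

# cut() turns the numeric vector into a factor of ranges.
# include.lowest = TRUE keeps the minimum value in the first range
# instead of turning it into NA.
no2_group <- cut(logno2, breaks = cutpoints, include.lowest = TRUE)

# Inspect the resulting ranges and how many observations fall in each.
levels(no2_group)
table(no2_group)
```

The resulting factor can then be used as a conditioning variable in the same way as the categorical BMI variable.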