So, what we're going to do, instead, is tell
merge which of the variables it should merge on.
So, it merges the reviews and the solutions. For the x data frame, which in
this case is the reviews data frame, we're going to merge on the solution id.
And for the y data frame, the solutions data frame, we're just going to merge based on the id.
Here I'm going to tell it all equals TRUE,
which means that if there's a value that appears
in one but not in the other, it should include
another row, but with NA values for the
variables that don't appear in the other data frame.
And so, we can look at the top of this new merged dataset, and
you can see that the solution id has now taken the place of the id.
It's ordered by that variable, and then this id variable
is the id variable that's left over from the reviews dataset.
So, this is the reviews' own id that appears here in this dataset.
So you can see that this has merged
the two datasets together, based on their common id,
so that you can then analyze it as one dataset.
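
As a rough sketch of the merge described above, here is how the call might look on two small made-up data frames. The column names and values are assumptions chosen for illustration, not the actual lecture data; only the `merge` arguments (`by.x`, `by.y`, `all`) follow what is described.

```r
# Hypothetical toy data; the real reviews and solutions data frames differ.
reviews <- data.frame(
  id          = c(1, 2, 3),
  solution_id = c(3, 1, 5),
  start       = c(100, 110, 120),
  stop        = c(105, 115, 125),
  time_left   = c(50, 40, 30)
)
solutions <- data.frame(
  id        = c(1, 2, 3),
  start     = c(90, 95, 100),
  stop      = c(91, 96, 101),
  time_left = c(80, 70, 60)
)

# x = reviews is matched on solution_id, y = solutions is matched on id.
# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
merged <- merge(reviews, solutions,
                by.x = "solution_id", by.y = "id", all = TRUE)

head(merged)    # rows come back sorted by solution_id
names(merged)   # key column first, then the remaining reviews and solutions columns
```

In this toy case solution 2 has no review and review 3 points at a solution that does not exist, so both show up as rows padded with NA. The non-key columns that share a name get the default `.x` and `.y` suffixes, while the leftover `id` column is the one from reviews.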
The default, again, is to merge based on all common column names.
So, if you do an intersection of the names of the solutions data
frame and the names of the reviews data frame, you get these four variables.
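
The intersection mentioned here can be checked directly. This is a minimal sketch on made-up data frames; the column names are assumptions for illustration, picked so that four names are shared.

```r
# Hypothetical one-row data frames; only the column names matter here.
reviews <- data.frame(id = 1, solution_id = 3, start = 100, stop = 105, time_left = 50)
solutions <- data.frame(id = 1, start = 90, stop = 91, time_left = 80)

# Column names present in both data frames, in the order they appear in solutions.
common <- intersect(names(solutions), names(reviews))
print(common)
```

These shared names are what `merge` uses as its default `by` argument when none is given.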
And so, if you try to merge without telling it what to merge
on, it will try to merge on all four of those variables.
So, what ends up happening is, the id
variable will match up sometimes between the two datasets.
But the start and the stop times won't necessarily match up.
So, what it ends up doing is it just
creates a larger data frame that has multiple rows for the same id.
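
To see the effect of the default described above, here is a small sketch on made-up data, assuming `all = TRUE` is still passed. The data frames and their values are hypothetical; the ids overlap but the start and stop times never do.

```r
# Hypothetical toy data: ids overlap, but start/stop/time_left never agree.
reviews <- data.frame(
  id          = c(1, 2, 3),
  solution_id = c(3, 1, 5),
  start       = c(100, 110, 120),
  stop        = c(105, 115, 125),
  time_left   = c(50, 40, 30)
)
solutions <- data.frame(
  id        = c(1, 2, 3),
  start     = c(90, 95, 100),
  stop      = c(91, 96, 101),
  time_left = c(80, 70, 60)
)

# No by argument: merge uses every shared column name as the key.
merged_default <- merge(reviews, solutions, all = TRUE)

nrow(merged_default)   # one row per input row, since nothing matches on all keys
merged_default         # each id appears twice, with NA in solution_id for solutions rows

# Without all = TRUE the unmatched rows are dropped instead of padded.
merged_inner <- merge(reviews, solutions)
nrow(merged_inner)
```

Because a row only matches when every shared column agrees, none of the rows pair up here, and with `all = TRUE` each one is kept separately.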