To give you an example that's a little more complicated,
I'm actually going to use the website of the
Baltimore Ravens, which is an American football
team that is based here in Baltimore.
So, this is the homepage of that team on ESPN,
which is a sports channel in the U.S. And so
what we're going to do is actually look at the source code.
So, if you right-click on the page and choose View
Source, what you'll see is a source document that looks like this.
It's actually quite complicated.
That's the source code that actually gets
processed by your browser to show you the
website that you see when you navigate to that page.
So we're going to actually drill into this website
source code and see if we can extract some information.
So again I'm going to pass the file URL.
This is the URL of that website, if you go back to the previous page.
So now, since I'm parsing an HTML file instead of an XML
file, I'm going to use htmlTreeParse instead of xmlTreeParse.
The two formats are different enough that you want to
use htmlTreeParse when you're parsing an HTML file.
And then I'm going to pass the argument useInternal = TRUE, so
that I can get all the different nodes inside of that file.
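
As a rough sketch of that parsing step, the R snippet below parses a small HTML string rather than the live ESPN page. The markup, the team name, and the score are all made up for illustration, and useInternalNodes is the full name of the argument abbreviated above as useInternal.

```r
library(XML)

# Hypothetical stand-in for the downloaded page; the real page is far larger.
html <- "<html><body><ul>
  <li class='team-name'>Example Team</li>
  <li class='score'>W 10-7</li>
</ul></body></html>"

# asText = TRUE says the input is HTML text rather than a file name or URL.
# useInternalNodes = TRUE keeps the tree as internal nodes, which the
# XPath functions need in order to search the document.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# The root element of a parsed HTML document is the html tag.
print(xmlName(xmlRoot(doc)))
```

With a real page you would pass the URL or a downloaded file in place of the string and drop asText.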
So now what I'm going to do is, I'm again going
to use xpathSApply to programmatically extract some components of this document.
So I'm going to start with the whole document again.
And now what I'm going to do is, I'm going to
again use xmlValue to extract the value inside of certain elements.
But I want to find very specific kinds of elements.
So here's what I'm going to do.
I'm going to look for elements
that are list items, li, and that have a particular class.
So they have class equal to score.
And so, what this is going to do is it's
going to go through the entire document, and any time it
sees a tag that is a list item,
it's going to check and see if its class is score.
And if its class is score, it'll return the XML value.
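
Here is a minimal sketch of that query, again run against a made-up HTML string instead of the live page. The list items and their contents are hypothetical; only the XPath expression follows the description above.

```r
library(XML)

# Hypothetical markup: two list items with class 'score' and two without.
html <- "<html><body><ul>
  <li class='team-name'>Example Team A</li>
  <li class='score'>W 10-7</li>
  <li class='team-name'>Example Team B</li>
  <li class='score'>L 3-14</li>
</ul></body></html>"

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# //li            matches every list item anywhere in the document
# [@class='score'] keeps only the ones whose class attribute is exactly 'score'
# xmlValue        returns the text inside each matching element
scores <- xpathSApply(doc, "//li[@class='score']", xmlValue)
print(scores)
```

Note that the comparison is an exact string match on the attribute, so an element written as class='score final' would not be picked up by this expression.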
So it turns out if you go back to this website and you look very, very
carefully, you can see, for example, that there are these list items where
the class is equal to team-name, and so the
next element that I'm going to be extracting
from this page is actually the team name.
So it's the same thing.
I look for a list item with class equal
to team-name and it will extract those team names.
So the way this works is you go to the website and
you find the tag and any attributes that you want to extract,
and then you write them in this XPath language to
extract only the data for those specific elements with xpathSApply.
So, if I do this, I end up with the scores for each of the games and the teams.
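
Putting the two queries together might look something like the sketch below. The HTML is invented for illustration, the class value team-name is an assumption about the page's markup, and pairing the two vectors by position assumes every game lists exactly one team and one score in the same order, which a real page may not guarantee.

```r
library(XML)

# Hypothetical markup standing in for the team's schedule page.
html <- "<html><body><ul>
  <li class='team-name'>Example Team A</li>
  <li class='score'>W 10-7</li>
  <li class='team-name'>Example Team B</li>
  <li class='score'>L 3-14</li>
</ul></body></html>"

doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Same tag, different class value: one query per kind of element.
scores <- xpathSApply(doc, "//li[@class='score']", xmlValue)
teams  <- xpathSApply(doc, "//li[@class='team-name']", xmlValue)

# Pair them up by position (assumes equal counts and matching order).
games <- data.frame(team = teams, score = scores, stringsAsFactors = FALSE)
print(games)
```

On a live site it is worth checking that the two vectors have the same length before combining them, since the markup can change at any time.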
So, I've scraped from that website information directly using