Hello everybody. Welcome to Python for Everybody. We are doing some code walk-throughs. If you want to follow along with the code, you can download the source code from Python for Everybody website. So, the code we're playing with today is twfriends.py. This is a step beyond the simple twspider. It is a restartable spider. But we're going to date the model things a little bit differently. We're going to have two tables and we're going to have a many-to-many relationship accepted. It's a many-to-many relationship between the same table, which is okay. Twitter friends are a directional relationship. So, we start out here in twfriends.py. Remember, that the file hidden.py, I'll show it to you, but I'm not going to open it because I've got my keys and secrets in it. So, this hidden.py file, you've got to edit that, and you got to go to apps.twitter.com and get your keys and put them in there. Otherwise, these things won't work. But if you have Twitter and you set your API keys up, and you put them in the hidden.py, then all these things will work. It's fun actually, and impressive. Not hard to do actually. So, the Twitter URL, that's my library, that reads hidden.py and augments the URL does all the OAuth stuff, JSON and SSL because Python doesn't accept any certificates even if they're good certificates. So, we crush that. Here's our friends list that we're going to hit. We're going to make a database friends.sqlite. Now, here we're doing a create table if not exists. So, what this really is saying is, I want this to be a restartable process and I don't want to lose the data. We're starting out, we do not have any SQLite files. So, this is going to create the database and create these tables. But the second time we run it, we're not going to recreate the tables. We're not going to be able to restart this, because we're going to run out of rate limit before we finish this. So, we just have to wait, however long the time it takes to reset, and we'll watch the rate limit go down. So, we're going to have a people table, and we're going to have a primary key and the name. The name is going to be unique, and whether or not we've retrieved it. That's from a previous one. Then there's the who follows who, the from ID to, to ID. So, this is a direction and we're going to put a uniqueness constraint in, just like we do in many-to-many, that basically says, the combination of from ID and to ID has got to be unique. We don't allow ourselves to put duplicates of the combination. So, from ID can be one in many records, and to ID can be one in many records, but one, one is only allowed once. This is the crud we have to do to convince Python to accept the Twitter certificate. So, this is similar to some of the other stuff that we've done. We're going to enter a Twitter account or quit. If we enter by itself, then we will actually go and retrieve a record that was not yet retrieved. Now, we're actually pulling out two values, ID and name. So, we will grab, fetch one, is going to give us a two tuple basically. We're going to store that in ID, in account. Of course, that's like, this is coming back with a two tuple, first of which is the ID from the database. Limit one, means we're only going get one of these or zero of these. If there's zero these, that means there are no unretrieved Twitter accounts. Retrieved equals zero, well, you'll see in a second, that all the new accounts we put in, are the ones for which we haven't retrieved. Again, given that our rate limit, we want to know which ones we've retrieved. Okay? So, what we're going do next is, we're going to check to see if the person that we just checked, which means the length of the account is greater than what we just entered. We're going to check to see if they're already there. Okay? We're going to select ID from people where name equals. So, that's the one we just entered. We're going to fetch one and grab the first thing, because we only got one thing in the select statement here. If this person that we just asked to see is not in the table, that means this is going to fail, we're going to do an insert or ignore. This our ignore is redundant because we just checked to see if it was there. But we'll put that in just to be safe. We're going to put the name in as the new account that we're looking at, indicating that retrieved is zero. So, that we will know that we haven't retrieved it yet. You'll see that we'll update that in a second. We commit it, so that later selects we'll see this. So, you got to do the commit. This later select wouldn't see the one we just inserted. We're going to ask how many rows were affected, and if it's not equal to one, then, we're going to complain about- we inserted it, and we're going to do this thing. We are going to ask, "Hey remember there was an ID up there?" Right here, ID integer, primary key, and we did not insert this here, but we want to know what that ID is. Every time I was showing you that in lectures, I was saying it's really easy in Python to do this. That's what we're saying. This cursor did the insert, but one of the things happens is after the insert, we're going to grab the last row ID, which is the primary key that was assigned by SQL. Okay? So, that means, that one way or another, coming through this code here in line 45, one way or another, we're either going to know the ID of the user that was there before, or we just inserted one and so we're going to know the primary key of the current user. You'll see why we need that. So, ID, is the primary key of the current user that we entered right here. Okay? Now, what we're going to do is do the Twitter URL augment with OAuth, and all the keys, and the secrets in hidden.py. Instead, we're going go through let's count 1,000. Let's go count, what the heck. Let's go 200. Up to 200 friends. No, let's do 100. We'll keep it that way. Then, we're going to retrieve it and we're retrieving the account. We're not going to print the nasty URL out, we could. Then we're going open the URL with a connection, and then we're going to read that, and we're going get the UTF-8 data from this, and then we're going to decode that, and we're going to have the unicode data. So, the data in string is a internal Python string, with all that data representing all the wonderful characters. Of course, we're going to ask URL open to give us back the headers as a dictionary, using this call. We can see how many we have left for the remaining. Right? What's the remaining rate limit that we have? So, then we're going to parse the data with JSON load S. Wait, I need to continue in here. Continue. Okay. Save. If we're going to parse this data, we'll print it out, right? So, that means that this died, which means it's not syntactically correct, json basically. Who knows if we're ever going to see that, but at least, when it blows up it'll print this data out. We'll have to catch it, and then it'll continue. Actually, I'll make this a break, because if that's blown up that bad, we should quit. Now, I don't yet know what happens when this rate limit says you can't have it, and so but I do know that I expect when it's successful, that there will be a key of users in this outer dictionary that we're going to get. If this outer dictionary, that if users is not in the parsed dictionary, then I'm going to dump out this data so that at least I can debug what happens when I've got some broken json. So, the difference between this code, this code is going to fail when the json is syntactically bad, meaning a curly brace isn't right or whatever. This code will trigger when I get good json, but I don't have a user's key in it, okay? So, then, once we've retrieved it, we will be pretty happy with it. We're going to update for our account that we're retrieving. We're going to set this is one of our retrieved accounts, okay? Then, what we're going to do is write a loop that goes through all the friends of this particular user that we're asking, and gets their screen name, prints it out, and then we're going to check to see if this one is already in our people database, because this is a spider, we're grabbing accounts. So, we'll do a friend ID, and do a fetch one, grab the subzero thing. If that works, if this person not in there, this fetch one is going to blow up which means we're going to drop down to the except code. But if it does work, we have friend ID, that if they're there, and they're already in our database. They just weren't retrieved. Okay. So now, if the friend ID wasn't there, we're going to do an insert into, setting retrieved to zero, and then we're going to commit. Now, remember, row count is how many rows were affected by this last transaction code at row count, and we're going to die if that insert doesn't work. This is unlikely, unless somehow we've ran out a disc drive or something, and we're going to grab the friend ID as the key, the last row that was inserted. We're only going insert one row, so it's basically the primary key of the row that we just inserted. So, if you look at this code right here, it comes out the bottom, one way or another with friend ID successful. Friend ID is either they are already in our database or they're not, and if we insert them, then we have it. So now, this count new and count old is just so I can print out a nice print out. Now, we are going to insert into the friend's table which is called the follows table in this case, from ID and to ID. Those are the two outward pointing foreign keys, and we have the ID of the account that we're retrieving the friends of, and then this particular friend. So, we're inserting the connection from this person to that person then we commit it. We want to commit these again so that later selects when the loop goes back up, later selects get all of that data that's going on. So, we do want to commit from time to time, and then we close the cursor at the very end. So, let's run this and see what happens. So, python twfriends.py. Of course, I am a refugee from python two, so I always forget to type python three. So, we're going to start. If we take a look right now, I'm going to start another tab over here, and ls-l *sqlite. Now, that sqlite file is there, and it's actually made the tables. If you go up here, it ran all this stuff, create the tables, yada, yada, and we're sitting right here at this line. As a matter of fact, I think without causing too much trouble, I can open that database, and get into this database right here. There is no data in the Follows table, and there is no data in the People table. It's completely empty. So, we're waiting for the first one, and I'll go with mine, Doctor Chuck. So, it's retrieving the 100 friends, and they all were brand new. They are all inserted. So now, if I hit Refresh, we will see that Doctor Chuck is retrieved. Who follows? So, these are all the people I follow, for one follows two. So, if we look at here, we see that Doctor Chuck follows Stephanie Teasley. Because we grabbed the followers of Dr. Chuck, we're going to have a record in all of the follows for all the ones that I did, right? So, these are all the people I followed, and we put them in. So, we can go back and we can, let's see grab somebody. Let's go grab Stephanie Teasley, and let's pull out her friends. So, we grabbed 100 of her folks. I got 14 left. That's my x-rate limit. I did Stephanie Teasley, so let's go back here. So, you'll notice there's 101, that row is going to be 182. That's interesting. So, we've retrieved Doctor Chuck and Stephanie Teasley, and let's go take a look in the Friends table, the Follows table. So, we have all the people I follow, now all the people Stephanie follows. So, there we go. Let's go ahead and do somebody else. Let's see here. I think we both follow Tim McKay. Where is Tim McKay? Let's follow Tim McKay. Let's see who to Tim follows, If we can get like an overlap. We revisited some. Let's see if we can see this in the Follows. Let's see People. So, we've got Dr. Chuck retrieved. Tim McKay's somewhere down here. It might take us a while before we get any really good overlaps. Let's see. Let's do a database called. Let's see. Let's do a database SQL. Select, count. Yeah. Okay. So, let's just run this some more. It's clearly working. Now, one thing I can do here is I can hit "Enter", and it will just pick one randomly. So, it grabbed Liveedutv. Let's see how many I got left. We got 12 left. Now, I can hit "Enter" again. It picks another one. That was the next one. It's picking them in order. Is it picking them in order? Let's go to "People". Yeah, it's picking these. So, we can see that it's going to just do the first unretrieved person, who's Nancy. Let it retrieve Nancy. So, it grabbed Nancy, new. So, we're finding some and this table's getting really big. So, if we look at the "People" table, we now have 455 people and we have 467 following records. So, there we go. Hit "Enter", does another one, and away we go. So, you get the idea. I can type quit to finish. Just to give you a little interesting bit of code to show you how to do selects, I'm going to do this twjoin. Now, you'll notice that we're not talking, let's show you one thing, ls minus l star sqlite. So, this database has it. So, I can restart this process and run it again. The database is still there. So, we just grab, swear_trek. So, we can keep doing this. So, this data, it keeps extending. So, this is a restartable process. I can run it. Then, tell it to grab the next unretrieved one. So, away we go. So, that's part of it. I've got eight left. How many do I have left, really? Let's keep going. How many do I got left? I got five left. Okay. Wait, I guess we'll just run it out. So, I got four left. You know what I should do? I can't change the code. Yes, I can change the code. I can stop the code and I can quit the code. So, what I'm going to do is I'm going to change this code a little bit really quick and I'm going to print the headers of rate limiting at the beginning and at the end. So, now I can run it again. I changed the code. Hopefully, I didn't make a Python error. Tell it to go get another one, Ana Navarro. So, I got three left. We'll see what happens when I run out of rate limit. Run out of rate limit. So, we have one left. Hit "Enter" and control K, OpenSource.org. So, we have zero left, that worked. Now, let's see what happens. I don't know what happens next. We blew up. Too many requests. So, we got a HTTP Error 429. So, that means that going from Marc Cuban, that was in line 48, so the right thing to do would be in line 48. We should really put this in a "try except" block. "try except" block because it gives us an error. Print, fiddlesticks. How do I print the exception message? I always am forgetting. Print "Failed to Retrieve". So, we'll put that in. Now, if I run it. Then, I have to put a "break" here because that's not good. Break. Fail to Retrieve. See, I never know how to print out the error message. Yeah. So, see that's the weird thing about stuff is, that I don't ever remember enough. I don't remember the syntax, what I say here, to print the error message out. So, I'm going to go to Google and I'm going to say, "Print out the exception message in Python." Python 3, hello. So, let's go find it here in the documentation. Except. Is this it? Is this what I say? I just want to print out the message. That's it. Except. Let's try this. So, this is part of Python programming is, for me at least, because I'm just not a genius expert at this stuff. This is one thing I like about Python is you can guess stuff and sometimes you guess right. So, there we go. We got the error, we got the nice little error message, and we see error 429: Too Many Requests. So, that cleans that up nicely. So, we have run out of requests. On that, it is a good time to say thanks for listening. I hope that you found this valuable.