Welcome back. In this assignment,
we will be implementing some of the data fragmentation.
But we're going to be mixing the fragmentation and partitioning in this assignment.
So, you'll be given a user movie ratings database.
So in the user movie ratings database,
you have a set of users that have rated a set of movies and rating
here means it's like you give it a zero to five star rating to that movie.
So it's just one table,
and each user has rated a movie maybe once or a couple of times at different time stamps.
And what are you going to do is you're going to take that table,
you're going to load it into the database and
a database we're going to be using database system is
postgres The details of the version and
everything if the system is in the instruction document.
But you're going to load it in postgres database system,
and after you load it you need to implement several functions.
And these functions are like fragmentation or slash partitioning functions.
You need to implement these functions in Python.
So in other words,
you have a python script.
And a python script has to connect to
the postGres database that has the ratings table user,
item, a user movie rating table.
You connect to it, and after you connect to it you have to
implement a function called round robin partition,
another different function called range partitioning,
and then you also have to implement the insert for
either insert round robin or insert range partitioning.
So in other words, these functions with a partition round robin for example,
partition the database in a round robin fashion,
it will the same way we discussed in class,
and the other one which is the range partitioning will kind of
do the fragmentation based on range of values of attributes given an attribute.
So range partitioning will take an attribute as input,
but for the sake of simplicity in this assignment,
we'll ask you to do the range partitioning based on the rating value.
So give you the number of partitions either in range or round robin partition,
and then given these numbered partitions you have to decide how
can you put rows together into multiple partitions.
So the partition that you're going to create are
different again tables that are stored in a database.
And whenever you have a round robin insert for example,
this is inserting a new row in the table and you're rating user movie rating.
Based on the partitioning method that has been used,
you have to decide which partition,
which fragment to add the data to, to the new row to.
There's another function to implement which the delete partitions.
This basically resets the system.
So, whatever kind of partitioning method you have,
delete partition will just basically reset that
and bring it back to where you have just one table with no partitions at all.
It's not partitioned or not fragmented at all.
In this assignment, we're testing one partitioning method at a time.
And if we want to do and partition to the again we will delete the partitions first.
So reset the system and then try a new partitioning method.
Again you're implementing only two partitions.
So we're not going to test them both together at the same time.
So this is something to keep in mind.
You may need sometimes for some of the partitioning methods
you may need to create another table for
it to keep track of which partition you inserted
last or something in order to determine whenever you have a new insertion,
a new row insertion which partition to add to.
That's okay. You're allowed to do that inside of the database system.
The only thing you may not be allowed to do is in fact,
for example creating a new file in the file system.
You'd just have to use the database.
Again the output is Python scripts,
the python script connector database and issue SQL queries to the Posterous database,
that has the user movie rating table, this is just one table.
More details about the grand,
the scheme of this table,
about the details of e.g.
single function and implemented Python,
is an instruction document.
Best of luck. Thank you so much.