Bacon Number
Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.
1) Preparation
The following file contains code and other resources you will need for this exercise: bacon.zip
This exercise requires you to complete the definition of three functions in lab.py
,
and submit answers to the questions on this page.
All of your changes should be made to lab.py
, which you will upload for testing at the
bottom of this page. Importantly, you should not add any (more) imports to the file.
We've also provided you with your own copy of the tests that the server
will run on your code, in test.py
.
The file uses the unittest module.
You can run test.py
as you would any python file. From Terminal/Command Prompt, you can
also run just particular test suites or test cases, e.g. python3 test.py Test00_Tiny
to run the suite called Test00_Tiny
, or python3 test.py Test00_Tiny.test_bacon_number_0
to run just the test_bacon_number_0
test in that suite.
2) Introduction
Have you heard of Six Degrees of Separation? This simple theory states that at most 6 people separate you from any other person in the world.
Hollywood has its own version. Every actor who has acted with Kevin Bacon in a movie is assigned a "Bacon number" of 1, every actor who acted with someone who acted with Kevin Bacon is given a "Bacon number" of 2, and so on. (What Bacon number does Kevin Bacon have?)
Note that if George Clooney acts in a movie with Julia Roberts, who has acted with Kevin Bacon in a different film, George has a Bacon number of 2 through this relationship. If George himself has also acted in a movie with Kevin, however, then his Bacon number is 1 and the connection through Julia is irrelevant. We define the notion of a "Bacon number" to be the smallest number of films separating a given actor (or actress) from Kevin Bacon.
3) The Actor Links Database
We've mined a large database of actors and films from
IMDB via the
www.themoviedb.org API. We present this data set
to you as a list of records (2-element tuples), each of the form (actor_id_1,
actor_id_2)
, which tells us that actor_id_1
acted with actor_id_2
.
Keep in mind that "acts with" is a symmetric relationship. If (a1, a2)
is in the database, it is true both that a1
acted with a2
and that a2
acted
with a1
, even if (a2, a1)
is not explicitly represented in the database.
However, these relationships do not necessarily exhibit the transitive
property. That is, if (a1, a2)
and (a2, a3)
are in the database, it
is not necessarily true that a1
and a3
have acted together (unless (a1,
a3)
or (a3, a1)
is in the database).
We store these data as pickle files
in the resources directory of the zip you downloaded: tiny.pickle
, small.pickle
and large.pickle
.
4) The Names Database
We also include a file, resources/names.pickle
, which
contains a representation of the mapping between actor IDs and names. You can use
the load
method of Python's pickle
module to get the data out of the file
and into Python. For an example of this, check out how we load databases in the
setUp
functions of test.py
.
You need only use the names database for the checker questions on this page. All
of the functions you will write in lab.py
use actor IDs, not names.
Answer the following questions using Python.
resources/names.pickle
?
5) Acting Together
To get used to the structure of the databases, complete the definition of
acted_together
in lab.py
. This function should take three
arguments:
- The database to be used (a list of records of actors who have acted together in a film, of the form described above),
- Two IDs representing actors
This function should return True
if the two given actors ever acted together
and False
otherwise. For example, Kevin Bacon (id 4724
) and
Steve Park (id 4025
) did not act together, meaning
acted_together(..., 4724, 4025)
should return False
.
When you are done implementing this function, check that it passes the associated tests in test.py
.
6) Bacon Number
Next, we want to find all of the actors who have a given Bacon number.
Implement this as a function called actors_with_bacon_number
in
lab.py
. This function should take two arguments:
- The database to be used (the same structure as before)
- The desired Bacon number
This function should return a Python set containing the ID numbers of all the
actors with that Bacon number. Note that we'll define the Bacon number to be
the smallest number of films separating a given actor from Kevin Bacon,
whose actor ID is 4724
.
Before you start writing this function, answer these questions by computing the responses manually from looking at the data. (We recommend drawing a picture of the tiny database on paper.)
tiny.pickle
? Enter your answer below as sorted Python list of integers (from smallest to largest):
tiny.pickle
? Enter your answer below as sorted Python list of integers (from smallest to largest):
tiny.pickle
? Enter your answer below as sorted Python list of integers (from smallest to largest):
tiny.pickle
? Enter your answer below as sorted Python list of integers (from smallest to largest):
Now you're ready to write your Bacon number code! Here are some things to think
about when writing your implementation. Consider the set of actors with a
Bacon number of 1. Here is a visual representation of the data from the
small.pickle
database.
![](https://smatz.mit.edu/_static/6s090/week3/alternate/bacon/graph1.png)
Given the set of actors with a Bacon number of 1, think of how you can find the set of actors with a Bacon number of 2:
![](https://smatz.mit.edu/_static/6s090/week3/alternate/bacon/graph2.png)
Once you get a sense for how to get the Bacon number 2 actors from the Bacon
number 1 actors, try to generalize to getting the Bacon number i+1
actors
from the Bacon number i
actors.
Note that the test cases in test.py
run against small and large databases of
actors and films, and that your implementation needs to be somewhat efficient
to handle the large database in a timely manner (i.e. below the time limit on the
server, which you can see when you upload your file). Your code should also handle the
case of arbitrary Bacon numbers (not just n \leq 6) since some databases may
be structured to assign some actors quite large Bacon numbers.
When you are done implementing this function, check that it passes the
associated tests in test.py
.
7) Bacon Paths
Finally, complete the definition of bacon_path
in lab.py
. The function should
take two arguments:
- The database to be used (the same structure as before),
- An ID representing an actor
The function should produce a list of actor IDs (any such shortest list if
there are several) detailing a "Bacon path" from Kevin Bacon to the actor
denoted by actor_id
. If no path exists, return None
.
Note that the paths are not necessarily unique, and any shortest list
that connects Bacon to the actor denoted by actor_id
is valid. The tests
do not hard-code the correct paths; they only verify the length of the path
you find (as well as that it is indeed a path that exists in the database).
For example, if we run this method on large.pickle
with Julia Roberts's ID
(actor_id=1204
), one valid path is [4724, 3087, 1204]
, showing that Kevin
Bacon (4724
) has acted with Robert Duvall (3087
), who in turn acted with
Julia Roberts (1204
).
tiny.pickle
. From those data, you should be able to manually compute a couple of shortest paths. What's the shortest path that connects actor 4724
to actor 1640
? Enter your answer below as a Python list of ID numbers:
Hint: This is a challenging problem. You may wish to review online material on "breadth first search" (although your code should be entirely your own).
A Tip for Speed
Anytime you usein
to check for membership in a list (e.g. if you’re checking whether you’ve already seen an actor), consider whether you could replace it with a membership check in a set. Under the hood, to check whether some element is in a list, Python has to iterate through the list, which can get very expensive. On the other hand, Python sets are implemented using something called hashing, which allows membership checks to be instant.
8) Code Submission
lab.py
below. This question will be manually graded for correctness (passing all provided test cases in under 30 seconds) and good code style. Remember to use succinct, descriptive variable names, docstrings and comments to explain confusing lines.