Bacon Number

The questions below are due on Sunday June 30, 2024; 10:00:00 PM.
 
You are not logged in.

Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.

1) Preparation

The following file contains code and other resources you will need for this exercise: bacon.zip

This exercise requires you to complete the definition of three functions in lab.py, and submit answers to the questions on this page. All of your changes should be made to lab.py, which you will upload for testing at the bottom of this page. Importantly, you should not add any (more) imports to the file.

We've also provided you with your own copy of the tests that the server will run on your code, in test.py. The file uses the unittest module. You can run test.py as you would any python file. From Terminal/Command Prompt, you can also run just particular test suites or test cases, e.g. python3 test.py Test00_Tiny to run the suite called Test00_Tiny, or python3 test.py Test00_Tiny.test_bacon_number_0 to run just the test_bacon_number_0 test in that suite.

2) Introduction

Have you heard of Six Degrees of Separation? This simple theory states that at most 6 people separate you from any other person in the world.

Hollywood has its own version. Every actor who has acted with Kevin Bacon in a movie is assigned a "Bacon number" of 1, every actor who acted with someone who acted with Kevin Bacon is given a "Bacon number" of 2, and so on. (What Bacon number does Kevin Bacon have?)

Note that if George Clooney acts in a movie with Julia Roberts, who has acted with Kevin Bacon in a different film, George has a Bacon number of 2 through this relationship. If George himself has also acted in a movie with Kevin, however, then his Bacon number is 1 and the connection through Julia is irrelevant. We define the notion of a "Bacon number" to be the smallest number of films separating a given actor (or actress) from Kevin Bacon.

3) The Actor Links Database

We've mined a large database of actors and films from IMDB via the www.themoviedb.org API. We present this data set to you as a list of records (2-element tuples), each of the form (actor_id_1, actor_id_2), which tells us that actor_id_1 acted with actor_id_2.

Keep in mind that "acts with" is a symmetric relationship. If (a1, a2) is in the database, it is true both that a1 acted with a2 and that a2 acted with a1, even if (a2, a1) is not explicitly represented in the database.

However, these relationships do not necessarily exhibit the transitive property. That is, if (a1, a2) and (a2, a3) are in the database, it is not necessarily true that a1 and a3 have acted together (unless (a1, a3) or (a3, a1) is in the database).

We store these data as pickle files in the resources directory of the zip you downloaded: tiny.pickle, small.pickle and large.pickle.

4) The Names Database

We also include a file, resources/names.pickle, which contains a representation of the mapping between actor IDs and names. You can use the load method of Python's pickle module to get the data out of the file and into Python. For an example of this, check out how we load databases in the setUp functions of test.py.

You need only use the names database for the checker questions on this page. All of the functions you will write in lab.py use actor IDs, not names.

Answer the following questions using Python.

Which of the following best describes the Python object that results from loading resources/names.pickle?

What is Ziva Rodann's ID number?

Which actor has the ID 936084?

5) Acting Together

To get used to the structure of the databases, complete the definition of acted_together in lab.py. This function should take three arguments:

  • The database to be used (a list of records of actors who have acted together in a film, of the form described above),
  • Two IDs representing actors

This function should return True if the two given actors ever acted together and False otherwise. For example, Kevin Bacon (id 4724) and Steve Park (id 4025) did not act together, meaning acted_together(..., 4724, 4025) should return False.

When you are done implementing this function, check that it passes the associated tests in test.py.

6) Bacon Number

Next, we want to find all of the actors who have a given Bacon number. Implement this as a function called actors_with_bacon_number in lab.py. This function should take two arguments:

  • The database to be used (the same structure as before)
  • The desired Bacon number

This function should return a Python set containing the ID numbers of all the actors with that Bacon number. Note that we'll define the Bacon number to be the smallest number of films separating a given actor from Kevin Bacon, whose actor ID is 4724.

Before you start writing this function, answer these questions by computing the responses manually from looking at the data. (We recommend drawing a picture of the tiny database on paper.)

What are the ID numbers of the actors who have a Bacon number of 0 in tiny.pickle? Enter your answer below as sorted Python list of integers (from smallest to largest):

What are the ID numbers of the actors who have a Bacon number of 1 in tiny.pickle? Enter your answer below as sorted Python list of integers (from smallest to largest):

What are the ID numbers of the actors who have a Bacon number of 2 in tiny.pickle? Enter your answer below as sorted Python list of integers (from smallest to largest):

What are the ID numbers of the actors who have a Bacon number of 3 in tiny.pickle? Enter your answer below as sorted Python list of integers (from smallest to largest):

Now you're ready to write your Bacon number code! Here are some things to think about when writing your implementation. Consider the set of actors with a Bacon number of 1. Here is a visual representation of the data from the small.pickle database.

Given the set of actors with a Bacon number of 1, think of how you can find the set of actors with a Bacon number of 2:

Once you get a sense for how to get the Bacon number 2 actors from the Bacon number 1 actors, try to generalize to getting the Bacon number i+1 actors from the Bacon number i actors.

Note that the test cases in test.py run against small and large databases of actors and films, and that your implementation needs to be somewhat efficient to handle the large database in a timely manner (i.e. below the time limit on the server, which you can see when you upload your file). Your code should also handle the case of arbitrary Bacon numbers (not just n \leq 6) since some databases may be structured to assign some actors quite large Bacon numbers.

When you are done implementing this function, check that it passes the associated tests in test.py.

7) Bacon Paths

Finally, complete the definition of bacon_path in lab.py. The function should take two arguments:

  • The database to be used (the same structure as before),
  • An ID representing an actor

The function should produce a list of actor IDs (any such shortest list if there are several) detailing a "Bacon path" from Kevin Bacon to the actor denoted by actor_id. If no path exists, return None.

Note that the paths are not necessarily unique, and any shortest list that connects Bacon to the actor denoted by actor_id is valid. The tests do not hard-code the correct paths; they only verify the length of the path you find (as well as that it is indeed a path that exists in the database).

For example, if we run this method on large.pickle with Julia Roberts's ID (actor_id=1204), one valid path is [4724, 3087, 1204], showing that Kevin Bacon (4724) has acted with Robert Duvall (3087), who in turn acted with Julia Roberts (1204).

Take a look at the data in tiny.pickle. From those data, you should be able to manually compute a couple of shortest paths. What's the shortest path that connects actor 4724 to actor 1640? Enter your answer below as a Python list of ID numbers:

Hint: This is a challenging problem. You may wish to review online material on "breadth first search" (although your code should be entirely your own).

A Tip for Speed

Anytime you use in to check for membership in a list (e.g. if you’re checking whether you’ve already seen an actor), consider whether you could replace it with a membership check in a set. Under the hood, to check whether some element is in a list, Python has to iterate through the list, which can get very expensive. On the other hand, Python sets are implemented using something called hashing, which allows membership checks to be instant.

8) Code Submission

When you have tested your code sufficiently on your own machine, submit your modified lab.py below. This question will be manually graded for correctness (passing all provided test cases in under 30 seconds) and good code style. Remember to use succinct, descriptive variable names, docstrings and comments to explain confusing lines.
 No file selected