Toll Lanes: Simulation

The questions below are due on Monday July 26, 2021; 11:00:00 AM.
 
You are not logged in.

Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.
Toll Simulator

Caution

This is a challenging problem with a lot of new material. Please do not hesitate to reach out via OH or email for sanity checks or assistance!

As a toll company, we need to decide on how many toll lanes to have. Therefore, we're interested in predicting how many cars there will be waiting at the tolls on average given a certain number of toll lanes - and we'll do this via a Monte Carlo simulation. We'll repeatedly run random trials and then average them to get a more robust estimate. We'll use those estimates to optimize the number of lanes to build.

Based on years of traffic data, our company knows the probabilities of different numbers of cars arriving at this section of road, which they refer to as arrival_probs. It is represented as a dictionary mapping a number of cars to the probability that each minute, that many cars will arrive. For instance, {0: 0.5, 1: 0.3, 3: 0.2} means that for any given minute, there is a 50% chance of 0 cars arriving, a 30% of 1 car coming, and a 20% chance of 3 cars.

Single Lane Probabilities

However, in order to model the tolls - specifically how each minute one car can go through - we want to know the odds of the number of cars arriving at a single toll lane. If there's only 1 lane, then arrival_probs is already what we want. But if there are multiple lanes, then cars will randomly and uniformly choose a lane regardless of how many cars are already in that lane. (This is different from the hypothesis mentioned in the Behavior exercise) For example, we could work out the math and show that if the overall arrival_probs = {0: 0.5, 1: 0.3, 3: 0.2}, then for two lanes, the probability of the number of cars arriving in a given lane is {0: 0.7, 1: 0.2, 2: 0.05, 3: 0.05}. But since math is gross, we will provide a formula and you will code it up so Python does the work for us!

Given that there are n total cars arriving, with t possible toll lanes, then the probability of a given lane receiving k cars is equal to this formula1:

P(n, t, k) = \frac{{n-k+t-2 \choose t-2}}{{n+t-1 \choose t-1}}

where {a \choose b} is the choose or combination operator and can be coded using math.comb(a, b). You'll want to import math at the top of your file. If you have a pythonn version earlier than 3.8, please use the following function:

def comb(n, k):
  return math.factorial(n) / math.factorial(k) / math.factorial(n-k)

To illustrate, if we want to calculate the overall probability of 0 cars arriving in any given lane, when there are two lanes and given the arrival_probs of {0: 0.5, 1: 0.3, 3: 0.2}, then we would need to compute P(n, 2, 0) for all possible n (remember this is the number of total cars) multiplied by the probability that n total cars arrive. So our computation would be:

0.5 * P(0, 2, 0) + 0.3 * P(1, 2, 0) + 0.2 * P(3, 2, 0) = 0.7

Your first task will be to write a function get_lane_probs(arrival_probs, num_lanes) which takes in arrival_probs - a dictionary representing a probability distribution as described above; num_lanes - an integer representing the number of lanes present; and return a dictionary mapping a number of cars to the probability that number of cars arrives in a given lane.

As an example, get_lane_probs({0: 0.5, 1: 0.3, 3: 0.2}, 2) should return {0: 0.7, 1: 0.2, 2: 0.05, 3: 0.05}. Please write the function and then test/debug on your own with the example above as a test case. Then when ready, upload it here for additional testing and test cases. Other test cases from the checker are visible in the Detailed Results and can be used for debugging if needed.

Common errors:

  • ValueError: n must be a non-negative integer - this is likely because math.comb is being called with a non-negative integer. For example, if we do P(1, 2, 2), then notice it would be \frac{{-1 \choose 0}}{{2 \choose 1}}. But does it make sense for k (the number of cars in a given lane) to be greater than n (the total number of cars arriving)?

  No file selected

Simulating Arrivals

Now we will start simulating the process of arrivals by running random trials and averaging the results to get a reasonable estimate (also referred to as the Monte Carlo method). Since your 15.066 class will be doing something similar using numpy and numpy runs faster than generic Python operations, we will practice using numpy here as well!

Overall, we will have a numpy array representing the number of cars initialized to zero. Then for each minute, we will deduct one car from any lane with a positive number of cars. Next, we randomly sample to get the number of cars that arrive for each lane (consider this relevant reading) and add them to the lanes. Finally this function should return the final number of cars per lane as a python list (remember you can convert using the tolist function from the readings).

Since we are using random functions, the code checker is very lenient and accepts any reasonable response. To that end, it's pretty straightforward to get 100% on this exercise, but for your own edification please work through the problem responsibly. Once you feel you have a reasonable solution and receive 100%, feel free to Check Answer and see the staff solution. In the staff solution you will see that we seed the random number generator so that it produces the same results during testing. In practice, this line should be removed.

Your task is to make a function simulate_lanes(arrival_probs, num_lanes, num_minutes) which takes in arrival_probs and num_lanes (defined in the previous section) and num_minutes, an integer representing how long to run the simulation.

For example, if we call simulate_lanes({0: 0.5, 1: 0.3, 3: 0.2}, 2, 3), one possible outcome is as follows:

  • Minute 0: the array of cars is just [0, 0] (0 cars in each lane)
  • Minute 1: we randomly sample the number of arrivals for each lane, resulting in [0, 1], and add to the current for a result of [0, 1]
  • Minute 2: we deduct a car from any positive lane. Then randomly generate the arrivals as [0, 2] to add on for a result of [0, 2]
  • Minute 3: deduct cars. Randomly generate [0, 0] and add on for a final returned value of [0, 1].

Your code might result in different numbers of arrivals due to different random number generation, but a reasonable implementation should never produce impossible results, such as a final returned value of [0, 100] since there's no way to have a 100 cars within three minutes.

Try to attempt this problem using numpy as a means of practice, but since this is a lot of new syntax, please feel free to look at the more in-depth guide below where we break down the steps of the function and offer tips:

  1. Initializing a numpy array can be done using syntax from this reading such as numpy.array([0, 0, 0]). A slightly more general form is to use the function numpy.zeros(3) which also makes a 3 long array of zeros.

  2. As for repeating the process for each minute in num_minutes, we expect a for loop to be handy.

  3. For deducting a car from lanes with positive numbers, try running this code snippet to see what happens: a = numpy.array([0, 5, 2]) then print(a > 3) and finally print(a - (a > 3)).

  4. In order to randomly choose a number of cars for each lane, we could use something like numpy.random.choice([0, 1, 2, 3], p=[0.7, 0.2, 0.05, 0.05], size=3) from the reading. This will make a 3 long array (we want one value per lane), where each value is randomly chosen based on the arrival probabilities{0: 0.7, 1: 0.2, 2: 0.05, 3: 0.05}. You'll probably want to call the helper function get_lane_probs and then format the result so you can pass it to numpy.random.choice.

  5. Note that once you randomly generate the number of arrivals by lane, you can add the arrays together using the + sign.

  6. After repeating this for each minute, we can convert the numpy array representing the current number of cars to a list using the tolist function from the readings.

Common Errors

  • make sure to return a Python list not a numpy array.
  • because of the random generation, if instead of using numpy.random.choice([0, 1, 2, 3], p=[0.7, 0.2, 0.05, 0.05], size=3) to sample the number arriving for three lanes, you make a call to numpy.random.choose([0, 1, 2, 3], p=[0.7, 0.2, 0.05, 0.05]) three times, once for each lane, then you might get a different result. Please try using the first option, and if problems persist, please reach out via OH or email!

  No file selected

Simulations for Stats

Now that we can simulate the toll lane process, we can use this to answer some analytical questions. For example:

How many distinct cars arrive in a 20 minute simulation given arrival_probs of {0: 0.5, 1: 0.3, 3: 0.2} and 2 lanes?

To answer these types of questions, you might want to make some changes to simulate_lanes - but we'll leave the details up to you as a design choice. Below are some ideas ordered by how straightforward we feel they are. In the written exercise, you'll reflect on the design choices, but don't stress because the reflection can also be what you could improve on given more time.

  1. Copy simulate_lanes and make a specialized version for each question.
  2. Modify simulate_lanes to return lane_lines, a list of lists representing the number of cars in the lanes at each minute as we saw in the earlier exercise. Remember we can convert numpy array into lists using the tolist function from the readings.
  3. Modify simulate_lanes to also take in a starting state for the number of cars. Then you could call simulate_lanes one minute at a time and compute the relevant statistics externally.
  4. You could make simulate_lanes take in a function that you then call each minute. That function can take in the current number of cars and compute the relevant statistics.

Before throwing you to the wolves, let's go through how to approach the above example.

For this problem, if we chose option 2 and modified simulate_lanes to return lane_lines, then we could pass the result of the simulation to your num_arrivals function from the earlier exercise. Regardless of which design choice you make, we would want to run the simulation about 100 times and then take the average of the results. Also, don't forget to remove numpy.random.seed(6090) if you have its.

How many total distinct cars arrive in a 20 minute simulation given arrival_probs of {0: 0.5, 1: 0.3, 3: 0.2} and 2 lanes? Enter your answer below as a Python float:

Given arrival_probs of {1: 0.1, 5: 0.9} and 2 toll lanes, how many total cars are waiting at the tolls at any given minute during a simulation of 20 minutes? (note make sure to take the average over the 20 minutes) Enter your answer below as a Python float:

Given arrival_probs of {1: 0.1, 5: 0.9}, what is the minimum number of lanes so that on average over 20 minutes we have an average of 1 or fewer car in each lane? Enter your answer below as a Python integer:

Next Exercise: Toll Lanes: Design Reflection

Back to exercises


 
Footnotes

1This has been derived using combinatorics, specifically the Stars and Bars method (click to return to text)