Image Resizing

The questions below are due on Sunday July 07, 2024; 10:00:00 PM.
 
You are not logged in.

Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.

1) Preparation

The following file contains code and other resources you will need for this exercise: u4_alternate_images.zip

This exercise will use the pillow library for loading and saving images. See this page for instructions on installing pillow (note that, depending on your setup, you may need to run pip3 instead of pip). If you have any trouble installing, just ask, and we'll be happy to help you get set up!

This exercise requires you to complete the definition of one function (and, optionally, various helper functions) in lab.py. All of your changes should be made to lab.py, which you will upload for testing at the bottom of this page. Importantly, you should not add any (more) imports to the files.

We've again provided a copy of the tests that the server will run on your code, in test.py.

2) Image Representation

Before we can get to our image manipulation task, we need a way to represent images in Python1. While digital images can be represented in myriad ways, the most common has endured the test of time: a rectangular mosaic of pixels — colored dots, which together make up the image. An image, therefore, can be defined by specifying a width, a height, and an array of pixels, each of which is a color value. This representation emerged from the early days of analog television and has survived many technology changes. While individual file formats employ different encodings, compression, and other tricks, the pixel-array representation remains central to most digital images.

For this exercise, we'll represent an image using a Python dictionary with three keys:

  • width: the width of the image (in pixels),
  • height: the height of the image (in pixels), and
  • pixels: a Python list of pixel values, each represented as a tuple of three integers, (r, g, b), where r, g, and b are all in the range [0, 255]. These values are stored in row-major order (listing the top row left-to-right, then the next row, and so on).

For example, consider this 2 \times 3 image (enlarged here for clarity):

image_sample

This image would be encoded as the following dictionary:

i = {
    'height': 3,
    'width': 2,
    'pixels': [(255, 0, 0), (39, 143, 230), (255, 191, 0), (0, 200, 0), (100, 100, 100), (179, 0, 199)],
}

3) Maps

In this exercise, we will at times want to compute and store data other than rgb colors for each pixel in an image. We define a "map" to be a dictionary with the same form as a color image above, but whose 'pixels' list contains integers rather than tuples2. This is a convenient and familiar way to store information. For example,

m = {
       'height': 3,
       'width': 2,
       'pixels': [44, 12, 0, 47, 93, 4],
    }

stores the value of 44 associated with the top-left pixel of the above image, 12 with the top-right pixel, etc. These integers could represent any information. Later in this exercise we'll use maps to store the "energy" and "cumulative energy" of each pixel in an image.

4) Seam Carving

Now we describe your task in this exercise. You should read all of the subsections in this part before implementing any code.

We're going to implement a neat effect called "seam carving3." The goal of this technique is to scale down an image while preserving the perceptually important parts (e.g., removing background but preserving subjects).

Two common approaches for resizing an image are cropping and scaling. The animation below compares these kinds of resizing on an image of two cats4, which is gradually scaled down from 300 pixels wide to 150 pixels wide. Cropping is shown on the first row; naive scaling on the second row; and seam carving on the third row.

resizing_strategies

As you can see, the first two strategies have shortcomings: cropping causes one of the cats to be almost completely deleted, and scaling distorts both cats. By focusing on removing "low-energy" (defined below) portions of the image, however, the seam carving technique manages to keep both cats almost completely intact while removing mostly background.

Here is another example, on a picture of several trees. Notice again that the trees are preserved almost exactly despite decreasing the width of the image by 75 pixels.

resizing_trees

The idea behind seam carving is that, when we remove a pixel from each row to decrease the size of an image, instead of removing straight vertical columns of pixels, we instead find and remove connected "paths" of pixels from the top to the bottom of the image, with one pixel in each row. Each time we want to decrease the horizontal size of the image by one pixel, we start by finding the connected path from top to bottom which has the minimum total "energy," and removing the pixels contained therein. To shrink the image further, we can apply this process repeatedly.

4.1) What is Energy?

As discussed, the seam carving technique prefers keeping paths which have higher energy. You can thus think of energy as importance. There are multiple ways to measure this, but we'll use the presence of edges as an indicator. That is, we'll perform edge detection on the image. A pixel's energy will be high if it lies on an edge, and low if it does not. A path's energy is the sum of the energies of all the pixels on it.

We have provided a helper function compute_energy for you in util.py. The function is explained via a docstring. It takes in a color image, performs edge detection for you, and returns a map (as defined above) containing energy values for each pixel. You need not deeply understand the function or its helper functions, but you should at least browse the code before beginning work. (More in-depth resources are linked in the comments there if you are curious.) All of the helper functions in util.py are made available to you in lab.py, and you may find them useful, so take note of them.

4.2) Outline and Description of Steps

Finding and removing the minimum-energy path is accomplished via the algorithm outlined below5.

The algorithm repeats the following steps until the width of the image reaches the desired size:

  1. Get the energy map. This simply involves calling the compute_energy function (imported from util.py) on the image.

  2. Compute a "cumulative energy map," where the value at each position is the total energy of the lowest-energy path from the top of the image to that pixel.

    For the top row, this will be equal to the top row from the energy map. Subsequent rows can be computed using the following algorithm (in pseudocode):

     For each row of the "cumulative energy map":
         For each pixel in the row:
             Set this value in the "cumulative energy map" to be:
               the value of that location in the energy map, added to the
               minimum of the cumulative energies from the "adjacent"
               pixels in the row above
    

    For our purposes, we'll consider pixels in the row above to be "adjacent" to the pixel we're considering if they are either directly above, above and one to the right, or above and one to the left, of the pixel in question. For example, the pixel marked in blue below has three "adjacent" pixels in the row above (indicated by the arrows), and the pixel marked in red (on the edge of the image) has two "adjacent" pixels in the row above:

    adjacent pixels

    (After completing this process, the value at each pixel in this "cumulative energy map" will be the total energy of the lowest-energy path from the top of the image to that location.)

  3. Find the minimum-energy seam. The minimum seam can then be found by backtracing from the bottom to the top of the cumulative energy map. First, locate the minimum value pixel in the bottom row of the cumulative energy map. This is the bottom pixel of the minimum seam. The seam is then traced back up to the top row of the cumulative energy map by following the adjacent pixels with the smallest cumulative energies.

    Ties should always be broken by preferring the left-most of the tied columns.

  4. Remove the computed seam. Finally, we need to remove the pixels in the computed seam from our original color image (and we should use this smaller result when we start back at the top of these steps).

4.3) Suggested Structure

As you can see, seam carving is a nontrivial process, so it will be helpful to break it down into smaller pieces.

We have provided skeletons for some helper functions in lab.py, which we think could provide a useful way to organize your code. You are welcome to ignore these helper functions if you wish, and structure your code however you see fit — this can be good practice in modularizing your program on your own.

If you would like to use them, our helper functions are:

  • cumulative_energy_map(energy): given an energy map, return the "cumulative energy map" as described above.
  • minimum_energy_seam(cem): given a "cumulative energy map," return a list of the indices into im['pixels'] associated with the seam that should be removed
  • image_without_seam(im, seam): given an image and a list of indices, return a new image with the associated pixels removed.

We have also provided test cases for them in the TestSeamCarvingHelpers class in test.py. These tests will not be checked on the server, but they can be helpful during debugging.

4.4) Loading, Saving, and Displaying Images

We also want to point out two functions in util.py which may be particularly helpful for debugging: load_color_image and save_color_image. Each of these functions is explained via a docstring.

You do not need to dig deeply into the code in those functions, but it is worth taking a look at their docstrings, and trying to use the functions to:

  • load an image, and
  • save it under a different filename.

You can then use an image viewer on your computer to open the new image to make sure it was saved correctly.

These functions provide a way to visualize your output, and they also make it easy to show off your cool results to friends and family.

There are several example images in the test_images directory inside the lab's code distribution. We recommend the small ones like centered_pixel.png and pattern.png for testing.

Note that you can add code for loading, manipulating, and saving images under the if __name__ == '__main__': block in lab.py, which will be executed when you run lab.py directly (but not when it is imported by the test suite).

4.5) Write It!

You're now ready to fill in the body of the seam_carving function in your lab.py so that it implements seam carving using the method described above. Both the input and output should be color images.

5) Check Your Results

Try running your code on some test images, and confirm that the result makes sense. For example, removing 100 seams from twocats.png (reducing the width by 100) should take it from this:

pattern_carved

to this:

pattern_carved

(Doing so may take a minute or so.)

Submit your lab.py below for testing once things pass locally and you're confident it works.

Please upload your code here. This question will be manually graded for correctness (passing all provided test cases) and good style.
 No file selected

6) Optional Extensions

In this exercise, you got familiar with some tools that can be quite powerful! We implemented just one interesting image manipulation, but there are many others you could also perform with the same image representation. If you'd like, here are some ideas for further exploration:

  • Navigate to the links in the comments of the util.py functions to learn about correlation and edge detection as we implemented them. There is also information about other simple filters there.

  • You could implement the opposite of seam carving: smart resizing to increase the size of an image by inserting appropriate rows at low-energy regions in the image.

  • You could implement some drawing primitives so that you can draw shapes on top of images. (You can change the color at a particular pixel in an image simply by changing the (r, g, b) tuple at the appropriate element in its 'pixels' list.) For example, circle(image, (r, g, b), x, y, radius) could draw a circle of the given size at the given location. You could implement multiple drawing primitives and use them to draw a nice new picture, or to modify an existing one.

  • You could implement various filters which change the color of an image. For example:

    • inversion (change the r, g, and b values to 255 minus themselves)
    • blurring (change each pixel to a color that is the average of itself and those around it)
    • a filter that takes spatial information into account, in addition to color information, such as a vignette or a ripple filter.
  • You could add the equivalent of "pasting" smaller images on top of larger ones (maybe also including a color that should be treated as transparent for purposes of the copy/paste, so that pixels of that color in the pasted image are not copied over).


 
Footnotes

1It turns out that the representation of digital images and colors as data is a rich and interesting topic. Have you ever noticed, for example, that your photos of purple flowers are unable to capture the vibrant purple you see in real life? This is the reason why. (click to return to text)

2map is in fact a Python keyword, and it also has other meanings in other languages. We disregard those meanings here — they are unrelated. (click to return to text)

3also called "content-aware resizing" or "retargeting" (click to return to text)

4This animation is licensed under a CC BY SA 4.0 International License. The original image of two cats, which is licensed under a CC BY 2.0 License, comes from wellflat on Flickr. (click to return to text)

5For more information, see the Wikipedia page for seam carving. (click to return to text)