Ratings Analysis
Your mentor at Office Shack was so impressed with the functions you wrote in the previous problem that they asked you to do some analysis on the large office_products.csv dataset provided in the resources folder. They provided some questions for you to investigate in the ratings_analysis.ipynb Jupyter Notebook. Because the full dataset is quite large, we encourage you to answer the questions using medium_office_products.csv first. Once you have gotten all the expected answers on the medium dataset, switch to the reviews in office_products.csv and check your answers on this page. Finally, upload your Jupyter Notebook to receive some manually graded feedback.
To get full credit for this assignment, you need to submit correct answers on this page and upload your Jupyter Notebook containing your code (which will be graded manually for completeness and style).
Notes:
- Any strings submitted in the boxes below must include quotation marks!
- All questions assume that you have parsed the data so that only unique reviews are considered. A review is unique if no other review in the dataset has the same product id, user id, rating and timestamp.
- These problems are intended to be open-ended, so you can use pandas or any other library or method to get your answers. It is, however, very possible to solve them using only what we have covered so far in this course; the functions you wrote in the last assignment in particular should be useful.
- If you are getting stuck on how to solve a problem (or how to solve it efficiently), we highly encourage you to email us and/or come to office hours!
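Removing duplicates takes only a set of keys that have already been seen. The sketch below is one possible approach, not the required method. It assumes the CSV header names the four relevant columns product_id, user_id, rating and timestamp. Those names are hypothetical, so check the actual header of office_products.csv. The helper name load_unique_reviews is also illustrative.

```python
import csv

# Hypothetical column names: check the header row of office_products.csv.
KEY_FIELDS = ("product_id", "user_id", "rating", "timestamp")


def load_unique_reviews(lines):
    """Return review rows (as dicts), keeping only the first occurrence of each
    (product id, user id, rating, timestamp) combination.

    `lines` is any iterable of CSV text lines, such as an open file object.
    Fields are compared as raw strings, so "5" and "5.0" count as different.
    """
    seen = set()
    unique = []
    for row in csv.DictReader(lines):
        key = tuple(row[field] for field in KEY_FIELDS)
        if key not in seen:  # first time this exact review appears
            seen.add(key)
            unique.append(row)
    return unique
```

For example, `with open("medium_office_products.csv", newline="") as f: reviews = load_unique_reviews(f)` loads the medium dataset. Afterwards, `len(reviews)` is the number of unique reviews. If you use pandas instead, `DataFrame.drop_duplicates(subset=[...])` over the same four columns does the same job.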
1) Question Submission
Submit your answers to the questions below to check for correctness and receive credit.
- How many items (unique products) are in the data set?
- How many unique users are in the data set?
- How many unique reviews are in the data set?
- What item has the most reviews? How many reviews does this item have? What is the item's average rating? Enter your answer as a tuple containing (item, item total, item mean)
- What item has the most 1-star reviews? How many reviews does this item have? What is the item's average rating? How many 1-star reviews does it have? Enter your answer as a tuple containing (item, item total, item mean, 1 stars)
- What item with at least 1,000 ratings has the lowest rating average? How many reviews does this item have? What is the item's average rating? Enter your answer as a tuple containing (item, item total, item mean)
- What user has given the most reviews? How many reviews have they given? What is their average rating? Enter your answer as a tuple containing (user, total ratings, mean rating)
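Most of these questions reduce to one pass that collects, for each product or user, the review count, the mean rating and the number of 1-star ratings. The sketch below is a hypothetical stand-in for the functions from the previous problem, whose names and signatures are not given here. It assumes each review is a dict with a numeric rating field named "rating" and id fields such as "product_id" and "user_id". These names are assumptions.

```python
from collections import defaultdict


def summarize(reviews, key_field):
    """Map each value of key_field (e.g. a product or user id) to
    (review count, mean rating, number of 1-star ratings)."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [count, rating sum, 1-star count]
    for row in reviews:
        rating = float(row["rating"])
        entry = totals[row[key_field]]
        entry[0] += 1
        entry[1] += rating
        if rating == 1:
            entry[2] += 1
    return {k: (n, s / n, ones) for k, (n, s, ones) in totals.items()}


def most_reviewed(stats):
    """(key, total, mean) for the key with the most reviews."""
    key = max(stats, key=lambda k: stats[k][0])
    n, mean, _ = stats[key]
    return (key, n, mean)


def most_one_star(stats):
    """(key, total, mean, 1-star count) for the key with the most 1-star ratings."""
    key = max(stats, key=lambda k: stats[k][2])
    n, mean, ones = stats[key]
    return (key, n, mean, ones)


def lowest_mean(stats, min_count=1000):
    """(key, total, mean) for the lowest mean among keys with at least min_count reviews.
    Raises ValueError if no key qualifies."""
    eligible = [k for k in stats if stats[k][0] >= min_count]
    key = min(eligible, key=lambda k: stats[k][1])
    n, mean, _ = stats[key]
    return (key, n, mean)
```

With `item_stats = summarize(reviews, "product_id")` and `user_stats = summarize(reviews, "user_id")`, the values `len(item_stats)` and `len(user_stats)` give the unique item and user counts. The helper functions then answer the remaining item and user questions. Ties are broken by whichever key is encountered first, which may not match the expected answer if two items tie. In that case, check for ties explicitly.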
Note that questions 8-10 will be manually graded.
2) Python notebook upload