This is a final project for Statistics 141B (winter quarter 2017), done in Python.

Notebooks:

Introduction

Motivating Question

The question that led us to this topic is: How do big cities deal with cleaning problems, like when a mattress is left out on the street, human waste is all over the road, or a garbage can overflows?

We found that the answer is street cleaning requests. Residents submit these requests through phone calls, mobile applications, tweets, and more. A request can specify the location, the type of request, and so on.

Plan of Attack

We are focusing on San Francisco for our main analysis. Some of our initial questions are:

Data

Our Primary Data Source

Our primary data source is the Street and Sidewalk Cleaning data set from SF OpenData, which we found here. The data set contains information on 693,612 cleaning requests in San Francisco. We subsetted the data to examine only requests from January 1st, 2009 to December 31st, 2016, so that every month in the analysis is covered in full and monthly counts are comparable. Some key variables from this data set are:

Adding Neighborhood Demographic Data

Adding demographic data to our primary data source was less straightforward. The available census data is not broken down by neighborhood, and none of the websites we found that used the kind of data we wanted listed their source. We ended up scraping this website, which lists demographic statistics for each neighborhood from 2015. This required some cleaning because the neighborhood names in the scraped data differed from those in the cleaning requests data set.
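The name cleaning mentioned above can be illustrated with a minimal sketch. This is not the code from our notebooks: the function names, the "neighborhood" key, and the sample rows (including the placeholder population values) are all hypothetical, and the normalization rules are only an assumption about the kinds of mismatches that can occur.

```python
import re


def normalize_name(name):
    """Lowercase, turn '&' into 'and', replace punctuation with spaces, collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]+", " ", name)  # hyphens, slashes, etc. become spaces
    return " ".join(name.split())


def join_by_neighborhood(requests, demographics, name_key="neighborhood"):
    """Attach demographic fields to each request.

    Returns (joined_rows, unmatched_names) so that names needing manual
    cleanup are visible instead of being silently dropped.
    """
    lookup = {normalize_name(d[name_key]): d for d in demographics}
    joined, unmatched = [], set()
    for req in requests:
        demo = lookup.get(normalize_name(req[name_key]))
        if demo is None:
            unmatched.add(req[name_key])
            continue
        extra = {k: v for k, v in demo.items() if k != name_key}
        joined.append({**req, **extra})
    return joined, unmatched


# Made-up rows for illustration only; the population values are placeholders.
requests = [
    {"neighborhood": "Haight-Ashbury", "request_type": "Bulky Items"},
    {"neighborhood": "South of Market ", "request_type": "Human Waste"},
    {"neighborhood": "Unknown Place", "request_type": "Overflowing Can"},
]
demographics = [
    {"neighborhood": "Haight Ashbury", "population": 100},
    {"neighborhood": "south of market", "population": 200},
]

joined, unmatched = join_by_neighborhood(requests, demographics)
```

Returning the unmatched names separately makes it easy to see which neighborhoods still need a hand-written mapping after the automatic normalization.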

Getting Data for SF Pride and Outside Lands

We are interested in whether big yearly events in San Francisco are associated with more cleaning requests. In our analysis section, we learned that there are more requests in the summer, so we wanted to find two big summer events to look into. San Francisco Pride is a parade and festival held at the end of June each year to celebrate lesbian, gay, bisexual, and transgender (LGBT) people and their allies. Outside Lands is a music festival held annually in Golden Gate Park. To get the dates these events were held each year and their yearly attendance, we scraped Wikipedia: this page for Pride and this page for Outside Lands.
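One way to compare request volume during an event with the surrounding period is sketched below. This is an illustration under stated assumptions, not the method used in our analysis notebook: the function name is hypothetical, the dates are placeholders rather than real event dates, and the sketch assumes the event falls entirely inside a longer comparison window.

```python
from datetime import date


def event_vs_baseline(request_dates, event_start, event_end, window_start, window_end):
    """Return (requests per day during the event, requests per day on other window days).

    Assumes the event lies inside the window and the window is longer than the event.
    """
    event_days = (event_end - event_start).days + 1
    window_days = (window_end - window_start).days + 1
    in_window = [d for d in request_dates if window_start <= d <= window_end]
    in_event = sum(event_start <= d <= event_end for d in in_window)
    other_days = window_days - event_days
    return in_event / event_days, (len(in_window) - in_event) / other_days


# Placeholder dates for illustration only (not real event or request dates).
request_dates = [
    date(2000, 6, 10), date(2000, 6, 10), date(2000, 6, 11),
    date(2000, 6, 1), date(2000, 6, 20),
    date(2000, 7, 5),  # outside the comparison window, so it is ignored
]
event_rate, baseline_rate = event_vs_baseline(
    request_dates,
    event_start=date(2000, 6, 10), event_end=date(2000, 6, 11),
    window_start=date(2000, 6, 1), window_end=date(2000, 6, 30),
)
```

Comparing per-day rates rather than raw counts keeps a two-day event from looking quiet next to a whole month.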

More Detail on Our Data Munging Process

Here is the notebook with the data munging exported to HTML.
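As a rough illustration of one munging step, the date subsetting described for the primary data source (January 1st 2009 to December 31st 2016) might look like the sketch below. The function names and sample dates are hypothetical, and the sketch assumes the request dates have already been parsed into datetime.date objects.

```python
from collections import Counter
from datetime import date

STUDY_START = date(2009, 1, 1)
STUDY_END = date(2016, 12, 31)


def subset_to_study_period(opened_dates):
    """Keep only dates inside the study period (both endpoints included)."""
    return [d for d in opened_dates if STUDY_START <= d <= STUDY_END]


def monthly_counts(opened_dates):
    """Count requests per (year, month) pair."""
    return Counter((d.year, d.month) for d in opened_dates)


# Made-up dates for illustration only.
sample = [
    date(2008, 12, 31),  # before the study period, dropped
    date(2009, 1, 1),
    date(2009, 1, 15),
    date(2016, 12, 31),
    date(2017, 1, 1),    # after the study period, dropped
]
kept = subset_to_study_period(sample)
counts = monthly_counts(kept)
```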

Analysis and Visualizations

The HTML exported notebook containing our analysis, visualizations, and conclusions is here.


The notebook source files are available in the repository here.