Submission to UCLA's Datafest 2020

Team: Sanjana Giduthuri, Valeria Morales

Scope: 2 weeks, May 1 - May 15

Tools Used: Tableau, Excel, R


The Competition

UCLA's Datafest is an annual data science competition sponsored by the American Statistical Association. It brings together students from schools across Southern California to use data to answer tough questions, developing their analytical skills in the process. Typically, it occurs over two days, but because of COVID-19, it was changed to a two-week virtual program.

Instead of providing a specific scenario and relevant data, this year's Datafest centered on the broader theme of COVID-19. We were challenged to pose any question related to the pandemic, then find and analyze relevant data to answer it. The open-ended prompt gave us the creative freedom to explore an aspect of COVID-19 that personally interested us.

Our Focus

After exploring the initial resources provided for the competition, our team came across Google's COVID-19 Community Mobility Reports, which use Google Maps data to track changes in people's movement due to the pandemic. The reports show the percentage change in mobility, relative to a pre-pandemic baseline, for different categories of places within each region. One example is a 19% increase in visits to grocery & pharmacy stores.

We thought that these changes in mobility were a good proxy for how well people were social distancing. Since the U.S. data was collected by county, we thought this granularity could lead to interesting insights if we connected it to COVID-19 rates by county. In particular, we wanted to examine whether decreases in mobility were correlated with lower COVID-19 infection rates, which would suggest that social distancing was effective.
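To make the question concrete, here is a minimal sketch of the kind of correlation check we had in mind, written in Python with only the standard library. The function name pearson_r, the variable names, and the numbers are all hypothetical placeholders chosen for illustration. They are not values from our dataset, and this is not necessarily how the analysis was carried out.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for illustration only, NOT real county data.
# One entry per (hypothetical) county.
mobility_change_pct = [-40.0, -30.0, -20.0, -10.0]  # % change from baseline
cases_per_100k = [10.0, 20.0, 30.0, 40.0]           # infection rate

r = pearson_r(mobility_change_pct, cases_per_100k)
print(f"Pearson r = {r:.2f}")
```

With mobility expressed as a signed percentage change, a positive r means that counties with larger drops in mobility tend to have lower infection rates, which is the pattern we were looking for. A correlation like this would be consistent with effective social distancing but would not, on its own, establish the cause.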

Data Preparation

Collecting and Cleaning the Data

For the mobility data, we downloaded the United States dataset from Google's COVID-19 Community Mobility Reports and extracted the data for California in Excel. Then, we downloaded coronavirus infection data and population data from USA Facts and used R to calculate the infection rate by county. We also used economic data from the USDA to add columns containing each county's median household income and that income as a percentage of the state median, which we thought could reveal associations between socioeconomic status, COVID-19 rates, and social distancing. Ultimately, this was our final dataset.
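The preparation steps described above can be sketched roughly as follows. This is an illustrative Python version rather than the R code we actually used. The per-100,000 scaling, the function and field names, and the "County A" / "County B" figures are all assumptions made for the example, not our real data.

```python
def infection_rate_per_100k(cases, population):
    """Confirmed cases per 100,000 residents (the scale is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000

def pct_of_state_median(county_income, state_income):
    """County median household income as a percentage of the state median."""
    if state_income <= 0:
        raise ValueError("state median income must be positive")
    return county_income / state_income * 100

def build_rows(mobility, cases, population, income, state_median_income):
    """Join per-county dicts on county name, keeping counties present in all."""
    counties = sorted(set(mobility) & set(cases) & set(population) & set(income))
    return [
        {
            "county": c,
            "mobility_change_pct": mobility[c],
            "cases_per_100k": infection_rate_per_100k(cases[c], population[c]),
            "median_income": income[c],
            "pct_of_state_median": pct_of_state_median(
                income[c], state_median_income
            ),
        }
        for c in counties
    ]

# Hypothetical placeholder inputs, NOT real figures.
mobility = {"County A": -35.0, "County B": -20.0}
cases = {"County A": 50, "County B": 120}
population = {"County A": 200_000, "County B": 300_000}
income = {"County A": 60_000, "County B": 90_000, "County C": 70_000}

rows = build_rows(mobility, cases, population, income, state_median_income=80_000)
for row in rows:
    print(row)
```

Joining on the county name and keeping only counties that appear in every source avoids rows with missing values. In this example "County C" is dropped because it has income data but no mobility or case data.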