Regression and Clustering Analysis by County

Using public data from The New York Times, I ran cluster analysis and regression analysis on the numbers of COVID-19 cases and deaths by county.

I answered a couple of questions: What is the relationship between the number of cases and the number of deaths? Which counties are hit the worst?

Using KNN cluster analysis, I divided all US counties into 6 clusters based on the number of cases. New York City forms its own cluster because it has far more cases than anywhere else, and the rest of the country is divided into the 5 remaining clusters. Since one of the main problems we face is the overwhelming number of patients who require hospital treatment, it is important to know how many confirmed COVID-19 cases each county has and how severe its situation is. Knowing how badly each county is affected lets the government assess where to invest the most resources.
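To make the clustering step concrete, here is a minimal sketch of centroid-based clustering on case counts. It assumes a k-means style procedure (the centroids shown in the graphs suggest one), written in plain Python so it runs without extra libraries; in practice a library implementation such as scikit-learn's KMeans would be the usual choice. The function name `kmeans_1d`, the quantile-based starting centroids, and the toy case counts are all made up for illustration and are not the real county data or my exact code.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster one-dimensional values into k groups (requires k >= 2).

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to values[i].
    """
    distinct = sorted(set(values))
    if k < 2 or k > len(distinct):
        raise ValueError("k must be between 2 and the number of distinct values")
    # Assumption: start from evenly spaced positions in the sorted data,
    # which keeps the sketch deterministic (real k-means often starts randomly).
    centroids = [float(distinct[i * (len(distinct) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Made-up case counts: a few small counties, a few mid-sized ones, one extreme outlier.
toy_cases = [5, 8, 12, 300, 320, 350, 76000]
centroids, labels = kmeans_1d(toy_cases, k=3)
print(centroids, labels)
```

With an extreme outlier in the data, the outlier ends up alone in its own cluster, which is the same effect that puts New York City in a cluster by itself.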

In the following graphs, each point is a county, and the data is current as of April 7th. In the clustering graphs, each cluster has its own color, and the red dots are the cluster centroids:

 

I added a tool that lets users enter the county and state they live in and see how severe the COVID-19 case count there is relative to all other counties in the US.
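A lookup like this can be sketched roughly as follows. This is only an illustration of the idea, not the tool itself: the function name `county_status`, the rank and percentage it reports, and the placeholder counties with made-up case counts are all assumptions.

```python
def county_status(cases_by_county, county, state):
    """Compare one county's case count with every other county.

    cases_by_county maps (county, state) tuples to case counts.
    Matching ignores letter case and surrounding whitespace.
    """
    normalized = {
        (c.strip().lower(), s.strip().lower()): n
        for (c, s), n in cases_by_county.items()
    }
    key = (county.strip().lower(), state.strip().lower())
    if key not in normalized:
        raise KeyError(f"No data for {county}, {state}")
    cases = normalized[key]
    counts = list(normalized.values())
    return {
        "cases": cases,
        # Rank 1 is the county with the most cases.
        "rank": 1 + sum(1 for n in counts if n > cases),
        "total_counties": len(counts),
        "percent_with_fewer_cases": 100.0 * sum(1 for n in counts if n < cases) / len(counts),
    }


# Placeholder names and made-up numbers, for illustration only.
toy_data = {
    ("County A", "State X"): 10,
    ("County B", "State X"): 250,
    ("County C", "State Y"): 40,
    ("County D", "State Y"): 5,
}
status = county_status(toy_data, " county c ", "STATE Y")
print(status)
```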

 

 

 

Then I ran a linear regression analysis to find the relationship between the number of cases and the number of deaths, and to predict the number of deaths from the number of cases. My model has a decent R-squared score, and after setting negative predictions to 0, its predictions are on average fewer than 25 deaths away from the real value. Following are the model measures, some predictions it made, and a plot of predictions vs. real values:

I added a tool that lets users enter a number of cases and find out how many deaths are predicted.
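The regression and the prediction step can be sketched like this. It is a simplified, plain-Python version under stated assumptions: a single-feature least-squares line, negative predictions set to 0, and R-squared and mean absolute error as the measures. The function names and the toy numbers are made up for illustration; they are not my actual model or the real data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_deaths(cases, slope, intercept):
    """Predict deaths from cases; a negative prediction makes no sense, so use 0."""
    return max(0.0, slope * cases + intercept)


def r_squared(actual, predicted):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up (cases, deaths) pairs, for illustration only.
toy_cases = [10, 50, 100, 500, 1000]
toy_deaths = [0, 1, 3, 20, 38]

slope, intercept = fit_line(toy_cases, toy_deaths)
predictions = [predict_deaths(c, slope, intercept) for c in toy_cases]
print("R-squared:", r_squared(toy_deaths, predictions))
print("Mean absolute error:", mean_absolute_error(toy_deaths, predictions))
print("Predicted deaths for 10 cases:", predict_deaths(10, slope, intercept))
```

On this toy data the fitted intercept is slightly negative, so the raw prediction for a county with very few cases dips below zero, which is the situation the zero floor handles.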

 

I also plotted a map of the fatality rate at the state level to see how well states are doing at treating sick patients. In the following map, the colors represent the share of confirmed cases that ended in death. The greener a state is, the closer its death rate is to 0%, and the redder it is, the closer its death rate is to 5%:
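The numbers behind such a map can be sketched as follows: sum county rows up to the state level, divide deaths by cases, and turn each rate into a color between green and red. The linear green-to-red scale capped at 5%, the function names, and the placeholder rows with made-up counts are assumptions for illustration, not the exact color scale of my map.

```python
from collections import defaultdict


def state_fatality_rates(rows):
    """rows are (state, cases, deaths) tuples, one per county.

    Returns {state: deaths / cases}, skipping states with no cases.
    """
    cases = defaultdict(int)
    deaths = defaultdict(int)
    for state, county_cases, county_deaths in rows:
        cases[state] += county_cases
        deaths[state] += county_deaths
    return {s: deaths[s] / cases[s] for s in cases if cases[s] > 0}


def rate_to_rgb(rate, max_rate=0.05):
    """Map a rate to an (R, G, B) color: 0% is pure green, max_rate or more is pure red."""
    t = min(max(rate / max_rate, 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)


# Placeholder states and made-up counts, for illustration only.
toy_rows = [
    ("State X", 100, 2),
    ("State X", 300, 6),
    ("State Y", 50, 5),
]
rates = state_fatality_rates(toy_rows)
colors = {state: rate_to_rgb(rate) for state, rate in rates.items()}
print(rates, colors)
```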

 
