COVID-19 situation

Modeling Technique: Cluster Analysis

Legend: Clustering in Tableau and visualization in Python

N.Magarati

Data Source: https://github.com/nytimes/covid-19-data

Last Updated: 04/07/2020

I used clustering in Tableau to visualize the total number of cases and the total number of deaths in the United States from January to date. I divided the states into 5 clusters based on the average number of cases and the average number of deaths, and I indicate each cluster with a different shape and color. As we can see, the virus has spread quickly over time. In March, New York and New Jersey had the highest numbers of cases and deaths, and in April New York has the highest numbers of cases and deaths among all the states.
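For readers who want to try a similar grouping outside Tableau, here is a minimal sketch in plain Python. It is not the workflow used for the figures in this post. It assumes the state-level file in the data source has the columns date, state, cases and deaths, and the function names state_averages and kmeans are made up for illustration. Tableau's documentation describes its clustering as k-means, so the sketch uses a basic k-means, but it does not scale the two features and is not expected to reproduce Tableau's clusters exactly.

```python
import csv
import io
import random
from collections import defaultdict


def state_averages(csv_text):
    """Return {state: (mean cases, mean deaths)} from CSV text.

    Assumes a header row with 'state', 'cases' and 'deaths' columns.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["state"]]
        entry[0] += float(row["cases"])
        entry[1] += float(row["deaths"])
        entry[2] += 1
    return {s: (c / n, d / n) for s, (c, d, n) in totals.items()}


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means on 2-D points; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2
                + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels
```

To use it, read the state-level CSV into a string, call state_averages on it, and pass list(averages.values()) to kmeans with k=5 to get one cluster index per state.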

Cases and deaths cluster correlation by month

 

Cases and deaths cluster correlation

From the clustering above, we can observe that the clusters were initially spread across different cities in the US. New Jersey's numbers then started to rise exponentially, and New York has been hit the hardest so far. This could be because of New York's population density: COVID-19 appears to spread more readily in cities with higher population density.

Clustering inputs and summary

Below are different visualizations of the data using Python.

Fig i shows the average number of COVID-19 cases recorded on each date, whereas Fig ii shows the average number of deaths recorded on each date. The figures stress the importance of social distancing to save lives. The curves for deaths and cases are similar.

 

Fig i: Plot of cases vs date

Fig ii: Plot of Deaths vs date
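The per-date averages behind plots like Fig i and Fig ii could be computed along these lines. This is a sketch and not the code used for the figures: it assumes the CSV has date, cases and deaths columns with ISO-formatted dates, that the average is taken over all rows sharing a date, and that matplotlib is installed for the plotting helper. The names daily_averages and plot_daily are hypothetical.

```python
import csv
import io
from collections import defaultdict


def daily_averages(csv_text):
    """Return (date, mean cases, mean deaths) tuples sorted by date.

    The mean is taken over every row that shares the same date.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = sums[row["date"]]
        entry[0] += float(row["cases"])
        entry[1] += float(row["deaths"])
        entry[2] += 1
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return [(d, c / n, dd / n) for d, (c, dd, n) in sorted(sums.items())]


def plot_daily(averages, column, label):
    """Line plot of one averaged column (1 = cases, 2 = deaths) against date."""
    import matplotlib.pyplot as plt  # imported here so the averaging works without it

    dates = [row[0] for row in averages]
    values = [row[column] for row in averages]
    fig, ax = plt.subplots()
    ax.plot(dates, values)
    ax.set_xlabel("date")
    ax.set_ylabel(label)
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    plt.show()
```

Calling plot_daily(averages, 1, "average cases") and plot_daily(averages, 2, "average deaths") would give one line plot for each measure.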

Fig iii: Scatterplot of Cases vs Deaths

The scatterplot above plots cases against deaths, both of which are increasing exponentially. Failure to contain COVID-19 could result in a serious rise in these numbers.
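One rough way to check whether a series is growing exponentially is to fit a straight line to the logarithm of the counts: if the growth is exponential, the log values fall close to a line, and the slope gives a doubling time. The sketch below is an illustration only, not part of the original analysis. It assumes one cumulative count per consecutive day, and the function name doubling_time is hypothetical.

```python
import math


def doubling_time(counts):
    """Estimate days per doubling from a least-squares fit of log(count) vs day.

    `counts` holds one value per consecutive day; zero counts are skipped
    because their logarithm is undefined.
    """
    pairs = [(day, math.log(c)) for day, c in enumerate(counts) if c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive counts")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx  # growth rate per day on the log scale
    if slope <= 0:
        raise ValueError("counts are not growing")
    return math.log(2) / slope
```

A series that doubles every day, such as 1, 2, 4, 8, 16, gives a doubling time of about one day. Real case counts will only follow the line approximately.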
