Modeling Technique: Cluster Analysis
Legend: The Clustering in Tableau and Visualization in Python
N.Magarati
Data Source : https://github.com/nytimes/covid-19-data
Last Updated: 04/07/2020
I used the clustering feature in Tableau to visualize the total number of cases and the total number of deaths in the United States from January to date. I divided the states into 5 clusters based on the average number of cases and the average number of deaths, and I marked each cluster with a different shape and color. As we can see, the number of cases has grown quickly over time. In March, New York and New Jersey had more cases and deaths than the other states, and in April New York had the highest number of cases and deaths of any state.
From the above clustering, we can observe that the clusters were initially concentrated in different cities across the US. New Jersey's numbers then started to rise exponentially, and New York has been hit the hardest so far. This could be because of New York's population density: COVID tends to spread more easily in cities with higher population density.
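The grouping described above can be sketched in Python. Tableau's clustering feature is documented as using k-means, but the code below is only a minimal illustration of that idea, not a reproduction of Tableau's implementation. The state names and values are placeholder numbers invented for the example, not figures from the data source. The min-max scaling step and the farthest-first choice of starting centroids are assumptions made here so that the result is deterministic. The example uses k=3 for brevity, whereas the analysis above used 5 clusters.

```python
import math

# Placeholder (average cases, average deaths) per state; NOT real figures.
state_averages = {
    "State A": (50000.0, 2500.0),
    "State B": (48000.0, 2300.0),
    "State C": (9000.0, 300.0),
    "State D": (8500.0, 280.0),
    "State E": (400.0, 8.0),
    "State F": (350.0, 6.0),
}

def min_max_scale(points):
    """Rescale each column to [0, 1] so cases do not dominate deaths."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]

def farthest_first(points, k):
    """Deterministic seeding (an assumption here): start from the first point,
    then repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid alone
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

states = list(state_averages)
scaled = min_max_scale(list(state_averages.values()))
centroids, labels = kmeans(scaled, k=3)
clusters = dict(zip(states, labels))
print(clusters)
```

With real data, the dictionary would hold one entry per state and k would be set to 5 to match the Tableau view. The cluster numbers themselves are arbitrary labels; only the grouping is meaningful.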
Below are different visualizations of the data using Python.
Fig i shows the average number of COVID cases recorded by date, whereas Fig ii shows the average number of COVID deaths recorded by date. Both figures underline the importance of social distancing in saving lives. The curves for deaths and cases have similar shapes.
Fig i: Plot of Cases vs Date
Fig ii: Plot of Deaths vs Date
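As a rough sketch of how the per-date averages behind Fig i and Fig ii could be computed: the code below assumes the input follows the column layout of the data source's `us-states.csv` file (`date,state,fips,cases,deaths`) and that "average" means the mean across states for each date. Both are assumptions about how the original figures were produced. The sample rows are placeholders, not real records, and `daily_averages` and `plot_daily_averages` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows in the assumed us-states.csv layout; NOT real records.
SAMPLE_CSV = """date,state,fips,cases,deaths
2020-03-01,State A,01,10,0
2020-03-01,State B,02,30,2
2020-03-02,State A,01,20,1
2020-03-02,State B,02,60,3
"""

def daily_averages(csv_file):
    """Return {date: (mean cases, mean deaths)} across states, sorted by date."""
    by_date = defaultdict(lambda: {"cases": [], "deaths": []})
    for row in csv.DictReader(csv_file):
        by_date[row["date"]]["cases"].append(int(row["cases"]))
        by_date[row["date"]]["deaths"].append(int(row["deaths"]))
    return {
        date: (mean(values["cases"]), mean(values["deaths"]))
        for date, values in sorted(by_date.items())
    }

def plot_daily_averages(averages):
    """Draw average cases and average deaths against date on two stacked axes."""
    # Imported here so the aggregation above runs even without matplotlib.
    import matplotlib.pyplot as plt

    dates = list(averages)
    fig, (ax_cases, ax_deaths) = plt.subplots(2, 1, sharex=True)
    ax_cases.plot(dates, [cases for cases, _ in averages.values()])
    ax_cases.set_ylabel("Average cases")
    ax_deaths.plot(dates, [deaths for _, deaths in averages.values()])
    ax_deaths.set_ylabel("Average deaths")
    ax_deaths.set_xlabel("Date")
    fig.autofmt_xdate()
    plt.show()

averages = daily_averages(io.StringIO(SAMPLE_CSV))
print(averages)
```

To work with the real file, pass an open file handle for a local copy of the CSV to `daily_averages` instead of the `io.StringIO` sample, then call `plot_daily_averages(averages)` to draw the two curves.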
Fig iii: Scatterplot of Cases vs Deaths
The above scatterplot plots cases against deaths, both of which are increasing exponentially. Failure to contain COVID could result in a serious further rise in these numbers.
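One way to check a claim of exponential growth is to fit a straight line to the logarithm of the counts: if growth is exponential, log(count) rises roughly linearly with time, and the slope gives the daily growth rate. The sketch below is an illustration of that check, not part of the original analysis. The series is synthetic, constructed to double every 3 days, and `fit_exponential_growth` is a hypothetical helper name.

```python
import math

def fit_exponential_growth(counts):
    """Least-squares fit of log(count) = intercept + rate * day.

    Returns (rate, doubling_time_in_days). Counts must be strictly positive.
    """
    days = range(len(counts))
    logs = [math.log(c) for c in counts]
    day_mean = sum(days) / len(counts)
    log_mean = sum(logs) / len(logs)
    slope = (
        sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
        / sum((d - day_mean) ** 2 for d in days)
    )
    return slope, math.log(2) / slope

# Synthetic series that doubles every 3 days; NOT real case counts.
synthetic_cases = [100 * 2 ** (day / 3) for day in range(10)]
rate, doubling_time = fit_exponential_growth(synthetic_cases)
print(f"daily growth rate: {rate:.3f}, doubling time: {doubling_time:.1f} days")
```

Applied to a real daily series of cases or deaths, a roughly constant fitted rate over a period would support the exponential description for that period, while a falling rate would suggest the curve is flattening.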