Monthly Archives: April 2020

Covid-19 Current Situation.

This data visualization shows the total confirmed cases of coronavirus worldwide. It shows cases for each country, which helps show how severely each country is affected by the outbreak. I want to help by … Continue reading

More Galleries | Comments Off on Covid-19 Current Situation.

Covid 19 # dash tableau Marcin

BARUCH: DASHBOARD cis 4710,PDF MO

Posted in Uncategorized | Comments Off on Covid 19 # dash tableau Marcin

Covid 19

This gallery contains 2 photos.

The statistical technique used is cluster analysis. Cluster analysis is a multivariate statistical technique that groups observations on the basis of features or variables they are described by. The goal of the problem was to maximize the similarity of observation … Continue reading

More Galleries | Comments Off on Covid 19


COVID-19 situation

Modeling Technique: Cluster Analysis

Legend: The Clustering in Tableau and Visualization in Python

N.Magarati

Data Source: https://github.com/nytimes/covid-19-data

Last Updated: 04/07/2020

I used clustering in Tableau to visualize the total number of cases and the total number of deaths in the United States from January to date. I divided the states into 5 clusters based on their average number of cases and average number of deaths, and marked each cluster with a different shape and color. As the charts show, the virus has spread quickly over time. In March, New York and New Jersey had more cases and deaths than the other states, and in April New York has the highest number of cases and deaths of any state.
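The clustering itself was done in Tableau, but the two inputs (average cases and average deaths per state) can be sketched in plain Python. This is only an illustration: the rows follow the `date,state,fips,cases,deaths` layout of the NYT `us-states.csv` file, the numbers are made up, and `state_averages` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the us-states.csv layout (date, state, fips, cases, deaths).
# The numbers are made up for the example, not taken from the dataset.
SAMPLE = """date,state,fips,cases,deaths
2020-04-05,New York,36,100,10
2020-04-06,New York,36,200,20
2020-04-05,New Jersey,34,40,2
2020-04-06,New Jersey,34,60,4
"""

def state_averages(csv_text):
    """Return {state: (average cases, average deaths)} over all reported dates."""
    totals = defaultdict(lambda: [0, 0, 0])  # cases sum, deaths sum, row count
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["state"]]
        entry[0] += int(row["cases"])
        entry[1] += int(row["deaths"])
        entry[2] += 1
    return {s: (c / n, d / n) for s, (c, d, n) in totals.items()}

averages = state_averages(SAMPLE)
for state, (avg_cases, avg_deaths) in averages.items():
    print(state, avg_cases, avg_deaths)
```

Each state then becomes one point with two coordinates, which is the form a clustering tool expects.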

Cases and Death cluster Correlation each month

 

Cases and Deaths cluster correlation

From the clustering above, we can observe that the clusters were initially spread across different parts of the US. New Jersey’s numbers then started to rise exponentially, and New York has been hit the hardest so far. This could be because of New York’s population density: COVID tends to spread more easily in places with higher population density.

Clustering inputs and summary

Below are different visualizations of the data using Python.

Fig i shows the average number of Covid cases recorded by date, whereas Fig ii shows the average number of Covid deaths recorded by date. Both figures underline the importance of social distancing to save lives. The curves of deaths and cases are similar.

 

Fig i: Plot of cases vs date

Fig ii: Plot of Deaths vs date

Fig iii: Scatterplot of Cases vs Deaths

The scatterplot above plots cases against deaths, both of which are increasing exponentially. Failure to contain Covid could result in a serious rise in these numbers.

Posted in Uncategorized | Comments Off on COVID-19 situation

State Effectiveness at Treating COVID-19

Problem Statement:

How effective has each U.S. state been at treating patients diagnosed with the coronavirus? This question is important because each state has different resources and receives varying levels of government funding. If more people in one state recover from the illness compared to the people living in another state, studying the variables that might have led to this has the potential to increase the nationwide recovery rate. There are many variables that would contribute to state recovery rate, like population density, number of hospitals, number of resources, government funding, geographical features, average diet, average level of physical exercise, and so on. It will be important to study these factors. However, the first step is to study the number of cases and deaths by state over time.

Link to download: 

https://github.com/paulmarinos/covid-19/blob/master/covid-states.py

Objectives: 

  • Analyze the number of COVID-19 cases and deaths by U.S. state over time
  • Calculate COVID-19 state mortality rate
  • Create a tool that will quickly visualize the data using tables and charts
  • Allow user to search by either state or date and sort data by column
  • Allow user to plot the data and fit a linear regression model
  • Calculate the R-squared value
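As a rough sketch of the first few objectives, the mortality rate can be computed as deaths divided by cases, and searching and sorting reduce to a filter and a sort. The linked script uses pandas; the version below uses plain lists of dictionaries, and the records, function names, and numbers are all hypothetical.

```python
# Hypothetical records with made-up numbers; the real tool reads the state-level data.
records = [
    {"date": "2020-04-07", "state": "New York", "cases": 1000, "deaths": 50},
    {"date": "2020-04-07", "state": "Washington", "cases": 200, "deaths": 8},
    {"date": "2020-04-06", "state": "New York", "cases": 900, "deaths": 40},
]

def add_mortality_rate(rows):
    """Return copies of rows with mortality_rate = deaths / cases, in percent."""
    out = []
    for row in rows:
        rate = 100.0 * row["deaths"] / row["cases"] if row["cases"] else 0.0
        out.append({**row, "mortality_rate": round(rate, 2)})
    return out

def search(rows, field, value):
    """Keep only rows whose field (for example 'state' or 'date') equals value."""
    return [row for row in rows if row[field] == value]

def sort_by(rows, column, descending=True):
    """Sort rows by any column, highest first by default."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)

table = add_mortality_rate(records)
april_7 = sort_by(search(table, "date", "2020-04-07"), "mortality_rate")
for row in april_7:
    print(row["state"], row["cases"], row["deaths"], row["mortality_rate"])
```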

Libraries used:

  • pandas for data preparation
  • numpy for data manipulation
  • matplotlib for data visualization
  • sklearn for statistical modeling

R-squared value meaning:

  • The R-squared value, or coefficient of determination, measures how much of the variation in the dependent variable (cases) is explained by a linear relationship with the independent variable (deaths). As the number of cases increases, the number of deaths from the virus should increase as well. If a state’s cases and deaths do not have a linear relationship, then the way the state is treating infected patients should be studied to help explain why. A state with a subpar recovery rate can then modify its treatment practices, and other states can try to emulate the practices of a state with a higher recovery rate (lower mortality rate).
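To make the R-squared idea concrete, here is a minimal sketch of a one-variable least-squares fit and its R-squared value. The post uses sklearn for this; the hand-written functions below are illustrative stand-ins, and the counts are made up. Deaths are regressed on cases here, but for a one-variable fit R-squared comes out the same whichever variable is treated as dependent.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """1 - SS_res / SS_tot: share of the variance in y explained by the line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Made-up cumulative counts for one state.
cases = [100, 200, 300, 400, 500]
deaths = [2, 5, 6, 9, 10]
slope, intercept = fit_line(cases, deaths)
r2 = r_squared(cases, deaths, slope, intercept)
print(f"deaths = {slope:.3f} * cases + {intercept:.3f}, R^2 = {r2:.3f}")
```

An R-squared close to 1 means the points lie close to a straight line; a low value means the line explains little of the variation.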

Datasource (updated daily):

Screenshots (Viewing Data):

search by state: New York

search by state to view date, cases, deaths, and mortality rate

search by date to view state cases, deaths, and mortality rate for a given day

search by date to view state cases, deaths, and mortality rate up to a given day

Sort by Mortality Rate

Screenshots (Plotting and Performing Linear Regression): 

Create Scatterplot and perform Linear Regression

Calculate R-Squared Value

 

Screenshots of Python Code:

Header

Import Libraries

Use Pandas for Data Preparation

Create Functions to Filter Data Frame

Create function to give option to sort by column

Plot Linear Regression

Create Options Menu for the User

Call Main Function to Start the Program

 

Posted in Uncategorized | Comments Off on State Effectiveness at Treating COVID-19

Regression and Clustering Analysis by County

Using the public data from the NY Times, I ran cluster analysis and regression analysis on the numbers of cases and deaths by county.

I answered a couple of questions: What is the relationship between the numbers of cases and deaths? Which counties have it the worst?

Using KNN cluster analysis, I divided all the counties in the US into 6 clusters based on the number of cases. New York City is its own cluster since it has so many more cases than any other place; the rest of the country is divided into the 5 remaining clusters. Since one of the main problems we are facing is the overwhelming number of patients who require hospital treatment, it’s important to know how many confirmed cases of covid19 there are in each county and how severe the situation is. By knowing how badly each county is doing, the government can assess where it needs to invest the most resources.
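The post does not show its clustering code. The mention of centroids suggests a k-means style algorithm, so the sketch below assumes k-means on a single feature (case counts). The function name, the starting rule for the centroids, and the sample counts are all illustrative assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one feature. Returns (centroids, labels). Needs k >= 2."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

# Made-up county case counts with one extreme outlier.
county_cases = [5, 8, 12, 300, 340, 410, 76000]
centroids, labels = kmeans_1d(county_cases, k=3)
print(centroids, labels)
```

With an outlier this large, the extreme value ends up alone in its own cluster, which mirrors how New York City forms a cluster by itself.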

In the following graphs each point is a county, and the data is current as of April 7th. In the clustering graphs each cluster gets its own color, and the red dots are the centroids of each cluster:

 

I added a tool that lets users enter the county and state they reside in and find out how bad the covid19 situation there is relative to all other counties in the US.
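The tool's code is not shown. One simple way to express "how bad relative to all other counties" is a percentile rank, sketched below. The lookup table, the county names, and the message format are hypothetical, and the original tool may well report cluster membership instead.

```python
from bisect import bisect_left

def percentile_rank(all_counts, count):
    """Percentage of counties with strictly fewer cases than `count`."""
    ordered = sorted(all_counts)
    return 100.0 * bisect_left(ordered, count) / len(ordered)

# Hypothetical (county, state) -> confirmed cases lookup with made-up numbers.
cases_by_county = {
    ("County A", "State X"): 900,
    ("County B", "State X"): 400,
    ("County C", "State Y"): 300,
    ("County D", "State Z"): 200,
}

def county_status(county, state):
    """Describe one county's case count relative to every county in the table."""
    count = cases_by_county[(county, state)]
    rank = percentile_rank(cases_by_county.values(), count)
    return f"{county}, {state}: {count} cases, more than {rank:.0f}% of counties"

message = county_status("County B", "State X")
print(message)
```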

 

 

 

Then I ran a linear regression analysis to find the relationship between the number of cases and the number of deaths, and to predict the number of deaths based on the number of cases. My model has a decent R-squared score, and after setting negative predictions to 0 (zero), it predicts the number of deaths to within fewer than 25 deaths of the real value on average. Following are the model measures, some predictions it made, and a plot of predictions vs. real values:

I added a tool that lets the user enter a number of cases and find out how many deaths are predicted.
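The two post-processing steps mentioned above, replacing negative predictions with zero and measuring the average miss, can be sketched as follows. The coefficients and the counts are made up for illustration, and "average miss" is assumed here to mean the mean absolute error.

```python
def predict_deaths(cases, slope, intercept):
    """Linear prediction, with negative outputs replaced by zero."""
    return [max(0.0, slope * c + intercept) for c in cases]

def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical fitted coefficients and held-out counties (made-up numbers).
slope, intercept = 0.03, -1.5
cases = [10, 100, 1000]
actual_deaths = [0, 2, 30]

predicted = predict_deaths(cases, slope, intercept)
mae = mean_absolute_error(actual_deaths, predicted)
print(predicted, round(mae, 3))
```

A negative intercept is what produces negative predictions for counties with very few cases, which is why the clamp to zero is needed.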

 

I also plotted a state-level map of the death rate in order to see how well states are doing at treating sick patients. In the following map, the colors represent the number of deaths as a share of the number of people who are sick. The greener the state, the closer it is to a 0% death rate; the redder the state, the closer it is to a 5% death rate:
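A green-to-red scale like the one described could be produced along these lines. This is a guess at the idea rather than the code behind the map: it assumes a straight linear blend between pure green and pure red, with anything at or above 5% shown as full red.

```python
def death_rate_color(rate_percent, max_rate=5.0):
    """Map a death rate in percent to a green-to-red hex color.

    0% gives pure green; max_rate (5% here) or above gives pure red.
    """
    t = min(max(rate_percent / max_rate, 0.0), 1.0)  # clamp to [0, 1]
    red = round(255 * t)
    green = round(255 * (1 - t))
    return f"#{red:02x}{green:02x}00"

for rate in (0.0, 1.0, 2.5, 5.0):
    print(rate, death_rate_color(rate))
```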

 

Posted in Uncategorized | Comments Off on Regression and Clustering Analysis by County

Covid19

Modeling Technique: Cluster Analysis

Legend: The clusters according to location separated by new cases, new deaths and total cases and total deaths.

David Huang

Data Source: https://ourworldindata.org/coronavirus-source-data

Last Updated: 4/8/2020

 

The problem I am trying to solve is how much new cases and new deaths have differed between particular locations since the outbreak began (or, in this case, the totals). Governments are trying new ways to contain this outbreak, but are they effective? This question lets us see whether the measures taken so far have had any impact; if they have not, new ideas and implementations are needed to solve this issue. Here, the locations with the fewest coronavirus cases are Germany, France and the UK. Their case counts are much lower than those of the US and China. I therefore suggest that the US and China try to implement measures similar to those used in these lower-case locations, or else devise new strategies to combat this virus.

Posted in Uncategorized | Comments Off on Covid19

COVID-19 Deaths And Lifestyle

This gallery contains 6 photos.

In the following linear regression models, I tried to determine if there was any correlation between COVID19 deaths and an unhealthy lifestyle. I gathered data on COVID-19 deaths up until April 7th, as well as 2018 lifestyle data from the … Continue reading

More Galleries | Comments Off on COVID-19 Deaths And Lifestyle

Kareem Wright Quiz

I’ve attached a link to my blogs@baruch site with my response to the quiz.

Covid-19 update and May 1st projections

 

Posted in Uncategorized | Comments Off on Kareem Wright Quiz

Underlying Conditions Analysis

This gallery contains 6 photos.

With reports about COVID coming out every hour declaring the death count, it is important to be informed about all the factors that affect the deadliness of this virus. This report will answer the question: “For whom is COVID the … Continue reading

More Galleries | Comments Off on Underlying Conditions Analysis

Covid-19 Visualization

Name: A.Yee

Legend: shown in the graph

Modeling Technique: Regression in Tableau and Time Series Bar Graph in Python

Data Source: https://github.com/nytimes/covid-19-data

Last Updated: April 1st, 2020

The question I am trying to answer is how quickly the virus can grow over time. This question is important because if we can figure out how much it can grow within a few days, we can more easily find a way to control its growth over the following weeks. As you can see from the Tableau dashboard, the regression predicts that there will be more cases and more deaths. This can help the city and its people understand that they need to act quickly to protect themselves and bring the spread under control. In the Python graphs, I compared deaths with cases; you can see that there are fewer deaths than cases. Even so, people should still be careful with the virus because of how quickly it can spread.
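One common way to put a number on "how quickly the virus can grow" is to fit a straight line to the logarithm of the case counts and convert the slope into a doubling time. The sketch below assumes roughly exponential growth and strictly positive counts; the data is synthetic and the function names are illustrative, not taken from the post's Tableau or Python work.

```python
import math

def growth_rate(counts):
    """Least-squares slope of ln(count) against day index (per-day growth rate)."""
    days = range(len(counts))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_d, mean_l = sum(days) / n, sum(logs) / n
    num = sum((d - mean_d) * (l - mean_l) for d, l in zip(days, logs))
    den = sum((d - mean_d) ** 2 for d in days)
    return num / den

def doubling_time(rate):
    """Days needed for the count to double at a constant growth rate."""
    return math.log(2) / rate

# Synthetic cumulative case counts that double every 3 days.
counts = [100 * 2 ** (day / 3) for day in range(10)]
r = growth_rate(counts)
print(f"growth rate {r:.3f} per day, doubling every {doubling_time(r):.1f} days")
```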

Posted in Uncategorized | Comments Off on Covid-19 Visualization

COVID Visualisation and Analysis: Focusing on R0


Author: Francis Yuan

Data Last Updated: 1 April 2020

Estimation of R0 for COVID-19

The R0 index for a disease in a given community is defined as the expected number of secondary infections generated by one case. Epidemiologists use this value to quantify the transmissibility of an infectious disease. For a given disease like COVID-19, R0 may vary across regions due to differences in demographic and socioeconomic factors. R0 doesn’t necessarily indicate how fast a disease spreads, partly because the duration of infection varies between diseases (for example, the R0 indices of AIDS and the common cold are in the same range, but their spreading patterns are vastly different). Still, R0 indices for a given disease in different regions may be a good indicator of relative spreading rates.

Based on The New York Times’ COVID-19 database, I used an R toolbox to calculate R0 for each state in the US.
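The post does not say which R toolbox or estimation method was used, so the following is not a reconstruction of it. It only illustrates two textbook relationships between an epidemic's exponential growth rate r and R0, given a mean generation time T. The inputs (0.2 per day, 5 days) are assumptions chosen for the example.

```python
import math

def r0_exponential_interval(rate, generation_time):
    """R0 = 1 + r * T, assuming exponentially distributed generation intervals."""
    return 1.0 + rate * generation_time

def r0_fixed_interval(rate, generation_time):
    """R0 = exp(r * T), assuming every generation interval equals T exactly."""
    return math.exp(rate * generation_time)

# Assumed inputs: growth of 0.2 per day and a 5-day mean generation time.
low = r0_exponential_interval(0.2, 5.0)
high = r0_fixed_interval(0.2, 5.0)
print(f"R0 between {low:.2f} and {high:.2f} under these two assumptions")
```

The gap between the two results shows how strongly an R0 estimate depends on what is assumed about the generation interval, which is one reason estimates for the same disease can differ.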

Figure 1: R0 and Total Cases of States in the US (click me to get the interactive version of this plot, same for plots below)

We can see that New Jersey and New York, which have the most cases, are among the highest in terms of R0. Interestingly, R0 is very high in Idaho and Indiana; it may be worthwhile to dig into this, which I haven’t been able to do yet.

Threat Analysis

Though almost everyone is susceptible to COVID-19, the elderly are particularly vulnerable: The fatality rate is much higher for patients above 65, and those patients are also more likely to develop severe symptoms. Moreover, a major threat this disease is posing is the pressure on the healthcare system: If the outbreak overwhelms the healthcare system, the fatality rate will sky-rocket and the social consequences will also be severe, as is happening in Italy and Spain.

To do a threat analysis, I acquired demographic data from the US Census Bureau and hospital capacity data from the National Center for Health Statistics. Combined with the R0 estimates, I was able to perform a threat analysis. The idea is: a community is at greater threat if COVID-19 is spreading faster (higher R0), if it has more elderly persons (higher percentage of persons above 65), or if its healthcare capacity is lower (fewer hospital beds per 1,000 people).

Based on these three variables, I performed a cluster analysis in Tableau. Fifty-one regions in the US (50 states plus DC) were divided into three clusters, which I manually labeled as high threat, medium threat, or moderate threat. For example, Florida has a large share of seniors in its population and COVID-19 is spreading faster there, so it’s classified as facing a high threat; DC has lots of hospital beds and the spread is slower, so the risk is relatively moderate.
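The three variables are measured in very different units, so a clustering on them is usually run on rescaled values. The sketch below shows one such rescaling, z-score standardization. It is an illustrative assumption, not a description of what Tableau does internally, and the three regions and their values are made up.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-scores: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up values for three hypothetical regions A, B, C.
r0 = [1.8, 2.0, 2.2]
pct_over_65 = [12.0, 16.0, 20.0]
beds_per_1000 = [4.0, 3.0, 2.0]

# Negate beds so that, on every axis, a larger value means a greater threat.
features = list(zip(
    standardize(r0),
    standardize(pct_over_65),
    [-z for z in standardize(beds_per_1000)],
))
for name, row in zip("ABC", features):
    print(name, [round(z, 2) for z in row])
```

The sign flip on hospital beds does not change the distances between regions; it only makes the axes easier to read when labeling clusters as higher or lower threat.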

Figure 2: COVID-19 Threat to States in the US


Figure 3: Map of COVID-19 Threat Level

This analysis may be able to provide some information about where medical resources are most needed and it may also be of use when state governments are evaluating their response to the disease.

Digging into the Variation of R0

R0 varies greatly across states, which makes it interesting to look into the causes of this variation. To do this, I ran several regressions in Python (code available upon request).

A natural idea would be that R0 may be related to population density, so I did a univariate regression.

Dependent Variable: Estimated R0
Variable | Coefficient | P-Value
Const | 1.9998 | 0
Population per square mile (2010) | -0.000003 | 0.8735
Observations: 49

(Note: Population density data from the US Census Bureau.)

In the regression above, the coefficient of population density is by no means significant. Statistically, there’s no detectable relation between population density and R0, which is a surprise. At first, this result made me wonder if my estimation of R0 was entirely wrong, but then I ran another regression and it started to make sense.

Dependent Variable: Estimated R0
Variable | Coefficient | P-Value
Const | 2.441 | 0
Vehicles per 1000 (2017) | -0.0005 | 0.0404
Observations: 49

(Note: Vehicle ownership data from Wikipedia.)

This regression shows that the more vehicles people own, the slower COVID-19 spreads, a result that is significant at the 5% level. Though the coefficient is small, the measure of vehicle ownership ranges from 539 to 1140, which can translate into a considerable difference in R0 of roughly 0.25 to 0.3 (the rounded coefficient gives 601 × 0.0005 ≈ 0.3). This result makes sense because if people have their own cars and often travel in them, they’ll come into contact with fewer people, hence a slower disease transmission rate.

I also noticed that the response to the disease is very different across states, and it occurred to me that part of R0’s variation might be explained by the political affiliation of states. Therefore, I added the 2016 presidential election results to the regression.

Dependent Variable: Estimated R0
Variable | Coefficient | P-Value
Const | 2.535042 | 0
Vehicles per 1000 (2017) | -0.0007 | 0.006
GOPWon | 0.1256 | 0.031
Observations: 49

(Note: GOPWon equals one if the Republicans won the state in 2016, zero otherwise. Data from GitHub.)

After adding this new variable, the coefficient for vehicle ownership became more significant, while the new variable itself is also significant at the 5% level. This suggests that in red states, COVID-19 spreads faster. Interpretation of this result should be done with care, as this result may well be driven by demographic or socioeconomic factors that are related to party support (i.e., adding those factors into the regression would make GOPWon insignificant).
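Since the regression code is only available on request, here is a minimal sketch of how a regression with one continuous variable and one 0/1 dummy can be fitted, using the normal equations in plain Python. The data is synthetic and generated from a known rule so that the fit can be checked. The helper names are illustrative, and p-values are left out because they would need a t-distribution, which the standard library does not provide.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients for y ~ const + columns of rows."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic states: (vehicles per 1000, GOPWon dummy); R0 follows a known rule.
states = [(600, 0), (700, 1), (800, 0), (900, 1), (1000, 0), (1100, 1)]
r0 = [2.5 - 0.0007 * v + 0.12 * g for v, g in states]
const, b_vehicles, b_gop = ols(states, r0)
print(round(const, 4), round(b_vehicles, 6), round(b_gop, 4))
```

The coefficient on the dummy is read as the shift in the fitted R0 between the two groups of states at the same level of vehicle ownership.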

Disclaimer

This passage is for information purposes only.

Analytical results in this passage are only preliminary. The validity of analytic methods used is not rigorously verified and this passage is subject to revisions. The author is by no means accountable for any conclusions drawn from this passage.

Posted in Uncategorized | Comments Off on COVID Visualisation and Analysis: Focusing on R0

COVID-19 Clustering – Rachel Abreu

Author: Rachel Abreu

Date Last Updated: April 5, 2020

The question I wanted to answer is: Which state has the most deaths due to COVID-19? Using clustering in Tableau, I was able to visualize this data (https://github.com/nytimes/covid-19-data/blob/master/us-states.csv). As you can see in this graph, New York has the most deaths due to COVID-19. I believe this question is important because it brings attention to which state needs the most help. Right now, New York needs the most assistance, given that many deaths are occurring and continue to occur as we speak. This data also encourages people, in particular New Yorkers, to stay home and practice self-isolation.

Posted in Uncategorized | Comments Off on COVID-19 Clustering – Rachel Abreu

Hello world!

Thank you for using Blogs@Baruch!

This site has been established for CIS 4170 Data Visualization students to apply various approaches, concepts, and principles to visualizing data that they have been studying for the purpose of gaining deeper insights into the 2020 pandemic.

The assignment is to mine data specific to the pandemic, use visualization methodologies to convert data into information, and then write a brief analytical commentary that transforms the information they have generated into knowledge. This process is aligned with the objectives of the class.

Posted in Uncategorized | 1 Comment