February 20, 2020
The Coronavirus: Easy Visualizations for Keeping Track of this Outbreak
By: Andrew Rosa
Using Data Visualizations to Track the Coronavirus Outbreak
By now, many people are aware of the Johns Hopkins University dashboard tracking the coronavirus (COVID-19) outbreak. According to CityLab, the dashboard had 52 million views as of January 31st. Our data analytics team at Timmons Group decided to prototype some new visualizations based on the daily data extracts that the University provides in its GitHub repository. Our approach was to create multiple visualizations in Tableau to help communicate trends within the data.
Above is an example built with Tableau Stories, which lets us combine all the dashboards into a single view, with one tab per dashboard along the top. Each tab is a separate dashboard that focuses on a single area of interest. For discussion purposes, we have also embedded each dashboard individually within this article. Read on for more insight into the data and to learn about the functionality developed in each dashboard. To view any visualization in full screen mode, use the full screen button in its bottom right corner.
Dashboard 1 – Confirmed Coronavirus Cases
Our first dashboard looks at the confirmed cases of the outbreak. Two types of maps are combined into one to show data at both the country and province levels. The choropleth map identifies which countries have confirmed cases of the virus; its color scale is based on the count of those cases at the country level, ranging from a bright gold at the lower end to a shade of red at the higher end. On top of the choropleth map we have overlaid a point layer, with each point sized by the number of confirmed cases at the province level. This gives the viewer an easy way to understand both the total count for a country and where the cases are located within it.
To the left of the map, we present the grand total of confirmed cases, followed by a table with the totals per country. To see these totals at the province level, hover over the attribute title “Country Region” at the top of the table to reveal a small “+” icon; clicking this icon expands the table to show the country and province hierarchy. To zoom in on a single country or province on the map, select its row in the table. For example, clicking “China” in the table automatically zooms the map to China and filters out the rest of the data. Since this dashboard is the first tab in the Tableau story, we have provided a “last updated” date in the bottom left corner to tell the user when the data was last retrieved from Johns Hopkins University’s GitHub data repository.
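The two map layers rest on two levels of aggregation of the same rows: country totals drive the choropleth color, and province counts size the points. The Python sketch below shows one way to compute both, plus the row selection that mimics clicking the table. The row layout `(country, province, confirmed)`, the function names, and the counts are hypothetical illustrations, not actual Johns Hopkins figures or the Tableau logic behind our dashboard.

```python
from collections import defaultdict

# Illustrative rows for one day: (country, province, confirmed).
# Counts are made up for the example; they are not real case numbers.
ROWS = [
    ("China", "Hubei", 100),
    ("China", "Guangdong", 20),
    ("US", "Washington", 1),
    ("US", "Illinois", 2),
    ("France", "", 5),
]


def country_totals(rows):
    """Sum province-level counts per country (the value behind the choropleth color)."""
    totals = defaultdict(int)
    for country, _province, confirmed in rows:
        totals[country] += confirmed
    return dict(totals)


def province_points(rows):
    """Province-level counts for the sized point layer.

    Rows without a province fall back to the country name as the point label.
    """
    return {(country, province or country): confirmed for country, province, confirmed in rows}


def select_country(rows, country):
    """Mimic clicking a table row: keep only the selected country's rows."""
    return [row for row in rows if row[0] == country]


totals = country_totals(ROWS)
grand_total = sum(totals.values())
china_only = country_totals(select_country(ROWS, "China"))
```

Keeping the province rows as the single source and deriving country totals from them means the table, the choropleth, and the grand total always agree with one another.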
Dashboard 2 – Confirmed Cases Over Time
The second dashboard provides multiple time series plots highlighting confirmations over time. The first chart shows the cumulative sum of confirmed cases per day. As in Johns Hopkins University’s dashboard, we provide one time series for China and one for all other countries. The next visualization shows the daily rate of change in new confirmed cases. To the right, a bar chart shows the number of new cases per day. Here we see fewer and fewer new confirmations per day from February 2nd to February 12th, followed by a huge spike on February 13th. This spike is due to a change in how China reported its data (Hubei province began counting clinically diagnosed cases), and it is a reminder that changes in reporting methods can distort trends. To the left, we provide the same table of total confirmations per country as in the first dashboard. Selecting a country or province in this table filters all other visuals in the dashboard, so the viewer can take a deeper dive into the numbers for specific countries or provinces.
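Both the new-cases bars and the rate-of-change line can be derived from the cumulative series. A minimal Python sketch, assuming a plain list of cumulative daily totals; here "rate of change" is read as the day-over-day percent growth in cumulative confirmations, which is an assumption about the chart's exact definition, and the numbers are toy values.

```python
def new_cases(cumulative):
    """Daily new cases from a cumulative series; the first day counts as all new."""
    if not cumulative:
        return []
    return [cumulative[0]] + [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]


def daily_growth_rate(cumulative):
    """Day-over-day percent growth of the cumulative total.

    Returns None for the first day and for any day whose previous total is zero,
    since the percentage is undefined there.
    """
    rates = [None] if cumulative else []
    for prev, cur in zip(cumulative, cumulative[1:]):
        rates.append(None if prev == 0 else (cur - prev) / prev * 100)
    return rates


# Toy cumulative confirmations for four consecutive days.
series = [10, 30, 60, 70]
print(new_cases(series))
print(daily_growth_rate(series))
```

A reporting change like the one on February 13th shows up in both outputs at once, because a single jump in the cumulative series becomes one large bar and one large growth rate.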
Dashboard 3 – Reported Deaths
Our third dashboard highlights the number of deaths that have been reported. A map is provided with a color scale ranging from blue at the low end to red at the high end. Well over 95% of the deaths recorded at the time of this post have occurred in China. Only within the past week (the second week of February) has a death related to the virus occurred outside Asia, in France; that case reportedly involved a Chinese tourist. Below the map is a time series showing the cumulative sum of deaths per day, and to the left is a table showing the totals per country and province. As in our first two dashboards, selecting a country or province in the table filters the other visuals. While the Johns Hopkins University dashboard provides a total count of deaths and a very similar table, it does not provide a map or a time series like the one above.
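Two of the figures in this dashboard reduce to simple arithmetic: each country's share of total deaths (the basis for a statement like "well over 95%") and the running total plotted in the time series. A hedged Python sketch; the country labels are placeholders and the counts are made up, not reported data.

```python
from itertools import accumulate


def death_share(deaths_by_country):
    """Percentage of all reported deaths that occurred in each country."""
    total = sum(deaths_by_country.values())
    if total == 0:
        return {country: 0.0 for country in deaths_by_country}
    return {country: deaths * 100 / total for country, deaths in deaths_by_country.items()}


def cumulative_deaths(daily_deaths):
    """Running total of deaths, as plotted in the cumulative time series."""
    return list(accumulate(daily_deaths))


# Placeholder countries and made-up counts, for illustration only.
share = death_share({"Country A": 196, "Country B": 2, "Country C": 2})
running = cumulative_deaths([3, 5, 0, 8])
```

Multiplying before dividing keeps whole-number percentages exact in floating point, which makes threshold checks such as "above 95%" less fragile.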
Dashboard 4 – Deaths and Recoveries
The final dashboard ends on a more positive note. Here we compare the number of deaths to the number of recoveries, and the trend looks favorable. Our table provides the totals for deaths and recoveries as well as the ratio of deaths to recoveries. We also provide two time series. The first shows the cumulative sums of recoveries in green and deaths in light grey; recoveries are far outpacing deaths. The second uses the death-to-recovery ratio to show this change over time. Each day is colored red when the ratio is above 1 (more deaths than recoveries) and purple when it is below 1 (more recoveries than deaths). In this visualization, we can see large spikes back in January, when the world was just learning of the virus. Since the beginning of February, the trend has become encouraging, with the ratio moving closer to zero.
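The coloring rule of the ratio chart can be written as a small function. This Python sketch assumes the ratio is computed from each day's cumulative deaths and cumulative recoveries, and it leaves days with no recoveries (ratio undefined) or a ratio of exactly 1 uncolored; both choices are assumptions, not necessarily how our Tableau calculation handles them. The totals are toy values.

```python
def death_recovery_ratio(deaths, recoveries):
    """Deaths divided by recoveries; None when there are no recoveries yet."""
    return None if recoveries == 0 else deaths / recoveries


def ratio_color(ratio):
    """Red above 1 (more deaths), purple below 1 (more recoveries).

    Undefined ratios and a ratio of exactly 1 get no color; this is an assumption.
    """
    if ratio is None or ratio == 1:
        return None
    return "red" if ratio > 1 else "purple"


# Toy cumulative totals per day (not real data).
cum_deaths = [17, 25, 41, 50]
cum_recoveries = [0, 10, 30, 60]
ratios = [death_recovery_ratio(d, r) for d, r in zip(cum_deaths, cum_recoveries)]
colors = [ratio_color(x) for x in ratios]
```

Small recovery counts early in an outbreak make this ratio very sensitive, which is consistent with the large January spikes described above.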
Dashboard Walkthrough with Data Scientist Andrew Rosa
Johns Hopkins University has done a lot of hard work to bring together data from multiple sources. Our dashboard feeds off the data they have already aggregated, and we have identified some opportunities for improvement. The most glaring one is the Province/State attribute: while many entries in this field are a province or a state, some are cities, such as Boston, MA. We plan to clean up some of these data deficiencies in our ETL (Extract, Transform, Load) pipeline in the near future. Our current ETL pipeline downloads the data as multiple spreadsheet files from Johns Hopkins University’s GitHub repository, calculates additional features, and then loads the data into a relational database. The step before loading into our database is where we will be able to do additional data cleansing. We will also look to enhance the dashboards with more data, such as population data, and other data that directly relates to the outbreak, such as the ages and gender of those who have been infected.
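The cleanup and loading steps described above might look roughly like the following standard-library Python sketch. It assumes daily-report columns named `Province/State`, `Country/Region`, `Confirmed`, `Deaths`, and `Recovered`, reads the CSV from a string instead of downloading it, uses SQLite as a stand-in for our relational database, and maps city entries through a hypothetical abbreviation table (`STATE_NAMES`). The `active` column is one example of a derived feature, not necessarily one our pipeline computes, and the sample counts are made up.

```python
import csv
import io
import sqlite3

# Hypothetical lookup for US state abbreviations seen in "City, ST" entries.
# A real pipeline would need a complete table (or a geocoder).
STATE_NAMES = {"MA": "Massachusetts", "WA": "Washington", "IL": "Illinois"}


def normalize_province(province):
    """Map a 'City, ST' entry such as 'Boston, MA' to its state; leave others as-is."""
    parts = [part.strip() for part in province.split(",")]
    if len(parts) == 2 and parts[1] in STATE_NAMES:
        return STATE_NAMES[parts[1]]
    return province.strip()


def to_int(value):
    """Treat blank or missing counts as zero."""
    return int(value) if value and value.strip() else 0


def transform(reader):
    """Clean each row and add a derived feature before loading."""
    for row in reader:
        confirmed = to_int(row["Confirmed"])
        deaths = to_int(row["Deaths"])
        recovered = to_int(row["Recovered"])
        yield (
            row["Country/Region"].strip(),
            normalize_province(row["Province/State"]),
            confirmed,
            deaths,
            recovered,
            confirmed - deaths - recovered,  # derived feature: active cases
        )


def load(conn, records):
    """Insert cleaned records into a relational table (SQLite stands in here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_report ("
        "country TEXT, province TEXT, confirmed INTEGER, "
        "deaths INTEGER, recovered INTEGER, active INTEGER)"
    )
    conn.executemany("INSERT INTO daily_report VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()


def run_etl(csv_text, conn):
    """Extract (from a string here rather than GitHub), transform, and load."""
    load(conn, transform(csv.DictReader(io.StringIO(csv_text))))


# Made-up sample in the assumed column layout.
SAMPLE = """Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
Hubei,China,2020-02-19T00:00:00,100,5,10
"Boston, MA",US,2020-02-19T00:00:00,1,0,1
,France,2020-02-19T00:00:00,12,1,
"""

conn = sqlite3.connect(":memory:")
run_etl(SAMPLE, conn)
```

Doing the normalization inside `transform` keeps the database free of city-level rows, so every downstream query and dashboard sees one consistent geographic level.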
Our data analytics team at Timmons Group has really enjoyed iterating on and prototyping visualizations and dashboards using the regularly updated Coronavirus COVID-19 data. Our team has many other examples of compelling analytics, visualizations, dashboards, data discoveries, and applications across many industries that we’d love to share. If you are interested in hearing more, please email us for more information.