Importing and preparing the data

We will be looking at data from the following countries:

  • Italy
  • Austria
  • Germany
  • Belgium
  • France
  • United Kingdom
  • Switzerland

We begin by importing the data and calculating some new features so that we can compare countries with very different population sizes. For example, we calculate 'confirmed cases per 100k population', 'deaths per 100k' and 'new cases', since these are not in the original dataset. We also compute deaths as a percentage of confirmed cases, tests per 100k and vaccine doses per person.

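from covid19dh import covid19
import altair as alt
import datetime

# Countries to download (administrative level 1 = whole country)
countries = ["Italy", 
             "Austria",
             "Germany",
             "Belgium",
             "France",
             "United Kingdom",
             "Switzerland"
            ]

# Request data up to yesterday (today's figures may not be available yet)
yesterday = datetime.date.today() - datetime.timedelta(days=1)

# x holds the data, src lists the data sources
x, src = covid19(countries, raw=True, verbose=False, end=yesterday, cache=False)

# Keep only the columns we need; .copy() avoids pandas' SettingWithCopyWarning
x_small = x.loc[:, ['administrative_area_level_1', 'date', 'vaccines', 'confirmed', 'tests', 'recovered', 'deaths', 'population']].copy()
x_small.rename(columns={'administrative_area_level_1': 'id'}, inplace=True)

# Per-capita features
x_small['confirmed_per'] = 100000 * x_small['confirmed'] / x_small['population']  # cases per 100k
x_small['deaths_per'] = 100000 * x_small['deaths'] / x_small['population']        # deaths per 100k
x_small['ratio'] = 100 * x_small['deaths'] / x_small['confirmed']                  # deaths as % of confirmed cases
x_small['tests_per'] = 100000 * x_small['tests'] / x_small['population']          # tests per 100k
x_small['vaccines_per'] = x_small['vaccines'] / x_small['population']             # vaccine doses per person (not per 100k)

# Daily new cases: difference of the cumulative counts within each country.
# diff() assumes rows are in date order within each country.
x_small['new_cases'] = x_small.groupby('id').confirmed.diff().fillna(0)
x_small['new_cases_per'] = x_small.groupby('id').confirmed_per.diff().fillna(0)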
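The two less obvious steps above are the per-capita scaling and the per-country differencing. The sketch below reproduces them on a tiny, made-up DataFrame (the country names are real but every number is invented for illustration), so the logic can be checked without downloading anything. Unlike the code above, it sorts explicitly by country and date before differencing, since `diff()` only makes sense when each country's rows are in date order.

```python
import pandas as pd

# Toy data: cumulative confirmed cases for two countries over three days.
# All numbers are invented purely for illustration.
df = pd.DataFrame({
    "id": ["Italy", "Italy", "Italy", "Austria", "Austria", "Austria"],
    "date": pd.to_datetime(["2021-09-01", "2021-09-02", "2021-09-03"] * 2),
    "confirmed": [100, 130, 150, 40, 45, 55],
    "population": [1_000_000] * 3 + [500_000] * 3,
})

# diff() must see each country's rows in date order.
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Cumulative cases per 100,000 inhabitants.
df["confirmed_per"] = 100_000 * df["confirmed"] / df["population"]

# Daily new cases: difference from the previous day within the same country.
# The first day of each country has no predecessor, so diff() gives NaN -> 0.
df["new_cases"] = df.groupby("id")["confirmed"].diff().fillna(0)

print(df[["id", "date", "confirmed", "confirmed_per", "new_cases"]])
```

Without the `groupby`, `diff()` would subtract Austria's last value from Italy's first one and report a spurious "new cases" figure at every boundary between countries.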

Here are the last 5 rows of the dataset.

x_small.tail()
id date vaccines confirmed tests recovered deaths population confirmed_per deaths_per ratio tests_per vaccines_per new_cases new_cases_per
53599 Italy 2021-09-10 80587942.0 4596558.0 86852737.0 4338241.0 129828.0 60421760 7607.454665 214.869610 2.824461 143744.136218 1.333757 5617.0 9.296320
53600 Italy 2021-09-11 80791152.0 4601749.0 87186478.0 4344238.0 129885.0 60421760 7616.045941 214.963947 2.822514 144296.488550 1.337120 5191.0 8.591276
53601 Italy 2021-09-12 80922217.0 4606413.0 87453836.0 4349160.0 129919.0 60421760 7623.765014 215.020218 2.820394 144738.974833 1.339289 4664.0 7.719073
53602 Italy 2021-09-13 81176739.0 4609205.0 87573881.0 4353346.0 129955.0 60421760 7628.385866 215.079799 2.819467 144937.653256 1.343502 2792.0 4.620852
53603 Italy 2021-09-14 81409122.0 4613214.0 87892474.0 4360847.0 130027.0 60421760 7635.020893 215.198961 2.818577 145464.935149 1.347348 4009.0 6.635027
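`tail()` always returns the final rows, which here all come from a single country. If a genuinely random sample is wanted, pandas' `DataFrame.sample` draws rows at random without replacement, and passing `random_state` makes the draw reproducible. A minimal sketch on a small stand-in DataFrame (invented values, not the COVID-19 data):

```python
import pandas as pd

# Stand-in for x_small: 20 rows with invented values.
df = pd.DataFrame({"id": ["A", "B"] * 10, "confirmed": range(20)})

# Five rows drawn at random without replacement; random_state fixes the draw.
sample = df.sample(n=5, random_state=0)
print(sample)
```

With the real data, `x_small.sample(5, random_state=0)` would take the place of `x_small.tail()`.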

Plotting the data

We will first look at the total numbers of cases and deaths in each country, before moving on to cases and deaths per 100k population.

In each of the charts below, you can click on a country in the legend to highlight its line (shift-click to select several countries); the other lines are faded rather than removed.

Total cases per 100,000

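# Clicking a legend entry selects that country (shift-click adds more).
# selection_multi/add_selection are the Altair 4 names; Altair 5 deprecates
# them in favour of selection_point/add_params.
leg_selection = alt.selection_multi(fields=['id'], bind='legend')

alt.Chart(x_small).mark_line().encode(
    x=alt.X("yearmonthdate(date):T", axis=alt.Axis(title='Date')),
    y=alt.Y("confirmed_per:Q", axis=alt.Axis(title='Confirmed per 100k')),
    tooltip=['id', 'confirmed_per'],
    color=alt.Color('id:N', legend=alt.Legend(title="Countries")),
    # Selected countries are drawn fully opaque, the others are faded
    opacity=alt.condition(leg_selection, alt.value(1), alt.value(0.2))
).add_selection(leg_selection).properties(
    title='Total number of cases per 100,000 population for selected European countries',
    width=600
).interactive()  # enables pan and zoom on the axes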