Note: If you just want to see the graphs, use the link in the table of contents to jump to the last sections!

UK Covid19 API

In this post we will explore the data available from the UK government's Covid API (the Python version), which can be found on their website. Let's import the Python module first and check out the main class.

from uk_covid19 import Cov19API

import numpy as np
import pandas as pd
import altair as alt

Collapse the following output to see the help documentation for Cov19API. It tells us that it is a class with the parameters filters (an iterable of strings), structure (a dictionary with string keys and string or dict values) and latest_by (a string or None). The class also has a method called get_dataframe() which returns the data as a pandas DataFrame.

help(Cov19API)

Help on class Cov19API in module uk_covid19.api_interface:

class Cov19API(builtins.object)
 |  Cov19API(filters: Iterable[str], structure: Dict[str, Union[dict, str]], latest_by: Union[str, NoneType] = None)
 |  
 |  Interface to access the API service for COVID-19 data in the United Kingdom.
 |  
 |  Parameters
 |  ----------
 |  filters: Iterable[str]
 |      API filters. See the API documentations for additional
 |      information.
 |  
 |  structure: Dict[str, Union[dict, str]]
 |      Structure parameter. See the API documentations for
 |      additional information.
 |  
 |  latest_by: Union[str, None]
 |      Retrieves the latest value for a specific metric. [Default: ``None``]
 |  
 |  Methods defined here:
 |  
 |  __init__(self, filters: Iterable[str], structure: Dict[str, Union[dict, str]], latest_by: Union[str, NoneType] = None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__ = __str__(self)
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  get_csv(self, save_as=None) -> str
 |      Provides full data (all pages) in CSV.
 |      
 |      .. warning::
 |      
 |          Please make sure that the ``structure`` is not hierarchical as
 |          CSV outputs are defined as 2D tables and as such, do not support
 |          hierarchies.
 |      
 |      Parameters
 |      ----------
 |      save_as: Union[str, None]
 |          If defined, the results will (also) be saved as a
 |          file. [Default: ``None``]
 |      
 |          The value must be a path to a file with the correct
 |          extension -- i.e. ``.csv`` for CSV).
 |      
 |      Returns
 |      -------
 |      str
 |      
 |      Raises
 |      ------
 |      ValueError
 |          If the structure is nested.
 |      
 |      Examples
 |      --------
 |      >>> filters = ["areaType=region"]
 |      >>> structure = {
 |      ...     "name": "areaName",
 |      ...     "newCases": "newCasesBySpecimenDate"
 |      ... }
 |      >>> data = Cov19API(
 |      ...     filters=filters,
 |      ...     structure=structure,
 |      ...     latest_by='newCasesBySpecimenDate'
 |      ... )
 |      >>> result = data.get_csv()
 |      >>> print(result)
 |      name,newCases
 |      East Midlands,0
 |      ...
 |  
 |  get_dataframe(self)
 |      Provides the data as as ``pandas.DataFrame`` object.
 |      
 |      .. versionadded:: 1.2.0
 |      
 |      .. warning::
 |      
 |          The ``pandas`` library is not included in the dependencies of this
 |          library and must be installed separately.
 |      
 |      Returns
 |      -------
 |      DataFrame
 |      
 |      Raises
 |      ------
 |      ImportError
 |          If the ``pandas`` library is not installed.
 |  
 |  get_json(self, save_as: Union[str, NoneType] = None, as_string: bool = False) -> Union[dict, str]
 |      Provides full data (all pages) in JSON.
 |      
 |      Parameters
 |      ----------
 |      save_as: Union[str, None]
 |          If defined, the results will (also) be saved as a
 |          file. [Default: ``None``]
 |      
 |          The value must be a path to a file with the correct
 |          extension -- i.e. ``.json`` for JSON).
 |      
 |      as_string: bool
 |          .. versionadded:: 1.1.4
 |      
 |          If ``False`` (default), returns the data as a dictionary.
 |          Otherwise, returns the data as a JSON string.
 |      
 |      Returns
 |      -------
 |      Union[Dict, str]
 |      
 |      Examples
 |      --------
 |      >>> filters = ["areaType=region"]
 |      >>> structure = {
 |      ...     "name": "areaName",
 |      ...     "newCases": "newCasesBySpecimenDate"
 |      ... }
 |      >>> data = Cov19API(
 |      ...     filters=filters,
 |      ...     structure=structure,
 |      ...     latest_by='newCasesBySpecimenDate'
 |      ... )
 |      >>> result = data.get_json()
 |      >>> print(result)
 |      {'data': [{'name': 'East Midlands', 'newCases': 0}, ... }
 |  
 |  get_xml(self, save_as=None, as_string=False) -> xml.etree.ElementTree.Element
 |      Provides full data (all pages) in XML.
 |      
 |      Parameters
 |      ----------
 |      save_as: Union[str, None]
 |          If defined, the results will (also) be saved as a
 |          file. [Default: ``None``]
 |      
 |          The value must be a path to a file with the correct
 |          extension -- i.e. ``.xml`` for XML).
 |      
 |      as_string: bool
 |          .. versionadded:: 1.1.4
 |      
 |          If ``False`` (default), returns an ``ElementTree``
 |          object. Otherwise, returns the data as an XML string.
 |      
 |      Returns
 |      -------
 |      xml.etree.ElementTree.Element
 |      
 |      Examples
 |      --------
 |      >>> from xml.etree.ElementTree import tostring
 |      >>> filters = ["areaType=region"]
 |      >>> structure = {
 |      ...     "name": "areaName",
 |      ...     "newCases": "newCasesBySpecimenDate"
 |      ... }
 |      >>> data = Cov19API(
 |      ...     filters=filters,
 |      ...     structure=structure,
 |      ...     latest_by='newCasesBySpecimenDate'
 |      ... )
 |      >>> result_xml = data.get_xml()
 |      >>> result_str = tostring(result_xml, encoding='unicode', method='xml')
 |      >>> print(result_str)
 |      <document>
 |          <data>
 |              <name>East Midlands</name>
 |              <newCases>0</newCases>
 |          </data>
 |          ...
 |      </document>
 |  
 |  head(self)
 |      Request header for the given input arguments (``filters``,
 |      ``structure``, and ``lastest_by``).
 |      
 |      Returns
 |      -------
 |      Dict[str, str]
 |      
 |      Examples
 |      --------
 |      >>> filters = ["areaType=region"]
 |      >>> structure = {
 |      ...     "name": "areaName",
 |      ...     "newCases": "newCasesBySpecimenDate"
 |      ... }
 |      >>> data = Cov19API(
 |      ...     filters=filters,
 |      ...     structure=structure,
 |      ...     latest_by='newCasesBySpecimenDate'
 |      ... )
 |      >>> head = data.head()
 |      >>> print(head)
 |      {'Cache-Control': 'public, max-age=60', 'Content-Length': '0',
 |       ...
 |      }
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  get_release_timestamp() -> str
 |      :staticmethod:
 |          Produces the website timestamp in GMT.
 |      
 |      .. versionadded:: 1.2.0
 |      
 |      This property supplies the website timestamp - i.e. the time at which the data
 |      were released to the API and by extension the website. Please note that there
 |      will be a difference between this timestamp and the timestamp produced using
 |      the ``last_update`` property. The latter signifies the time at which the data
 |      were deployed to the database, not the time at which they were released.
 |      
 |      .. note::
 |      
 |          The output is extracted from the header and is accurate to
 |          the miliseconds.
 |      
 |      .. warning::
 |      
 |          The ISO-8601 standard requires a ``"Z"`` character to be added
 |          to the end of the timestamp. This is a timezone feature and is
 |          not recognised by Python's ``datetime`` library. It is, however,
 |          most other libraries; e.g. ``pandas``. If you wish to parse the
 |          timestamp using the the ``datetime`` library, make sure that you
 |          remove the trailing ``"Z"`` character.
 |          
 |      Returns
 |      -------
 |      str
 |          Timestamp, formatted as ISO-8601.
 |      
 |      Examples
 |      --------
 |      >>> release_timestamp = Cov19API.get_release_timestamp()
 |      >>> print(release_timestamp)
 |      2020-08-08T15:00:09.977840Z
 |      
 |      >>> from datetime import datetime
 |      >>> release_timestamp = Cov19API.get_release_timestamp()
 |      >>> parsed_timestamp = datetime.fromisoformat(release_timestamp.strip("Z"))
 |      >>> print(parsed_timestamp)
 |      2020-08-08 15:00:09
 |  
 |  options()
 |      :staticmethod:
 |          Provides the options by calling the ``OPTIONS`` method of the API.
 |      
 |      Returns
 |      -------
 |      dict
 |          API options.
 |      
 |      Examples
 |      --------
 |      >>> from pprint import pprint
 |      >>> options = Cov19API.options()
 |      >>> pprint(options)
 |      {'info': {'description': "Public Health England's Coronavirus Dashboard API",
 |       'title': 'Dashboard API',
 |       'version': '1.0'},
 |       'openapi': '3.0.1',
 |        ...
 |      }
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  api_params
 |      :staticmethod:
 |          API parameters, constructed based on ``filters``, ``structure``,
 |          and ``latest_by`` arguments as defined by the user.
 |      
 |      Returns
 |      -------
 |      Dict[str, str]
 |  
 |  last_update
 |      :property:
 |          Produces the timestamp for the last update in GMT.
 |      
 |      This property supplies the API time - i.e. the time at which the data were
 |      deployed to the database. Please note that there will always be a difference
 |      between this time and the timestamp that is displayed on the website, which may
 |      be accessed via the ``.get_release_timestamp()`` method. The website timestamp
 |      signifies the time at which the data were release to the API, and by extension
 |      the website.
 |      
 |      .. note::
 |      
 |          The output is extracted from the header and is accurate to
 |          the second.
 |          
 |      .. warning::
 |      
 |          The ISO-8601 standard requires a ``"Z"`` character to be added
 |          to the end of the timestamp. This is a timezone feature and is
 |          not recognised by Python's ``datetime`` library. It is, however,
 |          most other libraries; e.g. ``pandas``. If you wish to parse the
 |          timestamp using the the ``datetime`` library, make sure that you
 |          remove the trailing ``"Z"`` character.
 |      
 |      Returns
 |      -------
 |      str
 |          Timestamp, formatted as ISO-8601.
 |      
 |      Examples
 |      --------
 |      >>> filters = ["areaType=region"]
 |      >>> structure = {
 |      ...     "name": "areaName",
 |      ...     "newCases": "newCasesBySpecimenDate"
 |      ... }
 |      >>> data = Cov19API(
 |      ...     filters=filters,
 |      ...     structure=structure,
 |      ...     latest_by='newCasesBySpecimenDate'
 |      ... )
 |      >>> timestamp = data.last_update
 |      >>> print(timestamp)
 |      2020-07-27T20:29:16.000000Z
 |      
 |      >>> from datetime import datetime
 |      >>> parsed_timestamp = datetime.fromisoformat(timestamp.strip("Z"))
 |      >>> print(parsed_timestamp)
 |      2020-07-27 20:29:16
 |  
 |  total_pages
 |      :property:
 |          Produces the total number of pages for a given set of
 |          parameters (only after the data are requested).
 |      
 |      Returns
 |      -------
 |      Union[int, None]
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'_last_update': typing.Union[str, NoneType], '_tota...
 |  
 |  endpoint = 'https://api.coronavirus.data.gov.uk/v1/data'
 |  
 |  release_timestamp_endpoint = 'https://api.coronavirus.data.gov.uk/v1/t...

So now we need to define two things: the filters and the structure.

Data Filters

The filters tell the API what kind of area we would like data about. Each filter is a "metric=value" string, and the valid filter metrics are listed below (an example of combining them follows the list):

List of valid filters:

  • areaType: Area type as string
  • areaName: Area name as string
  • areaCode: Area code as string
  • date: Date as string [YYYY-MM-DD]
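
Filters can be combined by adding more "metric=value" strings to the list. As a sketch (the date here is just an illustrative value), a request for every nation on a single day could be filtered like this:

filter_nations_snapshot = [
    "areaType=nation",
    "date=2020-12-01"
]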

We must specify the areaType, so we will set it to nation. This gives us the data at the country level - the totals for Wales, Scotland, Northern Ireland and England.

filter_all_nations = [
    "areaType=nation"
]
filter_all_uk = [
    "areaType=overview"
]

Other options for areaType (an example of using one follows the list):

  • overview: Overview data for the whole UK
  • region: Region data (regions of England only)
  • nhsregion: NHS region data (England only)
  • utla: Upper-tier local authority data (again, England only)
  • ltla: Lower-tier local authority data (...England only)
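
So, for instance, a request for a single lower-tier local authority might use a filter like this (a sketch; 'Cardiff' is just an illustrative area name):

filter_cardiff = [
    "areaType=ltla",
    "areaName=Cardiff"
]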

Data Structure

The structure parameter specifies which metrics we want in the returned data. There are a lot of them, but the main ones are areaName, date and newCasesByPublishDate. Click the arrow below to expand the full list of valid metrics (a note on nested structures follows the list).

See a list of valid metrics for structure:

  • areaType: Area type as string
  • areaName: Area name as string
  • areaCode: Area code as string
  • date: Date as string [YYYY-MM-DD]
  • hash: Unique ID as string
  • newCasesByPublishDate: New cases by publish date
  • cumCasesByPublishDate: Cumulative cases by publish date
  • cumCasesByPublishDateRate: Rate of cumulative cases by publish date per 100k resident population
  • newCasesBySpecimenDate: New cases by specimen date
  • cumCasesBySpecimenDateRate: Rate of cumulative cases by specimen date per 100k resident population
  • cumCasesBySpecimenDate: Cumulative cases by specimen date
  • maleCases: Male cases (by age)
  • femaleCases: Female cases (by age)
  • newPillarOneTestsByPublishDate: New pillar one tests by publish date
  • cumPillarOneTestsByPublishDate: Cumulative pillar one tests by publish date
  • newPillarTwoTestsByPublishDate: New pillar two tests by publish date
  • cumPillarTwoTestsByPublishDate: Cumulative pillar two tests by publish date
  • newPillarThreeTestsByPublishDate: New pillar three tests by publish date
  • cumPillarThreeTestsByPublishDate: Cumulative pillar three tests by publish date
  • newPillarFourTestsByPublishDate: New pillar four tests by publish date
  • cumPillarFourTestsByPublishDate: Cumulative pillar four tests by publish date
  • newAdmissions: New admissions
  • cumAdmissions: Cumulative number of admissions
  • cumAdmissionsByAge: Cumulative admissions by age
  • cumTestsByPublishDate: Cumulative tests by publish date
  • newTestsByPublishDate: New tests by publish date
  • covidOccupiedMVBeds: COVID-19 occupied beds with mechanical ventilators
  • hospitalCases: Hospital cases
  • plannedCapacityByPublishDate: Planned capacity by publish date
  • newDeaths28DaysByPublishDate: Deaths within 28 days of positive test
  • cumDeaths28DaysByPublishDate: Cumulative deaths within 28 days of positive test
  • cumDeaths28DaysByPublishDateRate: Rate of cumulative deaths within 28 days of positive test per 100k resident population
  • newDeaths28DaysByDeathDate: Deaths within 28 days of positive test by death date
  • cumDeaths28DaysByDeathDate: Cumulative deaths within 28 days of positive test by death date
  • cumDeaths28DaysByDeathDateRate: Rate of cumulative deaths within 28 days of positive test by death date per 100k resident population
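
One more note: the get_csv warning in the help output above refers to hierarchical structures, which implies the structure can also be nested. A sketch of what that might look like (an assumption based on that warning; we will stick to a flat structure, which is what we want for a tidy DataFrame):

structure_nested = {
    "date": "date",
    "areaName": "areaName",
    "cases": {
        "new": "newCasesByPublishDate",
        "cumulative": "cumCasesByPublishDate"
    }
}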

We will look at new cases by publish date, cumulative cases by specimen date (and their rate per 100k), and new deaths by death date, so the structure will look like this:

structure_cases_death = {
    "date": "date",
    "areaName": "areaName",
    "newCases": "newCasesByPublishDate",
    "cumCases": "cumCasesBySpecimenDate",
    "cumCasesRate": "cumCasesBySpecimenDateRate",
    "newDeaths": "newDeathsByDeathDate"
}
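
A side note: the help text above also documents a latest_by parameter, which restricts the response to the most recent value of a given metric. A minimal sketch using our filters and structure (not used in the rest of this post):

latest_cases = Cov19API(
    filters=filter_all_nations,
    structure=structure_cases_death,
    latest_by="newCasesByPublishDate"  # only the most recent newCasesByPublishDate figures
).get_dataframe()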

Pulling and cleaning data

Now we instantiate the class and get the DataFrame from it. We also use fillna(0) to fill any NaN entries - NaN being what we get when a value is missing.

uk_cases = Cov19API(filters=filter_all_nations,
                    structure=structure_cases_death).get_dataframe().fillna(0)

uk_cases['date'] = pd.to_datetime(uk_cases['date'], format='%Y-%m-%d')
uk_cases.sort_values(['areaName', 'date'], inplace=True)
uk_cases.reset_index(drop=True, inplace=True)
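
It is also worth recording when the data we have just pulled were released. The help text above documents the static get_release_timestamp() method and the last_update property; a small sketch of using them (the api object here is just for illustration):

api = Cov19API(filters=filter_all_nations, structure=structure_cases_death)
print(Cov19API.get_release_timestamp())  # time the data were released to the website/API
print(api.last_update)                   # time the data were deployed to the database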

Note that the Welsh Government announced that 11,000 cases were missing from the period between the 9th and 15th of December. This explains the large spike after the 17th of December, and also the dip in cases before that. See this BBC article and the related announcement by Public Health Wales about how they are changing the way they report cases.

In the data from the COVID19 API, all 11,000 cases are allocated to the 17th of December. To smooth this out, we will redistribute the cases over the 17th and the preceding four days, spacing them between the counts reported on either side of that window. This may not be the most accurate way of doing it, but it gives the cleanest picture when it comes to plotting the graphs.

# Dates to spread the Welsh spike over (the 17th plus the four preceding days)
date_list = ['2020-12-13', '2020-12-14',
             '2020-12-15', '2020-12-16', '2020-12-17']

# Overwrite the newCases values (column index 2) for Wales on those dates with a
# descending, linearly spaced sequence (2801 and 2494 appear to be the counts
# reported on either side of the affected window)
uk_cases.iloc[(uk_cases.query("areaName=='Wales'").query("date==@date_list").index), 2] = np.flip(
    np.array(list(range(2494 + int((2801 - 2494)/6), 2801 - int((2801 - 2494)/6), int((2801 - 2494)/6)))))
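
The hard-coded numbers in that assignment are opaque. From the values, the replacement counts look like a linear interpolation between the figures reported on either side of the affected window (roughly 2801 and 2494 in this snapshot, an assumption read off the code above). An equivalent, more readable sketch:

# assumes 2801 and 2494 are the counts just before and after the affected dates
replacement = np.linspace(2801, 2494, num=7)[1:-1].round().astype(int)
wales_mask = (uk_cases['areaName'] == 'Wales') & uk_cases['date'].isin(pd.to_datetime(date_list))
uk_cases.loc[wales_mask, 'newCases'] = replacement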

Finally we add a column to the dataframe called casesChange, which keeps track of whether the number of new cases has gone up or down from one day to the next.

grouped_df = uk_cases.groupby('areaName')

uk_cases['casesChange'] = grouped_df.apply(
    lambda x: x['newCases'] - x['newCases'].shift(1).fillna(0)).reset_index(drop=True)

uk_cases.sample(5, random_state=40)  # a random sample of rows
date areaName newCases cumCases cumCasesRate newDeaths casesChange
1074 2020-12-02 Scotland 951 98512.0 1803.2 0.0 197.0
50 2020-02-22 England 0 23.0 0.0 0.0 0.0
512 2020-05-24 Northern Ireland 25 5195.0 274.3 0.0 -16.0
961 2020-08-11 Scotland 52 19190.0 351.3 0.0 23.0
1054 2020-11-12 Scotland 1212 80204.0 1468.1 0.0 -49.0

Notice that only deaths in England have been counted in the newDeaths column. I prefer to look at the number of cases per 100k population, but to do this with the newCases column we would need to find population data for each country. Alternatively, we can estimate the population from the cumulative cases per 100k column - the cases per 100k figure is given by

$$ \text{cases per 100k} = 100000 \times \frac{\text{cases}}{\text{population}} $$
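
Rearranging for the population gives

$$ \text{population} = 100000 \times \frac{\text{cases}}{\text{cases per 100k}} $$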

We will take the largest available values rather than the latest day's (just to make sure there are no zeros). For Wales:

wales_pop = round(100000 * uk_cases.query("areaName == 'Wales'").cumCases.max() /
                  uk_cases.query("areaName == 'Wales'").cumCasesRate.max())
print(f'Wales population: {wales_pop}')
Wales population: 3152885

which is about right (it was 3,152,879 in 2019). And for the rest of the countries:

countries = ['Wales', 'Scotland', 'Northern Ireland', 'England']
countries_population = dict()
for country in countries:
    countries_population[country] = round(100000 * uk_cases.query(
        "areaName == @country").cumCases.max() / uk_cases.query("areaName == @country").cumCasesRate.max())

if 'population' not in uk_cases.columns:
    countries_pop_df = pd.DataFrame.from_dict(countries_population, orient='index', columns=[
        'population'])
    uk_cases = uk_cases.join(countries_pop_df, on='areaName')

uk_cases['newCasesRate'] = 100000 * uk_cases.newCases / uk_cases.population
uk_cases['casesChangeRate'] = 100000 * \
    uk_cases.casesChange / uk_cases.population
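
Before moving on, a quick way to eyeball the population estimates (purely for inspection, not part of the pipeline):

for country, population in countries_population.items():
    print(f'{country}: {population:,}')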

All that did was add some new columns that use the 'per 100k' metric. The last thing we will add is a column showing the number of cases over a 7-day period.

Weekly cases

We will take the 7-day rolling sum of the new cases rate (i.e. new cases per 100k population), grouped by country, and fill the missing values with 0s.

uk_cases['weeklyCasesRate'] = (
    uk_cases.groupby(by='areaName')['newCasesRate']
    .rolling(7).sum()
    .reset_index(drop=True)
    .fillna(0)
)
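
The reset_index(drop=True) above relies on the dataframe still being sorted by areaName and date (which it is). A slightly more defensive sketch would use transform, which keeps the result aligned to the original index regardless of row order:

uk_cases['weeklyCasesRate'] = (
    uk_cases.groupby('areaName')['newCasesRate']
    .transform(lambda s: s.rolling(7).sum())
    .fillna(0)
)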

Overview of UK cases

For the plotting we will also want the total cases for the UK as a whole. We could get this by grouping the uk_cases dataframe by date and summing the new cases; instead, we will run another query with Cov19API and apply the same preprocessing as above, this time tidied up into a function.

overview_cases = Cov19API(filters=filter_all_uk,
                          structure=structure_cases_death).get_dataframe().fillna(0)


def preprocess_dataframe(df):
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
    df.sort_values('date', inplace=True)
    df.reset_index(drop=True, inplace=True)
    # day-on-day change in new cases (same definition as casesChange above)
    df['casesChange'] = df['newCases'] - df['newCases'].shift(1).fillna(0)
    # estimate the population from the cumulative figures, as before
    population = round(100000 * df.cumCases.max() / df.cumCasesRate.max())
    df['newCasesRate'] = 100000 * df.newCases / population
    df['casesChangeRate'] = 100000 * df.casesChange / population
    df['weeklyCasesRate'] = df['newCasesRate'].rolling(7).sum().fillna(0)
    return df


preprocess_dataframe(overview_cases)
date areaName newCases cumCases cumCasesRate newDeaths casesChange newCasesRate casesChangeRate weeklyCasesRate
0 2020-01-03 United Kingdom 0 0.0 0.0 0 0.0 0.000000 0.000000 0.000000
1 2020-01-04 United Kingdom 0 0.0 0.0 0 0.0 0.000000 0.000000 0.000000
2 2020-01-05 United Kingdom 0 0.0 0.0 0 0.0 0.000000 0.000000 0.000000
3 2020-01-06 United Kingdom 0 0.0 0.0 0 0.0 0.000000 0.000000 0.000000
4 2020-01-07 United Kingdom 0 0.0 0.0 0 0.0 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ...
365 2021-01-02 United Kingdom 57725 2743385.0 4107.1 0 2735.0 86.418759 4.094505 511.919426
366 2021-01-03 United Kingdom 54990 2791549.0 4179.2 0 -3794.0 82.324254 -5.679909 548.581340
367 2021-01-04 United Kingdom 58784 2830849.0 4238.0 0 -2132.0 88.004163 -3.191768 574.628979
368 2021-01-05 United Kingdom 60916 2836795.0 4246.9 0 -1406.0 91.195931 -2.104890 586.277734
369 2021-01-06 United Kingdom 62322 0.0 0.0 0 62322.0 93.300821 93.300821 604.690282

370 rows × 10 columns

Plotting the data

We will use the Python library Altair for visualising the data; see the Altair docs for more information.

First we have a graph which shows the daily change in the number of new cases for each country. This number jumps up and down all over the place, which is likely due to delays in reporting new cases over the weekend. Another interesting thing is that the daily cases in Wales appear to have had a much shorter period of calm over the summer (calm in the sense of the daily count not jumping around).

The orange bars are days when the number of new cases (per 100k population) was higher than the previous day, while the blue bars are days when the number of new cases dropped. The red line is the 7-day moving average.

When the moving average line is below 0, it means there is a consistent drop in new cases. We can see this clearly happening around the times that lockdowns were introduced (though to varying degrees). I will update the graphs soon with a marker of when each lockdown started; a sketch of how such a marker could be layered on follows the chart code below.

import altair as alt

bars = alt.Chart(uk_cases).mark_bar().encode(
    x="yearmonthdate(date):T",
    y="casesChangeRate:Q",
    tooltip='casesChange',
    color=alt.condition(
        alt.datum.casesChangeRate > 0,
        alt.value("orange"),  # The positive color
        alt.value("blue")  # The negative color
    )
).properties(title='Daily change in number of new cases with 7 day rolling mean',width=800).interactive()

line = alt.Chart(uk_cases).mark_line(
    color='red',
    size=2,
    opacity=0.6
).transform_window(
    rolling_mean='mean(casesChangeRate)',
    frame=[0, 7],
    groupby=['areaName']
).encode(
    x='yearmonthdate(date):T',
    y='rolling_mean:Q'
)

alt.layer(bars, line, data=uk_cases).facet(alt.Column(
    'areaName'), columns=1).resolve_scale(y='independent')
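
As promised above, here is a sketch of how a lockdown marker could be layered onto this chart. The date used is only a placeholder for illustration; the real start dates differ for each nation:

lockdown_dates = pd.DataFrame({'date': pd.to_datetime(['2020-11-05'])})  # placeholder date

lockdown_rule = alt.Chart(lockdown_dates).mark_rule(
    color='black',
    strokeDash=[4, 4]
).encode(
    x='yearmonthdate(date):T'
)

# this could then be layered alongside the bars and line, e.g. alt.layer(bars, line, lockdown_rule)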

Next is a bar chart of the number of new cases in each country (per 100k population), with the 7-day moving average of cases. Again we see that Wales had a longer period of rising and falling cases compared to the other countries.

After a period of cases falling, each nation is now seeing a rise in the number of cases - especially in Wales.

bars = alt.Chart(uk_cases).mark_bar().encode(
    x="yearmonthdate(date):T",
    y="newCasesRate:Q",
    tooltip='newCasesRate',
    color=alt.condition(
        alt.datum.casesChangeRate > 0,
        alt.value("orange"),  # The positive color
        alt.value("blue")  # The negative color
    )
).properties(title='New cases per 100k population with rolling 7 day average', width=800).interactive()
line = alt.Chart(uk_cases).mark_line(
    color='red',
    size=2,
).transform_window(
    rolling_mean='mean(newCasesRate)',
    frame=[0, 7],
    groupby=['areaName']
).encode(
    x='yearmonthdate(date):T',
    y='rolling_mean:Q'
)
alt.layer(line, bars, data=uk_cases).facet(alt.Row('areaName'), columns=1)

countries = uk_cases['areaName'].unique()
countries.sort()

selection = alt.selection_single(
    name='Select',
    fields=['areaName'],
    init={'areaName': 'Wales'},
    bind={'areaName': alt.binding_select(options=countries)}
)

# bar chart, with opacity controlled by the country selection
bars = alt.Chart(uk_cases).mark_bar().add_selection(
    selection
).encode(
    x=alt.X("yearmonthdate(date):T", axis=alt.Axis(title='Date')),
    y=alt.Y("weeklyCasesRate:Q", axis=alt.Axis(title='Incidence rate')),
    tooltip='weeklyCasesRate:Q',
    opacity=alt.condition(selection, alt.value(1), alt.value(0))
).properties(title=f'7 day incidence rate of individual countries vs rolling mean across the UK', width=800)

line = alt.Chart(overview_cases).mark_line(
    color='red',
    size=2,
).transform_window(
    rolling_mean='mean(weeklyCasesRate)',
    frame=[0, 7]
).encode(
    x='yearmonthdate(date):T',
    y='rolling_mean:Q'
)

alt.layer(bars, line)