In this post Chris Gutteridge muses on working with open Covid data to make it more understandable to the lay person. Chris works in iSolutions, the University’s computer services department. He is an open data champion who has been recognised for his work with national awards. This month he also published a paper on data hygiene factors in Patterns.
Itās love data week starting Monday 15th February. I like playing with data but am so better at finding interesting ways to visualise it than I am at interpreting it!
The data that impacts all of our lives right now is the UK COVID statistics. Helpfully, these are available from an API. My first idea for this data was to try to find a way for the numbers to land with people in an emotional way. I hit on the idea of showing milestones in UK COVID deaths in terms of city populations, getting the data from DBPedia. However this was a bit sparse so I expanded it to any āUK settlement of 1000 or moreā. Sadly, we were already past that threshold when I wrote it. You can see my COVID cities tool on the web, and the code is available from GitHub.
I thought a lot about how R is a terrible value for public understanding. R=0.9 good, R=1.1 bad. Itās a measure of change in scale but not in a format anybody uses in their daily lives. I figured the one place the public are comfortable with exponential growth rates is mortgages so a clearer number would be, say, (R-1)*100 which would give a +/- percentage of new cases per case. But R is also very difficult to relate to whatās going on and whatās likely to happen next. It also really matters what the current number of cases are. Occasional spikes in R when thereās 10 cases a day isnāt a big concern. When thereās 1000 cases a day it really is. So after a lot of though Iāve made a visualisation that shows things I think are more useful for public understanding of whatās going on with COVID.
My COVID projection page shows the data I think is useful. The grey barchart is the 7 day averaged number of cases. The green & red bar is the daily change (this is very smoothed out by working out the weekly change in the 7 day average and converting that to a daily change. I think this is a really good indicator of when things were getting worse and better rather than the case numbers themselves. To make the recent values visible I truncated the really big growth from earlier in the year so it didnāt skew the scale too much.
Next I wrote a function to work out how long it is taking cases to double or halve. I think this is a really good way to explain why the virus is different to cancer. This virus can double in cases every three days which sounds more like a loan shark than a mortgage. Writing this equation meant that I used log() for the first time in a really long time!
The final column is what I think of as the āEEEP!ā factor. It projects todayās cases (not the 7 day average) by the current growth rate for 28 days and sees where weāre heading for if things donāt change. For this to be a concern cases have to be high AND increasing. I think that EEEP! factor is good for showing why lockdowns were needed.
The final feature I added is to mark weekends and the national lockdowns and other events ā although this is complicated buy the fact that NI, Wales, England & Scotland may not quite be doing the same thing as each other.
You can see that lockdowns followed peaks in the āEEEP!ā factor, although lockdown #2 was several weeks after Eeeeep #2.
What it does show is that contrary to what youād believe from my Facebook feed, there was a sustained fall in cases for many weeks after the Cummings incident. It makes it very visible when lockdowns started working and it does show a few notable spikes in the rate of increase ā red peaks in the red/green column, which suggest something caused them in the days or weeks before hand.
Notable peaks in the rate of increase were July 8th, September 12th, October 9th, December 22nd, Jan 4th. Iām tempted to correlate Schools and Universities reopening and Christmas to explain the Sept, Oct and Jan peaks but correlation/causation and all that. I have no guess at what caused the spikes in July & December. Any ideas?
Iām not an expert at interpreting data so please donāt quote my interpretations of this data as a source of truth!
[UPDATE: I got inspired this morning to add a new column, āDoublingsā which shows how many times cases have to halve to get back to 1 case. Obviously by that point cases from borders will be more of an issue that community transmission, but itās sobering to see that with lockdown it takes two weeks to halve and to get to 1 case would take exactly half a year, so mid-August. Hopefully Summer and vaccines will change this picture]
This piece was originally published by Chris Gutteridge on the webteam’s blog