Dashboard Week Day 1: European Water Quality Data

by Ross Easton

And so it begins…

Here at the Data School, this week is Dashboard Week. Every morning my DS17 colleagues and I will be given some data to prep, viz and blog about by 5pm, ready to present the next morning. So prepare yourself for an influx of blog posts.

Today we were tasked with visualising some water quality data from https://gemstat.org/data/data-portal/ – a site with a huge amount of data. We each had to choose a continent (I chose Europe) and request that data from the site. This was made markedly easier by the interactive design of the website, which lets you ‘lasso’ the areas of the world you want to request data for.

What a nice website

I then hit a problem, however. After you make a request, the site emails you a link to download your data, but after waiting for quite a while there was no link in sight. I got past this by making 7 smaller requests, which were granted much more quickly but required more data prep.

The data required a small amount of preparation but nothing too complex or time-consuming. It was essentially just a split to rows tool followed by a few joins to attach the measurements to the metadata containing measuring station locations and parameter information.

Some very brief data prep
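
For anyone who would rather script this than use a drag-and-drop tool, here is a minimal Python sketch of the same two steps: splitting a delimited field into rows, then joining the result to station metadata. The field names (`station_id`, `readings`, `parameter`) and the sample rows are made up for illustration and are not the actual GEMStat file layout.

```python
# Hypothetical raw rows: several "parameter=value" readings packed
# into one delimited field per station (not the real GEMStat layout).
raw_rows = [
    {"station_id": "BE001", "readings": "NO3=2.1;TP=0.08"},
    {"station_id": "BE002", "readings": "NO3=3.4"},
]

# Hypothetical station metadata, keyed by station id.
stations = {
    "BE001": {"country": "Belgium", "lat": 50.85, "lon": 4.35},
    "BE002": {"country": "Belgium", "lat": 51.22, "lon": 4.40},
}


def split_to_rows(rows, field, sep=";"):
    """Yield one output row per delimited item found in `field`."""
    for row in rows:
        for item in row[field].split(sep):
            new_row = {k: v for k, v in row.items() if k != field}
            code, value = item.split("=")
            new_row["parameter"] = code
            new_row["value"] = float(value)
            yield new_row


def join_metadata(rows, lookup, key):
    """Inner join: keep rows whose key is in `lookup` and merge its fields."""
    for row in rows:
        meta = lookup.get(row[key])
        if meta is not None:
            yield {**row, **meta}


measurements = list(
    join_metadata(split_to_rows(raw_rows, "readings"), stations, "station_id")
)
```

Each entry in `measurements` now carries one reading plus the location of the station it came from, which is the shape a tool like Tableau is happiest with.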

With the data now in a usable format I dove into Tableau. I initially thought that looking at indicator organism information would be most interesting, as a way of quantifying the areas in which drinking the water would be most likely to make you ill. So I set about analysing this data for trends.

I immediately noticed some bizarre-looking results which, on inspection, proved likely to be erroneous. The data set helpfully included a field called ‘data quality’, which allowed me to simply apply a data source filter to exclude any ‘suspect’ data. That sorted the issue out very quickly.

Seems suspicious…
Data source filter to the rescue
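
Outside Tableau, the same idea is a one-line filter. The sketch below is only an illustration: the `data_quality` field name, the flag values other than ‘suspect’ and the sample records are all assumptions rather than the real contents of the dataset.

```python
# Hypothetical records; only the idea of a 'suspect' flag comes from the data.
records = [
    {"station_id": "BE001", "value": 2.1, "data_quality": "Fair"},
    {"station_id": "BE001", "value": 9999.0, "data_quality": "Suspect"},
    {"station_id": "BE002", "value": 3.4, "data_quality": "Good"},
]


def exclude_suspect(rows, flag_field="data_quality", bad_flags=("suspect",)):
    """Drop any row whose quality flag (case-insensitive) is in `bad_flags`."""
    return [
        row
        for row in rows
        if str(row.get(flag_field, "")).strip().lower() not in bad_flags
    ]


clean = exclude_suspect(records)
```

Filtering on the flag rather than on the values themselves means you are trusting the data provider's judgement instead of guessing your own outlier threshold.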

It was at this point, however, that I began to notice that I didn’t seem to have many measurements taken outside of the 1990s, and closer inspection revealed some worrying inconsistencies in how the data had been recorded.

Turns out most of my records are exclusively from the early 90s…

At this point I decided to see if there were similar trends across the other data sources I had produced, and found that these inconsistencies were common among many of them. With this in mind I decided to focus on the chemicals present in water rather than indicator organisms, as they had been measured more consistently, which I felt would stand me in better stead when making my viz.

Similar (although less severe) trends were present in most of the datasets, such as here in the Temperature dataset.

I then found, however, that across the whole of Europe there was no great consistency in the collection of data, save for in a few small areas. The best of these proved to be Belgium, where, apart from a decade-long gap from 1992 to 2001, they had been consistent about data collection and had a large number of measurement stations. I therefore decided to focus my viz specifically on Belgium, and set about researching the impact of different chemicals on bodies of water.
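
A quick way to check this kind of coverage is to list, for each country, the years that have no samples at all. The sketch below uses made-up `(country, year)` pairs purely to show the idea; the years were chosen to mirror the 1992 to 2001 gap described above and are not real GEMStat records.

```python
from collections import defaultdict

# Hypothetical (country, sample year) pairs, not real GEMStat records.
samples = [
    ("Belgium", 1990),
    ("Belgium", 1991),
    ("Belgium", 2002),
    ("Belgium", 2003),
    ("Elsewhere", 1991),
]


def years_covered(pairs):
    """Map each country to the set of years in which it has a sample."""
    covered = defaultdict(set)
    for country, year in pairs:
        covered[country].add(year)
    return covered


def missing_years(years):
    """List the years between the first and last sample that have no data."""
    return [y for y in range(min(years), max(years) + 1) if y not in years]


coverage = years_covered(samples)
belgium_gap = missing_years(coverage["Belgium"])
```

Here `belgium_gap` holds every year from 1992 to 2001, while a country with samples in only one year reports no gap, so it is worth looking at the number of covered years alongside the gaps.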

From here the chemicals measured split simply into carbon, phosphorus and nitrogen – which would form the basis for what was going to be an exploratory dashboard allowing the user to dig into the trends for each of these chemical groups, alongside some information explaining the damage that each of these chemicals can do.

The result was this:

Interactive link: https://tabsoft.co/37A0wMi