Today's topic is tornadoes across North America. The dataset was generally clean, extensive, and came with a data dictionary (which was invaluable).
The Data
The data ranges from 1950-2014, but I immediately found that it is collected differently in different time periods. For example, dollar loss is ranked from 1-6 until 1966; from then until 2015 the value is listed as a number such as 1 or 0.5, which is multiplied by 1 million to get the actual value; and from 2016 it is the actual amount (e.g. 1,000,000). The same applies to other metrics, and with this in mind (and given the sheer amount of data), I decided to focus on data from 1996 onwards. I also wanted to explore the measures themselves instead of focusing only on dimension-measure analysis.
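As a rough sketch of how the three loss formats described above could be put on one scale: the function name, parameter names and default cutoff years below are my own illustration, with the cutoffs taken from the description in this post, so they are worth checking against the data dictionary before relying on them.

```python
from typing import Optional


def loss_in_dollars(
    year: int,
    raw_loss: float,
    last_ranked_year: int = 1966,    # assumption: cutoff as described in this post
    last_millions_year: int = 2015,  # assumption: cutoff as described in this post
) -> Optional[float]:
    """Convert a raw loss value to dollars.

    Returns None for the early period, where the value is only a rank
    and cannot be turned into an amount.
    """
    if year <= last_ranked_year:
        return None  # a rank, not a dollar amount
    if year <= last_millions_year:
        return raw_loss * 1_000_000  # value is recorded in millions
    return float(raw_loss)  # value is already the actual amount


print(loss_in_dollars(2011, 0.5))
print(loss_in_dollars(2016, 1_000_000))
print(loss_in_dollars(1960, 3))
```

Returning None for the ranked years keeps those rows out of any sums or averages, which is one more reason to restrict the analysis to the later period.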
Something I was a little unsure about at the start was 'what does 1 row actually represent?'. There was a column which looked like a unique id field, but I realised that it restarts every year. The id field also had repetitions, and each of those repetitions had a different state (i.e. a tornado that affected multiple states has one record per state). This meant that some calculations required a bit more thought, and LODs (level of detail expressions) became very useful when tackling these problems.
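To illustrate the row-grain problem outside of the dashboard tool, here is a minimal sketch of the same idea as an LOD fixed on year and id: treat (year, id) as the real tornado key and collect the states for each one. The sample rows and field names are hypothetical, not values from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, yearly_id, state).
# The id restarts each year, and a multi-state tornado repeats its id.
rows = [
    (2011, 1, "MO"),
    (2011, 2, "AL"),
    (2011, 2, "GA"),
    (2012, 1, "TX"),
]


def states_per_tornado(rows):
    """Group rows by (year, yearly_id) and collect the distinct states."""
    grouped = defaultdict(set)
    for year, yearly_id, state in rows:
        grouped[(year, yearly_id)].add(state)
    return dict(grouped)


tornadoes = states_per_tornado(rows)

# Counting keys rather than rows avoids double-counting multi-state tornadoes.
tornado_count = len(tornadoes)

# Tornadoes that appear under more than one state.
multi_state = {key: sorted(states) for key, states in tornadoes.items() if len(states) > 1}

print(tornado_count)
print(multi_state)
```

Counting rows here would give 4, while counting distinct (year, id) keys gives 3, which is the kind of difference the LODs were guarding against.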
Narrowing Focus
This week I want to drill down into just a few tornadoes and analyse them in more depth. I found something really interesting in 2011: three values that look like 'outliers', but actually aren't. One happened in Missouri, and two happened in Alabama. I'm going to focus on these today, and try to get some answers!
My idea was to do whatever I could to really highlight these values as 'extreme'. I did quite a bit of research into the three tornadoes, and that information became very useful (and led to more discoveries in the data).
Immediate Findings
This graph looks at the different magnitudes of tornadoes on the EF scale, and the size of each mark indicates that tornado's total economic losses, which reached as much as $1bn. I then coloured the top 3 to show that they all happened in 2011.
After this, I had a few problems finding additional data (I wanted to look at temperature trends in different atmospheres to try to find any relationships), and I was unable to do this in the time available.
Formatting Problems
I commonly struggle with formatting 'non-business' dashboards: being creative, finding a good colour palette, organising the layout... So today I spent a fair bit of time pushing myself to build a long-form dashboard. I think it went okay, but that (and spending an hour trying to query an API before realising I was using 'torpedo' instead of 'tornado') definitely took up a large amount of time today.
