Dashboard Week Day 2

Day 2 was upon us and we were tasked with a more analytical challenge this time.

We were given the job of exploring a Smoking Questionnaire's results and diving into the data to find interesting insights.

To start I looked at the data dictionary to see what fields I would like to use and started coming up with lots of questions and listing out what fields I could use to answer them.

and threw a few sketches together to see how I might want to explore the data for example:

But now it was time to actually get the data.

So I downloaded the data and at this first hurdle is where I started to fall. The data was stored in a dat file which in my understanding can't yet be read by Alteryx. As I didn't know how the file was structured I brought it into a text editor to have a look at it.

There were lots of fields and it was quite difficult to understand but knew that I could bring the data in as a fixed width string which I'm going to show because it was useful and important for later.

I went through many different attempts at how to extract the data. Trying parsing out each space, but not every field had a space, adding a space inbetween those values, but then there were some inconsistencies with the fields, regexing out the data but there was just too many fields!!

That's when it hit me I realised that in the documentation the fields referenced columns and there were 701 columns referenced and at this point I remembered that the length of the strings I had brought in were 701. So this is when I realised that in the documentation the column referenced the place in the string.

At this point I was frustrated and my morale was low but then I decided to do something crazy. I brought in the pdf of the documentation as a csv to get the places the field would be in the string. Then parsed out each character in the dat file and joined them together. IT WORKED.

But to get to this point I had used a lot of my time so I had to quickly get on with the analysis.

Author:

Dominic Emm

View Profile