DS17 First Project Time (that I didn’t complete!)

by Juliana Camparotti

Our first week in the Data School was purely focused on Alteryx basic tools and I’m so excited to discover what else this genius piece of software is capable of! It was my first time using Alteryx and it didn’t take me long to regret about not trying it out before.

Our first Project Time was on Friday and we received a pretty straight forward task. We were asked to take one of our old applications’ data set, supplement it with additional data using Alteryx and then update our viz using Tableau. It sounded very exciting and I straight away knew what I was adding to my data set.

My Application viz

I applied to the Data School with a viz about the European Union budget where I ranked EU countries by who contributes the most and who receives the most back from the EU budget. I also analysed how the net contribution looks like per capita and found very cool insights about it.

My Application viz

The data set came from the European Union Commission website in an Excel file with over a hundred rows of information that I had no idea what they meant, lots of text and numbers and one column for each one of the 28 EU countries. It included data from the years 2000 to 2017 and it was presented one year per Excel sheet. It looked like this:

It was a big WOW first time I opened it, but after some online reading, I learned that in order to do the calculations I wanted I would need only 2 rows of information from each of the 18 sheets in the document.

As I had no idea how to use Alteryx back then, I did it all manually by creating a new table in Excel. I took me a couple of hours making sure each country matched its information from year to year as some countries joined the EU along the years and it could have changed the country’s order. Then I also had to figure out how to pivot and cross-tab in Excel as I wanted the countries in one single column, etc.

If you are an Alteryx expert you might be laughing at me right now, I get it! Every time we would learn a new tool in Alteryx this week in the Data School I would think about how stupid I was in doing it all manually. But it worked! I was left with a clean table of only 4 columns with just the right information I needed to send the message.

The Challenge I didn’t complete

It would have been pretty simple to complete the challenge for our first Project Time as I had a very clean table and already knew what I wanted to use as additional data. I had seen before an analysis of the EU budget where they calculated the net contribution as a proportion of each country’s GDP, and that data set was also available on the EU Commission website.

However, I opted to do something different instead. I wanted to know how it would have been to clean and prepare my initial data set (that one very messy!) using Alteryx this time, then I could just add the GDP data afterwards. This way it would be way more challenging and exciting to complete the challenge for your first Project Time. Everyone in the audience agreed with me when I showed them the raw data set, but I’m gonna be honest with you, I had no idea where I was putting myself into!

Because the data was divided into sheets, I had to use a new tool we haven’t learned yet: Dynamic Input. It was not hard to find instructions online on how to use it, but I got a couple of problems that I had no idea how to fix. Before I get into more details, I want to give a big shout-out to Brian, Hanna and a couple of other folks that helped me so much!

My first problem was that my file was in the wrong Excel type and it took me a while to find one that works. Then it came to the biggest problem ever: the Excel sheets (one for each year) were not in the same format!!!!!!!!!!!

This was because some countries joined the EU later on, or some years they simply decided to use a different table format with some more insights about their data. Painfully, I learned that I could divide my data set into 5 chunks of years containing tables that matched each other’s format and run the whole process of cleaning, filtering, transposing, cross-tabbing, renaming, etc, in each one of them separately. When I say in each one of them, it’s literally in each one of them because we haven’t learned Macros yet and no one thought it was a good idea for me to try it yet.

After using the Union tool to put all the containers I had created together, I used a couple of other tools to add each country’s population to it (a thing that I also had done manually before) and then my data set was ready to use!

My workflow in Alteryx

Unfortunately, it took much longer than I expected to do it all. I was told that once I learn Macro it will become way quicker so I honestly can’t wait for it.

In the end, I did not have time to add the GDP data to it so I didn’t rebuild my viz in Tableau. However, I found this whole process I experienced so challenging that I’m very happy I was able to do it!

Alright coaches, I’m ready for the next challenge!