DW Day 4 - The Home Runs of Major League Baseball

by Alex Fridriksson

On day 4 we had over a century’s worth of data to analyse. I chose to focus on home runs.

Prepping the data was fairly simple. We got one nicely formatted CSV file that we just had to separate into columns. The catch, however, was that there were no headers, so we had to consult the data dictionary to find out what the field names should be.

This is what the data dictionary looked like:

Datadict

As you can see, it is not ideal, since, for example, fields 50-66 all have the same name. In order to use the dynamic rename or find and replace tools, we needed to reshape it.

Some tried to do it programmatically in Alteryx (which did not work out), one person renamed all 160 fields manually in Alteryx, and another person and I adjusted the data dictionary in Excel so that we could use the dynamic rename tool later.

datadict2
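
For anyone who would rather script the reshaping step than do it in Excel, here is a minimal Python sketch of the idea. It assumes the data dictionary can be read as rows of (first field, last field, name). That layout, the function name `expand_field_names` and the sample names are all made up for illustration and are not what we actually used.

```python
def expand_field_names(entries):
    """Turn (first_field, last_field, name) rows into one unique name per field."""
    names = []
    for first, last, name in entries:
        if first == last:
            # A single field keeps its name unchanged.
            names.append(name)
        else:
            # A range of fields shares one name, so number each copy
            # to give every column a unique header.
            for offset in range(last - first + 1):
                names.append(f"{name}_{offset + 1}")
    return names


# Hypothetical dictionary rows; the real names come from the data dictionary.
sample_entries = [(1, 1, "date"), (2, 2, "home_team"), (3, 5, "umpire")]
field_names = expand_field_names(sample_entries)
print(field_names)
```

The result is one name per column, in column order, which is the shape a rename step needs.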

Then it was just a matter of renaming the columns and adjusting the field types in Alteryx.

workflow
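
The same two steps, renaming the columns and fixing the field types, could also be sketched outside Alteryx. The Python below is only an illustration using the standard `csv` module. The column names, the helper `read_headerless_csv` and the sample rows are invented, not taken from the real file.

```python
import csv
import io


def read_headerless_csv(text, field_names, int_fields=()):
    """Attach names to a headerless CSV and convert the chosen columns to int."""
    rows = []
    for line_number, values in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(values) != len(field_names):
            # A mismatch usually means the name list is out of step with the file.
            raise ValueError(f"line {line_number}: expected {len(field_names)} fields")
        row = dict(zip(field_names, values))
        for field in int_fields:
            # Everything is read as text, so numeric columns need converting.
            row[field] = int(row[field])
        rows.append(row)
    return rows


# Made-up sample rows standing in for the real file.
sample_text = "2000-04-03,AAA,2\n2000-04-04,BBB,0\n"
games = read_headerless_csv(
    sample_text, ["date", "home_team", "home_runs"], int_fields=["home_runs"]
)
total_home_runs = sum(game["home_runs"] for game in games)
print(total_home_runs)
```

Checking the number of fields on every line is a cheap way to catch a name list that is one column off before it quietly mislabels the data.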

After playing around with the data a bit and getting a quick lesson from Andy about how baseball works, I decided to focus on home runs.

Here is the final result:

DWDay4viz