Time flies so fast. I still remember the first day I came to The Data School with a little knowledge in preparing data and building visualization. But now, looking back on the progress, I have learned and improved my skills thanks to all the coaches who spent much time helping, supporting, and giving valuable feedback to each team member. Today marks the first day of the dashboard week. It means that I have almost finished the training course. The Dashboard week is a good test for me to review what I have learned so far and check how I can solve the problem quickly.
In this blog, I share how I built the first dashboard on the Dashboard week. The process includes:
1/ Understand the data
2/ Clean/ prepare the data on Alteryx
3/ Sketch a draft on Excalidraw
4/ Build visualization on Tableau Desktop
5/ Test and publish on Tableau Public
1/ Understand the data
On the first day of the dashboard week, the DSNY 5 team was assigned Young Survey People data on Kaggle. There are 1010 rows with 150 columns in the dataset. The data dictionary on the Kaggle page is very helpful (Image 1). I spent the first half hour looking at the data and trying to understand the structure of the dataset based on the data dictionary.
There are many columns in the dataset. Most columns include the numeric values. Demographics include the text values. There are many categories: music preferences, movie preferences, hobbies & interests, phobias, health habits, ... In 5 hours, I think I only choose Music Preferences and Movie Preferences.
After I know which columns that I need, I start cleaning and preparing data in Alteryx.
2/ Clean/ Prepare Data in Alteryx
I need to remove unnecessary fields in the dataset and only choose columns related to Music Preferences, Movie Preferences, and Demographics. (Image 2)
To do that, I need to import the dataset into Alteryx. Then, use the select tool to remove unnecessary fields. I also need to use the Transpose tool to transform the data from columns into rows (The title of each column will be in each row with the value). The key columns in Configuration are all Demographics fields; while the Data Columns are columns with numeric values.
I did one branch for music and another branch for Movies with the same step. Then, I added a new column called Category with Music in the first branch and Movie in the second branch. The purpose is to split the data into 2 categories, so I can union the data and filter with parameters in Tableau later.
After running the workflow, I got the result same as Image 4. Now, I know which rows are related to Music or Movies. The Name field is the genre of each category and the value is the rating score for each row.
3/ Sketch a Draft on Excalidraw
Now, my data is ready for building visualization but I need to sketch my ideas before jumping into Tableau. I remember in the past that I did a WorkoutWednesday challenge. It relates to the rating score. That is WOW2023_Week 4 challenged by Sean Miller. So I tried to build my visualization based on that challenge.
I got the first visualization in my mind. Then, I am curious how other factors could affect the survey result. I see in the prepared dataset includes age, gender, height, weight, number of siblings, town/ village, ... It could help me to see the correlation between those fields. Therefore, I chose age, gender fields, and the number of rating scores in each category.
The last part of my dashboard could let the user explore the data by themselves by letting them choose the fields they want to see if the factors could affect the survey result.
Now, I have ideas and know what I need to build in Tableau. I can open Tableau Desktop.
4/ Build Visualization on Tableau
The first visualization that I need to build is the visualization same as WOW2023_Week4. I need to compute the percentage of positive scores, the percentage of negative scores, and the neutral score.
For example: To calculate the percentage of the positive score
SUM(IF [Value] in (4,5) THEN 1 ELSE 0
END)
/
COUNT([Category])
I sum all values have 4 or 5 in the Value column. If it's true, then return 1 and sum them up. Otherwise, return 0. Finally, divide that sum by the total rows in the dataset.
Repeat that step but with other values for the percentage negative score (1 or 2) and neutral score (3).
The Circle mark value is calculated by summing up the percentage of positive scores and the percentage of negative scores. Then sort the bar chart by that circle mark.
In this step, I also create a parameter to let the user switch music to movie or movie to music. I use the calculated field to filter the value when the user switches to another category.
The next visualization is the butterfly chart. I could compare the Male and Female voters by age. The user could choose different rating scores to compare who voted the most and at which age range. (Image 8)
The last visualization I built is the bar chart to compare the number of siblings by each voter in the city or village. When the user clicks on the age on the butterfly chart, it will filter out how many siblings the voter at that age has.
5/ Test and Publish on Tableau Public
In the last step, I follow the structure of the sketch on Excalidraw. I added more BANs and found, and downloaded the icon for each category (music/ movie). Then, I format the dashboard with outer padding, and inner padding, making sure the container is correct. In the butterfly chart, I need to use filter action for the last bar chart. I need to make sure that the source and target sheet is correct. Finally, try testing everything again from the top to bottom.
Now, I am ready to post it on Tableau Public. You can view my dashboard here.
That is all the steps that I did on the first day of the dashboard week. The dashboard week helped me very much in applying all the skills I have to solve the problem and build the dashboard in 5 hours. It is good practice before I do a real project.
Now, I am looking forward to doing the next challenge on the second day of the dashboard week.
My next blog is coming soon!