Cluster analysis can also be performed in Tableau. This post will show you how, using the same beer data as my previous post, which introduces cluster analysis and shows how to perform it in Alteryx. This is part of a series of blogs I’m writing on statistical techniques in Alteryx and Tableau.
Note that clustering is available in Tableau Desktop, but not for posting online, e.g. to Tableau Server or Tableau Online. The capability was added to Tableau in version 10. After showing how to cluster in Tableau, this post compares the two sets of results: clustering in Tableau vs. Alteryx.
Cluster Analysis in Tableau
Step One: Build out your view
First, build the view whose data you’d like to group. This is usually a scatter plot with an ID on Detail; here, individual beers.
There are some limitations when clustering in Tableau. It can’t be done when other dimensions are in the view, when table calculations or blended data have been used, or when the view uses parameters, groups, sets, bins, etc.
Step Two: Add in the clusters
Once the view has been set up, add in the clusters using the ‘Cluster’ part of the analytics pane. Click and drag it onto the view, similarly to adding reference lines.
Applying clustering to the view brings up the Clusters menu. Tableau automatically assigns two clusters. If more are needed, the number has to be entered manually, e.g. four in the above example. This menu can also be brought up by right-clicking on the Cluster pill and clicking ‘Edit clusters…’.
The Clusters can be saved as groups by dragging the Cluster pill in the view onto the data pane.
Step Three: Adding more variables to the classification
As in Alteryx, Tableau can use more than two variables to build clusters, rather than just the two in the view when Cluster is dragged onto it. To do this, simply click and drag extra measures onto the Clusters menu box.
Information about the clusters can be found by right-clicking the Cluster pill and selecting ‘Describe clusters…’. This brings up two tables of cluster metadata: Summary Diagnostics and Analysis of Model Variance.
Description of clusters in Tableau
Like Alteryx, Tableau uses the k-means clustering algorithm, where the centre of each cluster is the mean value of all its members. It uses squared Euclidean distances to decide which cluster each point belongs to. Tableau also automatically scales each variable used in determining the clusters to account for their differing ranges of values, similar to how Alteryx standardises the data using the z-score.
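To make the description above more concrete, here is a minimal sketch of k-means on z-score-standardised data, written in plain Python. It is not the code that Tableau or Alteryx runs. The beer values are made up for illustration, the function names are my own, and the starting centres are chosen by hand rather than by either tool's initialisation method.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardise each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centres, max_iter=100):
    """Alternate between assigning points and recomputing cluster means."""
    centres = [tuple(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        labels = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append(tuple(mean(dim) for dim in zip(*members)))
            else:
                new_centres.append(centres[k])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # nothing moved, so the clustering has converged
        centres = new_centres
    return labels, centres

# Hypothetical (ABV %, IBU) values, invented for this example.
beers = [(4.2, 12.0), (4.5, 15.0), (4.8, 18.0), (6.5, 60.0), (7.0, 70.0), (7.2, 65.0)]
scaled = zscore_columns(beers)
labels, centres = kmeans(scaled, centres=[scaled[0], scaled[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Without the scaling step, IBU (which ranges over tens of units) would dominate ABV (which ranges over only a few) in the distance calculation.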
Tableau vs. Alteryx
The clustering method seems very similar (k-means clustering) in both Alteryx and Tableau. However, when using the same variables (ABV, IBU and size) to cluster the data, they do result in slightly different groupings.
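One simple way to see how two groupings differ is to cross-tabulate the cluster labels from each tool, since the cluster numbers or names themselves are arbitrary. The sketch below assumes the labels have been exported from both tools for the same beers in the same order. The label values are hypothetical, not the actual results from this post.

```python
from collections import Counter

def cross_tab(labels_a, labels_b):
    """Count how many items fall into each (label_a, label_b) pair."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical cluster labels for six beers, made up for illustration.
tableau_labels = [1, 1, 2, 2, 3, 3]
alteryx_labels = ["B", "B", "A", "A", "C", "A"]

table = cross_tab(tableau_labels, alteryx_labels)
for (tableau_cluster, alteryx_cluster), count in sorted(table.items()):
    print(f"Tableau {tableau_cluster} / Alteryx {alteryx_cluster}: {count}")
```

If every Tableau cluster lines up with exactly one Alteryx cluster, the two groupings agree. In this made-up example, Tableau cluster 3 is split across Alteryx clusters A and C.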
In the clustering method, however, the most common approach (and the only approach in Tableau) takes the mean of each cluster, and any new data coming in is assigned to the cluster whose mean is closest to it.
Alteryx doesn’t let you include categorical variables in the cluster analysis; it works only on continuous data. But… Tableau does.
…so how does Tableau take the mean of a categorical variable? That doesn’t make sense, as it would be like asking for the mean of cats and dogs, which is why Alteryx doesn’t allow it. Tableau instead uses the mode of that category rather than the mean, so that the cluster centre still represents its members. Any new data coming in is then assigned to whichever cluster most closely represents that data.
Alteryx simply takes the mean or the median and is therefore potentially the more trusted clustering tool, thanks to its more clear-cut numerical methodology.
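As a small illustration of that assignment rule, the sketch below places a new observation in the cluster with the nearest mean. The cluster names, the means and the new beer are all hypothetical, and the values are assumed to be on already-scaled ABV and IBU axes.

```python
def nearest_cluster(point, cluster_means):
    """Return the name of the cluster whose mean is closest to `point`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cluster_means, key=lambda name: squared_distance(point, cluster_means[name]))

# Hypothetical cluster means on scaled (ABV, IBU) axes.
cluster_means = {"light": (-0.9, -0.9), "hoppy": (0.9, 0.9)}
new_beer = (0.4, 0.7)

assigned = nearest_cluster(new_beer, cluster_means)
print(assigned)  # hoppy
```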
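The idea of summarising a mixed cluster can be sketched as follows: take the mean of each numeric field and the most common value (the mode) of each categorical field. This only illustrates the description in this post and should not be read as Tableau's internal implementation. The field names and values are made up.

```python
from collections import Counter
from statistics import mean

def describe_cluster(rows, numeric_keys, categorical_keys):
    """Summarise a cluster: mean for numeric fields, mode for categorical ones."""
    summary = {key: mean(row[key] for row in rows) for key in numeric_keys}
    for key in categorical_keys:
        # most_common(1) returns a list holding the single most frequent (value, count) pair
        summary[key] = Counter(row[key] for row in rows).most_common(1)[0][0]
    return summary

# Hypothetical members of one cluster.
cluster = [
    {"abv": 4.0, "style": "Lager"},
    {"abv": 5.0, "style": "Lager"},
    {"abv": 6.0, "style": "IPA"},
]

summary = describe_cluster(cluster, numeric_keys=["abv"], categorical_keys=["style"])
print(summary)  # {'abv': 5.0, 'style': 'Lager'}
```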
So which should I use?
The best thing about clustering in Alteryx and Tableau is that it takes a relatively complicated statistical technique and makes it accessible to any user, not just those who understand statistics. That said, it is useful to roughly understand any statistical technique before applying it, because carrying it out correctly is essential to interpreting the results accurately.
But when it comes down to which one to use, Alteryx will often win because it clusters purely on numerical inputs, whereas Tableau also accepts categorical or mixed inputs, although that can be useful at times.
Either way, it’s always good to visualise the clusters in a way that is useful for truly understanding their meaning – whether they were created in Tableau itself or indirectly using Alteryx.