A Plot to Learn

…a dot strip plot, that is!

First off, what’s a dot strip plot? It’s similar to a scatterplot, but instead of two axes, there’s just one. Dot strip plots show the distribution of values, with one dot per value, which makes outliers and gaps in the range easy to spot.

As you can see in the GIF above, I used dot strip plots to show where each state ranks across each measure. In 2021, for example, California ranked #1 for both population and number of breaches, yet it loses its #1 ranking for most affected individuals, percent of state affected, and median breach size (i.e., when you define “hardest hit by HIPAA breaches” by something other than which state experienced the most breaches).

Dot strip plots are quite compact, allowing you to showcase a lot of points for individual investigation (especially outliers) without taking up a lot of space. This compactness lets you place them side by side to compare multiple distributions, as I did for the third section of my HIPAA breaches reviz. However, they become less useful when your data has many identical values, since those dots all land in the same spot, making it difficult to discern the individual values. When they’re stacked in the same spot, it’s also difficult (if not impossible) to tell how often that value appears in your data. Thus, weigh the pros and cons of dot strip plots before using them to visualize your data, as this type of chart isn’t as common or as easily understood as, say, a bar or line chart.
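To make the overplotting problem concrete, here’s a small Python sketch (not Tableau) that places values on a single axis a fixed number of characters wide and counts how many dots land exactly on top of another. The helper names, the `width` parameter, and the sample counts are all made up for illustration.

```python
from collections import Counter

def strip_columns(values, width=40):
    """Map each value to a column on one horizontal axis (one dot per value)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # all values equal: every dot goes in column 0
    return [round((v - lo) / span * (width - 1)) for v in values]

def render_strip(values, width=40):
    """Draw the strip as text: 'o' where at least one dot lands, '-' elsewhere."""
    occupied = set(strip_columns(values, width))
    return "".join("o" if col in occupied else "-" for col in range(width))

def hidden_dots(values, width=40):
    """Count dots that sit exactly on top of another dot (overplotting)."""
    per_column = Counter(strip_columns(values, width))
    return sum(n - 1 for n in per_column.values() if n > 1)

# Hypothetical breach counts: three states share the value 5.
counts = [3, 5, 5, 5, 9, 12, 40]
print(render_strip(counts, width=20))
print(hidden_dots(counts, width=20))  # 2
```

The three states at 5 are drawn as a single dot, and `hidden_dots` reports the two you can’t see.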

The Process

#1: Create Basic Dot Strip Plot

Start simple; you know the drill. We’re going to create a basic dot strip plot first before tricking it out a little to make the chart easier to read.

1. Start by bringing your measure (in my case, it’s CNT(Name of Covered Entity)) to Rows.
2. Switch from Automatic to Circle on the Marks Card. You should now see one big circle on the canvas; this is because the measure is still aggregated into a single value.
3. To disaggregate, bring a dimension into Detail. In this case, it’s State Name.
4. Seeing all years at once could be confusing and obscure changes over time (a state may have experienced a lot of breaches in the 2010s but dropped off more recently), so I’m also going to add Year to Filters and show only one year at a time.
5. Finally, I’m going to change the circles’ Size and Color by making them smaller and lowering the color opacity. That way, if many dots sit close to each other, it’s easier to discern the individual dots.

Your final result should look something like this:
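Under the hood, the steps above boil down to a grouped count. Here’s a minimal Python sketch of that aggregation, assuming one row per reported breach. The dictionary keys `state`, `year`, and `entity` are hypothetical stand-ins for State Name, Year, and Name of Covered Entity, and the rows are invented. Each key in the result corresponds to one dot on the strip.

```python
from collections import Counter

def breaches_per_state(records, year):
    """Rough equivalent of CNT([Name of Covered Entity]) with State Name on
    Detail and Year on Filters: one count (one dot) per state."""
    return Counter(
        r["state"]
        for r in records
        if r["year"] == year and r["entity"] is not None  # COUNT ignores nulls
    )

# Hypothetical rows: one row per reported breach.
rows = [
    {"state": "NY", "year": 2022, "entity": "Clinic A"},
    {"state": "NY", "year": 2022, "entity": "Hospital B"},
    {"state": "CA", "year": 2022, "entity": "Insurer C"},
    {"state": "CA", "year": 2021, "entity": "Clinic D"},  # removed by the Year filter
    {"state": "FL", "year": 2022, "entity": None},        # null name is not counted
]
print(breaches_per_state(rows, 2022))  # Counter({'NY': 2, 'CA': 1})
```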

#2: Customize the Dot Strip Plot

Now, let’s add a few visual aids to make the dot strip plot easier and faster to understand.

1. Add a Reference Line: The dot strip plot shows the full distribution of points, specifically the number of HIPAA breaches by state for the year of 2022. One question that could come up upon seeing it is: What’s the national average? If we bring in a reference line for the average, we’ll 1) answer that question and 2) show how each individual state compares to that average.

a. Drag and drop a Reference Line from the Analytics Pane onto the canvas. We’re looking for the national average, so drop it over Table.

b. Configure how you want to see the reference line visually:

I chose to forgo the tooltip and write a simple label for the reference line because I don’t want to make my users waste time hovering over the line to see the value. Not only that, it’s cognitively taxing to hold a value in your head while going to the next value and then comparing the two. 

Once you’re done, hit OK. Now you have a reference line!
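As far as I understand it, a reference line averaged over Table is the mean of the values of the marks in the view, which here are the per-state counts. Here’s a Python sketch of that arithmetic, using hypothetical 2022 counts for four states. One assumption is worth flagging: a state with no breach rows in the filtered year has no mark, so it isn’t part of the average. The sketch mirrors that by averaging only the states present in the dictionary.

```python
from statistics import mean

def table_average(counts_by_state):
    """Average reference line computed over Table: the mean of the marks' values."""
    return mean(counts_by_state.values())

def above_average(counts_by_state):
    """States whose dot sits above the reference line (measure is on Rows)."""
    avg = table_average(counts_by_state)
    return sorted(state for state, n in counts_by_state.items() if n > avg)

# Hypothetical 2022 counts for four states.
counts = {"CA": 9, "FL": 3, "NY": 6, "IL": 2}
print(table_average(counts))
print(above_average(counts))  # ['CA', 'NY']
```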

2. Always Color in Certain States: My data narrative explores the impact of HIPAA breaches from a New Yorker’s perspective, using certain states’ populations for comparison. So I want to gray out all the other states and keep New York, Florida, and California always colored in.

(As a reminder, I am comparing New York and Florida because they have similar population sizes, and also California because it’s the most populous state and around double New York’s and Florida’s populations.)

a. Create a calculated field LABEL: CA, FL, NY to flag the three states:

IF [State Abbr] = "CA" OR [State Abbr] = "FL" OR [State Abbr] = "NY"

THEN [State Abbr]

END

This calculated field says, if it’s any of these three states, return that state’s abbreviation. Otherwise, the result is null, as we don’t want every state’s label to show, nor do we want all states to each be a different color.

b. Drag this field to Color and Label on the Marks Card. The legend will have four items/colors to set: Null, CA, FL, and NY. Too many colors will overwhelm the visualization, so I’m going to set each of the three flagged states to the same shade of blue, and gray out all the nulls.
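Here’s the same flag logic mirrored in Python (not Tableau code), showing why the legend ends up with exactly four items. The function names are hypothetical, and `None` plays the role of Tableau’s null.

```python
FLAGGED = {"CA", "FL", "NY"}

# Colors picked in the post: the three flagged states share one blue; nulls are gray.
COLORS = {"CA": "blue", "FL": "blue", "NY": "blue", None: "gray"}

def label_ca_fl_ny(state_abbr):
    """Mirror of LABEL: CA, FL, NY: the abbreviation for a flagged state, else None (null)."""
    return state_abbr if state_abbr in FLAGGED else None

def legend_items(states):
    """Distinct values the field puts on Color, i.e. the color legend's items."""
    return {label_ca_fl_ny(s) for s in states}

states = ["CA", "FL", "NY", "IL", "TX", "NM"]  # hypothetical subset of the 50 states
print(len(legend_items(states)))  # 4
print({s: COLORS[label_ca_fl_ny(s)] for s in states})
```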

3. Let Users Select Their Own State(s): Now, I want to introduce a third color so that users can select the state(s) they want to compare. This third color keeps all the other states grayed out and keeps attention on the user-selected state(s), while allowing them to be compared to the always-blue points for CA, FL, and NY.

I also want to give users the option to either select one state (by clicking it) or multiple states (by clicking and dragging the cursor over multiple points).

a. Start by creating a set. Right-click the desired dimension (State Name, in this case) > Create > Set…

b. Now, we want to bring the State Name Set to Color on the Marks Card so that we can set the color for the user-selected state(s). However, LABEL: CA, FL, NY is already on Color. If you drag State Name Set onto Color as well, it will replace the calculated field that’s already there, when what we want is to add a second field to Color.

Instead, drag the set to Detail first. Then, click the Detail icon next to it and change it to Color. This is how you add a second field to Color, creating a legend of combinations of the two fields.

c. Select a state that isn’t CA, FL, or NY, like IL.

You’ll notice that the color legend now has five items:

Each item follows the format of [Field #1 on Color], [Field #2 on Color]; we now have combinations for whether the states were flagged as CA, FL, or NY and whether the state is in the set. 

CA, FL, and NY each have their own line because the LABEL: CA, FL, NY field flags them individually, and they’re out of the set. The Nulls (all the other states) are divided into two groups, depending on whether they’re in or out of the set. The purple item second from the bottom of the list is IL, which is the only null/not-flagged state that’s in the set. If more states were selected, there’d be more purple dots in the chart.

Now, try selecting one of the flagged states. You’ll see that the value has changed to say NY, In. This is a completely new combination/value for the color legend, so we need to change the color from Tableau’s default colors. I want the three flagged states to always be blue to prevent confusion, so I’m going to set the colors as follows:

  • CA, FL, NY; In/Out (6 total combinations/values) = Blue
  • Null, In (one combination/value) = Purple
  • Null, Out (one combination/value) = Gray
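To see where the five legend items (and later eight) come from, here’s a Python sketch that pairs each state’s LABEL: CA, FL, NY value with its set membership, the way two fields on Color combine into one legend item. The helper names and the short state list are hypothetical. The color rules are the ones listed above.

```python
FLAGGED = {"CA", "FL", "NY"}

def label_ca_fl_ny(abbr):
    """Mirror of LABEL: CA, FL, NY (None stands in for null)."""
    return abbr if abbr in FLAGGED else None

def color_item(abbr, selected):
    """One legend item: ([Field #1 on Color], [Field #2 on Color])."""
    return (label_ca_fl_ny(abbr), "In" if abbr in selected else "Out")

def palette(item):
    """Flagged states always blue; other states purple if in the set, gray if out."""
    label, membership = item
    if label is not None:
        return "blue"
    return "purple" if membership == "In" else "gray"

states = ["CA", "FL", "NY", "IL", "TX", "NM"]  # hypothetical subset of the 50 states
legend = {color_item(s, selected={"IL"}) for s in states}
print(len(legend))  # 5

# Every combination a selection could ever produce: 4 labels x 2 memberships.
all_items = {(label, m) for label in ["CA", "FL", "NY", None] for m in ("In", "Out")}
print(len(all_items))  # 8
```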

d. You’ll notice that even though IL is now purple, there’s no label. This is because our initial calculated field LABEL: CA, FL, NY only specifies those three states and nulls the rest. 

I could edit that field to add an extra OR [State Name Set] condition, so that any state in the set would also return its abbreviation.

I could. But I don’t want to. Why? Because as you’ll remember, when you add an additional field to Color, the color legend will churn out data items for every possible combination. When you bring in the states, of which there are (currently) 50, and each one can either be in or out of the set…

Well, you can see where that’s going. Tableau doesn’t know they’re all in the same category of “not one of the flagged states”; it treats every state + IN/OUT as a unique value.
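A quick count shows the scale of the problem. This Python sketch enumerates every (label, in/out) pair that some selection could produce. It does this first for LABEL: CA, FL, NY as written, and then for a hypothetical version with the extra OR for the set. The 47 placeholder codes (`S00` to `S46`) stand in for the non-flagged states; they aren’t real abbreviations.

```python
FLAGGED = {"CA", "FL", "NY"}
# Hypothetical placeholders standing in for the other 47 states.
OTHERS = [f"S{i:02d}" for i in range(47)]
ALL_STATES = sorted(FLAGGED) + OTHERS

def original_label(abbr, in_set):
    """LABEL: CA, FL, NY as written (set membership is ignored)."""
    return abbr if abbr in FLAGGED else None

def label_with_set(abbr, in_set):
    """Hypothetical edit: also return the abbreviation for any state in the set."""
    return abbr if abbr in FLAGGED or in_set else None

def possible_legend_items(label_fn):
    """Every (label, in-set) pair that some user selection could produce."""
    return {
        (label_fn(s, in_set), in_set)
        for s in ALL_STATES
        for in_set in (True, False)
    }

print(len(possible_legend_items(original_label)))  # 8
print(len(possible_legend_items(label_with_set)))  # 54
```

That’s 8 items versus 54: 3 flagged states × 2, plus 47 selectable states, plus the shared Null, Out. Each of those 47 would need to be set to purple by hand.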

So. What are we going to do instead? Let’s create another calculated field LABEL: State Names in State Set.

IF [State Abbr] = "CA" OR [State Abbr] = "FL" OR [State Abbr] = "NY"

THEN NULL

ELSEIF [State Name Set]

THEN [State Abbr]

END

This calculated field says: if the state is CA, FL, or NY, the output is null, whether or not it’s in the set. If it’s a not-flagged state in the set, show us the state abbreviation. Everything else is null. Thus, the only time something isn’t null is when it’s a state in the set and that state isn’t CA, FL, or NY.

Why do we not want a flagged state in the set to show its abbreviation? Because CA, FL, and NY will show up already, thanks to the LABEL: CA, FL, NY field and don’t need their labels to be repeated.

Then, bring LABEL: State Names in State Set to Label on the Marks Card. Now, finally, the label for the selected state (IL) shows:

Again, we created a second calculated field for the State Set labels, instead of adding to the original calculated field, to avoid having to set colors for every individual combination of [State Name] + [IN/OUT of State Name Set], and instead only have to worry about the 8 data items listed in Step 3c of this section.
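Here’s a Python sketch of how the two label fields combine on a dot, assuming both sit on Label and nulls render as nothing. `label_state_set` follows the rule stated above: a label only for a state that’s in the set and isn’t CA, FL, or NY. `naive_state_set_label` is a hypothetical field that labels every state in the set, included to show the duplicate label that rule avoids. Names and states are illustrative.

```python
FLAGGED = {"CA", "FL", "NY"}

def label_ca_fl_ny(abbr):
    """First label field: abbreviation for CA, FL, or NY, else None."""
    return abbr if abbr in FLAGGED else None

def label_state_set(abbr, selected):
    """Second label field: abbreviation only for non-flagged states in the set."""
    if abbr in FLAGGED:
        return None
    return abbr if abbr in selected else None

def naive_state_set_label(abbr, selected):
    """Hypothetical alternative that labels every state in the set, flagged or not."""
    return abbr if abbr in selected else None

def label_text(abbr, selected, second_field=label_state_set):
    """Text next to a dot: both label fields side by side; nulls render as nothing."""
    parts = [label_ca_fl_ny(abbr), second_field(abbr, selected)]
    return " ".join(p for p in parts if p)

selected = {"IL", "NY"}
print([label_text(s, selected) for s in ["CA", "NY", "IL", "TX"]])  # ['CA', 'NY', 'IL', '']
print(label_text("NY", selected, second_field=naive_state_set_label))  # NY NY
```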

4. Create a Set Action for Step #3’s Set: Building on the State Name Set from the previous step, if we bring the sheet to the dashboard right now, we can let the user pick the desired state(s) from a dropdown. However, it’s far easier, and more intuitive, for users to select them by clicking a dot (to select one state) or clicking and dragging over multiple dots (to select several states at once).

That’s right, time to add a set action! Depending on whether you’ve brought the sheet to a dashboard yet, start from either Worksheet or Dashboard at the top > Actions > This sheet > Change Set Values…

I’ve configured the set action to bring the state(s) the user clicks on into the set and color the state(s) purple. When the user clicks off the visualization, the purple state(s) will turn gray again, leaving only CA, FL, and NY colored in (still blue). I have five dot strip plots using the same State Name Set, so if the user clicks on the desired state(s) on one dot strip plot, the same state dot will turn purple on all the other dot strip plots.
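Here’s a toy Python model of that behavior. It assumes the set action assigns the selected values to the set when it runs and removes all values when the selection is cleared. That setting is inferred from the behavior described, not copied from the dialog. The class and the `plot_1` to `plot_5` names are hypothetical. The point is that all five plots read one shared set, so one click recolors the same state everywhere.

```python
FLAGGED = {"CA", "FL", "NY"}

class SharedStateSet:
    """Toy model of State Name Set, read by every sheet that uses it.

    Assumed set action settings: running the action assigns the selected
    states to the set; clearing the selection removes all values."""

    def __init__(self):
        self.members = set()

    def on_select(self, states):
        # Clicking one dot or dragging over several replaces the set's contents.
        self.members = set(states)

    def on_clear(self):
        # Clicking off the visualization empties the set.
        self.members = set()

def dot_color(abbr, state_set):
    """Color rule from the post: flagged states are always blue."""
    if abbr in FLAGGED:
        return "blue"
    return "purple" if abbr in state_set.members else "gray"

shared = SharedStateSet()
sheets = [f"plot_{i}" for i in range(1, 6)]  # hypothetical names for the five plots

shared.on_select(["NM"])  # the user clicks NM on any one plot
print({sheet: dot_color("NM", shared) for sheet in sheets})  # purple on all five

shared.on_clear()
print(dot_color("NM", shared), dot_color("CA", shared))  # gray blue
```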

5. Create a Rank Calculation: Dot strip plots allow you to compare distributions across multiple categories or measures. In this case, I have five dot strip plots (well, just one at the moment, but once I’ve finished customizing the existing one, I can duplicate it and replace the metric to avoid having to format every sheet one by one).

Now, pretend that I’ve created all five dot strip plots as is, so that it looks like this, with five rows of dots:

CA, FL, and NY are, as always, blue, and the user-selected state (New Mexico/NM) is purple. This is pretty useful already: the grayed-out dots fade into the background, the flagged states are blue, and NM is purple for comparison with the blue states.

However, there’s no denying there are a lot of dots: 250 in total (five measures × 50 states), to be exact. You can get an idea of the dots’ relative positions compared to each other and across the different measures, but let’s make those positions easier to compare with a rank calculation.

Tableau’s documentation goes into detail about the different rank calculations, but I’m going to go with RANK_DENSE(), which doesn’t leave gaps between rankings. For example, if two people tie for first, the person who comes in “second” is #2 with RANK_DENSE(), whereas with RANK(), that person is #3.
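Here’s a small Python sketch of both ranking styles, to make the difference concrete. It assumes descending order (the highest count is #1), which should match the default ordering of Tableau’s RANK() and RANK_DENSE(). The function names and sample values are illustrative, not Tableau’s API.

```python
def rank_standard(values):
    """Like RANK() in descending order: ties share a rank and leave a gap after."""
    return [1 + sum(other > v for other in values) for v in values]

def rank_dense(values):
    """Like RANK_DENSE() in descending order: ties share a rank, no gaps."""
    distinct_desc = sorted(set(values), reverse=True)
    position = {v: i + 1 for i, v in enumerate(distinct_desc)}
    return [position[v] for v in values]

scores = [10, 10, 8, 5]
print(rank_standard(scores))  # [1, 1, 3, 4]
print(rank_dense(scores))     # [1, 1, 2, 3]
```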

a. Create a calculated field DENSE RANK – # of Breaches.

RANK_DENSE(COUNT([Name of Covered Entity]))

This calculated field will calculate the rank based on the number of breaches.

b. Just like how the dots for CA, FL, and NY are always blue, let’s also always show the ranking for these three states. For the user-selected state, we’ll show that label as well. I called this calculated field LABEL: RANK – # of Breaches.

IF ATTR([State Abbr] = "CA" OR [State Abbr] = "FL" OR [State Abbr] = "NY" OR [State Name Set])

THEN [DENSE RANK – # of Breaches]

END

c. Wait, we’re not done! Right now, the calculated fields only return the rank, i.e., just the number. It would look pretty weird to put only the rank on Label, because the label would show up as “CA 1.” However, we can’t just type “#” into the Label editor, because the # would then appear next to every dot.

We need the # to pop up only for CA, FL, NY, and the user-selected state(s), which is going to use the same condition as the LABEL: RANK – # of Breaches calculated field; it’s just going to return something different, i.e. the #. I called this field LABEL: # for Rank:

IF [State Abbr] = "CA" OR [State Abbr] = "FL" OR [State Abbr] = "NY" OR [State Name Set]

THEN "#"

END

d. You’ll also notice that the label I created has parentheses. We’ll need to duplicate the calculated field for the # for each of the parentheses; that’s three separate calculated fields for the #, (, and ). You can’t put them all in the same calculated field because you need to insert the rank in between. If your sole calculated field returned “(# ),” with a space in between for the rank, there’s no way to insert it.
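Here’s a Python sketch of what the finished label works out to for each dot. CA, FL, NY, and any selected state get their abbreviation plus “(#rank)”, and every other state gets nothing. The exact spacing of “CA (#1)” and the sample counts are assumptions. If you’d rather avoid the extra static-text fields, Tableau’s STR() should let one calculated field build the whole string, something like "(#" + STR([DENSE RANK – # of Breaches]) + ")" inside the same IF ATTR(...) condition. Treat that as an untested alternative.

```python
FLAGGED = {"CA", "FL", "NY"}

def dense_ranks(counts):
    """RANK_DENSE-style ranks: highest count = 1, ties share a rank."""
    distinct_desc = sorted(set(counts.values()), reverse=True)
    return {state: distinct_desc.index(n) + 1 for state, n in counts.items()}

def rank_label(abbr, rank, selected):
    """Assemble 'ABBR (#rank)' for flagged or selected states, else nothing."""
    show = abbr in FLAGGED or abbr in selected  # the condition every label field shares
    if not show:
        return ""  # every piece is null, so the dot gets no label
    open_paren, hash_sign, close_paren = "(", "#", ")"  # the three static-text fields
    return f"{abbr} {open_paren}{hash_sign}{rank}{close_paren}"

counts = {"CA": 40, "NY": 25, "TX": 25, "FL": 12, "NM": 3}  # hypothetical breach counts
ranks = dense_ranks(counts)
print([rank_label(s, ranks[s], selected={"NM"}) for s in counts])
# ['CA (#1)', 'NY (#2)', '', 'FL (#3)', 'NM (#4)']
```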

6. Format the Tooltip: Don’t forget to format your tooltips! With so many dots in the view, use tooltips to provide extra information about each point.

Finally, once you’re happy with the way your dot strip plots look, you can duplicate the worksheet and swap out the original measure for the next one.

I hope this was helpful! Until next time :)

Author:
Vivian Ng
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
© 2024 The Information Lab