Managing Messy Strings With Prep

by Salome Grasland

Have you ever had a list of words with minor misspellings. Perhaps a list of authors, or a set of months. You may have used excel before to manually fix things by dragging and copying the proper word. Tableau prep is an excellent tool to use for managing messy strings. You can find the prep flow here.

  1. Open Tableau Prep and connect to your data source. For this blog I am using some data that I mocked up that you can follow along with by downloading it here
  2. Create a Clean Step after your input step.
  1. Find the Field with misspellings. This dataset only has one field that contains 77 unique values, but we will need to get it down to 5 values.

4. Use the Group and Replace feature:

  • By Pronunciation: This method groups words that sound similar, which is useful for correcting phonetic misspellings.
  • By Common Characters: Useful for slight misspellings where a few characters are off.
  • Manual Selection: Directly group misspelled variants to the correct spelling by selecting them and assigning them to the correct group.

5. Group by Spelling To the Second Level

I found for this data grouping by spelling to level 2 produced the best results.

6. Double Click the Value and Adjust Spelling 

While prep managed to get the words into 5 unique groupings, it didn’t leave us with the word correctly spelled. This can be changed by double clicking on the value and typing in the correct spelling.

Review and confirm the groupings to ensure accuracy.