Hey! Today I’ll be going through challenge 56 of the Alteryx Challenges – a regex special edition. Now if you’ve got no experience in Regex, don’t worry, this will be a good place to start.
Let’s go!
The Challenge
Turn this
Into this
The challenge is to take the hashtags from the text and parse them out so they can stand alone on the rows, with the count ID on the columns.
Step 1
What do we need and what don’t we need? Well, we will need the text because that contains the hashtags, and it looks like the id_str_ returns in the bottom picture as well. I see no use for lang, user.screen_name or created_at, so lets bin them.
Pick up the SELECT tool and drop it in. Now we can untick lang, user.screen_name and created_at.
Step 2
Now, it’s time for REGEX.
We are parsing the text so we should select that as the column.
Equation key;
\ – new character/word
a-z – lower case letter from a-z
A-Z – capital letter from a-z
+ – one or more of the character
So the expression reads – hashtag then any character from a-z or A-Z and the + means to take characters until a space arrives (only use + for one or more characters).
Tokenise so to split it up and split it to rows (columns can work here as well).
Step 3
Lets get rid of those nulls. The output does not have it in so neither should we.
Pick up FILTER, drop it in and only keep text if it = is not null
Step 4
The output also has the id_str on the communs and the hashtags on the rows. Look like well need to pivot. Pick up CROSS-TAB, and drop it in. So plug in ID_str so it appears on columns and text and rows. Concatenate or First doesn’t matter this time, either or.
Step 5
We’re almost there. Now let’s make those blank spaces into 0’s and those hashtags into 1’s. There’s multiple fields and we need a formula that will apply to all at once ideally, drag on the MULTI FIELD FORMULA tool.
Now tick the 3,10,16,17 fields so the formula works for them. And put the Id_str_ prefix in.
So what do we want to do? – we want any hashtag field to appear as a 1 and any blank to appear as a 0. Let’s not overcomplicate this. Expression is at the bottom of the next picture.
Step 6
Final Step. So the last thing we need is that total collumn. But if we want to create a total collumn we have to add them up, and we can’t do that with string fields Pick up SELECT tool, and change ID_str’s to int64 (integers).
Now finally, a quick FORMULA tool. Make a new collumn called total and plug in this simple formula;
And we are done!
Finished workflow here: