Data preparation and cleansing in Alteryx

Last Friday, we were tasked with our first presentation, which was to give a short lesson or demonstration on some of the key skills we had picked up during our first week in The Data School. I was assigned to give a presentation on data cleaning and preparation within Alteryx, a somewhat daunting task given that I had only had one session with the software. My presentation was relatively unsuccessful because I was not familiar enough with the processes I was attempting to demonstrate on screen, so I ended up getting a bit flustered and forgetting to fully explain what I was doing and why. However, despite it feeling pretty horrible at the time, this was a useful learning experience that I will endeavor to take on board. In the future, I will keep the scope and purpose of my project in mind at all times in order to deliver what is required. Greater preparation would have prevented me from getting so flustered and would have allowed me to deliver an all-round better presentation.

Nevertheless, what I was trying to achieve works well in a blog format, as I will hopefully illuminate below:

The first step, as ever, is to load in the data. Once that is done, it is important to check the data to see how much cleaning or preparation is required, if any.

After loading the data, the results window gives a plethora of information.


We can then see that we have a number of null values. Using the browse tool (the binoculars in the initial image) we can take a closer look to try and investigate the cause of these nulls.
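For readers who think more easily in code, the same null check can be sketched in plain Python. This is only an illustration of the idea, not what Alteryx does internally: the rows, the field names and the `count_nulls` helper are all made up for the example, and `None` stands in for an Alteryx null.

```python
# Hypothetical sample rows; None stands in for an Alteryx null.
rows = [
    {"country": "GB", "sales": 120},
    {"country": None, "sales": None},
    {"country": "USA", "sales": None},
]

def count_nulls(records):
    """Return a dict mapping each field name to its number of None values."""
    counts = {}
    for record in records:
        for field, value in record.items():
            # (value is None) is True/False, which adds as 1/0.
            counts[field] = counts.get(field, 0) + (value is None)
    return counts

null_counts = count_nulls(rows)
print(null_counts)
```

A per-field count like this gives a quick sense of which columns need a closer look.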

Investigating the configuration pane within the browse tool allows us to see the full range of values within a particular field, so in this example viewing the country field shows us every value within it. It is clear that every entry should be a 2 or 3 character country code, and entries such as "country", "this data is private" and "redacted" all contravene this. These are likely a result of formatting within spreadsheet software, so this needs to be rectified before the data set can be manipulated within Alteryx.
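As a rough code analogy for this inspection step, the sketch below tallies the distinct values in a field and flags any that are not 2 or 3 characters long. The list of values and the `suspicious_values` helper are hypothetical, chosen only to mirror the kind of entries described above.

```python
from collections import Counter

# Hypothetical country column; None stands in for a null.
countries = ["GB", "USA", "FR", "country", "GB", "this data is private", None]

def suspicious_values(values, min_len=2, max_len=3):
    """Count each distinct non-null value whose length is outside min_len..max_len."""
    tally = Counter(v for v in values if v is not None)
    return {v: n for v, n in tally.items() if not (min_len <= len(v) <= max_len)}

flagged = suspicious_values(countries)
print(flagged)
```

Anything that shows up in `flagged` is a candidate for the cleaning steps that follow.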


Scrolling through the data, we can see that the last 4 rows only contain text rather than the necessary values as shown below:


So to get rid of these, we can sample the data to only output the first 525 rows. To do this, we use the sample tool below, setting it to take the first N = 525 rows in the dataset. The sample tool allows us to choose which rows we would like to keep in our dataset, based on their position.
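Keeping the first N rows is simple to express in code as well. The sketch below assumes the file has 525 data rows followed by 4 text-only rows, which matches the counts in this walkthrough; the row contents and the `first_n` helper are invented for illustration.

```python
from itertools import islice

# Hypothetical stand-in for the loaded file: 525 data rows followed by
# 4 text-only rows (the counts follow the walkthrough, the contents do not).
rows = [{"country": "GB", "value": i} for i in range(525)]
rows += [{"country": "some trailing note", "value": None} for _ in range(4)]

def first_n(records, n):
    """Keep only the first n records, similar in spirit to a 'first N rows' sample."""
    return list(islice(records, n))

kept = first_n(rows, 525)
print(len(rows), "->", len(kept))
```

Hard-coding 525 works here because we have inspected the file; if the number of trailing rows could change, a condition-based filter would be the safer choice.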



After using this tool, we are left with the 525 rows we specified. Checking this, we can see there are still some null values, suggesting there are some more issues to deal with.

Viewing the data again using the browse tool, we can see that, in the country tab, there is one value that is clearly not correct.


Near the bottom, you can see there is a value of "country", whereas every other value in the country field is a 2 or 3 letter country code.

To rectify this, we can use the filter tool. This allows us to set conditions in order to remove any values that do not fulfil said conditions.




Here we are telling Alteryx to keep only the rows whose value in the country field has a string length of less than 4. The tool then splits the records into a true output and a false output. Looking at the false window, we can see that the offending row was a duplicate title row, which has now been dealt with. This has also removed all the null values.
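The length condition can be sketched in Python as a split into rows that pass and rows that fail. The `split_by_code_length` helper and the sample rows are hypothetical, and sending nulls to the failing side is an assumption made for this sketch so that it matches the outcome described above; it is not a statement about how Alteryx evaluates nulls.

```python
# Hypothetical rows; None stands in for a null country value.
rows = [
    {"country": "GB"},
    {"country": "USA"},
    {"country": "country"},  # a repeated header row
    {"country": None},
]

def split_by_code_length(records, field="country", max_len=4):
    """Split records into (passing, failing) on len(value) < max_len.

    Assumption for this sketch: a missing or None value counts as failing.
    """
    true_rows, false_rows = [], []
    for record in records:
        value = record.get(field)
        if value is not None and len(value) < max_len:
            true_rows.append(record)
        else:
            false_rows.append(record)
    return true_rows, false_rows

true_rows, false_rows = split_by_code_length(rows)
print(len(true_rows), "kept,", len(false_rows), "rejected")
```

Keeping the rejected rows in their own list, rather than discarding them, makes it easy to check that nothing valid was thrown away.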


The last thing to check is whether we have any duplicate entries in our dataset.

To do this we use the unique tool:



What this tool does is essentially compare records on the fields selected: the first record with each combination of values goes to the unique output, and any repeats go to the duplicate output. If we check the duplicate output of this tool, we can see that there were 2 duplicated entries that needed to be removed.
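The same de-duplication idea can be sketched in Python by remembering which combinations of the selected fields have already been seen. The `split_unique` helper, the field names and the sample rows are all hypothetical.

```python
# Hypothetical rows containing two repeats of earlier records.
rows = [
    {"country": "GB", "value": 1},
    {"country": "USA", "value": 2},
    {"country": "GB", "value": 1},
    {"country": "FR", "value": 3},
    {"country": "USA", "value": 2},
]

def split_unique(records, fields):
    """Return (unique, duplicates), keeping the first record seen for each key."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = tuple(record.get(f) for f in fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates

unique, duplicates = split_unique(rows, ["country", "value"])
print(len(unique), "unique,", len(duplicates), "duplicates")
```

Which fields go into the key matters: checking fewer fields treats more rows as duplicates, so it is worth being deliberate about the selection.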

We are left with 522 records and no null values, so this data should now be ready for manipulation and use in visualizations.

Author:
Lucas Krokatsis