Getting Started with Alteryx Designer
This week at The Data School, we had our first hands-on experience with Alteryx Designer. We built our first workflow to practice using key features of the software, with the objective of generating an output of the “Top 10 Corporate Customers by Profit.” In this blog, I will walk through the workflow step by step, explaining the role of each tool in the process.

Input Data

- The Input Data tool allows you to input various file types into your workflow.
- Initially, we loaded a single file titled “2017_transactions.csv”.
- We then modified the file path under “Connect a File or Database” from \2017_transactions.csv to \20*_transactions.csv to bring in all files from 2017-2020.

Data Cleansing

- The Data Cleansing tool was introduced after inputting the data to address spaces identified in the Category column.
- This tool provided an option to remove Leading and Trailing Whitespace, helping to standardise the data.

Browse

- The Browse tool allows you to examine the data at any stage in the workflow.
- We used it to confirm that the appropriate changes had been made, ensuring that only three product categories appeared in the column.

Select

- The Select tool enables you to change data types for specific columns.
- Since this was a CSV file, all data types defaulted to V_Strings, which needed adjustments.
- Alteryx provides the following data type options:
- String Data Types:
- String: Fixed-length Latin-1 string (max 8,192 characters).
- WString: Fixed-length wide string for Unicode characters (max 8,192 characters).
- V_String: Variable-length Latin-1 string, ideal for varying text lengths.
- V_WString: Variable-length wide string for Unicode characters.
- Numeric Data Types:
- Byte: 8-bit unsigned integer (0 to 255).
- Int16: 16-bit integer (-32,768 to 32,767).
- Int32: 32-bit integer (-2,147,483,648 to 2,147,483,647).
- Int64: 64-bit integer (-9 quintillion to 9 quintillion).
- Fixed Decimal: Precision-adjusted decimal values (max 50 digits).
- DateTime Data Types:
- Date: Format yyyy-mm-dd.
- Time: Format HH:MM:SS.ssssss.
- DateTime: Format yyyy-mm-dd HH:MM:SS.ssssss.
- Boolean Data:
- Bool: Represents a True or False value.
- Spatial Objects:
- SpatialObj: Represents geospatial data such as points, lines, and polygons.
- String Data Types:

Formula

- The Formula tool was used to create a new column calculating profit per order (row) using the following calculation and converting the data type to a double:

Join and Union

- Another Input Data tool was brought in to introduce the Customer dataset.
- This was connected to a Join tool.
- We matched records by selecting the Customer ID field from each table.
A Left Join was needed, which was achieved using a Union tool to connect both the L (Left Output) and J (Joined Output).


Filter

- The Filter tool was used to retain only customers from the Corporate segment.
- The next step of the workflow proceeded from the True output of this filter.

Summarise

- The Summarise tool was used to aggregate Profit for each Customer ID and Name.
- This was done by selecting these fields in the Action Table and applying the Sum function.

Sort

- The Sort tool was used to arrange the data.
- We sorted the Profit column in descending order.

Sample

- The Sample tool was used to extract the Top 10 Customers by Profit.
- We selected the option “First N Rows” and set N = 10.

Output Data

- Now that the data was in the appropriate format, we used the Output Data tool.
- The final dataset was exported to an Excel spreadsheet for further analysis.
Conclusion
This exercise provided a structured introduction to Alteryx Designer, allowing us to efficiently process transactional data to determine the Top 10 Corporate Customers by Profit. These fundamental tools set the stage for more advanced data transformations and analytics which we will encounter in the weeks to come at the Data School and in future projects.