Despite coming from an R and Python background, I've become a recent convert to the node based workflows of Tableau Prep and Alteryx. I've come to really appreciate the visual flows and the ease with which the different functions can be explained and documented.
Learning, or relearning, many of these data cleaning and preparation functions has involved translating between Python code and Prep's nodes. I'm often aware of how I might do something in Python and trying to translate it to prep, or vice versa, particularly if I want more customization that Prep might allow.
This article provides an introductory guide to how Python functions, primarily using the Pandas library, correlate to Prep functions (and vice versa). This will cover prep's clean, aggregate, and join and how you can get started with their equivalents in python.
Pandas is the go-to Python library for data manipulation and analysis. It introduces the DataFrame, or DF: a 2D table structure that looks and feels exactly like a spreadsheet or a Tableau Data Grid, but is manipulated through code rather than visual interactions or clicks.
Clean Step:
Preps clean step is a beautifully speedy way to clean your data, allowing you to filter, clean, group, and split values, as well as create new columns (calculated fields). While this isn't a comprehensive list of either Prep or Pandas' functions, here are some of the clearest equivalents between these two approaches.
| Tableau Prep Action | Visual Menu / Calculation | Python (Pandas) Equivalent |
| Rename Field | Double-click header | df.rename(columns={'Old': 'New'}) |
| Change Data Type | Click the Data Type icon (e.g., # to Abc) | df['Col'] = df['Col'].astype(str) |
| Remove Field | Right-click -> Remove | df.drop(columns=['Col']) |
| Filter: Keep Only | Click value -> Keep Only | df = df[df['Col'] == 'Value'] |
| Filter: Exclude | Click value -> Exclude | df = df[df['Col'] != 'Value'] |
| Calculated Field | Create Calculated Field | df['New_Col'] = df['A'] + df['B'] |
| Group Values | Manual Selection / Pronunciation | df['Col'].replace(['Val1', 'Val2'], 'Group') |
| Clean: Uppercase | Right-click -> Clean -> Uppercase | df['Col'].str.upper() |
| Clean: Trim | Right-click -> Clean -> Trim Spaces | df['Col'].str.strip() |
| Split (Automatic) | Right-click -> Split -> Automatic | df['Col'].str.split(expand=True) |
| Split (Custom) | Right-click -> Split -> Custom | df['Col'].str.split('-', expand=True)[0] |
| Handle Nulls | Right-click -> Filter -> Null Values | df.dropna(subset=['Col']) or df.fillna(0) |
Aggregate
While Prep has single centralized aggregate step, Pandas has one groupby step function that you can follow with a function that specifies the type of aggregation.
df.groupby('Column you want to group')['column to aggregate'].sum()
df.groupby('Column you want to group'): This tells Pandas to look for unique values in your grouping column. In Prep, this is the same as dragging a field into the Grouped Fields bucket.['column to aggregate']: This filters the data down to just the column you want to do math on. In Prep, this is like dragging a field into the Aggregated Fields bucket..sum(): This is the type of aggregation. Prep defaults toSum, but this is where you’d change the dropdown toAverage,Count, etc.
For a business example where you want to sum the sales and profit by region you might use this : regional_totals = df.groupby('Region', as_index=False)[['Sales', 'Profit']].sum()
In order to make sure you get a table output, you should use as_index=False so that the grouped column is still structured as a regular column.
df.groupby('Category', as_index=False)['Sales'].sum()
| Feature | Tableau Prep (Aggregate Step) | Pandas Equivalent |
| Group By | Drag field to Grouped Fields | df.groupby(['Column']) |
| Sum | Drag field to Aggregated Fields → Sum | .sum() |
| Count Distinct | Drag field to Aggregated Fields → Count (Distinct) | .nunique() |
| Min / Max | Drag field to Aggregated Fields → Min/Max | .min() or .max() |
| Mean/Average | Drag field to Aggregated Fields → Average | .mean() |
Join (merge)
While tableau prep allows you to visually select the type of join based on shaded circles, Pandas expects you to specify the type of join using how as a parameter in the merge function.
# Standard Inner Join
merged_df = pd.merge(
left_df, # left table
right_df, # right table
on='Join_Column', # what you're joining on
how='inner' # join type
)
If the column you're joining on has different names in each table then you would do it like this:
# Joining when column names differ
merged_df = pd.merge(
left_df,
right_df,
left_on='Column_In_Left_Table', # e.g., 'ID'
right_on='Column_In_Right_Table', # e.g., 'Customer_ID'
how='inner'
)
Note that because merge is from the pandas library we specify pd.merge
This is because the standard import for pandas is:
import pandas as pdso the pd is referring to pandas, but dependent on this style of import.
![SQL Joins Explained In <5 Minutes [4 Step-by-Step Examples]](https://www.acuitytraining.co.uk/wp-content/uploads/2022/01/sql-join-types.png.webp)
| Join Type | Tableau Prep | Pandas |
| Inner | Intersection (Center of Venn) | pd.merge(df1, df2, how='inner') |
| Left / Right | Left or Right circle of Venn | how='left' or how='right' |
| Full Outer | Both circles selected | how='outer' |
| Join Clause | The "Applied Join Clauses" pane | on='ColumnName' or left_on, right_on |
Thanks for reading and best of luck.
Stay tuned for future blogs on unions, pivoting, and more!
