Migrating between Tableau Prep and Python

Despite coming from an R and Python background, I've become a recent convert to the node based workflows of Tableau Prep and Alteryx. I've come to really appreciate the visual flows and the ease with which the different functions can be explained and documented.

Learning, or relearning, many of these data cleaning and preparation functions has involved translating between Python code and Prep's nodes. I'm often aware of how I might do something in Python and trying to translate it to prep, or vice versa, particularly if I want more customization that Prep might allow.

This article provides an introductory guide to how Python functions, primarily using the Pandas library, correlate to Prep functions (and vice versa). This will cover prep's clean, aggregate, and join and how you can get started with their equivalents in python.

Pandas is the go-to Python library for data manipulation and analysis. It introduces the DataFrame, or DF: a 2D table structure that looks and feels exactly like a spreadsheet or a Tableau Data Grid, but is manipulated through code rather than visual interactions or clicks.

Clean Step:

Preps clean step is a beautifully speedy way to clean your data, allowing you to filter, clean, group, and split values, as well as create new columns (calculated fields). While this isn't a comprehensive list of either Prep or Pandas' functions, here are some of the clearest equivalents between these two approaches.

Tableau Prep Action	Visual Menu / Calculation	Python (Pandas) Equivalent
Rename Field	Double-click header	`df.rename(columns={'Old': 'New'})`
Change Data Type	Click the Data Type icon (e.g., # to Abc)	`df['Col'] = df['Col'].astype(str)`
Remove Field	Right-click -> Remove	`df.drop(columns=['Col'])`
Filter: Keep Only	Click value -> Keep Only	`df = df[df['Col'] == 'Value']`
Filter: Exclude	Click value -> Exclude	`df = df[df['Col'] != 'Value']`
Calculated Field	Create Calculated Field	`df['New_Col'] = df['A'] + df['B']`
Group Values	Manual Selection / Pronunciation	`df['Col'].replace(['Val1', 'Val2'], 'Group')`
Clean: Uppercase	Right-click -> Clean -> Uppercase	`df['Col'].str.upper()`
Clean: Trim	Right-click -> Clean -> Trim Spaces	`df['Col'].str.strip()`
Split (Automatic)	Right-click -> Split -> Automatic	`df['Col'].str.split(expand=True)`
Split (Custom)	Right-click -> Split -> Custom	`df['Col'].str.split('-', expand=True)[0]`
Handle Nulls	Right-click -> Filter -> Null Values	`df.dropna(subset=['Col'])` or `df.fillna(0)`

Aggregate

While Prep has single centralized aggregate step, Pandas has one groupby step function that you can follow with a function that specifies the type of aggregation.

df.groupby('Column you want to group')['column to aggregate'].sum()

df.groupby('Column you want to group'): This tells Pandas to look for unique values in your grouping column. In Prep, this is the same as dragging a field into the Grouped Fields bucket.
['column to aggregate']: This filters the data down to just the column you want to do math on. In Prep, this is like dragging a field into the Aggregated Fields bucket.
.sum(): This is the type of aggregation. Prep defaults to Sum, but this is where you’d change the dropdown to Average, Count, etc.

For a business example where you want to sum the sales and profit by region you might use this : regional_totals = df.groupby('Region', as_index=False)[['Sales', 'Profit']].sum()

In order to make sure you get a table output, you should use as_index=False so that the grouped column is still structured as a regular column.

df.groupby('Category', as_index=False)['Sales'].sum()

Feature	Tableau Prep (Aggregate Step)	Pandas Equivalent
Group By	Drag field to Grouped Fields	`df.groupby(['Column'])`
Sum	Drag field to Aggregated Fields → Sum	`.sum()`
Count Distinct	Drag field to Aggregated Fields → Count (Distinct)	`.nunique()`
Min / Max	Drag field to Aggregated Fields → Min/Max	`.min()` or `.max()`
Mean/Average	Drag field to Aggregated Fields → Average	`.mean()`

Join (merge)

While tableau prep allows you to visually select the type of join based on shaded circles, Pandas expects you to specify the type of join using how as a parameter in the merge function.

# Standard Inner Join
merged_df = pd.merge(
    left_df,           # left table
    right_df,          # right table
    on='Join_Column',  # what you're joining on
    how='inner'        # join type
)

If the column you're joining on has different names in each table then you would do it like this:

# Joining when column names differ
merged_df = pd.merge(
    left_df, 
    right_df, 
    left_on='Column_In_Left_Table',   # e.g., 'ID'
    right_on='Column_In_Right_Table', # e.g., 'Customer_ID'
    how='inner'
)

Note that because merge is from the pandas library we specify pd.merge

This is because the standard import for pandas is:

import pandas as pd

so the pd is referring to pandas, but dependent on this style of import.

SQL Joins Explained In <5 Minutes [4 Step-by-Step Examples] — Image from https://www.acuitytraining.co.uk/news-tips/introduction-sql-joins/

Join Type	Tableau Prep	Pandas
Inner	Intersection (Center of Venn)	`pd.merge(df1, df2, how='inner')`
Left / Right	Left or Right circle of Venn	`how='left'` or `how='right'`
Full Outer	Both circles selected	`how='outer'`
Join Clause	The "Applied Join Clauses" pane	`on='ColumnName'` or `left_on`, `right_on`

Thanks for reading and best of luck.

Stay tuned for future blogs on unions, pivoting, and more!

Author:

Sita Pawar

View Profile