Migrating between Tableau Prep and Python

Despite coming from an R and Python background, I've become a recent convert to the node based workflows of Tableau Prep and Alteryx. I've come to really appreciate the visual flows and the ease with which the different functions can be explained and documented.

Learning, or relearning, many of these data cleaning and preparation functions has involved translating between Python code and Prep's nodes. I'm often aware of how I might do something in Python and trying to translate it to prep, or vice versa, particularly if I want more customization that Prep might allow.

This article provides an introductory guide to how Python functions, primarily using the Pandas library, correlate to Prep functions (and vice versa). This will cover prep's clean, aggregate, and join and how you can get started with their equivalents in python.

Pandas is the go-to Python library for data manipulation and analysis. It introduces the DataFrame, or DF: a 2D table structure that looks and feels exactly like a spreadsheet or a Tableau Data Grid, but is manipulated through code rather than visual interactions or clicks.

Clean Step:

Preps clean step is a beautifully speedy way to clean your data, allowing you to filter, clean, group, and split values, as well as create new columns (calculated fields). While this isn't a comprehensive list of either Prep or Pandas' functions, here are some of the clearest equivalents between these two approaches.

Tableau Prep ActionVisual Menu / CalculationPython (Pandas) Equivalent
Rename FieldDouble-click headerdf.rename(columns={'Old': 'New'})
Change Data TypeClick the Data Type icon (e.g., # to Abc)df['Col'] = df['Col'].astype(str)
Remove FieldRight-click -> Removedf.drop(columns=['Col'])
Filter: Keep OnlyClick value -> Keep Onlydf = df[df['Col'] == 'Value']
Filter: ExcludeClick value -> Excludedf = df[df['Col'] != 'Value']
Calculated FieldCreate Calculated Fielddf['New_Col'] = df['A'] + df['B']
Group ValuesManual Selection / Pronunciationdf['Col'].replace(['Val1', 'Val2'], 'Group')
Clean: UppercaseRight-click -> Clean -> Uppercasedf['Col'].str.upper()
Clean: TrimRight-click -> Clean -> Trim Spacesdf['Col'].str.strip()
Split (Automatic)Right-click -> Split -> Automaticdf['Col'].str.split(expand=True)
Split (Custom)Right-click -> Split -> Customdf['Col'].str.split('-', expand=True)[0]
Handle NullsRight-click -> Filter -> Null Valuesdf.dropna(subset=['Col']) or df.fillna(0)

Aggregate

While Prep has single centralized aggregate step, Pandas has one groupby step function that you can follow with a function that specifies the type of aggregation.

df.groupby('Column you want to group')['column to aggregate'].sum()

  1. df.groupby('Column you want to group'): This tells Pandas to look for unique values in your grouping column. In Prep, this is the same as dragging a field into the Grouped Fields bucket.
  2. ['column to aggregate']: This filters the data down to just the column you want to do math on. In Prep, this is like dragging a field into the Aggregated Fields bucket.
  3. .sum(): This is the type of aggregation. Prep defaults to Sum, but this is where you’d change the dropdown to Average, Count, etc.

For a business example where you want to sum the sales and profit by region you might use this : regional_totals = df.groupby('Region', as_index=False)[['Sales', 'Profit']].sum()

In order to make sure you get a table output, you should use as_index=False so that the grouped column is still structured as a regular column.

df.groupby('Category', as_index=False)['Sales'].sum()

FeatureTableau Prep (Aggregate Step)Pandas Equivalent
Group ByDrag field to Grouped Fieldsdf.groupby(['Column'])
SumDrag field to Aggregated Fields → Sum.sum()
Count DistinctDrag field to Aggregated Fields → Count (Distinct).nunique()
Min / MaxDrag field to Aggregated Fields → Min/Max.min() or .max()
Mean/AverageDrag field to Aggregated Fields → Average.mean()

Join (merge)

While tableau prep allows you to visually select the type of join based on shaded circles, Pandas expects you to specify the type of join using how as a parameter in the merge function.

# Standard Inner Join
merged_df = pd.merge(
    left_df,           # left table
    right_df,          # right table
    on='Join_Column',  # what you're joining on
    how='inner'        # join type
)

If the column you're joining on has different names in each table then you would do it like this:

# Joining when column names differ
merged_df = pd.merge(
    left_df, 
    right_df, 
    left_on='Column_In_Left_Table',   # e.g., 'ID'
    right_on='Column_In_Right_Table', # e.g., 'Customer_ID'
    how='inner'
)

Note that because merge is from the pandas library we specify pd.merge

This is because the standard import for pandas is:

import pandas as pd

so the pd is referring to pandas, but dependent on this style of import.

SQL Joins Explained In <5 Minutes [4 Step-by-Step Examples]
Image from https://www.acuitytraining.co.uk/news-tips/introduction-sql-joins/

Join TypeTableau PrepPandas
InnerIntersection (Center of Venn)pd.merge(df1, df2, how='inner')
Left / RightLeft or Right circle of Vennhow='left' or how='right'
Full OuterBoth circles selectedhow='outer'
Join ClauseThe "Applied Join Clauses" paneon='ColumnName' or left_on, right_on

Thanks for reading and best of luck.

Stay tuned for future blogs on unions, pivoting, and more!

Author:
Sita Pawar
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2026 The Information Lab