Python is applied in AI development, Data Science, Machine Learning and other applications. Tableau integrated Python script in Tableau Desktop from Tableau version 2019 and later. According to the Tableau page, TabPy allows Tableau users to remotely execute Python code and saved functions. To run the Python script in Tableau Desktop, we will need to install Python and TabPy. In this blog, I am going to share how to install Python, TabPy and run the Python script in Tableau Desktop. I will walk through step-by-step:
1/ Install Python and TabPy
2/ Set up the TabPy environment in Tableau Desktop
3/ Write Python function and run the Python code in Tableau Desktop
1/ Install Python and TabPy
- Install Python 3.10
First of all, you need to download Python 3.10 because later we will need to install TabPy. But TabPy only works with Python 3.7, 3.8, 3.9, and 3.10. Therefore, I recommend installing Python 3.10. You can download Python 3.10 here.
If you already installed Python in another version, you can uninstall that version and install version 3.10. If you don't want to uninstall the current version, you can edit the PATH.
Depending on your computer, you can choose which version to download. I chose to download Windows Installer (64-bit) (Image 1).
After downloading the Python 3.10 file, you can open to install. The Python Setup window opens, make sure you check Add Python 3.10 to PATH. It will let the computer know that you installed Python already with the directory path (Image 2). Then, install Python.
After the installation is done, open Command Prompt in your computer to check if Python is successfully installed in your computer. In the Search box, type Command Prompt. Then, type python --version and enter. If you see the result Python 3.10.0, it means Python was installed successfully and the PATH is correct (Image 3).
- Install TabPy 2.9
After installing Python 3.10.0 in your computer, we will install tabpy 2.9 next. Before installing tabpy 2.9, I recommend you read the TabPy documentation first. From the documentation page, in the TabPy Installation part, you should update the pip first by typing: python -m pip install --upgrade pip in the Command Prompt.
Then, you can install TabPy by typing: pip install tabpy. It takes a few minutes to install TabPy. After the installation is done, you can start Tabpy by typing in the Command Prompt: tabpy. When you see Web Service listerning on port 9004, that is the default port for TabPy server (Image 4). It means our installation is done and TabPy server is working. Then, we can move to the next step in Tableau Desktop.
Note that, keep the Command Prompt window opens, don't close it.
2/ Set up TabPy environment in Tableau Desktop
I opened Tableau Desktop and choose Sample SuperStore for demo. Then, you go to Help > Settings and Performance > Manage Analytics Extension Connection... (Image 5) to connect with TabPy environment in port 9004.
Then, the Manage Analytics Extension Connection window opens. In the Hostname box, I type localhost. In the Port box, I type 9004. Then, I click on Test Connection button to test. If you see a pop up shows Successfully connected to the analytics extension, it means you connected to TabPy (Image 6).
If you check the Command Prompt window, there are a few lines showing that connection is good. If you see the code 200 GET, it means the request to the TabPy server is good (Image 7). Now, you are ready to use Python in Tableau Desktop.
3/ Write Python script and run the Python code in Tableau Desktop
In this part, I will share some SCRIPT functions built in Tableau. You can read the documentation page about 4 SCRIPT functions in Tableau here. I will demo each SCRIPT function in this blog. Let's start with the SCRIPT_BOOL first.
- SCRIPT_BOOL(<string>, <expression>,...)
This function will return the Boolean result (True or False) based on the expression. For example: I would like to check if the Sum of profit is positive or not. If sum of profit is positive then return True; otherwise, return False.
In Tableau Desktop, I created a new Calculated Field called BOOLEAN_Profit. I write the SCRIPT_BOOL function:
SCRIPT_BOOL("
profit = _arg1[0]
result = True if profit > 0 else False
return result
", SUM([Profit])
)
Inside the double quotation marks, I type Python code. Outside of the quotation mark and after the comma is the argument that we will pass in.
On the first line, I set profit = _arg1[0] where _arg1 is the argument that we will pass the values in. In our case, _arg1 is SUM([Profit]). Besides that, I called the first element of the argument which is _arg1[0] because we have a Profit column with many records. In the row level, it will get one element from the Profit column and assign it to the profit variable.
On the second line, I set result = True if profit >0 else False. It means that if the profit variable that we set on the first line greater than 0, then True; otherwise, return False. On the third line, I return the result which is Boolean result (True or False).
After the comma is the argument that we will pass in. I passed the Sum of Profit value into the Python code, so Sum of Profit will be _arg1[0].
Then, I drag the BOOLEAN_Profit pill into the Text Marks. In Rows, I drag Category, Sub-Category, Manufacture and SUM(Profit) all in Discrete. It will show the result as Image 8.
- SCRIPT_INT(<string>, <expression>,...)
SCRIPT_INT will return the integer result based on the specified expression. For example: I would like to calculate the final Sale price after discount.
I created a new calculated field called Final Price after Discount. Then, I type the SCRIPT_INT function as (Image 9):
SCRIPT_INT("
sale = _arg1[0]
discount = _arg2[0]
final_price = sale - ( (sale * discount) /100 )
return final_price
", SUM([Sales]), SUM([Discount]) )
On the first line, I assigned the first argument _arg1[0] which is Sum of Sales to the sale variable. On the second line, I assigned the second argument _arg2[0] which is Sum of Discount to the discount variable. On the third line, I calculate the final price after discount; then, I assigned the value to the final_price variable. On the last line, I return the final_price value.
Outside of the quotation mark and after the comma, I passed in the argument in order. The first argument is sum of sales, the second argument is sum of discount in row level.
You can also check the Command Prompt to see how it works. After getting the result, the result will be posted on server. If everything works good, you will see 200 POST code (Image 10).
- SCRIPT_REAL(<string>, <expression>,...)
Same as the SCRIPT_INT function, but the SCRIPT_REAL returns the real type result (with decimal places). I copy the same code as I did in SCRIPT_INT. But at this time, I changed the SCRIPT_INT function to SCRIPT_REAL. I created a new Calculated Field called Final Price in Real Type.
SCRIPT_REAL("
sale = _arg1[0]
discount = _arg2[0]
final_price = sale - ( (sale * discount) /100 )
return final_price
", SUM([Sales]), SUM([Discount]) )
Then, I drag that pill into the Rows in Discrete to compare with the SCRIPT_INT result (Image 11).
- SCRIPT_STR(<string>, <expression>,...)
The SCRIPT_STR function returns a string result from the specified expression. For example: I would like to concatenate the Sub-Category and Manufacturer together.
I created a new Calculated Field called Concat Sub_cat and Man. Then I type the SCRIPT_STR function:
SCRIPT_STR("
sub_cat = _arg1[0]
manu = _arg2[0]
deli = ' - '
result = sub_cat + deli + manu
return result
", ATTR([Sub-Category]), ATTR([Manufacturer]))
I assigned the first argument to sub_cat variable and the second argument to manu variable. I also set the delimiter by deli variable with ' - '. Then, the return result is concatenating all three variables. Remember that the fields that we passed in should be in aggregate function. For example, I used ATTR function for Sub-Category and Manufacturer fields.
Then drag the Concat Sub_cat and Man pill into the Rows (Image 12).
Besides those 4 SCRIPT functions, you can also deploy Python function and post it on TabPy server. You can go to the TabPy documentation page here to research more. You can try with TabPy in Jupyter Notebook. I installed Anaconda in my computer, so I open Anaconda Prompt to install TabPy (same as I did in the Command Prompt).
In this blog, I shared how to install Python and TabPy, how to connect to TabPy server on Tableau Desktop, and how to write 4 SCRIPT functions in the calculated field to run the Python code. TabPy is helpful in many analysis cases such as Principle Component Analysis (PCA), Sentiment Analysis, T-Test, ANOVA (Analysis of Variance).
I hope you enjoy this blog and hope to see you soon in another blog!