Tableau users usually apply built-in functions to calculate, filter, and retrieve data. However, in some complicated problems, the built-in function in Tableau may not be useful in solving the problem. We can consider using the Python script through TabPy to solve the problem. Before, I wrote 2 blogs about how to install TabPy, run Python script (link), and import and run saved Python functions in Tableau (link). From those blogs, you can apply some SCRIPT built-in functions in Tableau to run Python scripts.
In this blog, I will share 3 ways to apply Python script in Tableau Desktop. In the end, I will compare the pros, and cons of each way and when to apply it. 3 ways include:
- Table Extension in the Data Source pane
- Run a Python file (.py) to import functions in the Tableau Desktop
- Deploy the Python functions on TabPy from Jupyter Notebook
The requirement for doing this, you need to install Python 3.10 and TabPy on your computer. You can check the link above. I shared how to install TabPy. In this blog, I use the Superstore dataset in Tableau as an example dataset and my Tableau Desktop version is 2024.1.4.
Now, if you all are ready. Let's get started!
Table Extension in the Data Source Pane
First of all, make sure your Tableau Desktop is already connected to TabPy. I'm using TabPy on my local machine, so I put localhost in Hostname and port number 9004 in the Port box. If TabPy is installed on the server/cloud, you should contact the IT team to get this info. Then, click on Test Connection to ensure that you successfully connected to TabPy, and click on the Save button. (Fig. 1)
In the Data Source pane, I already selected the Data Source (which is Sample-Superstore). There are many tables below (Orders, People, Returns). At the bottom of the list, the "New Table Extension" option is located at the bottom. Then, drag the "New Table Extension" option into the canvas on the right. (Fig. 2)
Then, drag the Orders table into the Table Extension window. In Fig. 3, there are 4 windows in the view.
- The top left one is where you can join and union multiple tables as in the Physical Layer.
- The top right window is where you will type the Python script.
- The bottom left will show the Table Extension metadata. You can rename the table extension. It also shows all the fields and their data type in the Output Table.
- The bottom right window shows the table of the Input dataset and the Output table after applying the Python script.
Now, I am going to type the Python script in the top right window. The input dataset (which is the Orders table) is represented as _arg1. I create a data frame df from the Pandas package. The output in the Python script should be a dictionary. In the end, I convert the data frame into a dictionary by using the to_dict function. (Fig.4)
In the Output Table menu on the bottom right, my 3 new columns are assigned at the end of the dataset. I also remove 2 columns (Country/Region and Postal Code) by using the drop function. In the Output Table, those 2 columns were removed.
I can also define a function and call it there. In the example below, I declared a function called upper_case_string and applied it to the Customer Name field. (Fig. 5)
To get more information about the Table Extension in Tableau, you can read the documentation page from Tableau here.
Import a Python File (.py) to apply Python Functions
I used to write a blog to share about this option before. You can check it here. I will summarize some important steps here.
First, you need to create a Python file (.py). You can use any Text Editor to type the script or download the .py file from the Jupyter Notebook.
For example: I declared some functions and saved them under the file called le_list_all_functions.py. (Fig. 6)
Second, I need to let TabPy know where it can pull those functions to use in Tableau. To do that, I open the Command Prompt. Then type:
set PYTHONPATH=%PYTHONPATH%;*<yourdirectorypath>*
After setting the Python Path, you can run TabPy from the Command Prompt and Connect to TabPy again in Tableau Desktop.
In Tableau Desktop, I created a new calculated field called testing. In this calculated field, I use the upper_char2 function from the list of functions I saved in the Python file. I know the return data type is a string, so I used the SCRIPT_STR function. (Fig. 8)
SCRIPT_STR("
from le_list_all_functions import upper_char2
return upper_char2(_arg1)
",MIN([Sub-Category]))
Now, you store functions in a Python file or many files in the folder. In Tableau Desktop, only need to call the file name and import the function that you want to use with 4 SCRIPT built-in functions.
Deploy Python Functions on TabPy from Jupyter Notebook
For this option, you can choose any Python IDE (Integrated Development Environment) to do. I mostly use Jupyter Notebook, so I will work on it for this section.
First of all, you need to install the TabPy client first. In the command prompt and/or your Anaconda prompt, you can install by typing the command: pip install tabpy-client. Then run tabpy in the command prompt.
After installing the TabPy Client, I open Jupyter Notebook or any Python IDE you are familiar with. Then, I import the client function to set the Host and Port number. You can also set the credentials (I use TabPy on my local machine, so I didn't set up the credentials). (Fig. 9)
Then, I copy or type the function that I declared before and run it. Note that the return value must be a single value or a list of values. In this example, I returned the list of values (the squared bracket). (Fig. 10)
Then, I deploy the function on the server by using the deploy function. In Jupyter Notebook, when you type client.deploy( <then press Shift+Tab> ). It will show the documentation of that function. (Fig. 11)
For example: if I want to deploy the function upper_char2 that I declared above, I will type:
To delete the deployed function: client.remove(' <unique_identified_name> '). For example: client.remove('Upper_character')
After running that line, I type localhost:9004 (the Host and Port number for my Tabpy on the local machine) in the browser. Then, the page will list deployed models on the TabPy server for you. (Fig. 13)
Now, let's open Tableau Desktop to apply that function from TabPy server.
I created a new calculated field. Then, use 1 of 4 SCRIPT built-in functions in Tableau. I know that the returned data type is a string, so I use SCRIPT_STR in this case. The syntax is:
SCRIPT_function (
"return tabpy.query( ' <unique_identified_name> ' , _arg1, ... ,[_argn] ") ['response']
, aggregate_function( <field_1> ), ..., aggregate_function<field_n>
)
Then, click on OK. In Fig. 14, I dragged that calculated field in the same Rows with Sub-Category. It capitalized all characters in the string for me.
I recommend reading the TabPy documentation page. There are many Python scripts that you can apply (ANOVA, tTest, PCA, Sentiment Analysis). You can access that page here.
At this point, I shared all 3 ways to apply the Python script in Tableau Desktop. Each way has pros and cons, so I summarized all points from my perspective in a table. From this table, I hope you will decide which option is good for you to solve the problem.
If you are planning or working on complex problems and need to apply Python script to solve them, I hope this blog is useful.
Thank you for reading and see you in the next blog!