Disclaimer: The views and opinions expressed in Expand Mapping are solely those of the author and do not necessarily reflect the views of their employer.
If you’re looking for a simple code-based way to push data to Tableau Cloud as a Hyper extract without using Tableau Bridge, then I hope you find this useful!
Before jumping in, I want to quickly address which businesses fit the mold for an approach like this:
You don’t want to manage Bridge servers and software updates
You prefer code-based data pipelines
You don’t need live connections for your private data (Bridge enables this [1])
You don’t require Creators to publish and schedule their own net-new published data sources from private data (Bridge enables this [2])
The 4th point could warrant its own standalone post. It boils down to what self-service analytics means for your organization: how much do you want to empower the business versus control it? What is the balance of responsibilities between data engineers and data analysts? Do you want the business to connect directly to underlying data? Are data engineers a bottleneck for the business? Does the business create junk data sources? … Okay, let’s put a pin in this, as they say, and jump right in.
Here’s a link to the notebook:
Here’s a link to the video walkthrough:
I use Colab (Google’s freemium hosted Jupyter notebook) to create a minimal Python/Bash workflow that does the following:
Connects to a database
Creates a Hyper file from a SQL query
Publishes that Hyper file to Tableau Cloud
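At a high level, the notebook boils down to the sketch below. This is just an outline with placeholder helper names, not the notebook’s actual code:

```python
import pantab

# 1. Connect to a database and pull a SQL query into a DataFrame.
#    run_sql_query is a placeholder for your connector code (see below).
df = run_sql_query("SELECT * FROM orders")

# 2. Write the DataFrame to a local Hyper file.
pantab.frame_to_hyper(df, "extract.hyper", table="Extract")

# 3. Publish the Hyper file to Tableau Cloud via the REST API.
#    publish_to_tableau_cloud is a placeholder for the sign-in and publish calls.
publish_to_tableau_cloud("extract.hyper")
```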
This workflow does not involve scheduling or alerting - it’s intended to be a template to build on. In future posts, maybe I’ll show how to schedule this or even add an interactive GUI.
The notebook and video explain each step, so I won’t go into that here. I will, however, talk through some considerations, observations, and quirks in my approach.
I use Snowflake as the sample database. In practice, a local Jupyter notebook connecting to an on-prem SQL database would be a more representative workflow, but I wanted to provide something tangible that can be widely distributed and easily accessed. Since Colab is public, the data in my example needed to be available via a public endpoint. This Colab notebook is meant to be copied and tweaked for an individual business’s needs.
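For reference, the query step looks roughly like the sketch below, assuming the snowflake-connector-python package with its pandas extra; the credentials, objects, and query are all placeholders:

```python
import snowflake.connector

# Placeholder credentials - in a notebook, prefer getpass prompts or
# environment secrets over hardcoding these.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
    database="your_database",
    schema="your_schema",
)

# fetch_pandas_all() requires the connector's pandas extra:
#   pip install "snowflake-connector-python[pandas]"
cur = conn.cursor()
cur.execute("SELECT * FROM orders")  # placeholder query
df = cur.fetch_pandas_all()
cur.close()
conn.close()
```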
I had to use an older version of pantab and restart the runtime after installing it. I’m not exactly sure why - it took some googling and trial and error to get things working as intended. I believe the root cause was conflicting dependencies in the Google Colab environment (specifically numpy). In production you would only need to install pantab once - hopefully you won’t run into this issue in your environment, and you’ll be able to take advantage of the latest version of pantab.
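In Colab, that install step looked roughly like this; the exact pin is a placeholder, since the right version depends on what resolves cleanly in your environment:

```python
# Run in its own Colab cell. The version pin is a placeholder, not a
# recommendation - check pantab's release notes against your numpy version.
%pip install pantab==X.Y.Z

# Then restart the runtime (Runtime > Restart runtime) so the next import
# picks up the freshly installed packages.
```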
When loading the SQL query results into a pandas DataFrame, I needed to explicitly cast the date columns to ‘datetime’. All other fields came through with the correct data types. As of this writing, I don’t think pantab supports geospatial types, so if my example included those, I might have needed to break out the actual Tableau Hyper API. I much prefer the convenience and speed of pantab, so I’m looking forward to geospatial support being added.
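The cast itself is a one-liner per column. A minimal sketch, assuming a hypothetical ORDER_DATE column in the DataFrame from the query above:

```python
import pandas as pd
import pantab

# The date column comes back as a plain object dtype, so cast it explicitly;
# pantab infers Hyper column types from the DataFrame's dtypes.
df["ORDER_DATE"] = pd.to_datetime(df["ORDER_DATE"])

# Write the DataFrame to a local Hyper file.
pantab.frame_to_hyper(df, "extract.hyper", table="Extract")
```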
I chose not to use Tableau Server Client (TSC, the Python wrapper for the REST API). In general, my personal preference when working with REST APIs is to use Python’s requests library. TSC might be a better fit for you - I don’t know you!
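For the earlier REST calls, requests worked fine. Here’s a sketch of the sign-in step using a personal access token; the pod, API version, site content URL, and token values are placeholders:

```python
import requests
import xml.etree.ElementTree as ET

BASE = "https://your-pod.online.tableau.com/api/3.22"  # placeholder pod and version

payload = """
<tsRequest>
  <credentials personalAccessTokenName="your-token-name"
               personalAccessTokenSecret="your-token-secret">
    <site contentUrl="your-site" />
  </credentials>
</tsRequest>
"""

resp = requests.post(f"{BASE}/auth/signin", data=payload,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()

# The response is XML; pull out the auth token and site id for later calls.
ns = {"t": "http://tableau.com/api"}
creds = ET.fromstring(resp.text).find("t:credentials", ns)
auth_token = creds.get("token")
site_id = creds.find("t:site", ns).get("id")
```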
I couldn’t get the requests library to work for the actual publishing step, so I used cURL instead. I kept hitting an issue with the multipart boundary string and couldn’t figure out how to format the request properly. The cURL sample in the documentation worked, so I went with it.
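Here’s the shape of that cURL call, modeled on the sample in Tableau’s REST API documentation; the pod, API version, and IDs are placeholders:

```bash
# publish-request.txt holds the multipart body: a "request_payload" XML part
# describing the data source, then the raw Hyper file bytes, with parts
# separated by the boundary string declared in the Content-Type header.
# Getting that boundary formatting right is exactly what tripped up the
# requests attempt described above.
curl "https://your-pod.online.tableau.com/api/3.22/sites/$SITE_ID/datasources?overwrite=true" \
  -X POST \
  -H "X-Tableau-Auth: $AUTH_TOKEN" \
  -H "Content-Type: multipart/mixed; boundary=boundary-string" \
  --data-binary @publish-request.txt
```

Per the publishing documentation, the two parts in the body are named request_payload and tableau_datasource.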
In closing, if you don’t want to use Tableau Bridge and you also don’t want to maintain a script, I’ll leave you with a couple of alternatives to my alternative:
Make a subset of your data available via a public endpoint and add your Tableau Cloud site’s static IP range to your database’s allowlist. Consider masking or removing sensitive fields if they’re not required for your analytics.
Use an ETL tool to either:
Write a subset of your data to a location that Tableau Cloud can query (i.e., accessible via the public internet).
Create and push Hyper extracts directly to Tableau Cloud.
I hope you found this useful - please be on the lookout for future content!
Relevant links:
1. https://help.tableau.com/current/online/en-us/to_bridge_livequery.htm
2. https://help.tableau.com/current/online/en-us/to_bridge_scale.htm#centralized-management