|NAM NGUYEN SON|

Apache Superset: A hidden gem in the heaps of BI tools? – Part I.

This article is part of a short series. Part I demonstrates the strengths of Superset and shows where it sits in the diverse dataviz market, while the upcoming Part II will discuss our experience implementing and testing Superset on the pipeline illustrated in the figure below.

Figure: How we set up our infrastructure

Introduction

When we talk about interactive data visualization, especially in the context of the modern data stack (e.g. a data ingestion tool such as Stitch, Fivetran, or Airbyte + centralized cloud storage on AWS, Azure, GCP, Snowflake, or Databricks + a transformation tool such as dbt + a BI platform), proprietary tools such as Tableau, Power BI, and more recently Looker come to mind. These were considered the mainstream solutions for quite a while, but emerging open-source alternatives are now hungry enough to challenge the market with their simplicity and cost-friendly approach. Among these promising technologies, Apache Superset is set to challenge its competitors in many ways, which is why Hiflylabs closely monitors the project’s roadmap.

Given the maturity and dynamism of the BI market, consolidation is more frequent than ever. Over the past few years, several commercial tools have been acquired by larger companies (Tableau by Salesforce, Looker by Google Cloud, Periscope by Sisense), which carries the risk of vendor lock-in and can force customers to migrate and rebuild their assets (e.g. Chart.io was downsized and shut down by Atlassian). Conversely, open-source platforms such as Superset avoid vendor lock-in, since you can both switch between commercial open-source (COSS) vendors and host the platform yourself on-premises. A good example of the former is Preset (founded by the creator of Superset, Max Beauchemin), which offers a hassle-free, fully managed SaaS version of the platform.

So why is it relevant just now?

Well, as the name indicates, Superset is an Apache project: it entered the Apache Incubator in 2017 and became a Top-Level Project at the beginning of 2021, coinciding with its first stable release (1.0). It’s a modern, lightweight, cloud-native, free, and open-source BI web application with a Python backend built on SQLAlchemy, which makes it scalable and compatible with almost any database technology that speaks SQL. Reflecting its increasing popularity, Airbnb, Twitter, Netflix, Amex, and many other companies have already started adopting Superset in their workflows, while Dropbox has successfully put it to use at enterprise level. In addition, the growing community behind Superset strengthens the argument that the project has real potential.
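To make the SQLAlchemy point more concrete: a database is typically registered in Superset through an SQLAlchemy connection URI of the form dialect+driver://user:password@host:port/database. The snippet below is only a minimal sketch of how such a URI can be assembled with the Python standard library. The helper name build_sqlalchemy_uri, the host, the database, and the credentials are all hypothetical and chosen for illustration.

```python
from urllib.parse import quote


def build_sqlalchemy_uri(dialect, host, database,
                         username=None, password=None,
                         port=None, driver=None):
    """Assemble a URI shaped like dialect+driver://user:password@host:port/database.

    Hypothetical helper for illustration only; it is not part of Superset
    or SQLAlchemy.
    """
    scheme = f"{dialect}+{driver}" if driver else dialect

    credentials = ""
    if username:
        # Percent-encode special characters so they cannot be mistaken
        # for URI delimiters such as '@' or '/'.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"

    location = f"{host}:{port}" if port else host
    return f"{scheme}://{credentials}{location}/{database}"


# Example values are made up; swap in your own warehouse details.
uri = build_sqlalchemy_uri(
    "postgresql",
    "warehouse.example.com",
    "analytics",
    username="superset_ro",
    password="p@ss/word",
    port=5432,
    driver="psycopg2",
)
print(uri)
# postgresql+psycopg2://superset_ro:p%40ss%2Fword@warehouse.example.com:5432/analytics
```

Switching to another SQL database is then mostly a matter of changing the dialect and driver part of the URI, provided the matching Python driver is installed.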

Superset essentially stands on three main layers:

  1. Dashboarding:  
    • Seamless interaction with a wide range of tooltips
    • Drag and Drop crafting
    • Multiple ways of sharing your dashboard/charts (JSON, email, URL)
  2. Data Exploration (Slice & Dice):
    • Code-free visualization builder to extract and present datasets
    • Intuitive interface
    • Apply dozens of preset and custom visualization plugins instantly
    • User-defined metrics with scalable semantic layering
    • Lets you view the SQL statement for each visualization
  3. SQL Lab: 
    • A feature-rich SQL IDE written in React
    • A multi-tab environment lets you work on multiple queries at once
    • Metadata browsing of tables, columns, indexes, and partitions
    • Supports long-running queries by persisting query results and dispatching handlers to workers (Celery)
    • Equipped with interactive querying, autocomplete, scheduling, query history, user-defined parameters (with Jinja templating, familiar from dbt CLI and dbt Cloud), etc.
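As an illustration of the Celery-backed long-running queries mentioned under SQL Lab, here is a minimal sketch of the kind of block that goes into superset_config.py. It assumes a Redis instance on localhost:6379 acting as both message broker and Celery result backend; the URLs and tuning values are assumptions for this example, not a recommendation for production.

```python
# Sketch of a superset_config.py fragment for asynchronous SQL Lab queries.
# Assumption: Redis is reachable at localhost:6379 (adjust to your setup).


class CeleryConfig:
    # Message broker that hands query tasks over to the Celery workers.
    broker_url = "redis://localhost:6379/0"
    # Modules the workers import so they can find the SQL Lab tasks.
    imports = ("superset.sql_lab",)
    # Where Celery keeps task state and return values.
    result_backend = "redis://localhost:6379/0"
    # Example tuning values; treat them as placeholders.
    worker_prefetch_multiplier = 10
    task_acks_late = True


# Superset reads the Celery settings from this variable.
CELERY_CONFIG = CeleryConfig

# Not shown here: Superset also expects a RESULTS_BACKEND setting (a cache
# object) for persisting query results, which needs an extra dependency.
```

With a configuration along these lines, a separate Celery worker process picks up the queries, so the web server stays responsive while a long-running query executes.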

How does it perform on the big stage?

Head-to-head with Looker

We have seen many instances of Looker being paired with dbt, but the following arguments made us advocate for Superset:

Why use Superset? - the case study of Dropbox

Let’s look at a case study of how Superset was chosen for enterprise-level production use!

The table below comes from a Dropbox article dated Jan 19, 2021. Long story short, they wanted a single BI tool to replace multiple other solutions. The main emphasis was on the security, user-friendliness, maintainability, flexibility, and extensibility of the platform. Their choice landed on Superset, as it was the best match for their specific internal needs. As part of their decision-making process, they published a comparison table showing that Superset is ahead of its peers in terms of the number of features and compatibility.

Table: Data Visualization Platform Comparison Matrix
Source: Dropbox Tech – Why we chose Apache Superset as our data exploration platform

Dropbox explained that they ruled out Metabase because it is developed in Clojure rather than Python and because, to their knowledge, it falls short of Superset in several other respects, for example by supporting fewer authentication options and data backends. Since Dropbox has many strong Python developers, they decided to pursue Superset instead of Metabase.

Limitations

Just like other tools, Superset has its limitations.

Did Superset spark your interest, or does it still fall short of being a credible solution for your pipeline? Which missing features would elevate the product to a day-to-day alternative for you? Make sure your voice is heard in the comment section below! Also, stay tuned for the second part of the series!

Author: 

Son N. Nguyen - Data Engineer
