dbt Fusion: A First Look and Hands-On Review

dbt Labs recently announced dbt Fusion, a complete overhaul of the dbt Core engine built in Rust. It promises to significantly improve the developer experience. In this article, we test its core features and share our hands-on experience with the public beta, exploring what works, what doesn't, and what potential it holds for the future of dbt development.

GÁBOR TÓTH

On May 28th, dbt Labs announced dbt Fusion, a complete overhaul of the engine that powers dbt. 

In this article, I'll cover the core features and goals of dbt Fusion. Then, I'll share my first impressions from putting the public beta through its paces.

 

What is dbt Fusion?

At its core, dbt Fusion is a ground-up rewrite of the dbt Core engine. It's intended to provide the same functionality as a faster, more optimized engine, with some added features on top. The most significant architectural change is the switch from Python to Rust. Rust is renowned for its performance and memory safety, and dbt Labs is leveraging it to deliver a faster, more robust experience.

 

Understanding Fusion's Performance Gains

The move to Rust promises significant performance improvements. However, it's crucial to understand where you'll feel this speed boost. dbt Fusion accelerates project parsing and SQL compilation—the steps dbt performs locally before sending code to your data warehouse.

The actual execution of your models still happens on your data warehouse, and neither Fusion nor Core has any impact on that runtime. Since parsing and compilation are often a small fraction of a total production run's duration, the effect on your pipeline's end-to-end runtime will likely be negligible.
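
If you want to see where the speedup lands, you can time the local steps yourself. A minimal sketch, assuming both dbt Core (dbt) and dbt Fusion (dbtf) are on your PATH and that the Fusion binary keeps the familiar parse subcommand:

    # Time project parsing with the Python engine vs. the Rust engine.
    # Only these local steps get faster; model execution on the warehouse
    # is untouched by the engine swap.
    time dbt parse
    time dbtf parse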

The primary goal of dbt Fusion, as stated by dbt Labs, is not to shorten pipeline runtimes but to enhance the developer experience. This is the lens through which we should evaluate it.

 

The dbt Fusion Ecosystem

dbt Labs didn't just release a new engine; they also introduced a collection of new components designed to work together. dbt Fusion uses new database adapters built on the Arrow Database Connectivity (ADBC) standard, aiming for more efficient data transfer. The move to Rust also required a new Jinja engine. Finally, they released a VS Code extension that leverages Fusion's capabilities to provide a powerful, IDE-native development environment.

 

dbt Now Understands SQL

Perhaps the most fundamental change is that dbt is no longer just a sophisticated text processor. The Fusion engine can now parse and understand the rendered SQL it generates.

This is a significant change. Previously, dbt treated your model code as a string to be manipulated with Jinja until it was ready to be sent to the warehouse. Now, by parsing the SQL, dbt Fusion enables powerful new capabilities. For instance, it allows for local syntax validation, letting you catch SQL errors directly in your IDE before ever running a command. Furthermore, this deep understanding of your code's structure is what enables a rich IDE integration, powering the features in the new VS Code extension.
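
You can get a feel for this from the command line as well. A small sketch, assuming the same static SQL analysis runs when you compile a model with the Fusion CLI:

    # Because Fusion parses the rendered SQL, a typo like "SELCT" in a model
    # can be reported at compile time, locally, without touching the warehouse.
    dbtf compile --select my_model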

The VS Code extension is not a part of the engine itself, but it is arguably the biggest contributor to the "improved developer experience" that dbt Labs is aiming for. It promises a dbt language server with features like:

  • Live error detection as you type
  • Autocomplete for models, columns, macros, and SQL functions
  • "Go to Definition" for ref macros and CTEs
  • Data previews for models and CTEs
  • Model and column-level lineage visualization
  • Informational hovers for database objects

 

Hands-On with the Beta

Theory is one thing, but how does it work in practice? I decided to put the beta to the test.

Disclaimer: dbt Fusion is in public beta. The features are incomplete, and bugs are expected. I tested on a Windows machine, so your experience may vary on macOS or Linux.

 

Installation and Setup

First, you need to install dbt Fusion. It's distributed as a standalone binary executable, which is very convenient. You don't need to have Python installed and you can avoid the hassle of creating and managing virtual environments. Just download the binary and add it to your PATH.
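
As an illustration, here is a minimal sketch of the manual route in a Unix-style shell; the actual download URL and binary name are in the official guide linked below, and ~/.local/bin is just an example location:

    # Move the downloaded binary to a directory on your PATH and verify it.
    mkdir -p ~/.local/bin
    mv ~/Downloads/dbt ~/.local/bin/dbtf    # adjust names/paths to the official guide
    chmod +x ~/.local/bin/dbtf
    export PATH="$HOME/.local/bin:$PATH"    # add this line to your shell profile
    dbtf --version                          # confirm the binary runs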

You can find the official installation guide here: Install dbt Fusion

To try Fusion, your project must use a supported data platform. At launch, only Snowflake was available, with Databricks and BigQuery adapters planned for release in June. Additionally, Python models are not yet supported, so if your project relies on them, you'll have to wait.

Your project might also use deprecated YAML configurations. While dbt Core tolerates these with warnings, dbt Fusion will eventually fail on them. You can use the dbt-autofix tool to easily update your project's syntax. You can run it via uvx, as sketched below; it worked perfectly for me.
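
A sketch of that workflow, assuming uv is installed (uvx ships with it); check the tool's help output for the exact subcommands and flags in the current release:

    # Run dbt-autofix from the dbt project root without a permanent install;
    # see the --help output for the exact invocation in the current release.
    uvx dbt-autofix --help
    uvx dbt-autofix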

 

The Command Line Experience

Running a few CLI commands, I found the speed improvement noticeable. However, I also ran into some rough edges:

  • Terminal Issues: Using Git Bash on Windows, the terminal would frequently hang after a command finished, forcing me to kill the session and start a new one. This can be quite disruptive.
  • Error Messages: Error and warning outputs often lack a proper file reference. Instead of a message like Error in models/my_model.sql:3:10, I would see Error in <unknown>:3:10, making debugging much harder.

 

A Closer Look at the VS Code Extension

The VS Code extension is where the promise of an enhanced developer experience really lies. Unfortunately, in its current beta state, it falls short of expectations.

It did not work for me out of the box; I had to do some troubleshooting. The setup requires you to register the extension with dbt Cloud, and they advise you to turn off all other dbt-related extensions you may be using. Even after completing all these steps, the extension did not deliver the experience I was hoping for.

 

Missing Features

Most of the advanced editing features were completely non-functional for me. Autocomplete, "Go to Definition," live error detection, and informational hovers on tables/columns did not work at all. Despite significant time spent troubleshooting, I couldn't get them running, and the LSP server provides very little feedback for debugging.

 

Unreliable Features

  • Compiled Preview: The button to view a model's compiled SQL seems to open a cached version from the target/ directory rather than compiling the current code on the fly. To see recent changes, I had to manually run dbtf compile (see the sketch after this list) or restart the LSP server.
  • Data Preview: This feature was hit-or-miss. The preview tab would often get stuck on the loading screen, even when the corresponding terminal task appeared to log the data returned from the data warehouse. In those cases, clicking "Cancel" on the loading screen would sometimes make the data appear. It would also be great to see the actual query sent to the data platform, to verify the Jinja compilation.
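
For reference, the compiled-preview workaround from the first bullet is just a manual recompile, so the file under target/ matches your latest edits:

    # Refresh the compiled SQL in target/ before reopening the preview.
    dbtf compile --select my_model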

 

Working Features

On a positive note, the model- and column-level lineage visualization works well. It's a great feature, though the user experience could still be improved: on large projects, the graph becomes overcrowded quickly, and features like filtering the view or focusing the graph would be welcome additions.

 

Final Thoughts

dbt Fusion represents a new direction for the future of dbt, especially if we consider the license changes as well (a topic that deserves its own article). The ability to parse SQL and the new architecture unlock immense potential, but the beta shows there is still a long way to go. The CLI has some quirks, and the VS Code extension - the key to the promised developer experience - is currently very rough.

For Fusion to see widespread adoption, these issues must be ironed out during the beta period to ensure a seamless transition from dbt Core at general availability. The team is actively working on fixes, as seen in the project's GitHub repository. It will be worth checking in on their progress in a few weeks.
