On May 28th, dbt Labs announced dbt Fusion, a complete overhaul of the engine that powers dbt.
In this article, I'll cover the core features and goals of dbt Fusion. Then, I'll share my first impressions from putting the public beta through its paces.
What is dbt Fusion?
At its core, dbt Fusion is a ground-up rewrite of the dbt Core engine. It's intended to provide the same functionality as dbt Core, but faster and with some added features. The most significant architectural change is the switch from Python to Rust. Rust is renowned for its performance and memory safety, and dbt Labs is leveraging it to deliver a faster, more robust experience.
Understanding Fusion's Performance Gains
The move to Rust promises significant performance improvements. However, it's crucial to understand where you'll feel this speed boost. dbt Fusion accelerates project parsing and SQL compilation—the steps dbt performs locally before sending code to your data warehouse.
The actual execution of your models still happens on your data warehouse, and neither dbt Fusion nor dbt Core has any impact on that runtime. Since parsing and compilation are often a small fraction of a total production run's duration, the effect on your pipeline's end-to-end runtime will likely be negligible.
The primary goal of dbt Fusion, as stated by dbt Labs, is not to shorten pipeline runtimes but to enhance the developer experience. This is the lens through which we should evaluate it.
The dbt Fusion Ecosystem
dbt Labs didn't just release a new engine; they also introduced a collection of new components designed to work together. dbt Fusion uses new database adapters built on the Arrow Database Connectivity (ADBC) standard, aiming for more efficient data transfer. The move to Rust also required a new Jinja engine. Finally, they released a VS Code extension that leverages Fusion's capabilities to provide a powerful, IDE-native development environment.
dbt Now Understands SQL
Perhaps the most fundamental change is that dbt is no longer just a sophisticated text processor. The Fusion engine can now parse and understand the rendered SQL it generates.
This is a significant change. Previously, dbt treated your model code as a string to be manipulated with Jinja until it was ready to be sent to the warehouse. Now, by parsing the SQL, dbt Fusion enables powerful new capabilities. For instance, it allows for local syntax validation, letting you catch SQL errors directly in your IDE before ever running a command. Furthermore, this deep understanding of your code's structure is what enables a rich IDE integration, powering the features in the new VS Code extension.
The VS Code extension is not a part of the engine itself, but it is arguably the biggest contributor to the "improved developer experience" that dbt Labs is aiming for. It promises a dbt language server with features like:
- Live error detection as you type
- Autocomplete for models, columns, macros, and SQL functions
- "Go to Definition" for ref macros and CTEs
- Data previews for models and CTEs
- Model and column-level lineage visualization
- Informational hovers for database objects
Hands-On with the Beta
Theory is one thing, but how does it work in practice? I decided to put the beta to the test.
Disclaimer: dbt Fusion is in public beta. The features are incomplete, and bugs are expected. I tested on a Windows machine, so your experience may vary on macOS or Linux.
Installation and Setup
First, you need to install dbt Fusion. It's distributed as a standalone binary executable, which is very convenient. You don't need to have Python installed and you can avoid the hassle of creating and managing virtual environments. Just download the binary and add it to your PATH.
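As a rough sketch, installing a standalone binary on a Unix-like system usually follows the pattern below. Note that the download URL here is a placeholder, not the real dbt Fusion location; follow the official installation guide for the actual steps and URL.

```shell
# Hypothetical sketch: the URL below is a placeholder, NOT the real
# dbt Fusion download location. Follow the official install guide.
mkdir -p "$HOME/.local/bin"

# Download the binary and make it executable.
curl -fsSL -o "$HOME/.local/bin/dbtf" "https://example.com/dbt-fusion/dbtf"
chmod +x "$HOME/.local/bin/dbtf"

# Make sure the directory is on your PATH
# (add this line to your shell profile to make it permanent).
export PATH="$HOME/.local/bin:$PATH"

# Verify the shell can now find the binary.
command -v dbtf
```

On Windows, the official installer handles the PATH setup for you; the sketch above is mainly useful for understanding what "add the binary to your PATH" means.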
You can find the official installation guide here: Install dbt Fusion
To try Fusion, your project must use a supported data platform. At launch, only Snowflake was available, with Databricks and BigQuery adapters planned for release in June. Additionally, Python models are not yet supported, so if your project relies on them, you'll have to wait.
Your project might also use deprecated YAML configurations. While dbt Core tolerates these with warnings, dbt Fusion will eventually fail on them. You can use the dbt-autofix tool to easily update your project's syntax; running it via uvx worked perfectly for me.
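For reference, invoking the tool through uvx looks roughly like this. The `deprecations` subcommand reflects the dbt-autofix README at the time of writing; check the repository for current usage.

```shell
# Run dbt-autofix from your dbt project root without a permanent
# install; uvx fetches the package into a throwaway environment.
uvx dbt-autofix deprecations

# List available subcommands and options.
uvx dbt-autofix --help
```

The nice part of the uvx approach is that nothing is added to your project's dependencies; the tool runs once and its environment is discarded.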
The Command Line Experience
After running a few CLI commands, the speed improvement was immediately noticeable. However, I ran into some rough edges:
- Terminal Issues: Using Git Bash on Windows, the terminal would frequently hang after a command finished, forcing me to kill the session and start a new one. This can be quite disruptive.
- Error Messages: Error and warning outputs often lack a proper file reference. Instead of a message like Error in models/my_model.sql:3:10, I would see Error in <unknown>:3:10, making debugging much harder.
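If you want to feel the parsing speedup yourself, a simple approach is to time the same commands under both engines on the same project. This sketch assumes dbt Core is on your PATH as `dbt` and the Fusion binary as `dbtf`, which may differ in your setup.

```shell
# Compare local parse/compile speed between dbt Core and dbt Fusion
# on the same project. Assumes `dbt` is Core and `dbtf` is Fusion.
time dbt parse        # dbt Core: Python-based parsing
time dbtf parse       # dbt Fusion: Rust-based parsing

# Compilation renders Jinja; Fusion additionally parses the resulting SQL.
time dbt compile
time dbtf compile
```

Remember that `run` timings will look nearly identical between the two, since model execution happens on the warehouse either way.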
A Closer Look at the VS Code Extension
The VS Code extension is where the promise of an enhanced developer experience really lies. Unfortunately, in its current beta state, it falls short of expectations.
It did not work for me out of the box; I had to do some troubleshooting. The setup requires you to register the extension with dbt Cloud, and they advise you to turn off all other dbt-related extensions you may be using. Even after completing all these steps, the extension did not deliver the experience I was hoping for.
Missing Features
Most of the advanced editing features were completely non-functional for me. Autocomplete, "Go to Definition," live error detection, and informational hovers on tables/columns did not work at all. Despite significant time spent troubleshooting, I couldn't get them running, and the LSP server provides very little feedback for debugging.
Unreliable Features
- Compiled Preview: The button to view a model's compiled SQL seems to open a cached version from the target/ directory rather than compiling the current code on the fly. To see recent changes, I had to manually run dbtf compile or restart the LSP server.
- Data Preview: This feature was hit-or-miss. The preview tab would often get stuck on the loading screen, even when the corresponding terminal task appeared to log the data returned from the data warehouse. Sometimes in those cases, clicking "Cancel" on the loading screen would cause the data to appear. As an additional feature, it would be great to see the actual query sent to the data platform to verify Jinja compilation.
Working Features
On a positive note, the model and column-level lineage visualization works well. It's a great feature, though the user experience could be improved. On large projects, the graph becomes overcrowded quickly. Features like filtering in the UI or focusing the graph would be welcome additions.
Final Thoughts
dbt Fusion represents a new direction for the future of dbt, especially if we consider the license changes as well (a topic that deserves its own article). The ability to parse SQL and the new architecture unlock immense potential, but the beta shows there is still a long way to go. The CLI has some quirks, and the VS Code extension, the key to the promised developer experience, is currently very rough.
For Fusion to see widespread adoption, these issues must be ironed out during the beta period to ensure a seamless transition from dbt Core at general availability. The team is actively working on fixes, as seen in the project's GitHub repository. It will be worth checking in on their progress in a few weeks.