From SQL DAGs to Databricks Jobs: How We Automated Snowflake Task Migration
Migrating Snowflake Tasks to Databricks Jobs used to be tedious manual work. We built a simple automation tool that converts SQL DAGs to Databricks YAML workflows, saving 80% of orchestration migration effort.
At Hiflylabs, we’ve seen increasing demand from clients who want to migrate to Databricks. While helping them with these migrations, we noticed one recurring use case that consistently took more time than it should: moving Snowflake Tasks to Databricks.
Snowflake Tasks
If you’re not familiar with Snowflake Tasks, you’re not alone. Most people use dbt or another transformation and scheduling tool to manage their pipelines. However, for simpler use cases, Snowflake introduced Tasks and Workflows. Tasks are SQL statements that can be chained together to define directed acyclic graphs (DAGs), resulting in SQL-only data pipelines.
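As a quick illustration (the object names and SQL bodies below are made up), a minimal two-task DAG in Snowflake could look like this, with the child task chained to its parent via AFTER:

```sql
-- Illustrative only: a scheduled root task and one dependent task.
CREATE OR REPLACE TASK load_raw_orders
  WAREHOUSE = transform_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  INSERT INTO staging.orders SELECT * FROM raw.orders;

-- Child tasks carry no schedule of their own; they run AFTER their predecessor.
CREATE OR REPLACE TASK build_daily_orders
  WAREHOUSE = transform_wh
  AFTER load_raw_orders
AS
  INSERT INTO mart.daily_orders
  SELECT order_date, COUNT(*) AS order_count
  FROM staging.orders
  GROUP BY order_date;
```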
You can also call stored procedures inside Tasks, but that wasn’t our focus for this project. That’s a topic for another post.
Databricks Jobs
Databricks Jobs (or, more precisely, Lakeflow Jobs), on the other hand, are described as "workflow automation for Databricks, providing orchestration for data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow. You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows."
They consist of one or more tasks, support branching, and provide a clean visual UI.
They’re widely used across data engineering and data science pipelines because of their ease of use.
Experimenting With LLMs
Migrating between two vastly different platforms comes with numerous challenges. The migration itself isn’t overly complex, but it used to be tedious manual work, so automating the migration was estimated to save around 80% of the effort.
Initially, I thought this could be easily solved by prompting an LLM accordingly.
I tried building a prototype driven by a detailed prompt describing what I wanted: take the SQL files that make up a DAG and generate a Databricks job YAML file. It was essentially what you would do in AI-assisted development, but with a couple of extra validation and testing steps at the end.
In retrospect, this approach was overly ambitious and somewhat naive. Prompts had to be extremely precise, handle varying context sizes (imagine a DAG with 50 complex SQL statements), and still produce a valid YAML file in the format Databricks expects. I experimented with several prompts, but none of them produced satisfying results. After sharing my findings with the team, we concluded it was time to switch to a "non-LLM-assisted" approach.
Simple, But Effective
When I talked to András Zimmer, our CTO at Hiflylabs, he mentioned that great solutions don’t have to be complex: maybe we could achieve similar, or even better, results with a deterministic approach.
So, I went back to my desk to create a simple prototype. I already had a clear idea of the building blocks:
get all the necessary information about the Snowflake Tasks DAG
convert it into a JSON file that represents the pipeline
convert the JSON file into the Databricks job YAML format
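To make the idea concrete, here is a rough, self-contained Python sketch of the first two steps. The column names ("name", "predecessors", "definition") are assumptions about the SHOW TASKS CSV export, and the JSON layout is only illustrative; the actual tool is more involved.

```python
import csv
import json


def parse_show_tasks_csv(path):
    """Step 1: read the SHOW TASKS export and keep the fields we care about."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    tasks = []
    for row in rows:
        # "predecessors" is typically a JSON-style list of task names;
        # root tasks have no predecessors, so fall back to an empty list.
        predecessors = json.loads(row.get("predecessors") or "[]")
        tasks.append(
            {
                "name": row["name"],
                "sql": row["definition"],
                "depends_on": predecessors,
            }
        )
    return tasks


def build_pipeline(tasks):
    """Step 2: turn the flat task list into a JSON-serializable pipeline description."""
    return {
        "tasks": [
            {
                "name": t["name"],
                "sql_file": f"{t['name']}.sql",
                "depends_on": t["depends_on"],
            }
            for t in tasks
        ]
    }


if __name__ == "__main__":
    pipeline = build_pipeline(parse_show_tasks_csv("show_tasks.csv"))
    with open("task_pipelines.json", "w") as f:
        json.dump(pipeline, f, indent=2)
    # Step 3 (converting this JSON into Databricks YAML) is what the
    # converter script described below takes care of.
```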
I asked my favorite AI coding assistant to help me get started by generating the initial scripts I needed. This went surprisingly well; even the first output was close to what I wanted. After a few adjustments, the code quality still wasn’t great, but it was sufficient for our proof of concept.
Because we liked the results and felt this could be useful for other teams, we decided to share the solution. Below, I’ll describe how to use it.
Automating Migration of Snowflake Tasks to Databricks Jobs
First, download the CSV output of the SHOW TASKS command in Snowflake. You might need to apply additional filters to include only the tasks belonging to your specific DAG; otherwise, tasks from other pipelines could confuse the tool.
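For example, assuming the DAG lives in a single schema and its tasks share a name prefix (both assumptions for illustration), the output can be scoped like this before exporting it as CSV:

```sql
-- Restrict the output to one DAG's tasks before downloading the CSV.
SHOW TASKS LIKE 'ORDERS_%' IN SCHEMA analytics.pipelines;
```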
From this CSV, the tool then does two things:
Moves each task’s SQL command into a separate file. These can later be uploaded by the user to the Databricks workspace; we will reference their workspace paths in the YAML file.
Generates a JSON file which describes the pipeline.
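As an illustration, for the example DAG above the JSON could look roughly like this; the field names are assumptions for this post, not necessarily the tool’s exact schema:

```json
{
  "tasks": [
    {
      "name": "load_raw_orders",
      "sql_file": "load_raw_orders.sql",
      "depends_on": []
    },
    {
      "name": "build_daily_orders",
      "sql_file": "build_daily_orders.sql",
      "depends_on": ["load_raw_orders"]
    }
  ]
}
```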
After this step, we need to convert the JSON into the YAML format Databricks expects:
python databricks_jobs_converter.py task_pipelines.json user@email.com -o databricks_workflow.yml
Notice the "user@email.com" parameter: the converter uses it to build the proper paths to the SQL files in the Databricks workspace, since files in a user’s workspace live under a directory named after their email address.
This command generates a YAML file formatted like the example below:
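For the hypothetical two-task DAG above, the structure follows the Databricks job settings schema with SQL file tasks; treat the exact keys and values (job name, paths, warehouse ID) as placeholders rather than the converter’s literal output:

```yaml
name: orders_pipeline
tasks:
  - task_key: load_raw_orders
    sql_task:
      file:
        path: /Workspace/Users/user@email.com/load_raw_orders.sql
        source: WORKSPACE
      warehouse_id: <your_sql_warehouse_id>
  - task_key: build_daily_orders
    depends_on:
      - task_key: load_raw_orders
    sql_task:
      file:
        path: /Workspace/Users/user@email.com/build_daily_orders.sql
        source: WORKSPACE
      warehouse_id: <your_sql_warehouse_id>
```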
Now most of the magic is done, so it's time to open up Databricks.
First, we need to upload all SQL files to the user workspace. In Databricks, navigate to your user’s directory (typically Workspace/Users/<your_email>). Click the three-dot menu in the top-right corner and select Import. Then, upload all your generated SQL files.
Finally, let's create a Databricks job using the generated YAML file.
Go to Jobs & Pipelines
Click on: Create Job
Click the three-dot menu next to 'Run now' and select 'Edit as YAML'
Overwrite the current content with your YAML file's content
Click on 'Save'
Go to the 'Tasks' tab
By this point, you should see the pipeline with all dependencies in the job preview.
Conclusion
This tool isn’t magic, but it saves significant time and reduces manual effort: what used to be a painful, manual process is now a clean, repeatable workflow. It efficiently handles the orchestration logic and acts as a reliable component within our larger toolkit, while the heavy lifting of complex code conversion is done by our Smart Migration Assistant, which leverages advanced AI to ensure accuracy and speed.
If you want a migration approach that combines the precision of deterministic tools with the power of AI, check out our Databricks migration services.