Streamline your CI pipelines with Dagger. Write portable pipelines in your favorite language and execute on any CI/CD environment.
As an Analytics Engineer, your development workflow likely involves various tools for linting models, testing Looker views, diffing datasets, and so on. However, managing all these tools can be time-consuming and challenging.
This post explores how easy it is to build better CI pipelines with Dagger and to create custom pipelines that fit together like Lego pieces.
A typical Analytics Engineer's development workflow consists of many tools.
We use sqlfluff to lint the models, git hooks to raise issues automatically, Spectacles for testing Looker, and the Datafold CLI to compare datasets. This list varies based on the different business needs/environments.
Often you want to execute these processes:
- locally before a commit/PR
- automatically as part of some CI automation
Tools tend to use different interfaces for invocation. You can use some of them as git hooks and others through fancy CLIs, but setting up and managing all of this can be time-consuming and challenging for less tech-savvy Analytics Engineers.
Previously, I used to wrap most of these as GitHub Actions. If you're lucky, you can find GitHub Actions provided by the vendor or the community that you can plug and play, but in many cases, you need to create your own workflow. This setup works well, but not all CI tools have the same thriving community as GitHub.
Once the action was properly set up, I used a tool called act to execute the CI pipelines locally.
Although I liked this process because it enabled us to evaluate changes faster, it was still painful to reproduce these actions on a different CI runner (e.g., when moving from GitHub Actions to GitLab CI).
In the following paragraphs, I will show you how easy it is to solve these problems using Dagger.
So, what on earth is Dagger, and why should you care?
Dagger is an open-source programmable CI/CD engine created by Solomon Hykes, the founder of Docker. It makes it easy to develop portable CI pipelines in your favorite programming language that execute entirely on standard OCI containers...
CI/CD as Code
Unlike conventional CI/CD tools, Dagger lets you write a pipeline as code instead of proprietary YAML. You can write your CI/CD code in CUE, Go, Python, or Node.js. This approach makes it easy to create dynamic pipelines and enables you to test your CI/CD just like any other project element.
You can test and debug instantly on your local machine. There is no need to push your changes to trigger the CI pipeline; you can run it anytime in your own isolated environment. This gives you instant feedback on the impact of your changes.
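As a rough illustration of what "pipeline as code" makes possible, the sketch below builds a set of lint commands from plain Python data and checks the result with an ordinary assertion. The function name `build_lint_steps`, the directory names, and the choice of sqlfmt as the linter are assumptions made for this example; nothing in it is Dagger-specific yet.

```python
from typing import Dict, List


def build_lint_steps(model_dirs: List[str], check_only: bool = True) -> Dict[str, List[str]]:
    """Return one sqlfmt command per model directory (hypothetical helper)."""
    steps: Dict[str, List[str]] = {}
    for directory in model_dirs:
        command = ["sqlfmt"]
        if check_only:
            # --check reports formatting problems without rewriting files.
            command.append("--check")
        command.append(directory)
        steps[directory] = command
    return steps


# The pipeline definition is ordinary data, so it can be tested like any other code.
steps = build_lint_steps(["models/staging", "models/marts"])
assert steps["models/marts"] == ["sqlfmt", "--check", "models/marts"]
```

Because the steps are generated by a function, adding a new model directory or switching from check mode to formatting mode is a one-argument change rather than an edit to a YAML file.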
Dagger executes your pipelines entirely as standard OCI containers. Therefore, it is compatible with most CI/CD runtime environments.
This approach has the benefit of enabling you to execute the same CI process everywhere.
Caching across pipeline runs is one of Dagger's most potent but often overlooked features.
You can designate one or more directories as cache volumes in your pipeline, and their contents persist across runs. Each run can then reuse what earlier runs left in the cache, which speeds up pipeline operations.
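A minimal sketch of a cache volume, assuming the Dagger Python SDK and its `dagger.Connection` client API (newer SDK releases may prefer a different entry point). The volume key, the image tag, and the function name `install_with_cache` are assumptions made for this example.

```python
import sys

# Assumed names for this sketch; pick whatever key and image suit your project.
PIP_CACHE_KEY = "pip-cache"
PIP_CACHE_PATH = "/root/.cache/pip"  # pip's default cache location for root on Linux
IMAGE = "python:3.11-slim"


async def install_with_cache(package: str) -> str:
    # Imported here so the module can be loaded even where the SDK is not installed.
    import dagger

    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        # A cache volume is identified by its key and persists between runs.
        pip_cache = client.cache_volume(PIP_CACHE_KEY)
        container = (
            client.container()
            .from_(IMAGE)
            # Mount the volume where pip keeps downloaded packages.
            .with_mounted_cache(PIP_CACHE_PATH, pip_cache)
            .with_exec(["pip", "install", package])
        )
        # Awaiting the output is what triggers execution of the steps above.
        return await container.stdout()
```

With the SDK installed and a container runtime available, `anyio.run(install_with_cache, "shandy-sqlfmt")` would run the step; on a second run pip should find its downloads already present in the mounted volume.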
Demo - How to streamline your CI pipelines with Dagger
To demonstrate how easy it is to set up a CI process, let's write our first pipeline with Dagger.
Our dummy pipeline will:
- initialize the Dagger client
- mount the dummy project's directory into the container
- install some Python dependencies
- run the linter by executing the sqlfmt command
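The four steps above can be sketched with the Dagger Python SDK roughly as follows. This is an illustration under stated assumptions, not the original demo code: the image tag, the project path `./dummy_project`, the `shandy-sqlfmt` package with its `jinjafmt` extra, and the function name `lint` are all choices made for this example, and the client is opened through the SDK's `dagger.Connection` API.

```python
import sys

# Assumptions for this sketch: image tag, project path, and package name.
IMAGE = "python:3.11-slim"
PROJECT_DIR = "./dummy_project"
INSTALL_CMD = ["pip", "install", "shandy-sqlfmt[jinjafmt]"]
LINT_CMD = ["sqlfmt", "--check", "."]


async def lint() -> str:
    # Imported here so the module can be loaded even where the SDK is not installed.
    import dagger

    # 1. Initialize the Dagger client.
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        # 2. Mount the project directory from the host into the container.
        src = client.host().directory(PROJECT_DIR)
        container = (
            client.container()
            .from_(IMAGE)
            .with_mounted_directory("/src", src)
            .with_workdir("/src")
            # 3. Install the Python dependencies.
            .with_exec(INSTALL_CMD)
            # 4. Run the linter in check mode.
            .with_exec(LINT_CMD)
        )
        # Awaiting the output triggers execution; a failing command raises an exception.
        return await container.stdout()
```

With the SDK installed (`pip install dagger-io`) and a container runtime available, calling `anyio.run(lint)` from a script starts the pipeline.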
Simple as that, we were able to create a universal CI pipeline that can be executed on different CI runners.
You can execute the example locally, or wrap the same pipeline in a GitHub Actions workflow.
This demonstrates how Dagger makes it easy to create custom pipelines that fit together like Lego pieces. Now, my CI setup is primarily built around Dagger for the build, testing, and publish pipelines. With this approach, I can run my CI anywhere I can run Docker containers.
In conclusion, Dagger is a powerful tool that enables Analytics Engineers to create portable CI pipelines in their favorite programming language and execute them in any Docker-compatible runtime environment.