Provisioning Snowflake Infrastructure with Terraform
In this post, I want to show you how to provision Snowflake components with Terraform in your dbt projects.
Terraform is an open-source Infrastructure as Code (IaC) tool used to manage cloud services with a declarative, version-controlled configuration format. It is cloud-agnostic, so it works on AWS, GCP, Azure, and many other cloud providers. It lets you write maintainable, reusable, modular pieces of code, so it fits really well with the mindset that makes so many of us love using dbt.
In the upcoming paragraphs I will focus on the Snowflake provider.
First, you need to install the Terraform CLI. To do this, follow the instructions here: install-cli
Note that the process varies by OS; on macOS, for example, I had to run these commands:
$ brew tap hashicorp/tap
$ brew install hashicorp/tap/terraform
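You can verify that the installation succeeded by checking the version:
$ terraform -version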
After a successful installation, we need to initialize the Terraform project.
To do this, create a new directory, then create the main.tf file with the following content:
main.tf
terraform {
  required_providers {
    snowflake = {
      source  = "chanzuckerberg/snowflake"
      version = "0.25.28"
    }
  }
}
From there run the following command:
$ terraform init
After initialization, the project will typically consist of the following files (exact contents may vary by Terraform version):
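.
├── .terraform/           # provider plugins downloaded by terraform init
├── .terraform.lock.hcl   # dependency lock file created by terraform init
└── main.tf               # our configuration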
Without Terraform, Snowflake dependencies need to be created manually. For example, in a new project, one thing that you definitely need in a Snowflake tenant is at least one warehouse. In pure SQL, it looks like this:
CREATE WAREHOUSE TRANSFORMING WITH
WAREHOUSE_SIZE = 'XSMALL'
WAREHOUSE_TYPE = 'STANDARD'
AUTO_SUSPEND = 600
AUTO_RESUME = TRUE;
There is nothing fundamentally wrong with this approach, but there is no clear way to manage this kind of code in your dbt project. You can save it to a new repo or to one of the corners of your project, but there is no trivial way to manage these external dependencies or to make them parameterizable and reusable. Left unmanaged, these external resources can become technical debt.
The first step is to set up the required providers. This is necessary because the technical user on whose behalf we run the commands needs different roles for different operations. (This user must hold at least the SYSADMIN and SECURITYADMIN roles within Snowflake.)
Add the following to the contents of main.tf:
provider "snowflake" {
username = <snowflake-username>
password = <snowflake-password>
account = <snowflake-account-name>
region = <snowflake-region-name>
alias = "sys_admin"
role = "SYSADMIN"
}
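For the operations that need SECURITYADMIN (creating roles and grants, for example), you would add a second aliased provider block in the same way. A minimal sketch, assuming the same credentials and the same placeholders as above:
provider "snowflake" {
  username = <snowflake-username>
  password = <snowflake-password>
  account  = <snowflake-account-name>
  region   = <snowflake-region-name>
  alias    = "security_admin"
  role     = "SECURITYADMIN"
}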
(Of course, sensitive information will not be stored in main.tf later.)
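One common way to achieve that, assuming you stick with this provider, is to drop the credential arguments from the provider blocks and supply them through the environment variables the provider reads, such as SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT, and SNOWFLAKE_REGION:
$ export SNOWFLAKE_USER='<snowflake-username>'
$ export SNOWFLAKE_PASSWORD='<snowflake-password>'
$ export SNOWFLAKE_ACCOUNT='<snowflake-account-name>'
$ export SNOWFLAKE_REGION='<snowflake-region-name>'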
Creating a Snowflake warehouse in Terraform is as simple as this:
resource "snowflake_warehouse" "w" {
  name           = "test"
  comment        = "foo"
  warehouse_size = "small"
}
To actually create the warehouse, run the standard plan-and-apply workflow:
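$ terraform plan    # preview the changes Terraform would make
$ terraform apply   # create the warehouse in Snowflake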
We successfully provisioned a Snowflake warehouse from Terraform.
Now we are going to make the process more sophisticated by extracting each value into a variable, which helps if you need more than one warehouse (and you probably do).
With the following DRY approach, it's easy to automate the creation of warehouses in a configurable and reusable way.
You can simply add something like this to your variables.tf:
variables.tf
variable "warehouses" {
type = map(object({
name = string,
size = string
}))
default = {
"test01" = {
name = "TEST_WAREHOUSE",
size = "small"
}
"test02" = {
name = "TEST_WAREHOUSE_2",
size = "small"
}
}
main.tf
resource "snowflake_warehouse" "warehouses" {
for_each = var.warehouses
provider = snowflake.sys_admin
name = upper(each.value.name)
warehouse_size = each.value.size
auto_suspend = 60
initially_suspended = true
}
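If you need different warehouses per environment, one option (a sketch; the names and sizes below are just examples) is to override the default in a terraform.tfvars file, which Terraform loads automatically:
terraform.tfvars
warehouses = {
  "prod01" = {
    name = "TRANSFORMING",
    size = "xsmall"
  }
}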
So, as described above, with Terraform we can dynamically generate the necessary resources to provision Snowflake components. I think this is a good way to demonstrate the power of Terraform.
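As a quick sanity check, you can also add an output block (my own addition, not part of the original setup) that prints the names of the managed warehouses after terraform apply:
output "warehouse_names" {
  value = [for w in snowflake_warehouse.warehouses : w.name]
}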
At a high level, our final bootstrapping project follows this same pattern and creates all the necessary Snowflake objects for a new dbt project.
You can find the source code here: link
In this post, I tried to show you how to provision Snowflake infrastructure with Terraform.
Development of the dbt Cloud Terraform provider started three months ago (Oct 3, 2021), which will allow us to put dbt Cloud resources under version control in the future as well.
Author: Zsombor Földesi - Data Engineer
You can find our other blog posts here.