cloud cost optimization

|NAM NGUYEN SON|

The Zen of Avoiding the Cloud Cost Fallacy

Sunk cost in the data space.

Let’s be honest, 2022 was a good year for data and a lot of companies moved closer to working with data (kinda induced by the pandemic). Also, you probably saw a lot more job openings on LinkedIn to kickstart an analytics engineering team. There is still a lot to explore, but more and more companies start to realize how valuable data is in the first place and how dbt shortens the learning curve.

This also means that they probably got slapped with a big chunk of the budget as an investment into digitalization. What do you do when you have money? You SPEND, right?!

Wake up Huel, we need to plan next year’s budget!

Bad signs

Improvements in data analytics in the last decade spoilt us in a way. Extremely cheap storage, decoupled from compute, gave the notion of “scalability” a different meaning and different boundaries. On-demand pricing, instead of the hefty license fees of the Oracle days, gave us the flexibility to pay only for what we use. We can always spin up more resources and increase our capacity to fit our scale. However, economies of scale are different here: we can’t keep up with costs that grow exponentially as a function of data growth.
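To make that point concrete, here is a back-of-the-envelope sketch. The $5/TB scan price and the workload numbers are placeholders for illustration, not vendor quotes; the point is only how the bill compounds when data volume and query traffic grow together.

```python
# Illustrative sketch (placeholder pricing, not vendor-verified): how a
# monthly bill can outpace data growth when queries scan tables on demand.

SCAN_PRICE_PER_TB = 5.0  # assumed on-demand price per TB scanned (placeholder)

def monthly_bill(table_tb: float, queries_per_day: float, days: int = 30) -> float:
    """Cost if every query scans the whole table (no pruning, no caching)."""
    return table_tb * SCAN_PRICE_PER_TB * queries_per_day * days

# When data AND usage grow together, costs compound multiplicatively:
year1 = monthly_bill(table_tb=1, queries_per_day=100)  # 1 TB, 100 queries/day
year2 = monthly_bill(table_tb=4, queries_per_day=400)  # 4x data, 4x usage
print(year1, year2, year2 / year1)  # the bill grew 16x, not 4x
```

Partition pruning and caching exist precisely to break this multiplication, which is why the tips below lean on them so heavily.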

“Sit back, enjoy the ride! We will pay the bills at the end of the month.” — Our inner devil

As the data grows along with the business requirements, you start to see bills getting into a region where they need to be seriously addressed. Boards are pulled together, and Bill asks:

“Hmm why is there a big spike in Snowflake costs, Joe?” — Bill

The team starts to nervously dig into parts of the code where costs could be cut, but oftentimes it seems like they are navigating uncharted waters.

Data cloud costs if we don’t monitor them…

*does not use a warehouse for information schema queries (row count, size, table type, etc.) or for cached queries.

Jim, come get your free tool!

One tool to rule them all, one tool to bring them all — Me

“THE” data tool

Good Signs

So what else happened in 2022? The global economy got hit seriously, and many tech companies that had overstaffed and overestimated their productivity started mass lay-offs. No, this is not a “good sign” by any means, but it really brought us back to reality and made us appreciate the cost-limiting features of cloud data warehouses.

What can you do? 

If you abide by these laws, you can scale better as a team and get a better ROI to “recover” the sunk costs.

By the way, our CTO, Andras, has already written a handy article about cutting costs in Snowflake.

  1. Appreciate all the great things vendors made available to us. Use partitioning and clustering to facilitate partition pruning. Use well what makes this space modern (e.g. ZCC, table clones, external tables, incremental loading, etc.)
  2. Don’t overdo partitioning and clustering! Partitioning into very small or very large blocks doesn’t make sense. Clustering is costly, but it works well if you frequently hit the key in the BI tool.
  3. Combine well. Partitioning and clustering work well together, scan less data per query, and pruning is determined before the query starts; embrace them!
  4. Don’t overuse dbt tests. They are amazing, but test only what’s needed and test it once (don’t repeat tests downstream if the logic doesn’t change). Never lose track of tests with warning severity; otherwise, they are redundant.
  5. Understand how your warehouse costs are generated, starting with the underlying architecture and the query optimizer. The pricing models of Snowflake and BigQuery are different, although both offer flat-rate options: the former bills on warehouse activity, while the latter bills on the data processed.
  6. BigQuery does a good job balancing performance and price. Storage is inexpensive, and compute feels more flexible. BigQuery excels when query traffic is low; Snowflake doesn’t employ the pay-per-query paradigm, and turning virtual warehouses on and off isn’t quite the same thing.
  7. Choose the right warehouse, warehouse suspend strategy, and table type for your use case.
  8. Keep auto-suspend at 60s; it’s likely you don’t need more. Use a set of warehouse configurations for differing workloads.
  9. Ask the right questions. Know the expectations and tolerance of your business stakeholders. Does it bother them if a job runs a bit longer on a smaller Snowflake warehouse to save $X? How frequently do you need certain data refreshed? Does it make sense to build X times per day? Sometimes less is more.
  10. Evaluate alternatives. There is fierce competition in this space, and you might find something more suitable for your use case and save a fortune. Alternatives might need more maintenance (= time = money), but most of these products are now built with ‘plug and play’ in mind.
  11. Tread carefully. Today, we live in an attention economy, where product usage is likely driven by FOMO and companies are pouring money into their marketing. Trying out new products is always fun, but never lose sight of the gains and losses.
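Tips 7–9 can be put into numbers. The sketch below assumes Snowflake’s documented billing pattern of credits per hour doubling with each warehouse size, and per-second billing with a 60-second minimum per resume; the credit price is a made-up placeholder, so check your own contract before drawing conclusions.

```python
# Rough Snowflake cost sketch. Assumptions: credits/hour double with each
# warehouse size (XS = 1 credit/hour), per-second billing with a 60-second
# minimum per resume; the credit price below is a placeholder.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
CREDIT_PRICE_USD = 3.0  # placeholder rate, not a quote

def run_cost(size: str, runtime_s: float) -> float:
    """Cost of one warehouse resume running for runtime_s seconds."""
    billed_s = max(runtime_s, 60)  # 60-second minimum charge per resume
    return CREDITS_PER_HOUR[size] * (billed_s / 3600) * CREDIT_PRICE_USD

# Tip 9 in numbers: a Medium warehouse finishing in 10 min vs a Large
# finishing in 6 min. The bigger box is faster but costs more, because
# the speed-up (1.67x) is smaller than the price-up (2x).
cost_m = run_cost("M", 10 * 60)  # 4 credits/h for 600 s
cost_l = run_cost("L", 6 * 60)   # 8 credits/h for 360 s
print(round(cost_m, 2), round(cost_l, 2))  # 2.0 vs 2.4
```

The 60-second minimum is also why a short auto-suspend (tip 8) matters: a warehouse that wakes up for a two-second query is still billed for a full minute, so chatty workloads belong on a shared, already-warm warehouse.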

Do we have to care more about cloud cost optimization ourselves and do our own research, or is it fair to expect better query optimization and more advanced tools to identify our bottlenecks?

Either way, my prediction for 2023 is that cloud costs are going to be an even more sensitive topic when these vendors go head-to-head, and we are going to see more companies focusing on pushing down the monthly bill.

Shameless plug 🔌

If cost optimization is a concern for your company, don’t miss our next Snowflake webinar!

On the 8th of February 2023, Andras Zimmer, Head of Analytics Engineering at Hiflylabs is going to walk you through multiple real-life applicable tips & tricks that can help your Snowflake costs plummet.

Book your seat here!

Finally, a list of great resources:

