Doing More with Less: The Modern Data Stack for Retail and E-Commerce
How the cloud data warehouse, dbt, and Census help retail and e-commerce brands manage the data deluge on constrained budgets, illustrated by Hiflylabs' work with an NYC-based e-commerce company.
The retail and e-commerce industry is currently experiencing a data deluge during a resourcing drought. An enormous influx of customer, inventory, and other data types is pressuring businesses, primarily due to the industry's rapid digitization. This comes as no surprise, as “non-store and online sales are expected to grow between 10% and 12% year over year to a range of $1.41 trillion to $1.43 trillion,” according to NRF’s 2023 forecast.
With today's technologies, companies can gain more insight about their customers and operations with every click, purchase, return, subscription and review. On the inventory front, teams are dealing with complex data related to stock keeping unit (SKU) expansion, multiple sales channels, and supply chains. Sorting through this can be tedious and result in inventory mismanagement, stock-outs, or overstocks. Furthermore, businesses are combing through other types of data, such as customer, financial, and marketing data, each carrying critical decision-making and customer experience potential.
While this data may be advantageous to some brands, many growing companies face the challenge of standing up proper in-house pipelines and infrastructure in a scalable, real-time, cost-effective manner. Not only do engineering and data teams need to make a significant upfront investment to build these systems; the maintenance burden can be even more taxing over a longer time horizon. In a recent survey conducted by Ably of over 500 engineering leaders, “70% of real-time experience infrastructure projects lasted more than three months and over 90% of projects had 4+ engineers working on them.”
On top of the pressures of rapid digitization and data maneuverability, slowing market conditions and shrinking budgets have increased pressure on these brands to maintain operational efficiency. This trend often means reducing staff, limiting technology procurement, and curtailing data investments. This environment can lead to delayed or poor decision-making, impacting inventory management, customer relations, and overall sales.
For instance, inadequate customer data modeling could result in missed opportunities to personalize marketing efforts, reducing customer engagement and lowering conversion rates. If a brand can't correctly identify your historical purchasing behavior across channels (in-store, app subscriptions, etc.), how can it properly communicate offers, new product releases, or other updates to you to improve brand loyalty?
The Modern Data Stack Solution
This macro environment, where data innovation and opportunity meet resourcing constraints, is unfortunately what keeps many business leaders up at night. “How can we do more with less?” is an all too familiar mantra for teams looking to meet lofty sales targets and goals. Luckily, a wave of technical innovation and operational efficiency is building around a disruptive piece of technology: the cloud data warehouse.
The data warehouse is the ideal instrument for storing copious amounts of data, effortlessly running queries, and establishing connectivity across your company. In all likelihood, your organization already utilizes one, thereby reducing the need to procure Customer Relationship Management (CRM) tools, Customer Data Platforms (CDPs), Data Management Platforms (DMPs), or other such tools that bloat your technical infrastructure. While the concept and components of the Modern Data Stack are still evolving and being debated, the utility of a composable architecture around a centralized hub is clear.
Positioning the cloud data warehouse as the central element of your Modern Data Stack yields considerable advantages: a single source of truth across teams, elastic scalability and cost control, and fewer single-purpose tools bloating your infrastructure.
However, if you want to truly drive business outcomes from your warehouse, there are three critical layers of a Composable CDP to consider: data collection, data storage and modeling, and data activation (also known as Reverse ETL).
As mentioned earlier, the data deluge is just one of the hurdles facing retail teams. While the data warehouse and surrounding MDS components allow organizations to manage data complexity, there is still a human element involved in this change management. An NYC-based e-commerce company, established in 2018, faced such challenges and decided to look externally for help in its evolution around the MDS.
The company, despite lacking an internal data team, aspired to be data-driven and enlisted Hiflylabs to assist in its data management and activation initiatives. Challenges ranged from inherited model errors and low-quality code from previous partners to a lack of documentation for important metrics such as customer acquisition cost (CAC), annual recurring revenue (ARR), and lifetime value (LTV). The company also faced issues with data synchronization between key commerce platforms such as WooCommerce and Recharge.
In response, Hiflylabs utilized the MDS to standardize development and deployment processes. They refined hundreds of models, conducted tests, and implemented macros to enhance data operations, redefining crucial metrics like customer cancellations, cohort analysis, churn, and retention calculations. This collaboration resulted in a modernized CI/CD pipeline, streamlined data management, and easier dashboard creation and reporting for non-technical stakeholders. The successful partnership has paved the way for a stable data ecosystem for their teams to operate within, along with a commitment from Hiflylabs to ongoing support on the company's data journey.
While this architecture and operational overhaul was a priority for this particular brand, a crawl-first approach might appeal more to your company's situation. Hiflylabs' expertise in dbt and Census could be a great place to start. Let’s look at how Census streamlines dbt modeling through its recently released packages and tackles some common identity modeling use cases.
Modern problems require modern packages
If the basic concepts of dbt (data build tool) didn’t catch your attention the first time, dbt macros surely will. Macros let you add reusable, Python-like functions to your SQL transformations. You can write your own macros in Jinja-formatted SQL files, or, for common problems, pick ready-made ones off the shelf from one of the many published dbt packages.
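To make that concrete, here is a minimal sketch of a hand-rolled macro; the macro, file, and column names are all hypothetical:

```sql
-- macros/normalize_email.sql: a hypothetical custom macro that
-- trims whitespace and lowercases an email column
{% macro normalize_email(column_name) %}
    lower(trim({{ column_name }}))
{% endmacro %}
```

Once defined, the macro can be called like a function from any model, e.g. `{{ normalize_email('email') }}` inside a select statement.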
How do dbt packages work? You simply add the selected package and version to the packages.yml file in your dbt project folder, run dbt deps, and all of that provider's macros become available in your project. Not all packages are equally valuable; at Hiflylabs, a handful of “must-have” packages see everyday use in development.
A great example is the dbt_utils package, which census_utils builds upon.
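Installing both takes only a few lines in packages.yml. A minimal sketch is below; the version pins are illustrative, and the hub coordinates for census_utils are an assumption, so check the package documentation for the current ones:

```yaml
# packages.yml (version pins are illustrative; check dbt Hub for current releases)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  # the namespace and version for census_utils are assumptions; see the package docs
  - package: census/census_utils
    version: [">=0.1.0"]
```

After editing the file, run dbt deps and both packages' macros become callable in your models.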
The census_utils dbt package is a practical solution for some customer identity challenges
The census_utils package includes six simple but super-handy macros.
With the is_internal() and is_personal_email() macros, you can solve some fundamental classification problems by distinguishing internal and personal email domains.
Meanwhile, the package can also extract commonly needed fields with the extract_email_domain() and get_country_code() macros.
The cherry on top is the data transformation capability of the clean() and parse_ga4_client_id() macros, which parse your data into a neat format that you can load directly into the Facebook or Google Ads APIs.
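To show how these fit into everyday modeling, here is a minimal sketch of a staging model that calls a few of the macros above; the model and column names are hypothetical, and the exact macro arguments may differ from the package documentation:

```sql
-- models/stg_users_enriched.sql: hypothetical model using census_utils macros
-- (column names and macro arguments are assumptions; consult the package docs)
select
    user_id,
    email,
    {{ census_utils.extract_email_domain('email') }} as email_domain,
    {{ census_utils.is_personal_email('email') }} as is_personal_email,
    {{ census_utils.clean('full_name') }} as full_name_clean
from {{ ref('stg_users') }}
```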
Closing Thoughts
In the grand theater of digital transformation, the Modern Data Stack has become the star performer. It's a riveting act, juggling scalability and real-time processing, transforming mundane data into insightful narratives that illuminate the path to customer engagement, operational efficiency, and decision accuracy.
Sometimes it takes a skilled choreographer like Hiflylabs to guide the dance, ensuring every step is in sync with the rhythm of business goals. So, as the retail sector takes the stage in the age of information, it's clear that mastering the dance with data determines who gets the standing ovation.
Interested in how Hiflylabs and the Modern Data Stack can benefit your business? Want more information about how Census's dbt packages can address your data modeling concerns? Contact Hiflylabs for a free consultation or book a demo with a Census product specialist!