The Power of the Modern Data Stack in the Retail and E-commerce Industry

Facing a data deluge in your retail business? Adopt the Modern Data Stack for effective data management. See Hiflylabs' solutions in action!

August 03, 2023

Retailers’ Challenges - More Data, Less Humans

The retail and e-commerce industry is currently experiencing a data deluge during a resourcing drought. An enormous influx of customer, inventory, and other data types are pressuring businesses, primarily due to the industry's rapid digitization. This comes as no surprise as “non-store and online sales are expected to grow between 10% and 12% year over year to a range of $1.41 trillion to $1.43 trillion,” according to NRF’s 2023 forecast.

With today's technologies, companies can gain more insight about their customers and operations with every click, purchase, return, subscription and review. On the inventory front, teams are dealing with complex data related to stock keeping unit (SKU) expansion, multiple sales channels, and supply chains. Sorting through this can be tedious and result in inventory mismanagement, stock-outs, or overstocks. Furthermore, businesses are combing through other types of data, such as customer, financial, and marketing data, each carrying critical decision-making and customer experience potential.

While this data may be advantageous to some brands, many growing companies face the challenges of standing up proper in-house pipelines and infrastructure in a scalable, real-time, cost-effective manner. There is not only significant investment needed from engineering and data teams to build, but the maintenance burden can be even more taxing over a longer time horizon. In a recent survey conducted by Ably of over 500 engineering leaders, “70% of real-time experience infrastructure projects lasted more than three months and over 90% of projects had 4+ engineers working on them.”

On top of the pressures of rapid digitization and data maneuverability, slowing market conditions and shrinking budgets have increased pressure on these brands to maintain operational efficiency. This trend often means reducing staff, limiting technology procurement, and curtailing data investments. This environment can lead to delayed or poor decision-making, impacting inventory management, customer relations, and overall sales.

For instance, inadequate customer data modeling could result in missed opportunities to personalize marketing efforts, reducing customer engagement and lowering conversion rates. If a brand can't correctly identify your historical purchasing behavior across channels (in-store, app subscriptions, etc.), how can it properly communicate to you offers, new product releases, or other updates to improve brand loyalty?

The Modern Data Stack Solution

This macro environment where data innovation and opportunity meets resourcing constraints is unfortunately what keeps many business leaders up at night. “How can we do more with less?” is an all too familiar mantra for teams looking to meet lofty sales targets and goals. Luckily, a wave of technical innovation and operational efficiency is occurring around a disruptive piece of technology - the cloud data warehouse.

The ideal instrument for storing copious amounts of data, effortlessly conducting queries, and establishing connectivity across your company is the data warehouse. In all likelihood, your organization already utilizes one, thereby reducing the need to procure Customer Relationship Management tools (CRM), Customer Data Platforms (CDP), Data Management Platforms (DMP), or other such tools that bloat your technical infrastructure. While the concept and components of the Modern Data Stack are evolving and being debated, the utility of composable architecture around a centralized hub is clear.

Positioning the Cloud Data Warehouse as the central element of your Modern Data Stack yields considerable advantages such as:

Data Ownership: This arrangement bolsters your compliance with various privacy stipulations and regulations.
Quick Value Realization: Depositing historical data into a database is significantly more straightforward than integrating the data into an additional tool.
Streamlined Synchronization: Databases are universally compatible, unlike SaaS tools, which may offer limited APIs.
Reusability: Various teams within your organization can rely on this dependable information reservoir.

However, if you want to drive business outcomes from your warehouse truly, there are three critical layers of a Composable CDP for you to consider:

Data Collection Layer: Utilize Snowplow to collect real-time events occurring on your website or application. In parallel leverage an ETL platform such as Fivetran to properly load data from other external platforms into your data warehouse.
Data Transformation Layer: dbt is the preeminent data modeling tool for you to transform your data warehouse into a model applicable to your business needs.
Data Activation Layer: Census seamlessly integrates with dbt, Snowplow, and Fivetran, thus enabling the synchronization of your data with all your business tools.

By implementing complementary, composable data platforms around your source of truth (the data warehouse), you can seamlessly collect, transform and activate data in a way that makes sense for your business. Despite this architecture's ease of integration and use, not all companies will have the internal bandwidth to tackle this evolution and must look outwards for help.

How businesses are adapting to the Modern Data Stack thanks to Hiflylabs

As mentioned earlier, the data deluge is just one of the hurdles facing retail teams. While the data warehouse and surrounding MDS components allow organizations to manage data complexity, there is still a human element involved with this change management. A NYC-based e-commerce company established in 2018 faced such challenges and decided to look externally for help in its evolution around the MDS.

The company, despite lacking an internal data team, aspired to be data-driven and enlisted Hiflylabs to assist in their data management and activation initiatives. Challenges ranged from inherited model errors and low-quality code from previous partners to lack of documentation for important metrics such as customer acquisition cost (CAC), annual recurring revenue (ARR), and lifetime value (LTV). They also faced issues with data synchronization between key commerce platforms such as WooCommerce and Recharge.

Census Athena 02.png — The new architecture, after refining 350 models, conducting 250 tests, and implementing 770 macros.

In response, Hiflylabs utilized the MDS to standardize development and deployment processes. They refined hundreds of models, conducted tests, and implemented macros to enhance data operations, redefining crucial metrics like customer cancellations, cohort analysis, churn, and retention calculations. This collaboration resulted in a modernized CI/CD pipeline, streamlined data management, and facilitated dashboard creation and reporting for non-technical stakeholders. The successful partnership has paved the way for a stable data ecosystem for their teams to operate within and a commitment from Hiflylabs to provide ongoing opportunities on the company's data journey.

While this architecture and operational overhaul was a priority for this particular brand, a crawl approach might appeal more to your company's situation. Hiflylabs' expertise in dbt and Census could be a great first place to start. Let’s look at how Census streamlines dbt modeling through its recently released packages and tackles some common identity modeling use cases.

Modern problems require modern packages

If the basic concepts of dbt (data build tool) didn’t catch your attention the first time, dbt macros will surely do. Macros allow you to add Pythonic functions to your SQL transformations. You can create your own macros with Jinja formatted SQL files, or if you are facing common issues, you can easily pick one from the shelf among the several dbt packages.

How do dbt packages work? You simply just add the selected version to your requirements.txt in your dbt project folder. Run a few commands, and all the macros are available from the given provider. Not all packages are equally valuable, and for Hiflylabs, a few “must-have” packages can be used for everyday development.

A great example is the dbt_utils package, which the census_utils builds upon.

The census_utils dbt package is a practical solution for some customer identity challenges

The census_utils package includes six simple but super-handy macros as per:

With the is_internal() and is_personal_email() macros, you can solve some fundamental problems distinguishing internal and personal email domains.

Meanwhile, this package can also extract commonly used data parts with the extract_email_domain() and get_country_code() macros.

The cherry on top is the simple but super-handy data transformation capabilities with the clean() and parse_ga4_client_id() functions. These parse your data into a neat format, and you can load it directly to the Facebook or Google Ads APIs.

Closing Thoughts

In the grand theater of digital transformation, the Modern Data Stack has become the star performer. It's a riveting act, juggling with scalability and real-time processing, transforming mundane data into insightful narratives that illuminate the path to customer engagement, operational efficiency, and decision accuracy.

Sometimes it takes a skilled choreographer like Hiflylabs to guide the dance, ensuring every step is in sync with the rhythm of business goals. So, as the retail sector takes the stage in the age of information, it's clear that mastering the dance with data determines who gets the standing ovation.

Interested in how Hiflylabs and the Modern Data Stack can benefit your business? Want more information about how Census's dbt packages can address your data modeling concerns? Contact Hiflylabs for a free consultation or book a demo with a Census product specialist!

Article by

Data Warehouse

Census

Explore more stories

AI Predictions: Tracking the Prophecies
Humanity has always tried to predict the future. We’re swapping tea leaves for datasets. We explore our collective obsession with AGI timelines, the moving goalposts of superintelligence, and why we ultimately decided to build a scoreboard to keep everyone honest.
Think Big, Start Small: Digital Twinning a Coffee Machine
Digital twinning allows you to minimize downtime when maintaining large-scale machinery. But it’s not exclusive to gigantic pieces of equipment—even something as small as a coffee machine can showcase its benefits.
Building an AI Application with Databricks Apps in 30 Days
Discover how to build a production-ready AI application on Databricks Apps in just under a month. Learn from our journey, challenges, and architectural choices.

Flying high with Hifly

We want to work with you

Hiflylabs is your partner in building your future. Share your ideas and let’s work together.