Project Experiences with Provisioning Databricks Environments on Azure via Terraform
Automated Databricks provisioning on Azure with Terraform and Python scripts—transitioning from PoC to production, overcoming challenges, optimizing deployment.
When it comes to customer relationship management, every retailer faces the same three challenges: customer acquisition, expansion, and retention. Bringing new customers to the store is essential, but it is not enough to keep the business growing, which is why it is just as important to find ways to create extra value that will keep customers happy and loyal. In fact, acquisition and expansion achieve little without a robust customer retention strategy. In what follows, we outline one of our previous projects, which focused on churn prevention.
Our client owns 4,000 drugstores across Europe, of which around 200 are located in Hungary; our project focused exclusively on these stores. There are roughly 1 million customers in our client’s loyalty program, but because of GDPR, we were allowed to work with only 60% of them (roughly 600,000 customers), which is still a substantial amount of data.
Identifying customers with a higher probability of leaving a business is an essential step in building an effective customer retention strategy. Realizing this, our client asked us to build a model that detects customers in the “danger zone”: customers who are likely to churn.
Where this “danger zone” begins is very important: when we detect the signs of churn early, the costs of retention are lower, since customers who haven’t left yet require less intense intervention. Early detection therefore means lower marketing costs and, in turn, a better bottom line.
The first step of our project was to define ‘churn’, as our client had no commonly used definition. This can be a very tricky task, because we can hardly ever tell for sure that a customer has churned: it all depends on the definition we use. We needed a definition that strikes an optimal balance between false positives and false negatives: falsely flagging customers as ‘churned’ increases marketing costs unnecessarily, while failing to detect customers who truly are churning means losing them. Finding the best possible churn definition is therefore crucial, and unfortunately there is no general one, as it depends heavily on the type of retail store and on several other factors. During this phase, we worked closely with our client and performed multiple analyses to help us decide on the optimal churn definition.
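One way to compare candidate churn definitions is to replay them on historical data and count the two kinds of error. The sketch below assumes a purely inactivity-based definition (a customer is flagged once their last purchase is more than `threshold_days` old), which is only one possible choice and not necessarily the definition used in the project. The function name, the 90-day follow-up window, and the toy purchase histories are all hypothetical.

```python
from datetime import date, timedelta


def evaluate_threshold(purchases, cutoff, threshold_days, followup_days=90):
    """Replay an inactivity-based churn rule on historical purchase dates.

    A customer is flagged at `cutoff` when their latest purchase up to that
    day is more than `threshold_days` old. They count as truly churned when
    they make no purchase in the `followup_days` after the cutoff.
    """
    horizon = cutoff + timedelta(days=followup_days)
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for dates in purchases.values():
        seen = [d for d in dates if d <= cutoff]
        if not seen:
            continue  # not a customer yet at the cutoff
        flagged = (cutoff - max(seen)).days > threshold_days
        came_back = any(cutoff < d <= horizon for d in dates)
        if flagged:
            # Flagged customers who return are false positives.
            counts["fp" if came_back else "tp"] += 1
        else:
            # Unflagged customers who never return are false negatives.
            counts["tn" if came_back else "fn"] += 1
    return counts


# Hypothetical purchase histories, for illustration only.
cutoff = date(2023, 6, 30)
history = {
    "a": [date(2023, 1, 10), date(2023, 2, 5)],
    "b": [date(2023, 3, 1), date(2023, 8, 15)],
    "c": [date(2023, 6, 1), date(2023, 7, 20)],
    "d": [date(2023, 5, 20)],
}

for threshold in (30, 60, 130):
    print(threshold, evaluate_threshold(history, cutoff, threshold))
```

On this toy data, a short threshold catches every customer who leaves but also flags one who comes back, while a long threshold avoids the false alarm at the price of a missed churner. Weighing those two error counts against their respective costs is the trade-off described above.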
After a few iterations, we landed on a definition that both our client and our team found feasible, which led to the next step: finding features that contribute to the prediction of churn.
We collected about a hundred variables in total: demographic variables; features that describe customer behaviour, such as campaign affinity and private label affinity; and dynamic variables that capture potential changes in behaviour over time.
Some of these had a greater effect on the prediction than others, of course. We found that the number of days since registration was the most relevant to the target variable (customers who have been loyal for a long time are less likely to churn), but the number of coupons a customer usually redeems is also a strong factor, and age plays a significant role in churn as well.
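To make the feature types more concrete, here is a minimal sketch of how a handful of such variables could be derived for one customer. The field names, the 90-day windows, and the sample records are hypothetical and chosen for illustration; the actual project used about a hundred variables whose exact definitions are not given here.

```python
from datetime import date, timedelta


def build_features(customer, transactions, today, window_days=90):
    """Derive a few illustrative churn features for a single customer.

    customer:     dict with 'registered' (date) and 'birth_year' (int)
    transactions: list of dicts with 'day' (date), 'amount' (float),
                  'private_label' (bool) and 'coupon' (bool)
    """
    recent_start = today - timedelta(days=window_days)
    previous_start = recent_start - timedelta(days=window_days)
    recent = sum(t["amount"] for t in transactions
                 if recent_start < t["day"] <= today)
    previous = sum(t["amount"] for t in transactions
                   if previous_start < t["day"] <= recent_start)
    total = sum(t["amount"] for t in transactions)
    private = sum(t["amount"] for t in transactions if t["private_label"])
    return {
        # Static and demographic variables
        "days_since_registration": (today - customer["registered"]).days,
        "age": today.year - customer["birth_year"],  # approximate, by year
        # Behavioural variables
        "coupons_redeemed": sum(1 for t in transactions if t["coupon"]),
        "private_label_share": private / total if total else 0.0,
        # Dynamic variable: spend in the last window minus the one before
        "spend_change": recent - previous,
    }


# Hypothetical customer, for illustration only.
today = date(2023, 6, 30)
customer = {"registered": date(2020, 6, 30), "birth_year": 1985}
transactions = [
    {"day": date(2023, 6, 10), "amount": 20.0, "private_label": True, "coupon": True},
    {"day": date(2023, 5, 2), "amount": 10.0, "private_label": False, "coupon": False},
    {"day": date(2023, 2, 15), "amount": 50.0, "private_label": False, "coupon": True},
]
features = build_features(customer, transactions, today)
print(features)
```

The negative `spend_change` in this example is the kind of signal a dynamic variable is meant to pick up: the customer is still buying, but less than in the preceding period.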
With a churn definition in place and the features that may contribute to the prediction extracted, only one thing was left to do: build a model. To predict whether a customer will churn, we used logistic regression with lasso regularization, which keeps only the most significant variables in the model. Our final model assigned a churn probability to each customer with relatively high accuracy: it outperformed the very simple algorithm we used as a baseline by almost 25%.
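The variable-selection effect of the lasso penalty can be shown on a toy problem. The sketch below is not the project's training code: it is a plain proximal gradient solver written for illustration, and the penalty weight `lam`, the two features (`short_tenure` and `weak_signal`, both coded as +1 or -1), and the 16 synthetic customers are assumptions. In practice a library implementation would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, t):
    """Proximal step for the L1 penalty: shrink towards zero by t."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, epochs=500):
    """Proximal gradient descent for L1-penalised logistic regression.

    Minimises mean log-loss + lam * sum(|w_j|); the intercept b is
    not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic customers: ((short_tenure, weak_signal), churned, how many).
cells = [
    ((1, 1), 1, 4), ((1, -1), 1, 2), ((1, 1), 0, 1), ((1, -1), 0, 1),
    ((-1, 1), 1, 1), ((-1, -1), 1, 1), ((-1, 1), 0, 3), ((-1, -1), 0, 3),
]
X = [list(x) for x, label, count in cells for _ in range(count)]
y = [label for x, label, count in cells for _ in range(count)]

sparse_w, sparse_b = fit_lasso_logistic(X, y, lam=0.05)
dense_w, dense_b = fit_lasso_logistic(X, y, lam=0.0)
print("with lasso:   ", sparse_w)
print("without lasso:", dense_w)
```

With the penalty switched on, the coefficient of the weakly related feature is driven to exactly zero while the tenure feature keeps a clearly positive weight; without the penalty, both coefficients stay non-zero. Each customer's churn probability is then `sigmoid(b + w·x)` for the fitted `w` and `b`.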
Conclusion
After we handed over the list of customers in the danger zone, our client’s marketing team constructed a series of marketing campaigns, and our team was highly involved in this stage of the project as well. The campaigns were highly successful in terms of redemption rate. The real results of our project should be long-term, however, and it is too early to see them yet. If our client decides to continue with the series of marketing campaigns, the ROI generated by our project is projected to exceed 300% by the end of the year.
We should also emphasize that detecting customers who are likely to churn is only one side of the problem. It is also very important to create campaigns that keep customers engaged continuously: a few marketing actions scattered randomly throughout the year are not going to stop customers from leaving. Running campaigns constantly can seem to demand an awful lot of resources, but at the end of the day, investing in customer retention pays off.
Authors:
Eszter Dudás - Data Scientist
Márton Biró - Senior Data Scientist