Dotz Data Labs

Introduction

Summary

To innovate and stay competitive in a rapidly changing market, Dotz went through a digital transformation, with help from consultants along the way.

One of the steps toward a digital model was the implementation of a Data Lake. It was required to be serverless and cloud-native, to support decision-making and to shorten time-to-market for product launches.

Problem

Dotz is one of the largest loyalty program companies in Brazil, and it faced many issues with disconnected data, which made it difficult to analyze its users' behavior. Because Dotz receives data from numerous supermarkets and stores, products are difficult to cluster: the same product has a different name depending on the source. To help with this analysis, Dotz decided to build a Data Lake.
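To make the naming problem concrete, here is a minimal sketch of one common first step: reducing each product name to a normalized key and grouping records that share it. This is an illustration only, not the method Dotz used. The store identifiers, the product names and the helper functions `normalize_name` and `group_products` are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(raw: str) -> str:
    """Build a comparison key: no accents, lowercase, tokens sorted."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only alphanumeric tokens so punctuation and spacing are ignored.
    tokens = re.findall(r"[a-z0-9]+", unaccented.lower())
    # Sort tokens so word order does not matter.
    return " ".join(sorted(tokens))


def group_products(rows):
    """Group (source, product_name) pairs by their normalized key."""
    groups = defaultdict(list)
    for source, name in rows:
        groups[normalize_name(name)].append((source, name))
    return dict(groups)


# Hypothetical records from three different partners.
rows = [
    ("store_a", "Café Torrado 500g"),
    ("store_b", "CAFE TORRADO - 500G"),
    ("store_c", "500g cafe torrado"),
    ("store_a", "Leite Integral 1L"),
]

for key, members in group_products(rows).items():
    print(key, "<-", [name for _, name in members])
```

A key like this only catches differences in case, accents, punctuation and word order. Abbreviations, typos and brand synonyms would need fuzzy matching or a curated product catalog on top of it.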

Solution

Technical Implementation

We built and deployed a managed Big Data architecture on Google Cloud Platform (GCP) to support this strategy and provide a 360-degree view of customers (users with points, a.k.a. Dotz) and partners (the supermarkets offering the loyalty program).

The design focused on the cloud-managed services and serverless computing offered by Google, covering the core competencies of a Data Lake such as scalable storage with Google Cloud Storage and Google BigQuery. Part of the process runs inside Kubernetes, which is responsible for data cleansing and ETL flow management.

We streamed data with Apache Beam running on Google Dataflow, ran parallel mass processing with Apache Spark jobs on Google Dataproc, did exploratory analysis with Google Datalab, ran Machine Learning analysis with Google ML, and built data visualizations in Google Data Studio.

Data is transported using an event-driven model in which all data is collected as a stream, even the ETL (which runs in micro-batches to enable near real-time exploration). The data moves through the pipeline over Google Pub/Sub, a message-oriented middleware, and every message is serialized in Avro format, which reduces the payload size and makes transport cost-effective, fast and reliable.
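To illustrate why a schema-based format such as Avro yields smaller messages than JSON, the sketch below hand-encodes one record following Avro's binary encoding rules: a long is written as a zigzag varint, a string as its length followed by UTF-8 bytes, and a record as its fields concatenated in schema order with no field names. This is a simplified illustration, not the Avro library and not the pipeline's actual code. The schema and its field names (`customer_id`, `store`, `points`) are assumptions made up for the example.

```python
import json


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zigzag varint (Avro int/long)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as a long byte count followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "string": encode_string}

# Hypothetical schema: ordered (field name, type) pairs.
SCHEMA = [("customer_id", "string"), ("store", "string"), ("points", "long")]


def encode_record(schema, record) -> bytes:
    """Concatenate field values in schema order; field names are not written."""
    return b"".join(ENCODERS[ftype](record[name]) for name, ftype in schema)


event = {"customer_id": "c-1", "store": "s-9", "points": 150}
binary_payload = encode_record(SCHEMA, event)
json_payload = json.dumps(event).encode("utf-8")

print(len(binary_payload), "bytes in schema-based binary form")
print(len(json_payload), "bytes as JSON")
```

The saving comes from the reader and writer sharing the schema, so field names and type markers never travel with each message. In a real pipeline an Avro library would do the encoding, and the resulting bytes would be published as the Pub/Sub message body.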

Impact and results

All of this gave Dotz a better structure for its analytical platform. The platform, previously managed inside a large MS SQL Server instance, moved to a Data Lake with layers that allow data categorization, governance, quality and security.

The Data Lake supports analytical processes on user data, faster exploration, and monetization of Dotz's knowledge of customers' behavior.

Matheus Cunha
Systems Engineer and Magician

Just a technology lover empowering business with high-tech computing to help innovation (:
