[{"data":1,"prerenderedAt":102},["ShallowReactive",2],{"\u002Fen\u002Fprojects\u002F2017\u002Fdotz-data-labs":3},{"id":4,"title":5,"body":6,"createdAt":79,"description":80,"extension":81,"meta":82,"navigation":92,"path":93,"seo":94,"slug":95,"stem":96,"tags":97,"website":100,"__hash__":101},"projects\u002Fprojects\u002F2017\u002Fdotz-data-labs.md","Dotz Data Labs",{"type":7,"value":8,"toc":71},"minimark",[9,14,19,23,34,38,41,45,49,52,55,58,61,65,68],[10,11,13],"h1",{"id":12},"introduction","Introduction",[15,16,18],"h2",{"id":17},"summary","Summary",[20,21,22],"p",{},"To keep innovating and stay competitive in a fast-changing\nmarket, Dotz went through a digital\ntransformation with the help of several consultants along the way.",[20,24,25,26,33],{},"Among the steps toward a digital model was the implementation of\na Data Lake, which was required to be\n",[27,28,32],"a",{"href":29,"rel":30},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FServerless_computing",[31],"nofollow","serverless"," and\ncloud-native, to support decision-making and shorten\ntime-to-market for product launches.",[15,35,37],{"id":36},"problem","Problem",[20,39,40],{},"Dotz is one of the largest loyalty program companies in\nBrazil, and it faced many issues with disconnected data,\nwhich made it difficult to analyze its users' behavior. Because it\nreceives data from numerous supermarkets and stores, products are difficult to\ncluster: the same product is named differently depending on the\nsource. To help with this analysis, Dotz decided to build a Data Lake.",[10,42,44],{"id":43},"solution","Solution",[15,46,48],{"id":47},"technical-implementation","Technical Implementation",[20,50,51],{},"We built and deployed a managed Big Data architecture on Google\nCloud Platform (GCP) to support this strategy and allow a 360-degree\nview of customers (users with points a.k.a. 
Dotz) and partners (the\nsupermarkets offering the loyalty program).",[20,53,54],{},"The design focused on the cloud-managed services and serverless\ncomputing offered by Google to cover the core capabilities of a Data\nLake, such as scalable storage with Google Cloud Storage and Google\nBigQuery. Part of the process runs inside Kubernetes, which is\nresponsible for managing the data cleansing ETL flow.",[20,56,57],{},"We used Apache Beam on Google Dataflow for streaming,\nApache Spark jobs on Google\nDataproc for massively parallel processing, Google Datalab for exploratory analysis, Google ML for machine learning\nanalysis, and Google Data Studio for data visualization.",[20,59,60],{},"Data is transported using an event-driven model in which all data is\ncollected as a stream, even by the ETL (which runs in\nmicro-batches to enable near real-time exploration). The data flows\nthrough the pipeline over Google Pub\u002FSub message-oriented\nmiddleware, and every message is serialized in Avro format, which\nreduces the payload size and makes transport cost-effective, fast,\nand reliable.",[15,62,64],{"id":63},"impact-and-results","Impact and results",[20,66,67],{},"This gave Dotz a better-structured analytical\nplatform: what was previously managed inside a large MS SQL Server instance\nmoved to a Data Lake with layers that support data categorization,\ngovernance, quality, and security.",[20,69,70],{},"The result supports analytical processing of user data, faster exploration, and\nmonetization of Dotz's knowledge of customer behavior.",{"title":72,"searchDepth":73,"depth":73,"links":74},"",2,[75,76,77,78],{"id":17,"depth":73,"text":18},{"id":36,"depth":73,"text":37},{"id":47,"depth":73,"text":48},{"id":63,"depth":73,"text":64},"2017-09-03T00:00:00","Serverless and cloud-managed Big Data architecture on Google Cloud Platform (GCP) to support a 360-degree view of customers and partners of Dotz, one of the largest 
companies in the field of loyalty programs in Brazil","md",{"duration":83,"tools":85},{"from":79,"to":84},"2018-02-20T00:00:00",[86,87,88,89,90,91],"gcp","scala","apache beam","google pub\u002Fsub","apache avro","apache spark",true,"\u002Fprojects\u002F2017\u002Fdotz-data-labs",{"title":5,"description":80},"dotz-data-labs","projects\u002F2017\u002Fdotz-data-labs",[98,99],"hybrid-cloud","data lake",null,"t7ptKvbW33aAb9OgDBrxoRwLHEporCm6P5VhJSFidt0",1778441744007]