Easynvest Data Platform

Introduction

Summary

As part of its annual planning, Easynvest invested in expanding its data & analytics team, aiming to shorten the decision-making process and to deliver higher quality to customers through a low-cost operational process.

One of the main objectives of this project was to automate credit analysis (performed with Machine Learning during the approval of customer registration), a process that until then had been long and manual, handled by the back office.

The second objective was a better product offering for the client: categorizing customers by profile so that we could suggest more attractive products, in line with personal preferences and with each investor's profile (conservative, moderate or aggressive).

Last but not least, the project aimed at the intelligent detection of money laundering and its reporting to the responsible authorities.

Problem

However, the existing data tools had limitations, mainly because they were proprietary software (with limited licenses) designed to run inside data centers. In addition, the analytical database was modeled for traditional Business Intelligence (OLAP, etc.), which made decision making slow because of the large number of interactions required during ETL.

Previously, it took 10 to 15 days for a client to be approved. The process involved gathering all the necessary information to provide a complete view of the client's profile, including credit analysis. After collecting the information, the back office generated an internal credit analysis score.

In most cases, the client was not notified of progress and received no feedback at the end (if refused) unless they explicitly requested it (by contacting support via chat or email, for example), which made the process time-consuming and costly. On top of that, countless customers were lost to the competition during this long wait.

Solution

Technical Implementation

To make this possible, we built a hybrid-cloud implementation using AWS components (mainly S3, EMR and ECS) to extend the data centers' capacity, implementing a cloud-first Hadoop ecosystem that replaced the proprietary software components with open-source equivalents. This gave Easynvest the ability to grow the Data Lake exponentially.

The Data Lake was designed to be robust and to handle heavy analytical workloads driven by Machine Learning models, with support for data quality, metadata governance, information security and data serving: data owners could share data with consumers from other areas of the company, who could then self-serve their analytical data.
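To illustrate the data-serving idea, here is a minimal sketch of a catalog record in which only the owning team can share a dataset with other teams. The class name, fields and team names are hypothetical and are not taken from the actual platform, which may have handled governance very differently.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one dataset in the Data Lake."""

    name: str
    owner_team: str
    description: str = ""
    consumers: set = field(default_factory=set)

    def grant(self, granted_by: str, team: str) -> None:
        """Share the dataset with another team; only the owner may do so."""
        if granted_by != self.owner_team:
            raise PermissionError(f"{granted_by} does not own {self.name}")
        self.consumers.add(team)

    def can_read(self, team: str) -> bool:
        """The owner can always read; other teams need an explicit grant."""
        return team == self.owner_team or team in self.consumers


# Example: the risk team shares its (illustrative) dataset with support.
entry = CatalogEntry(name="credit_scores", owner_team="risk")
entry.grant(granted_by="risk", team="support")
```

Keeping the grant on the catalog record itself is one simple way to let consumers discover what they are allowed to query without going through a central team.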

A chatbot was also used to reduce the operational load of the environment. The bot is responsible for maintaining and updating infrastructure components, from triggering deployments to generating encryption keys on demand for data security and governance. It was implemented with the Errbot framework for Python and interacts with Slack.
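As a rough idea of what such a bot can look like, the sketch below uses Errbot's plugin API (`BotPlugin` and the `@botcmd` decorator). The plugin name, the command names and the key-generation approach (random bytes from Python's `secrets` module) are assumptions for illustration; the post does not describe how the real bot generated keys or triggered deployments.

```python
import base64
import secrets

try:
    # Errbot's plugin API: subclass BotPlugin, mark commands with @botcmd.
    from errbot import BotPlugin, botcmd
except ImportError:
    # Fallback stubs so the sketch still runs where Errbot is not installed.
    BotPlugin = object

    def botcmd(func):
        return func


def generate_data_key(num_bytes: int = 32) -> str:
    """Return random key material (256 bits by default), base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


class DataPlatformOps(BotPlugin):
    """Hypothetical plugin; the command names are illustrative only."""

    @botcmd
    def genkey(self, msg, args):
        """Usage: !genkey (replies with a freshly generated key)."""
        return generate_data_key()

    @botcmd
    def deploy(self, msg, args):
        """Usage: !deploy <component> (placeholder for a deployment trigger)."""
        component = args.strip()
        if not component:
            return "Usage: !deploy <component>"
        # Assumption: a real bot would call the CI/CD system at this point.
        return f"Deployment requested for {component}"
```

In practice a generated key would more likely be written to a secrets store than echoed into a chat channel; the reply here only keeps the example short.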

Going further, we applied DevOps best practices, using Jenkins for CI/CD of the developed components alongside Ansible for configuration management.

Impact and results

Thanks to the use of layers in the Data Lake and the implementation of data pipelines, we reduced data ingestion time by 78% and added metadata and a data catalog, in addition to automating much of the work that used to be done manually.
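The layering can be pictured as a naming convention over object storage, with each pipeline step reading from one layer and writing to the next. The sketch below assumes three layers named `raw`, `curated` and `serving` and a date-partitioned prefix layout; neither the layer names nor the layout come from the post.

```python
from datetime import date

# Assumption: layer names and their order are illustrative only.
LAYERS = ("raw", "curated", "serving")


def layer_prefix(layer: str, dataset: str, day: date) -> str:
    """Build a date-partitioned storage prefix for a dataset in one layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}/{dataset}/year={day:%Y}/month={day:%m}/day={day:%d}/"


def next_layer(layer: str):
    """Return the layer a pipeline step writes to next, or None at the end."""
    idx = LAYERS.index(layer)
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None


# Example: where a day's registrations would land in the first layer.
prefix = layer_prefix("raw", "registrations", date(2019, 3, 5))
```

A fixed, predictable layout like this is what lets ingestion jobs and catalog tooling find each dataset without manual bookkeeping.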

These changes brought positive results, most notably reducing the registration approval time for customers from roughly 10 days to 1 day. They also made the data platform more democratic, providing relevant information that supports analysis in areas such as risk (credit analysis) and support, without compromising security.

Matheus Cunha
Systems Engineer and Magician

Just a technology lover empowering business with high-tech computing to help innovation (:
