[{"data":1,"prerenderedAt":106},["ShallowReactive",2],{"\u002Fen\u002Fprojects\u002F2017\u002Feasynvest-data-platform":3},{"id":4,"title":5,"body":6,"createdAt":83,"description":84,"extension":85,"meta":86,"navigation":97,"path":98,"seo":99,"slug":100,"stem":101,"tags":102,"website":104,"__hash__":105},"projects\u002Fprojects\u002F2017\u002Feasynvest-data-platform.md","Easynvest Data Platform",{"type":7,"value":8,"toc":75},"minimark",[9,14,19,23,26,29,32,36,39,42,45,49,53,56,59,62,65,69,72],[10,11,13],"h1",{"id":12},"introduction","Introduction",[15,16,18],"h2",{"id":17},"summary","Summary",[20,21,22],"p",{},"As part of Easynvest's annual planning, the company invested in expanding\nthe data & analytics team, aiming to shorten the decision-making process\nand to deliver higher quality to customers through a low-cost operational\nprocess.",[20,24,25],{},"One of the main objectives of this project was to automate\ncredit analysis (performed with Machine Learning during customer\nregistration approval), a process that until then had been long, manual\nand handled by the back office.",[20,27,28],{},"Another objective was to offer better products to clients by\ncategorizing each customer according to their profile, allowing\nsuggestions of more attractive products in line with personal preferences\nand with each investor's profile (conservative,\nmoderate or aggressive).",[20,30,31],{},"Last but not least, the project aimed at intelligent detection of money\nlaundering and reporting to the responsible authorities.",[15,33,35],{"id":34},"problem","Problem",[20,37,38],{},"However, the existing data tools had limitations, mainly because\nthey were proprietary software (with limited licenses) designed for\nuse inside data centers. 
In addition, the analytical\ndatabase was modeled for traditional Business Intelligence approaches\n(OLAP, etc.), which made the decision-making process heavy due to the\nlarge number of interactions required during ETL.",[20,40,41],{},"Previously, approving a client took 10 to 15 days. The back office\nhad to gather all the necessary information to build a complete view of\nthe client's profile, including credit analysis. After collecting this information,\nthe back office generated an internal credit analysis score.",[20,43,44],{},"In most cases, the client was not notified of updates regarding the\nprocess and did not receive feedback at the end (if refused) unless\nexplicitly requested (by contacting support via chat or email, for\nexample), which made the process time-consuming and costly, not to\nmention the countless customers lost to the competition during\nthis long wait.",[10,46,48],{"id":47},"solution","Solution",[15,50,52],{"id":51},"technical-implementation","Technical Implementation",[20,54,55],{},"To make this possible, we built a hybrid-cloud implementation using AWS\ncloud-based components (mainly AWS S3, EMR and ECS) to extend the data\ncenters' capacity, implementing a cloud-first Hadoop ecosystem\n(replacing the proprietary software components with open-source\nequivalents). This gave Easynvest the ability to grow the Data Lake\nexponentially.",[20,57,58],{},"The Data Lake was designed to be robust, handling the execution of heavy\nanalytical processes through Machine Learning models, with support for\ndata quality, metadata governance, information security and data serving\n(data owners could share data with consumers from other areas within the\ncompany, allowing them to self-serve their analytical data).",[20,60,61],{},"A chatbot was also used to reduce the operational load on the\nenvironment; it was responsible for maintaining and updating\ninfrastructure components. 
Its tasks ranged from triggering deployments to generating\nencryption keys on demand for data security and governance. It was implemented\nwith the Errbot framework for Python and integrated with Slack.",[20,63,64],{},"Going further, we adopted DevOps best practices, using Jenkins\nfor CI\u002FCD of the developed components alongside Ansible for\nconfiguration management.",[15,66,68],{"id":67},"impact-and-results","Impact and Results",[20,70,71],{},"Thanks to the use of layers in the Data Lake and the\nimplementation of data pipelines, we reduced data\ningestion time by 78% and added metadata and a data catalog, in\naddition to automating much of the work that used to be done manually.",[20,73,74],{},"This brought positive results, most notably reducing customer registration\napproval time from roughly 10 days to 1 day. It also made\nthe data platform more democratic, providing relevant information that\nfacilitates analysis in areas such as risk (credit analysis) and\nsupport, without compromising security.",{"title":76,"searchDepth":77,"depth":77,"links":78},"",2,[79,80,81,82],{"id":17,"depth":77,"text":18},{"id":34,"depth":77,"text":35},{"id":51,"depth":77,"text":52},{"id":67,"depth":77,"text":68},"2017-07-20T00:00:00","Hybrid-cloud Data Lake with most of its capabilities running in AWS. Its main objectives included automating credit analysis, running targeted campaigns for investors according to their profile, and intelligent detection of money laundering.","md",{"duration":87,"tools":89},{"from":88,"to":83},"2017-02-18T00:00:00",[90,91,92,93,94,95,96],"ansible","hybrid-cloud","aws","apache kafka","apache avro","hadoop","apache nifi",true,"\u002Fprojects\u002F2017\u002Feasynvest-data-platform",{"title":5,"description":84},"easynvest-data-platform","projects\u002F2017\u002Feasynvest-data-platform",[91,103],"data lake",null,"-lmpA569F4hZL2xOe_xZ-i6meHrUM9Wl6RDRxZPEfTk",1778441744042]