[{"data":1,"prerenderedAt":670},["ShallowReactive",2],{"all-projects":3},[4,201,278,360,434,511,593],{"id":5,"title":6,"body":7,"createdAt":176,"description":177,"extension":178,"meta":179,"navigation":190,"path":191,"seo":192,"slug":193,"stem":194,"tags":195,"website":199,"__hash__":200},"projects\u002Fprojects\u002F2021\u002Ffreeletics-jenkins-redesign.md","Freeletics: Jenkins CI\u002FCD Redesign",{"type":8,"value":9,"toc":167},"minimark",[10,15,20,24,28,31,54,58,62,65,71,90,96,102,106,136,140,143],[11,12,14],"h1",{"id":13},"introduction","Introduction",[16,17,19],"h2",{"id":18},"summary","Summary",[21,22,23],"p",{},"Freeletics ran three fragmented CI\u002FCD systems in parallel: Jenkins for back-end\nand web, CircleCI for mobile, and Travis for tests. Jenkins itself was at least\nfive years out-of-date, built on a customized Jenkins Job Builder (JJB) fork\nthat didn't support Jenkins Pipelines, deployed through a Helm Chart that\nembedded secrets directly in its values file (coupling every configuration\nchange to a secrets release).",[16,25,27],{"id":26},"problem","Problem",[21,29,30],{},"Three specific failure modes drove the redesign:",[32,33,34,42,48],"ul",{},[35,36,37,41],"li",{},[38,39,40],"strong",{},"Morning build storms:"," Dependabot merged PRs in batches at the start of\nthe day, triggering simultaneous Docker image builds that overwhelmed\nJenkins master-to-slave HTTP communication and caused widespread job hangs;",[35,43,44,47],{},[38,45,46],{},"Modernization blocked:"," the outdated JJB fork rejected Pipeline definitions\nat deploy time, making it impossible to adopt any Jenkins feature released in\nthe past two years;",[35,49,50,53],{},[38,51,52],{},"Untestable, untouchable Helm Chart:"," JJB YAML was rendered inside Go\ntemplates and executed during chart install. Any change carried the risk of a\nbroken Jenkins release with no rollback path that didn't also revert secrets.",[11,55,57],{"id":56},"solution","Solution",[16,59,61],{"id":60},"technical-implementation","Technical Implementation",[21,63,64],{},"Executed in four sequential phases:",[21,66,67,70],{},[38,68,69],{},"Phase 1 - Tool evaluation (Feb-Mar\u002F2020):"," benchmarked Docker image build\ntimes across CircleCI, GitLab CI (shared and self-hosted runners with Kaniko),\nand Jenkins. Jenkins produced the fastest server-side builds due to lower\nlatency and full control over runner hardware sizing. Decision: invest in\nJenkins, redesign from scratch.",[21,72,73,76,77,81,82,85,86,89],{},[38,74,75],{},"Phase 2 - Pipeline modernization (Aug\u002F2020):"," replaced JJB with Jenkins\nConfiguration as Code (JCasC) and Job DSL templates managed through Terraform,\nmaking every job definition a reviewable pull request. Migrated all Docker image\nbuilds from Docker-in-Docker to Kaniko (running as unprivileged ephemeral\nKubernetes pods). Redesigned the Jenkins Groovy Shared Library around a\ncomposable ",[78,79,80],"code",{},"KanikoBuilder"," class, reducing per-repository Jenkinsfiles to\ndeclarative build specifications. Introduced image multi-tagging (",[78,83,84],{},"qa-\u003CSHA1>",",\n",[78,87,88],{},"qa-latest-master",") to support the QA stack's tag-based image resolution.",[21,91,92,95],{},[38,93,94],{},"Phase 3 - Authorization (Sep\u002F2020):"," implemented GitHub OAuth, mapping\nJenkins RBAC roles directly to GitHub team membership. Replaced open admin\naccess (any G-Suite account) with a reviewable, auditable access model using\nthe same workflow as the rest of the infrastructure.",[21,97,98,101],{},[38,99,100],{},"Phase 4 - Secrets decoupling (Sep\u002F2020):"," separated secrets management from\nthe Helm Chart release cycle. Static credentials (AWS IAM keys, API tokens)\nare Sops-encrypted in the repository and synced to Jenkins Credentials Store\nthrough JCasC. Runtime secrets (Kubeconfigs, Kubernetes credentials) are stored\nin AWS Secrets Manager and read on-the-fly by pipelines via the credentials\nprovider plugin. Jenkins Helm releases became configuration-only operations.",[16,103,105],{"id":104},"impact-and-results","Impact and results",[32,107,108,114,124,130],{},[35,109,110,113],{},[38,111,112],{},"Fully reproducible deployments:"," the Jenkins Helm release can be deleted\nand recreated from Terraform + JCasC with complete fidelity (no manual state,\nno out-of-band configuration);",[35,115,116,119,120,123],{},[38,117,118],{},"Build time advantage preserved:"," migrating from Docker-in-Docker to Kaniko\nmaintained Jenkins' benchmark advantage over alternatives (~2:00 for\n",[78,121,122],{},"fl-backend-rails"," vs ~10:01 on CircleCI, as of the Feb\u002F2020 evaluation);",[35,125,126,129],{},[38,127,128],{},"Unified pipelines:"," a single Groovy Shared Library now covers back-end,\nweb, coach, and tracking applications (previously each had independent\nad-hoc Jenkinsfile implementations with duplicated logic);",[35,131,132,135],{},[38,133,134],{},"Auditable secrets:"," Sops-encrypted catalog in version control provides full\nchange history for credentials, replacing opaque values embedded in a Helm\nrelease.",[16,137,139],{"id":138},"write-up","Write-up",[21,141,142],{},"The full story is documented in a three-part series:",[32,144,145,153,160],{},[35,146,147,152],{},[148,149,151],"a",{"href":150},"\u002Fposts\u002F2021\u002F01\u002Fjenkins-five-years-of-cicd-debt","Part 1: Freeletics CI\u002FCD: five years of debt (and why we kept Jenkins)","\n-- what we inherited, the benchmark data behind the decision to invest, and the\ndesign goals that shaped the rebuild.",[35,154,155,159],{},[148,156,158],{"href":157},"\u002Fposts\u002F2021\u002F01\u002Fjenkins-boring-security-by-design","Part 2: Boring security on Freeletics Jenkins, by design","\n-- authorization that doesn't require a spreadsheet, and secrets decoupled from\nthe configuration release cycle.",[35,161,162,166],{},[148,163,165],{"href":164},"\u002Fposts\u002F2021\u002F02\u002Fjenkins-rebuilding-it-phase-by-phase","Part 3: The Freeletics CI\u002FCD rebuild, phase by phase","\n-- the build system itself: Kaniko migration, Groovy Shared Library redesign,\nand the change that made Dependabot Monday mornings a non-event.",{"title":168,"searchDepth":169,"depth":169,"links":170},"",2,[171,172,173,174,175],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":60,"depth":169,"text":61},{"id":104,"depth":169,"text":105},{"id":138,"depth":169,"text":139},"2021-02-01T00:00:00","End-to-end redesign of a 5-year-old Jenkins CI\u002FCD platform: replaced Jenkins Job Builder with Pipelines as Code managed through Terraform, migrated all Docker builds to Kaniko on Kubernetes, decoupled secrets management from the Helm Chart, and unified back-end and web CI\u002FCD through a Groovy Shared Library.","md",{"duration":180,"tools":182},{"from":181,"to":176},"2020-08-01T00:00:00",[183,184,185,186,187,188,189],"jenkins","kubernetes","kaniko","terraform","groovy","aws secrets manager","helm",true,"\u002Fprojects\u002F2021\u002Ffreeletics-jenkins-redesign",{"title":6,"description":177},"freeletics-jenkins-cicd-redesign","projects\u002F2021\u002Ffreeletics-jenkins-redesign",[196,197,198],"ci-cd","platform-engineering","infrastructure-as-code",null,"asnvLELu8UgsGnhCEGdHWKoLa26quHFLWuAgs_N6Mic",{"id":202,"title":203,"body":204,"createdAt":259,"description":260,"extension":178,"meta":261,"navigation":190,"path":270,"seo":271,"slug":272,"stem":273,"tags":274,"website":199,"__hash__":277},"projects\u002Fprojects\u002F2019\u002Freclameaqui-data-lake.md","ReclameAQUI Data Lake",{"type":8,"value":205,"toc":253},[206,208,210,213,216,218,221,224,226,230,233,236,239,242,245,247,250],[11,207,14],{"id":13},[16,209,19],{"id":18},[21,211,212],{},"ReclameAQUI (Portuguese for \"complain here\") is an interesting and unique\nbusiness. They're a content aggregator for customers' experience sharing\n(especially bad experiences) about shopping (online and offline). However, it\ngoes further than a mere \"complaints website\" offering an interface for\ncompanies to answers complaints, helping customers with their issues.",[21,214,215],{},"The service is simply the biggest in this regard (worldwide) receiving 600K\nunique visitors each day, searching for a company's reputation before closing a\ndeal\u002Fpurchase.",[16,217,27],{"id":26},[21,219,220],{},"Even though they are already advanced in the digital approach to business,\nhaving most services hosted on Cloud computing and analytical culture, their\ndata lake needed some upgrades. The most relevant motivator of this project was\nthe sky-high bills from GCP especially related to BigQuery data consumption.",[21,222,223],{},"Apart from the cost-reduction tasks and data ingestion process optimization, we\ntook the opportunity to implement data cryptograph at-rest, governance, and\nobfuscation during query executions against the data lake. Making data\naccessible by everyone in the company, controlling identity access and\nmanagement through LDAP (auditing each access, to be fully compliant with\nGDPR), we could offer a self-service data lake so different business actors\ncould satisfy their needs \"drinking\" from the lake.",[11,225,57],{"id":56},[16,227,229],{"id":228},"tech-implementation","Tech implementation",[21,231,232],{},"Key objectives were cost-optimization of the existing Data Lake, improvement\n(and extension) of existing data ingestion pipelines, and security enhancements.",[21,234,235],{},"Starting from Data Lake's cost optimization, we redesigned the data ingestion,\nusing a \"landing\" area for raw data, making data transformations later to suit\nthe desired data models. Saving the results in other Data Lake layers to achieve\ngreater performance in queries.",[21,237,238],{},"We shifted away from the Streaming inserts in BigQuery by adding a step to load\ndata at the end of the ingestion pipeline. Apache NiFi was the main software\nresponsible for orchestrating and executing the pipeline, covering also the\nimprovements in data ingestion through processes re-engineering.",[21,240,241],{},"Auditing in the Data Lake was managed through Apache Ranger. In order to have\nit fully supported we implemented a JDBC driver using a component from Apache\nCalcite called Avatica. Authentication for Apache Ranger went through a custom\nplugin (also developed during the project) for LDAP consuming user info from\nGoogle Cloud Identity, reflecting the existing organization's users and groups\nfrom Google Suite.",[21,243,244],{},"To make the game more interesting, we containerized the workflow and heavily\nused Kubernetes (GKE) to manage these components. Most of the Apache projects\ndidn't have Helm Charts at the time and we developed and made some\nof them open-source.",[16,246,105],{"id":104},[21,248,249],{},"During project time we could measure an estimative of roughly 56% in Data Lake\ncost-optimization through reengineering of processes and resources, especially\nthe removal of streaming inserts to BigQuery.",[21,251,252],{},"We made relevant progress in security and governance during the project with the\nintroduction of Apache Ranger and Data Lake auditing for access and usage,\nproviding advanced security capabilities to ReclameAQUI, which anticipated itself\ntowards GDPR and data privacy concerns.",{"title":168,"searchDepth":169,"depth":169,"links":254},[255,256,257,258],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":228,"depth":169,"text":229},{"id":104,"depth":169,"text":105},"2019-10-02T00:00:00","Containerized Data Lake running on GCP, using Kubernetes (GKE) to orchestrate Apache ecosystem components, with GCS for data storage and BigQuery as the analytical interface.\nGovernance and security fully implemented using existing Google Suite groups and users through LDAP, giving stakeholders full autonomy to consume data from the Lake (with auditing).",{"duration":262,"tools":265},{"from":263,"to":264},"2019-05-01T00:00:00","2019-09-30T00:00:00",[266,184,267,268,269],"apache spark","python","google bigquery","apache nifi","\u002Fprojects\u002F2019\u002Freclameaqui-data-lake",{"title":203,"description":260},"reclameaqui-data-lake","projects\u002F2019\u002Freclameaqui-data-lake",[275,276],"cloud-native","data lake","QKyuci8jk1a_mXWiDZPW8IrSycgLC-_Ho1g0ydP1aL8",{"id":279,"title":280,"body":281,"createdAt":340,"description":341,"extension":178,"meta":342,"navigation":190,"path":352,"seo":353,"slug":354,"stem":355,"tags":356,"website":199,"__hash__":359},"projects\u002Fprojects\u002F2018\u002Fclipping-service-news-ocr.md","Clipping Service News OCR",{"type":8,"value":282,"toc":334},[283,285,287,290,293,296,298,301,304,306,308,311,314,317,320,323,326,328,331],[11,284,14],{"id":13},[16,286,19],{"id":18},[21,288,289],{},"As the owner of the biggest Brazilian media data set, the media monitoring\nleader Clipping Service was having issues with scalability, getting close to\ntheir data center maximum capacity.",[21,291,292],{},"Clipping Service operates on a huge scale, receiving around ~4.5K media press\npages per day from roughly 300 newspapers in both digital and printed versions.\nPreviously employees called \"readers\" were responsible for reading and clipping\n(adding highlight into the targeted content) to be later passed onto the\n\"reviewers\" team.",[21,294,295],{},"As if the burden of reading countless pages a day were not enough, the readers'\noperation begins around 4:30 a.m. when the \"first reading\" begins (i.e., the\ndelivery of the morning papers).",[16,297,27],{"id":26},[21,299,300],{},"For over 20 years this content has been ingested by so-called \"readers\". But due\nto the advent of the internet and the digital press boom at the end of the 90's,\nand nowadays of social media, companies are transferring their clipping\ninvestments to monitoring other areas. Therefore requiring a Clipping Service\naction to remain competitive in the market.",[21,302,303],{},"Through news reading automation using OCR, NLP, and artificial intelligence to\ncategorize media, the plan was to achieve a higher throughput during ingestion,\ngiving readers more free time to review the content. Consequently achieving a\nhigher quality in the content, since we as humans aren't good at doing\nrepetitive tasks, especially when it comes to reading endless pages searching\nfor names and words.",[11,305,57],{"id":56},[16,307,229],{"id":228},[21,309,310],{},"After spending some time researching and benchmarking the alternatives at hand\nwe decided to use Python as the implementation language for handling texts,\nOCR, and NLP (using NLTK). Given its extended API and libraries for NLP and\nimage processing.",[21,312,313],{},"As the cloud provider we choose AWS due to it's stability and consistency over\nother vendors, the conclusion at the time was: AWS price estimative 14.67%\ngreater than GCP. However, AWS's popularity is greater than GCP and proven in\nterms of stability, support, and integrity. Making a safer choice for a\nslightly higher price.",[21,315,316],{},"The tech stack was: Python 3 using Dramatiq as the task processing library,\nrunning Tesseract OCR jobs, processing text with NLTK and images with Pillow\n(ImageMagick wrapper). Redis was the message broker for Dramatiq, a simple\nPostgres database stored metrics regarding the execution and we had an\nElasticsearch storing the processed content.",[21,318,319],{},"Requests coming from the data center reached an API Gateway, responsible for\nexecuting a Lambda function, and delivering the content result.",[21,321,322],{},"The best part of the design? We stored and served the content via AWS S3. Each\npart was designed with fault tolerance, and we simply turned off the entire\ncloud infrastructure after the operation, to turn on only the next day.",[21,324,325],{},"Operating only from 4am to 2pm, a \"serverless\" and ephemeral project benefiting\nfrom an aggressive cloud cost reduction.",[16,327,105],{"id":104},[21,329,330],{},"Clipping Service reduced its reading team workforce by ~78%, offering internal\nhiring for other areas of the company and a voluntary dismissal plan with\nbenefits, making the process as human as possible for the former employees.",[21,332,333],{},"Using automation for reading tasks, Clipping Service could reach considerable\nimprovements in the media press ingestion throughput (around 20 times faster),\noffering higher quality in press clipping service for its customers and saw the\nopportunity in creating a self-service press clipping service later, since the\noperational cost decreased significantly.",{"title":168,"searchDepth":169,"depth":169,"links":335},[336,337,338,339],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":228,"depth":169,"text":229},{"id":104,"depth":169,"text":105},"2018-09-11T00:00:00","Media monitoring and news clipping service automation through artificial intelligence, OCR and NLP. Delivering a higher throughput to the operation and creating a true serverless infrastructure to extend its Data Center capabilities.",{"duration":343,"tools":346},{"from":344,"to":345},"2018-03-31T00:00:00","2018-09-12T00:00:00",[347,267,348,349,350,351],"tesseract ocr","dramatiq","aws","aws ecs","noops","\u002Fprojects\u002F2018\u002Fclipping-service-news-ocr",{"title":280,"description":341},"clipping-service-news-ocr","projects\u002F2018\u002Fclipping-service-news-ocr",[275,357,358],"serverless","nlp","0RGWaWssukzcQPvJ5JqnWqzV2x7lSEMHVrMQ83xlGbE",{"id":361,"title":362,"body":363,"createdAt":416,"description":417,"extension":178,"meta":418,"navigation":190,"path":427,"seo":428,"slug":429,"stem":430,"tags":431,"website":199,"__hash__":433},"projects\u002Fprojects\u002F2017\u002Fdotz-data-labs.md","Dotz Data Labs",{"type":8,"value":364,"toc":410},[365,367,369,372,381,383,386,388,390,393,396,399,402,404,407],[11,366,14],{"id":13},[16,368,19],{"id":18},[21,370,371],{},"In order to be able to innovate and maintain itself in a highly changing\nand evolving market, Dotz went through a process of digital\ntransformation and had the help of some consultants along the way.",[21,373,374,375,380],{},"Among the steps to get closer to a digital model, the implementation of\na Data Lake emerged, with the requirements of being\n",[148,376,357],{"href":377,"rel":378},"https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FServerless_computing",[379],"nofollow"," and\ncloud-native to support the decision-making process and shorten\ntime-to-market during the launch of products.",[16,382,27],{"id":26},[21,384,385],{},"Dotz is one of the largest companies in the field of loyalty program in\nBrazil and they'd face a high number of issues with data disconnection\nmaking it difficult to analyze their users' behaviors. Since they\nreceived data from numerous supermarkets and stores, it's difficult to\nclusterizate products, since the name is different depending on the\nsource. To help with this analysis, they decided to build a Data Lake.",[11,387,57],{"id":56},[16,389,61],{"id":60},[21,391,392],{},"We built and deployed a Big Data's managed architecture using Google's\nCloud Platform (GCP) to support this strategy and allow a 360-degree\nview of customers (users with points a.k.a. Dotz) and partners (the\nsupermarkets offering the loyalty program).",[21,394,395],{},"The design was focused on cloud-managed services and serverless\ncomputing offered by Google, serving the core competencies of a Data\nLake such as scalable storage using Google Cloud Storage, and Google\nBigQuery. With part of the process running inside Kubernetes,\nresponsible for data cleansing ETL flow management.",[21,397,398],{},"We streamed data with Apache Beam running under Google DataFlow,\nparallel mass processing with Apache Spark jobs running on Google\nDataProc, exploratory analysis with Google DataLab, Machine Learning\nAnalysis with Google ML and Data visualization in Google Data Studio.",[21,400,401],{},"Data is transported using an event-driven model, where every data is\ncollected using a streaming model, even the ETL (which runs on a\nmicro-batch, to enable near real-time exploration). These data goes\nthrough the data pipeline using Google Pub\u002FSub message-oriented\nmiddleware, and every message is serialized using Avro format, which\nreduces the payload and allows transportation to be cost-effective, fast\nand reliable.",[16,403,105],{"id":104},[21,405,406],{},"It all allowed Dotz to have a better structure on their analytical\nplatform, previously managed inside a large MS SQL Server instance,\nwhich moved to a Data Lake with layers allowing data categorization,\ngovernance, quality and security.",[21,408,409],{},"Supporting analytical processes of users data, faster exploration and\nmonetization of their knowledge on customers' behavior.",{"title":168,"searchDepth":169,"depth":169,"links":411},[412,413,414,415],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":60,"depth":169,"text":61},{"id":104,"depth":169,"text":105},"2017-09-03T00:00:00","Serverless and cloud-managed Big Data architecture using Google's Cloud Platform (GCP) to support a 360-degree view of customers and partners of Dotz, one of the largest companies in the field of loyalty program in Brazil",{"duration":419,"tools":421},{"from":416,"to":420},"2018-02-20T00:00:00",[422,423,424,425,426,266],"gcp","scala","apache beam","google pub\u002Fsub","apache avro","\u002Fprojects\u002F2017\u002Fdotz-data-labs",{"title":362,"description":417},"dotz-data-labs","projects\u002F2017\u002Fdotz-data-labs",[432,276],"hybrid-cloud","t7ptKvbW33aAb9OgDBrxoRwLHEporCm6P5VhJSFidt0",{"id":435,"title":436,"body":437,"createdAt":496,"description":497,"extension":178,"meta":498,"navigation":190,"path":505,"seo":506,"slug":507,"stem":508,"tags":509,"website":199,"__hash__":510},"projects\u002Fprojects\u002F2017\u002Feasynvest-data-platform.md","Easynvest Data Platform",{"type":8,"value":438,"toc":490},[439,441,443,446,449,452,455,457,460,463,466,468,470,473,476,479,482,484,487],[11,440,14],{"id":13},[16,442,19],{"id":18},[21,444,445],{},"Within Easynvest's annual planning, an investment in the expansion of\ndata & analytics team aimed at shortening decision making process and\nin delivering a higher quality to customers through a low cost operational\nprocess.",[21,447,448],{},"Among the main objectives of this project, we had the automation of\ncredit analysis (executed during the approval of customer registration,\nusing Machine Learning), a process that until then was long and manual,\nbeing handled by the back office.",[21,450,451],{},"Followed by a better offer of products to the client, carrying out the\ncategorization according to the profile of each customer, allowing\nsuggestions of more attractive products, in line with personal preferences,\nas well as according to the profile of each investor (conservative,\nmoderate or aggressive).",[21,453,454],{},"Last but not least, the intelligent detection of money laundry and\nreporting to the responsible authorities.",[16,456,27],{"id":26},[21,458,459],{},"However, there were limitations in the data tools, mainly due to the\nfact that they were proprietary software (with limited licenses) and\ndesigned for usage inside data centers. In addition, the analytical\ndatabase was modeled for traditional Business Intelligence models\n(OLAP, etc), making the decision making process heavy, due to the\ndemanding amount of interactions during ETL.",[21,461,462],{},"Previously for a client to be approved, the process took 10 to 15 days.\nGathering all necessary information, providing a complete perspective of\nthe profile, including credit analysis. After collecting the information,\nthe back office generated an internal credit analysis score.",[21,464,465],{},"In most cases, the client was not notified of updates regarding the\nprocess and did not receive feedback at the end (if refused) unless\nexplicitly requested (by contacting support via chat or email, for\nexample) which made the process time consuming and costly. Not to\nmention the countless amounts of customers lost to the competition during\nthis long wait.",[11,467,57],{"id":56},[16,469,61],{"id":60},[21,471,472],{},"To make it possible, we built an hybrid-cloud implementation using AWS\ncloud-based components (mainly AWS S3, EMR and ECS), to extend the data\ncenters' capability, implementing a cloud-first Hadoop ecosystem\n(replacing the proprietary software components with open-source\nequivalents). Giving Easynvest the possibility to grow the Data Lake\nexponentially.",[21,474,475],{},"The Data Lake design was robust, aimed to handle the execution of heavy\nanalytical processes through Machine Learning models, with support for\ndata quality, metadata governance, information security and data serving\n(data owners could share data with consumers from other areas within the\ncompany, allowing to self-service their analytical data).",[21,477,478],{},"A Chatbot was also used to reduce the operational load in the\nenvironment, this bot is responsible for maintaining and updating\ninfrastructure components. From triggering deployments to generating\nencryption keys on-demand for data security and governance. Implemented\nwith the Errbot framework for Python interacting with Slack.",[21,480,481],{},"Going further, we implemented the best practices in DevOps, using Jenkins\nas a tool for CI\u002FCD of the developed components alongside with Ansible for\nConfiguration Management.",[16,483,105],{"id":104},[21,485,486],{},"Thanks to the utilization of layers in the Data Lake and the\nimplementation of data pipelines, we were able to reduce the data\ningestion time by 78% and to include metadata and data catalog, in\naddition to automating much of the work that used to be done manually.",[21,488,489],{},"Thus bringing positive results, especially reducing the registration\napproval time for consumers from roughly 10 days to 1 day. It also made\nthe data platform more democratic, providing relevant information that\nfacilitates the analysis of areas such as risk (credit analysis) and\nsupport, without having to give up security.",{"title":168,"searchDepth":169,"depth":169,"links":491},[492,493,494,495],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":60,"depth":169,"text":61},{"id":104,"depth":169,"text":105},"2017-07-20T00:00:00","Hybrid-cloud Data Lake with most of its capabilities running in AWS. Among the main objectives we had the automation of credit analysis, targeted campaigns to investors according to profile and intelligent detection of money laundry",{"duration":499,"tools":501},{"from":500,"to":496},"2017-02-18T00:00:00",[502,432,349,503,426,504,269],"ansible","apache kafka","hadoop","\u002Fprojects\u002F2017\u002Feasynvest-data-platform",{"title":436,"description":497},"easynvest-data-platform","projects\u002F2017\u002Feasynvest-data-platform",[432,276],"-lmpA569F4hZL2xOe_xZ-i6meHrUM9Wl6RDRxZPEfTk",{"id":512,"title":513,"body":514,"createdAt":579,"description":580,"extension":178,"meta":581,"navigation":190,"path":585,"seo":586,"slug":587,"stem":588,"tags":589,"website":199,"__hash__":592},"projects\u002Fprojects\u002F2016\u002Fmovida-rent-a-devops.md","Movida Rent A DevOps",{"type":8,"value":515,"toc":573},[516,518,520,523,526,528,531,534,549,551,553,556,559,562,565,567,570],[11,517,14],{"id":13},[16,519,19],{"id":18},[21,521,522],{},"JSL Holdings Ltd, holder of Julio Simões Logistica (biggest logistics\nplayers in LATAM) bought Movida Rent a Car in 2013 to expand the\nportfolio and open new market opportunities on car rental and selling\nmarkets.",[21,524,525],{},"JSL invested around R$1.8 billion in Movida, and multiplied its annual\nrevenues by 21 times from BRL 58m to BRL 1.2b, in 3 years. Based on\nthese successful results, JSL Holdings Ltd planned an IPO for Movida.",[16,527,27],{"id":26},[21,529,530],{},"In order to be public traded, Movida had to pass through an audition.\nHowever, the software solution did not comply with some security\nstandards.",[21,532,533],{},"The project started on December 2016, planning to implement an automated\nsoftware release process adopting DevOps on their data center. With the\ngoals:",[535,536,537,543],"ol",{},[35,538,539,542],{},[38,540,541],{},"security","; no person would need to access the Linux servers. and",[35,544,545,548],{},[38,546,547],{},"productivity","; releasing features faster to shorten their\ntime-to-market.",[11,550,57],{"id":56},[16,552,61],{"id":60},[21,554,555],{},"Our first goal was to implement the CI\u002FCD pipeline using Jenkins,\nresponsible to pack new features, create a release, and deploy it on\ntheir data center. Apart from the production deployment, the pipeline\nalso supported the creation of ephemeral on-demand environments for\nfeature homologation and feedback retrieval from users.",[21,557,558],{},"To have a faster and more controlled release cycle, we migrated the Git\nserver from a cloud-hosted to the data center. Through this action we\nreduced in 5 minutes the deployment overall time and increased the\ncontrol over accesses in their repositories.",[21,560,561],{},"The CI\u002FCD implementation used Jenkins to control the CI\u002FCD flow, GitLab\nwith LDAP authentication, and Ansible as a Configuration Manager. A\ncomplete deployment took around 2 minutes from the git push to having\ncode running on production.",[21,563,564],{},"Apart from the CI\u002FCD deployment process, we also had to work in a\nself-service strategy for running jobs without directly SSH access to\nservers. Rundeck came into place, with RBAC configurations and\nvisibility over the history of executed jobs.",[16,566,105],{"id":104},[21,568,569],{},"Movida went through audition early on January 2017, by the end of\nJanuary 2017 they received the approval.",[21,571,572],{},"Two weeks later, in February 2017 Movida launched their IPO, marked as\nthe first Brazilian IPO of 2017. Movida went public on the 8th of\nFebruary, 2017, raising BRL 645m.",{"title":168,"searchDepth":169,"depth":169,"links":574},[575,576,577,578],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":60,"depth":169,"text":61},{"id":104,"depth":169,"text":105},"2016-12-21T00:00:00","Movida DevOps initial project, responsible for implementing the base for Continuous Deployment, configuration management and improve servers' security.",{"duration":582,"tools":584},{"from":579,"to":583},"2017-02-06T00:00:00",[502,183],"\u002Fprojects\u002F2016\u002Fmovida-rent-a-devops",{"title":513,"description":580},"movida-rent-a-devops","projects\u002F2016\u002Fmovida-rent-a-devops",[432,590,591],"devops","ci\u002Fcd automation","Jd1O7lN5tDP70bJsNw9gHXBmYxtiohvyXpcYYPzFESw",{"id":594,"title":595,"body":596,"createdAt":653,"description":654,"extension":178,"meta":655,"navigation":190,"path":663,"seo":664,"slug":665,"stem":666,"tags":667,"website":199,"__hash__":669},"projects\u002Fprojects\u002F2016\u002Fnextel-digital-release.md","Nextel Digital Release",{"type":8,"value":597,"toc":647},[598,600,602,605,608,611,613,616,619,622,624,627,630,633,636,639,641,644],[11,599,14],{"id":13},[16,601,19],{"id":18},[21,603,604],{},"Nextel planed to develop a mobile application to reduce their\ncontact rate and operational costs with call centers. \"Nextel Digital\"\nwas the name given to the project responsible for releasing this\napplication.",[21,606,607],{},"\"Nextel Digital\" absorbed more goals like improving the User\nexperience, and turned into a new product called \"Happy\", a digital\ncell phone operator. Nextel Happy allows users to manage their plans and\ndata entirely from the mobile app, from activating your SIM to managing\nyour family plan.",[21,609,610],{},"This project helped Nextel to increase their customers base, improved\nthe users' experience, and decrease operational costs (in 16%) all at\nonce.",[16,612,27],{"id":26},[21,614,615],{},"Nextel Brazil executive team decided to work with outsourcing on the\ndevelopment of this product to absorb the knowledge from digital\ncompanies and to complement their internal capabilities. Also to bring\ndifferent perspectives into play, improving the creative process.",[21,617,618],{},"Our team assumed the responsibility to architect and to implement the\nCloud infrastructure ensuring high-availability, resilience and\nconsistency of the software.",[21,620,621],{},"We were also responsible for the data synchronization between Nextel\ndata center and the cloud. Securely moving tons of GB of users' data to\nthe cloud daily without data loss or duplication.",[11,623,57],{"id":56},[16,625,626],{"id":60},"Technical implementation",[21,628,629],{},"We choose GlusterFS to ensure consistency, installed between Nextel Data\nCenter and AWS. Users' data (e.g. data plan consumption, minutes of\ncall) synchronization went through GlusterFS to AWS.\nNextel IT operations inserted data into GlusterFS directly from cell\nphone towers in near-real-time.",[21,631,632],{},"Once the data is available at AWS mounted volumes, the Celery\nimplementation comes into play. At the core of the architecture, Celery\n(implemented in Python 3) using Redis as the message broker, running\nasynchronous jobs inspects events on the GlusterFS. When Celery detects\na new file it parses the content and starts the multi-part upload to AWS\nS3, then compares the checksums to ensure consistency (and retries in\ncase it's inconsistent).",[21,634,635],{},"After reaching AWS S3 the object event triggers a AWS Lambda function to\nparse the content and index it on Elasticsearch, whose are later served\nto clients through an REST API.",[21,637,638],{},"The entire infrastructure setup was immutable, to facilitate the\nevolution and reliability, using Ansible as a Configuration Manager and\nAWS CloudFormation as the Cloud Provisioner. In just a couple minutes it\nis possible to recreate everything with minimum effort.",[16,640,105],{"id":104},[21,642,643],{},"The entire process of making data from a cell phone tower available to\nend users time went down from 1 day to 5 minutes. This reduced\nin ~56% the contact rate on Nextel call centers, due to a self-service\nalternative provided in the mobile app.",[21,645,646],{},"In addition, users can manage their call history and\nplan consumption directly on the mobile phone, with updates in\nnear-real-time. Providing consistent and interactive feedback.",{"title":168,"searchDepth":169,"depth":169,"links":648},[649,650,651,652],{"id":18,"depth":169,"text":19},{"id":26,"depth":169,"text":27},{"id":60,"depth":169,"text":626},{"id":104,"depth":169,"text":105},"2016-11-30T00:00:00","Digital transformation project at Nextel Brazil, which evolved into a product called \"Happy\", a digital telephony operator. Improving the user experience and reducing operating costs.",{"duration":656,"tools":658},{"from":657,"to":653},"2016-09-01T00:00:00",[349,659,267,660,661,662],"aws cloudformation","celery","glusterfs","elasticsearch","\u002Fprojects\u002F2016\u002Fnextel-digital-release",{"title":595,"description":654},"nextel-digital-release","projects\u002F2016\u002Fnextel-digital-release",[590,668],"digital transformation","RP16P4zonHhqKHpMptt4Ypf3hbRo5gqxtMUZZeZ5IBo",1778441743697]