The access control nobody chose

At some point during Jenkins' original setup at Freeletics, the Google OAuth plugin was configured in development mode. That mode grants every account in the G-Suite domain admin access to Jenkins. It was left in production. By the time I ran an access review in 2020, every Freeletics employee (regardless of role) could open Jenkins, modify jobs, trigger deployments to any environment, and change system configuration.

Nobody intended this. It was a default.

That distinction matters: security debt rarely accumulates through deliberate choices. It accumulates through configuration that worked well enough to ship and was never revisited. Each individual default is harmless; the pattern is not. A CI system that any employee can reconfigure is an audit finding waiting to happen (and, more practically, a deployment pipeline that can be triggered or altered by someone who has no business doing so).

The fix wasn't just "restrict access." That framing leads to the wrong design. The real question was: how do you build access control that stays accurate over time without requiring manual maintenance? Access lists that require human upkeep drift. They drift because people change roles, people leave, and nobody has time to be the steward of a permissions spreadsheet.

Authorization as an engineering problem

When we decided to rebuild Jenkins, the design goals included making every aspect of the system reproducible from code. Authorization was no exception.

Two options were evaluated.

Google OAuth with groups would restrict access to specific G-Suite groups rather than the entire domain. The off-boarding story is clean: deactivate a G-Suite account and that person loses Jenkins access immediately. The problem is the authorization model itself. G-Suite groups are designed for e-mail distribution, not access control. A Jenkins permission group built on top of G-Suite groups would live outside any existing engineering workflow: group membership changes happen in the G-Suite admin console, not through pull requests; there's no audit trail a developer can query; and keeping groups synchronized with actual team structure requires manual intervention every time someone changes teams.

GitHub OAuth maps Jenkins authorization directly to GitHub team membership. At Freeletics, GitHub teams already reflected engineering structure (back-end, web, ops, coach) and were already managed through pull requests. Using them as the Jenkins authorization source meant zero new model to maintain: one source of truth, one place to make changes, one place to audit them.

The off-boarding trade-off is real: deactivating a G-Suite account doesn't automatically remove someone from a GitHub team, so a separate removal step is required. We accepted that gap consciously.

Q: Is a manual off-boarding step a deal-breaker? A: Only if the alternative is more reliable in practice. A model requiring constant maintenance to stay accurate has a short mean-time-to-drift; a model requiring one deliberate step at a specific moment has a narrow, bounded failure mode. GitHub OAuth's gap is known, documented, and owned. Google group drift is open-ended and invisible until an audit surfaces it.

The outcome: GitHub OAuth with team-based RBAC. Access requests happen through pull requests on the GitHub teams configuration. New engineers, role changes, contractor access: all of it reviewable, auditable, and executed through a workflow that already exists. Any access review is a git log away from being done.
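A minimal sketch of the team-to-permission idea, not the actual Freeletics configuration. The team names come from the text above; the permission names and the `TEAM_PERMISSIONS` table are hypothetical and are not real Jenkins permission identifiers.

```python
# Hypothetical mapping: team names are taken from the article, the
# permission names are illustrative only.
TEAM_PERMISSIONS: dict[str, frozenset[str]] = {
    "ops": frozenset({"read", "build", "configure", "administer"}),
    "back-end": frozenset({"read", "build"}),
    "web": frozenset({"read", "build"}),
    "coach": frozenset({"read", "build"}),
}


def effective_permissions(user_teams: set[str]) -> frozenset[str]:
    """Return the union of permissions granted by the user's teams."""
    granted: set[str] = set()
    for team in user_teams:
        # Teams without an entry grant nothing, so access is deny-by-default.
        granted |= TEAM_PERMISSIONS.get(team, frozenset())
    return frozenset(granted)


def is_allowed(user_teams: set[str], permission: str) -> bool:
    """Check one permission against the user's current team membership."""
    return permission in effective_permissions(user_teams)
```

Because the only input is team membership, changing someone's teams is the whole access change: there is no second list to update.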

The Helm chart had a dirty secret

The secrets problem was a separation of concerns violation wearing a Helm chart costume.

The Jenkins Helm chart stored secrets inside the Helm values file, encrypted via helm-secrets. They were encrypted at rest and available at deploy time, so it worked. The problem surfaced the moment you needed to change anything else.

Every Helm release (every plugin update, every executor count adjustment, every configuration change) carried the full secrets bundle along for the ride. Rolling back a Helm release because a plugin update broke something also rolled back the secrets version. Two completely independent concerns (configuration and secrets) shared one release train, which meant neither could change independently.

Q: What's the actual cost of that coupling? A: Imagine a plugin update that breaks the build pipeline on a Monday morning. You roll back the Helm release. The rollback also silently reverts a credential rotation that happened the previous week. Jenkins is now back on the old credentials while the systems it talks to expect the rotated ones, and nothing in the rollback told you so. The blast radius of a configuration change includes your secrets state, and there's no way to disentangle the two without a full re-deploy.
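The coupling can be modeled in a few lines. This is a toy sketch, not Helm's actual behavior in every case: it assumes the rotated credential first shipped in the same release as the plugin update, and the `Release` and `CoupledJenkins` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One release: configuration and secrets travel together."""
    plugins: str      # stands in for all non-secret configuration
    credential: str   # stands in for the encrypted secrets bundle


class CoupledJenkins:
    """Hypothetical deployment whose only state is its release history."""

    def __init__(self, first: Release) -> None:
        self.history = [first]

    def upgrade(self, release: Release) -> None:
        self.history.append(release)

    def rollback(self) -> None:
        # A rollback restores the previous release as a whole.
        if len(self.history) > 1:
            self.history.pop()

    @property
    def current(self) -> Release:
        return self.history[-1]


def simulate_rollback() -> Release:
    jenkins = CoupledJenkins(Release(plugins="plugins-v1", credential="key-old"))
    # Assumption: the rotated key ships in the same release as the plugin update.
    jenkins.upgrade(Release(plugins="plugins-v2", credential="key-new"))
    # plugins-v2 breaks the build, so the release is rolled back.
    jenkins.rollback()
    return jenkins.current
```

After `simulate_rollback()`, the deployment is back on `plugins-v1`, which was the goal, and also back on `key-old`, which nobody asked for.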

This is Muda (waste, in lean terms) at the infrastructure level: the work of fixing a configuration change bleeds into your secrets state. Every deployment carries overhead that has nothing to do with what you're actually trying to change. That overhead is invisible until something breaks.

Splitting the concerns

The redesign cut the coupling entirely. Secrets and configuration got independent release cycles and separate ownership.

Secrets fell into two categories, and the distinction drove the design:

Runtime secrets (Kubeconfigs, Kubernetes service account credentials, anything that rotates regularly or whose value pipelines read at execution time) were moved to AWS Secrets Manager. The AWS Secrets Manager Credentials Provider plugin extends the Jenkins credentials API to read from Secrets Manager on the fly. Rotation happens in Secrets Manager; pipelines pick up the new value on the next run with no Helm release required.

Static credentials (AWS IAM keys, API tokens, service credentials that change infrequently) stayed SOPS-encrypted in the infrastructure repository and were synced to the Jenkins Credentials Store through JCasC (Jenkins Configuration as Code) on each Helm release. The repository is the audit log: every change goes through a pull request, every reviewer can see exactly what changed, and git history is the versioning.
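The effect of the split can be sketched with a toy model. Everything here is hypothetical: a plain dict stands in for AWS Secrets Manager, `SplitJenkins` is an illustrative name, and no real AWS or Jenkins API is called. The point is only that runtime secrets are looked up at read time, while static credentials are pinned to a release.

```python
from dataclasses import dataclass


@dataclass
class Release:
    config: str
    static_credentials: dict[str, str]  # synced on each release


class SplitJenkins:
    """Hypothetical model: runtime secrets live outside the release history."""

    def __init__(self, runtime_store: dict[str, str], first: Release) -> None:
        self.runtime_store = runtime_store  # shared reference, read on every lookup
        self.history = [first]

    def upgrade(self, release: Release) -> None:
        self.history.append(release)

    def rollback(self) -> None:
        if len(self.history) > 1:
            self.history.pop()

    def credential(self, name: str) -> str:
        # Runtime secrets are resolved when they are read, not when deployed.
        if name in self.runtime_store:
            return self.runtime_store[name]
        return self.history[-1].static_credentials[name]


def simulate_split() -> tuple[str, str, str]:
    store = {"kubeconfig": "kubeconfig-v1"}  # stand-in for the external secret store
    jenkins = SplitJenkins(store, Release("plugins-v1", {"api-token": "token-v1"}))
    store["kubeconfig"] = "kubeconfig-v2"    # rotation, no release involved
    jenkins.upgrade(Release("plugins-v2", {"api-token": "token-v1"}))
    jenkins.rollback()                       # plugin update broke something
    return (
        jenkins.history[-1].config,
        jenkins.credential("kubeconfig"),
        jenkins.credential("api-token"),
    )
```

In this model the rollback restores `plugins-v1` while the rotated `kubeconfig-v2` stays in effect, because the rotation never passed through a release.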

AWS SSM Parameter Store was ruled out early. The Jenkins plugin for SSM doesn't obfuscate secret values from build log output, meaning credentials can appear in plaintext in job logs. HashiCorp Vault covers all three requirements and more, but running and operating a Vault cluster is itself a non-trivial platform engineering task. The operational overhead outweighs the marginal benefit over AWS Secrets Manager at this team size and scope.
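To make the log-exposure concern concrete: obfuscation here means replacing known secret values before a line reaches the build log. The function below is a generic illustration of that technique with hypothetical names, not how any Jenkins plugin implements it.

```python
def mask_secrets(line: str, secret_values: list[str], mask: str = "****") -> str:
    """Replace every occurrence of a known secret value in a log line."""
    # Mask longer values first so a secret containing a shorter one is hidden whole.
    for value in sorted(set(secret_values), key=len, reverse=True):
        if value:  # an empty string would match everywhere
            line = line.replace(value, mask)
    return line
```

Without a step like this between the credential lookup and the log writer, any command that echoes its environment or arguments puts the credential into the job log in plaintext.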

Boring security, by design

The through-line across both changes is the same: the right security posture is the one that stays correct without requiring ongoing effort to maintain.

Authorization that maps to existing GitHub teams doesn't drift because it inherits all the maintenance work that already happens around GitHub teams. When a new engineer joins, they get added to a GitHub team (a step that was already happening) and Jenkins access follows. When someone changes teams, their Jenkins permissions change with it. The security model is a side effect of normal engineering operations, not a parallel task stacked on top of them.

Secrets with independent release cycles don't get silently rolled back as collateral damage from configuration changes. Rotation happens in one place and propagates without a deployment. The audit trail lives in the repository where anyone can query it.

Neither of these is "hardening" in the traditional sense. They're design choices that make the system easier to operate correctly than incorrectly (which is what a boring platform actually means). Not zero incidents, but no surprises: the kind of security that doesn't require emergency firefighting because the defaults are already aligned with the intent.

With authorization and secrets on solid footing, the next phase was the build system itself: replacing Docker-in-Docker with Kaniko, redesigning the Groovy Shared Library, and making Dependabot Monday mornings a non-event. Part 3: the execution.