Delivering a Great Developer Experience with Platform Engineering

About Us

John Burns

Sr. Staff Platform Engineer @ Grubhub

CKUG Co-Organizer

ktlint-gradle Maintainer

GitHub: wakingrufus

Fediverse: @wakingrufus@bigshoulders.city

Sam Raghunath

Ex-Sr. Principal Platform Engineer @ Grubhub

GitHub: onewaysidewalks

Grubhub
  • Unlimited PTO
  • 8-16 weeks of parental leave
  • 4.5 day work week
  • Practice Platform Engineering

Agenda

  • History: DevOps -> Platform Engineering
  • What is Platform Engineering?
  • What makes up a platform?
  • How does Platform Engineering affect Developer Experience?
  • Case Study @ Grubhub

Where we started: DevOps


  • Too much variety
    -> difficult to maintain

  • Duplicate Solutions
    -> wasted effort across teams

  • Siloed teams
    -> lack of shared, commodity tooling slows down product development

What about Platform Engineering?


  • Centralize DevOps Expertise
  • Expand Scope past operations into streamlined runtime solutions


What does platform engineering encompass?

Platform Engineering

Infrastructure

  • CI/CD tooling
  • IaaS
  • Networking
  • Service Mesh
  • Observability

Service

  • Build Tooling
  • Platform Services
  • Runtime libraries
  • "DevSecOps"

Frontend

  • Web
  • Android
  • iOS

Data

  • ETL
  • Data Lake
  • ML Model Training
  • Operational Data stores

Documentation

Platform Elements of DX

CI/CD

  • Build / Test / Integrate Code
  • View deployment state
  • Trigger pipelines
  • Grubhub: Jenkins + Spinnaker + Busboy
  • Evolving toward k8s

Development Environments

  • Repeatable local environments
  • Use what's in production
  • Practice using runtime tooling
  • Grubhub: Tilt, plus a single CLI to interact with environments

Build Tooling

  • Gradle Plugins
  • Generating a deployable unit is table stakes
  • Most direct line of communication to the developer
  • Collect information for reporting back to the IDP

Runtime Libraries / Frameworks

  • Out of the box platform integration
  • Clients for platform services
  • Reduce boilerplate
  • Standardize features
  • Grubhub: Roux, based on Spring Boot

Developer Experience

If We...

  • Build for team autonomy
  • Clear potential blockers
  • Reduce learning curve

Then we will see...

  • Better Retention
  • Higher Productivity

Guiding Principles

Golden path

  • Not "golden cage" or "walled garden"
  • Allows innovation
  • Reduces frustration
  • When should a team "off-road"?
  • When should the path be "widened"?

Carrot vs Stick

  • Build a great product
  • Teams will adopt it voluntarily
  • Force adoption only as a last resort

Communication / Feedback Loops

  • Enable informed decisions
  • Enable informed planning

Technical Philosophy

Don't break the build

  • Don't disrupt productivity
  • Anticipate breaking changes
  • Help teams migrate
  • Track and report progress
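
One common way to anticipate a breaking change is to keep the old API in place, mark it deprecated, and have it delegate to its replacement so existing callers keep compiling while they migrate. The sketch below is illustrative only: `OrderClient`, its methods, and the 5-second default are invented names and values, not part of any real platform library.

```java
import java.time.Duration;

// Hypothetical platform client; the class and method names are invented for illustration.
class OrderClient {

    // Assumed default, used only so the old method has something to pass along.
    private static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(5);

    /**
     * Old entry point, kept so existing callers keep compiling.
     *
     * @deprecated use {@link #findOrder(String, Duration)} instead.
     */
    @Deprecated(since = "2.0", forRemoval = true)
    String findOrder(String id) {
        // Delegate to the replacement so both paths behave identically.
        return findOrder(id, DEFAULT_TIMEOUT);
    }

    // New entry point with an explicit timeout.
    String findOrder(String id, Duration timeout) {
        return "order-" + id + " (timeout " + timeout.getSeconds() + "s)";
    }
}
```

Callers of the old method get a compiler warning rather than a broken build, which leaves time to help teams migrate before the method is removed.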

Monorepo vs Polyrepo

Cost of Monorepo

  • Requires dedicated build team
  • Large git repos have scaling limitations
  • Reduces Autonomy
  • Increases Coupling

Principles of Polyrepo

  • Keep related work together (IaaS definitions, docs, etc.)
  • Deliver shared tooling by plugging into CI and build systems

IDP

Internal Developer Portal

  • IDP: "Platform" vs "Portal"
  • Deployment state
  • Adoption
  • Service Scorecard
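
Adoption tracking can be pictured as a simple aggregation over whatever each service reports to the portal. The sketch below assumes each service reports a framework version string; the `AdoptionSketch` class, the service names, and the version numbers are all hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: share of services that report a target framework version.
class AdoptionSketch {

    // Returns the percentage of services whose version starts with targetPrefix.
    // An empty inventory yields 0.0 rather than dividing by zero.
    static double adoptionPercent(Map<String, String> versionByService, String targetPrefix) {
        if (versionByService.isEmpty()) {
            return 0.0;
        }
        long adopted = versionByService.values().stream()
                .filter(version -> version.startsWith(targetPrefix))
                .count();
        return 100.0 * adopted / versionByService.size();
    }

    public static void main(String[] args) {
        // Invented sample data: service name -> reported framework version.
        Map<String, String> reported = Map.of(
                "orders", "2.5.4",
                "menus", "2.5.1",
                "payments", "2.4.9",
                "search", "2.5.4");
        System.out.println(adoptionPercent(reported, "2.5."));
    }
}
```

Matching on the prefix `"2.5."` (with the trailing dot) avoids counting a version such as `2.50.0` as adopted.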

Service Scorecard

  • Security
  • Modernization
  • Cost
  • Complexity
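
A scorecard of this kind can be thought of as per-category scores rolled up into one number per service. The sketch below is a minimal illustration: the four category names come from the slide, but the `ScorecardSketch` class, the 0-100 scale, and the equal weighting are assumptions rather than a description of any real portal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: roll per-category scores (0-100) into one overall score.
// The categories mirror the slide; the scale and equal weighting are assumptions.
class ScorecardSketch {

    static final String[] CATEGORIES = {"Security", "Modernization", "Cost", "Complexity"};

    // Returns the rounded mean of the category scores; a missing category counts as 0.
    static int overallScore(Map<String, Integer> categoryScores) {
        int total = 0;
        for (String category : CATEGORIES) {
            int score = categoryScores.getOrDefault(category, 0);
            if (score < 0 || score > 100) {
                throw new IllegalArgumentException(category + " score out of range: " + score);
            }
            total += score;
        }
        return Math.round((float) total / CATEGORIES.length);
    }

    public static void main(String[] args) {
        // Invented sample scores for a single service.
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("Security", 90);
        scores.put("Modernization", 60);
        scores.put("Cost", 80);
        scores.put("Complexity", 70);
        System.out.println("Overall: " + overallScore(scores)); // prints "Overall: 75"
    }
}
```

Treating a missing category as 0 is a deliberate choice in this sketch: a service that reports nothing scores poorly instead of being silently skipped.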

Case Study: Spring Boot 2.5 Upgrade

Deprecated Dependency

ArchUnit

  • ArchUnit CLI
  • ArchUnit UI
  • ArchUnit Summary

OpenRewrite

Evolving a platform with confidence

Upgrade platform(s) without disruption

  • @Deprecated in Roux (core Java frameworks): 600 -> 200
  • 0 rollbacks, 0 failed upstream builds
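
A count like the one above has to come from somewhere. One rough way to produce it, shown here with plain Java reflection rather than ArchUnit or OpenRewrite, is to count declarations annotated with `@Deprecated`. This is a sketch under stated assumptions: `DeprecationCounter` and `LegacyClient` are invented names, and the approach counts deprecated declarations, not call sites that still use them.

```java
import java.lang.reflect.AnnotatedElement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: count @Deprecated declarations in a set of classes.
class DeprecationCounter {

    // Counts the classes themselves plus their declared constructors, methods, and fields
    // that carry @Deprecated (which is retained at runtime).
    static int countDeprecated(Class<?>... classes) {
        int count = 0;
        for (Class<?> type : classes) {
            List<AnnotatedElement> elements = new ArrayList<>();
            elements.add(type);
            elements.addAll(Arrays.asList(type.getDeclaredConstructors()));
            elements.addAll(Arrays.asList(type.getDeclaredMethods()));
            elements.addAll(Arrays.asList(type.getDeclaredFields()));
            for (AnnotatedElement element : elements) {
                if (element.isAnnotationPresent(Deprecated.class)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Invented sample class with one deprecated field and one deprecated method.
    static class LegacyClient {
        @Deprecated
        static final int DEFAULT_TIMEOUT_MS = 500;

        @Deprecated(since = "2.0", forRemoval = true)
        String fetch(String id) {
            return fetchById(id);
        }

        String fetchById(String id) {
            return "order-" + id;
        }
    }

    public static void main(String[] args) {
        System.out.println(countDeprecated(LegacyClient.class)); // prints 2
    }
}
```

Running a count like this on every build and reporting it over time is one way to turn "track and report progress" into a number that can be charted.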

Ensure efficiency in processes

  • DORA Metrics @ Grubhub 2021-2024
    • Deployment Frequency - avg. ~374 releases/month
    • Time to Restore Service - P95: ~10 minutes
    • Lead time for changes - P95: ~4 days for prod, ~30 minutes for preprod
    • Change Failure Rate - 3.3% deployment failure rate
    • Reliability Targets met - 97.4% SLOs met in 2024
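
For readers unfamiliar with how figures like these are derived, the arithmetic is simple once deployment records exist. The sketch below is illustrative only: `DoraSketch`, its method names, and the sample values are invented, and the nearest-rank percentile is one of several common definitions, not necessarily the one behind the numbers above.

```java
import java.util.Arrays;

// Hypothetical sketch of how DORA-style numbers can be derived from deployment records.
class DoraSketch {

    // Average number of releases per month over the observed period.
    static double deploymentFrequency(int totalDeployments, int months) {
        if (months <= 0) {
            throw new IllegalArgumentException("months must be positive");
        }
        return (double) totalDeployments / months;
    }

    // Percentage of deployments that failed.
    static double changeFailureRate(int failedDeployments, int totalDeployments) {
        if (totalDeployments <= 0) {
            throw new IllegalArgumentException("totalDeployments must be positive");
        }
        return 100.0 * failedDeployments / totalDeployments;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Invented sample data: minutes taken to restore service after each incident.
        double[] restoreMinutes = {3, 4, 5, 5, 6, 6, 7, 8, 9, 12};
        System.out.printf("P95 time to restore: %.1f min%n", percentile(restoreMinutes, 95));
        System.out.printf("Change failure rate: %.1f%%%n", changeFailureRate(33, 1000));
        System.out.printf("Deployment frequency: %.1f/month%n", deploymentFrequency(4488, 12));
    }
}
```

A P95 is used instead of a mean for restore and lead times because a handful of very slow cases would otherwise be hidden by many fast ones.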

Resources

platformengineering.org community hub + slack

Netflix TechBlog

Developer Success Lab

References

Developer Experience is Dead: Long Live Developer Experience! - Justin Reock

Psychological Affordances Can Provide a Missing Explanatory Layer for Why Interventions to Improve Developer Experience Take Hold or Fail - Cat Hicks

DORA Metrics - five key metrics that indicate the performance of a software development team - DORA team @ Google

Developer Thriving Whitepaper