ABOUT THE ROLE
Our mission is to enable Peloton to move faster and scale safely with minimal effort. Core to this mission is creating the best developer experience in the tech industry for the entire spectrum of Peloton's technology. We work across an incredible range of technology domains: hardware, firmware, web, mobile, backend, data, messaging, content, streaming, and machine learning. We get to apply these to create a platform of products loved by millions of members all over the world.
Peloton is looking for a Senior Site Reliability Engineer with a focus on Kubernetes operations to work with teams across the organization to help build and maintain a monitorable, performant, reliable, and highly scalable deployment platform.
We are a growing team of engineers tackling the challenges of scaling Kubernetes to thousands of nodes and pods, spread across many deployments, in support of millions of engaged Peloton members.
Software and systems engineers with interest and/or experience in system automation and Kubernetes are encouraged to apply for this position.
YOUR DAILY IMPACT AT PELOTON
- Evangelize best practices for building and operating highly scalable and reliable systems
- Serve as subject matter expert in observability and monitoring
- Consult in system design to meet reliability and capacity requirements
- Automate everything, from infrastructure down to day-to-day tasks
- Conduct retrospectives of infrastructure incidents
- Seek out potential threats to security and reliability and advocate for modern solutions
- Deliver fast, automatic autoscaling for live rides and large special events
- Host critical infrastructure, running tens of thousands of pods across multiple clusters, that ensures our members have the best possible experience
- Work with Kubernetes, Amazon Web Services, Golang, ArgoCD, Python, GitHub Actions, Kong, Istio, and Cloudflare
- Be a part of a positive and inclusive culture that is hyper-focused on solving developer challenges during hyper-growth
- Mentor engineers on standard methodologies and work across teams to help build longer-term roadmaps and shape our technical vision
YOU BRING TO PELOTON
- Experience maintaining scalable and stable Kubernetes clusters
- Deep experience building products and tools with programming languages such as Python, Golang, Kotlin, or Ruby
- Knowledge of best practices when it comes to the observability and monitoring required to run Kubernetes at scale
- Experience with CI/CD systems such as ArgoCD, Harness, or Tekton
- Experience deploying infrastructure using Infrastructure as Code tools such as Terraform, CloudFormation/CDK, or Pulumi
- Judgment about when to triage and when to dive into a root-cause analysis
- Passion for reliable, scalable, observable software with a strong sense of ownership
- Ability to collaborate well with engineering managers, product managers, designers, and other developers
- Deep experience with Linux system administration
- Experience writing operators to automate tasks within Kubernetes clusters
- Previous experience working in a multi-region environment, with clusters deployed globally
- Contributions to open-source projects within the ecosystem
- Good at nautical-related puns