Senior Site Reliability Engineer
Our client is a profitable developer-tooling company whose product is used by engineering teams at thousands of software companies for application monitoring and incident management. Their infrastructure runs across three AWS regions with strict 99.95% availability commitments, and the SRE team is a senior, well-resourced group of nine.
As Senior SRE you will lead reliability initiatives across the platform: defining and driving SLOs and error budgets, running incident command for major outages, and building self-service infrastructure that lets product teams ship safely. You will work in Go and Python, with Terraform for IaC and a Kubernetes-based deployment platform.
The role offers genuine technical scope, a thoughtful on-call rotation (compensated, capped weeks, frequent retros), and the chance to shape reliability culture at a company whose customers are themselves SRE and DevOps practitioners.
Requirements
- 6+ years of SRE, infrastructure engineering, or production operations experience
- Strong proficiency in at least one of Go, Python, or Rust
- Deep experience with Kubernetes, Terraform, and AWS (or GCP) at production scale
- Demonstrated ownership of SLOs, error budgets, and incident response programs
- Comfort writing public-facing postmortems and presenting reliability data to executives
Job details
- Salary: $175,000 – $220,000 + Equity
- Location: Denver, CO
- Contract type: Permanent
- Sector: Technology
Or email hello@kovoro.com