Site Reliability Engineer (DevOps)
Job Description:
Location: Remote
Type: Full-time
Comp: 140-180k
About the Role
The client is looking for a Site Reliability Engineer (SRE) who is passionate about ensuring the performance and reliability of complex systems. You'll work behind the scenes to optimize infrastructure, automate processes, and monitor system health, making sure our platforms are fast, stable, and available to users around the clock.
Experience at a trading shop or exchange is a must; avoid candidates coming from NYSE/Nasdaq.
What we're looking for:
Experience designing and operating large-scale production systems.
Experience working with AWS or other cloud providers.
Deep Kubernetes expertise (building operators, custom schedulers, etc.). Not required, but would be a strong plus.
Experience with Infrastructure-as-Code (e.g. Terraform, Ansible, AWS CDK; CDKTF would be a plus).
Experience with observability best practices and tooling (we use Prometheus/Loki/Tempo/Pyroscope).
Experience building deployment pipelines leveraging common CI/CD tools (we use ArgoCD and GitHub Actions).
A good understanding of software engineering principles and an ability to write clean code in any programming language (we use a mix of Python and Go).
A good understanding of web applications and architecture.