
Grafana Labs
Backend Software Engineer, AI & Machine Learning
Grafana Ops helps engineers detect, respond to, and learn from system incidents with minimal toil. We develop alerting systems, on-call software, and incident management products that help teams keep their systems stable and available. Our products are built on top of Grafana’s open source observability platform, and as we grow, it’s important that we improve our performance, increase our reliability, and delight our users every step of the way. We are growing our team with passionate developers like you to drive innovation in the Incident Management and AI Ops space.
We are developing brand-new ML and data-driven applications to address real problems in the observability and operations space. Engineers and data scientists are empowered and given the space to explore, innovate, and deliver new and exciting features.
Backend engineering roles at Grafana call for engineers who focus on performance and reliability and who enjoy taking projects from conception to production. Because we deploy production services, we run on-call rotations to ensure the health of the system. We use all of our own products for operations, so being on call is an important way to understand our system and how to use the tools we create.
Our culture is remote-first, and our engineering organization is entirely remote. We provide guidance and meet regularly over video calls. We are looking for people who are independent and excellent communicators.
Requirements:
- You are familiar with programming languages like Python, Go, C, C#, C++, Java or Rust
- You can write clean and performant software
- You have experience engineering solutions using data to solve problems, guide development, or derive insights
- You have experience using data to help teams improve their services
Nice to haves:
- Familiarity with operations/SRE
- Experience with the monitoring space in general (metrics, logging, tracing, and observability)
- You have some experience with distributed systems development
- Familiarity with Incident Response tools
- Experience with Kubernetes / Prometheus / Bigtable / OpenTelemetry technologies
To apply for this job, please visit boards.greenhouse.io.