Lead Site Reliability Engineer
EPAM Systems
Lead Site Reliability Engineer Description
Join our dynamic team as a Lead Site Reliability Engineer! If you have a substantial background in software and systems engineering, with a focus on reliability and scalability in cloud environments, we need your expertise to manage and communicate with IoT devices via our platform. You will play a critical role in areas such as device registration and connection, bi-directional messaging between devices and the cloud, device state tracking and data storage, alerts and notifications for device state changes, and integration with other cloud services such as Device Registry and Firmware Upgrade.
This position offers a hybrid setup with the flexibility to work from any location in Latvia, whether it's your home or our office in Riga.
Responsibilities
- Design, implement, and maintain highly scalable and available systems across Azure cloud architectures
- Regularly test and implement disaster recovery (DR) plans
- Configure and enhance monitoring and alerting processes using Prometheus, Grafana, and OpsGenie
- Develop dashboards to visualize system performance and reliability metrics
- Use Terraform for infrastructure provisioning and management
- Support the development team in ongoing projects
- Communicate with the customer’s DevOps team to discuss requirements and collaborate on implementations
- Enhance release management and CI/CD processes
- Improve system security based on security team recommendations
- Document system support processes; design, write, and test runbooks for operational tasks and incident response
Requirements
- Minimum 5 years of experience as a DevOps engineer or SRE
- Proven experience with Azure cloud architectures
- Proficiency in Kubernetes and Docker/Linux services
- Familiarity with monitoring tools: Prometheus, Grafana, OpsGenie
- Experience with .NET Core and ASP.NET Core applications
- Strong knowledge of Cosmos DB (both Mongo API & SQL API) and MS SQL Server
- Expertise in Terraform
- Experience with CI/CD tools and Azure Networking concepts
- Excellent communication skills and the ability to manage tasks and projects independently
- Experience with Azure IoT Hub and Event Hubs is an added advantage
We offer
- Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years
- Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies
- World-Class Clients: Work closely with 295+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact
- Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices
- GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys
- Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life
- Hybrid Setup: The flexibility to work from any location in Latvia, whether it's your home or our office in Riga
- Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more
Salary range: €4K-€5.9K gross, based on your experience and interview results.
About EPAM
EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption.
With offices in 55+ countries, EPAM has grown in Latvia to over 150 talented innovators in 3 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.