Site Reliability Engineering
EPAM Systems
Software Engineering, Other Engineering
Ukraine
Posted on Jan 16, 2026
Responsibilities
- Provide L3 on-call support as needed
- Design and develop monitoring systems for infrastructure and products
- Define and implement SLI/SLOs for system reliability tracking
- Conduct thorough root cause analyses for incidents
- Lead postmortem procedures and drills for continuous improvement
- Analyze product performance, scalability, and reliability
- Automate operational tasks to enhance efficiency
- Implement and manage CI/CD pipelines following "as-Code" practices
- Oversee cloud infrastructure and configuration management using Infrastructure-as-Code principles
- Collaborate closely with cross-product teams and business stakeholders to align reliability objectives
Requirements
- 5+ years of relevant experience, including 1 year in a leadership role
- Advanced knowledge of scripting languages such as Python, Go, Bash, or PowerShell
- Expertise in any major cloud platform (AWS, GCP, or Azure)
- Proficiency in optimizing monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, Zabbix, or ELK
- Capability to manage cloud infrastructure using tools like Terraform and command-line interfaces (gcloud, az, aws)
- Competency in configuration management using Ansible
- Background in CI/CD toolchains such as Jenkins (Groovy SDK, Jenkinsfile), GitLab CI, or Azure DevOps
- Understanding of containerization technologies such as Docker and Kubernetes
- Exceptional troubleshooting and problem-solving abilities, including reconstructing incident conditions and flows based on root cause analysis
- B2-level English proficiency, both in speaking and writing
Nice to have
- Familiarity with multiple cloud-native monitoring tools
- Demonstrated experience leading cross-functional team collaboration
- Proficiency in advanced Kubernetes configurations
We offer/Benefits
With us you can:
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travelling, Media, Artificial Intelligence, and more)
- Explore relocation opportunities, which may be available for eligible candidates depending on the role and openings at other EPAM locations
- Participate in volunteer and charity programs and in communities (both technical and interest-based)
We focus on your professional growth:
- Plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Get the opportunity to undergo free training and certification in AWS, GCP, or Azure Clouds
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
- Study at EPAM Solution Architecture School with the instructors who are practicing architects
- Develop as a leader: join the Delivery Management, Resource Management, and Leadership Essentials schools, and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
What we offer:
- Vacation and sick leave (including sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, thematic training
- E-kids program - a free programming language training program for EPAMers' children