Site Reliability Engineering
EPAM Systems
Software Engineering
Colombia · Remote
Posted on Jan 10, 2026
Responsibilities
- Guarantee the reliability and efficiency of production environments
- Partner with engineering teams to enhance system robustness
- Build and sustain CI/CD pipelines and automation frameworks
- Apply infrastructure as code principles using Terraform, CloudFormation, or equivalent
- Oversee containerization and orchestration with Docker and Kubernetes
- Supervise system monitoring, logging, and incident management
- Enhance networking and Linux system performance
- Support scalability across cloud platforms like AWS, Azure, and GCP
- Collaborate with teams to implement DevOps best practices
- Aid in diagnosing and resolving production incidents
- Document system configurations and operational procedures
- Engage in on-call duties and incident response
- Assess and deploy monitoring and alerting tools
- Ensure adherence to security best practices in system operations
Requirements
- 3+ years of proven experience with cloud providers such as AWS, Azure, or GCP
- Expertise in developing CI/CD pipelines and automation solutions
- Hands-on skills with infrastructure as code tools like Terraform or CloudFormation
- Experience with Docker and Kubernetes for containerization and orchestration
- Comprehensive knowledge of Linux, networking, monitoring, logging, and incident handling
- Strong communication and teamwork abilities
- Upper-Intermediate English proficiency (B2)
Nice to have
- Understanding of SRE methodologies including SLIs, SLOs, and error budgets
- Familiarity with scripting languages such as Python, Go, or Bash
- Experience with observability platforms like Prometheus, Grafana, ELK, or Datadog
- Knowledge of security best practices and DevSecOps approaches
- Background in supporting high availability and large-scale production systems
We offer/Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn