Site Reliability Engineering
EPAM Systems
Software Engineering
Colombia · Remote
Posted on Jan 10, 2026
Responsibilities
- Ensure the reliability and performance of production systems
- Collaborate with engineering teams to improve system stability
- Develop and maintain CI/CD pipelines and automation tools
- Implement infrastructure as code using Terraform, CloudFormation, or similar technologies
- Manage containerization and orchestration with Docker and Kubernetes
- Monitor systems, manage logging, and handle incident management
- Optimize networking and Linux system performance
- Support the scalability of cloud platform environments (AWS, Azure, GCP)
- Coordinate with teams to apply DevOps best practices
- Assist in troubleshooting and resolving production issues
- Document system configurations and procedures
- Participate in on-call rotations and incident response
- Evaluate and implement monitoring and alerting solutions
- Maintain security best practices in system operations
Requirements
- Strong experience with cloud platforms such as AWS, Azure, or GCP (2+ years)
- Proficient in CI/CD pipeline development and automation tools
- Skilled in infrastructure as code using Terraform, CloudFormation, or similar
- Experience with containerization and orchestration using Docker and Kubernetes
- Solid understanding of Linux systems, networking, monitoring, logging, and incident management
- Good communication and collaboration skills
- Upper-Intermediate English language proficiency (B2)
Nice to have
- Knowledge of SRE practices including SLIs, SLOs, and error budgets
- Familiarity with scripting languages such as Python, Go, or Bash
- Experience with observability tools like Prometheus, Grafana, ELK, or Datadog
- Exposure to security best practices and DevSecOps
- Experience supporting high availability and large-scale production systems
We offer/Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn