Site Reliability Engineering & 10 others
EPAM Systems
Software Engineering
Mexico · Remote
Posted on Nov 19, 2025
Responsibilities
- Support the product monitoring system
- Assist in troubleshooting and resolving incidents
- Contribute to automation tasks for log analysis and alerts
- Monitor system health and performance metrics
- Document procedures and incident-related information
- Work with operations and development teams on reliability improvements
- Learn and use tools such as ELK/Kibana, Prometheus, and Grafana
- Use scripting languages like Python and Bash for basic automation
Requirements
- Some experience, including internships, in systems reliability or software engineering
- Understanding of cloud technologies and scripting in Python or Bash
- Basic knowledge of monitoring tools such as Prometheus or Grafana
- Familiarity with logging tools such as ELK/Kibana
- Knowledge of version control using Git
- English communication skills at B2 level
Nice to have
- Foundational knowledge of databases, such as SQL databases or MongoDB
- Exposure to tools such as PagerDuty
- Basic knowledge of C# programming
- Interest in Oil & Gas operations
Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn