Site Reliability Engineering & 8 others
EPAM Systems
This job is no longer accepting applications
Amp. Gabriel Hernández, Ciudad de México, CDMX, Mexico · Armenia · Remote
Posted on Nov 19, 2025
Responsibilities
- Manage and maintain Kubernetes clusters including deployment, scaling, and troubleshooting
- Develop and optimize Jenkins CI/CD pipelines
- Implement and utilize Instana for observability and monitoring
- Operate the ELK stack for log management, alerting, and dashboard creation
- Provide production support including incident management and root cause analysis
- Perform performance tuning to enhance system reliability and availability
- Ensure adherence to site reliability engineering best practices
- Work independently as the sole SRE/DevOps specialist in the team
- Collaborate with development and operations teams to improve system performance
- Automate deployment and monitoring processes where possible
- Monitor system health and respond to alerts promptly
- Document processes and share knowledge with team members
- Continuously evaluate tools and technologies to improve operational efficiency
- Participate in on-call rotation to support production systems
Requirements
- 3+ years of strong hands-on experience with Kubernetes, including deployment, troubleshooting, scaling, and monitoring
- Proficiency in Jenkins for CI/CD pipeline development and optimization
- Experience with Instana for observability, tracing, and monitoring
- Experience using the ELK stack for log management, alerting, and dashboarding
- Solid application production support skills including incident management and root cause analysis
- Strong understanding of site reliability engineering principles including reliability, availability, monitoring, and observability
- Ability to work independently as the only site reliability engineer or DevOps specialist on a team
- Experience with performance tuning in production environments
- Strong written and verbal English communication skills (B2+)
Nice to have
- Experience with Amazon Web Services infrastructure setup and service integration
- Knowledge of Terraform for infrastructure as code
- Skills in Helm charts, templating, and deployment automation
- Proficiency in scripting languages such as Python, Bash, or Groovy
- Familiarity with Apache Kafka operations, monitoring, and troubleshooting
We offer/Benefits
We connect like-minded people
- Delivering innovative solutions to industry leaders, making a global impact
- Enjoyable working environment, whether it is the vibrant office or the comfort of your home
- Opportunity to work abroad for up to two months per year
- Relocation opportunities within our offices in 55+ countries
- Corporate and social events
We invest in your growth
- Leadership development, career advising, soft skills and well-being programs
- Certifications, including GCP, Azure and AWS
- Unlimited access to LinkedIn Learning and getAbstract
- Free English classes with certified teachers
We cover it all
- Participation in the Employee Stock Purchase Plan
- Monetary bonuses for engaging in the referral program
- Comprehensive medical & family care package
- Four trust days per year for personal needs
- Discounts for fitness clubs
- Benefits package (hotels, restaurants, stores and services)