Application Support
EPAM Systems
Customer Service
Argentina · Amp. Gabriel Hernández, Ciudad de México, CDMX, Mexico · Remote
Posted on Nov 19, 2025
Responsibilities
- Maintain a stable, secure, and performant enterprise data platform (Snowflake, AWS data stack, dbt, orchestration tools, BI/analytics, etc.)
- Provide operational coverage within an 8/5 support model and participate in a 24/7 on-call rotation for critical incidents
- Implement robust monitoring, alerting, and observability solutions to facilitate proactive incident detection and resolution
- Perform platform upgrades, patching, and configuration management in alignment with security and compliance requirements
- Continuously tune system performance to meet evolving business needs
- Monitor infrastructure, data pipelines, and platform services using end-to-end observability frameworks
- Deliver actionable operational insights through monitoring dashboards and reporting
- Identify and execute process automation to improve efficiency and reduce manual interventions
- Propose and implement continuous improvements to advance platform resilience, scalability, and cost-effectiveness
- Contribute to infrastructure-as-code and configuration-as-code practices for consistent, repeatable operations
Requirements
- More than 3 years of experience managing cloud-native data platforms (e.g., Snowflake, Databricks, BigQuery, or similar)
- Expertise in cloud infrastructure (AWS) with emphasis on operations, automation, and cost governance
- Proficiency with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK, CloudWatch, etc.)
- Knowledge of Infrastructure as Code (Terraform, Pulumi, Ansible) and configuration management practices
- Understanding of networking, security, and compliance in cloud environments
- Strong problem-solving skills and a proactive, service-oriented mindset
- Flexibility to work in a global operations environment with on-call responsibilities
- Ability to communicate clearly and collaborate with engineering, data, and business stakeholders
- Commitment to continuous improvement and operational excellence
- English proficiency at an Upper-Intermediate level (B2) or higher
Nice to have
- Experience implementing FinOps frameworks and cost optimization practices
- Experience working in regulated, compliance-driven industries (pharma, healthcare, finance)
- Familiarity with modern data stack tools (dbt, Dagster/Airflow, ThoughtSpot, Tableau, Power BI)
- Understanding of SRE (Site Reliability Engineering) principles and practices
We offer/Benefits
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn