Platform Engineering
EPAM Systems
Software Engineering
Colonia Cuauhtémoc, Mexico City, CDMX, Mexico
Posted on Dec 17, 2025
Responsibilities
- Architect and deploy scalable Databricks platform solutions for analytics, machine learning, and GenAI workflows across multiple environments
- Manage and enhance Databricks workspaces, including cluster policies, autoscaling, GPU compute, and job clusters
- Oversee Unity Catalog governance by managing metastores, catalogs, schemas, data sharing, masking, lineage, and access control
- Develop and maintain Infrastructure as Code with Terraform to enable automated, consistent platform provisioning
- Establish CI/CD pipelines for notebooks, libraries, DLT processes, and ML assets using GitHub Actions and Databricks APIs
- Standardize experiment tracking and model registry workflows with MLflow and manage model serving endpoints with monitoring and rollback
- Optimize Delta Lake batch and streaming pipelines using Auto Loader, Structured Streaming, and DLT while ensuring data quality and SLA compliance
- Collaborate with cross-functional teams to integrate platform features and deliver an exceptional developer experience
- Monitor system performance, troubleshoot issues, and implement enhancements to guarantee platform reliability and scalability
- Document platform operations and maintain automation runbooks for governance and support
- Coordinate with security teams to enforce data governance, encryption, and compliance standards
- Champion best practices in coding, testing, and deployment across the platform engineering team
- Drive ongoing improvements in automation and operational efficiency for the platform
- Engage stakeholders to capture requirements and provide expert technical guidance
- Lead and mentor junior engineers, sharing expertise in platform technologies
Requirements
- At least 5 years of platform engineering experience, with proven expertise administering Databricks on AWS, including Unity Catalog governance and enterprise integrations
- Comprehensive knowledge of AWS services such as VPC, IAM, KMS, S3, CloudWatch, and network architecture
- Advanced skills with Terraform including the Databricks provider and experience with Infrastructure as Code for cloud environments
- Strong proficiency in Python and SQL, including packaging libraries and managing notebooks and repositories
- Experience using MLflow for experiment tracking, model registry, and model serving endpoints
- Familiarity with Delta Lake, Auto Loader, Structured Streaming, and DLT technologies
- Solid experience implementing DevOps automation, CI/CD pipelines, and using GitHub Actions or similar tools
- Expertise in Git and GitHub, including code review processes and branching strategies
- Working knowledge of REST APIs, Databricks CLI, and automation scripting
- Excellent communication and stakeholder management abilities
- Capacity to work autonomously and within distributed teams
- Detail-focused with strong problem-solving and organizational skills
- English language proficiency at B2 (Upper-Intermediate) level or above
Nice to have
- Hands-on experience with AWS EKS and Kubernetes
- Understanding of MLOps methodologies and pipeline automation
- Knowledge of attribute-based access control and enhanced data governance frameworks
- Experience with secrets management and SSO/SCIM provisioning
- Relevant certifications in AWS or Databricks platform engineering
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn