Platform Engineering
EPAM Systems
Software Engineering
Colonia Cuauhtémoc, Mexico City, CDMX, Mexico
Posted on Dec 17, 2025
Responsibilities
- Design and implement scalable Databricks platform solutions for analytics, ML, and GenAI workflows across development, testing, and production environments
- Administer and optimize Databricks workspaces including cluster policies, pools, job clusters, autoscaling, and GPU/accelerated compute
- Implement and manage Unity Catalog governance including metastores, catalogs, schemas, data sharing, masking, lineage, and access controls
- Build and maintain Infrastructure as Code using Terraform for reproducible platform provisioning and configuration
- Implement CI/CD pipelines for notebooks, libraries, DLT pipelines, and ML assets using GitHub Actions and Databricks APIs
- Standardize experiment tracking and model registry workflows, and deploy model serving endpoints with monitoring and rollback capabilities
- Develop and optimize Delta Lake batch and streaming pipelines using Auto Loader, Structured Streaming, and DLT, enforcing data quality and SLAs (see the ingestion sketch after this list)
- Collaborate with cross-functional teams to integrate platform capabilities and ensure best-in-class developer experience
- Monitor platform performance, troubleshoot issues, and implement improvements to ensure reliability and scalability
- Maintain documentation and automation runbooks for platform operations and governance
- Coordinate with security teams to enforce data governance, encryption, and compliance policies
- Promote best practices for coding, testing, and deployment within the platform engineering team
- Drive continuous improvement in platform automation and operational efficiency
- Engage with stakeholders to gather requirements and provide technical guidance
- Mentor junior engineers and share knowledge of platform technologies
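To illustrate the kind of pipeline work listed above, here is a minimal sketch of an Auto Loader to Delta Lake streaming ingestion job. It assumes a Databricks notebook or job context where `spark` is already provided (Auto Loader is Databricks-specific); the S3 paths and the `main.analytics.events_bronze` table name are illustrative placeholders, not part of this role's actual environment.

```python
# Minimal sketch of an Auto Loader -> Delta Lake streaming ingestion job.
# Assumes a Databricks notebook or job where `spark` is already provided;
# all paths and table names below are placeholders.
query = (
    spark.readStream
    .format("cloudFiles")                                   # Auto Loader source
    .option("cloudFiles.format", "json")                    # format of incoming files
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/events")
    .load("s3://example-bucket/raw/events")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/events")
    .trigger(availableNow=True)                             # incremental, batch-style run
    .toTable("main.analytics.events_bronze")                # Unity Catalog three-level name
)
```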
Requirements
- 3+ years of platform engineering experience, with proven hands-on administration of Databricks on AWS including Unity Catalog governance and enterprise integrations
- Strong foundation in AWS services such as VPC, IAM, KMS, S3, CloudWatch, and network architecture
- Proficiency with Terraform, including the Databricks provider, and experience with Infrastructure as Code for cloud resources
- Advanced Python and SQL skills with experience packaging libraries and managing notebooks and repos
- Experience with MLflow for experiment tracking and the model registry, and familiarity with model serving endpoints (see the tracking sketch after this list)
- Knowledge of Delta Lake, Auto Loader, Structured Streaming, and DLT
- Experience implementing DevOps automation, CI/CD pipelines, and using GitHub Actions or similar tools
- Strong Git and GitHub proficiency including code review and branching strategies
- Familiarity with REST APIs, Databricks CLI, and scripting for automation
- Excellent communication and stakeholder management skills
- Ability to work independently and within a distributed team environment
- Detail-oriented with strong problem-solving and organizational skills
- English proficiency at B2 (Upper-Intermediate) level or higher
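As a concrete reference for the MLflow requirement above, the following is a minimal sketch of experiment tracking and model registration. It assumes the `mlflow` and `scikit-learn` packages and a reachable tracking server; the experiment path and registered model name are illustrative placeholders.

```python
# Minimal sketch of MLflow experiment tracking and model registration.
# Experiment and model names are placeholders; on Databricks, Unity Catalog
# model names would use the three-level catalog.schema.model form.
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("/Shared/demo-experiment")
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the model artifact and register it in the model registry in one call
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="demo_model",
    )
```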
Nice to have
- Experience with AWS EKS and Kubernetes
- Familiarity with MLOps practices and pipeline automation
- Knowledge of attribute-based access control and advanced data governance concepts
- Experience with secrets management and SSO/SCIM provisioning (see the secrets sketch after this list)
- Certification in AWS or Databricks platform engineering
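For the secrets management item above, here is a minimal sketch of scripted secrets administration using the Databricks SDK for Python (`databricks-sdk`). It assumes workspace authentication is already configured (for example via environment variables or a CLI profile); the scope and key names are illustrative placeholders.

```python
# Minimal sketch of scripted secrets management with the Databricks SDK.
# Assumes workspace authentication is configured; names are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create a secret scope and store a credential in it
w.secrets.create_scope(scope="demo-scope")
w.secrets.put_secret(scope="demo-scope", key="api-token", string_value="example-value")

# List the keys in the scope (secret values themselves are never returned)
for secret in w.secrets.list_secrets(scope="demo-scope"):
    print(secret.key)
```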
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library with 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn