Generative AI Operations & 4 others
EPAM Systems
Software Engineering, Operations, Data Science
Portugal · Remote
Posted on Nov 19, 2025
Responsibilities
- Design scalable AI and ML workloads that align with company objectives
- Develop and sustain reproducible machine learning pipelines
- Deploy AI models into production utilizing model serving infrastructures
- Implement monitoring and logging frameworks for AI service observability
- Define infrastructure needs for MLOps pipelines and related components
- Collaborate with infrastructure engineers to facilitate infrastructure deployment
- Mentor and guide team members to encourage best practices and ongoing improvement
- Coordinate activities with cross-functional teams including data scientists and engineers
- Optimize ML workloads to enhance performance and scalability
- Ensure adherence to security protocols and data privacy regulations
- Assess new tools and technologies to improve AI service delivery
- Document system designs and workflows to support knowledge sharing
- Troubleshoot and resolve AI service production issues
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or related discipline
- Minimum 7 years’ experience in AI, machine learning, data engineering, software development, or cloud infrastructure
- Strong expertise in Python and familiarity with AI/ML frameworks such as PyTorch, TensorFlow, Hugging Face, or scikit-learn
- Experience with model inference runtimes like vLLM, MLServe, or TorchServe
- Proficiency with containerization and orchestration tools including Docker and Kubernetes
- Experience specifying and implementing infrastructure requirements for ML pipelines
- Strong analytical and problem-solving skills with the ability to operate within agile, cross-functional teams
- Effective communication and mentoring abilities to support team growth
- English language proficiency at B2 level or higher
Nice to have
- Experience working with cloud platforms such as Azure, AWS, or Google Cloud
- Understanding of Infrastructure as Code (IaC) methodologies
- Familiarity with experiment tracking tools and pipeline orchestration systems
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn