Generative AI Operations
EPAM Systems
Software Engineering, Operations, Data Science
Ukraine
Posted on Jan 16, 2026
Responsibilities
- Build and Manage CI/CD Pipelines: Design, implement, and maintain robust, automated CI/CD pipelines for training, evaluating, and deploying large language models (LLMs) and AI agents
- Orchestrate Agentic AI Workflows: Design, deploy, and manage sophisticated, multi-agent systems. Ensure seamless Agent-to-Agent (A2A) communication and collaboration between specialized agents to automate complex business processes
- Manage Tool Integration: Implement and manage secure, scalable integrations between AI agents and external tools/APIs, leveraging open standards like the Model Context Protocol (MCP) to ensure interoperability
- Leverage AI-Powered Development: Utilize AI-powered development tools to accelerate the entire software development lifecycle, from writing infrastructure code and tests to troubleshooting operational issues in cloud environments
- Infrastructure as Code (IaC): Utilize cloud-native IaC services or cloud-agnostic tools like Terraform to define and manage the infrastructure required for GenAI workloads
- Model Monitoring and Observability: Implement comprehensive monitoring and logging solutions to track model and agent performance, resource utilization, and system health. For agentic systems, this includes tracing the agent's actions and logging the multi-step conversational flow
- Scalability and Performance Optimization: Design and implement scalable architectures for model serving and inference. Continuously optimize the performance and cost-effectiveness of our GenAI services
- Security and Compliance: Implement and enforce security best practices for our GenAI infrastructure and data. Ensure compliance with industry standards and regulations
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
- 3+ years of experience in a DevOps, SRE, or MLOps role with a focus on cloud infrastructure
- Proven experience with cloud services from major providers like AWS, Google Cloud, or Azure
- Strong experience building and managing CI/CD pipelines using tools like Jenkins, GitLab CI, or cloud-native services
- Proficiency in at least one scripting language (e.g., Python, Bash)
- Hands-on experience with Infrastructure as Code (IaC) tools such as AWS CDK, CloudFormation, or Terraform
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Fluent English communication skills at a B2+ level
Nice to have
- Master's degree or PhD in Computer Science, AI, Machine Learning, or a related field
- Experience with cloud-native GenAI services like Amazon Bedrock, Azure AI Foundry, or Google Vertex AI
- Familiarity with the architecture and operational challenges of Large Language Models (LLMs)
- Experience designing or managing multi-agent systems or complex, orchestrated workflows
- Knowledge of monitoring and observability tools like Prometheus, Grafana, or Datadog
- Relevant cloud or DevOps certifications
- Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment
Benefits
With us you can:
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travel, Media, Artificial Intelligence, and more)
- Explore relocation opportunities, which may be available for eligible candidates depending on the role and openings at other EPAM locations
- Participate in volunteer and charity programs and in communities (both technical and interest-based)
We focus on your professional growth:
- Plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Get the opportunity to complete free training and certification in AWS, GCP, or Azure
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
- Study at EPAM Solution Architecture School with instructors who are practicing architects
- Develop as a leader by joining the Delivery Management, Resource Management, or Leadership Essentials schools, and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
What we offer:
- Vacation and sick leave (including sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, and thematic training
- E-kids program: a free programming-language training program for EPAMers' children