Performance Testing & 3 others
EPAM Systems
New York, NY, USA · Remote
Posted on May 4, 2026
Responsibilities
- Design and manage chaos engineering tests using Azure Chaos Studio, analyzing platform architecture to identify failure domains and strengthen system resilience
- Maintain and enhance existing LitmusChaos test suites across Kubernetes environments, ensuring consistent coverage and accuracy across all platforms
- Build comprehensive test suites by integrating LitmusSDK, Azure Management SDK, Chaos SDK and Kubernetes SDK to automate and scale chaos experiments
- Lead HA/DR testing initiatives across all environments, operating independently to validate high availability and disaster recovery readiness
- Establish and standardize chaos engineering frameworks across AKS and EKS platforms to enable scalable, repeatable resilience practices organization-wide
- Integrate AI-driven capabilities into the chaos engineering pipeline to enable touchless experiment creation, automated execution and continuous validation
Requirements
- Hands-on experience with Kubernetes orchestration platforms including AKS or EKS, with deep understanding of container-based infrastructure and cloud-native architecture
- Proficiency in chaos engineering tools including LitmusChaos and Azure Chaos Studio, with demonstrated experience building and maintaining structured test suites
- Experience with Istio service mesh for traffic management, observability and resilience configuration within microservices environments
- Practical experience with LitmusSDK, Azure Management SDK, Chaos SDK and Kubernetes SDK
- Proven ability to conduct HA/DR testing and work autonomously with minimal oversight across complex multi-environment cloud platforms
We offer/Benefits
- Health benefits: High Deductible Health Plan with an attached Health Savings Account (HSA), including pharmacy coverage, available 60 days after the start of employment
- Condition Management resources
- Family Planning resources
- Dental Plan
- Vision Plan