Site Reliability Engineer
EPAM Systems
Site Reliability Engineer Description
We are looking for a dedicated Site Reliability Engineer to enhance our infrastructure reliability and automation processes.
The ideal candidate is proficient in resolving platform issues, skilled in development and deployment automation tasks, and capable of in-depth troubleshooting. This role involves contributing to sprint planning, story grooming, and engaging in technical discussions to improve our application and deployment methods.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Technologies
- Node.js/NestJS
- React Native
- Python/Django
- PostgreSQL, Redis
- CircleCI, Spinnaker, Expo
- AWS
- Datadog
Responsibilities
- Investigate and troubleshoot issues across the platform
- Analyze, develop, and enhance deployment automation independently
- Write scripts to automate tasks
- Actively participate in sprint meetings and contribute to technical discussions
- Monitor an APM tool such as Datadog in production and communicate important insights to the team
- Manage application log collection and analysis
- Handle application and instance alerts related to site reliability
- Discuss infrastructure architecture and contribute during technical discussions
- Maintain the applications and libraries the platform requires
- Oversee application code deployment servers and methods
- Mentor and assist other engineers
- Perform code reviews
Requirements
- 2+ years of experience running production application workloads in AWS Cloud
- Understanding of public Cloud networks, VPC peering
- Skills in Cloud computing including EC2, SNS/SQS, and RDS
- Proficiency in containers and orchestration technologies such as Docker, Kubernetes, EKS
- Background in administering technologies at scale, including Elasticsearch, PostgreSQL, Redis
- Proficiency in provisioning and configuration management using Terraform, Ansible
- Competency in Linux or Windows server administration
- Knowledge of scripting languages including Python, Groovy, PowerShell, or Ruby
- Flexibility to integrate monitoring, logging, and alerting seamlessly into the development process
- Capability to debug complicated issues in collaboration with peers
- Ability to adapt quickly to changing requirements and priorities
- Fluent English communication skills at a B2+ level
Nice to have
- Expertise in using monitoring tools like Datadog
- Experience in upholding compliance with HIPAA and other standards
We offer
- Connectivity Bonus (15,000 ARS paid with the salary receipt at the end of each month as a non-wage item)
- Private Health Insurance (Covers the employee and their immediate family)
- Paternity Leave (Two days are added to what is established by law, for a total of 4 days)
- Discount card
- English Training (English lessons, twice per week)
- Training Program (Access to multiple customized training plans according to the needs of each role within the company)
- Marriage bonus (The company doubles the allowance established by law that ANSES offers)
- Referral Program (A referral bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacations: 14 calendar days a year
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.