Senior Site Reliability Engineer
EPAM Systems
Description
We are looking for a committed Senior Site Reliability Engineer to improve the reliability and automation of our infrastructure.
The ideal candidate excels at resolving platform issues, is adept at automating development and deployment tasks, and has strong troubleshooting skills. Responsibilities include participating in sprint planning, story grooming, and technical discussions aimed at improving our application and deployment methods.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Technologies
- Node.js/NestJS
- React Native
- Python/Django
- PostgreSQL, Redis
- CircleCI, Spinnaker, Expo
- AWS
- Datadog
Responsibilities
- Investigate and address issues across our platform
- Independently develop, analyze, and improve deployment automation
- Craft scripts that automate various tasks
- Participate actively in sprint meetings and technical discussions
- Monitor a production-grade APM, such as Datadog, and relay critical insights to the team
- Oversee the collection and analysis of application logs
- Respond to application and instance alerts related to site reliability
- Engage in infrastructure architecture discussions during technical meetings
- Maintain essential applications and libraries for the platform
- Manage servers and methods for application code deployment
- Guide and support other engineers
- Conduct code reviews
Requirements
- 3+ years of experience managing production application workloads in AWS
- Understanding of public cloud networking and VPC peering
- Hands-on skills with AWS services, including EC2, SNS/SQS, and RDS
- Proficiency with container and orchestration technologies such as Docker, Kubernetes, and EKS
- Background in managing technologies such as Elasticsearch, PostgreSQL, and Redis at scale
- Proficiency in provisioning and configuration management using Terraform and Ansible
- Competency in Linux or Windows Server administration
- Knowledge of scripting languages such as Python, Groovy, PowerShell, or Ruby
- Ability to integrate monitoring, logging, and alerting into development processes
- Ability to troubleshoot complex issues in collaboration with peers
- Ability to adapt swiftly to changing requirements and priorities
- Fluent English communication skills at a B2+ level
Nice to have
- Experience with monitoring tools like Datadog
- Background in maintaining compliance with HIPAA and other regulatory standards
We offer
- Connectivity Bonus (15,000 ARS paid with the salary receipt at the end of each month as a non-wage item)
- Private Health Insurance, "Medicina Prepaga" (Covers the employee and immediate family)
- Paternity Leave (Two days in addition to those established by law, for a total of 4 days)
- Discount card
- English Training (English lessons, twice per week)
- Training Program (Access to multiple customized training plans according to the needs of each role within the company)
- Marriage Bonus (The company doubles the statutory allowance provided by ANSES)
- Referral Program (A referral bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacation: 14 calendar days a year
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.