Senior Site Reliability Engineer (Microsoft Azure)
EPAM Systems
Description
We are seeking an experienced Senior Site Reliability Engineer who will focus on maintaining a large Data Platform on Microsoft Azure.
The ideal candidate will possess strong analytical skills and should have a background in managing system reliability, availability, and scalability in a demanding production environment.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Set up and configure monitoring tools such as Datadog
- Monitor system availability, reliability, and capacity
- Track application usage and its impact on production systems
- Scale the environment to meet evolving demands
- Create run books and other system troubleshooting documentation
- Oversee release and deployment of applications/components
- Manage incidents and adhere to change management processes
- Administer and deploy CI/CD and related tooling, including Git, GitLab, Jenkins, and Jira
- Develop infrastructure scripting solutions using PowerShell or Python
- Present architecture to stakeholders using diagrams and other visual aids
- Maintain and enhance the Microsoft Azure platform to ensure optimal performance
Requirements
- Minimum of 3 years of experience as a Site Reliability Engineer
- Proficiency in setting up and configuring monitoring tools
- Expertise in capacity planning, scaling, and system troubleshooting
- Background in release and deployment management
- Familiarity with incident and change management processes
- Competency in administering and deploying CI/CD tools
- Skills in infrastructure scripting with PowerShell or Python
- In-depth knowledge of Microsoft Azure
- Ability to effectively communicate technical concepts and architecture visually
- Excellent interpersonal skills with high emotional intelligence
Nice to have
- Knowledge of Microsoft Azure Data Factory and Databricks
We offer
- Connectivity Bonus (15,000 ARS paid with the payslip at the end of each month as a non-wage item)
- Prepaid health insurance, or medicina prepaga (covers the employee and their immediate family)
- Paternity Leave (two days in addition to the statutory leave, for a total of 4 days)
- Discount card
- English Training (English lessons twice a week)
- Training Program (access to multiple training plans tailored to the needs of each role within the company)
- Marriage Bonus (the company doubles the statutory marriage allowance paid by ANSES)
- Referral Program (a bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacation: 14 calendar days per year
By applying for this role, you agree that your personal data may be used as set out in EPAM's Privacy Notice and Policy.