Lead DevOps Engineer
EPAM Systems
Lead DevOps Engineer Description
Join our team as a Lead DevOps Engineer and play a vital role in incident and request management using tools like Dynatrace, Grafana, and Splunk. Take charge of monitoring setup, tool administration, and resolving medium-complexity tickets. If optimizing operational processes excites you, we encourage you to apply.
Responsibilities
- Create and maintain documentation outlining best practices for logging and monitoring
- Perform routine audits to ensure logging and monitoring practices align with compliance standards
- Take part in cross-functional meetings focused on logging and monitoring strategies
- Handle monitoring, alerting, operability, and observability tasks using Dynatrace, Splunk, and Grafana
- Triage tickets to assess urgency and update details accordingly
- Analyze and escalate tickets beyond Level 2 troubleshooting after reviewing documentation
- Provide clear and actionable notes for tickets that require escalation
- Use and develop documentation that addresses standard incidents and service requests
- Define standard completion times for tickets and set service-level objectives
- Review and present metrics on escalated tickets to improve support processes
- Address incidents and service requests for monitoring setup through JIRA
- Remain available for monitoring duties and escalations during off-hours and weekends
- Take responsibility for pager duty during emergency situations after work hours
- Regularly analyze and present metrics to refine and advance support processes
Requirements
- Bachelor’s degree in computer science or an equivalent field
- 5+ years of experience in DevOps or within Site Reliability Engineering teams
- Knowledge of observability, including monitoring, logging, and tracing
- Expertise in tools such as Dynatrace, Splunk, and Grafana
- Familiarity with Azure logging and monitoring tools, including Log Analytics and Azure Monitor
- Background in managing high-availability and fault-tolerant software in production environments
- Proficiency in English at a B2+ level
We offer
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Benefits required by law (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th and 31st)
- Relocation bonus: transportation, 2 weeks of accommodation for you and your family, and more
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.