FinTech Australia


Lead Site Reliability Engineer

EPAM Systems


This job is no longer accepting applications

Software Engineering
Remote
Posted on Apr 11, 2025

Lead Site Reliability Engineer Description

We are looking for an experienced Lead Site Reliability Engineer to join our team and play a key role in ensuring the stability, scalability, and performance of our systems. This position involves improving infrastructure, enhancing automation processes, and maintaining optimal functionality across distributed systems and cloud environments. You will collaborate with diverse teams, drive technical initiatives, and provide mentorship to foster a culture of innovation and operational excellence.



Responsibilities

  • Enhance the performance and reliability of Linux-based systems used for production services and distributed environments
  • Implement advanced monitoring solutions with tools such as Splunk, Grafana, and Prometheus to strengthen system observability
  • Resolve complex Kubernetes-related issues and establish guidelines and best practices for the team
  • Create and maintain automation workflows using Bash and Python to optimize operational efficiency
  • Develop and manage container orchestration platforms like Kubernetes or EKS while sharing knowledge with the team
  • Design robust cloud architecture with AWS to ensure reliability and scalability of infrastructure
  • Champion automation efforts to streamline processes and reduce manual workloads
  • Provide leadership by promoting collaboration, accountability, and effective communication within the team
  • Support continuous learning and development within the team to encourage growth and innovation
  • Offer mentorship and technical expertise to team members to enhance operational practices and communication
  • Plan and execute disaster recovery strategies and capacity management to maintain system resilience
  • Automate deployment processes using tools like Terraform or CloudFormation to improve team productivity
  • Incorporate open-source technologies such as Cassandra, Kafka, Solr, Postgres, and Redis to strengthen SRE practices

Requirements

  • Bachelor’s degree in Computer Science, a related technical field, or equivalent hands-on experience
  • Five or more years of experience as a Site Reliability Engineer
  • At least one year of experience guiding and managing technical teams
  • Proficiency in Bash for scripting and automation tasks to enhance workflows
  • Experience with Grafana for monitoring and system performance visualization
  • Advanced knowledge of Linux systems and their optimization for production environments
  • Familiarity with Microsoft Internet Information Services (IIS) for managing web server frameworks
  • Proficiency in Prometheus for distributed system monitoring and alerting
  • Experience with Python for developing automation solutions and improving operational processes
  • Fluency in English, both written and spoken, at a B2 level or higher

Nice to have

  • Experience with Amazon Web Services (AWS) for designing scalable cloud solutions
  • Knowledge of cloud platforms and their integration into infrastructure design
  • Expertise in Kubernetes for orchestrating and managing containerized applications
  • Familiarity with Splunk for telemetry and log management
  • Hands-on knowledge of Terraform and Terraform Cloud for automating infrastructure deployment
  • Strong skills in troubleshooting and resolving complex system issues

We offer

  • Career plan and real growth opportunities
  • Unlimited access to LinkedIn learning solutions
  • International Mobility Plan within 25 countries
  • Constant training, mentoring, online corporate courses, eLearning and more
  • English classes with a certified teacher
  • Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
  • Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
  • Flexible work schedule and dress code
  • Collaborate in a multicultural environment and share best practices from around the globe
  • Hired directly by EPAM & 100% under payroll
  • Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
  • Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
  • 13% employee savings fund, capped at the legal limit
  • Grocery coupons
  • 30-day December bonus
  • Employee Stock Purchase Plan
  • 12 vacation days plus 4 floating days
  • Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Friday, November 2nd, December 24th & 31st)
  • Monthly non-taxable amount for the electricity and internet bills

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.

FINTECH AUSTRALIA

FinTech Australia exists to help our country become one of the world’s top markets for fintech innovation and investment.

© 2023 FinTech Australia