SRE Manager Distributed Systems

India

Job Description


Company Overview

Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world's most sophisticated financial institutions. We constantly innovate our platform and capabilities to meet tomorrow's challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.

Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation. Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities. We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.

We are looking for an experienced Engineering Manager to lead our Site Reliability Engineering (SRE) team. The ideal candidate will have a strong background in SRE principles and practices, as well as experience managing and mentoring engineers. The SRE Manager will be responsible for the overall success of the SRE team, including ensuring that our systems are reliable, scalable, and secure. The team is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing BAU. The team also builds tools and infrastructure that all development teams use for monitoring and troubleshooting.

As a Site Reliability Engineering Manager at Arcesium, you are expected to:

  • Manage a team of SRE engineers / SRE Leads
  • Own end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
  • Work closely with engineering managers and development teams to ensure that platforms are designed with scale and operability in mind
  • Help manage the team's infrastructure, e.g. container infrastructure using Docker and Kubernetes clusters, Kafka clusters, etc.
  • Manage the team's AWS accounts and other infrastructure provisioning
  • Provide day-to-day dashboard support, including responding to outages and triaging cases escalated by clients and internal teams
  • Manage on-call rotations to provide 24-hour coverage
  • Ensure systems are always DR ready
  • Manage team projects using Agile methodologies (Scrum/Kanban)
  • Review processes periodically and drive continual improvement
  • Mentor SREs with incident case-studies and technical workshops
  • Mentor and coach engineers to be curious and effective at discovering and solving technical challenges
What you'll need:
  • 10+ years of experience in DevOps/Site Reliability/Automation, with 4+ years of people/team management experience
  • Experience with a variety of tools that help manage, understand, and debug large, complex distributed systems
  • Good knowledge of Unix systems, web technologies, databases, networking, and public cloud platforms such as AWS
  • Reliability: Exposure to Chaos Engineering and other reliability practices, including disaster recovery, is good to have
  • IT Service Management: Incident Management, Problem Management, Change Management
  • Languages: Any of Python/Java/Node.js/Ruby
  • Linux: System Administration + Shell Scripting
  • Cloud Computing: Amazon Web Services
  • Microservices & Containerization: Docker, Kubernetes
  • Version Control: Git, GitHub, GitLab, etc.
  • Configuration Management: Ansible/Chef/Puppet
  • Agile: Scrum, Kanban
Arcesium and its affiliates do not discriminate in employment matters on the basis of race, color, religion, gender, gender identity, pregnancy, national origin, age, military service eligibility, veteran status, sexual orientation, marital status, disability, or any other category protected by law. Note that for us, this is more than just a legal boilerplate. We are genuinely committed to these principles, which form an important part of our corporate culture, and are eager to hear from extraordinarily well qualified individuals having a wide range of backgrounds and personal characteristics.

Arcesium


Job Detail

  • Job Id: JD3443200
  • Industry: Not mentioned
  • Total Positions: 1
  • Job Type: Full Time
  • Salary: Not mentioned
  • Employment Status: Permanent
  • Job Location: India, India
  • Education: Not mentioned
  • Experience: Year