Responsibilities and Day-to-Day View
- Provide 24x7 Tier 1 first responder support for customer- and agent-facing applications
- Manage escalated issues, incidents and outages, driving triage and prompt mitigation in conjunction with T2
- Promptly communicate the status of escalated issues, incidents and outages to leadership, business partners and other key stakeholders
- Work with Release Management on upcoming production changes to proactively identify and mitigate risks
- Work closely with Product Development and Tier 2 SRE teams to ensure knowledge transfer and adequate monitoring for system changes well before those changes are operationalized
- Apply Site Reliability Engineering best practices and principles, working with T2 to define the application's functional and technical knowledge base, create runbooks, develop observability (alerts, monitoring and dashboards) that enables proactive incident and problem detection, triage incidents, and help conduct blameless post-mortems (After Action Reviews)
- Develop engineering solutions leveraging automation and AI to increase Tier 1 operational efficiency
- Optimize alerts to reduce Tier 1 alerting noise & fatigue