Lead Site Reliability Engineer

Year    Hyderabad, Telangana, India

Job Description


The Lead Site Reliability Engineer (SRE) is a hands-on role that also mentors other team members on core SRE principles and tools. The Lead SRE will participate in the end-to-end operational aspects of the production environment, working across cloud systems, networks and databases, and will help drive incident lifecycle management. As a member of the SRE team, you will also work closely with the Architects, DevOps, Product and development teams to ensure we get the most out of our software on the AWS platform. This role requires a highly skilled technology professional with excellent communication skills, a strategic mindset, and strong analytical and troubleshooting skills on the AWS Cloud Platform.

Other responsibilities include working with internal business partners to gather requirements, prototyping, architecting, implementing/updating solutions, building and executing test plans, performing quality reviews, managing operations, and triaging and fixing operational issues. Site Reliability Engineers must be able to adapt to constant business change, such as new requirements, evolving goals and strategies, and emerging technologies.

About the Role:

  • Be hands-on and provide mentorship to a growing SRE team on core SRE principles and tools.
  • Foster an automation-first mindset in issue resolution: everything possible should be automated, and people should get involved only when automation can't resolve an issue
  • Lead efforts to update production with new versions/infrastructure as they become available
  • Lead capacity planning efforts in collaboration with Architects and DevOps engineers to determine changes to infrastructure that are needed to support new load and performance characteristics
  • Lead engagement with software developers, DevOps and other infrastructure engineers to integrate software development and delivery from inception to full operation, ensuring robust released software and systems.
  • Ensure the highest level of uptime to meet customer SLAs by implementing system-wide corrections that prevent issues from recurring.
  • Mentor other SRE team members to further develop their soft and hard skills
  • Triage, troubleshoot and resolve issues using the four golden signals (latency, traffic, errors and saturation)
  • Go beyond the golden signals with practices such as chaos engineering and synthetic monitoring to detect failure points, and lead Game Days that test the team's resiliency in incident response and remediation.
  • Lead SRE team members in creating and maintaining recovery procedures and root cause analyses (RCAs) in collaboration with other engineering teams.
  • Ensure incidents assigned to the team are managed within agreed SLAs
  • Ensure alarms are documented in up-to-date Knowledge Base articles.
  • Ensure production infrastructure is kept up to date with server/security patches and certificates.
  • Continuously improve system and application monitoring and automation
  • Identify and automate manual workarounds and process improvements
  • Proactively monitor the availability, latency, scalability and efficiency of all services
  • Perform periodic on-call duty as part of the SRE team
About You:
  • Skilled in cloud operations/administration on Amazon Web Services (AWS).
  • Tax/Accounting domain experience
  • Bachelor's or Master's degree in a Computer Science discipline.
  • 5+ years' experience focused on Site Reliability Engineering or a related position on the AWS Cloud Platform.
  • At least two AWS certifications are required (AWS SysOps Administrator and Architect certifications preferred).
  • Experience working with SQL, Windows Server, Linux and load balancers
  • Deep experience with AWS, Docker, Kubernetes, CloudFormation, CloudWatch, CodeDeploy, DynamoDB, Lambda, SQS, Amazon FSx, Elasticsearch and networking concepts is required.
  • Proficient programming skills in at least one language, such as Java, C#, JavaScript, Python or Ruby.
  • Integration experience with PagerDuty, ServiceNow, Datadog and CloudWatch.
  • Good understanding of Site Reliability Engineering (SRE) philosophies, technologies, platforms and tools, SLO management, incident resolution, and automation
  • Ability to explain technical concepts in clear, non-technical language
  • Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Knowledge of security and compliance standards such as SOC/PCI is a plus
What's in It for You?

Join us to inform the way forward with the latest AI solutions and address real-world challenges in legal, tax, compliance, and news. Backed by our commitment to continuous learning and market-leading benefits, you'll be prepared to grow, lead, and thrive in an AI-enabled future. This includes:
  • Industry-Leading Benefits: We offer comprehensive benefit plans, including flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
  • Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year and a hybrid model, empowering employees to achieve a better work-life balance.
  • Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
  • Culture: We have a globally recognized, award-winning reputation for inclusion, innovation, and customer focus. Our eleven business resource groups nurture our culture of belonging across the diverse backgrounds and experiences represented across our global footprint.
  • Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
  • Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.

Do you want to be part of a team helping reinvent the way knowledge professionals work? How about a team that works every day to create a more transparent, just and inclusive future? At Thomson Reuters, we've been doing just that for almost 160 years. Our industry-leading products and services include highly specialized information-enabled software and tools for legal, tax, accounting and compliance professionals, combined with the world's most global news service, Reuters. We help these professionals do their jobs better, creating more time for them to focus on the things that matter most: advising, advocating, negotiating, governing and informing.

We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments that celebrate diversity and inclusion. At a time when objectivity, accuracy, fairness and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.

Accessibility

As a global business, we rely on diversity of culture and thought to deliver on our goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world regardless of race, color, sex/gender, including pregnancy, gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity/Affirmative Action Employer providing a drug-free workplace. We also make reasonable accommodations for qualified individuals with disabilities and for sincerely held religious beliefs in accordance with applicable law.

Protect yourself from fraudulent job postings. More information about Thomson Reuters can be found on .

Thomson Reuters



Job Detail

  • Job Id
    JD3638138
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Hyderabad, Telangana, India
  • Education
    Not mentioned
  • Experience
    Year