Senior Site Reliability Engineer

Year    MH, IN, India

Job Description

We're seeking a highly skilled Site Reliability Engineer (SRE) to join our growing software organization. As an SRE, you'll play a pivotal role in ensuring the reliability, scalability, and performance of our complex cloud-based applications. Your Kubernetes expertise and passion for automation will help us build and maintain resilient systems on Google Cloud Platform (GCP) or other cloud environments.

What you'll do

• Kubernetes: Design, deploy, and manage Kubernetes clusters in production, optimizing for performance and reliability.
• Cloud Infrastructure: Build and maintain scalable infrastructure on GCP (or other cloud providers), leveraging automation tools like Terraform.
• Performance Engineering:
  + Identify and analyze performance bottlenecks in applications and infrastructure.
  + Develop and implement performance optimizations.
• Observability: Implement comprehensive monitoring and logging solutions to proactively detect and resolve issues.
• Incident Response: Participate in on-call rotations, troubleshooting and resolving production incidents with a focus on minimizing downtime.
• Collaboration: Work closely with product development teams to promote reliability best practices and ensure smooth deployments.
• Manage system(s) uptime across cloud-native (AWS, GCP) and hybrid architectures.
• Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLIs, or programming with cloud SDKs).
• Build CI/CD pipelines for build, test and deployment of application and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.
• Build automated tooling that deploys service requests and pushes changes into production.
• Troubleshoot and triage problems across a complex distributed architecture service map.
• Build comprehensive, detailed runbooks to detect, remediate, and restore services.
• Lead blameless availability postmortems and own the action items that prevent recurrences.
• Serve on call for high-severity application incidents and improve runbooks to reduce MTTR.
• Participate in a team of first responders in a 24/7, follow-the-sun operating model for incident and problem management.
• Effectively communicate to technical peers and team members in both written and verbal formats.

What experience you need

• BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience.
• 5+ years of experience working with containers (Docker, Kubernetes).
• 5+ years of experience working with public cloud environments (GCP preferred).
• Strong system administration skills, including automation and orchestration on Linux.
• Strong Kubernetes knowledge and hands-on production administration skills.
• Programming experience in one or more languages such as Python, Bash, Java, Go, Groovy or similar languages.
• Proficiency in identifying and analyzing performance bottlenecks in applications and infrastructure.
• Proficiency with continuous integration and continuous delivery (CI/CD) using tools like Jenkins and Git.
• 2+ years of experience monitoring infrastructure and application performance.
• Solid understanding of application design principles and trade-offs.
• Knowledge of network infrastructure and security basics (DNS, subnets, firewalls, load balancers).

What could set you apart

• Experience with GCP/GKE, Composer.
• Certifications in Kubernetes (CKA, CKAD) or cloud certification.
• You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
• You take a systems approach to problem solving, coupled with strong communication skills and a sense of ownership and drive.
• You have experience managing infrastructure as code with tools such as Terraform or CloudFormation.
• You are passionate about automation and want to eliminate toil wherever possible.
• You've built software or maintained systems in a highly secure, regulated, or compliance-driven industry.
• You thrive in a DevOps culture, have experience working in one, and enjoy working as part of a team.

Job Detail

  • Job Id
    JD3390085
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Contract
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    MH, IN, India
  • Education
    Not mentioned
  • Experience
    Year