SRE II Observability & Reliability

KA, IN, India

Job Description

Job Summary



We are seeking a Senior Software Engineer to join our Site Reliability Engineering (SRE) team, with a focus on Observability and Reliability. As a key member of the team, you will play a critical role in ensuring the performance, stability, and availability of our applications and systems, with particular attention to Application Performance Management, observability, and reliability of the platform.
The Senior Software Engineer will be responsible for the design, implementation, and maintenance of our observability and reliability infrastructure, with a primary focus on the ELK stack (Elasticsearch, Logstash, and Kibana). The role involves configuring, fine-tuning, and automating alerts, integrating Elastic solutions with other tools and applications, generating reports, and optimizing the observability and monitoring systems.

Key Duties & Responsibilities

1. Collaborate with cross-functional teams to define and implement observability and reliability standards and best practices.
2. Design, deploy, and maintain the ELK stack for log aggregation, monitoring, and analysis.
3. Develop and maintain alerts and monitoring systems, ensuring early detection of issues and rapid incident response.
4. Create, customize, and maintain dashboards in Kibana for different stakeholders.
5. Collaborate with software development teams to identify performance bottlenecks and recommend solutions.
6. Automate manual tasks and workflows to streamline observability and reliability processes.
7. Conduct regular system and application performance analysis and optimization, covering automation and tooling, capacity planning, security practices and compliance adherence, documentation and knowledge sharing, and disaster recovery and backup.
8. Generate and deliver detailed reports on system performance and reliability metrics.
9. Stay up to date with industry trends and best practices in observability and reliability engineering.

Qualifications/Skills/Abilities
Minimum Requirements



Formal Education
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).



Experience (type & duration)
5+ years of experience in Site Reliability Engineering, Observability and Reliability, or DevOps.



Skills
• Proficiency in configuring and maintaining the ELK stack (Elasticsearch, Logstash, Kibana) is mandatory.
• Strong scripting and automation skills, with expertise in Python, Bash, or similar languages.
• Experience designing data structures with Elasticsearch indices.
• Experience writing data ingestion pipelines with Logstash.
• Experience with infrastructure as code (IaC) and configuration management tools (e.g., Ansible, Terraform).
• Hands-on experience with cloud platforms (AWS preferred) and containerization technologies (e.g., Docker, Kubernetes).
• Telecom domain expertise is good to have but not mandatory.
• Strong problem-solving skills and the ability to troubleshoot complex issues in a production environment.
• Excellent communication and collaboration skills.



Accreditation/certifications/licenses
Relevant certifications (e.g., Elastic Certified Engineer) are a plus.


Job Detail

  • Job Id
    JD3577850
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Year