New Balance's Direct-to-Consumer Engineering team is responsible for creating and maintaining its branded eCommerce websites and for providing customer service for them. We seek talented individuals who fit into our team-oriented atmosphere, and we are proud to have an environment that offers the comfort of a true work/life balance.
The Principal Site Reliability Engineer will play a lead role in the production environment by monitoring availability and taking a holistic view of system health. They will build software and systems to manage platform infrastructure and applications; improve reliability, quality, and time-to-market of our suite of software solutions; and measure and optimize system performance - all with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
Responsibilities
Ensure availability, latency, performance, and efficiency of our global eCommerce sites.
Drive change management and incident management.
Promote best practices and innovative observability to guide product delivery teams in achieving operational excellence for new product deliveries.
Drive operational excellence and evangelize best practices in observability.
Develop unified observability dashboards and implement E2E observability requirements.
Design innovative observability solutions for internal and external stakeholders.
Contribute to observability instrumentation standards and create repeatable patterns for engineering teams.
Define and implement E2E observability requirements and lead teams to support E2E best practices.
Collaborate with cross-functional teams to achieve objectives and drive high reliability into systems.
Build proprietary tools to mitigate weaknesses in incident management or software delivery.
Implement SRE best practices to increase system reliability and performance.
Automate processes for improved collaborative response and prepare teams for incidents.
Maintain error budgets, meet SLOs, and support uptime and availability of critical platform components.
Automate technology stacks to improve operating costs while responding to traffic spikes.
Location: Pune - NBIT Office. In-person attendance is mandatory on Tuesday, Wednesday, and Thursday each week.
Work timings: EST hours for the first 3 months to support onboarding and ramp-up, then 8 hours in IST with a possible 1-hour overlap in the evening with the US team in EST (10am to 7pm).
Required Skills and Experience:
Bachelor's Degree in Computer Science, Information Science, Engineering, or a related field.
10+ years of experience in code management, deployment processes, procedures, and tools in a DevOps or SRE role.
Experience with monitoring tools (preferred: Dynatrace, Splunk, Datadog, Grafana, and New Relic).
Proficiency in state-of-the-art observability trends, tools, products, and technologies.
Ability to identify organization-wide gaps in the SRE practice and implement solutions that contribute to organizational transformation.
Experience driving cross-organization adoption of new technologies or initiatives.
Ability to influence senior management in selecting the right strategy, processes, and structures to transform the organization into a modern SRE team.
Proactive in identifying performance bottlenecks and anomalous system behavior, and in addressing root causes of service issues.
Passionate about technology with a strong sense of curiosity and a desire to improve processes, automate everything, and continuously learn.
Successful experience supporting a cloud production environment (strong preference for Azure).
Competency in one or more programming languages for automation (Python strongly preferred).
Knowledge of cloud deployment tools and methodologies (ideally Ansible, but Terraform, Azure DevOps, etc. are also considered).
Deep understanding of Kubernetes and Docker architecture and associated tools.
Experience with at least one configuration management solution (e.g., Chef, Ansible, AWS CodeDeploy).
Proficiency with repository and pipeline-related tools (e.g., GitLab, Jenkins, Bamboo, Travis, CircleCI).
Experience with implementing and using various application and infrastructure monitoring tools.
Strong troubleshooting skills.
Ability to take ownership and deliver solutions autonomously.
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.
Job Detail
Job Id: JD3653830
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: MH, IN, India
Education: Not mentioned
Experience: Year