Strong knowledge of Linux/Unix systems and command line tools.
Proficiency in scripting languages such as Python, Shell, or Perl.
Experience with configuration management tools like Ansible, Puppet, or Chef.
Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
Understanding of networking principles and protocols (TCP/IP, HTTP, DNS, etc.).
Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools.
Expertise in monitoring and logging tools such as Prometheus, Grafana, the ELK stack, or Splunk (optional, but good to know).
Experience with Citrix technologies such as XenApp, XenDesktop, and NetScaler.
Support the administration and engineering of the Citrix environment.
Work with Citrix Provisioning Server, SQL Database, and Citrix License Server.
Solid knowledge of virtualization technologies such as VMware or Hyper-V.
Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
Excellent communication and collaboration skills to work effectively with cross-functional teams.
Strong attention to detail and ability to work in a fast-paced, dynamic environment.
Basic Terraform syntax and GitLab CI/CD configuration (pipelines and jobs).
Cloud resource provisioning and configuration through CLI/API.
Ability to run basic queries in logging tools to answer general questions.
Linux operating system configuration, package management, startup, and troubleshooting.
Block and object storage configuration.
Networking: VPCs, proxies, and CDNs.
Secondary skills required for the role
Bachelor's degree in computer science, engineering, or a related field.
Proven experience as a Site Reliability Engineer or a similar role.
Solid understanding of software development methodologies and DevOps principles.
Experience with agile and iterative development processes.
Certification in relevant technologies or frameworks is a plus (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Familiarity with continuous integration/continuous deployment (CI/CD) pipelines.
Experience with source control systems such as Git or SVN.
Knowledge of security best practices and experience implementing security measures in a production environment.
Ability to work independently and handle multiple projects and priorities simultaneously.
Strong analytical and problem-solving skills, with a focus on continuous improvement and automation.
Role & Responsibilities of the Profile
Design and implement highly available and scalable systems, ensuring the reliability and performance of the company's website or application.
Collaborate with cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) for critical systems.
Monitor systems and applications, proactively identifying and resolving any performance bottlenecks or availability issues.
Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance.
Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents.
Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
Create and maintain documentation for system architecture, configuration, and troubleshooting procedures.
Perform capacity planning and resource allocation to ensure optimal system performance and scalability.
Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability and performance standards.
Stay up to date with industry best practices, new technologies, and emerging trends in site reliability engineering.
Objectives of this role
Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to manage platform infrastructure and applications.
Improve reliability, quality, and time-to-market of our suite of software solutions.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Provide primary operational support and engineering for multiple large-scale distributed software applications.