Senior Principal Member of Technical Staff (IC5)

IN, India

Job Description

Team Overview: The OCI Cluster Networking team is at the forefront of building ultra-high-performance networking solutions to support advanced AI/ML/HPC workloads. This is your chance to join the AI revolution by designing scalable systems that support thousands of GPUs without compromising on performance.


Role Summary: As a Senior Principal Member of Technical Staff, you'll be part of a dynamic team responsible for designing, developing, and optimizing a software and hardware stack capable of running distributed AI/ML/HPC workloads across thousands of GPUs. You will work with cutting-edge libraries like NCCL, leverage high-performance networking, and build innovative, scalable solutions for our customers.


Who You Are: We're looking for adaptable, self-motivated engineers who can learn quickly. You are a solid developer and distributed systems generalist who can work across the stack, from low-level systems to high-level distributed system interactions. You value simplicity and scalability, and you thrive in a collaborative, agile environment.


Career Level: IC5



Key Responsibilities:

• Design and develop scalable, high-performance software and hardware solutions for distributed AI/ML/HPC workloads.
• Tune the performance of networking libraries (e.g., NCCL) and integrate them with our distributed systems.
• Collaborate with cross-functional teams on new initiatives and deliver innovative solutions to complex networking challenges.

Basic Qualifications:

• 10+ years of software development experience in systems or application-level engineering
• 2+ years of experience with collective communication libraries (e.g., NCCL, RCCL, MPI) and GPU frameworks (e.g., CUDA, ROCm)
• 2+ years of experience with ML training frameworks (e.g., PyTorch, TensorFlow)
• Proficiency in at least two of the following programming languages: Go, Java, C/C++, Python
• Strong knowledge of data structures, algorithms, and operating systems
• Excellent communication skills, both verbal and written
• Bachelor's degree in Computer Science, Engineering, or a related field

Preferred Qualifications:

• Master's degree in Computer Science or a related field
• Experience with RDMA programming, including GPUDirect RDMA
• Experience with distributed workload managers (e.g., Kubernetes)
• Proficiency with Linux performance tools
• Familiarity with SDN, NFV, and cloud networking
• Experience with Infrastructure-as-a-Service platforms (e.g., AWS, Azure, GCP)


Job Detail

  • Job Id: JD3555147
  • Industry: Not mentioned
  • Total Positions: 1
  • Job Type: Contract
  • Salary: Not mentioned
  • Employment Status: Permanent
  • Job Location: IN, India
  • Education: Not mentioned
  • Experience: Year