ROLES & RESPONSIBILITIES
We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. The ideal candidate will have extensive experience in ETL, Data Modelling, and Data Architecture. Proficiency in ETL optimization and in designing, coding, and tuning big data processes using Scala is essential, along with hands-on experience in stream data processing using Spark, Kafka, and Spark Structured Streaming. The candidate should also have extensive experience building data platforms with technologies including Scala, SQL/PLSQL, PostgreSQL, SQL Server, Teradata, Spark, Spark Structured Streaming, Kafka, Parquet/ORC, Data Modelling (Relational, Dimensional, and E-R), ETL, RDS (PostgreSQL, MySQL), Splunk, DataDog, Airflow, Git, CI/CD with Jenkins, JIRA, Confluence, IntelliJ IDEA, Agile (Scrum/Kanban), on-call and operations, code review, the RCP Framework, Querybook, build, deployment, CI/CD and release processes, Backstage, PagerDuty, and Spinnaker.
Key Responsibilities:
Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, and batch and streaming data pipelines.
Experience building data pipelines and applications to stream and process large datasets at low latency.
Develop and maintain batch and stream processing data solutions using Apache Spark, Kafka, and Spark Structured Streaming (see the streaming sketch after this list).
Work on orchestration using Airflow to automate and manage data workflows.
Utilize project management tools like JIRA and Confluence to track progress and collaborate with the team.
Develop data processing workflows utilizing Spark, SQL/PLSQL, and Scala to transform and cleanse raw data into a usable format (see the batch sketch after this list).
Implement data storage solutions leveraging Parquet/ORC formats on platforms such as PostgreSQL, SQL Server, Teradata, and RDS (PostgreSQL, MySQL).
Optimize data storage and retrieval performance through efficient data modelling techniques, including Relational, Dimensional, and E-R modelling.
Maintain data integrity and quality by implementing robust validation and error handling mechanisms within ETL processes.
Automate deployment processes using CI/CD tools like Jenkins and Spinnaker to ensure reliable and consistent releases.
Monitor and troubleshoot data pipelines using monitoring tools like DataDog and Splunk to identify performance bottlenecks and ensure system reliability.
Participate in Agile development methodologies such as Scrum/Kanban, including sprint planning, daily stand-ups, and retrospective meetings.
Conduct code reviews to ensure adherence to coding standards, best practices, and scalability considerations.
Manage and maintain documentation using tools like Confluence to ensure clear and up-to-date documentation of data pipelines, schemas, and processes.
Provide on-call support for production data pipelines, responding to incidents and resolving issues in a timely manner.
Collaborate with cross-functional teams including developers, data scientists, and operations teams to address complex data engineering challenges.
Stay updated on emerging technologies and industry trends to continuously improve data engineering processes and tools.
Contribute to the development of reusable components and frameworks to streamline data engineering tasks across projects.
Utilize version control systems like Git to manage codebase and collaborate effectively with team members.
Leverage IDEs like IntelliJ IDEA for efficient development and debugging of data engineering code.
Adhere to security best practices in handling sensitive data and implementing access controls within the data lake environment.
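As an illustration of the streaming responsibilities above, the following is a minimal sketch of a Kafka-to-Parquet pipeline in Scala using Spark Structured Streaming. The topic name, broker address, payload schema, and S3 paths are hypothetical placeholders, and the job assumes the Spark Kafka connector (spark-sql-kafka-0-10) is on the classpath; it is a sketch, not a prescribed implementation.

```scala
// Minimal sketch: Kafka -> Spark Structured Streaming -> cleanse -> Parquet.
// Topic, schema, broker, and paths are illustrative assumptions, not values from this posting.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object OrdersStreamJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-stream-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical JSON payload schema for the Kafka topic.
    val orderSchema = StructType(Seq(
      StructField("order_id", StringType),
      StructField("amount", DoubleType),
      StructField("event_ts", TimestampType)
    ))

    // 1. Stream raw events from Kafka.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
      .option("subscribe", "orders")                    // assumed topic name
      .load()

    // 2. Parse and cleanse: drop malformed rows and negative amounts.
    val parsed = raw
      .select(from_json($"value".cast("string"), orderSchema).as("o"))
      .select("o.*")
      .filter($"order_id".isNotNull && $"amount" >= 0)

    // 3. Persist as Parquet, with checkpointing so the file sink can track progress.
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "s3://example-bucket/orders/")               // assumed output path
      .option("checkpointLocation", "s3://example-bucket/chk/orders/")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    query.awaitTermination()
  }
}
```

In practice a job like this would typically be scheduled and deployed through the orchestration and CI/CD tooling named above (Airflow, Jenkins, Spinnaker), with DataDog or Splunk monitoring the stream for lag and failures.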
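Likewise, as an illustration of the batch transformation and dimensional-modelling responsibilities above, here is a minimal sketch of a Scala batch job that conforms staged data to a star-schema fact table with Spark SQL and writes it back as partitioned Parquet. The table names, columns, and storage paths are hypothetical placeholders.

```scala
// Minimal sketch: batch ETL loading a star-schema fact table with Spark SQL.
// Table names, columns, and paths are illustrative assumptions, not values from this posting.
import org.apache.spark.sql.SparkSession

object DailySalesBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-sales-batch-sketch")
      .getOrCreate()

    // 1. Register staged data and an existing dimension as temp views (assumed Parquet inputs).
    spark.read.parquet("s3://example-bucket/staging/sales/")
      .createOrReplaceTempView("stg_sales")
    spark.read.parquet("s3://example-bucket/warehouse/dim_product/")
      .createOrReplaceTempView("dim_product")

    // 2. Conform to the fact-table grain and apply basic validation in Spark SQL.
    val factSales = spark.sql(
      """
        |SELECT s.sale_id,
        |       p.product_key,
        |       CAST(s.sale_ts AS DATE) AS sale_date,
        |       s.quantity * s.unit_price AS gross_amount
        |FROM stg_sales s
        |JOIN dim_product p ON s.product_code = p.product_code
        |WHERE s.sale_id IS NOT NULL
      """.stripMargin)

    // 3. Write the fact table as Parquet, partitioned by date for efficient retrieval.
    factSales.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .parquet("s3://example-bucket/warehouse/fact_sales/")

    spark.stop()
  }
}
```

Partitioning the fact table by date is one common choice for efficient retrieval; the appropriate partitioning key depends on the dominant query patterns.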
Good-to-Know Skills:
Programming Languages: Python, Bash/Unix/Linux
Big Data Technologies: Hive, Avro, Apache Iceberg, Delta Format
Containerization and Orchestration: Docker, Kubernetes
CI/CD Tools: GitHub Copilot
Additional Skills: Maven, CLI/SDK
Nice-to-Have Skills:
Networking: Subnets, Routes
Big Data Technologies: Flink
ABOUT THE COMPANY
Infogain is a human-centered digital platform and software engineering company based out of Silicon Valley. We engineer business outcomes for Fortune 500 companies and digital natives in the technology, healthcare, insurance, travel, telecom, and retail & CPG industries using technologies such as cloud, microservices, automation, IoT, and artificial intelligence. We accelerate experience-led transformation in the delivery of digital platforms. Infogain is also a Microsoft (NASDAQ: MSFT) Gold Partner and Azure Expert Managed Services Provider (MSP). Infogain, an Apax Funds portfolio company, has offices in California, Washington, Texas, the UK, the UAE, and Singapore, with delivery centers in Seattle, Houston, Austin, Kraków, Noida, Gurgaon, Mumbai, Pune, and Bengaluru.