Job Summary:
- Design, develop, and implement scalable data pipelines and streaming use cases using PySpark and Spark on a distributed computing platform (a minimal sketch follows this list).
- Possess strong programming skills in Spark Streaming.
- Have familiarity with cloud platforms such as GCP.
- Work with big data technologies such as Hadoop, Hive, and HDFS.
- Perform ETL operations from various data sources, applying data warehousing concepts.
- Optimize PySpark jobs for performance and efficiency.
- Develop and maintain unit tests for data pipelines and streaming use cases (see the test sketch at the end of this posting).
- Troubleshoot and debug Spark applications.
- Collaborate with data scientists and analysts to understand data requirements.
- Document data pipelines and data models clearly and concisely.
- Participate in code reviews and knowledge sharing sessions.
- Stay updated with the latest advancements in PySpark and related technologies.
- Provide production support for the developed use cases.
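
As a flavour of the streaming work described above, here is a minimal PySpark Structured Streaming sketch: it reads JSON events from a Kafka topic, parses them against a schema, and appends them to Parquet. The broker address, topic name, event schema, and sink paths are hypothetical placeholders, not details from this posting.

```python
# Minimal PySpark Structured Streaming sketch (assumes the spark-sql-kafka
# connector is on the Spark classpath). All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, TimestampType,
)

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# Hypothetical schema for JSON messages arriving on the Kafka topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a stream of raw Kafka records (key/value arrive as binary).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                         # placeholder topic
    .load()
)

# Parse the JSON payload and project the typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Append the parsed stream to Parquet; the checkpoint enables fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/tmp/events")                # placeholder sink path
    .option("checkpointLocation", "/tmp/ckpt")    # placeholder checkpoint path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```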
Requirements:
- 3+ years of experience as a Data Engineer.
- Proven experience using PySpark for data processing and streaming use cases.
- Strong understanding of data warehousing, data modeling, and ETL processes.
- Familiarity with big data concepts and distributed computing frameworks such as Hadoop, Spark, and Kafka.
- Experience with SQL and a relational database management system such as MySQL or PostgreSQL.
- Experience with cloud platforms such as AWS, Azure, or GCP is a plus.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.
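
As referenced in the unit-testing responsibility above, here is a minimal sketch of how a pipeline transformation might be unit tested with pytest and a local SparkSession. The add_total function and its columns are hypothetical examples for illustration only.

```python
# Minimal sketch of a unit test for a PySpark transformation, using pytest.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total(df):
    # Hypothetical transformation: derive a total from quantity and unit price.
    return df.withColumn("total", F.col("quantity") * F.col("unit_price"))


@pytest.fixture(scope="module")
def spark():
    # Small local session so the tests run without a cluster.
    session = (
        SparkSession.builder.master("local[1]")
        .appName("pipeline-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_add_total(spark):
    df = spark.createDataFrame([(2, 5.0), (3, 1.5)], ["quantity", "unit_price"])
    result = {r["quantity"]: r["total"] for r in add_total(df).collect()}
    assert result == {2: 10.0, 3: 4.5}
```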