Job Responsibilities
Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into efficient and scalable data solutions.
Design and develop end-to-end data pipelines encompassing data ingestion, transformation, storage, and delivery.
Utilize strong programming skills in Python to write clean, maintainable, and optimized code for data processing tasks.
Leverage expertise in Apache Spark and PySpark to distribute data processing across clusters and handle large datasets efficiently.
Apply a deep understanding of SQL databases (Oracle, SQL Server) and NoSQL/Big Data stores to manage and query data effectively at scale.
Design and implement data pipelines on cloud platforms such as AWS and Azure, or on data platforms such as Snowflake, for scalability and cost-effectiveness, leveraging storage services such as S3, Azure Blob Storage, or Data Lake Storage.
Orchestrate data pipelines using tools like Airflow or similar solutions to ensure smooth data flow, automation, and reliable scheduling.
Build and maintain integrations with RESTful and SOAP web services to facilitate seamless data exchange between systems.
Monitor and troubleshoot data pipelines to ensure data quality, consistency, and timely delivery.
Champion best practices for data engineering and maintain a high standard of code documentation.
Stay up-to-date on the latest advancements in data engineering tools and technologies, including Big Data frameworks and cloud platforms.
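The web-service integration responsibility above can be sketched with a minimal paginated REST pull. This is an illustrative sketch only: the HTTP call is injected as a function so it can be stubbed for testing or backed by `urllib`/`requests` in practice, and the endpoint shape (`records` key, `page` parameter) is a hypothetical example, not any specific API.

```python
from typing import Callable, Dict, Iterator, List


def fetch_all_pages(fetch: Callable[[int], dict]) -> Iterator[dict]:
    """Yield records from a paginated REST endpoint until an empty page is returned.

    `fetch(page)` is injected so the HTTP layer can be stubbed in tests; in
    production it would wrap e.g. urllib.request against the real service.
    """
    page = 1
    while True:
        payload = fetch(page)  # e.g. GET /records?page=N (hypothetical endpoint)
        records = payload.get("records", [])
        if not records:
            return
        yield from records
        page += 1


# Stubbed fetcher standing in for a real HTTP call, returning two pages of data.
_PAGES: Dict[int, List[dict]] = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch(page: int) -> dict:
    return {"records": _PAGES.get(page, [])}


rows = list(fetch_all_pages(fake_fetch))
```

Injecting the fetch function keeps the pagination logic unit-testable without network access, which also helps when monitoring and troubleshooting integrations as described above.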
Must Have
8+ & 14+ years of experience in data engineering with a proven track record of designing and implementing data pipelines.
Strong programming skills in Python, with proficiency in libraries such as pandas and PySpark (Spark's Python API) for data manipulation and analysis.
In-depth knowledge of SQL databases (Oracle, SQL Server) and NoSQL/Big Data databases for data storage and retrieval at scale.
Experience with data pipeline orchestration tools like Airflow or similar solutions.
Experience designing and implementing end-to-end data solutions, from data ingestion to consumption.
Familiarity with cloud platforms (AWS, Azure, Snowflake) for data storage, processing, and services.
Understanding of web service protocols (REST, SOAP) and experience building data integrations.
Excellent problem-solving and analytical skills with a passion for building efficient data infrastructures.
Strong communication and collaboration skills for working with cross-functional teams.
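As an illustration of the pandas proficiency listed above, a minimal aggregation sketch; the dataset and column names are hypothetical, standing in for an ingested extract:

```python
import pandas as pd

# Toy orders dataset standing in for an ingested extract (hypothetical columns).
orders = pd.DataFrame({
    "region": ["east", "east", "west"],
    "amount": [10.0, 20.0, 5.0],
})

# Aggregate revenue per region — the kind of transform step a pipeline stage performs.
revenue = orders.groupby("region", as_index=False)["amount"].sum()
```

The same groupby-aggregate pattern scales to cluster-sized data almost unchanged in PySpark's DataFrame API.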
Minimum Qualifications
Bachelor's or Master's degree in Computer Science, Statistics, Mathematics, or a related technical field (PhD is a plus).
8+ & 14+ years of software industry experience.
Principal Working Relationship
Reports to the AI / Data Science Manager.
Collaborates with data scientists, business analysts, data analysts, and other data engineers.
Nice to Haves
Experience with real-time data processing frameworks (Apache Kafka, Apache Flink).
Experience with data governance and data security best practices.
Experience with data visualization tools (Tableau, Power BI) for data exploration.
Experience with DevOps practices for continuous integration and deployment (CI/CD) of data pipelines.