As a Data Engineer on the Data Governance & Architecture team, you will work hands-on to deliver and maintain the pipelines the client's business functions need to derive value from their data. To that end, you will bring data from a varied landscape of source systems into our cloud-based analytics stack and implement the necessary cleaning and pre-processing steps in close collaboration with the client's business customers. You will also work closely with the Data Governance and Quality & Compliance teams to ensure that all data assets are governed according to the FAIR principles. To keep the engineering team scalable, you and your peers will create reusable components, libraries, and infrastructure that accelerate the delivery of future use cases.
You will be part of a team dedicated to delivering state-of-the-art solutions that enable data analytics use cases for a specific sector of a leading global retail Science & Technology company. As such, you will have a unique opportunity to gain insight into our diverse business functions, allowing you to expand your skills across various technical, scientific, and business domains. Working in a project-based way that spans many data domains and technology stacks, you will be able to significantly develop your skills and experience as a Data Engineer.
Ensure effective project management, project proposal preparation, design, documentation, development, validation, solution architecture, and support activities in line with client needs and architectural requirements.
You will be part of a project to upgrade, build, and optimize a new combined Data Warehouse / Data Lake.
Adherence to the organizational guidelines and processes
Requirements:
Experience in creating productive and robust ETL pipelines for batch as well as streaming ingestion
Proven working expertise with Big Data technologies such as Spark (Scala/PySpark) and SQL
Experience in working with cloud environments such as AWS, GCP, and Azure
Knowledge of database technologies for OLTP and OLAP workloads and a firm grasp of SQL
Agile mindset and a spirit of initiative
Interest in solving challenging technical problems
Experience with test-driven development and CI/CD workflows
Experience in working closely with Data Scientists / Analysts as well as business users
Knowledge of version control software such as Git and experience working with relevant hosting services (e.g. Azure DevOps, GitHub, Bitbucket)
Experience working with heterogeneous compute environments and multi-platform setups
Basic knowledge of Statistics and Machine Learning is a plus
Mandatory Experience:
4-9 years of experience in Big Data tools such as Hadoop, Spark, etc.
Hands-on experience in object-oriented or functional programming languages such as Scala, Java, or Python
Must have experience working with AWS cloud environments
Experience with data pipeline and workflow management tools such as Oozie or Airflow
Experience working with Hive, PostgreSQL, HBase, or related technologies
Knowledge of Messaging and Event Systems such as Kafka
Strong verbal, presentation, and written communication skills; able to explain technical solutions to technical teams
Experience working in a DevOps model
Good interpersonal, problem solving, reasoning and analytical skills
Experience with automation of data quality/governance controls
Ability to work independently
Ability to organize and manage multiple priorities
Experience meeting high production and quality standards in a fast-paced development and production support environment
Ability to research, analyze, document, and present organizational metrics that drive business decisions
Ability to work effectively in a virtual environment where key team members and partners are in various time zones and locations
Mandatory Technical Skills
Big Data: Hadoop, Spark, MapReduce, Hive, Kafka, Spark Structured Streaming
Cloud Data Warehouse: Snowflake
Cloud: Amazon Web Services (AWS), Azure, GCP
Programming Languages: Python, Java, Scala
Development Tools: Eclipse IDE, PyCharm, VS Code
Repository: Git
NoSQL Databases: DynamoDB, MongoDB, HBase
SQL Databases: SQL Server, MySQL, Oracle
Data Analytics: Pandas, NumPy
Visualization: Python, Tableau