Consultant II – Data Engineering Profile
Role and Responsibilities:
Execute and manage large-scale ETL processes to support the development and publishing of reports, data marts, and predictive models.
Build ETL pipelines in Spark, Python, and Hive that process transaction- and account-level data and standardize data fields across various data sources (an illustrative sketch follows this list).
Build and maintain high-performing ETL processes, including data quality checks and testing, aligned across technology, internal reporting, and other functional teams
Create data dictionaries, set up and monitor data validation alerts, and execute periodic jobs such as performance dashboards and predictive model scoring for client deliverables
Define and build technical and data documentation using code version control systems (e.g., Git); ensure data accuracy, integrity, and consistency
Identify opportunities to create, automate, and scale repeatable financial and statistical analyses
Collaborate with Data Engineering teams across regions on the production and maintenance of clients' key data assets.
Establish sound Data Engineering governance and practices to ensure sustainable and scalable processes.
Design, build, and deploy advanced analytics models aimed at improving clients' fraud risk strategies, using Hadoop technology stacks and languages such as Hive, PySpark, Python, Spark, and shell
Design, build, and deploy an anomaly detection tool to identify suspicious transaction behavior across accounts (a minimal scoring sketch follows this list)
Develop a monitoring and alerting mechanism for detected anomalies using shell scripts
Design, build, and deploy self-serve reporting tools across business functions and clients
Design complex algorithms and apply machine learning and statistical methods to large datasets for reporting, predictive, and prescriptive modeling
Develop and implement coding best practices using Hadoop/Hive, Python, and PySpark
Collaborate with offshore and onshore teams and effectively communicate status, issues, and risks daily
Review and propose new standards for naming, describing, managing, modeling, cleansing, enriching, transforming, moving, storing, searching and delivering all data products within the enterprise
Analyze existing and future data requirements, including data volumes, data growth, data types, latency requirements, data quality, the volatility of source systems, and analytic workload requirements
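The following is a minimal, illustrative PySpark sketch of the field-standardization step described above. The source tables (raw.src_card, raw.src_ach), column names, and target data mart are hypothetical placeholders, not actual client assets.

```python
# Illustrative only: standardizing transaction fields from two hypothetical sources.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn_standardize").enableHiveSupport().getOrCreate()

# Read two source tables with differing schemas and map them to common fields.
card = spark.table("raw.src_card").select(
    F.col("card_acct_id").alias("account_id"),
    F.to_date("txn_dt", "yyyy-MM-dd").alias("txn_date"),
    F.col("amt_usd").cast("decimal(18,2)").alias("amount"),
    F.lit("CARD").alias("source"),
)
ach = spark.table("raw.src_ach").select(
    F.col("acct_no").alias("account_id"),
    F.to_date("posting_date", "MM/dd/yyyy").alias("txn_date"),
    F.col("transaction_amount").cast("decimal(18,2)").alias("amount"),
    F.lit("ACH").alias("source"),
)

# Union into one standardized transaction-level dataset and apply basic
# data-quality filters before publishing to a Hive-managed data mart.
txns = (
    card.unionByName(ach)
        .filter(F.col("account_id").isNotNull() & F.col("amount").isNotNull())
        .dropDuplicates(["account_id", "txn_date", "amount", "source"])
)
txns.write.mode("overwrite").saveAsTable("mart.std_transactions")
```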
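Similarly, a simple anomaly-scoring sketch based on a per-account z-score rule, assuming the standardized table from the previous sketch exists. The 4-sigma threshold and table names are illustrative choices, not a prescribed fraud model.

```python
# Illustrative only: flag accounts whose daily spend deviates sharply from their own history.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("txn_anomaly").enableHiveSupport().getOrCreate()

# Aggregate to daily spend per account.
daily = (
    spark.table("mart.std_transactions")
         .groupBy("account_id", "txn_date")
         .agg(F.sum("amount").alias("daily_spend"))
)

# Per-account mean and standard deviation of daily spend.
w = Window.partitionBy("account_id")
scored = (
    daily.withColumn("mu", F.avg("daily_spend").over(w))
         .withColumn("sigma", F.stddev("daily_spend").over(w))
         .withColumn("z", (F.col("daily_spend") - F.col("mu")) / F.col("sigma"))
)

# Flag days more than 4 standard deviations above the account's norm; the
# output table could feed a downstream shell-script alerting job.
anomalies = scored.filter(F.col("sigma") > 0).filter(F.col("z") > 4)
anomalies.write.mode("overwrite").saveAsTable("mart.txn_anomalies")
```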
Candidate Profile:
Preferred Qualifications:
Required – Hadoop/Hive, Python, Spark
2+ years of hands-on experience working with big data platforms such as Cloudera, Hortonworks, or MapR
Python & ML modeling experience is a plus
Experience using the Agile approach to deliver solutions
Experience handling large and complex data in big data environments
Experience designing and developing complex data products and transformation routines
Experience working in the financial services and risk analytics domain is a plus
Strong record of achievement, solid analytical ability, and an entrepreneurial hands-on approach to work
Outstanding written and verbal communication skills
BA/BS/B.Tech. minimum educational requirement, with a minimum of 3-5 years of work experience
Lead experience building data engineering pipelines for large big data, data warehouse, and data lake platforms using Hadoop technologies.
Working knowledge of the Hadoop ecosystem and associated technologies (e.g., Apache Spark, Hive, Python, Presto, Airflow, and Pandas)
Strong problem-solving capabilities, with the ability to quickly propose feasible solutions and effectively communicate strategy and risk mitigation approaches to leadership.
Technical Qualifications:
Strong experience creating large-scale data engineering pipelines and supporting data-driven decision-making and quantitative analysis.
Strong experience with code version control systems such as Git and job automation tools such as Apache Airflow; good knowledge of CI/CD pipelines is desirable (a minimal Airflow DAG sketch follows this section).
Advanced experience writing and optimizing efficient SQL queries with Python, Hive, and Scala, handling large datasets in big data environments (see the Spark SQL sketch after this section).
Experience with complex, high-volume, multi-dimensional data, as well as machine learning models based on unstructured, structured, and streaming datasets.
Experience with SQL for extracting, aggregating, and processing big data using Hadoop, EMR, and NoSQL databases.
Experience creating and supporting production software and systems, with a proven track record of identifying and resolving performance bottlenecks.
Experience with Unix/shell or Python scripting and exposure to scheduling tools such as Oozie and Airflow.
Exposure to stream-processing systems such as Apache Storm and Spark Streaming (a Structured Streaming sketch follows this section).
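As a rough illustration of the SQL-on-big-data work described above, the sketch below runs a date-bounded aggregation over a Hive-managed table from PySpark. The table name, columns, and date range are assumptions carried over from the earlier sketches.

```python
# Illustrative only: an aggregation query over a Hive table via Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("monthly_spend").enableHiveSupport().getOrCreate()

# Restricting the date range in WHERE limits the data scanned before grouping.
monthly = spark.sql("""
    SELECT account_id,
           date_format(txn_date, 'yyyy-MM') AS txn_month,
           SUM(amount) AS total_spend,
           COUNT(*)    AS txn_count
    FROM   mart.std_transactions
    WHERE  txn_date >= '2024-01-01' AND txn_date < '2024-04-01'
    GROUP  BY account_id, date_format(txn_date, 'yyyy-MM')
""")

# Write the result partitioned by month for downstream reporting.
monthly.write.mode("overwrite").partitionBy("txn_month").parquet("/data/marts/monthly_spend")
```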
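A minimal Airflow 2.x DAG sketch for orchestrating the hypothetical jobs above; the schedule, script paths, and spark-submit invocations are assumptions for illustration.

```python
# Illustrative only: chain the hypothetical standardization and anomaly-scoring jobs.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="txn_pipeline_daily",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",   # daily at 02:00
    catchup=False,
) as dag:
    standardize = BashOperator(
        task_id="standardize_transactions",
        bash_command="spark-submit /opt/jobs/txn_standardize.py",  # hypothetical path
    )
    score_anomalies = BashOperator(
        task_id="score_anomalies",
        bash_command="spark-submit /opt/jobs/txn_anomaly.py",  # hypothetical path
    )
    # Run anomaly scoring only after standardization succeeds.
    standardize >> score_anomalies
```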
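For the stream-processing exposure mentioned above, a Structured Streaming sketch (the current Spark streaming API) that consumes hypothetical transaction events from Kafka; broker address, topic, schema, and output paths are assumptions.

```python
# Illustrative only: append Kafka transaction events to a standardized parquet location.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("txn_stream").getOrCreate()

# Hypothetical JSON event schema.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("txn_date", StringType()),
    StructField("amount", StringType()),
    StructField("source", StringType()),
])

raw = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "transactions")
         .load()
)

# Parse the Kafka value payload and standardize field types.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select(
           F.col("e.account_id"),
           F.to_date("e.txn_date", "yyyy-MM-dd").alias("txn_date"),
           F.col("e.amount").cast("decimal(18,2)").alias("amount"),
           F.col("e.source"),
       )
)

# Continuously append micro-batches to parquet with checkpointing for recovery.
query = (
    events.writeStream
          .format("parquet")
          .option("path", "/data/streams/std_transactions")
          .option("checkpointLocation", "/data/streams/_chk/std_transactions")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```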