Consultant I – Data Engineering Profile
Role and Responsibilities:
Execute and manage large-scale ETL processes to support the development and publishing of reports.
Maintain the existing pipelines for month-end/BAU refreshes.
Identify and debug issues and implement the required changes.
Maintain proper version control and documentation for the pipelines.
Collaborate with Data Engineering teams across regions on the production and maintenance of the client's key data assets.
Design, build, and deploy advanced analytics models aimed at improving clients' fraud risk strategies, using the Hadoop technology stack and languages such as Hive, PySpark, Python, Spark, and shell
Design, build, and deploy an anomaly detection tool to identify suspicious transaction behavior across accounts (see the first sketch after this list)
Develop a monitoring and alerting mechanism for the detected anomalies using shell scripts (see the second sketch after this list)
Design, build, and deploy self-serve reporting tools across business functions and clients
Design complex algorithms and apply machine learning and statistical methods to large datasets for reporting, predictive, and prescriptive modeling
Develop and implement coding best practices using Hadoop/Hive, Python, and PySpark
Collaborate with offshore and onshore teams and effectively communicate status, issues, and risks daily
Review and propose new standards for naming, describing, managing, modeling, cleansing, enriching, transforming, moving, storing, searching and delivering all data products within the enterprise
Analyze existing and future data requirements, including data volumes, data growth, data types, latency requirements, data quality, the volatility of source systems, and analytic workload requirements
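To illustrate the anomaly detection responsibility above, the following is a minimal PySpark sketch that flags transactions whose amount deviates more than three standard deviations from the account's historical mean. The table name (transactions), column names, and threshold are illustrative assumptions, not the client's actual schema or method.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = (SparkSession.builder
             .appName("txn_anomaly_detection")
             .enableHiveSupport()
             .getOrCreate())

    # Placeholder Hive table and columns; the real schema will differ.
    txns = spark.table("transactions").select("account_id", "txn_id", "txn_amount", "txn_date")

    # Per-account mean and standard deviation of transaction amounts.
    acct = Window.partitionBy("account_id")
    scored = (txns
              .withColumn("amt_mean", F.mean("txn_amount").over(acct))
              .withColumn("amt_std", F.stddev("txn_amount").over(acct))
              .withColumn("z_score",
                          F.when(F.col("amt_std") > 0,
                                 (F.col("txn_amount") - F.col("amt_mean")) / F.col("amt_std"))
                           .otherwise(F.lit(0.0))))

    # Flag transactions more than 3 standard deviations from the account mean.
    anomalies = scored.filter(F.abs(F.col("z_score")) > 3)
    anomalies.write.mode("overwrite").saveAsTable("txn_anomalies")

In practice the features and threshold would be tuned against known fraud cases rather than fixed at three standard deviations.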
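The monitoring and alerting item above calls for shell scripts; to keep all sketches in a single language, here is the same idea in Python, polling the anomaly table produced by the detection sketch and logging an alert when the count crosses a hypothetical threshold. The table name, threshold, and alert channel are assumptions.

    import logging
    from pyspark.sql import SparkSession

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("anomaly_monitor")

    # Hypothetical threshold for flagged transactions per run.
    ALERT_THRESHOLD = 100

    spark = (SparkSession.builder
             .appName("anomaly_monitor")
             .enableHiveSupport()
             .getOrCreate())

    # Count anomalies written by the detection job (placeholder table name).
    anomaly_count = spark.table("txn_anomalies").count()

    if anomaly_count > ALERT_THRESHOLD:
        # In production this would notify an on-call channel (email, chat, etc.).
        log.warning("ALERT: %d anomalous transactions (threshold %d)", anomaly_count, ALERT_THRESHOLD)
    else:
        log.info("OK: %d anomalous transactions", anomaly_count)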
Candidate Profile:
Preferred Qualifications -
Required – Hadoop/Hive, Python, Spark
2+ years hands-on experience working with Big Data Platforms such as Cloudera, Hortonworks, or MapR
Python & ML modeling experience is a plus
Experience using the Agile approach to deliver solutions
Experience handling large and complex data in Big Data environments
Experience with designing and developing complex data products and transformation routines
Experience working in the financial services and risk analytics domain is a plus
Strong record of achievement, solid analytical ability, and an entrepreneurial hands-on approach to work
Outstanding written and verbal communication skills
BA/BS/B.Tech. as the minimum educational qualification, with a minimum of 1-3 years of work experience
Experience maintaining existing pipelines and BI reports
Working knowledge of the Hadoop ecosystem and associated technologies (e.g., Apache Spark, Hive, Python, Presto, Airflow, and Pandas)
Strong problem-solving capabilities, with the ability to quickly propose feasible solutions and effectively communicate strategy and risk-mitigation approaches to leadership
Technical Qualifications:
Strong experience with and exposure to code version control systems such as Git and job automation tools such as Apache Airflow; good knowledge of CI/CD pipelines is desirable (a minimal Airflow DAG sketch follows this list).
Experience writing and optimizing efficient SQL queries with Python, Hive, and Scala to handle large datasets in Big Data environments.
Experience with complex, high-volume, multi-dimensional data, as well as machine learning models based on unstructured, structured, and streaming datasets.
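To make the job automation item above concrete, the following is a minimal Apache Airflow DAG (Airflow 2.x syntax) that runs the detection and monitoring sketches in sequence on a daily schedule via spark-submit. The DAG id, schedule, and script paths are illustrative assumptions, not a prescribed setup; in a real deployment the DAG file itself would live in Git and be promoted through the CI/CD pipeline.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Illustrative DAG: the dag_id, schedule, and script paths are placeholders.
    with DAG(
        dag_id="txn_anomaly_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        detect = BashOperator(
            task_id="detect_anomalies",
            bash_command="spark-submit /opt/jobs/txn_anomaly_detection.py",
        )
        monitor = BashOperator(
            task_id="monitor_anomalies",
            bash_command="spark-submit /opt/jobs/anomaly_monitor.py",
        )
        # Run monitoring only after detection succeeds.
        detect >> monitor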