1. Data Acquisition
- Manage the existing data pipelines built for data ingestion.
- Create and manage new data pipelines for ingesting new data, following best practices.
- Continuously monitor data ingestion through Change Data Capture (CDC) for incremental loads (a minimal sketch follows this list).
- Analyze and fix any failed scheduled batch jobs so that no data is missed.
- Maintain and continuously update the technical documentation for ingested data, and maintain a centralized data dictionary with the necessary data classifications.
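As an illustration of the kind of CDC-driven incremental load involved, here is a minimal PySpark sketch. The table names (staging.orders_cdc, warehouse.orders) and the updated_at watermark column are hypothetical, not part of this role's actual environment.

    # Minimal PySpark sketch of an incremental (CDC-style) load.
    # Table names and the updated_at watermark column are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder.appName("cdc-incremental-load")
             .enableHiveSupport().getOrCreate())

    # Find the high-water mark already loaded into the warehouse table.
    last_loaded = (spark.table("warehouse.orders")
                   .agg(F.max("updated_at").alias("wm"))
                   .collect()[0]["wm"])

    # Pull only the rows changed since the last load from the CDC staging table.
    changes = spark.table("staging.orders_cdc")
    if last_loaded is not None:
        changes = changes.filter(F.col("updated_at") > F.lit(last_loaded))

    # Append the incremental batch; dedup/merge logic would follow in a real pipeline.
    changes.write.mode("append").saveAsTable("warehouse.orders")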
2. Data Extraction and Cleaning
- Extract data from source systems, clean it, and ingest it into the big data platform.
- Define automated data-cleaning routines before ingestion.
- Clean data to handle missing values, remove outliers, and resolve inconsistencies (see the sketch after this list).
- Perform data quality checks covering accuracy, completeness, consistency, timeliness, believability, and interpretability.
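A minimal PySpark sketch of the three cleaning steps named above; the input path, column names (id, amount, country), and percentile thresholds are hypothetical.

    # Minimal PySpark sketch of missing-value, outlier, and consistency handling.
    # Paths, column names, and thresholds are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("data-cleaning").getOrCreate()
    df = spark.read.parquet("/data/raw/transactions")  # hypothetical path

    # Missing data: drop rows missing key fields, default the rest.
    df = df.dropna(subset=["id", "amount"]).fillna({"country": "UNKNOWN"})

    # Outliers: keep amounts within approximate 1st-99th percentile bounds.
    low, high = df.approxQuantile("amount", [0.01, 0.99], 0.001)
    df = df.filter((F.col("amount") >= low) & (F.col("amount") <= high))

    # Inconsistencies: normalize a categorical column to one representation.
    df = df.withColumn("country", F.upper(F.trim(F.col("country"))))

    df.write.mode("overwrite").parquet("/data/clean/transactions")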
3. Data Integration, Aggregation and Representation
- Expose data views or data models to reporting and source systems using Hive, Impala, or similar tools (a minimal example follows this list).
- Expose cleansed data to the Artificial Intelligence team for building data science models.
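A minimal sketch of exposing a reporting view over cleansed data via Spark SQL against the Hive metastore; the database, table, and column names are hypothetical. A view defined this way is also queryable from Impala or other metastore-aware clients.

    # Minimal sketch: publish a reporting view over cleansed data.
    # Database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("expose-views")
             .enableHiveSupport().getOrCreate())

    spark.sql("""
        CREATE OR REPLACE VIEW reporting.daily_sales AS
        SELECT sale_date, region, SUM(amount) AS total_amount
        FROM clean.transactions
        GROUP BY sale_date, region
    """)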
4. Informatica Data Catalog
- Implement and configure the Informatica Enterprise Data Catalog (EDC) solution to discover and catalog data assets across the organization.
- Develop and maintain custom metadata scanners, resource configurations, and lineage extraction processes.
- Integrate EDC with other Informatica tools, such as Data Quality (IDQ), Master Data Management (MDM), and Axon Data Governance.
- Define and implement data classification, data profiling, and data quality rules to improve data visibility, accuracy, and trustworthiness.
- Collaborate with data stewards, data owners, and data governance teams to identify, document, and maintain business glossaries, data dictionaries, and data lineage information.
- Establish and maintain data governance policies, standards, and procedures within the EDC environment.
- Monitor and troubleshoot EDC performance issues, ensuring optimal performance and data availability.
- Train and support end-users in effectively utilizing the data catalog for data discovery and analysis (a hedged example of a catalog search call follows this list).
- Keep up to date with industry best practices and trends, continuously improving the organization's data catalog implementation.
- Collaborate with cross-functional teams to drive data catalog adoption and ensure data governance compliance across the organization.
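By way of illustration only, a data-discovery call against the EDC Catalog REST API might look like the following Python sketch. The host, credentials, endpoint path, and query parameters are all assumptions and must be verified against the EDC REST API documentation for the installed version.

    # Hedged sketch of searching the Informatica EDC catalog over REST.
    # The endpoint path, query parameters, host, and credentials are
    # assumptions; verify against the EDC REST API docs for your version.
    import requests

    EDC_HOST = "https://edc.example.com:9085"   # hypothetical host
    AUTH = ("edc_user", "edc_password")         # hypothetical credentials

    resp = requests.get(
        f"{EDC_HOST}/access/2/catalog/data/objects",  # assumed search endpoint
        params={"q": "customer", "offset": 0, "pageSize": 10},
        auth=AUTH,
    )
    resp.raise_for_status()
    for item in resp.json().get("items", []):
        print(item.get("id"))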
Skill Set:
- Certified Big Data Engineer (Cloudera, AWS, or Azure).
- Expertise with big data products in the Cloudera stack.
- Expertise in big data querying tools such as Hive, HBase, and Impala.
- Expertise in SQL, including writing complex queries and views and working with partitioning and bucketing (a sketch follows this list).
- Strong experience in Spark using Python or Scala.
- Expertise in messaging systems such as Kafka or RabbitMQ.
- Hands-on experience managing a Hadoop cluster and all of its included services.
- Experience implementing ETL processes using Sqoop or Spark.
- Experience loading data from disparate data sets and pre-processing it with Hive.
- Ability to design solutions independently based on high-level architecture.
- Ability to collaborate with other development teams.
- Expertise in building stream-processing systems using solutions such as Spark Streaming, Apache NiFi, and Kafka (see the streaming sketch after this list).
- Expertise with NoSQL databases such as HBase.
- Experience with Informatica Enterprise Data Catalog (EDC) implementation and administration.
- Strong knowledge of data management, data governance, and metadata management concepts.
- Proficiency in SQL and experience with various databases (e.g., Oracle, SQL Server, PostgreSQL) and data formats (e.g., XML, JSON, CSV).
- Experience with data integration, ETL/ELT processes, and Informatica Data Integration.
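As a minimal illustration of the partitioning and bucketing skills listed above, here is a Spark SQL sketch of a partitioned, bucketed Hive table; the database, table, and column names are hypothetical.

    # Minimal Spark SQL sketch of a partitioned, bucketed Hive table.
    # Database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("ddl-example")
             .enableHiveSupport().getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.events (
            event_id STRING,
            user_id  STRING,
            payload  STRING
        )
        PARTITIONED BY (event_date DATE)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS PARQUET
    """)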
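And a minimal Spark Structured Streaming sketch reading from Kafka, illustrating the stream-processing skills listed above; the broker address, topic, and paths are hypothetical, and the job requires the spark-sql-kafka connector package on the classpath.

    # Minimal Spark Structured Streaming sketch reading from Kafka.
    # Broker, topic, and paths are hypothetical; needs spark-sql-kafka.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Subscribe to a Kafka topic and decode message values as strings.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .load()
              .selectExpr("CAST(value AS STRING) AS value"))

    # Continuously land the stream as Parquet with checkpointed progress.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/stream/events")
             .option("checkpointLocation", "/data/stream/_chk")
             .outputMode("append")
             .start())
    query.awaitTermination()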
Location: Chandigarh
Salary: No bar for the right candidate.
Working: 5 days (WFO)
Expertia AI Technologies