hyrefox consultants

Data Backfill Audit – Data Engineer (Contract) – Remote, Full-Time

New Delhi
April 16, 2024

Job Description

Why do we need this role?

In Q4 2021, Chegg performed a historical data backfill of Adobe logs. Site activity from our legacy Adobe logs was transformed and migrated into the structure of our Chegg-owned event schema. During the project, key metrics were validated to ensure that the transformed data aligned with historical metrics. Over the last four months, as more reports have cut over to the new data source, we've discovered data issues in the secondary and tertiary metrics that were never validated during the backfill. Reacting to these issues is 1) eroding data users' trust in the data and 2) consuming bandwidth from teams already committed to H1 and H2 roadmap programs.

What will this role deliver?

This role will reprocess event data to resolve issues found during a data audit, which will be led by the Data Analyst Contractor. The initial focus will be on fixing issues in the secondary and tertiary metrics from the historical backfill. Together, the audit and the reprocessed data will deliver a clean event lake that is accurate and trusted.

Responsibilities

This role will help determine the root cause of issues in the historical backfill, determine the best method to resolve them, and implement the code (Databricks Notebook Python and Spark jobs) to resolve the issues. Key to this role is the ability to investigate complex data pipelines and existing Python code to identify the root cause of an issue, along with the technical ability to rewrite data transformations to efficiently reprocess historical data.

Requirements

• 3-7 years of demonstrated success in a data engineering role developing Python and Spark data pipelines

• Top-decile data engineering skills; strength in SQL and Python for data transformations

• Experience with Adobe Analytics Clickstream logs preferred

• Experience with data models and database technologies, especially Redshift and Spark

• Experience with Databricks preferred

• Experience troubleshooting and resolving data quality issues

• Able to communicate clearly, especially with written documentation

• Self-motivated, independent, organized and proactive, highly responsive, flexible and adaptable when working across teams

• Bachelor’s degree in a quantitative field (e.g., engineering, sciences, math, statistics, business, or economics) required; Master’s degree in a quantitative field preferred

• Passionate about producing high-quality data products and communicating recommendations
