Nutrafol's data team saves 100+ hours per month reviewing dbt code changes while increasing trust in data

Datafold’s Data Diff is great for scaling the PR review process. It creates consistency across every data engineer: instead of each person having their own ideas about how to evaluate a PR, this makes sure every PR is evaluated in the same capacity.

Callie Davis, Vice President of Customer Data & Insights

Nutrafol is a hair wellness company. The subscription-based, direct-to-consumer e-commerce organization provides subscribers with a range of supplements and products to improve hair growth and increase overall health and wellness. A Series B startup, Nutrafol has raised $35 million and projects 2021 top-line revenue of $150-175 million.

The Problem

Nutrafol was experiencing data ingestion issues and inadvertent changes in data models. Because the company used a wide range of marketing and customer acquisition channels, the Data Team was responsible for pulling in data from many different sources to support business visualizations and ad hoc reporting.

Frustratingly, leadership was too often the first to find issues in those dashboards. The Data Team wanted to be the ones to say, “Don’t look at CPA; we were having issues with Facebook ingestion,” but instead feared that bad data could go unnoticed for two to three months.

The Nutrafol Data Team ran some manual tests before making changes that affected important KPIs, but often lacked visibility into how those changes would affect downstream data pipelines, so bad data could reach the business and hurt it. For example, if Customer Acquisition Cost (CAC) for a channel were reported lower than it really was, business leaders might ramp up spend on that channel to gain new customers. That extra spend would reduce earnings, weaken financial performance, and distort Profit & Loss (P&L) attribution.

To protect the Data Team’s reputation and maintain confidence in the data across the organization, the Data Team clearly needed to catch these issues before they reached production. Doing so would both boost confidence and improve the data itself.

As the team and company grew, the data models continually evolved and expanded as well. The Data Team needed platforms and tools that would scale with the business, which made proactive data validation a priority for getting ahead of bad data. That way, data leaders could be confident in the Data Team’s work and build trust in the data.

The Solution

Nutrafol brought in consultants to support its data quality and observability work. In collaboration with the Nutrafol CTO, the team decided to implement Datafold to improve data quality.

Mammoth Growth is a growth marketing and analytics team-as-a-service that helps companies extract, visualize, and interpret their data for growth. As a Datafold partner, Mammoth Growth quickly implemented Datafold to help Nutrafol speed up development and increase visibility into key metrics. Mammoth Growth’s strong understanding of Datafold, Snowflake roles, and integrations with other tools such as dbt meant that Nutrafol was able to hit the ground running.

The most appealing part of the platform was how deeply and seamlessly Datafold integrated into the team’s workflows, making it ideal for everything from data discovery and tech specs to ongoing testing and maintenance of data pipelines.

Data Diff checks every change to a data pipeline and highlights how the change in source code affects the data the pipeline produces. This makes it easy to spot unexpected changes and downstream impact.

With column-level lineage, Datafold changed the perception of data quality at Nutrafol, particularly by providing clear visibility into data dependencies. For example, the team once changed the transformation of a column that feeds into the net revenue calculation. Without Data Diff and column-level lineage, this change to a relatively low-priority column would eventually have affected high-priority reporting, erroneously showing net revenue plummeting.

In addition, Datafold works best when it has clearly defined primary keys to use in comparisons. Mammoth Growth had already created extensive tests that translated directly to Datafold, ensuring a smooth integration with Nutrafol’s pull request process.
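
To make the role of primary keys concrete, here is a minimal sketch of a key-based comparison between a production table and a development build of the same model, in plain Python with rows held as dictionaries. It illustrates the general idea only and is not Datafold’s implementation or API; the function `diff_by_key`, the `order_id` key, and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_by_key(prod: list[Row], dev: list[Row], key: str) -> dict[str, Any]:
    """Compare two versions of a table, matching rows on a primary key.

    Hypothetical illustration of a key-based data diff, not Datafold's API.
    """
    prod_by_key = {row[key]: row for row in prod}
    dev_by_key = {row[key]: row for row in dev}
    # Without a unique key, rows cannot be paired up reliably.
    if len(prod_by_key) != len(prod) or len(dev_by_key) != len(dev):
        raise ValueError(f"column {key!r} is not unique, so rows cannot be matched")

    # Rows that the code change dropped or added.
    only_in_prod = sorted(prod_by_key.keys() - dev_by_key.keys())
    only_in_dev = sorted(dev_by_key.keys() - prod_by_key.keys())

    # For rows present in both versions, record which keys changed in each column.
    changed_columns: dict[str, list[Any]] = {}
    for k in sorted(prod_by_key.keys() & dev_by_key.keys()):
        before, after = prod_by_key[k], dev_by_key[k]
        for col in before.keys() | after.keys():
            if col != key and before.get(col) != after.get(col):
                changed_columns.setdefault(col, []).append(k)

    return {
        "only_in_prod": only_in_prod,
        "only_in_dev": only_in_dev,
        "changed_columns": changed_columns,
    }


# Hypothetical sample data: one changed value and one new row in the dev build.
prod = [
    {"order_id": 1, "net_revenue": 100.0},
    {"order_id": 2, "net_revenue": 50.0},
]
dev = [
    {"order_id": 1, "net_revenue": 100.0},
    {"order_id": 2, "net_revenue": 45.0},
    {"order_id": 3, "net_revenue": 20.0},
]
print(diff_by_key(prod, dev, key="order_id"))
# {'only_in_prod': [], 'only_in_dev': [3], 'changed_columns': {'net_revenue': [2]}}
```

The uniqueness check is why clearly defined primary keys matter: if `order_id` were duplicated, there would be no reliable way to say which development row corresponds to which production row.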

Datafold showed the Nutrafol team how small changes can have a large downstream impact, establishing a shared understanding that even a minor change in a “bronze-level” table can change how a “gold-level” report is calculated. It is therefore imperative to check the downstream impact of every change before it reaches production.
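
The bronze-to-gold point can be sketched as a graph search over column-level lineage: starting from the column being edited, follow every “feeds into” edge to collect all columns that depend on it. The lineage map and column names below are hypothetical, and the breadth-first search is a generic technique, not a description of how Datafold computes lineage.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns computed from it.
LINEAGE: dict[str, list[str]] = {
    "bronze.orders.discount_amount": ["silver.orders.net_amount"],
    "silver.orders.net_amount": ["gold.revenue_report.net_revenue"],
    "bronze.orders.gift_note": [],
}


def downstream_columns(start: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every column that depends, directly or transitively, on `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            # Visit each column once, even when several paths lead to it.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(downstream_columns("bronze.orders.discount_amount", LINEAGE)))
# ['gold.revenue_report.net_revenue', 'silver.orders.net_amount']
```

In this toy graph, an edit to the bronze `discount_amount` column reaches the gold-level `net_revenue`, while an edit to `gift_note` reaches nothing downstream, even though both columns live in the same table.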

The Results

Increased data confidence. Proactive checks reduce the need for alerts on changes made by the Data Team. Data Diff keeps data quality consistent: with Data Diff integrated into the pull request (PR) review process, data leaders can validate data changes and understand their impact on business metrics.
Automated ETL regression testing. Data Diff gives an automated “gut check” of a change’s impact. Embedding this evaluation in the CI process, so that every source code change is validated before it is merged into production, raises the Data Team’s confidence in the data. Plus, thanks to the “report card effect,” the VP of Data and other data leaders who can’t investigate every code change can see at a glance whether the expected differences make sense given their broader contextual knowledge.
Streamlined technical specifications. As Nutrafol investigates recategorizing its product lineup, Datafold’s column-level data lineage has significantly sped up the Data Team’s tech spec process. As the different Nutrafol teams reevaluate product categorization, lineage shows how those changes to business logic would affect models and what pipeline refactoring they would require. While dbt’s table-level lineage can help, Datafold’s column-level lineage is what turns hours of investigation into a one-hour meeting.
Scalable data quality as the team expands. The Nutrafol team is preparing to grow rapidly, and Datafold becomes increasingly important as the team gets bigger, ensuring that existing and new team members follow the same data validation process.

Mammoth Growth often recommends Datafold as part of a comprehensive analytics engineering build-out because data testing is a priority for its clients. Datafold helps audit how changes to code will affect business reporting, and column-level lineage and profiling make it easy to understand ever-growing data models and find the data you need.

Challenge

Learn how the Nutrafol data team saves 100+ hours per month reviewing dbt code changes with Datafold’s Data Diff.

Outcome

Weeks to <24 hours: faster reaction to data outages

100+: hours saved per month with automated regression testing & a streamlined tech spec process

100%: pull request consistency