Snapcommerce cut QA time for updates to critical dbt models from 4 days to 1 day

Datafold is like a booster shot... you get extra protection and security when you make changes.

Jonathan Talmi, Senior Data Platform Manager

If the accuracy of millions of dollars in weekly transactions depended on your data stack, how confident would you feel making changes to your data code and queries? For Snapcommerce, manual data QA processes were time-consuming and error-prone. By adopting Datafold, Snapcommerce cut QA time while increasing accuracy and confidence in its data.

About

Founded in 2016, Snapcommerce is a mobile-centric commerce company that focuses on personalized and curated shopping results through messaging channels like SMS, Facebook Messenger, and others. Snapcommerce’s first product was Snaptravel, an Online Travel Agent (OTA) that provides users with affordable flights and hotel rooms. Across its product lines, Snapcommerce has achieved over $1 billion in total sales. Its most recent round of funding pulled in $85 million, bringing its total funding to over $100 million since its initial Series A funding of roughly $30 million in 2017. Snapcommerce is located in Toronto, Canada, with a company size of 100-200 employees.

The operating data stack at Snapcommerce manages over 200 data source tables in Snowflake and roughly 500 dbt models.

The Challenge

The challenge at Snapcommerce was a familiar one for their market segment: accurate supplier payment at scale. Snapcommerce pays suppliers millions of dollars per week. Mistakes can result in overpayments, which can be costly, or underpayments, which can sour relationships with key partners. All the while, Snapcommerce must account for bespoke supplier invoicing practices.

A key piece of their process is a complex series of SQL statements totaling nearly 500 lines. These statements handle not only the core decision logic, but also many edge cases to prevent accidental overpayment. When Snapcommerce’s own records disagree with invoices, the statements help resolve the discrepancy. For example, a statement might retrieve evidence of a pre-existing payment for an invoice. Snapcommerce needed a solution to help manage changes and updates to this complex business-critical logic.
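To make the shape of this kind of logic concrete, here is a minimal Python sketch of a payment decision with a few guard conditions. It is not Snapcommerce's actual SQL: the `Booking` and `Invoice` types, their field names, the status labels, and the three rules are all assumptions chosen for illustration, including the pre-existing payment check mentioned above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Booking:
    # Hypothetical internal record of a travel booking.
    booking_id: str
    expected_amount: Decimal
    already_paid: Decimal = Decimal("0")  # evidence of a pre-existing payment


@dataclass(frozen=True)
class Invoice:
    # Hypothetical supplier invoice line.
    booking_id: str
    invoiced_amount: Decimal


def decide_payment(booking: Optional[Booking], invoice: Invoice) -> Tuple[str, Decimal]:
    """Return (status, amount) for one invoice. All rules are illustrative assumptions."""
    # No matching booking in internal records: never pay automatically.
    if booking is None:
        return ("review", Decimal("0"))
    # A pre-existing payment already covers the invoice: skip it.
    if booking.already_paid >= invoice.invoiced_amount:
        return ("skip", Decimal("0"))
    # Invoice exceeds what internal records expect: flag instead of overpaying.
    if invoice.invoiced_amount > booking.expected_amount:
        return ("review", Decimal("0"))
    # Otherwise pay the outstanding balance.
    return ("pay", invoice.invoiced_amount - booking.already_paid)


partially_paid = Booking("B1", Decimal("100"), already_paid=Decimal("40"))
print(decide_payment(partially_paid, Invoice("B1", Decimal("100"))))
```

Even in this toy version, each added rule changes which invoices are paid and by how much, which is why every change to the real logic needs careful review.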

In addition, all pay data needed to be reconciled by the finance team. That meant they needed to see row-level changes not just in the table being updated, but in all downstream tables as well. When updates to the reconciliation logic were required, new tables were generated and the finance team manually compared them to the production tables. When things looked off, they noted the invoice and investigated further. Occasionally, the data was exported into Excel for more advanced comparisons. This process was time-consuming and error-prone.

Finally, Snapcommerce also needed to produce accurate internal reporting for board members and investors. Small changes in the data or processing algorithms could risk creating a ripple effect on the resulting reports used to make business decisions. This was a risk that Snapcommerce needed to mitigate through a holistic data management approach.

The Solution

For several reasons, Snapcommerce settled on Datafold as the preferred solution.

Diff for Complex Model Updates

From a data and finance perspective, Data Diff clearly outlined the impact of changes to data models and processing algorithms, including when edge cases were encountered. Prior to adopting Datafold, the change process looked like the following:

The data team would make a change to the decision algorithm that matches a travel booking to an invoice and determines whether a supplier payment should be made and how much should be paid out. A model using the new logic was loaded into a sandbox environment.
A few invoices would be manually pulled and used as representative test cases, with the new changes resulting in several spreadsheets that needed review.
A manual, visual review of the outputs would be conducted to determine whether the change was effective. This could take multiple days depending on the amount of data generated, and there was always a high risk of human error.

After the Datafold integration, the change process took on the following form:

The data team makes a change to the decision algorithm that matches a booking to an invoice and determines whether a supplier payment should be generated.
A Data Diff is automatically generated with all of the impacted data highlighted.
A visual review of the impacted data is done in a fraction of the original time and with complete accuracy.

Column-level metrics also help identify the broader financial impact of a code change, e.g. with this update, $X more will be paid out to suppliers, or Y more bookings will be marked as “to pay”. Datafold greatly simplified the process of identifying impacted data while also making the output accessible cross-functionally, even to people outside of the data engineering team.
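The idea behind a row-level diff with column-level metrics can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Datafold's implementation or API: the table contents, the `booking_id` key, the `(status, amount)` row layout, and the helper names are all hypothetical.

```python
from decimal import Decimal

# Hypothetical model output keyed by booking_id: (payment_status, payout_amount).
prod = {
    "B1": ("to pay", Decimal("120.00")),
    "B2": ("hold", Decimal("0.00")),
    "B3": ("to pay", Decimal("75.50")),
}
# The same model rebuilt with the changed logic (assumed data).
dev = {
    "B1": ("to pay", Decimal("120.00")),
    "B2": ("to pay", Decimal("64.25")),
    "B3": ("to pay", Decimal("80.00")),
    "B4": ("hold", Decimal("0.00")),
}


def diff_tables(old, new):
    """Classify primary keys as added, removed, or changed between two versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


def total_payout(table):
    """Sum the payout amount over rows marked "to pay"."""
    return sum(
        (amount for status, amount in table.values() if status == "to pay"),
        Decimal("0"),
    )


def to_pay_count(table):
    """Count rows marked "to pay"."""
    return sum(1 for status, _ in table.values() if status == "to pay")


row_diff = diff_tables(prod, dev)
payout_delta = total_payout(dev) - total_payout(prod)  # the "$X more" figure
to_pay_delta = to_pay_count(dev) - to_pay_count(prod)  # the "Y more bookings" figure

print(row_diff)
print(f"payout change: {payout_delta}, 'to pay' bookings change: {to_pay_delta}")
```

The row-level result tells a reviewer which bookings to inspect, while the two aggregate deltas summarize the financial effect of the change in terms a finance reviewer can check against invoices.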

dbt integration

Datafold was the frontrunner for solutions because of its ease of integration and ease of use.

Snapcommerce needed a solution that would integrate easily with dbt. Fortunately, Datafold comes with a dbt integration, and its automatic Data Diff generation greatly reduces the time and effort needed to highlight the affected data and speeds up resolution. Data Diff provides an extra layer of protection and rigor to a data QA process that would otherwise need to be built from the ground up.

Snapcommerce put this integration to use on every pull request related to its finance reconciliation process around supplier payment processing. With 20 different payment-determination conditions to evaluate, integrating with Snapcommerce’s existing data platform was critical.

Finance Reconciliation

Snapcommerce also looked to Datafold for its ability to bring more collaboration between the data and finance teams. The data team makes updates to models, and the finance team needs to verify the impact of those changes on payment generation and reporting. The Data Diffs produced by Datafold were **consumable by both the data team and finance team** and led to them making more accurate updates to their dbt models faster.

The highlighted data would help the finance team determine what needed a closer look. In most cases, these instances were driven by an edge case, so the invoice that triggered the edge case would be used as the source of truth for the Data Diff review. The edge case could then be addressed in code so that it would not recur.

In practice, the Data Diff process used by Snapcommerce is:

A code change happens, and this generates a new Data Diff.
The diff link is forwarded to the finance team.
The finance team reviews the highlighted data in the diff, evaluating the level of correctness by using the source invoice as a reference point.
The finance team approves the diff or works with the data team to address any issues.

Datafold gives you the ability to QA things in a way that you could almost never do on your own.

The Result

Snapcommerce achieved a drastic reduction in the time needed for data QA. For more complex changes, the QA before Datafold integration could take as long as three to four days. After integrating Datafold, the QA time for Snapcommerce shrank to less than one day. Although absolute times varied by team and use case, on average Snapcommerce achieved a 50-75% reduction in time needed to perform QA on data-related changes at a higher level of accuracy.

Data Diff enabled Snapcommerce to elevate confidence in their data QA process, avoid financially impactful mistakes, and accurately pay suppliers. Lastly, Datafold’s user-friendly interface allowed non-technical colleagues to collaborate effectively with the data team, yielding a decrease in mistakes and an increase in morale.

Challenge

Snapcommerce uses Datafold and dbt to update complex business-critical logic without any mistakes. Datafold’s integration with dbt lets users easily diff within the pull request and get a better view of dependencies with column-level lineage.

Outcome

Payment-related data incidents

75% reduction in QA time, at a higher level of accuracy

100% data confidence