OPEN SOURCE DATA-DIFF

Easily Diff Data Across Databases

Data-diff is an open-source command-line tool and Python library to efficiently diff rows across two different databases.

Datafold DataDiff Open Source illustration

See it in action

25M+ rows in <10s

1B+ rows in ~2min

Billion-Row Cross-Database Diffing in Minutes

Check that data gets from A to B

Whenever you replicate data from one database to another, you can now verify that the two copies actually match. This makes migrations less error-prone and data pipelines more robust.

Get detailed differences, fast

See exactly which rows don't match, and get high-level statistics about the differences within seconds.
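To make the idea of a row-level diff concrete, here is a minimal, self-contained sketch. It is not data-diff's API or its actual implementation: it assumes two in-memory SQLite databases, a hypothetical `users` table keyed by an integer `id`, and a simple checksum-and-bisect strategy that only fetches rows from key ranges whose checksums disagree. All function names (`make_db`, `segment_checksum`, `diff_segment`) are invented for illustration.

```python
import hashlib
import sqlite3


def make_db(rows):
    """Create an in-memory SQLite database with a small hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


def fetch(conn, lo, hi):
    """Return all rows with lo <= id < hi, ordered by id."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id >= ? AND id < ? ORDER BY id",
        (lo, hi),
    ).fetchall()


def segment_checksum(conn, lo, hi):
    """Return (row_count, checksum) for the key range lo <= id < hi."""
    rows = fetch(conn, lo, hi)
    return len(rows), hashlib.md5(repr(rows).encode()).hexdigest()


def diff_segment(a, b, lo, hi, threshold=4):
    """Yield ('-', row) for rows only in `a` and ('+', row) for rows only in `b`."""
    count_a, sum_a = segment_checksum(a, lo, hi)
    count_b, sum_b = segment_checksum(b, lo, hi)
    if sum_a == sum_b:
        return  # identical segment: nothing to report
    if hi - lo <= threshold or max(count_a, count_b) <= threshold:
        # Small enough: compare the rows directly.
        rows_a, rows_b = set(fetch(a, lo, hi)), set(fetch(b, lo, hi))
        for row in sorted(rows_a - rows_b):
            yield ("-", row)
        for row in sorted(rows_b - rows_a):
            yield ("+", row)
        return
    # Checksums disagree: split the key range and recurse into each half.
    mid = (lo + hi) // 2
    yield from diff_segment(a, b, lo, mid, threshold)
    yield from diff_segment(a, b, mid, hi, threshold)


# Source has ids 1..100; the replica has one changed row and one missing row.
source = make_db([(i, f"user{i}") for i in range(1, 101)])
replica = make_db(
    [(i, "USER42" if i == 42 else f"user{i}") for i in range(1, 101) if i != 77]
)

diffs = list(diff_segment(source, replica, 1, 101))
for sign, row in diffs:
    print(sign, row)
print(f"{len(diffs)} differing rows found")
```

In this sketch a `-` marks a row present only in the source and a `+` marks a row present only in the replica, so a changed row appears as a `-`/`+` pair. Matching key ranges are skipped after a single checksum comparison, which is the general reason checksum-based approaches can stay fast when most of the data agrees.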

Datafold extends data-diff

With column-level lineage, you can see the impact of inconsistent data on downstream models and dashboards. You can also use Data Diff to automate regression testing when managing changes to your transformations.

Get started diffing data

$ pip install data-diff

And you’re ready to start comparing data across databases. Check out the documentation for a guide to setting up.

Request an async demo in your inbox

Fill in the form and get a walkthrough of the platform tailored to your stack and use case.
Request a demo

Book a live demo

Schedule a personal call and see how Datafold can help you.

Productive and enjoyable data engineering is right around the corner!

Schedule a demo

Get Started

To integrate Datafold seamlessly with your data stack, we need a quick onboarding call to configure everything properly.
