Easily Diff Data Across Databases
Data-diff is an open-source command-line tool and Python library to efficiently diff rows across two different databases.
rows in <10s
rows in ~2min
Whenever you replicate data from one database to another, you can now verify that the two copies actually match. This makes migrations less error-prone and data pipelines more robust.
See exactly which rows don't match, and get high-level statistics about the differences within seconds.
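To make the idea of a row-level diff concrete, here is a minimal sketch that uses only Python's standard library and two in-memory SQLite databases. It does not call data-diff and is not presented as its implementation: the `users` table, its `id` and `name` columns, and the helpers `segment_checksum`, `fetch_rows`, `diff_rows`, and `make_db` are all hypothetical names chosen for illustration. It assumes an integer key and compares checksums of key ranges, fetching full rows only for ranges whose checksums disagree.

```python
import hashlib
import sqlite3


def segment_checksum(conn, table, lo, hi):
    """Hash every row whose id is in [lo, hi), in id order."""
    digest = hashlib.md5()
    # The table name is interpolated for brevity; only do this with trusted input.
    query = f"SELECT id, name FROM {table} WHERE id >= ? AND id < ? ORDER BY id"
    for row in conn.execute(query, (lo, hi)):
        digest.update(repr(row).encode())
    return digest.hexdigest()


def fetch_rows(conn, table, lo, hi):
    """Return the rows whose id is in [lo, hi) as a set of tuples."""
    query = f"SELECT id, name FROM {table} WHERE id >= ? AND id < ?"
    return set(conn.execute(query, (lo, hi)))


def diff_rows(conn_a, conn_b, table, lo, hi, segment_size=4):
    """List ('-', row) for rows only in A and ('+', row) for rows only in B."""
    if segment_checksum(conn_a, table, lo, hi) == segment_checksum(conn_b, table, lo, hi):
        return []  # identical key range: no need to transfer any rows
    if hi - lo <= segment_size:
        rows_a = fetch_rows(conn_a, table, lo, hi)
        rows_b = fetch_rows(conn_b, table, lo, hi)
        diffs = [("-", r) for r in rows_a - rows_b] + [("+", r) for r in rows_b - rows_a]
        # Order by id, with the removed version of a row before the added one.
        return sorted(diffs, key=lambda d: (d[1][0], d[0] == "+"))
    mid = (lo + hi) // 2  # split the range and inspect each half separately
    return (diff_rows(conn_a, conn_b, table, lo, mid, segment_size)
            + diff_rows(conn_a, conn_b, table, mid, hi, segment_size))


def make_db(rows):
    """Create an in-memory database holding a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


original = {i: f"user{i}" for i in range(1, 17)}
replicated = dict(original)
del replicated[5]            # a row lost during replication
replicated[11] = "renamed"   # a row whose value drifted

source = make_db(sorted(original.items()))
replica = make_db(sorted(replicated.items()))

differences = diff_rows(source, replica, "users", 1, 17)
for sign, row in differences:
    print(sign, row)
```

In this sketch, key ranges that match cost one checksum query per side, so the amount of row data transferred grows with the number of differences rather than with the size of the table.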
With column-level lineage, you can see the impact of inconsistent data on downstream models and dashboards. You can also use data-diff to automate regression testing as part of transformation change management.
$ pip install data-diff
That's all it takes to start comparing data across databases. Check out the documentation for a setup guide.