Sunsetting open source data-diff

As of May 17, 2024, we at Datafold have made the hard decision to no longer actively support or develop open source data-diff. This will enable us to focus entirely on evolving our core product, Datafold Cloud. 

Backstory

We started Datafold in 2020 to create a comprehensive data quality platform that solves the most tedious, critical, and error-prone parts of a data engineer’s workflow. This includes tasks like reconciling data during migrations and replication and making changes to data transformation code. To address these needs, we developed technology enabling the comparison of datasets at any scale.

Recognizing the need to integrate with various products in the data ecosystem and provide a rich UI to visualize differences, we initially launched Datafold as a SaaS product without an open source component. Then, two years ago, we attempted to make data diffing more accessible to the data community with a "light" version of our SaaS product by introducing an open source package called data-diff.

Since then, and especially in the last year, we’ve seen increasing demand for Datafold Cloud, along with growing sophistication and scale among our customers. As a customer-first, early-stage company, we have had to put all of our resources into supporting those customers.

Continuing to support the open source tool for the larger community required maintaining two distinct products with different codebases yet significantly overlapping functionality.

As a data engineer who has experienced the frustration of using poorly built or maintained tools, I feel strongly that we’ll have a greater impact as a company by focusing our resources on one of these two products. Unfortunately, that means letting go of an open source project for the long-term benefit of our customers and our company.

As the Datafold team concludes its journey with open source data-diff, we look forward to an even greater focus on Datafold Cloud. Data diffing remains a core part of our product, and over the past months we’ve released a number of key improvements for both in-database and cross-database data comparison, including sampling, real-time results, support for new databases, and improved command line and developer functionality.

We will continue to innovate on how data teams can shift data quality testing further left, with tooling that covers the entire data quality lifecycle.

We are grateful for your support and excited to have you with us on the next frontier at Datafold.

- Gleb

Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.
