Announcing the Dremio integration in Datafold

We are excited to announce the launch of our new integration with Dremio in Datafold Cloud. This integration will enable current (and future!) Dremio users to:

  • Accelerate migrations to Dremio with automated data reconciliation
  • Enhance dbt data quality with automated testing of dbt models in CI/CD
  • Validate data replication into Dremio with Datafold’s cross-database diffing and Monitors

Dremio is the Unified Lakehouse Platform for self-service analytics, designed for flexibility and performance at an affordable cost. With Datafold, the data testing automation platform, Dremio lakehouse users can benefit from faster data development velocity while having full confidence in their data products. We are especially thrilled to partner with Dremio given their unique focus on a shift-left approach to data quality, implemented with Dremio’s Apache Iceberg Lakehouse Management Service.

Migrate to the Unified Analytics Platform, faster

Dremio is the preferred analytics platform for numerous global enterprises. As organizations move to modernize their data lake environments, transitioning to a more contemporary lakehouse setup involves migrating data pipelines and workloads. These pipelines and workloads are often highly complex, built on legacy systems accumulated over decades. Moving to a modern environment can prove complex and labor-intensive, with data reconciliation emerging as a pivotal challenge. For a modernization process to be successful, it is critical for organizations to ensure that outputs from the new system align with legacy outputs and standards.

Using Datafold’s migrations toolkit, Dremio users can make the move to a more modern unified lakehouse with greater speed and confidence.

With Datafold Cloud’s cross-database diffing—value-level comparison of tables across databases—you can verify parity of tables across data systems in minutes. Automatically know if parity exists between your legacy system and Dremio, and if it doesn’t, explore the value-level differences within the Datafold UI.

Value-level differences of a table between Postgres and Dremio
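To make "value-level comparison" concrete, here is a minimal sketch in Python. It is not Datafold's implementation or API: the function `diff_tables`, the sample rows, and the `id` primary key are all hypothetical, and the rows are held in memory rather than fetched from a legacy database and Dremio.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(source_rows: list[Row], target_rows: list[Row], key: str) -> dict[str, list]:
    """Compare two row sets by primary key and report value-level differences.

    Hypothetical helper for illustration only; not part of any Datafold API.
    """
    # Index both sides by primary key so rows can be matched directly.
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # Rows that exist on only one side.
    only_in_source = sorted(source.keys() - target.keys())
    only_in_target = sorted(target.keys() - source.keys())

    # For rows present on both sides, compare every column value.
    mismatches = []
    for pk in sorted(source.keys() & target.keys()):
        for column in sorted(set(source[pk]) | set(target[pk])):
            old = source[pk].get(column)
            new = target[pk].get(column)
            if old != new:
                mismatches.append((pk, column, old, new))

    return {
        "only_in_source": only_in_source,
        "only_in_target": only_in_target,
        "mismatches": mismatches,
    }


# Made-up sample data standing in for a legacy table and its migrated copy.
legacy = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
    {"id": 3, "email": "c@example.com", "plan": "pro"},
]
migrated = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com", "plan": "team"},
    {"id": 4, "email": "d@example.com", "plan": "free"},
]

result = diff_tables(legacy, migrated, key="id")
print(result)
```

In this sample, row 3 exists only in the legacy table, row 4 exists only in the migrated table, and row 2 differs in its `plan` column. Parity means all three lists come back empty.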

We know how challenging, complex, and long data migrations can be. Make your move to Dremio’s open, flexible, and performant lakehouse with greater confidence and speed with Datafold’s automation and value-level differences by your side.

Prevent data quality issues in dbt

Current dbt and Dremio users can use Datafold’s data diffing directly in their CI process. Any time a new pull request (PR) is opened for your dbt project, Datafold automatically adds a comment to the PR summarizing the data differences between the staging and production versions of the dbt models you modified. Datafold’s CI comment also lists the downstream dbt models that may be affected.

If greater analysis is required to understand if potential data changes are expected and acceptable, jump directly into the Datafold UI to view the value-level data differences between your staging and production models.

With Datafold, gain the power to know exactly how your dbt code changes will impact your data before those changes are merged into your production Dremio environment.

Datafold automatically adds a comment to your dbt PR summarizing data differences between your prod and dev versions of modified models

With Datafold and Dremio, move your data quality tests to the left, prevent data quality issues from ever entering your production environment, and enable your data and analytics engineers to work with greater confidence and speed.

Data reconciliation at the value level

For data teams that replicate data into Dremio on a regular basis, continuously validate the data loaded into Dremio against your source database using Datafold’s cross-database diffing and Monitors functionality.

With Monitors in Datafold, your team can quickly identify parity of tables across systems on an ongoing basis. Monitors support your ability to:

  • Run data diffs for tables across your source database(s) and Dremio on a scheduled basis.
  • Receive alerts to Slack, PagerDuty, webhooks, or email when data diff results deviate from your expectations.
  • Analyze, at the value level, exactly where two tables match and where they differ.
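The alerting step in the list above can be sketched as a simple threshold check. This is an illustrative assumption about how "deviates from your expectations" could be expressed, not Datafold's Monitors configuration or API: `DiffSummary`, `mismatch_rate`, `should_alert`, and the tolerance values are hypothetical names and numbers.

```python
from dataclasses import dataclass


@dataclass
class DiffSummary:
    """Hypothetical summary of one scheduled diff run (not a Datafold type)."""
    rows_compared: int
    rows_different: int


def mismatch_rate(summary: DiffSummary) -> float:
    """Fraction of compared rows that differ; 0.0 when nothing was compared."""
    if summary.rows_compared == 0:
        return 0.0
    return summary.rows_different / summary.rows_compared


def should_alert(summary: DiffSummary, max_mismatch_rate: float = 0.0) -> bool:
    """Alert when the observed mismatch rate exceeds the allowed tolerance."""
    return mismatch_rate(summary) > max_mismatch_rate


# Made-up results from two scheduled runs, checked against a 1% tolerance.
quiet_run = DiffSummary(rows_compared=1000, rows_different=5)
noisy_run = DiffSummary(rows_compared=1000, rows_different=20)

print(should_alert(quiet_run, max_mismatch_rate=0.01))
print(should_alert(noisy_run, max_mismatch_rate=0.01))
```

With the default tolerance of 0.0, any differing row triggers an alert, which suits tables that are expected to match exactly. A nonzero tolerance is one way to allow for replication lag between the source and Dremio.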

Ensure the data powering your core analytics work is correct using Datafold’s cross-database diffing with Dremio.

Demo time!

Watch Datafold Solutions Engineer Leo Folsom demonstrate how our new integration with Dremio works to make your migrations and dbt development and deployment more automated and higher quality.

Getting started

With Datafold and Dremio, migrate faster, move your data quality tests to the left, prevent data quality issues from ever entering your production environment, and enable your data and analytics engineers to work with greater confidence and speed.

To get started with Datafold, check out the following resources:

To learn more about Dremio:

  • To get started with Dremio, you can go here and choose your deployment type.
  • Subsurface LIVE, a global event where data engineers, data analysts, and data executives gather to discuss and learn about the positive impact of data lakehouse solutions, is also happening on May 2-3.

Happy diffing!

Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.