Introducing the Datafold VS Code Extension
Datafold is excited to announce the launch of the first version of the Datafold VS Code Extension, a new developer experience tool that integrates data quality testing, data diffing, and Datafold into your development workflow.
The VS Code extension is an enhancement of the open source data-diff product from Datafold. With the Datafold VS Code extension, you can quickly run and diff the dev and prod results of dbt models, view and interact with data diff results in a clean GUI, and automatically diff models that change—all within your VS Code IDE.
By bringing data quality testing and diffing directly into the development process, the extension lets you focus on what really matters: quickly and confidently delivering high-quality analytics solutions to your stakeholders.
<blockquote>💡 What is a data diff? A data diff is a comparison between two data tables that checks whether each value has changed, stayed the same, been added, or been removed. It’s just like a git code diff, but for the tables in your data warehouse. </blockquote>
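To make the idea concrete, here is a toy sketch of a data diff in plain Python. It is not how data-diff or the extension works internally. It only illustrates the four outcomes described above, assuming each table fits in memory as a dictionary keyed by primary key. The function name `diff_tables` and the sample rows are hypothetical.

```python
from typing import Any, Dict, Hashable

Row = Dict[str, Any]
Table = Dict[Hashable, Row]  # assumption: primary key -> row


def diff_tables(prod: Table, dev: Table) -> Dict[str, Any]:
    """Classify every primary key as added, removed, changed, or unchanged."""
    added = sorted(k for k in dev if k not in prod)
    removed = sorted(k for k in prod if k not in dev)

    changed: Dict[Hashable, Dict[str, tuple]] = {}
    unchanged = []
    for key in sorted(prod.keys() & dev.keys()):
        old, new = prod[key], dev[key]
        # Record (prod_value, dev_value) for each column that differs.
        cols = {
            col: (old.get(col), new.get(col))
            for col in sorted(old.keys() | new.keys())
            if old.get(col) != new.get(col)
        }
        if cols:
            changed[key] = cols
        else:
            unchanged.append(key)

    return {
        "added": added,
        "removed": removed,
        "changed": changed,
        "unchanged": unchanged,
    }


# Hypothetical sample data: the prod table vs. the dev build of the same model.
prod = {
    1: {"name": "Ada", "plan": "free"},
    2: {"name": "Bo", "plan": "pro"},
    3: {"name": "Cy", "plan": "free"},
}
dev = {
    1: {"name": "Ada", "plan": "free"},
    2: {"name": "Bo", "plan": "team"},
    4: {"name": "Di", "plan": "pro"},
}

result = diff_tables(prod, dev)
print(result)
```

In this example, row 4 is added, row 3 is removed, row 2 changes in the `plan` column, and row 1 is unchanged. A real diff runs against tables in your warehouse, so comparing rows in memory like this is only practical for small samples.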
Why diffing belongs in the development workflow
We have yet to meet a data practitioner who hasn’t viscerally felt the risk of introducing a major regression. And there are a few reasons why:
- You’re developing and testing with few guardrails in place, making it hard to assess the impact of your code changes on the data.
- You either ship untested code or write a ton of boilerplate ad hoc SQL to “roughly” check your change.
- Your dbt tests pass, and your PR reviewer agrees everything looks good, but still…no one fully understands how the code updates are changing the data.
When you diff your data, you can quickly preview how your code changes affect the data and develop a full understanding of how the dev and prod versions of your data will differ. When you diff your data during development, you identify potential regressions in real time, before your PR is even opened and long before a bad PR gets merged into production.
When we developed open source data-diff, we began seeing the ways it changed the dbt development and testing landscape: Analytics engineers began regression testing; PRs began getting merged in faster (without being reverted); data quality testing started becoming streamlined.
At Datafold, we believe that data quality is a byproduct of a great workflow. And a great workflow starts by testing your data during development. This is why we’re supporting data diffing directly in your dbt workflow—so you can soundly test your data with ease and develop dbt models with confidence.
The Datafold VS Code Extension does more than lower the barrier to data diffing, the proactive data testing solution that lets you find potential data quality issues before they enter your production environment. It also reduces the friction between development tools and data work, so that you as a data practitioner can spend less time switching between tools and more time solving real analytics problems.
Streamlined developer experience
By installing the Datafold VS Code Extension in your local VS Code environment, you can:
- Run and diff your dbt models of choice,
- See a high-level diff overview of dbt models in a clean GUI, and
- Use the Watch Mode to automatically see diff results after each dbt run.
In addition, by installing the Datafold VS Code extension, you’ll receive free 30-day access to value-level differences—a Datafold Cloud exclusive (❕) feature.
Using this extension, you bring data diffs seamlessly into your dbt workflow, revealing both the high-level and value-level impact of code changes as you code.
To get started with the new Datafold VS Code extension, simply install the extension via the VS Code Extension Marketplace. Follow the demo video below to install data-diff, run a dbt model, and see diff results within your VS Code environment.
For the future: Bringing diffing and Datafold Cloud to your local IDE
And this is just the beginning for bringing data quality testing directly into your dev environment. As we continue to develop the Datafold VS Code extension, keep an eye out for improvements such as:
- Integrating more Datafold Cloud features like column-level lineage and data apps,
- Greater diffing context and power directly in your dev workflow,
- and more 👀
If you’re an existing Datafold user or interested in getting started with Datafold using the VS Code extension today, please check out the following resources:
- 🛠️ Install the extension from the Datafold VS Code Extension Marketplace page
- 📖 Read the documentation for more details on the extension’s functionality
Have some thoughts on how we can improve the data developer experience? Please reach out to us in the #tools-datafold channel in the dbt Community Slack.
Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.