Whether a decision is made by an executive looking at a dashboard or by a machine learning model, a modern organization can no longer ignore the quality of the data behind it: too much is at stake. Data teams face extraordinary data volume and complexity on the one hand and rising reliability expectations on the other. This reality is impossible to manage without tools that monitor and control data quality.
Software engineers faced a similar challenge a decade ago amid the explosion of cloud infrastructure and distributed applications. The tools for continuous integration, automated testing, and ubiquitous observability that make modern software systems possible are still new to the data world. Yet adopting these ideas and processes is what can enable data teams to tame the complexity and move fast with confidence.
We are building Datafold to 10x the productivity of data professionals across all industries and company sizes by giving them full visibility into their data assets and automating the toil that currently consumes most of their time. One of our first features, Data Diff, helps data developers quickly verify changes to their data pipelines, effectively automating one of the most time-consuming and high-risk workflows.
At the same time, robust engineering practices are being introduced in other parts of the data stack: the dbt team has been leading the Analytics Engineering movement and has given data developers a user-friendly approach to building SQL data pipelines that elegantly incorporates some of the most important principles of agile software engineering, such as:
Now data teams can take their workflows to the next level with the one-click Datafold integration with dbt, boosting their productivity and moving faster without putting data quality at risk, thanks to three Datafold features: Data Diff, column-level lineage, and Data Catalog.
It’s a one-click integration with dbt Cloud, and for teams hosting dbt themselves, we expose an API for connecting to their CI.
With Datafold Data Diff, you can see how a change in your SQL code affects the data in the modified table as well as in its downstream dependencies.
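To make the idea of a data diff concrete, here is a minimal Python sketch that compares two versions of the same table row by row. It is an illustration only, not how Datafold implements Data Diff: the function name `diff_tables`, the sample rows, and the choice of `id` as the primary key are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(before: list[Row], after: list[Row], key: str) -> dict[str, Any]:
    """Compare two versions of a table, matching rows on a primary key."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    # Primary keys that exist in only one of the two versions.
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())

    # For rows present in both versions, record which columns changed
    # as {column: (old_value, new_value)}.
    changed: dict[Any, dict[str, tuple[Any, Any]]] = {}
    for pk in old.keys() & new.keys():
        columns = old[pk].keys() | new[pk].keys()
        delta = {
            col: (old[pk].get(col), new[pk].get(col))
            for col in columns
            if old[pk].get(col) != new[pk].get(col)
        }
        if delta:
            changed[pk] = delta

    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical data: the same model built from production code and from a branch.
prod = [
    {"id": 1, "plan": "free", "mrr": 0},
    {"id": 2, "plan": "pro", "mrr": 50},
    {"id": 3, "plan": "pro", "mrr": 50},
]
branch = [
    {"id": 1, "plan": "free", "mrr": 0},
    {"id": 2, "plan": "pro", "mrr": 45},
    {"id": 4, "plan": "team", "mrr": 200},
]

result = diff_tables(prod, branch, key="id")
print(result)
```

In this toy example the branch adds row 4, drops row 3, and changes `mrr` for row 2. A real warehouse-scale diff would run inside the database rather than in memory, but the questions it answers are the same.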
Data engineers spend countless hours manually mapping their data flows. When they aren’t digging into old spreadsheets or reading source code, they are asking their colleagues for help. Often the need for lineage arises with the urgency of a data incident that demands an immediate response to avoid costly damage.
We realize how stressful and mundane this process is, which is why we’ve released column-level lineage. Using SQL files and metadata from the data warehouse, Datafold constructs a global dependency graph for all your data, from events to BI reports:
Detailed lineage can help you reduce incident response time, prevent breaking changes, and optimize your infrastructure. Say goodbye to late nights spent answering questions such as:
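To illustrate how a dependency graph helps here, the sketch below walks a small, hand-written column-level lineage map to find everything downstream of a given column. It is a simplified illustration under stated assumptions, not Datafold's implementation: the `UPSTREAM` mapping, the table and column names, and the `downstream_of` helper are all hypothetical, and a real system would derive the graph from SQL and warehouse metadata rather than from a literal dictionary.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage: each column maps to the columns it is derived from.
UPSTREAM = {
    "stg_orders.amount": ["raw_events.payload"],
    "fct_revenue.mrr": ["stg_orders.amount"],
    "fct_revenue.customer_id": ["stg_orders.customer_id"],
    "bi_dashboard.total_mrr": ["fct_revenue.mrr"],
}


def downstream_of(column: str, upstream: dict[str, list[str]]) -> list[str]:
    """Return every column that directly or transitively depends on `column`."""
    # Invert the edges so we can walk from a source towards its consumers.
    children = defaultdict(list)
    for child, parents in upstream.items():
        for parent in parents:
            children[parent].append(child)

    # Breadth-first traversal; `seen` also protects against cycles.
    seen: set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


impacted = downstream_of("raw_events.payload", UPSTREAM)
print(impacted)
```

Here a change to `raw_events.payload` would reach `stg_orders.amount`, `fct_revenue.mrr`, and `bi_dashboard.total_mrr`, while `fct_revenue.customer_id` is unaffected because it derives from a different column.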
Finding and understanding the data for every task is often time-consuming, considering that nowadays it's not uncommon for a data warehouse to have 5000+ tables and 100,000+ columns.
With Datafold Data Catalog, you can keep your data documentation close to your code (e.g. a dbt model) and serve it in a responsive interface with full-text search and a per-column profiler. Alternatively, you can use a Notion-like editor to document your tables and columns:
Want to give it a try? Schedule a demo here to see it in action.