Datafold Blog
Exploring how to do data better
A guide on how to build data systems that do not break.
Datafold is now available on the AWS Marketplace. Leverage AWS Marketplace benefits to unlock the leading automated data testing tool.
Get an overview of Data Quality Meetup #9, with speakers from Capital One, FINN Auto, Airbyte, Virgin Media O2 and dbt Labs. The meetup focused on "Running dbt at scale" and covered approaches ranging from small startups to enterprise-sized organizations. Each speaker gave a lightning talk, followed by a panel discussion.
Once Datafold made its first dedicated data hire (me), the volume and frequency of contributions to our dbt project went through the roof. Our single GitHub Actions job ran a full build of the project on every pull request I opened and again on every merge. This caused two problems: long CI feedback loops and a growing Snowflake bill.
This blog outlines how to set up a dbt Cloud PR job in CI, which builds dbt models and runs dbt tests whenever you push commits to a GitHub pull request (or GitLab merge request).
Why we created a Development Testing suite for dbt, how to get started with the open source data-diff dbt integration, and a brief introduction to Datafold Cloud.
Learn how to add freshness tests to your dbt sources, determine appropriate freshness thresholds, and use the 'dbt source freshness' command.
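For context on what freshness thresholds do: in dbt, a source's freshness config sets warn_after and error_after thresholds (a count plus a period of minute, hour or day) that are compared against the newest value of the source's loaded_at_field. The plain-Python sketch below mimics that classification; the function names and the (count, period) tuple format are hypothetical, not dbt's API, and it assumes a strict "older than" comparison.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical threshold format: (count, period), mirroring dbt's count/period pair.
Threshold = Tuple[int, str]

PERIODS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def exceeded(age: timedelta, threshold: Optional[Threshold]) -> bool:
    """True if the data is strictly older than the threshold (None disables it)."""
    if threshold is None:
        return False
    count, period = threshold
    return age > count * PERIODS[period]


def freshness_status(
    max_loaded_at: datetime,
    now: datetime,
    warn_after: Optional[Threshold] = None,
    error_after: Optional[Threshold] = None,
) -> str:
    """Classify a source as 'pass', 'warn' or 'error' from its newest load time."""
    age = now - max_loaded_at
    # Check the error threshold first so a very stale source reports 'error', not 'warn'.
    if exceeded(age, error_after):
        return "error"
    if exceeded(age, warn_after):
        return "warn"
    return "pass"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(freshness_status(now - timedelta(hours=13), now, (12, "hour"), (24, "hour")))  # warn
```

Choosing the values is then a question of how late a load can be before someone should be nudged (warn) versus before downstream consumers are actually harmed (error).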
Learn about common challenges in testing data pipelines that span multiple layers and tools, and how to solve them. In the future, data testing efforts may consolidate in the transformation layer, while the orchestration layer makes it simpler to create testing environments.
Learn how to validate dbt model changes with dbt's audit_helper package and Datafold's data-diff CLI.
Learn how to test and validate dbt code changes in Snowflake with data-diff. After building your dbt models in development with 'dbt run', you can run the 'data-diff --dbt' command to compare the models across environments.
Learn how to use dbt-expectations to test dbt sources and models. Avoid alert fatigue by limiting the scope of your tests and using Datafold to validate dbt code changes.
Data quality issues are caused by changes in the code, data and infrastructure. Compare solutions to test code, observe data and monitor infrastructure.
Learn how to create dbt Python models in Snowflake, Databricks and BigQuery. dbt Python models are defined as a Python function named model that returns a dataframe.
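To illustrate that contract, here is a minimal dbt Python model. It uses dbt.config() and dbt.ref(), which dbt passes to Python models through the dbt argument, while session is the platform's session object (for example Snowpark on Snowflake or Spark on Databricks). The upstream model name stg_orders is hypothetical, and the transformation step is deliberately omitted because DataFrame APIs differ between platforms.

```python
def model(dbt, session):
    # dbt calls this function itself; it must be named `model` and take (dbt, session).
    # Python models are materialized as tables (or incrementally), never as views.
    dbt.config(materialized="table")

    # "stg_orders" is a hypothetical upstream model; ref() returns a platform DataFrame.
    orders = dbt.ref("stg_orders")

    # Platform-specific transformations would go here (Snowpark, PySpark, ...).

    # Whatever DataFrame is returned is what dbt writes to the warehouse.
    return orders
```

Because the function only receives objects from dbt, the same shape works on each supported platform; only the DataFrame operations inside it change.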
Discover dbt Cloud key features and learn what it takes to implement equivalent features for dbt Core.
Get an overview of Data Quality Meetup #8, with speakers from Maple, Crane Worldwide Logistics, Zingg.ai, Entalytica.com and Datafold. The meetup included lightning rounds on improving data quality, data quality defenses, entity resolution, CI pipelines, and Datafold's launch of the dbt and Data Diff integration.
Whatever layer of the data stack you focus on, one practice is valuable at every layer: testing. Discover how testing throughout the stack makes teams not only more proactive but also more productive.
Software testing has been a fundamental component of the software development life cycle (SDLC) for the last 40 years. Though the frameworks and methodologies for software testing have changed dramatically in the last four decades, the approach to data quality testing has not seen the same rate of change until recently.
To set the stage, let’s first discuss what data testing is all about and how it differs from data observability. Data testing is part of the development and deployment workflow: its goal is to catch bugs before they hit production. Data observability, on the other hand, is concerned with the state of your data after it has made it into production. Learn how data testing gets done today and the best practices for automating tests to improve data quality.
Code review health is something teams should monitor and iterate on over time. Learn the best strategies for data teams to improve their code review process.
Learn data validation and verification techniques to confirm a successful migration from Redshift to Snowflake.
To deliver high-quality data quickly, teams need principles to guide how they move, update, and manage data while keeping their stakeholders informed.
A review of SQL tools for performing data analysis from the command line, covering ease of installation, supported data and file formats, SQL functionality, and benchmarking on a common query.
Datafold has officially launched its partnership with Snowflake, the Data Cloud company, providing proactive testing solutions that help joint customers trust and depend on their data.
Many teams create data, many teams consume data, and the data team is responsible for facilitating.
Get an overview of Data Quality Meetup #7, with speakers from Virgin Media O2, Convoy, Nixtla, Metaplane, and Doctolib. The event included lightning rounds on data contracts, data observability, real-time data sharing, testing automation, and time-series forecasting.
Learn the difference between development and production environments and what to consider when setting up your own development environment in dbt.
In this episode of Monday Morning Data Chat, Gleb, Matt and Joe discuss improving the Modern Data Stack.
How to set up and run dbt with Airflow on your local machine.
Overview on getting started with dbt Core via CLI on an M1 Mac.
Data contracts can help us prevent data quality issues by formalizing interactions and handovers between different systems (and teams) handling data.
GitHub offers many features, such as protected branches, pull requests, and code reviews. It is best practice to take advantage of all of these in your analytics workflow to maintain the integrity of your data models.
Gleb and Simon discuss the Data Diff backstory including key design decisions, open sourcing, and how Data Diff works in practice.
Datafold now seamlessly integrates with Hightouch to show how changes to a dbt model will impact Hightouch models and syncs.
A guide on how to onboard Analytics Engineers to your company and data stack.
Data diffing is the process of comparing two datasets. See various ways to compare data at different levels of complexity.
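At its simplest, diffing two tables means matching rows on a primary key, then reporting rows that appear on only one side and rows whose column values differ. The in-memory Python sketch below shows that logic; diff_by_key is a hypothetical helper, and it assumes both datasets fit in memory and share a unique key column. At warehouse scale, tools such as the open source data-diff avoid pulling every row locally, for example by comparing checksums of key ranges inside the databases.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def diff_by_key(
    source: List[Row], target: List[Row], key: str
) -> Tuple[List[Any], List[Any], Dict[Any, List[str]]]:
    """Compare two datasets row by row, matching rows on a unique key column."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}

    # Rows that exist on only one side.
    only_in_source = sorted(src.keys() - tgt.keys())
    only_in_target = sorted(tgt.keys() - src.keys())

    # For rows present on both sides, record which columns disagree.
    changed: Dict[Any, List[str]] = {}
    for k in sorted(src.keys() & tgt.keys()):
        columns = src[k].keys() | tgt[k].keys()
        diffs = sorted(c for c in columns if src[k].get(c) != tgt[k].get(c))
        if diffs:
            changed[k] = diffs
    return only_in_source, only_in_target, changed


prod = [{"id": 1, "status": "paid"}, {"id": 2, "status": "open"}, {"id": 3, "status": "open"}]
dev = [{"id": 2, "status": "open"}, {"id": 3, "status": "void"}, {"id": 4, "status": "paid"}]
print(diff_by_key(prod, dev, "id"))  # ([1], [4], {3: ['status']})
```

The three outputs map to the usual diff summary: rows removed, rows added, and rows changed (with the columns responsible).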
Open source data-diff automates data quality checks for data replication and migration.
Learn best practices for how to write and manage dbt tests in your organization.
Data lineage tools provide visibility into how data is connected upstream and downstream within a database.
It's official: Datafold is now SOC 2 Type II compliant. We follow a security-by-design approach in our software development process and are focused on keeping our customers' data safe.
Datafold has partnered with dbt Labs and has launched an integration with dbt to deliver column-level lineage, data diff, and shareable impact reports for analytics engineers.
2021 was a big year for Datafold. We reflect on top feature updates, blogs, and major company announcements from the past year.
Get an overview of Data Quality Meetup #6, with speakers from Yelp, Patreon, Convoy, and Lightdash. The event included lightning rounds on data quality best practices and approaches from leading data-driven companies.
Datafold Founder and CEO, Gleb Mezhanskiy, shares what prompted Datafold's creation, how it has grown, and plans for the future.
What should you be looking for when doing data QA with Data Diff? There are three core checks that can help prevent surprises in production dashboards, and this blog walks you through what you're looking for in each step.
There are plenty of rules around PII, but you can stay on top of where your sensitive data is flowing in your pipelines with column-level lineage.
Bad data cost Samsung and Uber ridiculous sums of money through issues that could have been averted had they invested in data quality management. Read about their mistakes and how you can avoid making the same ones.
If you want column-level lineage but prefer tools like Amundsen or DataHub, Datafold's GraphQL API lets you bring your metadata with you.
Without proactive data quality management, mistakes will happen. How you respond to them can help improve your data quality in the future. Data quality post-mortems are a valuable tool for building better processes and systems, as well as for rebuilding stakeholder trust.
It can be hard to even answer the question "is our data in good shape?" but these teams have gone on a journey towards improved data quality management. Here's how.
Doordash, Rocket Money, Appfolio, Evidently.ai, and Narrator share valuable insights at the fifth Data Quality Meetup hosted by Datafold.
SOC 2 compliance is a major step on our security journey. Here are some lessons we learned, as well as what Datafold's compliance means for your business.
In July 2021, Datafold co-founder and CEO Gleb Mezhanskiy went on the Data Engineering Podcast to share his thoughts about a proactive approach to data quality management.
If you're looking to build the ideal modern data stack for analytics using only open-source options, this is the blog for you. Find all the best open-source alternatives to your favorite paid tools.
Data quality is increasingly a top KPI for data teams, even as multiple sources of data are making it harder to maintain data quality and reliability. These tools can facilitate quality data at every step.
Lightdash is an open-source alternative to Looker that natively integrates with dbt. It may not be as mature as other open-source products like Metabase, Querybook, or Superset, but it is different in a few essential ways.
Learn what steps your team needs to take to improve data quality and get the most out of your data.
Data quality is always evolving, so where is it in 2021? We asked and you answered - here are the results.
Learn what steps your team needs to take to improve data quality and get the most out of your data.
Lyft vs. Shopify in testing ETL at scale, using fake data to align your stakeholders, and how to avoid nuclear meltdowns in your data platform.
Good Data: How Spotify, Shopify & Lyft approach data quality
Why implement regression testing for ETL code changes, how to align data producers and consumers, and what Data teams at Carta, Thumbtack, Shopify & Clari do to solve data quality.
Take your ETL workflow to the next level with the Datafold and dbt integration, which automates data testing and provides column-level data lineage.
The more people that are looking at the data, and the more apps that are using the data, the faster data quality issues will be identified and resolved.
At the second Data Quality Meetup, we discussed three types of data testing and when to apply them, new-generation ETL frameworks, and the ROI of open-source data catalogs.
Over the past 10 years, we've seen a great advancement in technologies and tools for analytics and machine learning: with today’s modern analytics stack, we have fast and scalable data warehouses, dirt-cheap data storage, capable ETL orchestrators, and powerful BI tools.
Unlocking the next level with the most popular ETL orchestrator
Put a comma in the right place
Objective criteria and subjective advice when choosing a data warehouse for analytics.