August 20, 2021
Time travel is a useful feature of some modern data warehouses that allows querying a table as it existed at a particular point in time. Combined with Data Diff, it helps detect data drift in a table by diffing it against an older version of itself. When testing changes in prod vs. dev environments, time travel can also help align both environments on the same state of source data.
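To illustrate what a time-travel read looks like at the SQL level, here is a minimal sketch that builds such a query for Snowflake (`AT(TIMESTAMP => ...)`) and BigQuery (`FOR SYSTEM_TIME AS OF`). The helper name `time_travel_query` and the table name are hypothetical, and this is not how Data Diff is configured; it only shows the warehouse syntax that the feature relies on.

```python
from datetime import datetime, timezone


def time_travel_query(table: str, as_of: datetime, dialect: str) -> str:
    """Build a SELECT that reads `table` as it existed at `as_of`.

    Hypothetical helper for illustration only. `as_of` must be
    timezone-aware so the timestamp is unambiguous.
    """
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # ISO 8601 in UTC, e.g. 2021-08-19T00:00:00+00:00
    ts = as_of.astimezone(timezone.utc).isoformat()
    if dialect == "snowflake":
        return f"SELECT * FROM {table} AT(TIMESTAMP => '{ts}'::timestamp_tz)"
    if dialect == "bigquery":
        return f"SELECT * FROM `{table}` FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}'"
    raise ValueError(f"unsupported dialect: {dialect}")


# Example: yesterday's version of a table, to diff against today's.
snapshot_sql = time_travel_query(
    "analytics.orders",
    datetime(2021, 8, 19, tzinfo=timezone.utc),
    "snowflake",
)
```

Both warehouses limit how far back time travel can reach, so the chosen timestamp has to fall within the table's retention window.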
August 12, 2021
Now it’s possible to automate full impact analysis of every PR to ETL code in GitLab repositories. See how a change in the code will impact the data produced in the current and downstream tables.
More information on how to set it up can be found in the docs.
While the true power of ML-aided alerts comes from monitoring metrics over time, sometimes it may be helpful to check a single value against a set threshold.
August 5, 2021
Datafold will now automatically populate Catalog with column and table descriptions & tags from dbt, Snowflake, BigQuery, Redshift and other systems, creating a unified view.
Additional descriptions can be added using Datafold’s built-in rich text editor.
July 27, 2021
July 16, 2021
Since tags became a really popular way to document tables, columns, and alerts in Catalog, many of you have requested a better way to manage them, including the ability to customize their color to enhance readability. Now all tags can be created, edited, and deleted in the Settings menu.
June 29, 2021
Lineage graphs can often get very complex and messy with all dependencies plotted at once. That’s why by default, Datafold shows a slice of the full lineage graph centered on a particular table (“dim_businesses” in the image below). That means that the graph will show tables and columns directly upstream or downstream of the chosen table. At the same time, downstream tables (“report_hourly_bysiness_pageviews”) may have other upstream dependencies unrelated to the table on which the lineage view is centered. To avoid bloat, those dependencies are shown as dashed lines. Clicking on them will center the lineage graph on the chosen table.
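The slicing described above can be sketched as a small graph traversal. This is an illustrative approximation, not Datafold's implementation: it assumes lineage is a list of `(upstream, downstream)` table pairs, treats everything reachable upstream or downstream of the center as part of the slice, and marks as dashed any edge that enters a downstream table from outside the slice. The function name and the tables `raw_businesses` and `fct_pageviews` are hypothetical.

```python
from collections import defaultdict


def lineage_slice(edges, center):
    """Return (visible_nodes, solid_edges, dashed_edges) for a centered view."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for up, down in edges:
        children[up].add(down)
        parents[down].add(up)

    def reach(start, adjacency):
        # Depth-first walk collecting every node reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    upstream = reach(center, parents)
    downstream = reach(center, children)
    visible = upstream | downstream | {center}

    # Edges fully inside the slice are drawn normally.
    solid = {(u, d) for u, d in edges if u in visible and d in visible}
    # Unrelated upstream dependencies of downstream tables are only hinted at.
    dashed = {(u, d) for u, d in edges if d in downstream and u not in visible}
    return visible, solid, dashed


edges = [
    ("raw_businesses", "dim_businesses"),
    ("dim_businesses", "report_hourly_bysiness_pageviews"),
    ("fct_pageviews", "report_hourly_bysiness_pageviews"),
]
visible, solid, dashed = lineage_slice(edges, "dim_businesses")
```

In this example `fct_pageviews` stays out of the visible slice, and its edge into the report table is the one that would be rendered as a dashed line.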
May 28, 2021
Sometimes it may be helpful to compare columns with a threshold instead of strict equality. For instance, when a database column is a FLOAT computed as a division of aggregates (e.g. COUNT(*) / SUM(someFloatCol)), the results of the computation are not strictly deterministic. Under strict equality, the diff would flag differences that are irrelevant from the business standpoint, such as 1.1200033 vs. 1.1200058. Diff tolerance allows you to specify an absolute or relative threshold below which differing values are considered equal.
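As a rough illustration of the two threshold types, the sketch below treats two values as equal when their difference is within an absolute tolerance or within a relative tolerance scaled by the larger magnitude. The function name and parameters are hypothetical, and the exact comparison Datafold applies may differ.

```python
def within_tolerance(a: float, b: float, *, absolute: float = 0.0,
                     relative: float = 0.0) -> bool:
    """Return True if a and b are equal up to the given tolerances.

    With both tolerances at 0.0 this reduces to strict equality.
    """
    diff = abs(a - b)
    if diff <= absolute:
        return True
    # Relative tolerance is measured against the larger of the two magnitudes.
    scale = max(abs(a), abs(b))
    return diff <= relative * scale


strict = within_tolerance(1.1200033, 1.1200058)                    # strict equality
abs_ok = within_tolerance(1.1200033, 1.1200058, absolute=1e-5)     # absolute threshold
rel_ok = within_tolerance(1.1200033, 1.1200058, relative=1e-5)     # relative threshold
```

An absolute threshold suits columns with a known scale, while a relative threshold adapts to columns whose values span several orders of magnitude.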
When entering tags, you can rely on autocomplete to avoid creating semantically similar tags:
May 18, 2021
Fixed saving datasources and CI integrations with empty cron schedule.
May 13, 2021
On-prem deployments now require an install password at first install used to check the state of the CI process.
May 7, 2021
Streamlined UI with more settings
May 1, 2021
April 23, 2021
The dbt configuration now presents a list of accounts to choose from instead of requiring the account name to be entered manually.
April 20, 2021
April 15, 2021
Lineage: multiple small bugfixes
April 12, 2021
April 9, 2021
April 7, 2021
Instead of querying the entire SQL query history, Datafold now looks at only new queries and updates the lineage graph incrementally. This currently works for Snowflake and BigQuery.
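The general idea of an incremental update can be sketched with a watermark: remember the timestamp of the newest query already processed and only fold later queries into the graph. This is a simplified illustration under assumed data shapes, not Datafold's implementation; `QueryRecord`, `LineageGraph`, and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set, Tuple


@dataclass
class QueryRecord:
    finished_at: datetime
    sources: Tuple[str, ...]   # tables the query read from
    target: str                # table the query wrote to


@dataclass
class LineageGraph:
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    watermark: Optional[datetime] = None  # newest query already processed

    def update(self, history) -> int:
        """Add edges for queries newer than the watermark; return how many were used."""
        processed = 0
        newest = self.watermark
        for query in history:
            # In a real system this filter would be pushed into the
            # query-history lookup instead of scanning every record.
            if self.watermark is not None and query.finished_at <= self.watermark:
                continue
            for source in query.sources:
                self.edges.add((source, query.target))
            processed += 1
            if newest is None or query.finished_at > newest:
                newest = query.finished_at
        self.watermark = newest
        return processed


graph = LineageGraph()
history = [QueryRecord(datetime(2021, 4, 6), ("raw_a",), "stg_a")]
first_run = graph.update(history)
history.append(QueryRecord(datetime(2021, 4, 7), ("stg_a",), "dim_a"))
second_run = graph.update(history)  # only the new query is processed
```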
Now supports browsing super-wide (100+ col) tables without any interface lags.