Changelog

v1.19.0

August 20, 2021

Data Diff time travel in BigQuery and Snowflake

Time travel is a useful feature of some modern data warehouses that allows querying a table as it existed at a particular point in time. Combined with Data Diff, it can help detect data drift in a table by diffing the table against an older version of itself. When testing changes in prod vs. dev environments, time travel can also align both environments on the same state of source data.
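For illustration, the sketch below builds the kind of time-travel query each warehouse accepts, using BigQuery's `FOR SYSTEM_TIME AS OF` clause and Snowflake's `AT(TIMESTAMP => ...)` clause. The helper name `time_travel_query` and its parameters are hypothetical; this is not how Datafold constructs its queries. Diffing the result of this query against a plain `SELECT * FROM table` compares a table with its own past state.

```python
from datetime import datetime, timezone


def time_travel_query(dialect: str, table: str, as_of: datetime) -> str:
    """Hypothetical helper: SELECT a table as it existed at `as_of`.

    `as_of` should be timezone-aware; it is normalized to UTC.
    """
    utc = as_of.astimezone(timezone.utc)
    if dialect == "bigquery":
        # BigQuery: FOR SYSTEM_TIME AS OF <timestamp literal> after the table name
        ts = utc.isoformat(sep=" ", timespec="seconds")
        return f"SELECT * FROM `{table}` FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}'"
    if dialect == "snowflake":
        # Snowflake: AT(TIMESTAMP => ...) with an ISO 8601 timestamp cast to TIMESTAMP_TZ
        ts = utc.isoformat(timespec="seconds")
        return f"SELECT * FROM {table} AT(TIMESTAMP => '{ts}'::TIMESTAMP_TZ)"
    raise ValueError(f"unsupported dialect: {dialect}")
```

Both warehouses only retain history for a limited window, so `as_of` must fall within the table's configured time-travel retention.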

v1.18.8

August 12, 2021

GitLab support for Data Diff

It is now possible to automate full impact analysis of every PR to ETL code in GitLab repositories: see how a change in the code will impact the data produced in the current and downstream tables.

More information on how to set it up can be found in the docs.

Added support for alerts on scalar values

While the true power of ML-aided alerts comes from monitoring metrics over time, it is sometimes helpful to check a single value against a set threshold.
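Conceptually, a scalar alert reduces to a bounds check on one value. The sketch below assumes optional lower and upper bounds; the function name and parameters are hypothetical and do not describe Datafold's alert configuration.

```python
from typing import Optional


def scalar_alert_fires(value: float,
                       lower: Optional[float] = None,
                       upper: Optional[float] = None) -> bool:
    """Hypothetical check: True if `value` falls outside the configured bounds."""
    if lower is not None and value < lower:
        return True  # below the minimum threshold
    if upper is not None and value > upper:
        return True  # above the maximum threshold
    return False  # within bounds (or no bounds set)
```

For example, a query returning the count of rows with a NULL primary key could alert when `upper=0` is exceeded.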

v1.18.0

August 5, 2021

Catalog learns about your data from everywhere

Datafold will now automatically populate Catalog with column and table descriptions & tags from dbt, Snowflake, BigQuery, Redshift and other systems, creating a unified view.

Additional descriptions can be added using Datafold’s built-in rich text editor.

v1.17.1

July 27, 2021

Primary keys for dbt models in the Data Diff CI integration can now be specified at the table level

  • Errors and warnings are now collapsed in GitHub/GitLab comments to avoid bloat
  • Improved performance of the Catalog search filter
  • Improved handling of dbt Cloud retries: Datafold now retries 4 times after receiving 500 errors from the dbt Cloud service, for up to 4 seconds
  • The data source log extractor for lineage can now run on a cron schedule
  • Alerts now show the last-modified timestamp
  • Improved crontab validation: removed the once-an-hour restriction on scheduling
  • It is now possible to disable alert query notifications
  • Catalog now shows the timestamp when the dataset was last modified
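The dbt Cloud retry entry above can be read as a bounded retry policy. The sketch below assumes "4 retries, with total waiting capped at 4 seconds" and uses exponential backoff starting at 0.25 s; the backoff schedule, `ServerError`, and `call_with_retries` are all hypothetical and not Datafold's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical: raised by the wrapped call on an HTTP 5xx response."""


def call_with_retries(call: Callable[[], T],
                      max_retries: int = 4,
                      max_total_wait: float = 4.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    waited = 0.0
    delay = 0.25  # assumed initial backoff; doubles after each failure
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ServerError:
            # Give up once retries are exhausted or the wait budget would be exceeded.
            if attempt == max_retries or waited + delay > max_total_wait:
                raise
            sleep(delay)
            waited += delay
            delay *= 2
    raise AssertionError("unreachable")
```

With these defaults the waits are 0.25, 0.5, 1.0, and 2.0 seconds, so all four retries fit inside the 4-second budget. Injecting `sleep` keeps the policy testable without real delays.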

v1.16.0

July 16, 2021

Customizable Tags

Since tags have become a popular way to document tables, columns, and alerts in Catalog, many of you have requested a better way to manage them, including the ability to customize their color for readability. Tags can now be created, edited, and deleted in the Settings menu.

Improvements:

  • Improved profiler reliability

v1.15.0

June 29, 2021

Interactive external dependencies

Lineage graphs can get complex and messy when all dependencies are plotted at once. That’s why, by default, Datafold shows a slice of the full lineage graph centered on a particular table (for example, “dim_businesses”). The graph shows the tables and columns directly upstream or downstream of the chosen table. However, downstream tables (such as “report_hourly_bysiness_pageviews”) may have other upstream dependencies unrelated to the table on which the lineage view is centered. To avoid bloat, those dependencies are shown as dashed lines; clicking one re-centers the lineage graph on that table.
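The slicing described above can be sketched as a graph traversal. This sketch assumes "directly upstream or downstream" means every table reachable along a lineage path through the centered table; the function names and edge representation are hypothetical and not Datafold's implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (upstream_table, downstream_table)


def _reachable(start: str, adj: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: all nodes reachable from `start` via `adj`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage_slice(edges: Iterable[Edge], center: str) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (solid, dashed) edges for a lineage view centered on `center`."""
    down: Dict[str, Set[str]] = defaultdict(set)
    up: Dict[str, Set[str]] = defaultdict(set)
    for u, d in edges:
        down[u].add(d)
        up[d].add(u)
    ancestors = _reachable(center, up)
    descendants = _reachable(center, down)
    in_slice = ancestors | descendants | {center}
    # Solid: edges whose endpoints both lie on a path through the center.
    solid = {(u, d) for u in in_slice for d in down[u] if d in in_slice}
    # Dashed: other inputs of downstream tables that are unrelated to the center.
    dashed = {(p, d) for d in descendants for p in up[d] if p not in in_slice}
    return solid, dashed
```

Re-centering on a dashed dependency is then just another call to `lineage_slice` with that table as `center`.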

v1.13.0

May 28, 2021

Per-column Data Diff Tolerances

Sometimes it is helpful to compare columns against a threshold instead of requiring strict equality. For instance, when a FLOAT column is computed as a division of aggregates (e.g. COUNT(*) / SUM(someFloatCol)), the result is not strictly deterministic: floating-point addition is not associative, so aggregating rows in a different order can change the last digits. This produces differences that are irrelevant from a business standpoint but would be flagged by a strict-equality diff, such as 1.1200033 vs. 1.1200058. Diff tolerance allows you to specify an absolute or relative threshold below which differing values are considered equal.
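The two tolerance modes can be sketched as follows. The function name, the `mode` values, and the choice of `max(|a|, |b|)` as the scale for relative tolerance are assumptions for illustration, not Datafold's exact semantics (Python's own `math.isclose` uses the same relative scale but combines it with an absolute floor).

```python
def values_match(a: float, b: float, mode: str = "absolute", tolerance: float = 0.0) -> bool:
    """Hypothetical per-column comparison with an absolute or relative tolerance.

    A tolerance of 0 reduces to strict equality.
    """
    diff = abs(a - b)
    if mode == "absolute":
        # Equal if the raw difference is within the threshold.
        return diff <= tolerance
    if mode == "relative":
        # Equal if the difference is small relative to the larger magnitude.
        return diff <= tolerance * max(abs(a), abs(b))
    raise ValueError(f"unknown tolerance mode: {mode}")
```

Using the values from the example above, the difference is about 0.0000025, so an absolute tolerance of 0.00001 treats the two values as equal while strict equality does not.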

Tags autocomplete

When entering tags, you can rely on autocomplete to avoid creating semantically similar tags.

Improvements:

  • Fixed a bug that prevented admins from sending password reset emails
  • "Discourage manual profiling" flag added to data source settings. If the flag is set, when the user tries to refresh a data profile, a warning popup will appear.

v1.12.8

May 18, 2021

Fixed saving data sources and CI integrations with an empty cron schedule.

v1.12.0

May 13, 2021

On-prem deployments now require an install password at first install used to check the state of the CI process.

v1.11.0

May 7, 2021

New Data Diff UI & Landing Page

Streamlined UI with more settings

Improvements:

  • The application now posts update messages when waiting for dbt runs to finish.
  • Added an API endpoint to get the status of CI runs, which can be used to check the state of a CI process.
  • Crontab schedules now use standard cron notation
  • Fixed a bug where the dbt meta schedule stopped working

v1.11.0

May 1, 2021

New application root page

  • The CI config ID is now visible in the CI settings screen
  • The dbt CLI can now be used to post manifests to Datafold, so that Datafold can run diffs the same way as in the dbt Cloud integration
  • Documentation is now available from the header in the app
  • Fixed a bug where the dbt Cloud account number was passed as a string

v1.9.2

April 23, 2021

The dbt configuration now presents a list of accounts instead of requiring the account name to be entered manually.

v1.9.0

April 20, 2021

Automatic dbt docs sync to Datafold Catalog

  • Fixed a bug where Snowflake timezone-aware fields were compared against timezone-naive instances
  • Search: added Select all / Deselect all to the data source filter
  • Updated loading indication when loading data source schema
  • Search: the user is redirected to the /search page when no results are found in as-you-type mode
  • Updated usage of URL params for search
  • Search: the tree and sidebar are now responsive (they expand if schema names don't fit into the width)
  • Updated scrolling UX
  • Profiler: removed experimental_ guards from new profiling and sampling UIs
  • Profiler: fixed an issue with DATE & DATETIME for Snowflake table profiles

v1.8.9

April 15, 2021

  • Lineage: fixed hanging PostgreSQL query due to query planner misoptimization
  • Lineage: hotfix for Snowflake + dbt

v1.8.8

April, 2021

Lineage: multiple small bugfixes

v1.8.7

April 12, 2021

  • Lineage: support for Snowflake semi-structured data
  • Lineage: fixed a bug where some parts of the graph were not displayed
  • Profiling: bugfix in settings
  • Data Diff: fixed handling of time datatype
  • Data Diff: soft-fail on inf and NaN float values
  • Made sure that CI data diffs are resilient to server-side interruptions

v1.8.6

April 9, 2021

  • Correctly display arrays and maps in profiler sample
  • Several bugfixes in lineage UI
  • Fixes in the color scheme
  • Added support for incremental SQL log fetching to build column-level lineage
  • Several fixes in the lineage query parser

v1.8.5

April 7, 2021

Incremental Column-level Lineage

Instead of querying the entire SQL query history, Datafold now looks only at new queries and updates the lineage graph incrementally. This currently works for Snowflake and BigQuery.
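Incremental processing of a query log is commonly done with a high-water mark: remember the latest query timestamp already processed and only fetch queries that finished after it. The sketch below assumes that approach; `LoggedQuery` and `new_queries` are hypothetical names, and SQL parsing into lineage edges is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LoggedQuery:
    query_id: str
    end_time: datetime
    sql: str


def new_queries(log: Iterable[LoggedQuery],
                watermark: Optional[datetime]) -> Tuple[List[LoggedQuery], Optional[datetime]]:
    """Return queries newer than `watermark` (oldest first) and the advanced watermark."""
    fresh = sorted(
        (q for q in log if watermark is None or q.end_time > watermark),
        key=lambda q: q.end_time,
    )
    # Advance the watermark only if something new was seen.
    new_watermark = fresh[-1].end_time if fresh else watermark
    return fresh, new_watermark
```

Only the returned queries would then be parsed and merged into the existing lineage graph. A strict `>` comparison skips log entries that arrive late with a timestamp at or before the watermark, so a production extractor would need some overlap or deduplication to handle that case.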

Faster Column Profiler

The profiler now supports browsing very wide tables (100+ columns) without interface lag.