Changelog
Datafold Product Newsletter September 2023 — 🍂 Falling into data conferences 🍂
As we say goodbye to the last few days of summer, we start saying hello to #conferenceseason 🎃. This fall, you can expect to see the Datafold team IRL at 3️⃣ upcoming conferences!
On the product front, the Datafold team has been hard at work improving your experiences with data diffing, Datafold Cloud, and new product innovations. Here’s an overview of what’s new:
- 1️⃣ Datafold Cloud Looker Integration
- 2️⃣ Automatic primary key inference for incremental and snapshot dbt models
- 3️⃣ Less noise, more signal in the enhanced Datafold Cloud CI printout
🐦❤️👁️ Datafold Cloud Looker Integration
ICYMI we launched the Datafold Cloud Looker Integration: bringing enhanced lineage and impact analysis to your dbt project and beyond. Using the Looker integration, you can:
- Visualize Looker assets (Explores, Views, Dashboards, and Looks) in Datafold’s column-level lineage
- See potentially impacted Looker assets from your dbt code change in the Datafold CI comment

Yes, we think this is some very cool tech (what can we say, we’re a bit biased 😂). But more importantly, we think this means you stop getting those “you broke my dashboard” DMs 😉.
⚡Automatic primary key inference for incremental and snapshot models
Previously, Datafold Cloud identified primary keys from an additional YAML config or the dbt uniqueness test. Now, when you define a unique_key in your dbt model config, Datafold Cloud will automatically infer that it is the primary key to use for Datafold’s diffing. Unique keys defined in dbt can be either single-column or composite keys. This is particularly useful for more complex incremental and snapshot models, where you may want to define a unique key but not test uniqueness in dbt.
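To make the single-versus-composite distinction concrete, here is a minimal sketch of how a unique_key value could be normalized into a list of primary key columns. The function name `infer_primary_key` and the plain-dict config shape are hypothetical, chosen for illustration; this is not Datafold’s actual implementation.

```python
from typing import Any


def infer_primary_key(model_config: dict[str, Any]) -> list[str]:
    """Return primary key columns from a (hypothetical) dbt model config dict.

    dbt accepts `unique_key` as a single column name or a list of names,
    so both shapes are normalized to a list here.
    """
    unique_key = model_config.get("unique_key")
    if unique_key is None:
        return []  # nothing to infer; fall back to other configuration
    if isinstance(unique_key, str):
        return [unique_key]  # single-column key
    return list(unique_key)  # composite key


print(infer_primary_key({"materialized": "incremental", "unique_key": "order_id"}))
print(infer_primary_key({"unique_key": ["order_id", "line_number"]}))
```

An empty list signals that no key could be inferred from the model config alone.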

🔊 Enhanced Datafold Cloud CI printout: Goodbye noise, hello signal
The Datafold CI comment will soon highlight which values differ between dev and prod by pulling them to the top of the comment. This will reduce alert fatigue and make it much easier to see whether your code changes will change the data (and how) or keep it the same.
Rows, columns and PKs that are not different will be grouped together under the NO DIFFERENCES dropdown.
Please note that this feature is currently being rolled out to existing customers over the next few weeks.

👋 Come see the Datafold team IRL at upcoming conferences!
We’d love to hear about your data quality pain points and wins face-to-face at some upcoming data conferences. The Datafold team will be present at the following:
- 🇬🇧 Big Data London (Sept. 20-21) — get your FREE tickets here! Our team will be at booth #552 ready to talk all things data quality, UK football, and best pubs in London 😉
- 🏖️ Coalesce Conference San Diego (Oct. 16-19) — tickets can be found here. We’ll be at booth #108 offering tips on data quality and some cool refreshments 🍹 And don’t forget to join us at our Coalesce After Party with Hightouch, Airbyte, Databricks, Secoda, and Hex 🥳!

- 🎡 …and we couldn’t get enough of London, so we’ll be back in the UK for Coalesce London (Oct. 17). Get your tickets here.
Oh, and did we mention we’ll have some fun swag there for folks who come by and say hi 😎 We can’t wait to see you there!
Downstream Looker assets in Pull Requests and Lineage

When you make a change to your dbt project, how do you make sure Looker Views, Explores, Looks, and Dashboards don’t unexpectedly change—breaking data pipelines, business processes, and stakeholder trust?
🙁 Opening many tabs and fiddling with dashboards
😭 CTRL+F-ing your LookML repository
🗯️ Asking teammates on Slack
🤔 🤷♀️ 🥲
Starting today, your answer to that can be simple: “Datafold.”
We’ve launched a Looker integration that shows Datafold Cloud users the Looker Views, Explores, Looks, and Dashboards that could be impacted by your dbt models.
These Looker assets will be visible in Column-Level Lineage in the Datafold Cloud UI …

… as well as right within your pull request:

What about dbt Exposures? What about Spectacles?
- dbt Exposures require manual configuration, which is not scalable or automated. Datafold Cloud’s Looker integration Just Works™️.
- Spectacles in CI will tell you if your LookML is broken, but not if the data changed. This is like a dbt build in CI which is successful, but the data is wrong. Datafold and Spectacles work great as side-by-side partners to ensure you’re only allowing the highest quality data into your BI tool.
Is this only for dbt models?
Nope — Looker assets that are downstream of any data warehouse object will appear in Datafold Cloud Column Level Lineage.
Enough! How can I get started today?
To get started with the Datafold Cloud Looker integration, please reach out to a Datafold Solutions Engineer who can get you set up. You can also check out our docs 📚 to see how simple it is to begin.
VS Code extension, improved Datafold Cloud CI, and upcoming launches
💡 Did you know that Datafold Cloud’s column-level lineage includes assets outside of your dbt project? Because Datafold’s column-level lineage is built from your data warehouse’s query history (and not your dbt project’s manifest.json), you get a full view of how your data moves through your ecosystem, including all those dbt models, ad hoc tables built by analysts, and BI tool assets.
Now back to our scheduled program… the Datafold team has been hard at work improving your experiences with data diffing, Datafold Cloud, and new product innovations. Here’s an overview of what’s new:
1️⃣ Datafold VS Code Extension
2️⃣ Quality of life product improvements in Datafold Cloud (intuitive column remapping in CI and saying goodbye 👋 to stuck CI)
3️⃣ Some very exciting product launches on the horizon (hint: BI tool integrations in Datafold Cloud)
🚀 Datafold VS Code extension
ICYMI we launched the Datafold VS Code extension: a powerful new developer tool bringing data diffing directly to your dev environment. Use the Datafold VS Code extension to quickly run and diff dbt models in a clean GUI, and develop dbt models with confidence and speed.
In addition, by installing the Datafold VS Code extension, you’ll receive free 30-day trial access to value-level differences—a Datafold Cloud exclusive (❗) feature. Join us in the #tools-datafold channel in the dbt Community Slack for feedback and any questions about this 🙂.
☁️ Datafold Cloud improvements
Column remapping in CI comments
If your PR includes updates to column names, you can specify these updates in your git commit message using the following syntax: datafold remap old_column_name new_column_name. That way, Datafold will understand that the renamed column should be compared to the column in the production data with the original name 🙏.
Without the remapping, a renamed column looks like one column being removed and another being added. With it, Datafold recognizes that the column has simply been renamed.

In the example above, the column sub_plan is renamed to plan, and Datafold recognizes these are the same column with this commit message. This feature is particularly useful if there are changes to upstream data sources that impact many downstream models.
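The commit message syntax can be pictured with a small parser. This is a sketch under stated assumptions: the regular expression, the function name `parse_column_remaps`, and the idea that several directives may appear in one message are illustrative and not taken from Datafold’s code.

```python
import re

# Assumed directive shape: "datafold remap <old_column_name> <new_column_name>"
REMAP_PATTERN = re.compile(r"datafold remap (\S+) (\S+)")


def parse_column_remaps(commit_message: str) -> dict[str, str]:
    """Map each original (production) column name to its new name."""
    return {old: new for old, new in REMAP_PATTERN.findall(commit_message)}


message = "Rename subscription plan column\n\ndatafold remap sub_plan plan"
print(parse_column_remaps(message))
```

With a mapping like this, a diff can compare the new column against the production column that still carries the original name, instead of reporting a removal plus an addition.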
Faster, leaner, and smarter Datafold in CI
Datafold is all about giving you the information you need, where and when you need it, as soon as possible. That includes getting out of the way quickly when it's not yet time to data diff. Now, when your dbt PR job does not complete for any reason, Datafold will detect that right away and cancel itself, allowing your CI checks to complete. Everyone loves faster (unstuck) CI!

👀 Coming soon - betas and upcoming launches
Keep an eye out for exciting developments on:
- 📈 Evolved lineage with Looker and Tableau integrations in Datafold Cloud. If your team is interested in seeing the Looker integration live, come join us at an upcoming Datafold Cloud Demo!
- 🔀 Cross-data warehouse diffing for accelerated database migrations and validating data replication.
- ...and more!
Happy diffing!
🆕 Announcing the Datafold VS Code Extension
We’ve launched the Datafold VS Code Extension—a new developer experience tool that’s integrating data quality testing, data diffing, and Datafold into your development workflow.
The VS Code extension is an enhancement of the open source data-diff product from Datafold. Using the extension, you can easily install open source data-diff, run your dbt models, and see immediate diffing results between your dev and prod environments in a clean GUI—all within your VS Code IDE.

⬇️ Install the Datafold VS Code Extension
You can install the Datafold VS Code Extension by using the VS Code Extension tab.

💻 Data diff a dbt model using the GUI
Once you’ve followed the simple steps in our documentation to get started, you’ll be able to diff any dbt model or set of models using the simple GUI.
First, open the Datafold Extension by clicking on the Datafold bird icon on the left-hand side of your VS Code window. Then, click on any model's "play" button to run a data diff between the development and production versions of that model.
💡 Be sure to dbt build or dbt run any models that you plan to edit or diff, to ensure relevant development data models and dbt artifacts exist.

⚒️ Data diff your most recent dbt run or build
You can also use the “Datafold: Diff latest dbt run results” command in the VS Code command palette. This lets you automatically diff the group of models that were built in the last dbt build or dbt run.
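To see which models a run like this would cover, you can inspect dbt’s run_results.json artifact yourself. The sketch below relies on the artifact’s `results`, `unique_id`, and `status` fields; the helper names and the choice to keep only successful models are assumptions, since how the extension selects models is not described here.

```python
import json
from pathlib import Path


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact written by the last dbt invocation."""
    return json.loads(Path(path).read_text())


def successful_models(run_results: dict) -> list[str]:
    """List unique_ids of models that built successfully in the last run."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("unique_id", "").startswith("model.")
        and result.get("status") == "success"
    ]


sample = {
    "results": [
        {"unique_id": "model.shop.orders", "status": "success"},
        {"unique_id": "model.shop.customers", "status": "error"},
        {"unique_id": "test.shop.not_null_orders_id", "status": "pass"},
    ]
}
print(successful_models(sample))
```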

🔎 Explore value-level data diff results
By installing the Datafold VS Code extension, you’ll receive free 30-day trial access to value-level differences, a Datafold Cloud exclusive feature (❕). To see and interact with value-level differences, click on the blue "Explore values diff" next to the "Values" section.


👁️ Data diff in real time as you develop with Watch Mode
In the settings of the Datafold VS Code extension, you can enable "Diff watch mode." With watch mode on, the Datafold VS Code Extension will automatically run diffs after each dbt invocation that changes the run_results.json of your dbt project. Turn on this setting if you want diffs to be automatically run between changed dbt models.


🎥 Demo video
Watch Datafold Solutions Engineer Sung Won Chung install and use the Datafold VS Code extension!
📖 Resources
For additional resources, please check out the following:
- Detailed docs on the Datafold VS Code functionality
- Blog post on why we built this extension, and where we see it going in the future
Happy diffing!
Skip diffs, advanced filters, and a beta Looker integration!
We’re excited to share some new product updates that give you greater control over what gets diffed, how you interact with diffs and their results with advanced filtering, and identifying how code changes impact your BI tools.
Here’s an overview of what’s new:
- Skip diffs with commit messages
- More powerful values filtering
- Datafold’s new Looker integration pre-release
⏩ Skip diff functionality with commit message
We get it—not every commit needs a diff! Now, you can choose to skip a diff generated by a commit by adding this string (datafold-skip-ci) in your commit message. By adding this string anywhere in your commit message, your commit will not trigger a Datafold CI run.
This feature is particularly useful if you’re adding in hotfix commits, committing many commits back-to-back in a short timeframe, or looking to reduce compute costs from unnecessary diff runs.
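If you script your own pre-checks around CI, the rule is easy to mirror. A minimal sketch, assuming a plain substring match; the function name `should_skip_diff` is hypothetical.

```python
SKIP_MARKER = "datafold-skip-ci"


def should_skip_diff(commit_message: str) -> bool:
    """True when the marker appears anywhere in the commit message."""
    return SKIP_MARKER in commit_message


print(should_skip_diff("Hotfix: typo in README (datafold-skip-ci)"))
print(should_skip_diff("Add revenue column to orders model"))
```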

➖ Negative filtering in search
Never has filtering been more intuitive! We recently added functionality for negative search in Lineage data explorer. Using negative search, by adding a dash (-) before the term to exclude, you can more easily filter on specific patterns of schemas (compared to deselecting those that don’t meet your criteria). We’ve additionally added support for * and ? wildcards, where * matches any number of characters and ? matches any single character.
Examples:
- ORG_ACTIVITY -DEV will match any asset name that contains ORG_ACTIVITY and does not include the string DEV
- RUDDERSTACK*MARKETING will match any asset name that contains RUDDERSTACK followed by MARKETING at any point in the string
- PR_???_ will match any asset name that contains PR_ followed by any 3 characters and _ . For example, PR_???_ will return PR_123_ and exclude PR_12_ from search results
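The matching rules above can be approximated in a few lines using Python’s standard fnmatch module. This is an illustrative sketch, not the search implementation in Datafold: the function name `matches_search`, the case-insensitive comparison, and the treatment of each whitespace-separated term as a "contains" pattern are all assumptions.

```python
from fnmatch import fnmatchcase


def matches_search(asset_name: str, query: str) -> bool:
    """Check an asset name against space-separated search terms.

    A term prefixed with "-" must NOT match; every other term must match.
    "*" matches any number of characters, "?" matches exactly one.
    """
    name = asset_name.upper()  # assumption: search ignores case
    for term in query.upper().split():
        negated = term.startswith("-") and len(term) > 1
        pattern = term[1:] if negated else term
        # Wrap in "*" so the pattern may appear anywhere in the name.
        found = fnmatchcase(name, f"*{pattern}*")
        if found == negated:
            return False
    return True


print(matches_search("ORG_ACTIVITY_PROD", "ORG_ACTIVITY -DEV"))
print(matches_search("ORG_ACTIVITY_DEV", "ORG_ACTIVITY -DEV"))
print(matches_search("PR_123_ORDERS", "PR_???_"))
print(matches_search("PR_12_", "PR_???_"))
```

Note that fnmatch also gives square brackets a special meaning, which the search syntax described above does not mention.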

💥 More powerful diffs and values filtering
We’ve added new filtering capabilities in your diffs log and values-level diffs to make searching for diffs (and potential errors) faster and easier.
Quickly filter for diffs with differences
Using the Result filter in your log of Data Diffs, simply filter on Different to find only diffs where there were differences.

Filter columns in the UX
For Data Diff results with many differing columns, you’re now able to search and filter columns at scale—no more never-ending scrolling to the right to find the column you need!
To use, open the “Show Filters” menu to select and sort across your diff results at scale.

Filter columns by value
For easier value-level diff navigation, you can filter on specific column-values. Simply click on the gray filter symbol to the right of a column name and input the value you want to filter for. For example, look for a diff based on a primary key that’s giving you an issue!

👭 Join our Looker integration beta!
We’re very excited to share a pre-release of Datafold's new Looker integration! If your business uses Looker for reporting, you can enable Looker Views in Datafold’s lineage explorer and see potentially impacted Looker Views in Datafold’s CI comment—bringing impact analysis beyond your dbt project.
If you’re interested in trying out the new Datafold Cloud Looker integration early, please sign up here.
👀 Coming soon
Keep an eye out for exciting developments on:
- 💻 Enhanced developer experience with our ✨new✨ VS Code Extension: click here if you would like to be a beta tester 🧑🔬
- 🔀 Cross-data warehouse diffing (pssst if you're interested in trying the alpha for this, please respond to the product newsletter email or email gleb@datafold.com)
- 📈 More BI tool integrations
- ...and more!
Data Diff Management + Version Control Integration
We’ve increased the amount of context available from your GitHub and GitLab integrations in the Datafold user interface so you can more clearly understand the relationship between your diffs and specific commits and pull requests.
Filter Data Diffs by the pull request creator
Easily filter Data Diffs by GitHub or GitLab username, then trace each diff back to the specific pull request or commit that triggered it.

Data Diffs grouped by pull request
Data Diffs are now grouped by pull request, making it easier to navigate between your pull request, commit history, and the associated diffs, and to track changes and validation over time.

Grouped diff deletion and cancellation
You can now select a group of diffs and click the Delete Data Diff or Cancel Data Diff options in the top right section of the page.

Streamlined Data Diff Results
We’ve shortened the feedback loop in our results pages to rapidly show more relevant information. For example, you’ll now see column-level metrics earlier in the CI/CD and command-line data-diff results. We’ll also show downstream app dependencies for each Data Diff within the UI, allowing you to quickly get the appropriate lineage for a given downstream dependency.
data-diff --dbt
We have released the dbt integration for our open-source data-diff tool. Data-diff helps to quantify the difference between any two tables in your database. You can now see the data impact of dbt code changes directly from your command line interface. No more ad-hoc SQL queries or aimlessly clicking through thousands of rows in spreadsheet dumps.
If you use dbt with Snowflake, BigQuery, Redshift, Postgres, Databricks, or DuckDB, try it out and share your feedback. It only takes a couple of settings and a one-line command to see your data diffs in development:
dbt run --select <model(s)> && data-diff --dbt
This shows the state of your data before and after your proposed code change:

Use this dbt + data-diff integration to quickly ensure your code changes have the intended effect before opening up a pull request.
We’re excited to hear your feedback via the project’s GitHub project page or the #tools-datafold channel in the dbt Slack community.
Diffing Hightouch models in CI/CD
We’ve all been there - accidentally breaking downstream dependencies that don’t live in your warehouse. How could you possibly have known what another team’s pesky filter was going to be? Well your business intelligence, marketing & operations teams can sleep more soundly knowing that your data team has full visibility into these types of breaking changes, and can prevent them before they happen.
Ship faster and more confidently after Datafold compiles and materializes Hightouch models based on each branch of a change, and then diffs them to flag any potential changes in the query output. We see teams moving towards faster and actionable data pipelines every day, so confidence in every change is vital to keeping your team humming along.
Diffing Hightouch models will show up alongside your standard diff results in Github comments and within the Datafold app.
To get started, first configure your Hightouch account within Datafold. Then, since this feature is still in beta, opt-in here to enable it!
CI Jobs Management
It’s now even easier to manage CI runs within Datafold, and we’ve added several navigation improvements. The goal is to make it even easier to manage your CI jobs at scale.
- First, find and filter for your CI job quickly via the Datafold Jobs tab, which is now visible to all users.
- Status Page - We’ve added a more detailed CI job status page with a breakdown of individual steps and results
- Cancel + Rerun CI Jobs - You can now easily cancel running jobs, or rerun jobs within the Datafold user interface.
All users now have access to the Jobs user interface, where they can see a CI Job results page and view the individual Data Diffs associated with it. Each CI Job results page shows the status of all data diffs and intermediate steps, and lets you cancel an active CI Job run. We’re excited to hear your feedback.
Clearer Diff Sampling Logic
Sampling diff results is helpful for speedy and efficient checks of extremely large data sets. However, sometimes you need to ensure 100% test coverage of every single row, even for large data sets. To assist, we’ve added more clarity to when and where data will be sampled during the in-app diff creation workflow.
You can now explicitly disable sampling for a diff. For users running data migrations, where diffing the entirety of a dataset is required for user acceptance testing, this is now clearer and easier.
Introducing Slim Diff in CI/CD
- Slim Diff helps teams prioritize business-critical models in CI/CD workflows by giving them control over exactly which models to diff on each pull request. When enabled, Slim Diff runs data diffs only for specified models based on dbt metadata, and skips models that aren’t explicitly tagged or are excluded from data diffing.
Column Remapping in Data Diff creation flow
- Quickly remap columns within the Data Diff UI or API creation flow for known column name changes to ensure all columns are compared correctly.

Schema Comparison Sorting
- Faster schema comparisons to see what changed inline, especially when column order has changed.

Cancel In-Progress Data Diffs
- You can now quickly cancel currently running diffs from both the Data Diff results and the administrator interface. As before, you can also cancel all diffs within a CI run from the same administrator interface.

Globally exclude tables from CI/CD diffs
- Use your dbt metadata to exclude particular folders or models from being tested against in CI/CD workflows. Use cases vary from excluding sensitive tables to unsupported downstream usages. Your data team can configure Datafold to be aligned with their priorities.
Lightning-fast in-database comparisons for the data-diff library + DuckDB support
- Have you ever wanted to quickly and easily get a diff comparison of two tables in your dbt development workflow? Now you can! Our wonderful Solutions Engineers spun up a tutorial on how to use our open-source data-diff library to find potential bugs that unit testing or monitoring would have missed.
- Additionally, our data-diff community contributors have continued to improve the product - including adding DuckDB support. We appreciate the support @jardayn!
- The latest release of Datafold’s free, open-source data-diff library is optimized for even faster Data Diffs within the same database. Compare any two tables within a warehouse and receive a detailed breakdown of schema, row and column differences.
Improved Diff Results Sorting and Filtering
- We’ve added improved sorting and filtering interfaces to the Data Diffs analysis workflow, making it easy to find specific rows within your diff results. For example, if you’re trying to confirm that the values for a particular primary key in your sea of modified data changed exactly as expected, filter for the specific primary key or changed column value you’re looking for.

CSV Export
- You can now export CSVs of Data Diff results and primary keys that are exclusive to one of the datasets in your comparison! This is perfect for debugging and reconciling missing data between two data sets, and sharing that information across your organization.

- Don’t forget you can always materialize your Data Diff results to a table in your database and natively join your results to your source data, or do a deeper analysis on those differences. Enabling this setting in the Data Diff creation flow via our API or the Datafold app will create a table in your temporary schema with matched rows, values, and flags indicating which columns differ.

Materialize diff results to table is an option within the Data Diff creation workflow in both the Datafold App and our REST API.
Lineage Usage Metrics
- Column and Table-level query metrics in Lineage - right-click on any table or column reference within the Datafold Lineage UI to view how many times a particular user account has read or written to a particular table, allowing you to identify commonly or infrequently used data points.

- Popularity metrics now include all cumulative downstream usage of a column or table, showing the total downstream reads for a particular client.
- Popularity Filters - Filter lineage nodes by their relative popularity compared to all indexed tables in Lineage
Data Diff Improvements
- Cancel CI Job button via the Datafold Jobs UI - Admin users are now able to cancel CI/CD diff tasks via the Jobs UI in Admin Settings.
- Copy Data Diff Configuration JSON to Clipboard - the info button within the diff results page now contains a button to copy the JSON payload required to create a diff via the REST API.

- Set diff time travel logic at the dbt-model level. For example, if your dev and production tables have known differences due to timing of incremental source data, you can add a time-travel configuration to ignore the most recent data, preventing false positives in CI/CD. Learn more about time travel here and more about dbt metadata configuration here.
Other Improvements
- Catalog search improvements to weight exact-text matches more aggressively, and hide less relevant results.
- Datafold CI/CD integration now populates a list of deleted dbt models within the pull request comments.
- Improve lineage support for dbt-based Hightouch models
Popularity counters in Lineage
To help you understand how frequently the assets in your warehouse are used, Lineage now displays an absolute access count per table and column for the last 7 days. To help you interpret that information, a popularity rating from 0 to 4 is also assigned, indicating how popular a particular database object is relative to others.

Other changes
- For on-premise deployments, we now support data diff in CI for GitHub on-premise servers. To use your own private GitHub server instead of the cloud version (https://github.com), set the `GITHUB_SERVER` environment variable to your GitHub on-prem URL.
- In the app, the BI Settings section has been renamed to “Data Apps” and now includes both Mode and Hightouch integrations.
- Performance improvements to lineage.
- In the Lineage UI, Hightouch models and syncs now link to Hightouch App. This can be configured using the “workspace URL" field in the Hightouch integration settings.
- Visual improvements to data source names and logos in Catalog and Lineage.
- Updated display of long names of tables in Lineage.
- Popularity is now a general filter in Catalog. It can be applied to both tables and columns.
- Data Source and Data App source filters in Catalog are now merged for better search experience.
- Users can now add, remove, and query tags for Mode dashboards, Hightouch models, and Hightouch syncs using GraphQL API.
- Added usage info for tables and columns to GraphQL API.
- CI configurations can now be paused, preventing them from running checks on pull requests.
- Added support for BigQuery’s bignumeric and bigdecimal data types.
- The data source mapper field in the Data Apps create/edit form is now validated after all the data sources are mapped.
- In the Data App settings, we’ve added direct links to our documentation.
Bug fixes
- In some cases, data diffs were not canceled after CI run cancellation. These diffs were stuck in a WAITING status forever.
Multidimensional Alerts (Beta)
Users can use `GROUP BY` in alert queries to dynamically produce several time series at once. Each dimension is named after the values of the dimensional/categorical field(s) in the `GROUP BY`; its thresholds and anomaly detection can be configured separately. New time series will appear (and disappear over time) as the data changes, without the need to maintain a plethora of alerts with `WHERE` filters.
This feature is currently in Beta and is available upon request — please reach out to support@datafold.com to enable it for your organization.
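To picture how one grouped query turns into several time series, here is a minimal sketch that splits query result rows by their group-by values. The row shape, the column names, and the function `split_into_series` are hypothetical; how Datafold names and stores dimensions internally is not described here.

```python
from collections import defaultdict


def split_into_series(rows, time_col, metric_col, group_cols):
    """Build one time series per distinct combination of group-by values."""
    series = defaultdict(list)
    for row in rows:
        # Name the series after the values of the grouped field(s).
        name = ", ".join(str(row[col]) for col in group_cols)
        series[name].append((row[time_col], row[metric_col]))
    return dict(series)


rows = [
    {"day": "2022-06-01", "country": "US", "orders": 120},
    {"day": "2022-06-01", "country": "DE", "orders": 45},
    {"day": "2022-06-02", "country": "US", "orders": 131},
]
for name, points in split_into_series(rows, "day", "orders", ["country"]).items():
    print(name, points)
```

A new value in the grouped column simply produces a new key, which is why no extra alert with a dedicated filter is needed for it.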


Datafold <> Hightouch Integration
Hightouch models and syncs are now discoverable through the Datafold Catalog and visible in Datafold’s Column-Level Lineage - making it possible to trace data from source to activation.
This feature is currently available upon request — please reach out to support@datafold.com to enable it for your organization.

See downstream data applications in PR Comments
Datafold now shows downstream data applications, e.g. Mode reports and Hightouch syncs, that might be affected by a code change.

Data Diff results materialization
Users can now save Data Diff results in their databases for further analysis. Current support is limited to PK duplicates, exclusive PKs, and all value level differences.


Other changes
- Significantly improved CI-based Data Diff performance for large warehouses with many tables, schemas, etc.
- Expandable metric graphs to make comparison more convenient.
- For on-premises implementations: if the environment variable `DATAFOLD_AUTO_VERIFY_SAML_USERS` is set to "true", users created during SAML sign-up will not have to verify their emails.
- Better display for values match indicator in Data Diff -> Values tab.
- Reformatted long alert names in the filter popup for readability.
Bug fixes
- Resolved the issue where the Datafold-sdk failed to perform a primary keys check for manifest.json if there were some tables in the manifest that had not yet been created in DB.
- Jobs request fails when filters are cleared.
Databricks support
You can now add Databricks as a data source, with full support for Data Diff, table profiling, and column-level lineage.

Other changes
- Data Diff sampling thresholds are no longer limited to hardcoded defaults and can now be configured from the UI.
- We updated the Jobs page to make connection types, table names, and runtimes easier to read.
Bug fixes
- Slack and email alert notifications were not delivered for some customers between 2022-05-31 18:00 UTC and 2022-06-07 11:00 UTC (SaaS)
- Profile histograms and completeness info did not render immediately on load.
- Job Source filter did not contain all the possible values that our API can return.
- “Created time” and “last updated time” were not displayed in the list of Jobs.
- Incorrect status in GitLab CI pipelines. The Datafold App will no longer block a merge if something is wrong with the Datafold App.
Lineage UI filters
Navigating large lineage graphs is now easier with filters that help filter out the noise. Datasource/database/schema filters allow you to control the amount of information displayed.

User group mapping between Datafold and SAML Identity Providers
Organizations using a SAML Identity Provider (Okta, Duo, and others) to authenticate users to Datafold via Single Sign-On can now set up a mapping between SAML and Datafold user groups. Users will be automatically assigned to the desired Datafold groups according to the pre-configured mapping when using SAML login.
This feature is available on request — please get in touch with Datafold to enable it for your organization.


Other changes
- Added a special method to our SDK to check the correctness of dbt artifacts submitted to Datafold when using the dbt Core integration. Now Data Diff can finish even if something is wrong with uploading dbt artifacts. See the documentation for details.
- Now Datafold shows Slack users/groups with the conventional @-form, like in the Slack App.
- SAML validation & configuration errors are now exposed to users so that they can debug their setup.
Bug fixes
- Sometimes the job status is displayed as `notAvailable`.
- BI reports with special characters in names (slashes, hashes, etc) are not displayed or routed correctly.
- When BI report's preview is downloaded with an error, the loading indicator is displayed forever.
- Multi-word search requests were squashed, omitting spaces.
- Inviting a user that was already in Datafold caused an error with an unclear message. Now it says explicitly that the problem is with the user being already invited.
Data Diff sampling for small tables disabled by default
To avoid unnecessary overhead, Data Diff sampling is disabled for smaller tables. For now, the table-size thresholds are hardcoded defaults; a configuration UI is coming. See the documentation for more details.
Other changes
- Alert query columns are automatically classified into time-dimension and metric columns; the time column no longer needs to come first.
- Datafold no longer uses labels on GitLab to track the status of the Data Diff process. The status can now be tracked from the CI pipelines functionality.
Bug fixes
- Issue with include and exclude columns in diffs
- Off-charts dependencies of the in-focus table in Lineage are now displayed (and act) correctly as "Show more" → Change direction of Lineage
- The Settings menu item in the Admin section is sometimes not rendered correctly
- Catalog search by one- and two-letter words does not work
- Rows with NULL primary keys always got filtered out during data diff if sampling had been enabled
Data Diff filters can be configured in the dbt model YAML
You can now configure Data Diff filter defaults in the dbt model YAML. Filtering forces Data Diff to compare only a subset of the data; for example, you may want to compare just the latest week to save DWH resources and reduce diff execution time. See the documentation for details.

Other changes
- Selecting a column and its connected nodes in Lineage now shows an indicator that also lets you exit the selected-path mode. Clicking on empty space to exit is deprecated.

- Fold sections of Github / Gitlab printouts to save screen space. They can be easily unfolded to check verbose diff information.
- Show actual Slack error codes on test notifications, so that users can debug their Slack-Datafold integration.
- Datafold now sends a confirmation email when SAML users are auto-created.
- Lineage now shows all columns of a table that exist in the database, not only those with connections detected by Lineage.
- Improvement to the autocomplete feature in Data Diff.
Bug fixes
- API key not copied into clipboard with input built-in tool
- Cell data in Data Diff Sampling tab is not copied from the popover
- Sometimes NaN appears instead of alert weekly estimates.
- Disabled users logging in through OAuth no longer raise an error.
You can now receive Alert notifications at arbitrary webhooks with arbitrary payloads (including but not limited to JSON) — in addition to Slack & email notifications. See the documentation for details.
This feature is available only on request — please contact Datafold to enable it for your organization.

For API-first users, errors from all API endpoints now follow RFC 7807 and share the same structured JSON payload, and the 4xx HTTP status codes are normalized in most cases. This should simplify parsing error messages caused by, for example, invalid input or incompatible configuration. The UI error messages are also more descriptive in some cases where they previously were not.
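As a rough illustration of what a client could do with such errors, the sketch below parses an RFC 7807 "problem details" body. The member names `type`, `title`, `status`, `detail`, and `instance` come from the RFC itself; the `ProblemDetails` class, the `parse_problem` helper, and the sample payload are made up for this example and do not show a real Datafold response.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Standard members defined by RFC 7807; anything else is an extension member.
STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


@dataclass
class ProblemDetails:
    """Standard RFC 7807 members plus any extension members."""
    type: str = "about:blank"  # RFC 7807 default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: Dict[str, Any] = field(default_factory=dict)


def parse_problem(body: str) -> ProblemDetails:
    """Parse a JSON error body into a ProblemDetails object."""
    payload = json.loads(body)
    known = {k: payload[k] for k in STANDARD_MEMBERS if k in payload}
    extra = {k: v for k, v in payload.items() if k not in STANDARD_MEMBERS}
    return ProblemDetails(extensions=extra, **known)


# Hypothetical payload, for illustration only.
sample = json.dumps({
    "type": "about:blank",
    "title": "Unprocessable Entity",
    "status": 422,
    "detail": "Invalid input",
    "field": "primary_key",
})
problem = parse_problem(sample)
```

Keeping unknown members in `extensions` means a client does not break if the server adds fields later.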

Other changes
- A new API endpoint, `/api/v1/dbt/check_artifacts/{ci_id}`, checks for dbt artifacts after uploading. It can be triggered during a CI process, for example in GitHub Actions or GitLab CI, to help Datafold understand the status of downstream tasks.
- Improved performance of dataset suggestions in Data Diff, now search-based.
Bug fixes
- Lineage off-chart dependencies for upstream nodes not displayed
- Snowflake table/column casing issues are resolved
- Special characters are now properly handled on the data source names
- Table profiling will not be done for disabled data sources
- Lineage column selection dropped after table expansion
- Jobs UI now shows main jobs instead of result sub-jobs for profiling and data diffs
- Off-chart edge switches lineage direction for primary table
- Redirect to lineage from profile was sometimes broken
Refactored navigation design

Other changes
- Improved formatting of integers for column profiles in Data Diff

- Profile now displays the column list with descriptions and tags, even if profiling is disabled

- Added excludes/includes support to GraphQL search endpoint
Bug fixes
- Fix: lineage not expanding for the second time
- Fix: last run filter in search showing numbers instead of days/weeks
- Fix: expanding lineage showing incomplete list of tables
- Fix: incorrect sorting in a primary key block in the Data Diff UI
- Fix: ability to navigate to data source creation dialog with non-confirmed e-mail
SAML
Organizations can now use any SAML Identity Provider to authenticate users to Datafold via Single Sign-On. This includes Google, Okta, Duo, and many others, including private/corporate identity providers.

Other changes
- During CI runs, data diff jobs will automatically select a created_at or updated_at column with an appropriate timestamp type as the time dimension
- Catalog search has been improved in both performance and result ranking
- Tags automatically created during dbt processes that have been superseded are periodically removed
- A custom database can be specified for Lineage metadata in Snowflake sources
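The automatic time-dimension selection mentioned in the list above could work roughly like the sketch below. Only the column names `created_at` and `updated_at` come from the changelog; the set of accepted timestamp types, the preference for `created_at` over `updated_at`, and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional

# Candidate column names come from the changelog entry.
CANDIDATES = ("created_at", "updated_at")
# Assumed list of "appropriate" timestamp types; the real list is not documented here.
TIMESTAMP_TYPES = {"timestamp", "timestamp_ntz", "timestamp_tz", "timestamp_ltz", "datetime"}


def pick_time_dimension(columns: Dict[str, str]) -> Optional[str]:
    """Return the first candidate column whose type looks like a timestamp.

    `columns` maps column name to declared type. Matching is case-insensitive,
    and the original column name is returned.
    """
    lowered = {name.lower(): (name, dtype.lower()) for name, dtype in columns.items()}
    for candidate in CANDIDATES:
        if candidate in lowered:
            name, dtype = lowered[candidate]
            if dtype in TIMESTAMP_TYPES:
                return name
    return None
```

A column that has the right name but the wrong type, such as a `created_at` stored as text, is skipped rather than used.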
Bug fixes
- Masked fields in Snowflake data sources could cause errors when materializing temporary tables
- Disabled users could not be re-enabled
- Posting labels to Gitlab triggered notifications when there were no changes
- Table profiling failing for views in PostgreSQL data sources
New Lineage UI
The lineage UI was updated to improve the performance for large graphs and to make exploring dependencies more intuitive. Among other changes, the view now distinguishes between upstream and downstream graph directions, and filter settings have moved to the top to provide a larger area for the lineage canvas.

Improved Slack alert messages
To make anomaly notifications more actionable, they now include the alert name and the actual value, and provide more context about the anomaly that occurred.

Reduced verbosity for new tables in the Data Diff CI output
When new tables are created in a PR, the block has been reduced to only show the number of rows and number of columns, and a link to the table profile is inserted.

Other changes
- Automatically created tags from ETL are now cleaned up automatically after their initial use to reduce tag clutter
Bug fixes
- BI dashboards stopped displaying in the catalog
- Added missing icons of BI data sources
- Lineage paging stopped loading off-chart dependencies
- Github refresh button didn’t work correctly
- dbt metadata synchronization for dbt older than 1.0.0 in combination with Snowflake didn’t work correctly
Fine-grained control of what data assets show up in Datafold
There is such a thing as too much data observability. To help you separate signal from noise and see only the tables that actually matter, we added fine-grained settings that let you define which databases, schemas, and tables show up in Datafold Catalog and Lineage and which are hidden (e.g. dev/temp tables). Filtered-out data assets can still be found by their full name (e.g. “db.schema.table”).

Alert subscriptions for Slack user groups
Slack user groups can now be subscribed to alerts, e.g. all members of team X, on-call engineers, or incident commanders. The special handles @channel and @here can also be notified of alerts, reaching all members or the currently online members of a channel respectively.

Pausing a data source in the UI
You can now temporarily pause (disable) a data source in the UI.


Other changes
- Subscribed users will be notified in case an alert has an execution error (e.g. database permission/connection failure) — not only on actual anomalies
- Improved alert texts in Slack
- Dramatic speedup of schema download from Snowflake
- For Data Diff in CI, unchanged tables are grouped at the top of the report
- For manually created Data Diffs, the primary key case is automatically inferred
- Data diffs on Snowflake are now running much, much faster
Bug fixes
- Fix: Notifications were sent to deleted integrations/destinations for some time after the deletion. No more
- Fix: Slack App integrations were sometimes not showing users & channels if reinstalled from Slack, not from Datafold
- Fix: Plain CI configuration could not be saved/edited when the template variables section was empty.
- Fix: Setting update time for Alerts
- Fix: Proper DB types mapping for the new Snowflake schema downloader
- Fix: non-existing Slack users are filtered from Alerts
- Fix: A lot of upstream deps take too much space in the layout. Now we're showing the first 3, and the rest are available in Lineage UI
- Fix: Multiple tables in a CI diff were too large for a single comment post. The tables are now paginated across multiple comments
- Fix: Hours jump in Alerts time picker
Data Diff can now compare VARIANT type in Snowflake

Other changes
- Added the ability to pause a data source in the API. When a data source is paused, all its data is retained in the system but schema indexing, profiling, and lineage processing are disabled
- Improved error reporting for Redshift data sources when Datafold does not have permissions to access the table
- Lineage speed improvements
Bug fixes
- Fixed a bug where spaces in Data Diff values tab were missing
- Fixed an issue where a Github integration didn't show an error message when it cannot be deleted
- Fixed a bug where the user invite link for organizations that have Okta enabled did not work
- Fixed a bug where BI reports could appear orphaned, not having any links to tables
- Fixed a bug where a CI run could fail if the dbt manifest didn’t contain the raw relation name
- Fixed a bug where the CI reported booleans instead of numbers for the number of mismatched columns
- Fixed a bug in CI where, when a table has no differences, the link to the table profile malfunctioned
- Fixed testing Github repository connections
- Fixed Slack notifications where the integration could not be deleted if currently used in alerts. In the new behavior, it will unsubscribe all related notification methods from alerts as the integration is deleted
Allow CI to continue if Data Diff fails
When you integrate Data Diff into the CI flow, you can control whether an error during Data Diff processing causes the CI flow to fail or continue. This allows you to configure Datafold to be non-blocking in your CI which can be helpful when introducing Data Diff in your development process initially.

Support for key-pair authentication for Snowflake
In our effort to support the most secure practices possible, we’ve added the ability to configure a Snowflake data source to use key-pair authentication. This is more secure than password authentication alone. See Datafold’s Snowflake documentation for details.

Other changes
- Visually collapse Data Diff reports if no changes are detected to save users time
- Optimized schema fetching during a data diff to reduce the runtime of a single diff, as well as the load on the data warehouse
- Irrelevant diff views are now hidden if the primary key was not specified
- The “time dimension” field in the Data Diff view now suggests only date/time columns
Bug fixes
- Integrations could not be deleted if they were used in any alerts
- Minor rendering issue with Datafold logo on the login page
- In the Data Diff view, each of the Dataset text entry fields had its input blocked while its loading indicator was active
Data Diffs without primary keys
Now you can run data diffs without specifying primary keys to compare table schemas and column profiles. Specifying primary keys is required for value-level comparison.

Other changes
- GitLab CI integrations now respect the file ignore lists (previously, it was supported only for GitHub)
- Improved filters autocomplete performance
Bug fixes
- Alert deletion could sometimes be slow or time out
- An unnecessary expand icon in the data source tree filter is not shown anymore
- UI could break if you had more than 500 tags in the organization
Data Diff improvements
Sum and Average diff metrics
Data Diff now also compares sums and averages for numerical columns which can be helpful for analyzing changes in distributions:

Improved handling of long values
When browsing value-level diffs, overflowing values can be explored and compared by hovering over them. The long values can now be copied to clipboard for further analysis.

Ignoring certain files in Data Diff CI
A new setting for CI integrations allows users to selectively ignore files modified in a PR and skip running Datafold for irrelevant changes. Files can be excluded, re-included, and re-excluded again, which allows complex patterns for cases like “only run datadiffs if any dbt files have changed, except for the .txt and .md files in that folder”.
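The exclude / re-include / re-exclude behaviour can be sketched as ordered rules where the last matching rule wins. The syntax below (a leading `!` to re-include, shell-style wildcards) is borrowed from gitignore-like tools as an assumption; it is not taken from the Datafold setting, so check the documentation for the actual pattern format.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def should_run_diff(changed_files: Iterable[str], rules: List[str]) -> bool:
    """Return True if at least one changed file survives the ignore rules.

    Rules are applied in order and the last matching rule wins. A rule that
    starts with "!" re-includes files that an earlier rule excluded.
    """
    def is_ignored(path: str) -> bool:
        ignored = False
        for rule in rules:
            negated = rule.startswith("!")
            pattern = rule[1:] if negated else rule
            if fnmatchcase(path, pattern):
                ignored = not negated
        return ignored

    return any(not is_ignored(path) for path in changed_files)


# Hypothetical rules: ignore everything, re-include the dbt folder,
# then re-exclude .md and .txt files inside it.
rules = ["*", "!dbt/*", "dbt/*.md", "dbt/*.txt"]
```

With these rules, a PR that only touches `dbt/README.md` is skipped, while a PR that touches `dbt/models/orders.sql` still triggers a diff.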

Lineage Improvements
Original SQL queries
You can now see the SQL query that was used to create or update a table or refresh a BI report in both the Datafold Catalog and Lineage views:


BI report filtering
BI reports in Lineage can now be filtered by popularity and freshness:

Mode dashboard previews
You can see a preview screenshot for any Mode report on the Datafold Lineage graph:

Other changes
- Timepicker in Alerts schedule now has a correct “Now” button that converts current time to UTC using the time zone from the browser
- Now you can use Cmd/Ctrl + Click to open a data diff or an alert in a new tab
- You can now see “Last run datetime” in the list of alerts.
Bug fixes
- SQL queries are now visible again in Profile and Lineage for tables
- Multiple lineage UI improvements
New Datafold Slack App and alert subscriptions
Adding Slack channel destinations is easier with the new Slack App. Users can subscribe to alerts and get mentioned in the designated channels allowing for more targeted alerting and collaborative incident resolution. Documentation is available here.


Single sign-on through Okta
Single sign-on through Okta is now available for Datafold Cloud.

Datafold <> Mode Integration now in beta
Mode reports are now discoverable through Datafold Catalog and appear in Lineage which enables tracing data flows on a field level all the way to Mode reports and dashboards. Let us know if you would like to enable it for your account.


Other changes
- Fix for faux-off-chart-deps in Lineage
- Added a UTC notation to Last Run in Catalog results
- Row counts in Diff now take time travel specifiers into account
- Improved refreshes for the GitHub app to use the app authentication token instead of user to server token
- Added the database name to all Redshift and PostgreSQL tables. This allows for use of dbt integration for those databases, and lineage in case of Redshift if cross-database queries are used in the ETL process.
Diffing for advanced data types
Data Diff can now compare Snowflake's VARIANT and ARRAY types. Profiling information won't be generated for those columns, but they will show up in overall statistics, and in the Values tab. Previously VARIANT and ARRAY types were ignored during comparisons.

Improved diff sampling
When comparing tables (for example, Staging and Prod versions of your dbt model), Data Diff provides a sample of divergent values for every column that doesn’t fully match between tables. Previously Diff would select ~15 rows for every column that had differences. If there were just a few such columns, the overall sample size could be quite small. The algorithm now selects ~1,000 rows regardless of the number of columns that are different.

Bug fixes
- Fixed an issue where the “$” character was not accepted in a password
- Improved integer formatting throughout the app
- Improved performance in the Catalog search input
- Fixed 5+ smaller UI issues
Mode reports in Lineage & Catalog
Mode is now available as an integration in Datafold in alpha testing mode. Once enabled, Datafold will index all reports in your Mode account to make them available in the Datafold Catalog search and Lineage.
You can now discover relevant Mode reports alongside datasets in the same search experience. It’s also possible to filter Mode reports based on popularity and freshness.
You can trace field-level data lineage to Mode reports in the Datafold Lineage view to see which tables and columns feed what report, making it easy to perform refactorings and troubleshoot issues:

New Jobs UI
With the new Jobs UI you can check what tasks are currently running in your Datafold account and easily troubleshoot various integrations such as Diff in CI as well as audit the use of Datafold.

Bug fixes
- Fixed displaying of Alert schedules when an hourly interval is selected.
Automatic inference of primary keys for dbt models + CLI tool to check primary key settings for Data Diff

For Data Diff to work in CI, it needs to know the primary key for each table it analyzes. Datafold provides a few options for defining primary keys in the dbt model:
- Define it as meta.primary_key in dbt YAML
- Define it as a table or column-level tag in dbt YAML
- Automatically infer primary keys based on uniqueness tests
To help you ensure that Data Diff can look up or infer primary keys for all tables in your dbt project, we added the check-primary-keys command to the Datafold CLI.
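The three options listed above can be pictured as a lookup with fallbacks. The sketch below works on plain dictionaries shaped like parsed dbt YAML. The precedence order, the `primary-key` tag name, the handling of column-level tags only, and both function names are assumptions for illustration; this is not how the check-primary-keys command is implemented.

```python
from typing import Any, Dict, List


def resolve_primary_key(model: Dict[str, Any]) -> List[str]:
    """Return primary key columns for a dbt model dict, or [] if none is found."""
    # 1. Explicit meta.primary_key on the model.
    meta_pk = model.get("meta", {}).get("primary_key")
    if meta_pk:
        return [meta_pk] if isinstance(meta_pk, str) else list(meta_pk)

    columns = model.get("columns", [])

    # 2. Columns tagged as primary key ("primary-key" is an assumed tag name).
    tagged = [c["name"] for c in columns if "primary-key" in c.get("tags", [])]
    if tagged:
        return tagged

    # 3. Fall back to the first column that carries a dbt "unique" test.
    for column in columns:
        if "unique" in column.get("tests", []):
            return [column["name"]]
    return []


def models_missing_primary_key(models: List[Dict[str, Any]]) -> List[str]:
    """Names of models for which no primary key can be looked up or inferred."""
    return [m["name"] for m in models if not resolve_primary_key(m)]
```

A check in this spirit reports exactly the models that would need an explicit key before Data Diff can compare values.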
Quickly navigate to columns using Go To search bar in Diff UI
Now you can quickly jump to any column in the Diff Values tab which can be helpful when diffing especially wide tables:

Run Data Diffs only with the Datafold label
There are situations where you don't want to run Data Diff in your CI unconditionally. Running it on every change is the recommended approach, because it ensures that no unintended changes slip through. As with unit and integration tests in CI, disabling the check means a change can break something without you knowing it.
When you're integrating Data Diff, you sometimes want to try it on a select number of changes. This is why we added a new option to the CI integration:

With this box checked, opening a new pull request does not start a Data Diff right away. The actual diff starts once you set the Datafold label in GitHub/GitLab.

Improvements for Postgres data sources
Postgres lets a logged-in user switch to holding only the privileges of a selected role, using the `SET ROLE` command. `SET ROLE` effectively drops all the privileges assigned directly to the session user and to the other roles it is a member of, leaving only the privileges available to the named role. This is now supported for both PostgreSQL and Aurora PostgreSQL as an optional parameter in the data source configuration.

For Aurora PostgreSQL data sources, we've also added an optional keep-alive setting that lets you turn on keep-alives for very long-running queries. The value is specified in seconds. Leave the option empty to disable keep-alives.
Tooltips added to data source fields to avoid confusion
To provide more context for the options available in the data source configuration screen, we have added tooltips. We hope this makes configuration a little easier, without switching back and forth to our documentation pages.

Optimization for GraphQL
Our new GraphQL API is also becoming more mature. We applied a performance optimization for loading database and schema info. Previously it was required to load the tables first, but those can now be queried separately.
Bug fixes
We have also added a couple of bug fixes:
- Fixes bug where a CI configuration could not be created without the require_label set
- Fixes selected suggestion id flashing in search autocomplete
- Fixes page size navigation in the Data Diff's Values tab
- Fixes error that was thrown when empty sampling results arrived in the Table Profile sample tab
- Fixes the frontend flooded with 500 errors when alert estimates encountered an error
- Fixes sampling table not being re-rendered when new results come in after reload
Improved messaging on the GitHub integration
This update is based on customer requests for more meaningful feedback during the Data Diff process. We added more information to the GitHub statuses when running the Data Diff:

For example, we include the git hash of the job that it is waiting for. After the job starts, it will show a link to the actual job:

This can be either the job building the pull-request or the main branch. This helps to understand what’s going on when running the Data Diff, and what it is waiting for.
datafold-sdk upload-and-wait
The datafold-sdk synchronizes information from a dbt run into Datafold. Datafold extracts the table and column information and uses it for Data Diff when running on a pull request.
It is common practice to clean up the tables after a run on a pull request has finished. But Datafold might still need these tables to run the Data Diff. That is why we added the upload-and-wait command. Instead of starting the Data Diff asynchronously, it blocks until the Data Diff completes. This makes sure that you don’t drop the tables before the Data Diff has finished.
Catalog support for dbt sources and seeds
Datafold works seamlessly with dbt. With the latest version of Datafold, we support synchronizing the metadata from dbt’s sources and seeds. Sources are tables that are external to dbt, often tables in the landing zone. When declaring a source, you can annotate it with additional information, which is also synchronized to Datafold.

Smart scheduler
New Smart Scheduler service to manage data source concurrency when scheduling table profiling tasks.

We’ve implemented a new scheduler that we call the smart scheduler. Most users know that certain tasks can impose some load on the data warehouse. The smart scheduler gives us more control over which tasks run at the same time, resulting in a more predictable load. We built it together with our Redshift users because Redshift doesn’t handle concurrency very well. It provides a way to run the tasks gently.
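The core idea of capping concurrency per data source can be sketched with a semaphore. This is a minimal illustration of the general technique, not Datafold's scheduler: the `ConcurrencyLimiter` class, the limit of 2, and the simulated profiling task are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class ConcurrencyLimiter:
    """Caps how many tasks run against one data source at the same time."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of simultaneously running tasks seen

    def run(self, task: Callable[[], None]) -> None:
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                task()
            finally:
                with self._lock:
                    self._running -= 1


def profile_tables(tables: List[str], limiter: ConcurrencyLimiter) -> None:
    """Submit one simulated profiling task per table through the limiter."""
    def fake_profile() -> None:
        time.sleep(0.01)  # stand-in for a warehouse query

    # The pool has more workers than slots, so the limiter is what caps the load.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for _ in tables:
            pool.submit(limiter.run, fake_profile)


limiter = ConcurrencyLimiter(max_concurrent=2)
profile_tables([f"table_{i}" for i in range(10)], limiter)
```

Even with eight worker threads available, at most two simulated queries hit the data source at once, which is what keeps the load predictable.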
Descriptive errors on profiling errors
It can happen that a query against the data warehouse results in an error. Maybe the database is offline? Maybe the table is huge and the query takes a very long time? Or, as in the example below, a division by zero occurs at runtime. We now show more informative errors when the profiling job fails.

Lineage edges are now hoverable showing source and target nodes, which are highlighted on edge click.
Improved Lineage navigation: when switching central table origin, also switch table for Profiling and Sampling tabs.
Add GraphQL API for lineage
GraphQL is an increasingly popular method for retrieving information. It gives the developer more control over the desired entities and which specific fields they want to access. We now support a GraphQL API for our lineage information. Read more about it in this technical blog.
We’re continuously adding more information to the GraphQL API. For the latest state, please refer to the documentation.
Support dbt_utils for inferring Primary Keys
To run Data Diff, Datafold uses the primary key of the table to see what changed. One popular way of checking this constraint is the unique_combination_of_columns test from dbt_utils. Datafold now detects the use of these tests and infers the primary key from them. This allows you to easily get started with Data Diff. You can also always set the primary key explicitly if desired.
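To make the inference concrete, the sketch below reads a composite key out of a model definition shaped like parsed dbt YAML. The test name and its `combination_of_columns` argument are the ones dbt_utils uses; the `infer_composite_key` function and the sample model are hypothetical and do not show Datafold's implementation.

```python
from typing import Any, Dict, List, Optional

TEST_NAME = "dbt_utils.unique_combination_of_columns"


def infer_composite_key(model: Dict[str, Any]) -> Optional[List[str]]:
    """Return the columns of the first unique_combination_of_columns test, if any."""
    for test in model.get("tests", []):
        # In dbt YAML a parameterized test is a one-key mapping:
        # {test_name: {argument: value, ...}}
        if isinstance(test, dict) and TEST_NAME in test:
            return list(test[TEST_NAME]["combination_of_columns"])
    return None


# Hypothetical model definition, as it might look after loading the YAML.
model = {
    "name": "orders",
    "tests": [
        {TEST_NAME: {"combination_of_columns": ["order_id", "line_number"]}}
    ],
}
```

For this model the inferred key is the pair `order_id` and `line_number`; a model without such a test yields `None`, which is where an explicitly configured key would take over.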

Revamped the signup flow with new UI to create better user experience, and simplified dbt configurations.
Data Diff time travel in BigQuery and Snowflake
Time travel is a useful feature of some modern data warehouses that allows querying a table as it was at a particular point in time. Using that feature in combination with Data Diff can be very helpful for detecting data drift in a table by diffing it against its older version. When testing changes in prod vs. dev environments, time travel can also help align both environments on the state of source data.
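For readers unfamiliar with the warehouse feature, the sketch below builds the kind of SQL each warehouse accepts for reading a table as of some time in the past: Snowflake's `AT(OFFSET => ...)` clause and BigQuery's `FOR SYSTEM_TIME AS OF`. The `time_travel_query` helper is hypothetical, and how Data Diff itself accepts a time travel specifier is not shown here.

```python
def time_travel_query(dialect: str, table: str, seconds_ago: int) -> str:
    """Build a SELECT that reads `table` as it was `seconds_ago` seconds ago."""
    if seconds_ago < 0:
        raise ValueError("seconds_ago must be non-negative")
    if dialect == "snowflake":
        # Snowflake: AT(OFFSET => <negative number of seconds from now>)
        return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago})"
    if dialect == "bigquery":
        # BigQuery: FOR SYSTEM_TIME AS OF <timestamp expression>
        return (
            f"SELECT * FROM {table} FOR SYSTEM_TIME AS OF "
            f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {seconds_ago} SECOND)"
        )
    raise ValueError(f"unsupported dialect: {dialect}")
```

Diffing the result of such a query against the current table is one way to see how the data drifted over, say, the last day.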
Gitlab support for Data Diff

Now it’s possible to automate full impact analysis of every PR to ETL code in Gitlab repositories. See how a change in the code will impact the data produced in the current and downstream tables.
More information on how to set it up can be found in the docs.
Added support for alerts on scalar values
While the true power of ML-aided alerts comes from monitoring metrics in time, sometimes it may be helpful to check a single value against a set threshold.
Catalog learns about your data from everywhere

Datafold will now automatically populate Catalog with column and table descriptions & tags from dbt, Snowflake, BigQuery, Redshift and other systems, creating a unified view.
Additional descriptions can be added using Datafold’s built-in rich text editor.
Primary keys for dbt models for Data Diff CI integration can now be specified on a table level

- Errors and warnings are now collapsed in Github/Gitlab comments to avoid bloat
- Improved performance of the Catalog search filter
- Improved handling of dbtCloud retries: Datafold now retries 4 times after receiving 500 errors from the dbtCloud service for up to 4 seconds
- The data source log extractor for lineage can now be run on a cron schedule
- Alerts now show the modified at timestamp
- Improved crontab validation: removed once-an-hour restrictions on scheduling
- It is now possible to disable alert query notifications
- Catalog now shows the timestamp when the dataset was last modified
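The dbtCloud retry behaviour mentioned in the list above can be sketched as a bounded retry loop. The numbers 4 retries and 4 seconds come from the changelog entry; spreading the waits evenly across the retries, the `call_with_retries` name, and the injectable `sleep` parameter are assumptions, since the entry does not describe the exact policy.

```python
import time
from typing import Callable


def call_with_retries(
    request: Callable[[], int],
    max_retries: int = 4,
    budget_seconds: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `request` until it returns a non-5xx status or the retries run out.

    `request` returns an HTTP status code. The wait between attempts is the
    time budget split evenly across the retries (an assumed policy).
    """
    delay = budget_seconds / max_retries
    status = request()
    for _ in range(max_retries):
        if status < 500:
            break
        sleep(delay)
        status = request()
    return status
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.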
Customizable Tags

Since tags became a really popular way to document tables, columns, and alerts in Catalog, many of you have requested a better way to manage them including the ability to customize their color to enhance readability. Now all tags can be created, edited and deleted in the Settings menu.
Improvements
- Improved profiler reliability
Interactive external dependencies

Lineage graphs can often get very complex and messy with all dependencies plotted at once. That’s why by default, Datafold shows a slice of the full lineage graph centered on a particular table (“dim_businesses” in the image below). That means that the graph will show tables and columns directly upstream or downstream of the chosen table.
At the same time, downstream tables (“report_hourly_bysiness_pageviews”) may have other upstream dependencies unrelated to the table on which the lineage view is centered. To avoid bloat, those dependencies are shown as dashed lines. Clicking on them will center the lineage graph on the chosen table.
Per-column Data Diff Tolerances

Sometimes it may be helpful to compare columns with a threshold instead of strict equality. For instance, when a database column is a FLOAT computed as a division of aggregates (e.g. COUNT(*) / SUM(someFloatCol)), the results of the computation are not strictly deterministic, resulting in differences that are irrelevant from the business standpoint but would be flagged by diff if strict equality is used: 1.1200033 vs. 1.1200058. Diff tolerance allows you to specify an absolute or relative threshold below which differences in values would be considered equal.
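A tolerance check of this kind can be sketched in a few lines. The function below is an illustration only: the name `values_match`, the parameter names, and the choice to measure the relative difference against the larger magnitude are assumptions, not Datafold's exact definition.

```python
def values_match(a: float, b: float, *, abs_tol: float = 0.0, rel_tol: float = 0.0) -> bool:
    """Return True if a and b are equal within an absolute or relative tolerance.

    With both tolerances at 0 this is strict equality.
    """
    diff = abs(a - b)
    if diff <= abs_tol:
        return True
    # Relative tolerance is measured against the larger magnitude (assumed convention).
    return diff <= rel_tol * max(abs(a), abs(b))
```

Applied to the values from the paragraph above, `values_match(1.1200033, 1.1200058)` reports a difference under strict equality, while the same call with `abs_tol=1e-5` treats the two values as equal.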
Tags autocomplete
When entering tags, you can rely on autocomplete to avoid creating semantically similar tags:
Improvements
- Fixed a bug that prevented admins from sending password reset emails
- "Discourage manual profiling" flag added to data source settings. If the flag is set, when the user tries to refresh a data profile, a warning popup will appear.
Fixed saving datasources and CI integrations with empty cron schedule.
On-prem deployments now require an install password at first install.
New Data Diff UI & Landing Page

Streamlined UI with more settings
Improvements
- The application now posts update messages when waiting for dbt runs to finish.
- Added an API endpoint to get the status of CI runs. It can be used to check the state of a CI process.
- Use standard notation for crontab format
- Fixed a bug where the dbt meta schedule stopped working
New application root page

- The CI config ID is now visible in the CI settings screen
- Allow using the dbt CLI to post the manifests to Datafold, so that Datafold can run diffs in a similar way as in the dbtCloud integration
- Documentation is now available from the header in the app
- Fixes a bug where the dbt cloud account number was passed as a string
The dbt configuration now presents a list of accounts instead of requiring the account name to be entered manually.
Automatic dbt docs sync to Datafold Catalog

- Fixed a bug where Snowflake timezone-aware fields were compared against timezone-naive instances
- Search: added `Select all` / `Deselect all` to data source filter
- Updated loading indication when loading data source schema
- Search: the user is redirected to the `/search` page when no results are found in `as-you-type` mode
- Updated usage of URL params for search
- Search: tree and sider are now responsive (expand if schema names don't fit into width)
- Updated scrolling UX
- Profiler: removed `experimental_` guards from new profiling and sampling UIs
- Profiler: fixed an issue with DATE & DATETIME for Snowflake table profiles
- Lineage: fixed hanging PostgreSQL query due to query planner misoptimization
- Lineage: hotfix for Snowflake + dbt
- Lineage: multiple small bugfixes
- Lineage: support for Snowflake semistructured data
- Lineage: fixed a bug where some parts of the graph were not displayed
- Profiling: bugfix in settings
- Data Diff: fixed handling of the `time` datatype
- Data Diff: soft-fail on `inf` and `NaN` float values
- Made sure that CI data diffs are resilient to server-side interruptions
- Correctly display arrays and maps in profiler sample
- Several bugfixes in lineage UI
- Fixes in the color scheme
- Added support for incremental SQL log fetching to build column-level lineage
- Several fixes in the lineage query parser

Incremental Column-level Lineage
Instead of querying the entire SQL query history, Datafold now looks at only new queries and updates the lineage graph incrementally. This currently works for Snowflake and BigQuery.
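The incremental approach can be sketched with a watermark that remembers how far the query log has been processed. Everything in the example is a simplified assumption: the `QueryLogEntry` and `LineageGraph` types are hypothetical, and each log entry is taken to carry an already-parsed source and target column, whereas a real system first has to parse the SQL.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class QueryLogEntry:
    finished_at: int  # e.g. epoch seconds
    source: str       # upstream column, already parsed out of the SQL
    target: str       # downstream column


@dataclass
class LineageGraph:
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    watermark: int = 0  # finish time of the newest query processed so far

    def update(self, log: Iterable[QueryLogEntry]) -> int:
        """Add edges from entries newer than the watermark.

        Returns the number of log entries that were processed in this call.
        """
        processed = 0
        newest = self.watermark
        for entry in log:
            if entry.finished_at <= self.watermark:
                continue  # already handled in a previous run
            self.edges.add((entry.source, entry.target))
            newest = max(newest, entry.finished_at)
            processed += 1
        self.watermark = newest
        return processed
```

On a second run over a log that has grown by one query, only that one entry is processed and the existing edges are kept.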
Faster Column Profiler
Now supports browsing super-wide (100+ col) tables without any interface lags.