What to do with CI results in your dbt project?

In the first two articles of this series on better CI pipelines for your dbt Core project, we looked at how to build a super simple CI workflow with only 43 lines of code before taking things up another notch with adding 4 more integrations for a more advanced CI pipeline. Today, we’ll wrap things up by looking at how to interpret and make the most of CI check results. 

If you made it this far, I don’t need to persuade you of why Continuous Integration (CI) checks play a pivotal role in ensuring the reliability and efficiency of dbt projects. If you’re just joining us now, it’s quite likely that you’re looking for help with CI test failures that are blocking your PRs from merging to production. 

In teams that are new to building a culture of testing, where CI acts as a gatekeeper against bad code wreaking havoc on production, it can be hard to get buy-in from everyone on the data team since it adds more friction to their workflow. 

The first data meme I've ever made, and a pretty good representation of a real project!

It’s not enough to just set up CI pipelines; data teams need to be educated on how to respond to test failures. Setting up CI pipelines only helps accelerate development when everyone can own the end-to-end process of building models, modifying code, and confidently debugging any CI errors that pop up. 

CI results: What are they and how do you make the most use of them?

Let’s back up a bit. Continuous Integration (CI) is a process where code changes are automatically tested and integrated into a shared repository frequently, often multiple times a day. In dbt projects, CI checks involve running things such as automated tests on your data models, SQL queries, and configurations to ensure they meet certain company policies. 

Types of CI results 

CI results refer to the outcome of these automated tests, and whether they pass or fail criteria like:

  • dbt build: This builds and tests your models, including running seeds and snapshots, to ensure that your models build as expected. 
  • Configuration checks: CI validates that your YAML files are configured correctly during the build process; incorrect YAML will result in compilation errors or trigger failures in the CI pipeline. 
  • SQL linting: SQL linting checks analyze your SQL code for syntax errors, formatting inconsistencies, and potential performance issues. It helps maintain code quality and readability by enforcing coding standards.

When all tests pass, it indicates that your project is in good shape and ready for deployment. If any tests fail, it signals potential issues that need to be addressed, such as bugs in your data models, misconfigurations, or SQL syntax errors.

Debugging for everyone 

Debugging is tough. It’s probably one of the most frustrating experiences in data engineering, and more so if you’re the sole data engineer on a team entrusted with maintaining business critical pipelines. Being faced with CI failures with vague or non-existent errors that resist every solution you throw at it is demoralizing, and eats up precious time spent on new projects. 

The good news is that the distribution of errors pairs well with the Pareto principle: only a handful of errors make up 80% of what you’ll see day-to-day. We’ll first discuss what best practices around debugging look like. Then, I’ll go over the top three cases of CI failure, and share what you should look out for and strategies for resolving them more effectively. 

Best practices for debugging less painfully

I’ve been coding for a few years now, but the feeling of seeing in-progress GitHub Actions jobs turn from the spinner symbol to a big red ❌ never fails to still trigger deep panic in me. Having the CI pipeline fail repeatedly, and mysteriously, is a rude ending to the flow state I enjoyed during hours of development. Instead of frantically copying and pasting error messages into your search browser or these days, ChatGPT, take a second to practice a systematic approach that may save you hours of frustration. 

  1. Read the error message carefully 

Always read the error message carefully before jumping into problem-solving mode. Error messages are a great example of information overload, but there’s often a lot of duplication of information and you’ll have to read between the lines to find clues to the root problem (or more likely, problems). 

This seems really obvious but it’s one that I tend to skim over as I’m impatient to try some solutions based on (often incorrect) hunches I have. A good example comes from when I was recently debugging an error while creating the demo CI pipeline repository. The 11th CI job failed at the dbt build step:

This was confusing as the tables were being created and did exist in Snowflake, so how could it no longer access the table? Because I had made a bunch of big configuration changes alongside other code changes in this PR, I started undoing what I had done to try and triangulate which change had caused a permissions issue. This was the wrong approach, as I learnt after a fruitless hour undoing commits and watching the GitHub Actions spinner result in the angry looking ❌each time. 

  1. Ask for help sooner than you think necessary

I reluctantly jumped on a Zoom call with a colleague, who focused only on what the error message was saying, and systematically verified my profiles.yml Snowflake secrets with what I had in Snowflake itself. In two minutes he found the source of my problems: I was using a Snowflake account that didn’t have sufficient permissions to access a schema I had switched to. He upgraded my access and the CI job ran successfully. 

As we gain more experience as developers, we also become better at pattern-matching problems and having a finely honed intuition for what the likely cause is. I probably would have ended up figuring out what the problem was eventually, but it would have taken hours compared to my colleague’s 5 minutes. 

The top 3 CI failures in dbt projects

dbt build issues

What to look out for:

CI jobs often fail at this step where dbt build is unsuccessful. It could look like compilation errors, missing dependencies, dbt test failures, or conflicts between different versions of dbt or its dependencies.

What a solution looks like:

Check if your dbt project's dependencies are up to date and compatible with the version of dbt being used. Update dependencies as needed. Consider updating your dbt version to the latest stable release to leverage bug fixes, performance improvements, and new features.

- name: Check and update dbt dependencies
        run: |
          dbt deps

Ensure that your dbt project's configuration files (dbt_project.yml and profiles.yml) are correctly specified. 

Check that any required environment variables are properly set and where they’re located. For example, in our pre_commit_checks.yml file, note how the Snowflake environment variables are attached to the entire job and not a specific step. 

    runs-on: ubuntu-latest
      SNOWFLAKE_SCHEMA: "${{ format('{0}_{1}', 'PR_NUM', github.event.pull_request.number) }}" 

This means that these variables will be available to all steps within the job, ensuring that any part of the CI process that needs access to Snowflake credentials can use them. If security is a significant concern or if only a few steps require the credentials, it is better to attach them to specific steps to minimize the risk of exposure.

Configuration issues

What to look out for:

Configuration-related failures in CI checks may include errors related to missing or improperly configured environment variables, syntax errors in YAML configuration files, or conflicts between configuration settings.

What a solution looks like:

Again, verify that all required environment variables are properly set, either within your CI/CD environment or through configuration files.

Check for syntax errors or formatting issues in YAML configuration files, ensuring that they adhere to the correct structure and conventions. Use a YAML linter like Prettier to automatically validate YAML syntax and identify any inconsistencies or errors.

SQL linting error

What to look out for:

SQL linting errors indicate issues with the syntax, formatting, or style of SQL code used in your dbt project. The errors can vary greatly depending on your SQL dialect of choice, the linting integrations you use, and what rules you specify. For example, we’re fans of SQLFluff at Datafold, and in my own demo repositories of CI pipelines (simple, advanced), I specified the Snowflake dialect and excluded two rules:

# Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html
# Or run 'sqlfluff dialects'
dialect = snowflake
# Comma separated list of rules to exclude, or None
exclude_rules = L001,L029
max_line_length = 120

L001 identifies inappropriate spacing, including excessive whitespace, while L029 does not allow keywords to be used as identifiers. It’s up to you and your team to decide what rules you wish to exclude as it is highly dependent on your specific business context and existing coding conventions. 

What a solution looks like:

Refactor the affected SQL code to address the reported issues, such as correcting syntax errors, improving code readability, or adhering to established coding standards.

Getting help more effectively

In a subset of cases, you may have done all the right things and still not figured it out. It’s quite possible you might be grappling with unknown unknowns: contextual blindspots that you may not even be aware that you had. 

This is why knowing when to ask for help is itself a best practice–though many of us are reluctant to exercise for fear of looking stupid. 

There are good and bad ways of asking other data engineers for help. The bad way looks like this:

Hey everyone, I'm stuck on a linting error in my dbt project. 
Can someone help me figure it out?

There’s a better way:

Hi team, I'm facing a SQL linting error in my dbt project that I'm struggling to resolve. 
Here are the specifics:
- Problem: The error message indicates inconsistent indentation in one of my SQL models.
- Context: This model is part of our customer analytics suite and is responsible for aggregating purchase data.
- Code Snippet: Here's the relevant snippet from the affected model:    
    {{ config(materialized='table') }}
            sum(total_amount) as total_purchase_amount
        {{ ref('sales') }}
    group by
- Attempts: So far, I've tried adjusting the indentation of the `sum` function to align with the `select` statement. However, the error persists.

When seeking help, aim to provide a clear and concise description of the issue at hand. A well-defined problem facilitates quicker comprehension and targeted assistance. Include details like:

  • Error messages
  • What steps you’ve taken so far
  • Any hunches or guesses you have about the error
  • The versions of any relevant dependencies
  • Code snippets, as inspecting the code directly can reveal things you missed the first time

After debugging, optimize

Once you’ve figured out the source of an error and know how to fix it, it’s worth thinking more broadly about what aspects of your dbt project could be optimized to prevent these issues from recurring. For example:

  • Updating dbt versions to leverage new features and bug fixes.
  • Standardizing and validating configuration settings to avoid misconfigurations.
  • Refactoring SQL code to improve readability, performance, and maintainability.

What should your CI data pipeline look like?

If you’re ready to put the pieces together and set up a comprehensive and mature CI pipeline, here’s how we can help you do just that:

Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.

Datafold is the fastest way to test dbt code changes