Evaluating dbt Cloud features vs dbt Core
dbt Cloud is a managed service from dbt Labs that provides a web-based UI for data analysts to develop, test, and deploy code changes to their data warehouse. dbt Core is a command line tool that powers dbt Cloud.
For data teams looking to adopt dbt, the first question they’ll need to answer is “dbt Cloud or dbt Core?” While dbt Core is free and open-source, dbt Cloud is feature-rich and solves many problems that you’ll encounter when implementing dbt Core.
In this article, we will compare dbt Cloud vs dbt Core with a focus on the dbt Cloud features, and what it takes to implement equivalent features for dbt Core.
Why use dbt?
Fundamentally, dbt (data build tool) compiles code to SQL, and runs it against your database.
More important than any specific functionality, dbt takes standard software engineering principles and applies them to the historically neglected data space. dbt is a gateway drug that introduces data teams to concepts like version control, modularity, and testing. dbt provides a launchpad for data teams to take their work to the next level.
What is dbt Core?
dbt Core is an open-source command line tool that helps data engineers and analysts manage data warehouse transformations. You install and use dbt Core on the command line. For some, this will feel right at home; for others, this might be intimidating.
You can install the dbt-core package through a warehouse-specific adapter package like dbt-snowflake, which contains all the code dbt needs to interact with that data warehouse.
Executing dbt run for a single model compiles the code to raw SQL and creates the data model in the data warehouse.
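As a rough illustration of that step, the sketch below wraps the dbt Core CLI from Python. It assumes dbt Core and a warehouse adapter are already installed and that a connection profile is configured. The model name stg_orders used in the note after the block is hypothetical.

```python
import shlex
import subprocess


def dbt_run_command(model: str) -> list[str]:
    """Build the dbt CLI call that runs a single model."""
    # --select limits the run to the named model.
    return ["dbt", "run", "--select", model]


def run_model(model: str) -> int:
    """Invoke dbt in the current project directory and return its exit code."""
    command = dbt_run_command(model)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Calling run_model("stg_orders") from inside a dbt project would be equivalent to typing dbt run --select stg_orders in the terminal.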
What is dbt Cloud?
While a fantastic tool, dbt Core requires quite a bit of gumption and know-how to make good use of all its features. dbt Cloud is a turnkey hosted service powered under the hood by dbt Core. You don’t need dbt Cloud to run dbt, but it will have you up and running quickly and easily.
Key features of dbt Cloud
dbt Cloud is simple to implement, includes support for scheduling jobs, CI/CD, testing, documentation, and even has a tidy IDE.
Integrated development environment (IDE)
dbt Cloud’s IDE allows users to work on their dbt project right from their web browser. Some folks on your team might be passionate (like, really passionate) about their text editor extensions, managing Python virtual environments, and using git via CLI. But for less experienced or opinionated teammates, the dbt Cloud IDE is an easy and approachable way to work.
The dbt Cloud IDE is a one-stop-shop where you can write and run code (SQL, Python, and Jinja). You can visualize model relations in a DAG, generate docs, and version control code changes via git. The editor even has ergonomic features like code diff view, autocomplete, and a code formatter.
Alternative IDE options for dbt Core
For all its bells and whistles, the dbt Cloud IDE will always lack some customization options that seasoned developers can’t live without. VS Code is a popular choice of editor, and offers handy extensions like dbt-power-user. For terminal customization, Zsh has dozens of themes available via Oh My Zsh.
If you ever find yourself in a git quagmire, the dbt Cloud IDE can be limiting. Experienced developers will likely prefer to git their way through merge conflicts using a classic terminal.
Job scheduling
dbt Cloud job scheduling provides a slick interface to easily run jobs on a schedule, view historical logs, configure error notifications, and refresh documentation.
Despite the perception that dbt Cloud might be limiting for teams with complex deployments, you can still pull off things like blue/green deployments, zero-copy clones, and dropping deprecated models from production. The run-operation command is a great way to invoke DDL via macros in your dbt runs, and offers a ton of flexibility. The biggest drawback of orchestration using dbt Cloud is that you only have access to dbt commands, and cannot run arbitrary bash commands.
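To make the run-operation idea concrete, here is a minimal sketch that builds the command line for invoking a macro with arguments. The macro name drop_deprecated_models and its dry_run argument are hypothetical; you would need to write such a macro in your own project.

```python
import json


def run_operation_command(macro: str, args: dict) -> list[str]:
    """Build a `dbt run-operation` call for a macro defined in the project."""
    command = ["dbt", "run-operation", macro]
    if args:
        # --args expects a YAML string; JSON is valid YAML, so json.dumps works.
        command += ["--args", json.dumps(args)]
    return command
```

For example, run_operation_command("drop_deprecated_models", {"dry_run": True}) yields a command you could paste into a dbt Cloud job step (minus the list syntax) or pass to subprocess.run when using dbt Core.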
Alternative job scheduling options for dbt Core
GitHub Actions, GitLab CI, Airflow and Prefect are all viable alternatives to using dbt Cloud’s scheduler. Overarching orchestration tools can be used to trigger dbt jobs as part of a larger process, like linking data ingestion to data transformation. For organizations where you can leverage engineering resources and expertise, glomming onto established and supported technologies can make sense.
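Stripped of scheduling, retries, and logging, the core of linking ingestion to transformation is running steps in order and halting when one fails. The sketch below shows that idea only; it is not how Airflow or Prefect are implemented, and the ingest.py script is a hypothetical placeholder.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def run_pipeline(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[Sequence[str]]:
    """Run steps in order, stopping at the first non-zero exit code.

    Returns the steps that completed successfully.
    """
    completed = []
    for step in steps:
        if runner(step) != 0:
            break  # Skip downstream steps so dbt never transforms stale data.
        completed.append(step)
    return completed


# Hypothetical pipeline: an ingestion script followed by a dbt build.
PIPELINE = [
    ["python", "ingest.py"],  # placeholder for your ingestion step
    ["dbt", "build"],
]
```

The runner parameter exists so the ordering logic can be exercised without actually launching processes; a real orchestrator adds scheduling, retries, and alerting on top.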
Continuous integration
Continuous integration (CI) is the process of ensuring that new code properly integrates with the rest of your project. In order to test the compatibility of proposed code changes, dbt Cloud’s CI job will build impacted models in a staging schema. At the bare minimum, compiling the project will ensure that syntax and references are valid. Depending on the level of rigor needed, tests can be run or the models can be manually reviewed.
Alternative CI options for dbt Core
For the bold, you can recreate a CI job similar to dbt Cloud’s via GitHub Actions. For a more robust CI process, you can use Datafold’s dbt integration to see how code changes impact data, from summary stats to row-level detail. It’s not always clear how code changes will manifest as data changes. Datafold allows you to quickly and confidently evaluate proposed changes to your dbt project. It’s again worth noting that dbt Cloud limits you to dbt commands, which some teams may find restrictive.
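A self-managed CI step usually boils down to one dbt invocation that builds only what changed. The sketch below assembles such a command using dbt’s state selector and deferral flags. It is an approximation of the idea, not a copy of dbt Cloud’s internal CI job. The ci target name and the prod-artifacts directory (holding a manifest.json from the last production run) are assumptions.

```python
def slim_ci_command(state_dir: str, target: str = "ci") -> list[str]:
    """Build a dbt command that runs and tests modified models and their children."""
    return [
        "dbt", "build",
        # state:modified+ selects changed nodes plus everything downstream.
        "--select", "state:modified+",
        # --defer resolves unchanged upstream refs against the production state.
        "--defer",
        # --state points at the directory containing the production manifest.json.
        "--state", state_dir,
        # The target should write to a staging schema, not production.
        "--target", target,
    ]
```

In a GitHub Actions workflow, the equivalent command line would run after a step that downloads the production manifest into prod-artifacts.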
Notifications
dbt Cloud offers notifications for job statuses that can be sent via Slack or email. With just a few clicks, you can configure alerts for failing jobs. Sending timely notifications to the correct people can help resolve issues faster!
Alternative notification options for dbt Core
You can create similar alerts with a general-purpose orchestrator like Airflow. Tools like fal and Elementary offer a bit more nuance with alerts, particularly with regard to tests. dbt job notifications lack detail on why your job failed; these two tools can be used to add actionable context to alerts.
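If you would rather roll your own context, dbt writes a run_results.json artifact after each invocation that records the status and message for every node. The following sketch pulls out the failures and formats them as alert text. It illustrates the general approach rather than how fal or Elementary work, and delivering the message (Slack, email) is left out.

```python
import json
from pathlib import Path

# Statuses that indicate a failed model ("error") or a failed test ("fail").
FAILURE_STATUSES = {"error", "fail"}


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact dbt writes after a run."""
    return json.loads(Path(path).read_text())


def failed_results(run_results: dict) -> list[dict]:
    """Keep only the nodes that failed, with the message dbt recorded."""
    failures = []
    for result in run_results.get("results", []):
        if result.get("status") in FAILURE_STATUSES:
            failures.append({
                "node": result.get("unique_id"),
                "status": result.get("status"),
                "message": result.get("message"),
            })
    return failures


def format_alert(failures: list[dict]) -> str:
    """Turn the failures into a short, human-readable alert body."""
    if not failures:
        return "dbt run succeeded"
    lines = [f"{len(failures)} dbt node(s) failed:"]
    lines += [
        f"- {item['node']} [{item['status']}]: {item['message']}"
        for item in failures
    ]
    return "\n".join(lines)
```

A scheduled job could call format_alert(failed_results(load_run_results())) after dbt finishes and post the result wherever your team watches for alerts.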
Documentation
By integrating documentation with the job scheduler, dbt Cloud makes it easy to generate and render documentation for your dbt project. With a single check box, you can set documentation to update with each run. Because documentation is built into the development workflow, dbt docs represent a living and accurate representation of your dbt project. Other developers, business stakeholders, and your future self can all reference dbt docs in order to understand model relationships and logic.
Docs can be referenced via the documentation tab in dbt Cloud, or from the IDE.
Alternative documentation options for dbt Core
When self-managing dbt Core, you can host dbt docs via Amazon S3 or Netlify, and have a similar experience. But again, this requires more legwork and infrastructure skills to secure your docs. (Noticing a theme?)
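Part of that legwork is knowing what to publish. Running dbt docs generate writes a static site to the target directory, and the sketch below maps the three files the site needs to destination keys. The dbt-docs prefix is a hypothetical choice, and the upload itself (for example with an S3 client or the Netlify CLI) and access control are deliberately left out.

```python
from pathlib import PurePosixPath

# Files written by `dbt docs generate` that the static docs site loads.
DOC_ARTIFACTS = ("index.html", "manifest.json", "catalog.json")


def docs_upload_plan(target_dir: str = "target", prefix: str = "dbt-docs") -> dict[str, str]:
    """Map each local docs file to the key it should be published under."""
    return {
        str(PurePosixPath(target_dir) / name): f"{prefix}/{name}"
        for name in DOC_ARTIFACTS
    }
```

Keeping the plan separate from the upload makes it easy to swap hosting providers without touching the list of artifacts.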
dbt docs use model ref()s to build a dependency graph between models, and show table-level lineage. For more advanced docs, Datafold provides dbt column-level lineage. Datafold analyzes SQL statements in your data warehouse to produce an even more robust graph.
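The table-level graph is also available to your own scripts, because dbt records each node’s direct parents in the parent_map section of manifest.json. As a minimal sketch, the function below walks that map to find everything upstream of a node. The node identifiers in the note after the block are made up for illustration.

```python
def upstream_nodes(manifest: dict, unique_id: str) -> set[str]:
    """Return every node that the given node depends on, directly or indirectly."""
    parent_map = manifest.get("parent_map", {})
    seen: set[str] = set()
    stack = list(parent_map.get(unique_id, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            # Keep walking upward through this node's own parents.
            stack.extend(parent_map.get(node, []))
    return seen
```

Given a manifest where model.shop.orders depends on model.shop.stg_orders, which in turn reads from source.shop.raw.orders, the function returns both upstream identifiers. Column-level lineage needs SQL parsing and is out of scope for a sketch like this.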
Semantic layer
dbt’s semantic layer allows you to define key business metrics in code, and reference metrics in downstream tools like Hex, Mode, Atlan, and more. dbt metrics can be aggregated at different grains, allowing end-users to pivot metrics by dimensions on the fly, all while maintaining the same source of truth. Metrics can be defined as dbt metadata, and highlighted in your DAG. All of this helps address the age-old struggle of KPIs with multiple definitions living in different tools and systems. The dbt Semantic Layer is currently in Public Preview, available to dbt Cloud accounts.
Alternatives to the dbt semantic layer
Looker’s LookML is a compelling alternative to dbt’s semantic layer, and offers much of the same functionality, albeit at the BI level. Notably, the dbt semantic layer does not integrate with Looker. If you’re using Looker, or another tool with BI as code, the dbt semantic layer might not be for you.
API access
dbt Cloud includes API access for customers on the Team or Enterprise plans. There are two APIs: the dbt Cloud Administrative API and the dbt Metadata API.
The Admin API can be used to kick off jobs, download artifacts, and manage your dbt accounts. The Metadata API contains information about your project, and can be used to improve the quality and efficiency of your project.
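Kicking off a job through the Admin API is a single authenticated POST. The sketch below builds that request with the standard library. The URL shape and Token authorization header follow the v2 API as I understand it, so confirm them against the current API reference. The host may differ for your region or plan, and the account ID, job ID, and token are placeholders.

```python
import json
import urllib.request


def build_trigger_request(
    account_id: int,
    job_id: int,
    token: str,
    cause: str = "Triggered via API",
    host: str = "cloud.getdbt.com",  # assumption: your account may use another host
) -> urllib.request.Request:
    """Build (but do not send) the request that triggers a dbt Cloud job run."""
    url = f"https://{host}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    # The body carries a free-text reason that shows up in the run history.
    body = json.dumps({"cause": cause}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it. Building the request separately keeps the token handling and URL construction easy to inspect before anything goes over the network.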
Alternative options to the dbt API
While there is no like-for-like alternative to the Admin API, there are alternatives to the Metadata API. Packages like Elementary and re_data can be used to collect metadata and analyze your project.
dbt Cloud pricing
dbt Labs offers tiered pricing for dbt Cloud. Generally speaking, higher tiers offer additional seats and features.
The dbt Cloud Developer plan is free and includes one developer seat. This plan contains most features covered in this article, including an IDE, job scheduling, CI, and GitHub/GitLab integrations.
This plan suits data teams of one.
The dbt Cloud Team plan is priced at $100 per seat, with up to eight developer seats and five read-only seats. Read-only seats would be occupied by teammates who only view documentation and do not contribute code. The Team plan has all the features of the Developer plan, plus API access and the semantic layer.
This plan suits small to medium data teams.
The dbt Cloud Enterprise plan has custom pricing. It offers features that many enterprise customers demand, like single sign-on, multiple deployment regions, service level agreements, and audit logging. A notable “proper” feature is that Enterprise supports unlimited projects. For larger organizations, multiple projects can offer flexibility that a single monolith project might lack.
Even moderately sized data teams may quickly find themselves on the Enterprise plan, particularly if they expect contributors from across the organization or broad usage of dbt docs.
dbt is a fantastic tool that has transformed the data analytics space, and I strongly recommend that all data teams consider adopting dbt.
dbt Cloud is the quickest and easiest way to start running with dbt. However, dbt Cloud offers few, if any, features that you can’t piece together on your own. When evaluating dbt Cloud, the question you should be asking is not, “can I do this on my own?” but “why would I do this on my own?”
Especially for smaller data teams, dbt Cloud is well worth the price to skip the hassle of managing your own dbt infrastructure. The largest and most frequently overlooked cost in the Modern Data Stack is … you! dbt Cloud costs a fraction of the price of a full-time employee, and frees data engineers and analysts to focus on delivering high-value analytics to the business.
Kick the tires with a free trial of dbt Cloud!