3 cloud data migration tools to consider

Migrating data to the cloud is tricky work riddled with surprises. You might start by thinking, “Well, we’re just moving from Oracle to Snowflake, and both of those are SQL.” Then you find out Oracle’s SQL is pretty different from Snowflake’s SQL and migrating from one to the other is gonna take a lot more than asking ChatGPT to write the script for you. (I hope you like cast() functions!)

Before long, you’ll find yourself spending as much mental effort navigating the tooling landscape as it takes to do the work itself. It’s difficult, if not impossible, to know what each tool actually does. Their feature descriptions all use the same words, but the tools themselves do slightly different things. So, you start downloading them, only to find that most of them are extremely manual. They output Excel files and lack the documentation you need to be successful.

To call it a struggle is an understatement. Every tool provider is after that sweet enterprise money that comes with big data migrations and large license needs, which has made the space super crowded. Searching Google for “cloud data migration tools” yields almost nothing useful. It’s become a pay-to-play minefield.

So, we’re here to make your life a little easier and walk you through the tools you might find most useful.

How data migrations work—or at least should work

Before we get to the tooling, we should at least explain our perspective so you understand the lens through which we look at these tools.

There are two ways to perform a data migration: lift and shift, or migrate and refactor. We have opinions on which is better and more likely to succeed. You can find out more in our data migration best practices guide, which will also prepare you for some of the most common data migration challenges (and benefits!).

Why lift and shift is the best data migration method

In a lift and shift data migration, you’re:

  1. Copying everything from your source database to your target system
  2. Validating that everything got there and looks correct
  3. Turning off your legacy systems
  4. Modernizing and refactoring your data to take advantage of the target system’s efficiencies
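Here’s a minimal sketch of the first two steps (copy, then validate), assuming two in-memory SQLite databases as stand-ins for your legacy and target systems. The `orders` table and the helper names `copy_table` and `row_count` are made up for illustration, and a row count is only the most basic check you’d run.

```python
import sqlite3


def copy_table(source, target, table, columns):
    """Copy every row of `table` from source to target without transforming it."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    target.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


def row_count(conn, table):
    """Return the number of rows in `table`."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Hypothetical legacy and target systems (both in-memory SQLite here).
legacy = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)"
legacy.execute(ddl)
target.execute(ddl)
legacy.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0), (3, 7.25)]
)

# Step 1: copy everything as-is. Step 2: check that it all arrived.
copied = copy_table(legacy, target, "orders", ["id", "amount"])
counts_match = row_count(legacy, "orders") == row_count(target, "orders")
```

Matching row counts don’t prove the values are identical, which is why validation usually goes further than this.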

We like this method best for a variety of reasons, primarily because you’re preserving the data as closely as possible to the legacy setup and immediately relieving workloads from your (likely overworked) legacy system. You have the least rework to do, and your data-consuming apps should need few, if any, updates. If you read our best practices guide, you’ll find out why our CEO wishes he had done a lift and shift at Lyft. Instead, they chose to migrate and refactor.

Why you shouldn’t refactor during your data migration

Refactoring while migrating your data is a bit like changing the oil in a car that’s speeding down the freeway. Except you’re also trying to change the chassis of your car at the same time.

There are a dozen reasons not to refactor your data while performing a data migration. We think the biggest pitfalls come down to:

  • Refactoring during a migration always increases the scope and timeline of work. The technology and requirements can change before your eyes. You can easily find yourself needing to change your migration strategy mid-way through the effort.
  • It’s really difficult to tell a data stakeholder that their data is “unchanged” because—well, it isn’t. That’s why you refactored. You’re asking them to trust you that the data quality is the same as it ever was and that their rework efforts will be worth it.
  • It’s high-risk and expensive, and there’s a better way: lift and shift.

Different tools serve different phases of data migrations

There are three phases to a data migration:

  1. Migration: Move the data
  2. Validation: Prove the data is the same
  3. (Bonus Points!) Translation: Convert legacy SQL scripts to the SQL flavor of your new database

We haven’t seen a single tool that handles all three. Some tools have a very specific focal point and others have breadth without a lot of depth. Most of the tools out there do data comparisons, but don’t typically offer automation or a diverse set of data output options. It’s usually Excel file exports and the tools are clearly targeting enterprise users.

Other solutions we see are for data migration planning. If you have thousands of tables and downstream BI assets, these tools help you decide what to migrate and what to prioritize.

Migrate data with ETL tools like Fivetran and Airbyte

Migrating data from one system to another is nothing to take lightly. It’s a non-trivial engineering effort, especially since you’re likely to have your legacy and next-gen systems online in parallel for some amount of time. You need rock-solid data replication with 100% accuracy. Your data migration shouldn’t come at the cost of data quality.

For that reason alone, we recommend a couple of commercial options that are reliable and great at what they do. Of course, DIY data pipelines are always an option and plenty of people roll their own, but those require a pretty significant amount of engineering. If you’re not already building pipelines yourself, the midst of a migration project might not be the best time to start.

Fivetran: Automated data movement

Fivetran calls itself the "automated data movement platform" because they’ve seen how the data engineering space has evolved. Data teams aren’t just moving data into centralized data warehouses. They’re moving data between all sorts of systems.

With Fivetran, you can usually get data pipelines built in a few minutes. They’re fully automated, meaning they seamlessly handle unexpected schema changes and other situations that typically muck up replication. They’re also fully managed, meaning Fivetran support teams are on the hook if there’s an incident or issue with your pipeline’s availability. You can even build custom pipelines if one of their 487 connectors doesn’t support your needs.
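To make “unexpected schema changes” concrete, here’s a rough sketch of the kind of drift a pipeline has to notice between two syncs. This is not how Fivetran works internally. The `schema_drift` helper and the column names are hypothetical, and schemas are modeled as plain dicts of column name to type.

```python
def schema_drift(old_schema, new_schema):
    """Compare two {column: type} mappings and report what changed."""
    added = sorted(new_schema.keys() - old_schema.keys())
    removed = sorted(old_schema.keys() - new_schema.keys())
    retyped = {
        col: (old_schema[col], new_schema[col])
        for col in sorted(old_schema.keys() & new_schema.keys())
        if old_schema[col] != new_schema[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical source schema as seen on two consecutive syncs.
last_sync = {"id": "INTEGER", "email": "TEXT", "signup": "TEXT"}
this_sync = {"id": "INTEGER", "email": "TEXT", "signup": "TIMESTAMP", "plan": "TEXT"}

drift = schema_drift(last_sync, this_sync)
```

A hand-rolled pipeline needs a policy for each of those three cases. A managed tool makes those decisions for you.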

You can try connectors for free and the initial data sync is also free. They even have a free tier offering the transfer of hundreds of thousands of rows every month. Every pricing tier supports dbt, too.

Link: Fivetran.com

Airbyte: A robust data integration platform

Airbyte is an excellent alternative to Fivetran, especially if you’re looking for an open source, self-managed option. This works very well for enterprises that have the staff and expertise to support data pipelines. Airbyte offers many of the same features as Fivetran, but with more of a DIY feel. Its open source nature means you have more control over your data pipelines. Many of its connectors are open source, too, so you can extend them however you see fit.

With Airbyte, you can build pipelines in a few minutes. It can handle most schema changes and is resilient for data sources with unexpected changes. There are a variety of support options for the pipelines you set up, and you can build your own connectors or leverage experimental ones from the open source community. They have 350+ connectors, meaning most of the popular data sources and destinations are covered.

Pricing depends on the size of your organization, data volume, and whether you’re self-hosting. Every pricing tier supports dbt, too.

Link: Airbyte.com

Validate a data migration with Datafold

Datafold (hey, that’s our product!) is purpose-built for accelerating data migrations. What started as an open source data diffing utility has expanded into a cloud-hosted automation and column-level lineage platform with deep dbt integration. Datafold allows you to do row-by-row comparisons of tables across databases, showing you not just how different your datasets are, but exactly where and how they differ, at any scale.

These features are essential for data migrations because you can ensure parity across thousands of tables in an instant. You can quickly and easily provide assurance that the data in your destination is the same as it ever was.
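If you’re wondering what a row-by-row comparison actually reports, here’s a toy sketch. It assumes both tables fit in memory as lists of dicts sharing a primary key, which is nothing like how a cross-database diff runs at scale, and it is not Datafold’s implementation. The `diff_rows` helper and the sample rows are made up for illustration.

```python
def diff_rows(source_rows, target_rows, key):
    """Compare two lists of row dicts by primary key."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}

    changed = {}
    for k in src.keys() & tgt.keys():
        # Look at every column that appears on either side of the pair.
        columns = set(src[k]) | set(tgt[k])
        differing = sorted(c for c in columns if src[k].get(c) != tgt[k].get(c))
        if differing:
            changed[k] = differing

    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "changed": changed,
    }


# Hypothetical rows pulled from the legacy and target tables.
source = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "free"},
    {"id": 3, "email": "c@example.com", "plan": "free"},
]
target = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
    {"id": 4, "email": "d@example.com", "plan": "free"},
]

report = diff_rows(source, target, key="id")
```

The useful part is the shape of the answer: which keys are missing, which are extra, and which columns differ for the rows that exist on both sides.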

Datafold Cloud features a REST API and scheduler, so you can monitor and maintain data parity 24/7. This is invaluable for large data migration efforts with thousands of tables. Paired with our column-level lineage, you can see exactly which downstream assets are using this data and ensure they operate without interruption. You can also make prioritization decisions about your data migration based on which tables (and downstream BI assets) are most used.

Datafold Cloud also supports a SQL Translator, so you can translate your legacy code into the SQL dialect of your data warehouse with the click of a button. No more Googling "date_trunc function in BigQuery."
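For a sense of why translation is its own phase, here’s one tiny example: truncating a date to the start of its month is spelled differently in Oracle, Snowflake, and BigQuery. The lookup table below is a hand-written illustration, not how the SQL Translator works, and `month_trunc` is a hypothetical helper.

```python
# How each dialect truncates a date column to the first day of its month.
MONTH_TRUNC_TEMPLATES = {
    "oracle": "TRUNC({col}, 'MM')",
    "snowflake": "DATE_TRUNC('MONTH', {col})",
    "bigquery": "DATE_TRUNC({col}, MONTH)",
}


def month_trunc(dialect, column):
    """Render a month-truncation expression for the given SQL dialect."""
    try:
        template = MONTH_TRUNC_TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=column)


legacy_sql = month_trunc("oracle", "order_date")
snowflake_sql = month_trunc("snowflake", "order_date")
bigquery_sql = month_trunc("bigquery", "order_date")
```

Multiply that by every function, data type, and quoting rule in your legacy scripts and you can see why nobody wants to do this by hand.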

Data migrations live and die by their toolchain

Migrating data is like moving houses. You gotta pack everything (migration), check that you didn’t leave anything behind (validation), and sometimes rearrange furniture to fit the new place (translation).

Unless your product is super simple, which is super rare, you’re going to need tools to help you successfully migrate your data. Even if you’re moving from one modern cloud data warehouse to another, you should still use tooling to migrate and validate. You never know if there’s a character encoding difference between the two instances that’ll muck up your Unicode characters.
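Here’s what an encoding mismatch looks like in practice, as a small sketch. It assumes one system writes text as UTF-8 and the other reads it as Latin-1 (a common mix-up, though your systems may disagree in other ways). The `survives_round_trip` helper is hypothetical.

```python
def survives_round_trip(text, write_encoding, read_encoding):
    """Return True if text written in one encoding reads back unchanged in another."""
    try:
        return text.encode(write_encoding).decode(read_encoding) == text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


original = "café"

# Written as UTF-8 by the source, read as Latin-1 by the target.
garbled = original.encode("utf-8").decode("latin-1")

same_encoding_ok = survives_round_trip(original, "utf-8", "utf-8")
mismatch_ok = survives_round_trip(original, "utf-8", "latin-1")
```

No error is raised in the mismatched case. The row arrives, the row count matches, and the value is quietly wrong, which is exactly the kind of thing value-level validation catches.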

There are a lot of tools out there, but it’s hard to find good ones. ETL tools like Fivetran and Airbyte, AWS DMS, and Datafold Cloud are excellent ways to get started that won’t leave you wishing you’d taken the idea of becoming a monk more seriously.

You can’t copy/paste your data in a migration, just like you can’t copy/paste your boxes and furniture between two houses. You need the right tools, and sometimes you need to call some friends over to help. Don’t be afraid to reach out to us. We’ve got your back and a couple of furniture dollies. (Yes, we’re okay with pineapple pizza.)

Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.
