Request a 30-minute demo

Our product expert will guide you through our demo to show you how to automate testing for every part of your workflow.

See data diffing in real time
Data stack integration
Discuss pricing and features
Get answers to all your questions
By providing this information, you agree to be kept informed about Datafold's products and services.
Submit your credentials
Schedule date and time
for the demo
Get a 30-minute demo
and see datafold in action

How CHG Healthcare saved 15 months migrating from MySQL to Snowflake + dbt

One of America’s largest healthcare staffing firms, CHG, cut their MySQL-to-Snowflake migration time from 18 months to 3 months using the AI-powered Datafold Migration Agent. They reduced costs by 80% while ensuring perfect data parity across over a thousand custom queries.

The challenge

CHG Healthcare maintained a critical reporting system built on MySQL. These reports, which are powered by complex queries that transform data ingested from upstream sources, provided essential credentialing information for healthcare professionals, ensuring they could legally practice in specific locations. The firm needed to migrate these reports to Snowflake without any service disruptions to customers.

The migration presented several complexities:

  • Over 1,000 custom MySQL queries needed translation
  • Many of the queries contained complex joins and subqueries requiring careful translation
  • Reports had to produce identical results in both environments to maintain data parity
  • Manual validation of each query would require enormous effort and introduce risk
  • Some queries exhibited special handling requirements, including non-deterministic behaviors

The company’s reporting ecosystem had evolved organically over the years, making it challenging to identify and categorize all the queries that required migration.

Initially, the data team considered a traditional approach, using contractors for manual SQL translation and validation. They estimated that the contractor approach would take up to 18 months to complete. But given the critical nature of these reports and the resource and timeline required by contractors, this approach simply didn’t align with their business objectives and timelines.

“Before we considered Datafold, the business was ready to spend $10,000 per week on consulting resources for 12-18 months, or even longer than that.”

— Abigail Green, Data Engineering Manager

The solution

CHG Healthcare leveraged Datafold’s AI-powered Data Migration Agent (DMA), a full-cycle migration solution that addressed their key challenges:

AI-powered SQL translation with continuous refinement

Datafold deployed DMA to translate MySQL queries to Snowflake syntax, with outputs validated using Datafold’s proprietary data diff technology. Unlike traditional translators that rely on static code parsing and perform no validation of outputs, Datafold’s DMA leverages large language models (LLMs) with a validation-driven feedback loop that intelligently identifies and corrects translation errors. This approach handled complex SQL patterns that simple translation tools would miss, creating a self-improving system that continuously refined the code until achieving parity.

Cross-database diffing for comprehensive validation

DMA’s one-two punch is the combination of translation with Datafold’s industry-leading cross-database data diffing that provides value-level comparison between source and target environments. For the company, this revealed detailed schema comparison, summary statistics, and row-level differences between the outputs of the original queries and the translated Snowflake versions. This automated validation eliminated the need for manually checking innumerable data points across reports and provided clear evidence of migration success.

Specialized handling for edge cases

For queries with special characters or non-deterministic behavior, the Datafold team implemented specialized handling techniques for DMA that addressed MySQL-specific behaviors. This ensured consistent results even when dealing with invisible characters, trailing spaces, and dialect-specific functions.

Batched delivery through pull requests

Datafold delivered translations in batches through pull requests to the company’s GitHub repositories, allowing for continuous integration and testing throughout the project. Continuous delivery allowed for fast feedback loops and quickly built confidence among executive stakeholders that the migration was significantly accelerated.

“The Datafold process — translating with the right LLM, validating the results, and letting humans review what falls out — is how I’d recommend every migration be done. It’s faster, more accurate, and eliminates many sources of human error, especially in the complex final reporting layer where mistakes matter most.”

— Abigail Green, Data Engineering Manager

The results

CHG Healthcare realized significant benefits from the Datafold implementation:

Timeline reduction from 18 months to 3 months

According to the team, Datafold compressed what would have been an 18-month project into just 3 months. The automated approach allowed the team to parallelize work that would have been sequential if they’d instead used contractors, dramatically accelerating their timeline while maintaining quality.

Data validation at massive scale

The team reported that automated cross-database validation provided a level of confidence that simply wasn’t possible with a manual validation process. With Datafold, they could verify that every value matched perfectly between systems, eliminating the possibility of subtle data discrepancies that could impact critical decisions.

Redirecting internal resources to high-impact work

By automating the bulk of the migration with Datafold, CHG was able to redirect budget and internal resources to strategic initiatives rather than manual migration work.

Looking ahead

Beyond the initial migration, the company is implementing additional Datafold capabilities. The team quickly realized Datafold’s capabilities extended far beyond their initial migration use case. Data lineage, automated testing, and observability and monitoring tools are helping them build a more robust data ecosystem for the long term.

  1. CI/CD Integration: Automatically testing dbt model changes to predict downstream impacts before deployment
  2. Data Quality Monitoring: Setting up observability for critical metrics with anomaly detection
  3. Impact Analysis: Using lineage to trace how changes affect downstream Tableau dashboards

What began as a migration tool has evolved into a strategic platform for ensuring data quality across the company’s entire analytics stack.

Challenge
MySQL MySQL
1,000+ queries
Snowflake Snowflake
dbt Cloud dbt Cloud
Outcome
Completion time
3 months
than the next best alternative
6x faster
80% cheaper