March 5, 2025

AI in data engineering: Use cases, benefits, and challenges

Discover how AI in data engineering is transforming workflows with automation, code optimization, and data warehouse efficiency. Learn key use cases, benefits, and challenges of implementing AI-driven solutions in data teams.

Kira Furuichi

AI is rapidly changing how data engineers build, manage, and optimize data infrastructure. From automating code generation to accelerating data migrations, AI has the potential to help data engineers work dramatically faster and smarter.

While AI’s adoption in data engineering workflows is still “finding its groove,” it already presents significant opportunities to drastically improve manual workflows, reduce costs, and catalyze innovation. In this post, we’ll explore four key use cases where AI is making an impact, the benefits it brings to data teams, and the challenges organizations must navigate when implementing AI-driven solutions.

AI in data engineering use cases

While there are many (still expanding!) use cases for AI in data engineering, this article focuses on four practical ones:

  1. Code generation and optimization
  2. Automated code reviews
  3. Data warehouse performance and cost optimization
  4. Data migrations

Code generation and optimization

One of the biggest (and buzziest) areas where AI is being used in data engineering is code generation. This may look like using tools such as GitHub Copilot to help generate SQL for data transformations, or using dbt Cloud’s in-app AI to quickly generate dbt models. As writing code is still a largely manual process for many data engineers, relying on AI for code generation and optimization will continue to be a growing area of interest and implementation for data teams.
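As a toy illustration of what code-generation assistance involves, the sketch below builds a schema-grounded prompt and hands it to an LLM client. The `call_llm` function is stubbed with a canned response so the example is self-contained; in practice it would wrap a real completion API, and the table and column names are hypothetical.

```python
def build_sql_prompt(table: str, columns: list[str], request: str) -> str:
    """Assemble a prompt grounded in the real schema, which helps
    the model avoid hallucinating column names."""
    schema = ", ".join(columns)
    return (
        f"You are a SQL assistant. Table `{table}` has columns: {schema}.\n"
        f"Write a single SQL query to: {request}\n"
        "Return only SQL, with no explanation."
    )

def call_llm(prompt: str) -> str:
    # Stub standing in for a real completion API call.
    return ("SELECT customer_id, SUM(amount) AS total_amount\n"
            "FROM orders GROUP BY customer_id")

prompt = build_sql_prompt(
    "orders",
    ["customer_id", "amount", "ordered_at"],
    "total order amount per customer",
)
generated_sql = call_llm(prompt)
```

Grounding the prompt in the actual schema is the main design choice here: generated SQL is only as trustworthy as the context the model is given.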

Code reviews

One of the most time-consuming parts of the data and analytics engineering workflow is the manual code review, which checks code for a variety of things:

  • Consistency and adherence to code formatting standards
  • Completeness
  • Correctness
  • Performance
  • Impact on downstream models, assets, and data
  • and more

Without AI, this translates to hours of work per pull request, back-and-forth between the PR opener and reviewer, and ultimately, time spent on manual work that can be (and likely will be) automated by AI one day.
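Parts of that checklist can even be mechanized without an LLM. The sketch below is a minimal, rule-based stand-in for the category of checks an AI reviewer automates; real tools go far beyond pattern matching, and the rules here are illustrative only.

```python
import re

def review_sql(sql: str) -> list[str]:
    """Flag a few common review findings in a SQL change."""
    findings = []
    lowered = sql.lower()
    if re.search(r"select\s+\*", lowered):
        findings.append("Avoid SELECT *: list columns explicitly.")
    if " join " in lowered and " on " not in lowered and " using" not in lowered:
        findings.append("JOIN without ON/USING: missing join condition?")
    if re.search(r"group\s+by\s+\d", lowered):
        findings.append("GROUP BY ordinal: prefer column names for readability.")
    return findings

issues = review_sql("SELECT * FROM orders o JOIN users u GROUP BY 1")
# issues -> three findings, one per rule above
```

The value of an AI reviewer is precisely that it handles the long tail these hand-written rules can't: correctness, completeness, and downstream impact.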

Datafold is pushing the boundaries of automated code reviews with our AI-Code Reviews that make PR reviews faster, clearer, and more actionable. Specifically, they support two functions using AI:

  1. Surface the most critical insights about each PR, so you know what requires attention at a glance.
  2. Ask and answer questions in a convenient chat interface to dig deeper into changes or impacts.

Data warehouse performance and cost optimization

As businesses continue to accumulate more data, data warehouse costs are only going up, and scaling that data in a cost-efficient way has never been more important. AI can (and eventually will) be used to help data engineers:

  • Optimize query performance by suggesting more performant SQL as engineers write and update data transformations
  • Improve data transformation run schedules
  • and more

This area is still being explored and developed by data teams and SaaS companies, and will be interesting to watch as data team expenses (like those of all other teams) come under greater pressure.
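To make the first bullet point above concrete: columnar warehouses bill roughly by bytes scanned, so column pruning and partition filters translate directly into cost. The sketch below is a toy cost heuristic; the table sizes and the 30-day partition assumption are made up, and a real optimizer would read these from warehouse metadata instead.

```python
# Hypothetical table sizes in bytes; a real tool would pull these
# from warehouse metadata (e.g., information_schema).
TABLE_BYTES = {"events": 5_000_000_000_000, "users": 2_000_000_000}

def estimate_scan_bytes(table: str, has_partition_filter: bool,
                        column_fraction: float = 1.0) -> int:
    """Rough bytes-scanned estimate: partition filters and selecting
    fewer columns both shrink the scan (and the bill)."""
    size = TABLE_BYTES[table]
    if has_partition_filter:
        size //= 30  # assume one day out of a ~30-day partition range
    return int(size * column_fraction)

full_scan = estimate_scan_bytes("events", has_partition_filter=False)
pruned = estimate_scan_bytes("events", has_partition_filter=True,
                             column_fraction=0.1)  # read ~10% of columns
# pruned is roughly 300x smaller than full_scan
```

Even this crude arithmetic shows why "add a partition filter" is one of the highest-leverage suggestions an AI assistant can make.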

Data migrations

For migrations, there is enormous opportunity for AI to greatly accelerate the way data teams approach and execute them. But the opportunity is a practical one.

While AI won’t completely remove all manual work from migrations (someone still needs to turn on the new database!) or entirely eliminate the need for outside consultants on large-scale migrations, we’ll see AI and LLMs used to automate the most tedious parts of a migration: code translation and cross-database validation.

The Datafold Migration Agent (DMA) uses AI and LLMs to automatically convert code to the new SQL dialect or workflow of your choice, and to fine-tune that code until data parity is met between the legacy and new systems. Ultimately, DMA supports:

  1. Zero manual validation, at scale: DMA now automatically verifies every record across both legacy and new databases to ensure accuracy, without having data teams waste time creating cross-database comparisons for every migrated table.
  2. Lower migration risk: With DMA’s AI fine-tuning the translated code until accuracy is met and parity is 100%, DMA not only accelerates migration timelines massively but also lowers the risk of inaccurate data in the new system.
  3. Faster time-to-production: With end-to-end value-level comparisons done automatically by DMA, data teams also have auditable comparisons between systems to earn stakeholder sign-off faster.
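The cross-database validation step can be sketched in miniature: compare the same table value by value across the legacy and new systems. Two in-memory SQLite databases stand in for the two warehouses here, and the table and values are invented for the example; tools like DMA do this at scale across real warehouse pairs.

```python
import sqlite3

def fetch_rows(conn: sqlite3.Connection, table: str) -> list[tuple]:
    """Pull all rows in a deterministic order for comparison."""
    return list(conn.execute(f"SELECT * FROM {table} ORDER BY 1"))

legacy, new = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (legacy, new):
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

legacy.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
new.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 21.0)])

# Naive row-by-row diff; assumes equal row counts. A real diff also
# handles missing/extra rows and keys them by primary key.
mismatches = [
    (a, b)
    for a, b in zip(fetch_rows(legacy, "orders"), fetch_rows(new, "orders"))
    if a != b
]
# mismatches -> [((2, 20.0), (2, 21.0))]: parity not yet met for order 2
```

The output is exactly the kind of auditable, value-level evidence that earns stakeholder sign-off on a migration.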

Benefits of AI in data engineering workflows

The main benefit of implementing AI strategically in data engineering is the considerable time and cost savings it brings to teams.

Increased efficiency with automation

The biggest gain from implementing AI in data engineering workflows undoubtedly comes from increased efficiency. By taking on time-consuming data engineering tasks, such as code reviews, code generation, and warehouse optimization, AI allows data engineers to focus on higher-impact work that requires their specialized skillset.

Where writing a new SQL model used to take two hours, with AI it may take half that time. Where reviewing pull requests used to take five hours, with AI it may take only 30 minutes. Data engineers will continue to gain back incredibly valuable time to focus on business-defining data work.

By helping data engineers write more performant SQL, craft more efficient data pipelines, and optimize data warehouses for cost, AI can also make companies more efficient with their data infrastructure spend.

Short and long-term gains in innovation

Data teams and engineers who strategically leverage AI in their workflows open the door to short- and long-term innovation. By saving data teams considerable time typically spent on manual tasks, AI is enabling data practitioners to focus on innovation: better dashboards, predictive ML models, improved data governance and documentation, and more.

Increased accessibility and scalability

With AI assistance in code generation and reviews becoming more of a reality every day, the barriers to contributing to data engineering work are getting lower. Lower barriers mean data work can be accomplished more efficiently by people across different teams and skillsets, making AI a lever for both broader access to data work and more scalable solutions.

Challenges of implementing AI in data engineering

All of these gains with AI sound great, and one day (probably in the near-to-medium-term future), these use cases and benefits will be much more accessible and tangible for many data engineers. Today, however, there remain challenges in implementing AI in data engineering, namely around data security and privacy concerns, organizational maturity, and data readiness.

Data security and privacy

Given how new AI is to these workflows, and how quickly AI-based SaaS companies like OpenAI (ChatGPT) and Anthropic (Claude) are innovating, security and legal teams have reasonable concerns about the access these AI and LLM companies have to company data. Maintaining data privacy and security in the AI era may look like ensuring AI chats are not used by the AI companies to train their models, or guaranteeing that all training data is masked and encrypted.

Organizational readiness

It’s hard to find a company today that is not on the “AI-hype train” (and for very fair reasons). However, if the right infrastructure and practices are not in place to support AI in data engineering workflows, it may be challenging for AI to “take off” internally.

Data quality and AI-preparedness

Equally important as organizational readiness is data readiness. If organizations are training and adjusting their own AI models, it has never been more important for data quality to be high and data infrastructure to be organized. If data is not mature enough for AI, data engineers may not get the full possible benefit from implementing AI in their workflows.

A hidden challenge: It’s also very important for organizations to outline where AI won’t be impactful or useful. And while AI and LLMs will continue to get better with time, they are not perfect (please see the image below). AI today is best suited to automating the manual, repetitive parts of data engineers’ work, such as code reviews or even data migrations, while leaving the more complex and specific work to people.
Ah yes, the elusive "A AI Integration" company doing its finest(?) work

Conclusion

AI is reshaping the landscape of data engineering by automating repetitive manual tasks, enhancing efficiency, and driving innovation. While benefits such as time savings, cost reductions, and increased accessibility are clear, organizations must also address challenges like data security, AI readiness, and maintaining high data quality.

As AI tools continue to evolve, their role in data engineering will continue to mature, enabling teams to focus on more strategic and impactful work.
