AI in data engineering: Use cases, benefits, and challenges
Discover how AI in data engineering is transforming workflows with automation, code optimization, and data warehouse efficiency. Learn key use cases, benefits, and challenges of implementing AI-driven solutions in data teams.

AI is rapidly changing how data engineers build, manage, and optimize data infrastructure. From automating code generation to accelerating data migrations, AI has the potential to help data engineers work faster and smarter.
While AI's adoption in data engineering workflows is still "finding its groove," it already presents significant opportunities to improve manual workflows, reduce costs, and catalyze innovation. In this post, we'll explore four key use cases where AI is making an impact, the benefits it brings to data teams, and the challenges organizations must navigate when implementing AI-driven solutions.
AI in data engineering use cases
While there are many (still expanding!) use cases for AI in data engineering, this article focuses on four practical ones:
- Code generation and optimization
- Automated code reviews
- Data warehouse performance and cost optimization
- Data migrations
Code generation and optimization
Probably one of the biggest (and buzziest) applications of AI in data engineering is code generation. This may look like using tools such as GitHub Copilot to help generate SQL for data transformation, or using dbt Cloud's in-app AI to quickly generate dbt models. As writing code is still a largely manual process for many data engineers, relying on AI to aid in code generation and optimization will continue to be a growing area of interest and implementation for data teams.
Code reviews
One of the most time-consuming parts of the data and analytics engineering workflow is manual code review, which means checking code for a variety of things:
- Consistency and adherence to code formatting standards
- Completeness
- Correctness
- Performance
- Impact on downstream models, assets, and data
- and more
Without AI, this translates to hours of work per pull request, back-and-forth between the PR opener and reviewer, and ultimately, time spent on manual work that can (and likely will) be automated by AI one day.
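One item on the checklist above, impact on downstream models, can be partly mechanized even without AI. The following is a minimal sketch, assuming the team already has a lineage map from each model to the models that select from it. The model names and the `DOWNSTREAM` mapping are hypothetical, and this is not a description of how any particular review tool works.

```python
from collections import deque

# Hypothetical lineage: each model maps to the models that select from it.
DOWNSTREAM = {
    "stg_orders": ["int_orders_enriched"],
    "stg_customers": ["int_orders_enriched", "dim_customers"],
    "int_orders_enriched": ["fct_revenue"],
    "dim_customers": ["fct_revenue"],
    "fct_revenue": [],
}


def downstream_of(changed, graph):
    """Return every model reachable from the changed models, breadth-first."""
    seen = set(changed)
    queue = deque(changed)
    impacted = []
    while queue:
        model = queue.popleft()
        for child in graph.get(model, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Models a reviewer would want to check when a PR edits stg_customers.
print(downstream_of(["stg_customers"], DOWNSTREAM))
```

A list like this gives a reviewer, human or AI, a bounded set of models to inspect instead of the whole project.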
Datafold is pushing the boundaries of automated code reviews with our AI-Code Reviews that make PR reviews faster, clearer, and more actionable. Specifically, they support two functions using AI:
- Surface the most critical insights about each PR, so you know what requires attention at a glance.
- Ask and answer questions in a convenient chat interface to dig deeper into changes or impacts.
Data warehouse performance and cost optimization
As businesses continue to accumulate more data, data warehouse costs are only going up, and scaling data in a cost-efficient way has never been more important. AI can, and eventually will, be used to help data engineers:
- Optimize query performance by suggesting more performant SQL for data transformations as they write and update code
- Improve data transformation run schedules
- and more
This is still an area being explored and developed by data teams and SaaS companies, and it will be interesting to keep an eye on as data team expenses (like those of all other teams) come under greater pressure.
Data migrations
For migrations, there is enormous opportunity for AI to accelerate, in a practical way, how data teams approach and execute migrations.
While AI is unlikely to remove all manual work relating to migrations (someone still needs to turn on the new database!), or to completely remove the need for outside consultants on large-scale migrations, we'll see AI and LLMs used to automate the most tedious parts of a migration: code translation and cross-database validation.
The Datafold Migration Agent (DMA) uses AI and LLMs to automatically convert code to the new SQL dialect or workflow of your choice, and fine-tune code until data parity is met between legacy and new systems. DMA ultimately uses AI and LLMs to support:
- Zero manual validation, at scale: DMA now automatically verifies every record across both legacy and new databases to ensure accuracy, without having data teams waste time creating cross-database comparisons for every migrated table.
- Lower migration risk: With DMA's AI fine-tuning itself until accuracy is met and parity is 100%, DMA not only accelerates migration timelines but also lowers the risk of inaccurate data in the new system.
- Faster time-to-production: With end-to-end value-level comparisons done automatically by DMA, data teams also have auditable comparisons between systems to earn stakeholder sign-off faster.
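To make "value-level comparison" concrete, here is a minimal sketch of what cross-database validation checks. It is not how DMA is implemented. It assumes two small in-memory SQLite databases standing in for the legacy and new systems, a hypothetical `orders` table with the same column order on both sides, and tables small enough to hold in memory.

```python
import sqlite3


def load(rows):
    """Create an in-memory database holding one hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def diff_tables(legacy, new, table, key):
    """Compare two copies of a table row by row, matching rows on the key column."""
    query = f"SELECT * FROM {table}"
    columns = [d[0] for d in legacy.execute(query).description]
    key_idx = columns.index(key)
    old_rows = {r[key_idx]: r for r in legacy.execute(query)}
    new_rows = {r[key_idx]: r for r in new.execute(query)}
    return {
        # Keys present in the legacy system but absent after migration.
        "missing_in_new": sorted(old_rows.keys() - new_rows.keys()),
        # Keys that only exist in the new system.
        "extra_in_new": sorted(new_rows.keys() - old_rows.keys()),
        # Keys present in both systems whose values differ in any column.
        "mismatched": sorted(
            k for k in old_rows.keys() & new_rows.keys() if old_rows[k] != new_rows[k]
        ),
    }


legacy = load([(1, 10.0, "paid"), (2, 25.5, "paid"), (3, 7.0, "refunded")])
new = load([(1, 10.0, "paid"), (2, 25.0, "paid"), (4, 3.0, "paid")])
report = diff_tables(legacy, new, "orders", "id")
print(report)
```

Parity is reached when all three lists are empty. At warehouse scale a real comparison has to avoid pulling every row into memory, which is part of why this step is tedious to build by hand for every migrated table.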
Benefits of AI in data engineering workflows
The main benefit of implementing AI strategically in data engineering is the considerable time and cost savings it brings to teams.
Increased efficiency with automation
The biggest gain from implementing AI in data engineering workflows undoubtedly comes from increased efficiency. By taking on time-consuming data engineering tasks, such as code reviews, code generation, and warehouse optimizations, AI allows data engineers to focus on higher-impact work that requires their specialized skill set.
Where writing a new SQL model used to take 2 hours, with AI it may take half of that time. Where reviewing pull requests used to take 5 hours, with AI it may take only 30 minutes. Data engineers will continue to gain back incredibly valuable time to focus on business-defining data work.
AI, with its ability to help data engineers write more performant SQL or craft more efficient data pipelines and optimize data warehouses for cost, can also allow companies to be more efficient with their data infrastructure costs.
Short and long-term gains in innovation
Data teams and engineers who strategically leverage AI in their workflows open the doors to short and long-term innovation. By saving data teams considerable time typically spent on manual tasks, AI is enabling data practitioners to focus on innovation: better dashboards, predictive ML models, improved data governance and documentation, and more.
Increased accessibility and scalability
With AI assistance in code generation and reviews becoming more of a reality every day, the barriers to contributing to data engineering work become lower. With lower barriers, data work can be accomplished more efficiently by people from different teams and with different skill sets, making AI a lever for both increasing accessibility to data work and enabling more scalable solutions.
Challenges of implementing AI in data engineering
All of these gains with AI sound really great, and one day (probably in the near-to-medium-term future), these use cases and benefits will be much more accessible and tangible for many data engineers. Today, there remain challenges in implementing AI in data engineering, namely around data security and privacy concerns, organizational maturity, and data readiness.
Data security and privacy
With how new AI is to workflows, and with AI companies like OpenAI (ChatGPT) and Anthropic (Claude) innovating at incredible rates, security and legal teams have reasonable concerns about the access these AI and LLM companies have to data. Maintaining data privacy and security in the AI era may look like ensuring AI chats are not used by the AI companies to train their models, or guaranteeing all training data is masked and encrypted.
Organizational readiness
It's hard to find a company today that is not on the "AI hype train" (and for very fair reasons). However, if the right infrastructure and practices are not in place to support AI in data engineering workflows, it may be challenging for AI to take off internally.
Data quality and AI-preparedness
Just as important as organizational readiness is data readiness. If organizations are training and adjusting their own AI models, it's never been more important for data quality to be high and data infrastructure to be organized. If data is not mature enough for AI, data engineers may not get the greatest possible benefit from implementing AI in their workflows.
A hidden challenge: It's also very important for organizations to outline where AI won't be impactful or useful. And while AI and LLMs will continue to get better with time, they are not perfect (please see the image below). AI today is best suited to automating manual, repetitive tasks for data engineers, such as code reviews or even data migrations, while leaving the more complex and specific work to people.

Conclusion
AI is reshaping the landscape of data engineering by automating repetitive manual tasks, enhancing efficiency, and driving innovation. While the benefits such as time savings, cost reductions, and increased accessibility are clear, organizations must also address challenges like data security, AI readiness, and maintaining high data quality.
As AI tools evolve, their role in data engineering will continue to mature, enabling teams to focus on more strategic and impactful work.
