The Slicer vs. The Kraken: What we built to migrate a 300K-character stored procedure
When a 300,000-character stored procedure broke the limits of legacy tools and even conventional LLMs, we built the Slicer—which analyzes procedural structure to identify logical boundaries, track variable scope, and preserve dependencies. Learn how Datafold’s Migration Agent delivers complex migrations faster and at a fraction of the cost.

Most teams underestimate the complexity hiding in their legacy logic. When companies begin a migration from systems like Synapse or SQL Server to modern platforms like Databricks and dbt, one consistent source of risk is the stored procedures that have quietly accumulated over decades.
Many customers have told us the same thing at the start of a migration project: “We have a few hundred stored procedures, but most of them are pretty straightforward.” That may be true for the majority, but it only takes one or two outliers to consume weeks of work and stall an entire migration.
We recently saw this during a Synapse-to-Snowflake and dbt migration for a large manufacturing customer. There, we ran into one of the largest stored procedures we’ve ever seen: a 300,000-character behemoth that we named the Kraken.
Why most AI tooling falls short
Translating stored procedures with LLMs isn’t as simple as feeding in SQL and getting a clean dbt model back. There are two core challenges that consistently trip up teams.
First, these procedures often encode years of business logic: undocumented decisions, branching logic, and legacy dependencies. Translating them means understanding how variables interact across conditionals, how control flow affects data outcomes, and how outputs are constructed. Legacy migration tools, like static compilers, aren’t equipped for this: they operate at the syntax level, not the logic level. And LLMs, while good at generating syntax, struggle with this kind of structural reasoning.

Second, LLMs have finite context windows: hard limits on how much text they can process in a single pass. GPT-4o, for example, caps out at 128,000 tokens. And even when a large piece of code technically fits within the context window, the results are not reliable. As input approaches the limit, the model’s attention becomes increasingly scattered: it forgets variable relationships, misapplies logic, or hallucinates code that looks plausible but fails in practice.
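To make the scale concrete, here is a rough back-of-envelope check. This is purely illustrative (the `estimate_tokens` helper and the roughly-4-characters-per-token heuristic are our assumptions, not part of any tool); exact counts require the model’s own tokenizer:

```python
# Rough heuristic: ~4 characters per token for English text and SQL.
# Real counts vary; use the model's tokenizer for precise numbers.
def estimate_tokens(code: str, chars_per_token: float = 4.0) -> int:
    return int(len(code) / chars_per_token)

def fits_in_context(code: str, context_limit: int = 128_000) -> bool:
    """Can this code plausibly fit in a 128k-token window (GPT-4o's cap)?"""
    return estimate_tokens(code) <= context_limit

# A 300,000-character procedure lands around 75,000 tokens -- technically
# inside a 128k window, but deep enough into it that output quality
# degrades long before the hard limit is reached.
print(estimate_tokens("x" * 300_000))  # → 75000
```

The point of the sketch is that “fits in the window” is the wrong bar: quality degrades well before the hard cap, which is why slicing matters even for procedures that nominally fit.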
The Kraken hit both problems at once. At over 300,000 characters, it was beyond the reach of any conventional tool. Solving this class of problem meant going beyond prompt engineering: we had to build our own system for slicing, translating, and validating complex procedural code at scale.
Building the Slicer to translate and validate logic at scale
To handle both the scale and structure of the Kraken, we added a new capability to the Datafold Migration Agent: the Slicer.
The Slicer is designed to handle the kinds of edge-case logic that generic AI tools and manual refactoring can’t reliably translate. Unlike naïve chunking methods, which split files by line count or token size, the Slicer analyzes procedural structure to identify logical boundaries, track variable scope, and preserve dependencies.
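The difference between naive chunking and structure-aware slicing can be sketched in a few lines. The toy slicer below is an illustration of the idea only, not the Slicer’s actual implementation: it splits a procedure at top-level statement boundaries, tracking BEGIN/END nesting so a control-flow block is never cut in half, then packs whole statements into size-bounded slices:

```python
import re

def slice_procedure(sql: str, max_chars: int = 4_000) -> list[str]:
    """Toy structure-aware slicer (illustrative only). Splits at top-level
    statement boundaries so BEGIN...END blocks stay intact, unlike naive
    fixed-size chunking, which can cut a block in half."""
    statements, depth, buf = [], 0, []
    for line in sql.splitlines(keepends=True):
        buf.append(line)
        depth += len(re.findall(r"\bBEGIN\b", line, re.I))
        depth -= len(re.findall(r"\bEND\b", line, re.I))
        # A statement ends only when we are back at nesting depth 0.
        if depth == 0 and line.strip().endswith(";"):
            statements.append("".join(buf))
            buf = []
    if buf:
        statements.append("".join(buf))

    # Pack whole statements into slices under the size budget.
    slices, cur = [], ""
    for stmt in statements:
        if cur and len(cur) + len(stmt) > max_chars:
            slices.append(cur)
            cur = ""
        cur += stmt
    if cur:
        slices.append(cur)
    return slices
```

A real slicer also has to track variable scope and data dependencies across slices, which this sketch deliberately omits; the structural point is that slice boundaries follow the code’s logic, not an arbitrary character count.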
Each resulting slice is small enough to translate independently, but complete enough to retain its functional intent. Once sliced, each piece goes through semantic translation and is validated with Datafold’s Data Diffs to confirm that the translated output matches the original system’s behavior.
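The validation step works by comparing the output of the translated logic against the original system’s output. A minimal key-based diff conveys the concept; this sketch is a drastic simplification of Datafold’s Data Diff, which performs this comparison at warehouse scale across systems:

```python
def diff_results(legacy_rows, migrated_rows, key_idx=0):
    """Minimal value-level diff (conceptual sketch, not Datafold's Data
    Diff): match rows by primary key and report keys that are missing,
    extra, or whose values disagree between the two systems."""
    legacy = {r[key_idx]: r for r in legacy_rows}
    migrated = {r[key_idx]: r for r in migrated_rows}
    return {
        "missing": sorted(legacy.keys() - migrated.keys()),   # in legacy only
        "extra": sorted(migrated.keys() - legacy.keys()),     # in migrated only
        "changed": sorted(k for k in legacy.keys() & migrated.keys()
                          if legacy[k] != migrated[k]),       # values differ
    }
```

An empty report on all three lists is the signal that a translated slice matches the original behavior; any non-empty list points directly at the rows, and therefore the logic, that diverged.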
In the Kraken project, the translated slices were reassembled into a modular dbt project that replicated the original logic across nearly 400 dbt models, enough to populate a medium-sized project on its own.
Mini-Krakens are more common than you think
Not every company will encounter a 300,000-character procedure, but nearly all will hit a few that behave the same way at a smaller scale: 30k, 50k, 100k characters. These mini-Krakens often fly under the radar during the assessment phase and emerge late in the migration, precisely when timelines are tight and capacity is stretched.
When that happens, engineers are usually forced to triage each case manually. That leads to inconsistent translations, creeping costs, and high risk of regression.
This is where the Slicer provides strategic leverage. It treats migrating legacy logic like a software problem, something that can be solved programmatically with repeatable steps, automated data validation, and full traceability.
Let us handle the heavy lifting
Most data teams don’t specialize in stored procedure migration, and they shouldn’t. It’s a one-time, high-risk problem that takes your best engineers away from higher-impact work. Every week spent on procedural rewrites is a week not spent building.
That’s why we built the Datafold Migration Agent to translate and validate complex legacy logic at scale. With intelligent slicing, semantic translation, and automated validation, our Agent has helped teams modernize some of the most complex data pipelines in the industry, with full fidelity and confidence.
So bring us your most complex migration: Synapse to Snowflake, SQL Server to Databricks, Informatica XML to dbt—anything with deeply embedded logic that’s hard to untangle. Book a call with our team of data engineering experts here. We’d love a chance to meet (and tame) your Kraken.