
SSIS to Snowflake migration: challenges, patterns, and AI automation

How to migrate SSIS packages to Snowflake: DTSX parsing, Script Task replacement, expression conversion, and the challenges teams actually hit.

Datafold Solutions Engineering

An SSIS to Snowflake migration is not a database migration. It’s a platform migration. SQL Server Integration Services has been the default ETL tool for Microsoft shops since 2005, and over two decades, teams have accumulated hundreds (sometimes thousands) of DTSX packages. These packages embed business logic in XML, C# Script Tasks, SSIS-specific expression syntax, and Windows-only execution dependencies. Moving to Snowflake means leaving the entire Windows ETL ecosystem behind.

The scope is real, but so is the pattern. The majority of SSIS Data Flow transformations (Derived Columns, Lookups, Conditional Splits, Aggregates) map to standard SQL constructs that AI-powered migration tools convert automatically. Script Tasks are the hard part. Everything else follows well-established conversion patterns.

The reason teams migrate now usually comes down to three things: SQL Server licensing costs that keep climbing, the operational burden of maintaining Windows servers just for SSIS, and the broader shift from ETL to ELT. Snowflake’s architecture eliminates the need for a separate ETL runtime by pushing transformations into the warehouse itself. You can get a migration assessment here.

Architecture comparison: SSIS vs Snowflake

SSIS operates as a push-based ETL engine. A DTSX package contains two layers: the Control Flow, which defines the execution sequence of tasks, and the Data Flow, which moves and transforms rows through an in-memory pipeline. Packages run on a Windows server via the Integration Services runtime, and SQL Server Agent handles scheduling. Connection Managers define how SSIS connects to sources and destinations using OLEDB, ADO.NET, ODBC, or flat file providers.

This architecture means SSIS does the heavy lifting outside the database. Data gets extracted, transformed in memory on the SSIS server, and then loaded into the target. Every row passes through the SSIS runtime.

Snowflake flips this model. Data lands in Snowflake first (via COPY INTO, Snowpipe, or third-party connectors), and transformations happen inside the warehouse using SQL. Snowflake’s native capabilities handle the full transformation and orchestration layer: views and stored procedures for logic, Snowflake Tasks and Streams for event-driven scheduling, and Snowflake Scripting for procedural control flow. Many teams also choose to use dbt for version-controlled SQL transformations, and Snowflake now offers a native dbt Core integration in Snowsight, so you don’t need to maintain separate dbt Core infrastructure. But dbt is optional, not required. There is no separate runtime server to maintain.

| SSIS concept | SSIS implementation | Snowflake equivalent |
| --- | --- | --- |
| Package/pipeline definition | DTSX XML file | SQL scripts + Task DAGs (or dbt models) |
| Transformation engine | In-memory buffers on Windows server | Snowflake virtual warehouse (SQL compute) |
| Scheduling | SQL Server Agent | Snowflake Tasks (CRON) or Airflow |
| Change data capture | Custom logic or CDC components | Snowflake Streams |
| Data ingestion | Data Flow with OLEDB/ADO.NET sources | COPY INTO, Snowpipe, Fivetran/Airbyte |
| Reusable logic | Script Components, custom Data Flow components | Snowflake UDF (or dbt macro) |
| Error handling | Precedence constraints + event handlers | Task error notifications + stored procedure exception handling |
| Authentication | Windows credential store / SSPI | Key-pair auth, OAuth, or username/password |
| Connectors | OLEDB, ADO.NET, ODBC, flat file providers | Snowflake stages, Fivetran, Airbyte, native connectors |

The architectural shift from push-based ETL to pull-based ELT is the single biggest mental model change. You stop thinking about “how do I move and transform this data” and start thinking about “the data is already here, now how do I query it.”

Migration challenges and how to solve them

DTSX packages are XML, not SQL

SSIS stores everything in the DTSX XML format. A single package file contains Control Flow tasks, Data Flow components, connection strings, variable definitions, expressions, and layout metadata all mixed together. You cannot just extract the SQL from a DTSX file because much of the logic is defined as XML attributes and component configurations, not as SQL statements.

A Derived Column transformation, for example, lives in the XML as a component with input and output columns, each with an SSIS expression attached as a property. There is no SQL equivalent sitting in the file waiting to be copied. The logic must be reconstructed from the XML structure. Migration tools like Datafold’s Migration Agent parse this XML automatically, extracting the transformation graph and generating equivalent SQL.
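To make the parsing step concrete, here is a minimal Python sketch in the spirit of what such tools do. The XML below is a heavily simplified, invented stand-in for a DTSX fragment (real packages wrap everything in the namespaced DTS schema with far more nesting), but the core pattern holds: Derived Column logic lives in Expression properties, not in SQL.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a DTSX fragment. Real packages use namespaced
# <DTS:Executable> elements, but the shape is the same: the transformation
# logic is stored as an "Expression" property on each output column.
DTSX_SNIPPET = """
<component name="Derived Column" componentClassID="Microsoft.DerivedColumn">
  <outputs>
    <output name="Derived Column Output">
      <outputColumns>
        <outputColumn name="full_name">
          <properties>
            <property name="Expression">[first_name] + " " + [last_name]</property>
          </properties>
        </outputColumn>
        <outputColumn name="order_year">
          <properties>
            <property name="Expression">(DT_WSTR, 4)YEAR([order_date])</property>
          </properties>
        </outputColumn>
      </outputColumns>
    </output>
  </outputs>
</component>
"""

def extract_derived_columns(xml_text):
    """Walk the component XML and collect (column name, SSIS expression) pairs."""
    root = ET.fromstring(xml_text)
    pairs = []
    for col in root.iter("outputColumn"):
        for prop in col.iter("property"):
            if prop.get("name") == "Expression":
                pairs.append((col.get("name"), prop.text))
    return pairs

for name, expr in extract_derived_columns(DTSX_SNIPPET):
    print(f"{name}: {expr}")
```

Each extracted expression then still needs to be translated into Snowflake SQL, which is a separate step.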

Script Tasks depend on .NET and the Windows runtime

SSIS Script Tasks and Script Components let developers write C# or VB.NET code with full access to the .NET framework. Teams use them for everything: calling REST APIs, parsing custom file formats, sending emails, doing regex transformations, and accessing Windows COM objects. This code runs on the Windows server where SSIS is installed.

None of this translates directly to Snowflake. You have three replacement options depending on complexity: Snowflake JavaScript stored procedures for lighter logic, Snowpark Python procedures for heavier data processing, or external functions that call an AWS Lambda or Azure Function for things like API calls and file processing. Script Tasks are the highest-effort items in any SSIS to Snowflake migration, but in our experience they typically account for less than 15% of total package logic. The rest converts through standard patterns.

SSIS expressions use a proprietary syntax

SSIS has its own expression language that looks nothing like SQL. It uses C-style type casts like (DT_WSTR, 10), references variables with @[User::VariableName], and has its own date functions (DATEADD, DATEDIFF, and DATEPART), which use the same argument order as T-SQL but require the datepart as a quoted string literal, e.g. DATEADD("day", 7, GETDATE()). These expressions appear everywhere: Derived Columns, variable assignments, precedence constraint conditions, and connection string parameterization.

Every one of these needs to be translated to Snowflake SQL or to a variable binding in your orchestration layer. The translation is not difficult for individual expressions, but the volume is what kills you. A package with 40 Derived Columns means 40 individual expression translations. Migration tools handle this pattern-matching work automatically for the standard expression functions.
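As an illustration of what that pattern-matching looks like, here is a minimal Python sketch that rewrites a few common SSIS constructs into Snowflake SQL. The rules, and the session-variable binding convention for @[User::…] references, are illustrative assumptions, not a complete translator; a production tool needs a real expression parser to handle nested casts and operator precedence.

```python
import re

# Illustrative rewrite rules, not an exhaustive translator.
RULES = [
    # SSIS variable reference -> Snowflake session variable (one possible
    # binding convention; your orchestration layer may use another)
    (re.compile(r"@\[User::(\w+)\]"), lambda m: "$" + m.group(1).upper()),
    # (DT_WSTR, n) prefix cast on an identifier or a simple function call
    (re.compile(r"\(DT_WSTR,\s*(\d+)\)\s*(\w+\((?:[^()]|\([^()]*\))*\)|\w+)"),
     lambda m: f"CAST({m.group(2)} AS VARCHAR({m.group(1)}))"),
    # function names that differ between T-SQL/SSIS and Snowflake
    (re.compile(r"\bGETDATE\(\)"), lambda m: "CURRENT_TIMESTAMP()"),
]

def translate_expression(ssis_expr):
    """Apply each rewrite rule repeatedly until the expression stabilizes."""
    out, changed = ssis_expr, True
    while changed:
        changed = False
        for pattern, repl in RULES:
            new = pattern.sub(repl, out)
            if new != out:
                out, changed = new, True
    return out

print(translate_expression('(DT_WSTR, 4)YEAR(GETDATE())'))
print(translate_expression('@[User::BatchId]'))
```

Multiplied across hundreds of Derived Columns, this is exactly the mechanical work that migration tooling takes off your plate.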

Windows authentication has no cloud equivalent

SSIS relies on Windows Integrated Security (SSPI) for connecting to SQL Server and other Microsoft services. Packages often use the identity of the SQL Server Agent service account to authenticate, and the Windows credential store holds encrypted connection strings.

Snowflake uses key-pair authentication, OAuth, or username/password. You need to replace every connection manager in every package with a Snowflake-compatible auth mechanism. If your SSIS packages also connect to file shares via UNC paths, those need to move to cloud storage (S3, Azure Blob, GCS) with corresponding Snowflake external stages. This is straightforward but time-consuming to inventory.

Precedence constraints encode complex branching logic

SSIS Control Flow uses precedence constraints to define execution order and branching: on-success, on-failure, and on-completion. These constraints can also include expressions, creating conditional execution paths like “run Task B only if Task A succeeds AND variable X > 100.”

Snowflake Task DAGs support predecessor relationships and runtime conditional branching (via DAGTaskBranch), but they don’t natively support on-failure path routing to a different execution branch. For complex branching, you need to either encode the logic inside stored procedures (using Snowflake Scripting IF/ELSE blocks) or use an external orchestrator like Airflow, which has first-class support for branching and failure handling.

Connection managers cover a wide ecosystem

SSIS packages don’t just connect to SQL Server. A typical enterprise has packages using OLEDB connections to Oracle and DB2, flat file connections for CSV processing, FTP tasks for file transfer, SMTP for email notifications, and ADO.NET connections to various .NET-accessible data sources. Each connection manager type needs a replacement strategy.

Inventory every source system your SSIS environment connects to and find a replacement connector for each one. The connector ecosystem for Snowflake is mature. Between Fivetran, Airbyte, Snowflake’s native connectors, and external stages for file-based loads, coverage gaps are rare. If you’re coming from an environment with heavy FTP/flat-file processing, Snowpipe with auto-ingest on cloud storage replaces the file-watcher pattern that SSIS teams often build with ForEach Loop Containers.

Code translation examples

SSIS expression to Snowflake SQL

SSIS expressions use a proprietary syntax for type casting and string formatting. Here is a common date-formatting pattern:

SSIS Derived Column expression:

(DT_WSTR, 4)YEAR(GETDATE()) + "-" +
RIGHT("0" + (DT_WSTR, 2)MONTH(GETDATE()), 2) + "-" +
RIGHT("0" + (DT_WSTR, 2)DAY(GETDATE()), 2)

Snowflake SQL:

TO_VARCHAR(CURRENT_DATE(), 'YYYY-MM-DD')

The SSIS version is three lines of manual string concatenation with explicit type casts. Snowflake handles it with a single function call and a format string. This pattern (verbose SSIS expression collapsing into a single Snowflake function) repeats across most expression translations.

SSIS Lookup transformation to Snowflake JOIN

SSIS Lookups cache reference data in memory and match rows as they flow through the pipeline. In full cache mode, SSIS loads the entire reference table into memory before processing begins.

SSIS Lookup (conceptual):

Source: orders table
Lookup: customers table (full cache)
Match columns: orders.customer_id = customers.customer_id
Output: customers.customer_name, customers.region
No Match: redirect to error output

Snowflake SQL:

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.region,
    CASE WHEN c.customer_id IS NULL THEN 'UNMATCHED' END AS lookup_status
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;

One thing to watch: SSIS Lookups with partial cache or no-cache mode perform row-by-row lookups against the database. The SQL JOIN equivalent is typically faster because Snowflake processes the entire join at once. But as Snowflake’s SnowConvert documentation notes, SSIS Lookups return the first matching row based on read order, while SQL JOINs are non-deterministic when multiple rows match. If your lookup source has duplicates, you may need to add a ROW_NUMBER() window function to replicate the original behavior.
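The semantic difference is easy to see with toy data. This Python sketch (plain dicts standing in for rows; the table and column names are invented) mimics both behaviors:

```python
# SSIS Lookup vs SQL JOIN semantics when the reference data has duplicates.
orders = [{"order_id": 1, "customer_id": "C1"}]
customers = [  # duplicate key: two reference rows for C1
    {"customer_id": "C1", "region": "EMEA"},
    {"customer_id": "C1", "region": "APAC"},
]

def ssis_lookup(rows, reference, key):
    """First matching row wins, like an SSIS full-cache Lookup."""
    cache = {}
    for ref in reference:          # read order decides the winner
        cache.setdefault(ref[key], ref)
    return [{**row, **cache[row[key]]} for row in rows if row[key] in cache]

def sql_join(rows, reference, key):
    """Every matching row is emitted, like a SQL JOIN."""
    return [{**row, **ref} for row in rows for ref in reference
            if row[key] == ref[key]]

print(len(ssis_lookup(orders, customers, "customer_id")))  # one row out
print(len(sql_join(orders, customers, "customer_id")))     # two rows out
```

Deduplicating the reference side with ROW_NUMBER() (ordered to match the original read order, if you can reconstruct it) restores the one-row-out behavior in SQL.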

SSIS Conditional Split to Snowflake CASE

SSIS Conditional Split routes rows to different outputs based on expressions. This is often used to separate good data from bad data, or to route rows to different destination tables.

SSIS Conditional Split conditions:

Output 1 "HighValue":    [amount] > 10000
Output 2 "International": [country] != "US"
Default:                  (everything else)

Snowflake SQL (single query with CASE):

SELECT *,
    CASE
        WHEN amount > 10000 THEN 'HighValue'
        WHEN country != 'US' THEN 'International'
        ELSE 'Default'
    END AS route_category
FROM source_data;

If the original SSIS package routes each output to a different destination table, you can use INSERT INTO … SELECT with WHERE clauses, or create separate views for each output path. With dbt, splitting this into separate models (one per output) is a common approach.

SSIS Slowly Changing Dimension to Snowflake MERGE

SSIS includes a Slowly Changing Dimension (SCD) wizard that generates a Data Flow for Type 1 (overwrite) and Type 2 (history tracking) updates. The generated Data Flow is notoriously complex, often containing 10+ components chained together.

Snowflake MERGE + INSERT (Type 2 SCD):

-- Step 1: Expire changed records
MERGE INTO dim_customer AS target
USING staging_customer AS source
ON target.customer_id = source.customer_id
   AND target.is_current = TRUE
WHEN MATCHED AND (
    target.customer_name != source.customer_name
    OR target.address != source.address
) THEN UPDATE SET
    is_current = FALSE,
    effective_end_date = CURRENT_TIMESTAMP();

-- Step 2: Insert new current rows for changed and net-new records
INSERT INTO dim_customer (
    customer_id, customer_name, address,
    is_current, effective_start_date, effective_end_date
)
SELECT
    source.customer_id, source.customer_name, source.address,
    TRUE, CURRENT_TIMESTAMP(), NULL
FROM staging_customer AS source
LEFT JOIN dim_customer AS target
    ON target.customer_id = source.customer_id
   AND target.is_current = TRUE
WHERE target.customer_id IS NULL
   OR target.customer_name != source.customer_name
   OR target.address != source.address;

A MERGE cannot simultaneously UPDATE a matched row and INSERT a new row for the same source record, so SCD Type 2 requires two statements. These replace what was a 10-component Data Flow in SSIS. One caveat: != never matches when either side is NULL, so if the compared columns are nullable, use IS DISTINCT FROM to avoid silently missing changes. If you are using dbt, snapshots handle this two-step logic internally with a single configuration block.

SSIS Script Task (C#) to Snowflake stored procedure

A common SSIS Script Task pattern is calling a REST API and inserting the result into a table:

SSIS Script Task (C#):

public void Main()
{
    string url = Dts.Variables["User::ApiUrl"].Value.ToString();
    using (var client = new System.Net.WebClient())
    {
        string json = client.DownloadString(url);
        // Parse JSON, insert rows into destination
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}

Snowflake external function + procedure:

-- Create external function pointing to API gateway
CREATE EXTERNAL FUNCTION fetch_api_data(url VARCHAR)
    RETURNS VARIANT
    API_INTEGRATION = my_api_integration
    AS 'https://my-gateway.execute-api.us-east-1.amazonaws.com/prod/fetch';

-- Call it in a stored procedure
CREATE OR REPLACE PROCEDURE load_api_data(api_url VARCHAR)
    RETURNS VARCHAR
    LANGUAGE SQL
AS
BEGIN
    INSERT INTO api_results (payload, loaded_at)
    SELECT fetch_api_data(:api_url), CURRENT_TIMESTAMP();
    RETURN 'Success';
END;

Snowpark Python procedures can also make HTTP calls using the requests library, but require an EXTERNAL ACCESS INTEGRATION with an allowlisted NETWORK RULE. The setup effort is comparable to external functions; choose Snowpark when you need richer response parsing or conditional logic around the API call.

Feature mapping reference

This table maps SSIS concepts to their Snowflake equivalents. “Direct translation” means the mapping is mechanical. “Redesign required” means you need to rethink the approach.

| SSIS feature | Snowflake equivalent | Translation difficulty |
| --- | --- | --- |
| Derived Column transformation | SQL expressions in SELECT | Direct translation |
| Lookup transformation (full cache) | SQL LEFT JOIN + COALESCE | Direct translation |
| Lookup transformation (partial/no cache) | SQL LEFT JOIN with ROW_NUMBER() for dedup | Moderate redesign |
| Conditional Split | SQL CASE expression or WHERE filters | Direct translation |
| Aggregate transformation | SQL GROUP BY + aggregate functions | Direct translation |
| Sort transformation | SQL ORDER BY (usually unnecessary) | Drop it; Snowflake sorts as needed |
| Merge Join | SQL JOIN (INNER, LEFT, FULL) | Direct translation |
| Union All | SQL UNION ALL | Direct translation |
| SCD wizard (Type 2) | Snowflake MERGE (or dbt snapshots) | Moderate redesign |
| Script Task (C#/.NET) | Snowpark Python procedure or external function | Redesign required |
| Script Component (data transform) | Snowflake UDF or Snowpark Python UDF | Redesign required |
| DTSX Package (orchestration) | Snowflake Task DAG or Airflow DAG | Moderate redesign |
| SQL Server Agent Job | Snowflake Task with CRON schedule | Direct translation |
| ForEach Loop Container | Snowflake Scripting LOOP or Airflow dynamic tasks | Moderate redesign |
| SSIS variables / parameters | Session variables or stored procedure parameters (or dbt vars) | Direct translation |
| Event handlers (OnError, OnWarning) | Alert integration + error logging tables | Moderate redesign |
| Connection Managers | Snowflake stages, storage integrations, secrets | Redesign required |

Script Tasks and Script Components require the most manual effort. Everything else follows predictable conversion patterns.

Data type mapping: SSIS to Snowflake

SSIS uses its own internal data types (prefixed with DT_) for the Data Flow pipeline. These map to Snowflake types as follows.

| SSIS type | Snowflake type | Notes |
| --- | --- | --- |
| DT_BOOL | BOOLEAN | |
| DT_I1 | TINYINT | Signed byte (-128 to 127). |
| DT_I2 | SMALLINT | |
| DT_I4 | INT | 32-bit signed integer. Snowflake INT is NUMBER(38,0) under the hood. |
| DT_I8 | BIGINT | 64-bit signed integer. |
| DT_UI1 | SMALLINT | Unsigned byte (0-255); widen to avoid overflow. |
| DT_UI2 | NUMBER(10,0) | Unsigned 16-bit (0-65535); widen to INTEGER. |
| DT_UI4 | NUMBER(19,0) | Unsigned 32-bit; widen to BIGINT. |
| DT_R4 | FLOAT | Single-precision float. |
| DT_R8 | DOUBLE | Double-precision float. |
| DT_DECIMAL | NUMBER(p,s) | |
| DT_NUMERIC | NUMBER(p,s) | |
| DT_CY | NUMBER(19,4) | SSIS currency type; 8-byte scaled integer. |
| DT_STR | VARCHAR | ANSI string. |
| DT_WSTR | VARCHAR | Unicode string. Snowflake VARCHAR is always Unicode. |
| DT_TEXT | VARCHAR | ANSI text stream. |
| DT_NTEXT | VARCHAR | Unicode text stream. |
| DT_BYTES | BINARY | |
| DT_IMAGE | BINARY | Binary large object. |
| DT_GUID | VARCHAR(36) | Snowflake has no native UUID. Store as VARCHAR(36) and generate with UUID_STRING(). |
| DT_DBDATE | DATE | |
| DT_DBTIME2 | TIME | |
| DT_DBTIMESTAMP | TIMESTAMP_NTZ | Date and time without timezone. |
| DT_DBTIMESTAMP2 | TIMESTAMP_NTZ | High-precision date and time. |
| DT_DBTIMESTAMPOFFSET | TIMESTAMP_TZ | Date and time with UTC offset. |
| DT_FILETIME | TIMESTAMP_NTZ | Windows FILETIME (100ns intervals since 1601); convert to TIMESTAMP. |
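The DT_FILETIME conversion is mechanical once you know the epoch. A minimal Python sketch, assuming the input is the raw 64-bit tick count:

```python
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)  # FILETIME counts from 1601, not 1970

def filetime_to_timestamp(filetime):
    """Convert a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC)
    to a naive UTC datetime, suitable for loading into TIMESTAMP_NTZ."""
    # // 10 drops the sub-microsecond part that datetime cannot represent
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

# Round-trip check against a known instant
ticks = int((datetime(2024, 6, 1, 12, 30) - FILETIME_EPOCH).total_seconds()) * 10**7
print(filetime_to_timestamp(ticks))  # 2024-06-01 12:30:00
```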

The type that causes the most silent issues is DT_DBTIMESTAMP. SQL Server DATETIME (which DT_DBTIMESTAMP maps from) has a precision of 3.33 milliseconds and rounds to increments of .000, .003, or .007 seconds. DATETIME2 and Snowflake TIMESTAMP_NTZ do not round this way. If your SSIS packages rely on DATETIME rounding behavior for deduplication or comparisons, you will get different results after migration.
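The rounding rule can be reproduced in a few lines, which is useful when you need to explain a diff mismatch. A sketch (SQL Server stores DATETIME time as 1/300-second ticks; Python's round() happens to agree with SQL Server's tick rounding on these sample values):

```python
def sqlserver_datetime_ms(ms):
    """Round a millisecond value the way SQL Server DATETIME does:
    time is stored in 1/300-second ticks, so milliseconds land on
    .000 / .003 / .007 boundaries."""
    ticks = round(ms * 300 / 1000)    # nearest 1/300-second tick
    return round(ticks * 1000 / 300)  # back to displayed milliseconds

for ms in (1, 2, 5, 8, 9):
    print(ms, "->", sqlserver_datetime_ms(ms))
```

A timestamp of .005 seconds in SQL Server DATETIME surfaces as .007, while Snowflake TIMESTAMP_NTZ preserves .005, which is exactly the kind of discrepancy value-level validation flags.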

Best practices for SSIS to Snowflake migration

Inventory your packages and find the dead ones. Run a query against SSISDB.catalog.packages (for the project deployment model used in SSIS 2012+) or msdb.dbo.sysssispackages (for legacy package deployment). Cross-reference with SSISDB.catalog.executions or SQL Server Agent job history to see which packages have actually executed in the last 6 months. In most organizations we work with, 30-50% of SSIS packages are either disabled, broken, or running against tables that no longer matter. Don’t migrate dead code.
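Once you export the execution history, the cross-referencing is scriptable. A minimal Python sketch with invented package names, assuming you have pulled (package, last successful run) pairs out of SSISDB:

```python
from datetime import datetime, timedelta

# Hypothetical inventory rows, as exported from SSISDB.catalog.executions:
# (package_name, last successful run, or None if never executed).
inventory = [
    ("load_orders.dtsx",    datetime(2024, 11, 2)),
    ("load_legacy_gl.dtsx", datetime(2021, 3, 15)),
    ("scd_customer.dtsx",   None),
]

def dead_packages(rows, as_of, max_age_days=180):
    """Flag packages with no successful run in the last max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [name for name, last_run in rows
            if last_run is None or last_run < cutoff]

print(dead_packages(inventory, as_of=datetime(2025, 1, 10)))
```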

Separate data movement from transformation. SSIS bundles extraction, transformation, and loading into a single Data Flow. In Snowflake, split these apart. Use Fivetran, Airbyte, or Snowpipe for data ingestion. Write transformations as SQL views, stored procedures, or dbt models. Snowflake’s native features (views, stored procedures, Tasks, Streams) can handle the full workload. dbt is a popular addition for teams that want version-controlled, testable SQL, and Snowflake offers native dbt Core hosting.

Replace Script Tasks early and test thoroughly. Script Tasks are the highest-effort items because they require rewriting C#/VB.NET in Python or JavaScript. Identify all Script Tasks in your inventory, categorize them by what they do (API calls, file processing, custom transformations, email notifications), and pick a replacement strategy for each category. Don’t leave these until the end of the project.

Validate with value-level data diff, not row counts. Row counts match when you have the right number of wrong rows. After migrating each package, run a column-level comparison between the SQL Server output and the Snowflake output. This catches type conversion issues, rounding differences, NULL handling discrepancies, and off-by-one errors in date logic that row counts will never surface. Datafold’s cross-database data diff was built for exactly this step.
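A toy Python version of the idea, with invented rows, shows why row counts are not enough; production tools stream ordered result sets from both databases rather than loading dicts in memory:

```python
# Equal row counts, unequal data: a miniature value-level diff.
legacy = [
    {"id": 1, "amount": 10.007},   # rounding artifacts live in values
    {"id": 2, "amount": 25.000},
]
migrated = [
    {"id": 1, "amount": 10.010},   # same row count, different value
    {"id": 2, "amount": 25.000},
]

def diff_values(a_rows, b_rows, key="id"):
    """Compare column values row-by-row; return (key, column, a, b) mismatches."""
    b_by_key = {row[key]: row for row in b_rows}
    mismatches = []
    for row in a_rows:
        other = b_by_key.get(row[key], {})
        for col, val in row.items():
            if col != key and other.get(col) != val:
                mismatches.append((row[key], col, val, other.get(col)))
    return mismatches

assert len(legacy) == len(migrated)   # the row-count check "passes"
print(diff_values(legacy, migrated))  # the value-level diff does not
```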

How Datafold automates SSIS to Snowflake migrations

The challenges above have well-established solutions, and Datafold’s Migration Agent automates the majority of the work. The agent parses DTSX packages and extracts the embedded logic: Data Flow transformations, SSIS expressions, variable references, and connection configurations. It generates Snowflake SQL (and optionally dbt models) that reproduce the same data output as the original packages. Standard patterns like Derived Columns, Lookups, Conditional Splits, and Aggregates convert automatically. In our experience, 90%+ of mappings that follow standard patterns are converted without manual intervention. Engineers focus their time on Script Tasks and complex branching logic.

Validation is where the biggest time savings happen. Datafold runs automated cross-database data diffs that compare every column of every table between SQL Server (the SSIS output) and Snowflake (the new pipeline output). When Healthy Directions migrated from SQL Server to Snowflake, they used Datafold’s data diffs to validate hundreds of objects, catching discrepancies that manual testing would have missed. Ingersoll Rand completed a similar migration (400+ stored procedures, Azure Synapse to Snowflake + dbt) in 3 months, 4x faster than the next best alternative.

SSIS migrations tend to run alongside SQL Server to Snowflake migrations because the two systems are deeply intertwined. Datafold handles both the database objects and the ETL logic in a single project, with fixed pricing and a guaranteed timeline.

Get a migration assessment to scope your SSIS package inventory.
