SSIS to Snowflake migration: challenges, patterns, and AI automation
How to migrate SSIS packages to Snowflake: DTSX parsing, Script Task replacement, expression conversion, and the challenges teams actually hit.
An SSIS to Snowflake migration is not a database migration. It’s a platform migration. SQL Server Integration Services has been the default ETL tool for Microsoft shops since 2005, and over two decades, teams have accumulated hundreds (sometimes thousands) of DTSX packages. These packages embed business logic in XML, C# Script Tasks, SSIS-specific expression syntax, and Windows-only execution dependencies. Moving to Snowflake means leaving the entire Windows ETL ecosystem behind.
The scope is real, but so is the pattern. The majority of SSIS Data Flow transformations (Derived Columns, Lookups, Conditional Splits, Aggregates) map to standard SQL constructs that AI-powered migration tools convert automatically. Script Tasks are the hard part. Everything else follows well-established conversion patterns.
The reason teams migrate now usually comes down to three things: SQL Server licensing costs that keep climbing, the operational burden of maintaining Windows servers just for SSIS, and the broader shift from ETL to ELT. Snowflake’s architecture eliminates the need for a separate ETL runtime by pushing transformations into the warehouse itself. You can get a migration assessment here.
Architecture comparison: SSIS vs Snowflake
SSIS operates as a push-based ETL engine. A DTSX package contains two layers: the Control Flow, which defines the execution sequence of tasks, and the Data Flow, which moves and transforms rows through an in-memory pipeline. Packages run on a Windows server via the Integration Services runtime, and SQL Server Agent handles scheduling. Connection Managers define how SSIS connects to sources and destinations using OLEDB, ADO.NET, ODBC, or flat file providers.
This architecture means SSIS does the heavy lifting outside the database. Data gets extracted, transformed in memory on the SSIS server, and then loaded into the target. Every row passes through the SSIS runtime.
Snowflake flips this model. Data lands in Snowflake first (via COPY INTO, Snowpipe, or third-party connectors), and transformations happen inside the warehouse using SQL. Snowflake’s native capabilities handle the full transformation and orchestration layer: views and stored procedures for logic, Snowflake Tasks and Streams for event-driven scheduling, and Snowflake Scripting for procedural control flow. Many teams also choose to use dbt for version-controlled SQL transformations, and Snowflake now offers a native dbt Core integration in Snowsight, so you don’t need to maintain separate dbt Core infrastructure. But dbt is optional, not required. There is no separate runtime server to maintain.
| SSIS concept | SSIS implementation | Snowflake equivalent |
|---|---|---|
| Package/pipeline definition | DTSX XML file | SQL scripts + Task DAGs (or dbt models) |
| Transformation engine | In-memory buffers on Windows server | Snowflake virtual warehouse (SQL compute) |
| Scheduling | SQL Server Agent | Snowflake Tasks (CRON) or Airflow |
| Change data capture | Custom logic or CDC components | Snowflake Streams |
| Data ingestion | Data Flow with OLEDB/ADO.NET sources | COPY INTO, Snowpipe, Fivetran/Airbyte |
| Reusable logic | Script Components, custom Data Flow components | Snowflake UDF (or dbt macro) |
| Error handling | Precedence constraints + event handlers | Task error notifications + stored procedure exception handling |
| Authentication | Windows credential store / SSPI | Key-pair auth, OAuth, or username/password |
| Connectors | OLEDB, ADO.NET, ODBC, flat file providers | Snowflake stages, Fivetran, Airbyte, native connectors |
The architectural shift from push-based ETL to pull-based ELT is the single biggest mental model change. You stop thinking about “how do I move and transform this data” and start thinking about “the data is already here, now how do I query it.”
Migration challenges and how to solve them
DTSX packages are XML, not SQL
SSIS stores everything in the DTSX XML format. A single package file contains Control Flow tasks, Data Flow components, connection strings, variable definitions, expressions, and layout metadata all mixed together. You cannot just extract the SQL from a DTSX file because much of the logic is defined as XML attributes and component configurations, not as SQL statements.
A Derived Column transformation, for example, lives in the XML as a component with input and output columns, each with an SSIS expression attached as a property. There is no SQL equivalent sitting in the file waiting to be copied. The logic must be reconstructed from the XML structure. Migration tools like Datafold’s Migration Agent parse this XML automatically, extracting the transformation graph and generating equivalent SQL.
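As a rough illustration of what such tooling does, here is a minimal Python sketch that pulls Derived Column expressions out of DTSX-like XML. The element names below are simplified for readability; real DTSX files use the DTS namespace (www.microsoft.com/SqlServer/Dts) and nest components several levels deep under Executable and pipeline elements.

```python
# Minimal sketch: extracting Derived Column expressions from DTSX-style XML.
# Element names are simplified; real packages use the DTS namespace and
# deeper nesting, but the idea is the same: the logic lives in attributes
# and property elements, not in SQL statements.
import xml.etree.ElementTree as ET

SAMPLE_DTSX = """
<pipeline>
  <component name="Derive Full Name" componentClassID="Microsoft.DerivedColumn">
    <outputColumn name="full_name">
      <property name="Expression">[first_name] + " " + [last_name]</property>
    </outputColumn>
  </component>
</pipeline>
"""

def extract_derived_columns(dtsx_xml: str) -> list[dict]:
    """Return one record per Derived Column output: component, column, expression."""
    root = ET.fromstring(dtsx_xml)
    results = []
    for comp in root.iter("component"):
        if "DerivedColumn" not in comp.get("componentClassID", ""):
            continue
        for col in comp.iter("outputColumn"):
            for prop in col.iter("property"):
                if prop.get("name") == "Expression":
                    results.append({
                        "component": comp.get("name"),
                        "column": col.get("name"),
                        "expression": prop.text,
                    })
    return results

cols = extract_derived_columns(SAMPLE_DTSX)
```

Each extracted expression then still needs to be translated from SSIS expression syntax into Snowflake SQL, which is the subject of the next section.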
Script Tasks depend on .NET and the Windows runtime
SSIS Script Tasks and Script Components let developers write C# or VB.NET code with full access to the .NET framework. Teams use them for everything: calling REST APIs, parsing custom file formats, sending emails, doing regex transformations, and accessing Windows COM objects. This code runs on the Windows server where SSIS is installed.
None of this translates directly to Snowflake. You have three replacement options depending on complexity: Snowflake JavaScript stored procedures for lighter logic, Snowpark Python procedures for heavier data processing, or external functions that call an AWS Lambda or Azure Function for things like API calls and file processing. Script Tasks are the highest-effort items in any SSIS to Snowflake migration, but in our experience they typically account for less than 15% of total package logic. The rest converts through standard patterns.
SSIS expressions use a proprietary syntax
SSIS has its own expression language that looks nothing like SQL. It uses C-style type casts like (DT_WSTR, 10), references variables as @[User::VariableName], and has its own date functions (DATEADD, DATEDIFF, DATEPART), which take the same argument order as T-SQL but require the datepart as a quoted string literal, e.g. DATEADD("day", 7, GETDATE()). These expressions appear everywhere: Derived Columns, variable assignments, precedence constraint conditions, and connection string parameterization.
Every one of these needs to be translated to Snowflake SQL or to a variable binding in your orchestration layer. The translation is not difficult for individual expressions, but the volume is what kills you. A package with 40 Derived Columns means 40 individual expression translations. Migration tools handle this pattern-matching work automatically for the standard expression functions.
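A toy sketch of that pattern-matching, covering only a handful of common constructs. Real migration tools parse the full expression grammar rather than using regex substitution, and the bind-parameter naming for variables here is our own convention, not a standard:

```python
# Deliberately small sketch of SSIS-expression-to-Snowflake translation.
# Regex substitution only handles simple, non-nested patterns; it is shown
# here to illustrate the shape of the work, not as production tooling.
import re

RULES = [
    # (DT_WSTR, 10)[col]  ->  CAST(col AS VARCHAR(10))
    (r'\(DT_WSTR,\s*(\d+)\)\s*(\[\w+\]|\w+)', r'CAST(\2 AS VARCHAR(\1))'),
    (r'\(DT_I4\)\s*(\[\w+\]|\w+)', r'CAST(\1 AS INT)'),
    (r'GETDATE\(\)', 'CURRENT_TIMESTAMP()'),
    (r'@\[User::(\w+)\]', r':\1'),   # variable -> bind parameter (our convention)
    (r'\[(\w+)\]', r'\1'),           # [column] -> bare column name
]

def translate_expression(ssis_expr: str) -> str:
    sql = ssis_expr
    for pattern, replacement in RULES:
        sql = re.sub(pattern, replacement, sql)
    return sql
```

For example, `translate_expression('(DT_WSTR, 10)[order_id]')` yields `CAST(order_id AS VARCHAR(10))`. Multiply that by 40 Derived Columns per package and the value of automating the translation becomes obvious.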
Windows authentication has no cloud equivalent
SSIS relies on Windows Integrated Security (SSPI) for connecting to SQL Server and other Microsoft services. Packages often use the identity of the SQL Server Agent service account to authenticate, and the Windows credential store holds encrypted connection strings.
Snowflake uses key-pair authentication, OAuth, or username/password. You need to replace every connection manager in every package with a Snowflake-compatible auth mechanism. If your SSIS packages also connect to file shares via UNC paths, those need to move to cloud storage (S3, Azure Blob, GCS) with corresponding Snowflake external stages. This is straightforward but time-consuming to inventory.
Precedence constraints encode complex branching logic
SSIS Control Flow uses precedence constraints to define execution order and branching: on-success, on-failure, and on-completion. These constraints can also include expressions, creating conditional execution paths like “run Task B only if Task A succeeds AND variable X > 100.”
Snowflake Task DAGs support predecessor relationships and runtime conditional branching (via DAGTaskBranch), but they don’t natively support on-failure path routing to a different execution branch. For complex branching, you need to either encode the logic inside stored procedures (using Snowflake Scripting IF/ELSE blocks) or use an external orchestrator like Airflow, which has first-class support for branching and failure handling.
Connection managers cover a wide ecosystem
SSIS packages don’t just connect to SQL Server. A typical enterprise has packages using OLEDB connections to Oracle and DB2, flat file connections for CSV processing, FTP tasks for file transfer, SMTP for email notifications, and ADO.NET connections to various .NET-accessible data sources. Each connection manager type needs a replacement strategy.
Inventory every source system your SSIS environment connects to and find a replacement connector for each one. The connector ecosystem for Snowflake is mature. Between Fivetran, Airbyte, Snowflake’s native connectors, and external stages for file-based loads, coverage gaps are rare. If you’re coming from an environment with heavy FTP/flat-file processing, Snowpipe with auto-ingest on cloud storage replaces the file-watcher pattern that SSIS teams often build with ForEach Loop Containers.
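A quick way to build that inventory is to tally the DTS:CreationName attribute that every DTSX connection manager carries (values like OLEDB, FLATFILE, FTP, SMTP). A sketch that takes file contents as input; pair it with `Path.glob("**/*.dtsx")` over your package folder:

```python
# Sketch: tally connection manager types across DTSX file contents.
# DTSX connection managers carry a DTS:CreationName attribute; counting
# the values gives a quick view of which replacement connectors the
# migration needs.
import re
from collections import Counter

CREATION_NAME = re.compile(r'DTS:CreationName="([^"]+)"')

def count_connection_types(dtsx_texts) -> Counter:
    """Tally DTS:CreationName values across an iterable of DTSX file contents."""
    counts = Counter()
    for text in dtsx_texts:
        counts.update(CREATION_NAME.findall(text))
    return counts
```

The resulting counts map directly onto the replacement-strategy table above: OLEDB and ADO.NET sources become Fivetran/Airbyte connectors or COPY INTO from stages, FLATFILE and FTP become cloud storage plus Snowpipe, SMTP becomes an alert integration.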
Code translation examples
SSIS expression to Snowflake SQL
SSIS expressions use a proprietary syntax for type casting and string formatting. Here is a common date-formatting pattern:
SSIS Derived Column expression:
```
(DT_WSTR, 4)YEAR(GETDATE()) + "-" +
RIGHT("0" + (DT_WSTR, 2)MONTH(GETDATE()), 2) + "-" +
RIGHT("0" + (DT_WSTR, 2)DAY(GETDATE()), 2)
```
Snowflake SQL:
```sql
TO_VARCHAR(CURRENT_DATE(), 'YYYY-MM-DD')
```
The SSIS version is three lines of manual string concatenation with explicit type casts. Snowflake handles it with a single function call and a format string. This pattern (a verbose SSIS expression collapsing into a single Snowflake function) repeats across most expression translations.
SSIS Lookup transformation to Snowflake JOIN
SSIS Lookups cache reference data in memory and match rows as they flow through the pipeline. In full cache mode, SSIS loads the entire reference table into memory before processing begins.
SSIS Lookup (conceptual):
Source: orders table
Lookup: customers table (full cache)
Match columns: orders.customer_id = customers.customer_id
Output: customers.customer_name, customers.region
No Match: redirect to error output
Snowflake SQL:
```sql
SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.region,
    CASE WHEN c.customer_id IS NULL THEN 'UNMATCHED' END AS lookup_status
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;
```
One thing to watch: SSIS Lookups with partial cache or no-cache mode perform row-by-row lookups against the database. The SQL JOIN equivalent is typically faster because Snowflake processes the entire join at once. But as Snowflake’s SnowConvert documentation notes, SSIS Lookups return the first matching row based on read order, while SQL JOINs are non-deterministic when multiple rows match. If your lookup source has duplicates, you may need to add a ROW_NUMBER() window function to replicate the original behavior.
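The duplicate-handling difference is easy to see outside SQL. A small Python sketch (toy data and our own function names) contrasting first-match Lookup semantics with join semantics:

```python
# Toy illustration of the semantics gap: an SSIS Lookup keeps the first
# matching reference row (in read order), while a plain join against a
# duplicated reference table multiplies output rows. The ROW_NUMBER() fix
# mentioned above keeps only the first row per key before joining.
orders = [{"order_id": 1, "customer_id": "C1"}]
customers = [  # duplicate key: two reference rows for C1
    {"customer_id": "C1", "region": "EAST"},
    {"customer_id": "C1", "region": "WEST"},
]

def ssis_lookup(orders, reference):
    first_match = {}
    for row in reference:                       # read order matters
        first_match.setdefault(row["customer_id"], row)
    return [{**o, "region": first_match[o["customer_id"]]["region"]}
            for o in orders]

def plain_join(orders, reference):
    return [{**o, "region": c["region"]}
            for o in orders for c in reference
            if o["customer_id"] == c["customer_id"]]
```

Here `ssis_lookup` returns one row (EAST, the first reference row read), while `plain_join` returns two. If your validation diff shows row-count inflation after migrating a Lookup, duplicated reference keys are the first thing to check.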
SSIS Conditional Split to Snowflake CASE
SSIS Conditional Split routes rows to different outputs based on expressions. This is often used to separate good data from bad data, or to route rows to different destination tables.
SSIS Conditional Split conditions:
```
Output 1 "HighValue":     [amount] > 10000
Output 2 "International": [country] != "US"
Default:                  (everything else)
```
Snowflake SQL (single query with CASE):
```sql
SELECT *,
    CASE
        WHEN amount > 10000 THEN 'HighValue'
        WHEN country != 'US' THEN 'International'
        ELSE 'Default'
    END AS route_category
FROM source_data;
```
If the original SSIS package routes each output to a different destination table, you can use INSERT INTO … SELECT with WHERE clauses, or create separate views for each output path. With dbt, splitting this into separate models (one per output) is a common approach.
SSIS Slowly Changing Dimension to Snowflake MERGE
SSIS includes a Slowly Changing Dimension (SCD) wizard that generates a Data Flow for Type 1 (overwrite) and Type 2 (history tracking) updates. The generated Data Flow is notoriously complex, often containing 10+ components chained together.
Snowflake MERGE + INSERT (Type 2 SCD):
```sql
-- Step 1: Expire changed records
MERGE INTO dim_customer AS target
USING staging_customer AS source
    ON target.customer_id = source.customer_id
    AND target.is_current = TRUE
WHEN MATCHED AND (
        target.customer_name != source.customer_name
        OR target.address != source.address
    ) THEN UPDATE SET
        is_current = FALSE,
        effective_end_date = CURRENT_TIMESTAMP();

-- Step 2: Insert new current rows for changed and net-new records
INSERT INTO dim_customer (
    customer_id, customer_name, address,
    is_current, effective_start_date, effective_end_date
)
SELECT
    source.customer_id, source.customer_name, source.address,
    TRUE, CURRENT_TIMESTAMP(), NULL
FROM staging_customer AS source
LEFT JOIN dim_customer AS target
    ON target.customer_id = source.customer_id
    AND target.is_current = TRUE
WHERE target.customer_id IS NULL
    OR target.customer_name != source.customer_name
    OR target.address != source.address;
```
A MERGE cannot simultaneously UPDATE a matched row and INSERT a new row for the same source record, so SCD Type 2 requires two statements. These replace what was a 10-component Data Flow in SSIS. If you are using dbt, snapshots handle this two-step logic internally with a single configuration block.
SSIS Script Task (C#) to Snowflake stored procedure
A common SSIS Script Task pattern is calling a REST API and inserting the result into a table:
SSIS Script Task (C#):
```csharp
public void Main()
{
    string url = Dts.Variables["User::ApiUrl"].Value.ToString();
    using (var client = new System.Net.WebClient())
    {
        string json = client.DownloadString(url);
        // Parse JSON, insert rows into destination
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}
```
Snowflake external function + procedure:
```sql
-- Create external function pointing to API gateway
CREATE EXTERNAL FUNCTION fetch_api_data(url VARCHAR)
    RETURNS VARIANT
    API_INTEGRATION = my_api_integration
    AS 'https://my-gateway.execute-api.us-east-1.amazonaws.com/prod/fetch';

-- Call it in a stored procedure
CREATE OR REPLACE PROCEDURE load_api_data(api_url VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
    INSERT INTO api_results (payload, loaded_at)
    SELECT fetch_api_data(:api_url), CURRENT_TIMESTAMP();
    RETURN 'Success';
END;
```
Snowpark Python procedures can also make HTTP calls using the requests library, but require an EXTERNAL ACCESS INTEGRATION with an allowlisted NETWORK RULE. The setup effort is comparable to external functions; choose Snowpark when you need richer response parsing or conditional logic around the API call.
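If you go the Snowpark route, a useful pattern is to keep the payload-parsing logic in plain Python so it can be developed and unit-tested outside Snowflake. The sketch below is a hypothetical handler shape, not a Snowflake API: the payload structure, function names, and the injected `fetch` callable are all our own, and the HTTP call and INSERT are stubbed out in comments.

```python
# Hypothetical shape of a Snowpark Python handler replacing the Script Task
# above. The HTTP call is injected so the parsing logic is testable without
# network access; inside Snowflake, fetch would be requests.get(url).text
# under an EXTERNAL ACCESS INTEGRATION, and the rows would be inserted via
# the Snowpark session.
import json

def parse_api_payload(raw_json: str) -> list[tuple]:
    """Flatten an assumed {'items': [{'id': ..., 'value': ...}]} payload into rows."""
    payload = json.loads(raw_json)
    return [(item["id"], item["value"]) for item in payload.get("items", [])]

def load_api_data(fetch, url: str) -> int:
    """fetch(url) -> JSON string; returns the number of rows to insert."""
    rows = parse_api_payload(fetch(url))
    # In the real procedure: insert rows into api_results via session.sql()
    # or session.write_pandas(), then return a status string.
    return len(rows)
```

Separating the parsing from the I/O this way also makes the eventual data diff easier to debug: if the Snowflake output differs from the SSIS output, you can replay the same captured payload through both code paths.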
Feature mapping reference
This table maps SSIS concepts to their Snowflake equivalents. “Direct translation” means the mapping is mechanical. “Redesign required” means you need to rethink the approach.
| SSIS feature | Snowflake equivalent | Translation difficulty |
|---|---|---|
| Derived Column transformation | SQL expressions in SELECT | Direct translation |
| Lookup transformation (full cache) | SQL LEFT JOIN + COALESCE | Direct translation |
| Lookup transformation (partial/no cache) | SQL LEFT JOIN with ROW_NUMBER() for dedup | Moderate redesign |
| Conditional Split | SQL CASE expression or WHERE filters | Direct translation |
| Aggregate transformation | SQL GROUP BY + aggregate functions | Direct translation |
| Sort transformation | SQL ORDER BY (usually unnecessary) | Drop it; Snowflake sorts as needed |
| Merge Join | SQL JOIN (INNER, LEFT, FULL) | Direct translation |
| Union All | SQL UNION ALL | Direct translation |
| SCD wizard (Type 2) | Snowflake MERGE (or dbt snapshots) | Moderate redesign |
| Script Task (C#/.NET) | Snowpark Python procedure or external function | Redesign required |
| Script Component (data transform) | Snowflake UDF or Snowpark Python UDF | Redesign required |
| DTSX Package (orchestration) | Snowflake Task DAG or Airflow DAG | Moderate redesign |
| SQL Server Agent Job | Snowflake Task with CRON schedule | Direct translation |
| ForEach Loop Container | Snowflake Scripting LOOP or Airflow dynamic tasks | Moderate redesign |
| SSIS variables / parameters | Session variables or stored procedure parameters (or dbt vars) | Direct translation |
| Event handlers (OnError, OnWarning) | Alert integration + error logging tables | Moderate redesign |
| Connection Managers | Snowflake stages, storage integrations, secrets | Redesign required |
Script Tasks and Script Components require the most manual effort. Everything else follows predictable conversion patterns.
Data type mapping: SSIS to Snowflake
SSIS uses its own internal data types (prefixed with DT_) for the Data Flow pipeline. These map to Snowflake types as follows.
| SSIS type | Snowflake type | Notes |
|---|---|---|
| DT_BOOL | BOOLEAN | |
| DT_I1 | TINYINT | Signed byte (-128 to 127). |
| DT_I2 | SMALLINT | |
| DT_I4 | INT | 32-bit signed integer. Snowflake INT is NUMBER(38,0) under the hood. |
| DT_I8 | BIGINT | 64-bit signed integer. |
| DT_UI1 | SMALLINT | Unsigned byte (0-255); widen to avoid overflow. |
| DT_UI2 | INTEGER | Unsigned 16-bit (0-65535); widen to a signed type to avoid overflow. |
| DT_UI4 | BIGINT | Unsigned 32-bit; widen to a signed type to avoid overflow. |
| DT_R4 | FLOAT | Single-precision float. |
| DT_R8 | DOUBLE | Double-precision float. |
| DT_DECIMAL | NUMBER(p,s) | |
| DT_NUMERIC | NUMBER(p,s) | |
| DT_CY | NUMBER(19,4) | SSIS currency type; 8-byte scaled integer. |
| DT_STR | VARCHAR | ANSI string. |
| DT_WSTR | VARCHAR | Unicode string. Snowflake VARCHAR is always Unicode. |
| DT_TEXT | VARCHAR | ANSI text stream. |
| DT_NTEXT | VARCHAR | Unicode text stream. |
| DT_BYTES | BINARY | |
| DT_IMAGE | BINARY | Binary large object. |
| DT_GUID | VARCHAR(36) | Snowflake has no native UUID. Store as VARCHAR(36) and generate with UUID_STRING(). |
| DT_DBDATE | DATE | |
| DT_DBTIME2 | TIME | |
| DT_DBTIMESTAMP | TIMESTAMP_NTZ | Date and time without timezone. |
| DT_DBTIMESTAMP2 | TIMESTAMP_NTZ | High-precision date and time. |
| DT_DBTIMESTAMPOFFSET | TIMESTAMP_TZ | Date and time with UTC offset. |
| DT_FILETIME | TIMESTAMP_NTZ | Windows FILETIME (100ns intervals since 1601); convert to TIMESTAMP. |
The type that causes the most silent issues is DT_DBTIMESTAMP. SQL Server DATETIME (which DT_DBTIMESTAMP maps from) has a precision of 3.33 milliseconds and rounds to increments of .000, .003, or .007 seconds. DATETIME2 and Snowflake TIMESTAMP_NTZ do not round this way. If your SSIS packages rely on DATETIME rounding behavior for deduplication or comparisons, you will get different results after migration.
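To see the rounding concretely, here is a small Python sketch of the 1/300-second tick arithmetic behind DATETIME storage (the helper name is ours; the .000/.003/.007 increments match SQL Server's documented behavior):

```python
# Sketch of SQL Server DATETIME's 1/300-second tick rounding, which is the
# behavior DT_DBTIMESTAMP inherits. Useful for checking whether legacy
# dedup or comparison logic depended on two close timestamps colliding
# after rounding.
import math

def datetime_ms(ms: int) -> int:
    """Round a millisecond value the way SQL Server DATETIME stores it."""
    ticks = math.floor(ms * 0.3 + 0.5)       # 300 ticks per second, round half up
    return math.floor(ticks * 10 / 3 + 0.5)  # back to ms: last digit 0, 3, or 7
```

For example, 1 ms rounds to .000, 2 ms to .003, and 5 ms to .007, so two source timestamps that were distinct in SSIS output can be identical, and vice versa, after moving to TIMESTAMP_NTZ. A value-level diff on timestamp columns will surface exactly these rows.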
Best practices for SSIS to Snowflake migration
Inventory your packages and find the dead ones. Run a query against SSISDB.catalog.packages (for the project deployment model used in SSIS 2012+) or msdb.dbo.sysssispackages (for legacy package deployment). Cross-reference with SSISDB.catalog.executions or SQL Server Agent job history to see which packages have actually executed in the last 6 months. In most organizations we work with, 30-50% of SSIS packages are either disabled, broken, or running against tables that no longer matter. Don’t migrate dead code.
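Once the package names and last-execution timestamps are pulled out of SSISDB, the dead-package check reduces to a cutoff comparison. A sketch with assumed inputs (column extraction from catalog.executions is left to your query):

```python
# Sketch: flag packages with no recorded execution in the last six months,
# given data already extracted from SSISDB.catalog.executions or Agent
# job history. Input shapes are our own assumption.
from datetime import datetime, timedelta

def find_dead_packages(packages, last_executed, as_of=None, days=180):
    """packages: iterable of names; last_executed: {name: datetime or None}."""
    as_of = as_of or datetime.now()
    cutoff = as_of - timedelta(days=days)
    return sorted(
        name for name in packages
        if last_executed.get(name) is None or last_executed[name] < cutoff
    )
```

Packages that never appear in the execution history at all (last_executed of None) are the strongest dead-code candidates, but confirm with the package owners before dropping them from scope.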
Separate data movement from transformation. SSIS bundles extraction, transformation, and loading into a single Data Flow. In Snowflake, split these apart. Use Fivetran, Airbyte, or Snowpipe for data ingestion. Write transformations as SQL views, stored procedures, or dbt models. Snowflake’s native features (views, stored procedures, Tasks, Streams) can handle the full workload. dbt is a popular addition for teams that want version-controlled, testable SQL, and Snowflake offers native dbt Core hosting.
Replace Script Tasks early and test thoroughly. Script Tasks are the highest-effort items because they require rewriting C#/VB.NET in Python or JavaScript. Identify all Script Tasks in your inventory, categorize them by what they do (API calls, file processing, custom transformations, email notifications), and pick a replacement strategy for each category. Don’t leave these until the end of the project.
Validate with value-level data diff, not row counts. Row counts match when you have the right number of wrong rows. After migrating each package, run a column-level comparison between the SQL Server output and the Snowflake output. This catches type conversion issues, rounding differences, NULL handling discrepancies, and off-by-one errors in date logic that row counts will never surface. Datafold’s cross-database data diff was built for exactly this step.
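Conceptually, a value-level diff is a per-key, per-column comparison between the two result sets. A toy version to make the idea concrete (real cross-database diffing also handles keys missing from one side, sampling, type normalization, and scale, which this does not):

```python
# Toy value-level diff: compare two result sets keyed by primary key and
# report per-column mismatches that matching row counts would hide.
def diff_rows(source_rows, target_rows, key="id"):
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    mismatches = []
    for k in src.keys() & tgt.keys():          # keys present on both sides
        for col in src[k]:
            if src[k][col] != tgt[k].get(col):
                mismatches.append((k, col, src[k][col], tgt[k].get(col)))
    return mismatches
```

Two tables with identical row counts can still produce a long mismatch list here, which is exactly the failure mode that row-count validation misses.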
How Datafold automates SSIS to Snowflake migrations
The challenges above have well-established solutions, and Datafold’s Migration Agent automates the majority of the work. The agent parses DTSX packages and extracts the embedded logic: Data Flow transformations, SSIS expressions, variable references, and connection configurations. It generates Snowflake SQL (and optionally dbt models) that reproduce the same data output as the original packages. Standard patterns like Derived Columns, Lookups, Conditional Splits, and Aggregates convert automatically. In our experience, 90%+ of mappings that follow standard patterns are converted without manual intervention. Engineers focus their time on Script Tasks and complex branching logic.
Validation is where the biggest time savings happen. Datafold runs automated cross-database data diffs that compare every column of every table between SQL Server (the SSIS output) and Snowflake (the new pipeline output). When Healthy Directions migrated from SQL Server to Snowflake, they used Datafold’s data diffs to validate hundreds of objects, catching discrepancies that manual testing would have missed. Ingersoll Rand completed a similar migration (400+ stored procedures, Azure Synapse to Snowflake + dbt) in 3 months, 4x faster than the next best alternative.
SSIS migrations tend to run alongside SQL Server to Snowflake migrations because the two systems are deeply intertwined. Datafold handles both the database objects and the ETL logic in a single project, with fixed pricing and a guaranteed timeline.
Get a migration assessment to scope your SSIS package inventory.
Related resources
- Get a migration quote - fixed-price, guaranteed-timeline migration delivery from Datafold
- SQL Server to Snowflake migration guide - often co-migrated with SSIS packages
- Informatica to Snowflake migration guide - migrating the other major enterprise ETL platform
- The opportunity hidden in legacy ETL migrations - why companies are leaving Informatica, Talend, and SSIS behind