Informatica to Snowflake migration: challenges, patterns, and AI automation
How to migrate Informatica PowerCenter and IICS to Snowflake: mapping conversion, SQL translation, and validation. Covers how the migration differs for PowerCenter versus IICS.
Informatica has been the default enterprise ETL platform for two decades, but it comes in two very different flavors: PowerCenter (on-prem) and Informatica Intelligent Cloud Services, or IICS (cloud-native). If you’re reading this, you’re probably running one or both and planning an Informatica to Snowflake migration. The migration path differs depending on which one you’re leaving behind.
PowerCenter stores transformation logic in proprietary XML mappings built through a Windows-based GUI. There is no SQL to copy. Every mapping, every workflow, every session variable must be reverse-engineered and rebuilt as Snowflake SQL, stored procedures, views, and Tasks — optionally using dbt as a transformation layer.
IICS Cloud Data Integration (CDI) is Informatica’s cloud-native platform. It uses a browser-based designer, stores mappings in a JSON-based format (not the same XML as PowerCenter), and runs jobs through Secure Agents deployed near your data sources. Some organizations run IICS alongside PowerCenter during a transition period, which means you may need to migrate from both at once.
That’s what makes this migration different from database-to-database moves like Oracle to Snowflake or Teradata to Snowflake. Those migrations are mostly about SQL dialect differences. An Informatica to Snowflake migration is a paradigm shift: from GUI-based ETL to SQL-first ELT. The good news is that AI-powered migration tools now automate the bulk of this work, turning what used to be a year-long manual project into something that can be done in weeks.
The driver behind most of these projects is straightforward. Informatica license renewals are expensive, with the company generating $1.6 billion in annual revenue from customers who often feel locked in. Cloud-first mandates are pushing teams off on-premises infrastructure. And the industry has moved from extract-transform-load to extract-load-transform, where raw data lands in the warehouse first and transformations happen in SQL. Snowflake is built for that pattern. Neither version of Informatica is.
PowerCenter vs IICS: what changes for migration
Before getting into the Snowflake migration itself, it’s worth understanding how PowerCenter and IICS differ, because the migration effort is not the same.
| Aspect | PowerCenter | IICS (CDI) |
|---|---|---|
| Deployment | On-prem servers | Cloud-hosted, Secure Agents run locally |
| Mapping format | XML stored in repository database | JSON-based, accessed via REST API |
| Designer | Windows desktop client (PowerCenter Designer) | Browser-based Mapping Designer |
| Execution engine | Integration Service (row-pipeline with grid partitioning) | Secure Agent with optional elastic Spark engine (CDI-E) |
| Connectors | Native adapters bundled with PowerCenter | Cloud-native connectors, marketplace add-ons |
| Orchestration | Workflow Manager (sessions, timers, events) | Taskflows, Schedules, and Monitor service |
| Export method | pmrep CLI or Repository Manager export | REST API export, or IICS Migration Utility |
The practical impact on your Snowflake migration:
If you’re migrating from PowerCenter, you export repository objects as XML using pmrep, parse the XML to extract transformation logic, and rewrite everything in SQL. This is the harder path because the XML format is complex and proprietary. Tools like Datafold’s Migration Agent parse this XML automatically.
If you’re migrating from IICS CDI, the mappings are JSON-based and accessible through Informatica’s REST API. The transformation logic is conceptually similar to PowerCenter (same Expression, Lookup, Joiner, Aggregator pattern), but the format is easier to parse programmatically. The cloud connectors in IICS may already point to cloud-accessible sources, which simplifies the ingestion layer.
If you’re migrating from both (common in enterprises that partially moved from PowerCenter to IICS before deciding to exit Informatica entirely), you’ll have two separate export processes and potentially different transformation patterns for the same data domains. Consolidating these into a single Snowflake pipeline is the right move, but plan for the additional complexity of reconciling duplicate logic.
One thing that’s the same regardless of source: neither PowerCenter nor IICS stores transformations as SQL. Both use visual, GUI-designed mappings with proprietary internal representations. The core challenge of reverse-engineering business logic from a visual ETL tool applies to both.
Architecture comparison: PowerCenter and IICS vs Snowflake
PowerCenter’s architecture revolves around a set of tightly coupled server-side components. The Repository Server stores all metadata: mappings, workflows, sessions, connection objects, and scheduler configurations. The Integration Service executes mappings at runtime, reading from sources, applying transformations through an in-memory row pipeline (with optional grid partitioning for parallelism), and writing to targets. The Client tools (Designer, Workflow Manager, Workflow Monitor) provide the GUI where developers build and manage everything.
A mapping in PowerCenter is a directed graph of transformations: Source Qualifier reads from a source, Expression transformations apply calculations, Lookup transformations join against reference tables, Aggregator transformations group data, and a Target transformation writes to the destination. Sessions wrap mappings with runtime configuration (connection strings, commit intervals, buffer sizes). Workflows chain sessions together with scheduling logic, event triggers, and dependency links.
Snowflake’s architecture looks nothing like this. Storage and compute are separated. Data lands in stages (S3, Azure Blob, GCS) and gets loaded via COPY INTO. Transformations happen in SQL through views, stored procedures, and Snowflake Scripting. Orchestration is handled by Snowflake Tasks and Streams for simple pipelines, or Airflow for anything more complex. Snowflake’s native capabilities can handle the entire transformation layer on their own. Many teams also choose to use dbt on top of Snowflake for version-controlled SQL transformations — and Snowflake now offers a native dbt Core integration in Snowsight, so you don’t need to maintain separate dbt Core infrastructure (though dbt Cloud remains the fuller-featured option for CI/CD and observability). But dbt is optional, not required.
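To make the ELT pattern concrete, here is a minimal sketch of the load-then-transform flow described above. All object names (`raw_db`, `orders_stage`, the S3 bucket, and the columns) are illustrative placeholders, not from any specific migration:

```sql
-- Hypothetical example: land raw files via an external stage, then
-- expose a transformed view. A private bucket would also need a
-- storage integration or credentials, omitted here for brevity.
CREATE STAGE raw_db.public.orders_stage
    URL = 's3://example-bucket/orders/'
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

COPY INTO raw_db.public.orders_raw
    FROM @raw_db.public.orders_stage;

-- The transformation lives in SQL: roughly what a mapping's
-- Expression and Filter steps might become.
CREATE OR REPLACE VIEW analytics.orders_clean AS
SELECT
    order_id,
    UPPER(TRIM(customer_name)) AS customer_name,
    order_total
FROM raw_db.public.orders_raw
WHERE order_total IS NOT NULL;
```

Note the inversion: in PowerCenter the transformation happens in flight between source and target; here the raw data lands first and the view (or an equivalent dbt model) does the work inside Snowflake.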
| Informatica concept | PowerCenter | IICS (CDI) | Snowflake equivalent |
|---|---|---|---|
| Metadata store | Repository Server (database) | Cloud repository (REST API) | Snowflake metadata (or dbt project if using dbt) |
| Compute engine | Integration Service (grid-capable) | Secure Agent / CDI-E Spark | Virtual Warehouse |
| Mapping definition | XML in repository | JSON via REST API | SQL view, stored procedure (or dbt model) |
| Runtime config | Session + parameter files | Taskflow parameters | Task definition, session parameters (or dbt vars) |
| Orchestration | Workflow Manager | Taskflows + Schedules | Snowflake Tasks or Airflow |
| Reusable logic | Mapplets | Shared mappings | Snowflake UDF (or dbt macro) |
| Data loading | Source Qualifier | Source transformation | Stage + COPY INTO |
| Connectors | Bundled native adapters | Cloud marketplace connectors | Fivetran, Airbyte, or native connectors |
The key mental shift: both PowerCenter and IICS do everything in one platform. After migration, you’ll have separate tools for ingestion (Fivetran/Airbyte), storage and transformation (Snowflake), and optionally orchestration (Airflow or Snowflake Tasks). dbt is a popular addition for teams that want version-controlled, testable SQL transformations, but Snowflake’s native features (views, stored procedures, Tasks, Streams) can handle the full workload without it.
Migration challenges and how to solve them
Proprietary mappings have no SQL to port
This is the single biggest difference between an Informatica migration and a database migration. When you migrate from Oracle to Snowflake, you have SQL stored procedures you can read, parse, and translate. Informatica mappings are stored in a proprietary format with no underlying SQL.
For PowerCenter, that format is XML in the repository database. You export using pmrep or Repository Manager, then parse the XML to understand what each mapping does. The Transformation Guide describes each transformation type, but there’s no “export to SQL” button.
For IICS CDI, the format is JSON accessible via the platform REST API. The transformation types are similar (Expression, Lookup, Joiner, Aggregator), but the metadata structure is different from PowerCenter XML. If you’ve built tooling to parse PowerCenter exports, it won’t work on IICS exports without modification.
Done manually, this is months of work for a typical enterprise with 500+ active mappings. With AI-powered migration tools like Datafold’s Migration Agent, the parsing and initial SQL generation is automated, and engineers focus on reviewing and refining the output rather than writing everything from scratch.
Joiner, Aggregator, and Lookup transformations need SQL rewrites
PowerCenter’s Joiner transformation joins two data streams in memory. The Aggregator transformation performs group-by calculations with optional sorted input optimization. Lookup transformations fetch reference data, often with caching. Each of these maps to standard SQL, but the translation isn’t always one-to-one.
A Lookup configured with a static cache and a default value on no match behaves like a LEFT JOIN with COALESCE. A Lookup with dynamic caching for insert-or-update logic becomes a MERGE statement. An Aggregator with sorted input and incremental aggregation has no direct SQL equivalent because Snowflake processes entire result sets, not sorted row streams. These translations follow well-known patterns, and migration tools handle most of them automatically. The ones that need manual attention are the edge cases where Informatica-specific behavior (like sorted input optimization) affects output ordering.
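As a sketch of the dynamic-cache case, the insert-or-update behavior maps to a single MERGE. Table and column names here are illustrative, not from a real mapping:

```sql
-- A dynamic-cache Lookup feeding insert/update logic becomes an upsert.
MERGE INTO dim_product t
USING staging_product s
    ON t.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET t.product_name = s.product_name
WHEN NOT MATCHED THEN
    INSERT (product_id, product_name)
    VALUES (s.product_id, s.product_name);
```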
Session parameters and variables don’t exist in Snowflake
PowerCenter sessions use parameter files (*.par) and session variables ($$) to control runtime behavior: date ranges for incremental loads, file paths, connection strings, commit intervals. These have no native Snowflake equivalent.
You’ll need to decide where these values live in your new stack. Snowflake offers session parameters and Snowflake Scripting variables for stored procedures, which handle most use cases natively. If you’re using dbt, its var() and env_var() functions provide project-level and environment-specific values. Airflow uses Variables and XCom for passing data between tasks.
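For the Snowflake-native route, session variables cover the common case of a parameterized incremental load. A minimal sketch, with a hypothetical `$load_start` standing in for a `$$` mapping parameter from a parameter file:

```sql
-- Set once per session (or per orchestrated run), reference with $.
SET load_start = '2024-01-01';

SELECT *
FROM staging_orders
WHERE updated_at >= $load_start::DATE;
```

An orchestrator like Airflow would typically issue the `SET` statement at the start of each run with the computed window value.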
Mapplets become UDFs or dbt macros
Reusable transformations (Mapplets) are one of PowerCenter’s better features. A Mapplet encapsulates a set of transformations that can be dropped into any mapping. In Snowflake, the native equivalent is a UDF (User-Defined Function) for pure SQL reuse. If you’re using dbt, macros serve the same purpose.
The catch: Mapplets can contain stateful logic (Sequence Generators, cached Lookups) that doesn’t translate cleanly into a stateless SQL function. You’ll need to identify which Mapplets are truly reusable SQL logic and which are wrappers around stateful behavior that requires a different approach. In practice, most Mapplets fall into the first category and convert to UDFs or dbt macros without issues.
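For the common stateless case, the conversion is direct. A hypothetical example, assuming a Mapplet that standardized phone numbers:

```sql
-- Pure, stateless Mapplet logic converts cleanly to a SQL UDF.
CREATE OR REPLACE FUNCTION clean_phone(raw VARCHAR)
RETURNS VARCHAR
AS
$$
    REGEXP_REPLACE(raw, '[^0-9]', '')
$$;

SELECT clean_phone('(555) 123-4567');  -- 5551234567
```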
Workflow orchestration needs a new home
PowerCenter Workflows handle scheduling, dependency management, event-based triggers, timers, and error handling. The Workflow Manager GUI makes it easy to chain sessions with on-success/on-failure links.
For simple linear pipelines, Snowflake Tasks with Streams work well. For anything with conditional branching, error handling, or cross-system dependencies, Airflow is the standard choice. Don’t try to replicate complex Workflow logic in Snowflake Tasks alone. Tasks are designed for straightforward scheduling, not for orchestration graphs with 50 nodes and retry logic.
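A minimal sketch of the Tasks-plus-Streams pattern for a simple linear pipeline, with illustrative warehouse, table, and schedule names:

```sql
-- Track changes on the raw table.
CREATE STREAM orders_stream ON TABLE raw_orders;

-- Run every 15 minutes, but only when the stream has new rows.
CREATE TASK load_orders
    WAREHOUSE = transform_wh
    SCHEDULE = '15 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
    INSERT INTO orders_clean
    SELECT order_id, order_total
    FROM orders_stream
    WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended and must be resumed explicitly.
ALTER TASK load_orders RESUME;
```

Chaining a handful of Tasks with `AFTER` dependencies is fine; once you need branching or retries, move to Airflow.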
Connector diversity requires new ingestion tools
PowerCenter’s connector library is vast: SAP, Salesforce, mainframes, flat files, FTP servers, web services, and dozens of databases. IICS has its own set of cloud-native connectors that overlap but aren’t identical to PowerCenter’s. When you move to Snowflake, you lose all of them.
The replacement is typically a dedicated EL (extract-load) tool like Fivetran or Airbyte for SaaS sources and databases, combined with Snowflake external stages for file-based loads. If you’re coming from IICS, you may have an easier time here because your sources are likely already cloud-accessible. PowerCenter environments often connect to on-prem systems behind firewalls, which adds an extra step of making those sources reachable from a cloud EL tool.
Inventory every source system your Informatica environment connects to and find a replacement connector for each one. The connector ecosystem for Snowflake is mature at this point. Between Fivetran, Airbyte, and Snowflake’s native connectors, coverage gaps are rare.
Code translation examples
Expression transformation to SQL CASE
Informatica’s Expression transformation uses the IIF and DECODE functions for conditional logic. Here’s a typical Expression port calculating a customer tier:
-- Informatica Expression port
IIF(ANNUAL_REVENUE > 1000000, 'Enterprise',
    IIF(ANNUAL_REVENUE > 100000, 'Mid-Market',
        IIF(ANNUAL_REVENUE > 10000, 'SMB', 'Startup')))
The Snowflake SQL equivalent:
-- Snowflake SQL
SELECT
    customer_id,
    annual_revenue,
    CASE
        WHEN annual_revenue > 1000000 THEN 'Enterprise'
        WHEN annual_revenue > 100000 THEN 'Mid-Market'
        WHEN annual_revenue > 10000 THEN 'SMB'
        ELSE 'Startup'
    END AS customer_tier
FROM customers;
The syntax change is minor. The real difference is that the Informatica version operates on a single row flowing through the pipeline, while the Snowflake version is a set-based operation across the entire table. This matters when the expression references previous row values or aggregated state.
Lookup transformation to SQL LEFT JOIN
A connected Lookup in PowerCenter fetches a value from a reference table, returning a default when no match is found. Lookups are typically the most common transformation type in a PowerCenter repository.
-- Informatica Lookup: match customer_id, return region_name
-- Condition: CUST.CUSTOMER_ID = LKP_REGION.CUSTOMER_ID
-- Default value on no match: 'Unknown'
-- Output port: REGION_NAME
In Snowflake:
-- Snowflake SQL
SELECT
    c.customer_id,
    c.customer_name,
    COALESCE(r.region_name, 'Unknown') AS region_name
FROM customers c
LEFT JOIN region_lookup r
    ON c.customer_id = r.customer_id;
This works for connected Lookups with static caching. Unconnected Lookups (called via :LKP() syntax inside expressions) are closer to a scalar subquery or correlated LEFT JOIN and require individual attention. If the original Lookup used dynamic caching for insert/update detection, you need a MERGE statement instead.
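As a sketch of the unconnected case, the `:LKP()` call typically becomes a correlated scalar subquery. Table names here are illustrative:

```sql
-- An unconnected Lookup invoked from an expression port, rewritten as
-- a correlated scalar subquery with the same 'Unknown' default.
SELECT
    c.customer_id,
    COALESCE(
        (SELECT r.region_name
         FROM region_lookup r
         WHERE r.customer_id = c.customer_id),
        'Unknown'
    ) AS region_name
FROM customers c;
```

If the lookup guarantees at most one match per key, a plain LEFT JOIN usually performs better than the correlated form.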
Router transformation to CASE with UNION ALL
PowerCenter’s Router transformation sends rows to different output groups based on conditions. It’s commonly used to split a data stream into insert, update, and delete paths.
-- Informatica Router groups:
-- Group 1 (NEW): LKP_EXISTING_ID IS NULL
-- Group 2 (CHANGED): LKP_EXISTING_ID IS NOT NULL AND LKP_CHECKSUM != SRC_CHECKSUM
-- Group 3 (DEFAULT): all remaining rows
One approach in Snowflake is separate CTEs:
-- Snowflake SQL
WITH classified AS (
    SELECT
        s.*,
        e.id AS existing_id,
        e.checksum AS existing_checksum,
        CASE
            WHEN e.id IS NULL THEN 'NEW'
            WHEN e.checksum != s.checksum THEN 'CHANGED'
            ELSE 'UNCHANGED'
        END AS row_action
    FROM staging s
    LEFT JOIN existing_records e ON s.business_key = e.business_key
)
-- Use in downstream INSERT/MERGE operations
SELECT * FROM classified WHERE row_action = 'NEW';
If you’re using dbt, splitting this into separate models (one for inserts, one for updates) is usually cleaner than a single query. With native Snowflake, you’d use separate views or stored procedure steps for the same effect.
SCD Type 2 via Update Strategy to Snowflake MERGE
Informatica’s SCD Type 2 pattern uses a Lookup (to detect existing records), an Expression (to compare checksums and set flags), and an Update Strategy transformation (to route rows to DD_INSERT or DD_UPDATE). It’s a multi-transformation pipeline that’s tricky to build and harder to debug.
In Snowflake, you can use a MERGE statement natively. If you’re using dbt, dbt snapshots handle SCD Type 2 with a single configuration block. Here’s the Snowflake-native approach:
-- Step 1: Expire changed records
MERGE INTO dim_customer t
USING (
    SELECT * FROM staging_customer
    WHERE load_date = CURRENT_DATE()
) s
    ON t.customer_id = s.customer_id AND t.is_current = TRUE
WHEN MATCHED AND t.checksum != s.checksum THEN
    UPDATE SET t.end_date = CURRENT_TIMESTAMP(), t.is_current = FALSE;

-- Step 2: Insert new current rows for changed and net-new records
INSERT INTO dim_customer (customer_id, customer_name, start_date, end_date, is_current, checksum)
SELECT s.customer_id, s.customer_name, CURRENT_TIMESTAMP(), NULL, TRUE, s.checksum
FROM staging_customer s
LEFT JOIN dim_customer t
    ON s.customer_id = t.customer_id AND t.is_current = TRUE
WHERE s.load_date = CURRENT_DATE()
    AND (t.customer_id IS NULL OR t.checksum != s.checksum);
A Snowflake MERGE cannot simultaneously UPDATE a matched row and INSERT a new row for the same source record, so SCD Type 2 requires two statements. These replace what was a five-transformation pipeline in PowerCenter. If you’re using dbt, dbt snapshots handle the two-step logic internally with a single configuration block.
Feature mapping reference
This table maps PowerCenter concepts to their Snowflake-stack equivalents. “Direct translation” means the mapping is mechanical. “Redesign required” means you need to rethink the approach.
| PowerCenter feature | Snowflake-stack equivalent | Translation difficulty |
|---|---|---|
| Expression transformation | SQL CASE, COALESCE, built-in functions | Direct translation |
| Filter transformation | SQL WHERE clause | Direct translation |
| Lookup transformation (static) | SQL LEFT JOIN + COALESCE | Direct translation |
| Lookup transformation (dynamic cache) | SQL MERGE statement | Moderate redesign |
| Joiner transformation | SQL JOIN (INNER, LEFT, FULL) | Direct translation |
| Aggregator transformation | SQL GROUP BY + aggregate functions | Direct translation |
| Router transformation | SQL CASE + CTE (or separate dbt models) | Moderate redesign |
| Sorter transformation | SQL ORDER BY (usually unnecessary) | Drop it; Snowflake sorts as needed |
| Update Strategy transformation | SQL MERGE (or dbt snapshots) | Moderate redesign |
| Sequence Generator | Snowflake AUTOINCREMENT or SEQUENCE | Direct translation |
| Stored Procedure transformation | Snowflake stored procedure | Moderate redesign |
| Normalizer transformation | UNPIVOT (flat repeating columns) or FLATTEN (VARIANT/ARRAY) | Moderate redesign |
| XML Generator/Parser | Snowflake PARSE_XML, XMLGET | Moderate redesign |
| Mapplet (reusable) | Snowflake UDF (or dbt macro) | Redesign required |
| Workflow (scheduling) | Snowflake Task or Airflow DAG | Redesign required |
| Session parameter file | Snowflake session params (or dbt vars, Airflow Variables) | Redesign required |
| PowerCenter connector | Fivetran, Airbyte, or Snowflake stage | Redesign required |
Data type mapping: Informatica to Snowflake
Informatica PowerCenter has its own internal type system used during transformations. When moving to Snowflake, these internal types need to map to Snowflake’s native types. Most are direct, but a few deserve attention.
| Informatica type | Snowflake type | Notes |
|---|---|---|
| String | VARCHAR | Length specification is optional in Snowflake; Informatica precision can be dropped. |
| Integer | NUMBER(10,0) | 32-bit signed integer. |
| Bigint | NUMBER(19,0) | 64-bit signed integer. |
| Small Integer | NUMBER(5,0) | |
| Decimal(p,s) | NUMBER(p,s) | Preserve precision and scale. 28-digit max applies only with PowerCenter high precision mode enabled; without it, Decimal promotes to Double (15 digits). |
| Double | DOUBLE | 64-bit IEEE 754. |
| Float | FLOAT | Snowflake FLOAT is always 64-bit IEEE 754 regardless of alias. Expect more precision than source. |
| Date/Time | TIMESTAMP_NTZ | Includes both date and time. Do not map to DATE or you lose the time component. |
| Date | DATE | Date-only type (no time component). |
| Timestamp | TIMESTAMP_NTZ | High-precision timestamp. |
| Binary | BINARY | Raw byte data. |
| Text | VARCHAR | CLOB equivalent. Snowflake VARCHAR defaults to 16 MB but supports up to 128 MB. |
| Nstring | VARCHAR | Unicode string. Snowflake VARCHAR is Unicode by default. |
| Ntext | VARCHAR | Unicode CLOB equivalent. |
| Raw | BINARY | Raw binary data. |
| Number(p,s) | NUMBER(p,s) | Preserve precision and scale. |
| Number (no precision) | DOUBLE | Acts as floating point. Do not map to NUMBER(38,18) unless you want fixed-point. |
| Boolean | BOOLEAN | Direct mapping. |
The most common type-related bug: Informatica Date/Time mapped to Snowflake DATE instead of TIMESTAMP_NTZ. This silently drops the time component. If downstream queries filter by hour or minute, they break. Always check whether the Informatica Date/Time field actually contains time values before choosing DATE vs TIMESTAMP_NTZ.
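The truncation is easy to demonstrate. Casting the same literal both ways shows what DATE silently discards:

```sql
-- Casting to DATE drops the time component with no error or warning.
SELECT
    '2024-03-15 14:30:00'::TIMESTAMP_NTZ AS as_timestamp,  -- 2024-03-15 14:30:00.000
    '2024-03-15 14:30:00'::DATE          AS as_date;       -- 2024-03-15
```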
Best practices for Informatica to Snowflake migration
Catalog your active mappings first. Most PowerCenter repositories contain years of accumulated mappings, and a large portion of them are dormant. Before you migrate anything, export your workflow execution history and identify which mappings actually ran in the past 90 days. In our experience, 40-60% of mappings in a typical enterprise repository are no longer active. Migrating dead code is a waste of time and budget.
Prioritize by data domain, not by complexity. It’s tempting to start with the simplest mappings to get quick wins. Instead, pick a complete data domain (say, all customer-related pipelines) and migrate it end to end. This forces you to solve ingestion, transformation, orchestration, and validation for one coherent set of data. You’ll uncover integration issues early rather than discovering them when you try to connect individually migrated pieces.
Don’t replicate PowerCenter patterns in SQL. The goal isn’t to write SQL that does exactly what each PowerCenter transformation did in exactly the same order. The goal is to produce the same output data. Snowflake can do in a single SQL statement with window functions and CTEs what PowerCenter needed five chained transformations to accomplish. Redesign your logic to be set-based.
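As one illustration of this consolidation, a Sorter → Aggregator → Filter chain that kept only the latest record per customer collapses into a single windowed query. Table and column names here are hypothetical:

```sql
-- Keep the most recent row per customer: one statement instead of a
-- three-transformation PowerCenter chain.
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY updated_at DESC
) = 1;
```

`QUALIFY` is Snowflake-specific; the portable equivalent wraps the window function in a CTE and filters in an outer query.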
Validate with value-level data diff, not row counts. Row counts matching between source and target tells you almost nothing. Two tables can have identical row counts with completely different data in every column. You need value-level comparison across every column to confirm your translated logic produces the same results as the original PowerCenter pipeline.
How Datafold automates Informatica to Snowflake migrations
Everything described above, from parsing proprietary mappings to rewriting transformation logic to validating the output, is exactly what Datafold’s Migration Agent was built to do.
The agent reads both PowerCenter XML exports and IICS CDI mapping definitions, then generates Snowflake-compatible SQL or dbt models. It doesn’t just translate syntax. It refactors procedural ETL patterns into set-based SQL, collapsing multi-transformation pipelines into clean queries. Expression transformations, Lookups, Aggregators, Routers, and Update Strategy transformations are all handled automatically. The 90%+ of mappings that follow standard patterns are converted without manual intervention. Engineers focus their time on the remaining edge cases.
For validation, Datafold’s cross-database data diffing compares the actual output data between your Informatica targets and Snowflake tables, column by column, value by value. This catches the silent data mismatches (decimal precision, NULL handling, date truncation) that row-count checks miss entirely.
The result: teams using Datafold typically complete Informatica migrations 6x faster than traditional approaches, with fixed-price, outcome-based delivery. You get a guaranteed timeline and data parity, not an open-ended consulting engagement.
Related resources
- Get a migration quote - fixed-price, guaranteed-timeline migration delivery from Datafold
- SSIS to Snowflake migration guide - migrating Microsoft ETL to Snowflake
- Oracle to Snowflake migration guide - relevant if your PowerCenter sources include Oracle databases
- The opportunity hidden in legacy ETL migrations - why companies are leaving Informatica, Talend, and SSIS behind