Accelerating Change Data Capture with apply changes in Delta Live Tables (DLT): Simplifying SCD Type 1 & 2 Implementation

Accelerating Change Data Capture with apply changes in Delta Live Tables (DLT)

Accelerating Change Data Capture with apply changes in Delta Live Tables (DLT): Simplifying SCD Type 1 & 2 Implementation

Introduction:

Change Data Capture (CDC) is a crucial component of modern data engineering, enabling efficient tracking and processing of data changes from source systems. Traditionally, implementing CDC required complex and error-prone merge logic.

With Delta Live Tables (DLT) in Databricks, CDC can now be implemented in a declarative, scalable, and reliable manner using the apply_changes feature.

In this blog, we explore how to streamline CDC using apply_changes, focusing on practical implementations of Slowly Changing Dimensions (SCD) Type 1 and Type 2. We’ll also highlight key differences between the two types and the advantages of using DLT over traditional methods.

What is Change Data Capture (CDC)?

Change Data Capture (CDC) is a data integration technique that identifies and captures changes—such as inserts, updates, and deletes—in source systems and propagates them to downstream targets.

Instead of performing full data refreshes, CDC enables incremental updates, ensuring that data pipelines remain efficient, up-to-date, and synchronised in real time or scheduled batch processes.

Understanding Slowly Changing Dimensions (SCD): 

Slowly Changing Dimensions (SCD) is a technique used in dimensional modelling to manage and preserve historical data in dimension tables.

SCD Type 1: Overwrite Without History

  • Behaviour: Overwrites existing records with new data, without keeping historical values.
  • Use Case: Suitable when historical changes are not relevant, and only the latest information is required.

SCD Type 2: Track Full History

  • Behaviour: Inserts a new record for each change, preserving the previous data. Commonly implemented using flags (active/inactive), timestamps, or versioning.
  • Use Case: Ideal when it’s important to retain historical context—for example, tracking a customer’s address changes over time.

What is apply_changes in Delta Live Tables (DLT)?

The apply_changes() function in Delta Live Tables (DLT) simplifies the processing of Change Data Capture (CDC) records by automatically applying inserts, updates, and deletes to a target table.

It works by using specified primary keys and sequence columns (such as timestamps or version numbers) to determine how changes should be applied. This abstraction eliminates the need for complex merge logic, making it easy to implement SCD Type 1 and Type 2 strategies declaratively.

Syntax (Python):

Screenshot 2025 07 15 124355 - delta live tables

Syntax (SQL):

Screenshot 2025 07 15 124003 - delta live tables

Parameters of the apply_changes API:

Parameter Type Description Required
target str Name of the target table to be updated. Use create_streaming_table() to create this table before applying changes. Yes
source str The data source contains CDC records. Yes
keys list List of columns that uniquely identify each row. Example: [“emp_id”]. Yes
sequence_by str or col() A column used to determine the order of CDC events (e.g., based on timestamps or version numbers). Yes
apply_as_deletes str or expr() Expression to treat a CDC event as a DELETE. Example: expr(“dms_operation = ‘D'”). No
except_column_list list List of columns to exclude from the target table. Example: [“dms_operation”, “file_process_ts”]. No
stored_as_scd_type str or int Determines the SCD strategy. 1 for SCD Type 1 (overwrite), 2 for SCD Type 2 (historical tracking). Defaults to 1 if not specified. No
Implementing SCD Type 1 and SCD Type 2 Using apply_changes() in Delta Live Tables:
Delta Live Tables (DLT) enables a unified approach to implementing both SCD Type 1 and Type 2 via a single declarative API call—no separate pipelines or complex merge logic required
Implementing SCD Type 1 Using apply_changes() in Delta Live Tables:
To implement Slowly Changing Dimensions using Delta Live Tables, we typically follow a medallion architecture pattern, starting with the Bronze Table as the raw ingestion layer.
Bronze Table (Data Ingestion Layer):
The Bronze Table acts as the landing zone for raw source data, ingesting change records as-is, including inserts, updates, and deletes. This layer captures all incoming CDC events.
Screenshot 2025 07 15 124525 - delta live tables

I have loaded the full/historical data into the bronze table

SCD 4 - delta live tables

Then, Ingest CDC events (Insert, Update, Delete) incrementally from the source

SCD 5 - delta live tables
Silver Table with SCD Type 1:

To implement SCD Type 1 using apply_changes(), set the parameter stored_as_scd_type=1 in your function call. This configures the pipeline to overwrite existing records with incoming changes, without preserving any historical versions.

Screenshot 2025 07 15 124721 - delta live tables

After running this SCD Type 1 pipeline, the target table reflects the following changes:

SCD 7 1 - delta live tables

In the above screenshot:

  • New records have been inserted, including an employee with emp_id 116.
  • Existing records updated, like the employee with emp_id 105, whose salary changed.
  • Records deleted, including employees with emp_id 104 and 111.

Result: Changes overwrite existing employee records directly, without retaining any history.

Implementing SCD Type 2 Using apply_changes() in Delta Live Tables:

To implement SCD Type 2 with apply_changes(), the pipeline tracks full history by inserting new records for every change, while marking old records as inactive.

Silver Table with SCD Type 2 Tracking

By setting stored_as_scd_type=2 in the apply_changes() call, the pipeline preserves all historical changes along with effective time ranges.

Screenshot 2025 07 15 124852 1 - delta live tables

After running the SCD Type 2 pipeline, the target table contains full history changes over time

SCD 9 - delta live tables

To retrieve only the latest active records, you can filter based on the __END_AT column, which indicates when a record became inactive.

SCD 10 - delta live tables
Add a Derived is_active Flag for Fast Active-Row Queries:

Delta Live Tables (DLT) automatically manages the __START_AT and __END_AT columns to track the validity period of each record in SCD Type 2.

To simplify querying for the latest active records, you can add a derived boolean column, such as is_active, which indicates whether a record is currently active

Screenshot 2025 07 15 125151 1 - delta live tables

This allows you to easily filter for the latest record with:

SCD 12 - delta live tables

Result: Maintains a complete history of changes, with current active records clearly identified.

Benefits of apply_changes() in Delta Live Tables for CDC

  • Easy to Write: Use simple SQL or Python syntax to handle data changes without complex merge code.
  • Automatically Handles Late Data: Manages out-of-order and late-arriving data using a sequence column to ensure accurate processing.
  • Keeps Change History: Automatically tracks when data changes start and end, providing built-in audit capabilities.
  • Scales with Your Data: Dynamically adjusts to efficiently process large and growing datasets.
  • Built-in Data Quality Checks: Includes features to enforce data quality without additional coding.
  • Supports Flexible History Tracking: Easily implement SCD Type 1 (overwrite) or Type 2 (historical tracking) strategies.
  • Simplifies Maintenance: Reduces manual intervention, making pipelines easier to build and maintain over time.

Conclusion

Handling Change Data Capture (CDC) with Delta Live Tables using apply_changes() offers a modern, scalable, and declarative approach to building real-time, auditable, and historically aware data pipelines.

With minimal code, you can easily:

  • Manage inserts, updates, and deletes seamlessly.
  • Implement both SCD Type 1 and Type 2 strategies by setting just one parameter:
    • stored_as_scd_type = 1 → SCD Type 1: overwrite existing records, no history.
    • stored_as_scd_type = 2 → SCD Type 2: preserve history with managed __START_AT and __END_AT timestamps
  • Maintain a clean and efficient medallion architecture (bronze → silver → gold).

Whether you’re migrating legacy CDC logic or building new pipelines on Databricks, adopting Delta Live Tables with apply_changes() can significantly simplify your architecture and reduce operational overhead, enabling more reliable and maintainable data workflows.