Introduction
- As organizations ingest massive volumes of data into Delta Lake, query performance becomes critical, especially for dashboards, ad-hoc analysis, and downstream ETL jobs.
- One powerful technique to reduce query latency and improve data skipping is Z-Order Optimization.
In this article, let’s cover:
- What Z-Ordering is
- How to apply it to single vs. multiple columns
- A real-world implementation using PySpark in Databricks
What is Z-Order Optimization?
Z-Ordering is a powerful technique in Delta Lake that co-locates related data in storage to reduce file scans and boost query performance. It uses the Z-order curve to cluster similar values together, improving data skipping efficiency during query execution.
This is especially effective when filtering on high-selectivity columns like id, timestamp, or region.
Use Case: Optimizing Delta Tables with Z-Order
We’re applying Z-Ordering on two Delta tables:
- account_statuses – Z-Ordered by a single column: id
- buyer logs – Z-Ordered by multiple columns: buyer_mill_id, buyer_group_id, buyer_d, id
Implementation in PySpark (Databricks)
Z-Order on a Single Column

Use Case: Tables with high cardinality columns like id, email, or user_id.
2. Z-Order on Multiple Columns

Use Case: Analytical queries filtering on multiple business identifiers or hierarchical keys.
Why Use Multiple Columns?
Z-Ordering on multiple columns is useful when:
- Queries often filter on combinations of columns.
- The data has a natural hierarchy (e.g., mill → group → buyer → id).
- You want to reduce I/O for multi-filter joins or aggregations.
However, more is not always better. The Z-Order algorithm uses all listed columns to compute ordering, so adding unnecessary columns can degrade performance.
Best Practices
- Choose high-cardinality columns.
- Run OPTIMIZE after bulk inserts or daily loads.
- Use 1–3 columns for best balance.
- Avoid low-cardinality columns like gender or country_code unless partitioned.
- Automate Z-Order as part of ETL or DLT pipelines.
Checking Z-Order History
You can verify that Z-ORDER was applied using:
DESCRIBE HISTORY dev.app_os_db_silver.buyler_logs;

Look for operations where ZORDER BY appears in the userMetadata column
Automate with Multiple Tables
You can loop over many tables with varied Z-Order columns:

Conclusion
Z-Order optimization is a powerful technique for tuning read performance in Delta Lake. Whether optimizing on a single column or multiple, applying Z-ORDER strategically can lead to significant savings in time, cost, and compute resources.
By understanding your query patterns and organizing your data accordingly, Z-Ordering can become a vital part of your performance engineering toolkit, helping you build faster and more efficient data pipelines.