When working with large-scale data in Databricks Delta Lake, it’s common to create copies of tables for testing, development, or archival purposes. However, not all copies are created equal. In Delta Lake, a shallow copy (SHALLOW CLONE) and a deep copy (DEEP CLONE) serve different purposes and behave very differently in terms of performance, storage, and data isolation.
In this post, we’ll break down:
- What shallow and deep copies mean in Delta Lake
- How to create them
- When to use each
- Common pitfalls
What is a Shallow Copy?
A shallow copy of a Delta table creates new Delta table metadata, but the data files are referenced, not duplicated. Think of it as a symbolic link to the original data: it’s fast and space-efficient, but it only works as long as the source table’s files still exist.
- Only metadata is copied — data files are shared with the source table
- Very fast to create (almost instant)
- Minimal storage overhead
- Both the source and the shallow copy point to the same physical data files
- Ideal for:
  - Read-only data exploration
  - Schema evolution testing
  - Quick development or experimentation
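- Not fully independent: writes to the shallow copy go to its own new files and leave the source table unchanged, but running VACUUM on the source can delete files the copy still references and break it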
Creating a Shallow Copy
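A minimal sketch of the statement, assuming the Databricks SQL form CREATE TABLE ... SHALLOW CLONE. The table names prod.sales and dev.sales_shallow and the helper shallow_clone_sql are hypothetical and only here for illustration.

```python
def shallow_clone_sql(source: str, target: str) -> str:
    """Build a Databricks SQL statement that shallow-clones `source` into `target`."""
    return f"CREATE TABLE IF NOT EXISTS {target} SHALLOW CLONE {source}"


# Hypothetical table names, for illustration only.
stmt = shallow_clone_sql("prod.sales", "dev.sales_shallow")
print(stmt)

# In a Databricks notebook, where `spark` is already defined, you would run:
# spark.sql(stmt)
```

Only the statement text is built here, so the snippet runs anywhere. The clone itself happens when the statement is executed on a cluster with Delta Lake available.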



Or, with a specific version:
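A sketch of the versioned form, assuming the VERSION AS OF and TIMESTAMP AS OF clauses that Databricks accepts after the source table in a clone statement. The helper name, table names, version number, and timestamp are all hypothetical.

```python
from typing import Optional


def shallow_clone_at_sql(
    source: str,
    target: str,
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
) -> str:
    """Build a shallow-clone statement pinned to a source version or timestamp."""
    if version is not None and timestamp is not None:
        raise ValueError("Pass either version or timestamp, not both")
    stmt = f"CREATE TABLE IF NOT EXISTS {target} SHALLOW CLONE {source}"
    if version is not None:
        # Clone the table as it looked at a specific commit version.
        stmt += f" VERSION AS OF {int(version)}"
    elif timestamp is not None:
        # Clone the table as it looked at a point in time.
        stmt += f" TIMESTAMP AS OF '{timestamp}'"
    return stmt


# Hypothetical names and values, for illustration only.
by_version = shallow_clone_at_sql("prod.sales", "dev.sales_v5", version=5)
by_time = shallow_clone_at_sql("prod.sales", "dev.sales_jan", timestamp="2024-01-01")
print(by_version)
print(by_time)

# In a Databricks notebook: spark.sql(by_version)
```

The requested version has to still be available in the source table's history, so very old versions may no longer be cloneable.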




What is a Deep Copy?
A deep copy creates a complete and independent copy of both metadata and data files. It’s a true, physical duplication.
- Metadata and data files are fully copied
- Results in a completely independent copy
- Slower to create (due to physical data duplication)
- High storage usage
- Provides full data isolation
- Ideal for:
  - Backup and recovery
  - Sandboxed or destructive testing
  - Migrating data across environments
- Safe for modifying data — does not affect the source table
How to Create a Deep Copy:
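A minimal sketch, assuming the Databricks SQL form CREATE TABLE ... DEEP CLONE. The table names prod.sales and backup.sales_deep and the helper deep_clone_sql are hypothetical.

```python
def deep_clone_sql(source: str, target: str, replace: bool = False) -> str:
    """Build a Databricks SQL statement that deep-clones `source` into `target`."""
    # CREATE OR REPLACE overwrites an existing target; the default form leaves
    # an existing target untouched.
    create = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE IF NOT EXISTS"
    return f"{create} {target} DEEP CLONE {source}"


# Hypothetical table names, for illustration only.
stmt = deep_clone_sql("prod.sales", "backup.sales_deep", replace=True)
print(stmt)

# In a Databricks notebook, where `spark` is already defined, you would run:
# spark.sql(stmt)
```

Because the data files are physically copied, expect this statement to take time and storage roughly in proportion to the size of the source table.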



Common Pitfalls
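Modifying a Shallow Copy: Deletes and overwrites in a shallow copy write new files and leave the original table's data intact, but the copy still depends on the source's existing data files. If those files are removed, for example by running VACUUM on the source, reads from the copy can fail. Create a deep copy first when you need true independence.
Misusing in Production: Shallow copies are great for quick duplication, but using them in long-lived workflows that depend on isolation from the source can cause unwanted side effects, such as failed reads after the source table is vacuumed.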
Real-World Use Cases
Use Shallow Copy When:
- You need a fast copy for a quick query test
- You want to explore the schema at a specific version
- You’re developing read-only dashboards
Use Deep Copy When:
- You’re archiving data
- You want to test destructive transformations safely
- You’re moving data across environments (e.g., dev → prod)