Audit, Analyze, Optimize: The Power of Databricks System Tables

Audit, Analyze, Optimize: The Power of Databricks System Tables

Audit, Analyze, Optimize: The Power of Databricks System Tables

What Are Databricks System Tables?

Databricks system tables are specialized metadata tables that provide insights into various aspects of your Databricks environment, including query execution, pipeline performance, and resource utilization. These tables are essential for monitoring, auditing, and optimizing your data workflows.

main diagram - databricks system tables

Understanding Query History

The system.query.history table logs all SQL queries executed across your Databricks workspaces. This includes queries run via SQL warehouses, notebooks, and jobs. By querying this table, administrators can: Query history system table reference | Databricks Documentation

  • Identify long-running or failed queries.
  • Monitor user activity and resource consumption.
  • Audit data access and modifications.

Example Query:

understanding query history - databricks system tables

This query retrieves all queries executed by a specific user since July 1, 2025, providing insights into their activities and performance. Query history system table reference | Databricks Documentation

Monitoring Pipeline Costs

Databricks provides the system.billing.Usage table to track the cost associated with various workloads, including Delta Live Tables (DLT) pipelines. This table records DBU (Databricks Unit) consumption, allowing organizations to: Databricks Documentation

Example Query:

monitor pipeline cost - databricks system tables

This query calculates the daily DBU consumption for a specific DLT pipeline during July 2025, aiding in cost analysis and optimization. Databricks Documentation

Analyzing Compute Usage

The system.billing.usage table also captures compute resource utilization, including clusters, SQL warehouses, and serverless compute. By analysing this data, organizations can: Monitor costs using system tables | Databricks Documentation

  • Determine the most resource-intensive workloads.
  • Optimize cluster configurations.
  • Ensure efficient use of resources.
compute usage - databricks system tables

This query provides a daily breakdown of DBU consumption by node type, helping identify which instance types are most cost-effective. Databricks Documentation

Best Practices for Managing System Tables

To effectively leverage Databricks system tables:

  • Access Control: Restrict access to system tables to authorized users only.
  • Data Retention: Implement policies to archive or purge old data to maintain performance.
  • Regular Audits: Schedule periodic reviews of system table data to identify anomalies or inefficiencies.
  • Automation: Automate the extraction and analysis of system table data for continuous monitoring.

System Tables:

Table

Description

system.access.audit

Regional audit logs: monitors user activity and service events.

system.access.table_lineage​ & column_lineage

Tracks data read/write lineage at table and column levels.

system.billing.usage

Global DBU usage by hour, workspace, SKU, and compute resource.

system.billing.list_prices

SKU list prices (Type-2 SCD): supports historical pricing queries.

system.compute.clusters, node_types, node_timeline

Historical and real-time cluster and node metadata, used for cost/performance analysis.

system.compute.warehouses, warehouse_events

Logs SQL warehouse details and scaling events.

system.lakeflow.* tables

Pipeline/job metadata and execution timelines.

Others:

Predictive optimization, assistant usage, network access events, and more

Conclusion

Databricks system tables are powerful tools for gaining visibility into your data workflows, monitoring performance, and optimizing costs. By effectively utilizing these tables, organizations can ensure efficient and cost-effective operations within the Databricks environment.

Power BI Integration Example

To visualize and analyse Databricks system table data in Power BI:

  1. Connect Power BI to Databricks:
    • Use the Databricks ODBC or JDBC connector to establish a connection between Power BI and your Databricks workspace.
  2. Import System Table Data:
    • Query the desired system tables (e.g., system.query.history, system.billing.usage) within Power BI to import the data.
  3. Create Visualizations:
    • Utilize Power BI’s visualization tools to create dashboards and reports that provide insights into query performance, cost analysis, and resource utilization.

This integration allows for real-time monitoring and analysis of your Databricks environment within Power BI.

-Korrakuti Nikhil Kumar
Senior Data Engineer