Introduction:
Power BI empowers organizations to turn data into actionable insights. However, as datasets grow in complexity and volume, performance bottlenecks, slow report loads, and publishing restrictions can arise.
One common misconception is that reducing dataset size means compromising on analysis or losing valuable data. But with smart modeling and design strategies, you can shrink your dataset size while still delivering rich, insightful reports.

Business Challenge:
Consider an organization building a Power BI report sourced from millions of transactional records. The model starts to lag:
- The dataset exceeds size limits (especially with Power BI Pro)
- Report visuals take 20+ seconds to load
- User filters and slicers become unresponsive
- Publishing to the Power BI Service results in timeouts
The team cannot simply remove historical or granular data, because business leaders rely on it for trend analysis, forecasting, and segmentation.
Business Use Case:
I recently worked on a Retail Sales Report where the dataset was 1.5 GB in memory, with over 100M rows in the fact table and 20 dimension tables. The initial report load time was around 40 seconds, which was not ideal for a fast-paced business environment.
The business needed a flexible yet optimized reporting experience with:
- Daily transaction-level data for the last 3 months
- Monthly summarized data for the last 5 years
Proposed Solution:
Here is how I cut the dataset size without impacting reporting capabilities:
Aggregation Tables (Summary Tables)
- Created a summarized table for sales by month.
- Used DAX and Power Query to summarize only older data.
- Retained granular detail only for the last 3 months.
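As a rough Power Query M sketch (the query name Sales and the column OrderDate are placeholders for this example), trimming the transaction-level table to the last three months can look like this:

```m
let
    // Reference the full transactional query (placeholder name)
    Source = Sales,

    // Anything older than three months is served by the monthly
    // aggregation table, so keep only recent rows at full detail
    CutOff = Date.AddMonths(Date.From(DateTime.LocalNow()), -3),
    RecentOnly = Table.SelectRows(Source, each Date.From([OrderDate]) >= CutOff)
in
    RecentOnly
```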
How to create Aggregation Tables in PBI:
Step 1: Identify Granularity of Aggregation:
Decide the level of summarization, such as:
- By Product and Month
- By Region and Category
Step 2: Create the Aggregation Table (in Power Query or DAX):
- Go to Transform Data.
- Reference your fact table.
- Use Group By to summarize:
  - Group by: Product, Month
  - Aggregations: Sum(Sales), Sum(Quantity), etc.
- Rename the query to something like Sales_Agg.
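In the Advanced Editor, the M for such a summary query looks roughly like the sketch below (Sales, OrderDate, Product, and Quantity are placeholder names from this example):

```m
let
    // Reference the existing fact query instead of re-importing the source
    Source = Sales,

    // Derive a month column so the grouping happens at month grain
    WithMonth = Table.AddColumn(Source, "Month", each Date.StartOfMonth(Date.From([OrderDate])), type date),

    // Group by Product and Month and keep only the summed measures
    Sales_Agg = Table.Group(
        WithMonth,
        {"Product", "Month"},
        {
            {"Sales", each List.Sum([Sales]), type number},
            {"Quantity", each List.Sum([Quantity]), type number}
        }
    )
in
    Sales_Agg
```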
Step 3: Set Up Relationships:
- Connect your aggregation table to the same dimension tables (Date, Product, etc.) used in your original fact table.
- Do not connect it to the main fact table.
Step 4: Use in Visuals:
- Build visuals from the shared dimensions (Product, Date, etc.); long-range trends read from Sales_Agg, while the last 3 months of detail come from the original fact table.

Incremental Refresh
- Implemented incremental refresh on the fact table.
- Only new or changed data was refreshed daily.
- Applied RangeStart and RangeEnd parameters in Power Query.
Prerequisites for Incremental Refresh
- Your table must contain a date/time column.
- You’re using a supported license: Power BI Pro, Premium, or Premium Per User (PPU).
- The dataset must be published to the Power BI Service for incremental refresh to take effect.
- You must enable loading to the data model (can’t disable loading for the table).
How to apply Incremental Refresh in PBI:
Step 1: Import Data in Power BI Desktop:
- Go to Home > Transform Data.
- Load the table that contains a Date or DateTime column (e.g., OrderDate).
Step 2: Filter with Parameters in Power Query
- In Power Query Editor, create two parameters:
- RangeStart (DateTime) → e.g., 2023-01-01
- RangeEnd (DateTime) → e.g., 2025-01-01
- Use these parameters to filter your table. This enables Power BI to partition data during refresh.
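RangeStart and RangeEnd must be Date/Time parameters with exactly those names. A minimal sketch of the filter step in M, assuming a SQL source and an OrderDate datetime column (server, database, and table names are placeholders):

```m
let
    // Placeholder connection details
    Source = Sql.Database("sql-server-name", "SalesDB"),
    SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // Filter on the reserved RangeStart/RangeEnd parameters so the service
    // can generate partitions. Use >= on one boundary and < on the other
    // so rows on a boundary are never duplicated or dropped.
    Filtered = Table.SelectRows(
        SalesTable,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```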
Step 3: Set Up Incremental Refresh in Desktop
- Go to Model View.
- Right-click on your table → Incremental refresh and real-time data.
- Set the refresh policy:
- Store rows from the past → e.g., 5 years
- Refresh rows in the last → e.g., 10 days (the incremental refresh window)
- Click Apply.
Step 4: Publish to Power BI Service:
- Save your PBIX file.
- Publish it to a workspace in the Power BI Service (Premium or PPU capacity is only required for advanced scenarios such as XMLA partition management).
Step 5: Configure Scheduled Refresh:
- In Power BI Service, go to:
- Workspace > Dataset Settings > Scheduled refresh
- Enable and configure:
- Refresh frequency (daily, hourly, etc.)
- Credentials for the data source
Power BI will now only refresh new or updated partitions based on the OrderDate filter.

Column Reduction & Data Type Optimization
- Removed unnecessary columns (e.g., raw IDs, unused flags)
- Optimized data types, e.g., reducing the cardinality of text columns and converting floating-point Decimal Number columns to Fixed Decimal Number or Whole Number where precision allowed
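A small M sketch of both ideas (the column names RawTransactionId, LegacyFlag, Sales, and Quantity are illustrative):

```m
let
    // Placeholder reference to the fact query
    Source = Sales,

    // Drop columns that never appear in a visual, measure, or relationship
    Trimmed = Table.RemoveColumns(Source, {"RawTransactionId", "LegacyFlag"}),

    // Fixed decimal (Currency.Type) and whole numbers compress better
    // than floating-point Decimal Number when the precision allows it
    Retyped = Table.TransformColumnTypes(Trimmed, {{"Sales", Currency.Type}, {"Quantity", Int64.Type}})
in
    Retyped
```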
Star Schema Restructure
- Flattened snowflake schema into a cleaner star schema.
- Combined lightly used dimension tables.
Filter-Driven Load (User Filtering)
Used parameterized queries in Power Query to load only the relevant data, as in the sketch below.
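For example, a text parameter (here a hypothetical SelectedRegion created via Manage Parameters) can restrict what gets imported into the model:

```m
let
    // Placeholder reference to the fact query
    Source = Sales,

    // Only rows matching the SelectedRegion parameter are loaded,
    // so the published model contains just the data this audience needs
    Filtered = Table.SelectRows(Source, each [Region] = SelectedRegion)
in
    Filtered
```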

Conclusion:
Reducing the dataset size in Power BI does not mean reducing insight. With a thoughtful approach like aggregating old data, leveraging incremental refresh, and optimizing data models, you can maintain analytical depth and ensure blazing-fast performance.
References:
Optimization guide for Power BI (Microsoft Learn): https://learn.microsoft.com/en-us/power-bi/guidance/power-bi-optimization