Data Warehousing, BI and Data Science

31 January 2019

Column Store Index

Filed under: Data Warehousing — Vincent Rainardi @ 8:57 am

I write this article specifically to answer this question:
Do column store indices improve query performance in data warehousing?

So I’m not looking at data loading performance, and I’m looking specifically data warehousing, i.e. querying fact tables. From past experience I know that the answer is yes if the fact tables are large and the query is selective. Otherwise the answer is no. But I don’t have the numbers. So with this article I would like to measure it. Thank you to Nadir Doctor for suggesting this topic.

Experiment Design

I will first query a small fact table (50k rows) which is indexed the normal way, i.e. clustered on PK, non-clustered on SKs. For this I will use the internet sales fact table on Adventure DW. I’ll execute two queries. The first one will select all rows, and the second one will select only 0.1% of the rows.

I will then create column store indices on this fact table, and execute the same two queries. I expect that there will be no performance improvement because the fact table is small.

Then I’ll add more rows to the internet sales fact table so that it becomes 10 million rows and execute the same queries. I expect there won’t be performance improvement on the select all, but a significant improvement on the selective query.

To summarise:
Experiment Design.JPG

Note: The reason I put the “select all” in there because that’s how OLAP tools works, e.g. Power BI, SSAS, Qlikview. They select all rows from the fact table. Where reporting tool such as SSRS works on selective query (Power BI can be selective too).

SQL Queries and DDL

The query I use is as follows:

select F.SalesOrderNumber, F.SalesOrderLineNumber, DC1.FirstName + ' ' + DC1.LastName as CustomerName, DP.ProductAlternateKey, DP.ModelName, DC2.CurrencyAlternateKey as Currency, F.UnitPrice, F.OrderQuantity, F.SalesAmount
from FactInternetSales F
left join DimCustomer DC1 on F.CustomerKey = DC1.CustomerKey
left join DimProduct DP on F.ProductKey = DP.ProductKey
left join DimSalesTerritory DST on F.SalesTerritoryKey = DST.SalesTerritoryKey
left join DimCurrency DC2 on F.CurrencyKey = DC2.CurrencyKey
where DST.SalesTerritoryCountry = 'United Kingdom' and F.OrderDateKey/100 = 201401

For select all rows I use the same query but without the where clause.

The DDL for create the indexes are as follows:

create clustered columnstore index PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber on dbo.FactInternetSales
create clustered index PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber on dbo.FactInternetSales(SalesOrderNumber, SalesOrderLineNumber)
create nonclustered index FactInternetSales_CustomerKey on dbo.FactInternetSales(CustomerKey)
create nonclustered index FactInternetSales_ProductKey on dbo.FactInternetSales(ProductKey)
create nonclustered index FactInternetSales_SalesTerritoryKey on dbo.FactInternetSales(SalesTerritoryKey)
create nonclustered index FactInternetSales_CurrencyKey on dbo.FactInternetSales(CurrencyKey)

Results

results
Query performance numbers are in milliseconds

So as expected, the column store indices only improve the query performance if the table is large and the query is selective.

Other Considerations

Apart of the query performance, of course we also have to consider the maintenance, i.e. how much effort we need to spend to keep the column store indices up-to-date. Is it a lot more than row store indices? (that is beyond the remit of this article)

References:

  1. Column Store Index – Design Guidance: link
  2. Column Store Index – Data Warehouse: link
  3. Niko Neugebauer on Column Store: link
Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: