I wrote about performance tuning on SSAS about 5 months ago. That was on both processing performance and query performance. This time I would like to write only about processing performance but in a bit more detail.
So the usual case is something like “my cube processing takes 5 hours, can we reduce it to 2 hours?” Now what do we need to do? The wrong thing to do is blaming the hardware. The right thing to do is to systematically find out which part of the cube takes a long time to process.
Step 1. Identify the Bottleneck
When it takes a long time to process your cube database, the first thing to do is to get information about how long each dimension and measure group take. The purpose of this “time measurement” exercise is to identify the top issues, i.e. which part of the cube takes the longest time to process.
This can be done in 2 ways: a) using either BIDS or SSMS to process the cube, or b) use SSIS processing cube task. The advantage of using SSIS is that the process is logged so you can get the timing information in detail. But the disadvantage is that it’s slightly less readable than the process dialog box on BIDS and SSMS. If you have complete control of the servers (meaning that you can run / process anything you like, whenever you like, as in Dev SSAS server), I would recommend to process it in SSMS first to quickly identify the top issues, and then if you need the detail you can use SSIS.
The “processing output” window gives us information about the time it took to process each dimension and each measure group. If your cube contains 10 dimension, and dimension 1 to 9 took 1 minute each but dimension 10 took 30 minutes, then take a note that dimension 10 is an issue and you will need to look at it.
If your cube contains 10 measure groups (MG), and MG1 to MG9 took 5 minutes each but MG10 took 3 hours, make a note that MG10 is an issue and you will look at MG10 later. Apologies with pointing out the really obvious above, but I want us to be clear on how to identify which ones are the issues.
What if MG10 takes 3 hours and MG9 takes 1 hour, do you need to worry about MG9? Well at the moment, no. But when you can get MG10 to process in 30 minutes, then MG9 becomes an issue. This is because the MGs are processed in parallel (see next paragraph*). So after reducing MG30 to 30 minutes, MG9 is now the bottleneck.
*The number of parallel threads depends on the “Maximum numbers of connections” settings on the Data Source, the default is 10). Each thread processes not an MG, but partitions within the MG. So if MG1 has 30 partitions, AS might not process MG2 to MG10 during the first minute, because all 10 threads are being used to retrieve data for MG1 partitions.
Step 2. Dimensions
Now that you know that the bottleneck is (for example) in Dimension 10 and Measure Group 10, we could work on them. Let’s do dimension first.
When processing a dimension, AS does “select distinct” on each attribute. Say your dimension has 3 attributes: attr1, attr2, attr3. Your dimension table (dim10) has 4 columns: dimkey, attr1, attr2, attr3. To process attr1, AS basically does “select distinct attr1 from dim10”. Then it does “select distinct attr2 from dim10”, and so on.
So far so good. But if your dimension is fed not from a physical table, but from a view, and this view joins 20 physical tables, then it could take a long time to process the dimension. Why? I’ll explain why.
Say your dim10 is created from view10 and it has 40 attributes. View10 takes 3 minutes to run, joining 20 tables, returning 100,000 rows. To process attribute1, SSAS does “select distinct attribute1 from view10”, which takes 3 minutes. Then it does “select distinct attr2 from view10”, which takes another 3 minutes. And so on until attr40. So it will take 3×40 = 120 minutes = 2 hours for SSAS to process dim10.
So what do we do when we face this situation? We need to materialise that view. Open the view definition, which is something like “create view view10 as select … from … inner join … left join … where …” Create a table to store the data on this view. This is can be done simply by using “select into”, i.e. “select … into table10 from … inner join … left join … where …” This will create table1 that contains view10 rows.
Now before you can process the dim, you need to repoint dim10 on the DSV from view10 to table10. Right click the dim10 table on DSV, choose “Replace Table” and choose “With Other Table”. Select table10 and click OK, now process the cube. It will finish in 3 minutes. Why? Because the data is ready on table10. SQL Server doesn’t need to run view10 every time to get the data from the underlying 20 tables.
What if your DSV doesn’t use a view, but using a named query instead?
Same thing. Materialise the named query. Copy the SQL on the named query and paste it onto a blank query window on SSMS. Add “into table X” to the select statement to create a physical table and then execute it. Then repoint the named query on the DSV to table X.
Big Dimension Table
If the materialised dim table is under 100,000 rows, select distinct would finish in a few seconds. But if it’s 100 million rows (e.g. dim_customer in CRM data warehouse) then it could take 20-30 minutes to do select distinct on each column. In this case, we need to index that column. It is best to index the column separately/individually, i.e. if your dimension has 5 attributes, create 5 different indexes. Each index contains 1 key column. This is because SSAS issues a “select distinct” on each column individually. Don’t add extra columns like when you are creating a “covering index”. Index only the columns that are used by the cube. Don’t use columns that are not used by the cube.
Step 3. Measure Groups
Having tuned the dimensions, we now tackle the measure groups. The keyword here is partitioning. We need to partition both the table and the cube.
To partition the table, first create additional file groups (FG). Most databases use only 1 FG. We need to create 8 or 16 new FGs, depending on the disks on the server. If we only have 1 logical volumes, then create between 6 and 9 additional FGs. This number comes from experience. This number works for both RAID 5 and RAID 10 disks. If you want a precise, very optimum setting, then you will have to try different numbers of FGs e.g. 4, 8, 12 for example, then create the partitioned table, and try querying the table to see the performance.
If you have multiple logical volumes, you are lucky. You will get better query performance. Say you have 4 logical volumes. Create 16 FGs and place them into round robin fashion into those volumes, i.e. FG1 on vol1, FG2 on vol2, FG3 on vol3, FG4 on vol4, FG5 on vol1, FG6 on vol2, and so on.
Details of Partitioning
You can use the GUI but if you want to script it, the command to create the FGs is:
USE master GO ALTER DATABASE DataWarehouse ADD FILEGROUP FG2 ALTER DATABASE DataWarehouse ADD FILEGROUP FG3 ALTER DATABASE DataWarehouse ADD FILEGROUP FG4 etc
It’s a matter of taste but I always start from FG2 because we already have 1 FG, i.e. the Primary file group.
Now you need to create the additional database files and add them to those FGs:
ALTER DATABASE DataWarehouse ADD FILE ( NAME = N'Data2', FILENAME = N'H:\Path\DataWarehouse2.ndf', SIZE = 20GB , FILEGROWTH = 5GB ) TO FILEGROUP [FG2] ALTER DATABASE DataWarehouse ADD FILE ( NAME = N'Data3', FILENAME = N'H:\Path\DataWarehouse3.ndf', SIZE = 20GB , FILEGROWTH = 5GB ) TO FILEGROUP [FG3] etc
Note: calculate the size and growth based on the size of the original table (right click table name, reports, standard reports, disk usage by top table). If the original table is 200 GB, and you have 8 FGs, then 200/8=25 so set the initial size = 30 GB with 5 GB growth increment. Don’t use %, use absolute as the growth is more controlled. Large increment is generally better, provided you have the disk space. Both the size and the increment must be uniform across all FGs. This is because we intend to distribute the data equally across those FGs.
Now create the partition function and partition scheme:
USE DataWarehouse GO CREATE PARTITION FUNCTION PF_MONTH (int) AS RANGE LEFT FOR VALUES ( 1, 2, 3, … 98, 99, 100) CREATE PARTITION SCHEME PS_MONTH AS PARTITION PF_MONTH TO ( FG2, FG3, FG4, FG5, FG6, FG7, FG8, FG9, FG2, FG3, FG4, … FG2, FG3, FG4, FG5, FG6 )
The number of FGs on the partition scheme = the number of values on the partition function plus 1, i.e. in the above example, the number of FGs should be 101.
Try to keep the number of rows in each partition under 10 million. 2-5 million would be ideal but 20 million is not good. It’s too big and too slow. This is the physical table partitioning we are talking about, not cube/measure group partitioning. As far as possible, the distribution of the data should be uniform. Meaning that the number of rows is roughly the same in all partitions. It’s not good to put 10 million rows on partition 1, but only 1 million on partition 2. So pick your partition key carefully. A column with many distinct values is better for partitioning key. Tailor the partitioning function ranges to group them equally. Sometimes we can’t distribute the rows equally because it depends on how the data is spread across the partitioning key, but try your best because the loading performance and the query performance of that partitioned table depend on this distribution profile.
Next create the table and place it on the partition scheme:
CREATE TABLE Partitioned_Fact_Table ( Fact_Key bigint IDENTITY(1,1) NOT NULL, Month1_Key int, Dim1_Key int, Dim2_Key int, etc Measure1 money, Measure2 decimal(18,6), etc CONSTRAINT [PK_Partitioned_Fact_Table] PRIMARY KEY CLUSTERED ( [Month_Key], [Fact_Key] ASC ) ) ON PS_MONTH(Month_Key)
Now load the data in:
INSERT INTO Partitioned_Fact_Table WITH (TABLOCK) ( [columns] ) SELECT [columns] FROM Original_Fact_Table WITH (NOLOCK)
a) The purpose of TABLOCK is so that the insert is done minimally logged. Meaning that the load will be faster.
b) The purpose of NOLOCK is so that the query is pessimistic (dirty read is OK), allowing other queries to run on the source table.
c) If you have 500 million rows this query could run for 4 hours, depending on your RAIDs. So try with 1 million rows first, i.e. SELECT TOP 1000000 FROM …, and see how much time a million takes. If it takes 30 seconds to do 1 million then it will be 120 million per hour. If it’s more than 100 million rows, you need to add approx 30 minutes to this calculation as there is an overhead at the beginning of the process. So 360 million is 3.5 hours.
d) During the loading process do not issue select count(*) to get the number of rows as this interfere with the “select into” that is running. Do not right click on target table name on the explorer and choose properties, storage to get the row count either. That also interferes. If you want to know the progress (number of rows), query the partition system view:
select sum(P.rows) from sys.partitions P with (nolock) join sys.tables T with (nolock) on P.object_id = T.object_id where T.Name = 'Partitioned_Fact_Table'
e) During the first 30-45 minutes that query on d) above return 0. Meaning that SQL Server doesn’t immediately load the table, but there’s a delay. But after it loads the first row it will then grow steadily and quickly. Quicker than SSIS workflow. Using SSIS workflow the rows immediately landed on the target, but the pace is lower. I’ll blog this (with numbers) when I got a chance later.
f) Remember that the partition key must be part of the primary key. Even if your PK candidate is unique across partition, you will still need to include the partition key in the primary key.
Step 4. Indexes
If your MG query involves a “group by” or a join, the best thing to do is to go back to your data warehouse designer and ask for that summary fact table to be created. Overall it is more efficient to do it that way than you (cube developer/designer) having to issue a “group by” and join query, which is very costly. ETL tools like Informatica and DataStage (WebSphere) has specific transformation for calculating aggregates and doing joins and they operate very efficiently. Doing it at SQL engine directly for a large volume of data is looking for trouble: a) your tempdb will grow enormously. We are talking 200, 400 GB here, b) SQL engine will need to sort the data before the “group by” can take place and this takes a lot of time. We are talking 1-2 hours or even 4 hours, before the first row is returned.
Well enough said. If after all that warning you still insist to do the aggregation and join on SSAS, read on. You will need to index your partitioned fact table. Here are some of the principles:
a) Index columns used in the joins. Index them individually. If for example you have 2 joins, the first one hits the fact table at column 3 and the second one at column 5, then create 2 indexes. One for column 3 and one for column 5. SQL needs to sort the column before entering the join. These indexes will make that sorting quicker.
b) Index the columns used in the “group by”. This needs to be a “covering index”, meaning that one index with many keys. The order of the keys needs to match the order of the group by. The purpose of doing this is the speed up the sorting. More than 50% of the cost is on the sorting. Sometime 80%. So it is a “must” that you have a covering index for your “group by” query. Otherwise the query would be painfully slow as SQL is doing table scan. If your “group by” on the partition query is not “big buckets” to “small buckets” then reorder them.
c) If the “sum” column on your “group by” query is only 1 column, add this column to the covering index. This prevents the SQL engine from going to the base table to get the measure, as the measure column is conveniently included in the index file.
d) Selectivity. In indexing, selectivity is the name of the game. If it’s not selective it’s not used. Selective means: out of 1 million rows, SQL select only 10 rows. If out of 1m rows SQL select 500k that’s not selective, index won’t be used. So how do you ensure selectivity? Number of distinct values. If it’s only 2 or 5, don’t bother indexing it (there are exceptions, for example if it’s involved in multiple key). If it’s 15k distinct values it’s will be used (again there are exceptions, for examples if it’s out of date, or multiple key).
e) Sort. The main purpose of an index is to sort a column. Before SQL can arrive at a certain value, it needs to sort it first. So if you see a sort icon on the query plan, think if indexing that column would help or not.
f) Maintenance. This is the must important thing of indexing. So please remember this well. It’s all very well indexing here and there, but remember that you have to maintain that index. Every insert on the fact table SQL needs to update the index (in DW we don’t usually delete or update the fact table, only on dimension table we do those). Inserting a row to the table page is quick – it goes to the last page. But then SQL needs to insert that row into the index as well, and that’s slow. So if you have 24 indexes on your fact table, totalling of 150 GB, bear in mind that your fact table load will suffer. If without any index your daily load on that fact table is 1 hour, with those 24 indexes your load could become 3 hours. So weigh the benefit of each index against the cost of maintaining it. How do you know how much is the benefit and how much is the cost? Simple. Measure it! For the benefit: execute your query before and after indexing. For the cost of maintenance: do your insert before and after indexing.
Verify with estimated execution plan. I know it’s only an estimated, not actual, but SQL engine is very good at predicting what’s going to happen. 99% of the time it’s usually true. So you don’t need to do the actual execution plan. But if you have to verify using the actual plan, don’t use the original partitioned fact table. You could be spending hours if you do that. Instead, do a “select into” to create a copy of that big table but load only 1 partition into it. Or, you can use the original table, but limit the query to only use 1 partition by adding a where clause. The purpose of this exercise is so that we get an actual execution plan that mimic the true data behaviour, but at a fraction of the time.
What to see in the estimated execution plan?
a) The “group by” is doing “index seek”, not “table scan”
b) The join sorting is using the index.
c) General observations such as: total cost (left most icon), parallelism (crucial when dealing with partitioned table), sort icons (would an index help)
It might be difficult to interpret the total cost. For guidance: 5000 means 3 minutes. Well, you know as well as I do that it’s never right or precise, because it depends on the hardware. But it’s useful as guidance. So if the cost is 50,000 it’s half an hour. And if it’s 500,000 you start to worry and look for ways to optimise your query.
Step 5. Queries hitting the same table
See if any of the DSV queries (particularly the MGs/Measure Groups) are hitting the same table or not. Example: MG1, MG2, MG3 and MG4 are all querying table1. Or worse: MG1 and 2 are querying table1, and MG3 and 4 are aggregating table1 (aggregating means “group by”).
SSAS processing is parallel. Because of that if you have 10 partitions on MG1 and 10 partitions on MG2 and they are hitting the same table, some waiting will occur. Thread 1 can’t query table1 because it is being queried by tread 4, 5, 6 and 7. I’m not talking about SQL, I’m talking about I/O, i.e. physical disk. SQL would happily issue 5 threads hitting the same table at the same time, but the RAID stripes are access sequentially. While that stripe is being read by thread1, other thread needs to wait (not if that stripe has been cached into the controller’s memory or if SQL has put the data in the server memory – hence having 128 GB RAM in your DW SQL Server helps a lot in speeding up the queries). This prolongs the processing time. From the cube processing side, the ideal solution is to have a second table. But that’s not efficient from modelling and ETL point of view, so that doesn’t usually occur in real life. So usually you have to survive with only 1 table.
What can we do to resolve this? Break that MG into as many partitions as possible. And partition the underlying table into as many partitions as possible. So those MGs partitions will be hitting DIFFERENT table partitions at the same time.
As usual I welcome any feedback and comments on email@example.com
Vincent Rainardi 18/11/2010