One of the feature in SQL Server 2008 that is useful for data warehousing is backup compression. In data warehousing, backup is usually an integral part of the overnight ETL / data loading process. In the overnight warehouse ETL, there are 4 points where backup or restore is normally located:
- Backup of the source system, and restore it in the warehouse server.
- Backup of stage database, in case we need to reload today’s data incremental extract. Or yesterday’s.
- Backup of the ODS, after it is updated successfully.
- Backup of the data marts, after they are updated successfully.
- Backup of the main warehouse, after loading. In some DW implementations, not only one but several backups, usually differential.
- Restore of DW backup, in SSD (scalable shared databases) – technique to speed up DW queries by having multiple read-only copies of the warehouse.
- Restore of DW backup, in DR server, in the event of a disaster, or during routine DR tests.
Ok, that’s more like 7 points rather than 4.
Now, the thing about DW backup is: it’s a big database. The main DB of a DW varies from 50 GB to 5 TB*. The backup time varies from 10 mins to 3 hours*. If your DW is at the high end of this range, you will already use backup compression would give a big benefit. And we don’t only keep 1 copy of backup on disk. Many DW operations keep 3-5 copies on disk, and many more on tapes. So if backup compression can save 50% of disk space, and 50% of backup time, well, it would be useful. It would be very useful. It could be a reason to upgrade to 2008. It could be a reason to switch to SQL Server.
*well that probably covers about 90% of the population. There are exceptions, as always. Remember we are talking SQL Server data warehousing, and today is 2010. If today was 2005, we will be talking about a completely different sizes of DW. Probably in the range of 5 to 50 GB. A million row fact table in 2005 was something to talk about. Today, there’s nothing special about a billion row fact table. Remember we are talking SQL Server. SQL Server! Hmmm, how fast SQL DW grew in the last 5 years. In 2000 the concept of SQL Server DW wasn’t even there. Back then the word “SQL Server” and “data warehouse” were never in the same sentence. In 2000 people compared SQL Server with Microsoft Access. In 2000 DW was something that normally associated on Teradata, DB2, at least Oracle. But not SQL Server. And now SQL Server PDW hosting 50 TB DW. Ah, something to blog about next time: “SQL Server data warehousing: then, now and future”.
True that there are backup compression products out there, such as LightSpeed. So if your DW is > 500 GB the question is not about whether to compress or not to compress. You have to compress, period. Both for speed / time saving and for the disk/space saving. The question is: which software do you use to compress. Well the answer is sort of obvious really. Try those products and use the one that gives the highest compression ratio in the shortest possible backup time. With one usual note: budget permitting.
But there is one big advantages of using SQL Server backup compression, compared to any backup compression software: anybody can restore backup. If you use Litespeed for example, to restore the backup you also need to have LiteSpeed. So you end up with installing Litespeed on many servers. Copying databases from production to development SQL server for example, you need to have LiteSpeed installed on both servers. Restoring prod backup in DR server for example, you need to have LiteSpeed installed on DR.
But using 2008 backup compression it comes as standard. And it’s only a matter of time before all SQL Servers in the company are upgraded to 2008, right? So in the end everybody will throw away their backup compression software, and replace it with SQL Server. IF (and that’s quite a big IF) SQL Server backup compression can squeeze the database to a good size and do it fast.
In 2008 R2, backup compression will drop to Standard edition. Currently (R1) it is in Enterprise edition. A compressed backup can be restored by all editions of 2008. Obviously 2005 can’t restore them.
So how do you do a compressed backup? Simple, just add “with compression”, like this:
backup database Test to disk = ‘c:\test2.bak’ with compression
Or, use GUI: Right click the database, Tasks, Backup:
Then click Options pane, and select Compressed Backup:
And how to restore it?
Simple, same way really. No difference whether the backup we are restoring is compressed or not:
restore database from disk = ‘c:\Test.bak’
And using GUI:
Now, let’s test how small the database becomes after compression.
For this purpose I prepare a database Test, which is a dimensional database, containing 2 fact tables and 3 dimension tables. The fact tables contain 1 million rows each with 2 measures, and the dimension table 1000 rows each. I’m going to backup this database without compression first. Then with compression. I will repeat each test 3 times, and average the results.
— Backup Compression
backup database Test to disk = ‘c:\test.bak’
28.579 sec (7.543 MB/sec), 221,492 KB
24.225 sec (8.898 MB/sec), 221,492 KB
26.956 sec (7.997 MB/sec), 221,492 KB
backup database Test to disk = ‘c:\test2.bak’ with compression
18.211 sec (11.837 MB/sec), 84,259 KB
20.129 sec (10.709 MB/sec), 84,259 KB
19.628 sec (10.982 MB/sec), 84,259 KB
So the above test shows that:
a) The size of the backup is consistent. It doesn’t change each time we backup.
b) The backup time varies. It can be up to 18% difference (24.225 sec compared to 28.579 sec)
c) Using backup compression the disk space is 62% less (137,233/221,492)
d) Using backup compression the backup time is 27% less (7.26/19.32)
Point c above (saving 62% disk space) is very useful for data warehousing. For some companies it could be the reason to upgrade to 2008. Or even to switch to SQL Server (if your current data warehouse platform does not have a backup compression). Because in data warehousing we are talking Terabytes, not Gigabytes, 62% less disk space saves a lot of money.
Point d is also very useful for data warehousing. As I specify at the beginning of this article, there are many points where backup is used in the overnight data warehouse load process. So if these points are 27% faster it would decrease the load window, which sometimes is a make or break factor in data warehousing. For example, it could decrease your load window from 6 hours to 5 hours.
One last thing before I end this article, you can set the SQL Server instance to compressed backup by default. Meaning that if you don’t specify it will be compressed by default.
The T-SQL command is:
exec sp_configure ‘backup compression default’, ‘1’
And using GUI on SSMS: Right click on the server instance, properties, database settings, compress backup: