Data Warehousing and Business Intelligence

8 May 2010

Why a 400 GB Data Warehouse Requires 20 TB of Disks

Filed under: Data Warehousing — Vincent Rainardi @ 7:43 am
Tags:

A lot of people wonders, why to host a 400 Gigabytes data warehouse, the production SQL Server requires 20 Terabytes of disks. For example, say your BI Project Manager requires you to size your data warehouse and you came up with something like this:

  • 5 base fact tables, 1-30 million rows each: 10 GB
  • 10 snapshot tables, 100-300m rows each: 74 GB
  • 10 aggregate tables, 10-200k rows each: 2 GB
  • 20 dimension tables, 12 small (under 1000 rows), 5 medium (1-100k rows) and 3 large (>100k rows): 2 GB

Total data: 78 GB. (rounded: 80 GB)

  • Materialised/indexed views, 25% of data = 25% x 80GB = 20 GB
  • Indexes = 1.5x data: 120 GB
  • Unused + Unallocated space: 2x data: 160 GB (see here for background re what they are)

Estimated database size: 80+20+120+160 = 380 GB (rounded: 400 GB)

Growth for 2 years: 3x (1.2 TB). So, DDS database size = 1.6 TB.
Say NDS/ODS, Stage, marts: 1 TB

Total: 2.6 TB

PS.
DDS = Dimensional Data Store, i.e. the star/snowflake schema
NDS = Normalized Data Store, aka Enterprise DW, see my book, chapter 2.
ODS = Operational Data Store

LUN / disk carving

  • Disk = 146 GB unformatted (137 GB formatted), SAS 15k RPM, about £175 each.
  • 20 data volumes, RAID 10, 20×4 disks = 80 disks (total capacity 20x137GB=2.7TB)
  • 2 log volumes, RAID 10, 2×4 disks = 8 disks
  • Tempdb files: 20 cores, 1 file for each core, 1 on each volume, shared on data vol
  • 2 backup volumes, RAID 5, 2×25 disks = 50 disks (24x137GB=3.2 TB usable space)
  • Hot standby: 4
  • Total disks: 80+8+50+4=142 disks

Total: 142 disks. Raw disks capacity: 142x146GB = about 20 TB.
That’s just for prod, I haven’t mentioned DR, dev, UAT and Prod Support!

An example is RamSan 5000 (SSD), which cost £1m (ref)! A more conventional SAN example is CX4. Cheapest route is DAS (direct attach), probably £70-80k e.g. about £2.5k per enclosure (25 bays, controller incl.) – need 6 of them, plus the disks.

Notes:

  • Log files can go to the data disks (and vice versa), but not for the same database. But for best result seperate them.
  • ODS/NDS and DDS both have partitioned tables: need to split each data disk into several volumes (6 to 10 volumes).
  • Many companies are now starting using solid state disk for DW, 500x faster, about £500-£600 for 256 GB (ref). If we use SSD we don’t need to do RAID 10, just RAID 1 or 5. Read/write speed = 200 MB/s. I think it’s only a matter of time before all everybody uses SSD.

To recap, initial calc of the “data warehouse” 400 GB.
Need to cater for 2 years growth (3x): 1.6 TB.
Add ODS/NDS, Staging, data marts: 2.6 TB. (and metadata, standing data, control DB, audit/DQ)
To make 2.6 TB if we use RAID 10, we are looking at roughly 20 TB of raw disks (including the backup volume).

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: