Data Warehousing and Business Intelligence

23 December 2009

Select count(*) For A Very Big Table

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 10:08 pm

select count(*) can take 1 minute if the table has millions of rows with no indexes*. This is because SQL Server does table scan. An example of this situation is a stage table for the fact table in the data warehouse. We don’t index the table because we read the data in its entirity.

To avoid table scan when counting the rows on tables without index, read the row_count column from sys.dm_db_partition_stats. This DMV contains the row count information for each partition in the database. And page count too. For example:

select ps.row_count from sys.dm_db_partition_stats ps
join sys.tables t on ps.object_id = t.object_id
where t.name = ‘TableName’ and ps.index_id in (0,1)

Notes: (there are always notes to everything eh 🙂 )

  1. The (0,1): for heap it’s 0, for clustered-indexed table it’s 1. If a table has more than 1 index, the row_count column are the same for all indexes (well it has to be!), so we just take index_id 0 or 1.
  2. If table is partitioned we will have several rows on this DMV for index_id 0 or 1. Each row has different partition_number. So we need to sum the row_count like this:

select sum(ps.row_count) from sys.dm_db_partition_stats ps
join sys.tables t on ps.object_id = t.object_id
where t.name = ‘TableName’ and ps.index_id in (0,1)

A 10 millions staging table could take 1 minute if we do select count(*) from TableName, but with sys.dm_db_partition_stats it only takes 1 second. It’s quite handy for data warehousing where we deal with large stage tables.

In SQL 2000 row count is in sysindex and in 2005 & 2008 this is still available:

select i.rowcnt from sysindexes i
join sys.tables t on t.object_id = i.id
where t.name = ‘TableName’ and indid in (0,1)

Notes (told you there is always a note to everything 🙂 )

  1. Notice there are no . on sysindex but there is a . on sys.tables. In SQL 2000 we had sysindex and sysobjects. In 2005 & 2008 practically speaking sysindexes is replaced by sys.indexes, and sysobjects is replaced by sys.tables, sys.views, sys.procedures and others (sys.key_contstraints, sys.foreign_keys etc). Well,  not exactly ‘replaced’, as the old views are still available.
  2. Using sysindexes we don’t have to worry about partitions. Whether the table is partitioned or not there is only 1 row having indid 0 or 1.

*Lots of people think that a heap is a table without any index. No, a heap is a table without a clustered index. So a heap can have a non-clustered index. If a table don’t have an index, is it a heap? Yes. Because if the table doesn’t have any index, that means it doesn’t have a clustered index. But I can not write the title of this article as “select count(*) is slow on heaps” because select count(*) is fast on a heap that has a non-clustered index.

Advertisements

1 Comment »

  1. Congratulations ! Great Effort. Miami USA

    Comment by Miguel — 13 May 2011 @ 6:50 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: