Data Warehousing, BI and Data Science

11 November 2015

Indexing Fact Tables

A bit on Primary Key

Yesterday a friend asked me why there was no primary key on a fact table. I explained that we did have a fact table surrogate key (FSK) on that fact table, but I made it as a unique non-clustered index because we needed the physical ordering (the clustered index) to be on the snapshot date as it was a daily periodic snapshot fact table, queried mostly by using the snapshot date.

The purpose of having a primary key (PK) is to enforce uniqueness in one column of the table. We can achieve the same thing, by creating a unique key, hence we do not need a PK in that fact table.

We need to pause and think, if in the fact table we need a unique identifier of the rows. If we need to refer a fact row from another fact row (self referencing), then we do need a PK, which is usually a single column bigint FSK. But this unique identifier single column bigint FSK does not have to be an FK, it can be a non-clustered unique index.

The second purpose of having a PK is to enforce not null. This is really not the function of the PK, but more of a requirement of a PK. A PK requires that the column must be not-nullable. So not-nullable is a property of the column itself, not a property of the PK. And we implement this non-nullability when declaring the column on the table DDL.

We need to bear in mind that a PK has nothing to do with clustered or non-clustered indexes. SQL Server will automatically implement a PK as either a unique clustered index (UCI) or a unique non-clustered index (UNCI), depending on whether a clustered index already exists.

The Clustered Index

A clustered index does two things:

  1. Sorting the table physically
  2. Act as the row locator in non-clustered indexes

Point a) is for the performance of the queries. If we don’t partition a periodic snapshot fact table on the snapshot date, the next best thing is to cluster the table on the snapshot date.

But point a) is also for the performance of the update and delete. It is rare, but in some cases we need to update a periodic snapshot fact table (PSFT). So far I only found 1 case where I need to update a PSFT, out of about 150 PFSTs that I have created over the last 10 years. When updating fact table, it is absolutely crucial that the partitioning key, or the clustered fact table if you don’t have it partitioned, to be on the business date, plus the columns used as the joining criteria between the fact staging table and the PSFT. The clustered index should not be on the query criteria columns. It is the job of the non-clustered index to support the query.

Point b) means that the narrower the clustered key, the smaller the non-clustered indexes. Some people think that the narrow clustered key means that the non-clustered index will also have better performance but in my opinion and observation this performance increase is negligible.

So that’s the clustered index. For an insert-only PSFT we put the clustered index on the snapshot date plus the query criteria column to support the query performance. For an insert-and-update PSFT we put the clustered index on the joining criteria of the update statement.

For example, if the joining criteria of the update is snapshot date key + order ID (a degenerate dimension, the PK in the source table), whereas the query criteria is snapshot date key + account key, then for insert-only PSFT the clustered index should be snapshot date key + account key whereas for update PSFT the clustered index should be on snapshot date key + order ID.

The join SQL engine takes when updating the fact table depends on not only the clustered index of the PSFT, but also on the clustered index of the fact staging table (the source of the merge). If we do use the Merge command, we should convert it to update & insert commands. See my article here (link) about the Merge command’s inefficiency.

Partitioning

We can replace the physical ordering functionality above with partitioning. It is common and it make sense to partition a PSFT, I agree. But I would recommend not to partition the fact table when we create it, but later on. This is because of these reasons:

  1. We should not spend the time unless it is required, and when we create the fact table we don’t yet know if the partitioning is required.
  2. When the table is populated and queried we can learn about its condition and behaviour. Much, much better than imagining. For example, is the partitioning required to support query performance, or loading performance?
  3. We may have purging on the fact table, limiting the volume so that we don’t need to partition it because the performance is OK.
  4. We may need to create a new column for the partitioning key.
  5. When we create the fact table, we don’t yet know how the data will be distributed and queried. Purging can change the data distribution. We don’t know the query pattern, for example, 90% of the queries might be on today’s data so we should put it into a separate partition.
  6. Point e above dictates the partition maintenance strategy, i.e. whether we have partitioning functions splitting the table into 10,000 partitions or to 100 partitions with a “sliding window” maintenance. At the creation time, we have very limited knowledge of this.

Non Clustered Indexes

Each surrogate key is ideally indexed. Not combined as 1 index, but as separate indexes. All as non-clustered indexes (NCI). Say we have order date key, customer key, product key, store key, ship date key. Then we create 1 NCI on order date key, 1 NCI on customer key, 1 NCI on product key, 1 NCI on store key and 1 NCI on ship date key.

We should not combine these NCIs into 1 NCI because the second, third, and forth column of the combined NCI will not be used. Because their ordering is not sequential.

See also two of my articles which are related:

  • Indexing Fact Tables in SQL Server (link)
  • Primary Key and Clustered Index in the Fact Table (link)
Advertisements

27 December 2014

SQL Server Scripts Library

Filed under: SQL Server — Vincent Rainardi @ 8:05 am
Tags:

A lot of times we need to find out how to do things in SQL Server. Like searching for a procedure that contains certain word, find out indexes which are not used, fix orphaned users, and disable an index. In this article I’m trying to list down, as many as possible, the scripts for doing everyday stuff in SQL Server.


/* SQL Server Script Library
Author: Vincent Rainardi
Date written: 4th Dec 2014
Last updated: 2nd Feb 2015

CREATE DATABASE OBJECTS
1. Create a database
2. Create a filestream database
3. Drop a database
4. Rename a database
5. Create a login
6. Delete a login
7. Create a user
8. Delete a user
9. Create a schema
10. Drop a schema
11. Create a table
12. Create a memory-optimized table
13. Delete a table
14. Create a filetable
15. Delete a filetable
16. Create a view
17. Delete a view
18. Create an index
19. Create a filtered index
19a. Force using index
20. Create a spatial index
21. Delete an index
22. Create statistics
23. Show statistics
23a. When the statistic is last updated
24. Update statistics
25. Delete a statistics
26. Create a constraint
26a. Rename a constraint
27. Drop a constraint
28. Create a primary key
29. Delete a primary key
30. Create a foreign key
31. Delete a foreign key
32. Disable a foreign key
33. Enable a foreign key
34. Disable an index
35. Enable an index
36. Rename an index
37. Create a columnstore index
38. Delete a columnstore index
39. Add a column
40. Drop a column
41. Rename a column
42. Change data type of a column
43. Create a computed column
44. Create a synonym
45. Drop a synonym
46. Enable partitioning on the server
47. Disable partitioning on the server
48. Create a partitioned table
49. Drop a partitioned table
50. Create a partitioned index
51. Delete a partitioned index
52. Create an indexed view
53. Drop an indexed view
54. Add an extended property
55. Delete an extended property
56. Enable CDC on the server
57. Disable CDC on the server
58. Enable CDC on a table
59. Disable CDC on a table
59a. View the changed rows in the CDC table
60. CDC functions
61. CDC stored procedures
62. Enable Change Tracking on the database
63. Disable Change Tracking on the database
64. Enable Change Tracking on a table
65. Disable Change Tracking on a table
66. Change Tracking functions
67. Create a trigger
68. Delete a trigger
69. Disable a trigger
70. Enable a trigger

DATABASE DEVELOPMENT
1. Select rows
2. Update rows
3. Insert rows
4. Delete rows
5. Grouping rows
6. Inner join
7. Left and right join
8. Full outer join
9. Join hint
10. Like operator
11. Collation
12. Correlated subquery
13. Select into
14. Insert Select
14a. Output an SP into a table
15. Update from
16. Delete from
17. Truncate table
18. Cast
19. Convert
20. NULLIF
21. ISNULL
22. Coalesce
23. Union and union all
24. Declare a variable
25. Create a temp table
26. Drop a temp table
27. Create global temp table
28. Drop global temp table
29. Create a table variable
30. Memory-optimized table variables
31. Looping on a table variable
32. Delete from a table variable
33. Create a stored procedure
34. Execute a stored procedure
35. Drop a stored procedure
36. Create a scalar function
37. Create a table valued function
38. Create a CLR function
39. Delete a CLR function
40. Cross apply
41. Outer apply
42. While
43. Continue
44. Case when
45. If else
46. Try catch
47. Merge
48. Upsert
49. Checksum
50. String functions
51. Date functions
52. Set date format
53. Transaction
54. Pivot
55. Unpivot
56. Print the current time
57. Using row number
58. Find duplicate rows
59. Remove duplicate rows
60. Rank and dense rank
61. Choose function
62. IIF function
63. Except
64. Intersect
65. Dynamic SQL
66. Using Cursor
67. Cursor stored procedures
68. Put output of SP into a table
69. Select for XML
70. Prepare XML document
71. Remove XML document
72. Querying XML data
73. Option recompile
74. Bitwise operation
75. Waitfor
76. Applock

DATABASE ADMINISTRATION
1. Create a maintenance plan
2. Backup database
3. Restore database
4. Get database state
5. Get database size
6. Detact a database
7. Attach a database
8. Add a file group
9. Add a file to a database
10. Delete a file from a database
11. Move a database file
12. Inrease the size of a database
13. Reorganise an index
14. Rebuild an index
15. Check for fragmentation
16. Check size of log file
17. Check Log Sequence Number (LSN)
18. Check the last the a job run
19. Check the last time a table is accessed
20. Check the last time a user logged in
21. Create a database snapshot
22. Drop a database snapshot
23. Check for blocking
24. Check for locks
25. Check for deadlocks
26. Set deadlock priority
27. Set lock timeout
28. Check for orphaned users
29. Fix orphaned users
30. Find out the largest tables
31. Find out database space used
32. Get SQL Server version
33. Get memory used
34. Get number of connections
35. Create a job
36. Create a job step
37. Create a job schedule
38. Delete a job schedule
39. Delete a job step
40. Delete a job
41. Create a linked server to a SQL Server
42. Create a linked server to a SSAS
43. Drop a linked server
44. Find out who is connected
45. Create a role
46. Drop a role
47. Add a member to a role
48. Remove a member from a role
49. Grant permission to a stored procedure
50. Deny execution on a stored procedure
51. Grant permission to a table
52. Deny permission on a table
52a. Find out permissions
53. BCP
54. Create a trace
55. Delete a trace
56. Truncate transaction log
57. Bulk import
58. Bulk export
59. Enable data compression on a table
60. Disable data compression on a table
61. Checkpoint
62. Setup database mirroring
63. Monitoring database mirroring
64. Pausing a database mirroring session
65. Resuming a database mirroring session
66. Removing database mirroring
67. Create a server audit
68. View a SQL Server audit log
69. Cryptographic functions
70. List of all databases
71. List of columns in a table
72. List of columns in an index
73. List of statistics in an index
74. List of partitions in a table
75. Display view definition
76. Display table details
77. Find a column
78. Find an index
79. Find a view
80. Find a stored procedure
81. Find tables without a primary key
82. Find tables without an index
83. Find tables which were modified today
84. Find primary keys columns
85. Find foreign keys columns
86. Find columns with certain data types
87. Find a constraint
88. Find a trigger
89. Find a synonym
90. Find all stored procedures using a certain table
91. Find parameters in a stored procedure
92. Find owners the tables in a database
93. Find tables owned by a certain users
94. Detect page splits
95. Change isolation levels
96. Delete data from a large table
97. Row count on a large table
98. Max is too slow
99. Set row count
100. Display how long it takes to run a query
101. Display disk activity when a query runs
102. Kill a connection
103. Policy stored procedures
104. DB engine stored procedures
105. Security stored procedures

*/

-- PART 1. CREATE DATABASE OBJECTS
-- 1. Create database
CREATE DATABASE Database1
ON PRIMARY
( NAME = 'Database1_Data',
FILENAME = 'E:\MSSQL\Data\Database1_Data.mdf',
SIZE = 10 GB, MAXSIZE = 100 GB, FILEGROWTH = 5 GB
),
FILEGROUP SECONDARY
( NAME = 'Database1_Index',
FILENAME = 'F:\MSSQL\Data\Database1_Index.mdf',
SIZE = 4 GB, MAXSIZE = 30 GB, FILEGROWTH = 2 GB
)
LOG ON
( NAME = 'Database1_Log',
FILENAME = 'G:\MSSQL\DATA\Database1_Log.ldf',
SIZE = 1 GB, FILEGROWTH = 512 MB
)
GO
-- Suffix: KB, MB, GB, TB
-- Data file: min 5 MB to accomodate Model DB, max 16 TB
-- Log file: min 512 KB, max 2 TB
-- File growth: min 64 KB, rounded to nearest 64 KB

-- 2. Create a filestream database

-- To enable filestream: Configuration Manager, SQL Server Properties, FILESTREAM tab, then:
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
GO

CREATE DATABASE Database2
ON PRIMARY
( NAME = 'Database2_Data',
FILENAME = 'E:\MSSQL\Data\Database2_Data.mdf',
SIZE = 10 GB, MAXSIZE = 100 GB, FILEGROWTH = 5 GB
),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM
( NAME = 'Database2_Filestream',
FILENAME = 'F:\MSSQL\Filestream\Database2'
)
LOG ON
( NAME = 'Database2_Log',
FILENAME = 'G:\MSSQL\DATA\Database2_Log.ldf',
SIZE = 1 GB, FILEGROWTH = 512 MB
)
GO

ALTER DATABASE [Database2] SET FILESTREAM (DIRECTORY_NAME = N'Database2') WITH NO_WAIT
GO

-- 3. Drop database
IF EXISTS (SELECT * FROM SYS.DATABASES WHERE NAME = 'Database1')
DROP DATABASE Database1
-- Compatibility view: SYSDATABASES

-- 4. Rename a database
ALTER DATABASE Database1 MODIFY Name = Database2
GO

-- 5. Create a login
CREATE LOGIN [DOMAIN\SQL Guy] FROM WINDOWS WITH DEFAULT_DATABASE=[Database1]
GO

-- 6. Delete a login
IF EXISTS (SELECT * FROM SYS.SYSLOGINS WHERE NAME = 'DOMAIN\SQL Guy')
DROP LOGIN [DOMAIN\SQL Guy]
GO
-- Compatibility view: SYSLOGINS

-- 7. Create a user
CREATE USER [DOMAIN\SQL Guy] FOR LOGIN [DOMAIN\SQL Guy]
GO

-- 8. Delete a user
IF EXISTS (SELECT * FROM SYS.SYSUSERS WHERE NAME = 'DOMAIN\SQL Guy')
DROP USER [DOMAIN\SQL Guy]
GO
-- Drop user automatically drop their memberships
-- Compatibility view: SYSUSERS

-- 9. Create schema
CREATE SCHEMA Schema1 AUTHORIZATION [dbo]
GO -- Create schema must be the only statement in the batch

-- 10. Drop a schema
IF EXISTS (SELECT * FROM SYS.SCHEMAS WHERE NAME = 'Schema1')
DROP SCHEMA Schema1
GO

-- 11. Create a table
CREATE TABLE dbo.Table1
( Column1 INT NOT NULL IDENTITY (1, 1),
Column2 NVARCHAR(10),
Column3 DATE,
Column4 DECIMAL(14, 5),
CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Column1 ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
) ON [PRIMARY]
GO

-- 12. Create a memory-optimized table
CREATE TABLE dbo.Table2
( Column1 INT NOT NULL IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
Column2 INT INDEX NC_Column2 NONCLUSTERED HASH WITH (BUCKET_COUNT=1000),
Column3 DATETIME2
) WITH (MEMORY_OPTIMIZED=ON)
GO

-- 13. Delete a table
IF OBJECT_ID('dbo.Table1') IS NOT NULL
DROP TABLE dbo.Table1
GO
-- Compatibility view: SYSOBJECTS

-- 14. Create Filetable
CREATE TABLE dbo.FileTable1 AS FILETABLE
WITH
( FILETABLE_DIRECTORY = 'E:\MSSQL\Folder1',
FILETABLE_COLLATE_FILENAME = database_default
)
GO

-- 15. Delete Filetable
IF OBJECT_ID('dbo.FileTable1') IS NOT NULL
DROP TABLE dbo.FileTable1
GO

-- 16. Create a view
CREATE VIEW View1 AS
SELECT T1.Column1, T2.Column2, T1.Column3
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column2
WHERE T2.Column3 = 'Active'
-- Create view must be the only statement in the batch

-- 17. Delete a view
IF OBJECT_ID('dbo.View1') IS NOT NULL
DROP VIEW dbo.View1

-- 18. Create an index
CREATE INDEX NC_Table1_Column2Column4 ON dbo.Table1 (Column2 DESC,Column4 ASC) --This is NONCLUSTERED
CREATE CLUSTERED INDEX CI_Table1_Column1 ON dbo.Table1 (Column1) --Can take a long time if the table has 1 million rows

-- 19. Create a filtered index
CREATE INDEX FI_Table1_Column2Column4WithColumn3NotNull
ON dbo.Table1 (Column2 DESC,Column4 ASC)
WHERE Column3 IS NOT NULL

-- 19a. Force using index
SELECT Column2, Column3 FROM TABLE1
WITH ( INDEX(FI_Table1_Column2Column4WithColumn3NotNull) )
WHERE Column3 = 'FKA'

-- 20. Create a spatial index
CREATE SPATIAL INDEX SI_Table1_Column1
ON Table1 (Column1)
USING GEOMETRY_GRID
WITH
( Bounding_Box = (0,0,100,100),
GRIDS = LOW, LOW, MEDIUM, HIGH,
CELLS_PER_OBJECT = 64,
PAD_INDEX = ON
)

-- 21. Delete an index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'NC_Table1_Column2Column4')
DROP INDEX NC_Table1_Column2Column4 ON dbo.Table1
-- Compatibility view: sysindexes

-- 22. Create statistics
CREATE STATISTICS Stat1
ON Table1(Column1, Column2) --By default query optimizer determines the statistically significant sample size

CREATE STATISTICS Stat1
ON Table1(Column1, Column2)
WHERE Column3 > '2014-01-01' --Filtered Statistics

CREATE STATISTICS Stat1
ON Table1(Column1, Column2)
WITH SAMPLE 20 PERCENT --The optimizer scans all rows on a data page so the actual number is not 20%

-- 23. Show statistics
DBCC SHOW_STATISTICS ('dbo.Table1', Index1) --to specify schema use single or double aposthrophy
DBCC SHOW_STATISTICS (Table1, Index1) WITH HISTOGRAM --RANGE_ROWS = number of rows in each step/bucket
DBCC SHOW_STATISTICS (Table1, Index1) WITH STAT_HEADER --Density is not used, Average key length is in bytes
DBCC SHOW_STATISTICS (Table1, Index1) WITH DENSITY_VECTOR --Density = 1 / (Distinct Values - 200)

SELECT * FROM sys.stats WHERE object_id = object_id('Table1')
SELECT * FROM sys.stats_columns WHERE object_id = object_id('Table1')

-- 23a. When the statistic is last updated
SELECT Name, stats_date(object_id,stats_id) FROM sys.stats WHERE object_id = object_id('Table3') --Can be used for both index and statistic
SELECT Name, stats_date(object_id,index_id) FROM sys.indexes WHERE object_id = object_id('Table3') and Type_Desc <> 'HEAP' --For index

-- 24. Update statistics
UPDATE STATISTICS Table1(Index1)
EXEC sp_updatestats -- All tables

-- 25. Delete a statistics
DROP STATISTICS Table1.Stat1 --Can't drop statistics of an index

-- 26. Create a constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 UNIQUE (Column2, Column3) --Unique constraint. This creates a unique index.
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 CHECK (Column2 < 10) --Check constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 PRIMARY KEY (Column1) --Primary Key constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 FOREIGN KEY (Column1) REFERENCES Table2(Colomn1) --Foreign Key constraint
ALTER TABLE Table1 ALTER COLUMN Coloumn1 INT NOT NULL --NOT NULL constraint

-- 26a. Rename a constraint
EXEC sp_rename 'Constraint1', 'Constraint2' --Single quotes are annoying in WordPress

-- 27. Drop a constraint
ALTER TABLE Table1 DROP CONSTRAINT Constraint1

-- 28. Create a primary key
ALTER TABLE Table1 ADD CONSTRAINT PK1 PRIMARY KEY CLUSTERED (Column1, Column2) --Best: clustered PK on an identity column

-- 29. Delete a primary key
ALTER TABLE Table1 DROP CONSTRAINT PK1 --Takes a long time to recreate a Clustered PK

-- 30. Create a foreign key
ALTER TABLE Table1 ADD CONSTRAINT FK1 FOREIGN KEY (Column1) REFERENCES Table2(Colomn1) --Table2 must have PK on Column1

-- 30a. View foreign keys information
SELECT * FROM sys.foreign_keys
--Compatibility view: sys.sysforeignkeys

SELECT FK.name AS FKName,
 object_name(FK.parent_object_id) AS ChildTable,
 col_name(FK.parent_object_id, FKC.constraint_column_id) as ChildColumn,
 object_name(FK.referenced_object_id) AS ParentTable, --In ANSI standard, "Parent table" is the table with the PK. But in SQL Server, "Parent table" is the table with the FK.
 col_name(FKC.referenced_object_id, FKC.referenced_column_id) AS ParentColumn
FROM sys.foreign_keys FK
JOIN sys.foreign_key_columns AS FKC ON FK.object_id = FKC.constraint_object_id 

-- 31. Delete a foreign key
ALTER TABLE Table1 DROP CONSTRAINT FK1

-- 32. Disable a foreign key
ALTER TABLE Table1 NOCHECK CONSTRAINT FK1 --This is just disabling for UPDATE and INSERT, but not for Replication
ALTER TABLE Table1 NOCHECK CONSTRAINT ALL --Disable all FKs on Table1 (not PK, Check & Unique constraints)
--Disable all FKs on all tables: EXECUTE sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL' 

-- 33. Enable a foreign key
ALTER TABLE Table1 WITH CHECK CHECK CONSTRAINT FK1 --Enabling for UPDATE and INSERT, not for Replication
ALTER TABLE Table1 WITH CHECK CHECK CONSTRAINT ALL --Enable all FKs on Table1 (not PK, Check & Unique constraints
--Enable all FKs on all tables: EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'

-- 34. Disable an index
ALTER INDEX Index1 ON Table1 DISABLE --if the Clustered Index is disabled, NC indexes will be automatically disabled
ALTER INDEX ALL ON Table1 DISABLE --disable all indexes on Table1

-- 35. Enable an index
ALTER INDEX Index1 ON Table1 REBUILD
ALTER INDEX ALL on Table1 REBUILD --enable all indexes on Table1. Can be used for all indexes.
CREATE INDEX Index1 ON Table1(Column2) WITH (DROP_EXISTING = ON) --Cannot be used for indexes for Unique Constraint
DBCC DBREINDEX (Table1, Index1) --Can be used for all indexes

-- 36. Rename an index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 EXEC sp_rename 'Table1.Index1', 'Index2', 'INDEX'

-- 37. Create a columnstore index
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 CREATE COLUMNSTORE INDEX Index1 ON Table1(Column1)

-- 38. Delete a columnstore index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 DROP INDEX Table1.Index1

-- 39. Add a column
IF NOT EXISTS 
( SELECT * FROM sys.columns 
 WHERE name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ADD Column1 INT

-- 40. Drop a column
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 DROP COLUMN Column1

-- 41. Rename a column 
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
EXEC sp_rename 'Table1.Column1', 'Column2', 'COLUMN'

-- 42. Change data type of a column
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ALTER COLUMN Col1 DECIMAL (9,2)

-- 43. Create a computed column
IF NOT EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column3' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ADD Column3 AS Column1 * Column2

-- 44. Create a synonym
IF NOT EXISTS (SELECT * FROM sys.synonyms WHERE name = 'Synonym1')
 CREATE SYNONYM Synonym1 FOR Server1.Database1.Schema1.Table1

-- 45. Drop a synonym
IF EXISTS (SELECT * FROM sys.synonyms WHERE name = 'Synonym1')
 DROP SYNONYM Synonym1 

-- 46. Enable more than 15000 partitions (on 2008 R2)
EXEC sp_db_increased_partitions 'Database1', 'ON' --We can replace ON with TRUE
EXEC sp_db_increased_partitions 'Database1' -- to check

-- 47. Disable more than 15000 partitions (on 2008 R2)
EXEC sp_db_increased_partitions 'Database1', 'OFF' --We can replace OFF with FALSE. Ensure no tables has more than 1000 partitions first. Set recovery model to simple to break the log chain.

-- 48. Create a partitioned table 
CREATE PARTITION FUNCTION PF1 (INT) AS RANGE LEFT FOR VALUES (20140101, 20140201, 20140301)
GO
CREATE PARTITION SCHEME PS1 AS PARTITION PF1 
ALL TO ([PRIMARY]) --Ideally on different file groups located on different disks
GO
CREATE TABLE Table1 
( Column1 INT, 
 Column2 VARCHAR(20)
) ON PS1 (Colomn1)
GO

-- 48a. Find out if a table is partitioned or not
SELECT * FROM sys.tables T
JOIN sys.indexes I on T.object_id = I.object_id and I.[TYPE] in (0,1) --0 is heap, 1 is clustered index
JOIN sys.partition_schemes PS on PS.data_space_id = I.data_space_id
WHERE T.Name = 'Table1'

-- 49. Drop a partitioned table
DROP TABLE Table1
GO
DROP PARTITION SCHEME PS1
GO
DROP PARTITION FUNCTION PF1
GO

-- 50. Create a partitioned index
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 CREATE INDEX Index1 ON Table1(Column1, Column2) --Must include the partitioning column. If we don't, SQL Server will add it automatically.
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 CREATE CLUSTERED INDEX Index1 ON Table1(Column1, Column2) --Must include the partitioning column. 

-- 51. Delete a partitioned index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 DROP INDEX Table1.Index1

-- 52. Create an indexed view
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('View1', 'VIEW'))
 CREATE UNIQUE CLUSTERED INDEX Index1 ON View1(Column1)
-- The view must not contain LEFT/RIGHT JOIN, SELECT in FROM, UNION, DISTINCT, MIN/MAX, CTE, COUNT, ORDER BY, PIVOT, OVER, APPLY.
-- The view must be created using WITH SCHEMABINDING. 
-- The first indexed view must be unique clustered.

-- 53. Drop an indexed view
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('View1', 'VIEW'))
 DROP INDEX View1.Index1

-- 54. Add an extended property
IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description',NULL,NULL,NULL,NULL,NULL,NULL))
EXEC sp_addextendedproperty 
 @name = N'Description', --Add extended property on the database
 @value = 'Risk Data Warehouse';

IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Customer',NULL,NULL))
EXEC sp_addextendedproperty 
 @name = 'Description', --Add extended property on a table
 @value = 'This table stores the customer details', 
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Customer'

IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Account','COLUMN','Balance'))
EXEC sp_addextendedproperty 
 @name = 'Description', --Add extended property on a column
 @value = 'This column contains the balance of the account on a particular date',
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Account',
 @level2type = 'COLUMN',@level2name = 'Balance'

-- 54a. View an extended property
SELECT O.Name as ObjectName, C.Name as ColumnName, EP.Name as ExtPropName, EP.Value as ExtPropValue
FROM sys.extended_properties EP
JOIN sys.objects O on O.object_id = EP.major_id
LEFT JOIN sys.columns C on C.object_id = EP.major_id and C.column_id = EP.minor_id
WHERE EP.class_desc = 'OBJECT_OR_COLUMN' --View the extended properties of all tables and columns

SELECT * FROM fn_listextendedproperty(NULL,NULL,NULL,NULL,NULL,NULL,NULL) -- View the extended properties of the database
SELECT * FROM fn_listextendedproperty(NULL,'SCHEMA','dbo','TABLE','Table1',NULL,NULL) --View the extended properties of Table1

-- 55. Delete an extended property
IF EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Customer',NULL,NULL))
EXEC sp_dropextendedproperty 
 @name = 'Description', --Delete an extended property on a table
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Customer'

IF EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Account','COLUMN','Balance'))
EXEC sp_dropextendedproperty 
 @name = 'Description' --Delete an extended property on a column
 ,@level0type = 'SCHEMA', @level0name = 'dbo'
 ,@level1type = 'TABLE', @level1name = 'Account'
 ,@level2type = 'COLUMN', @level2name = 'Balance';

-- 56. Enable CDC on the database
EXEC sys.sp_cdc_enable_db --Enable the current DB for CDC

-- 57. Disable CDC on the database
EXEC sys.sp_cdc_disable_db --Disable the current DB for CDC

-- 58. Enable CDC on a table
IF EXISTS 
( SELECT * FROM sys.tables 
 WHERE object_id = object_id('dbo.Table') AND is_tracked_by_cdc = 0 --Check that the table has not been enabled for CDC
)
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Table1', @role_name = NULL
 --CDC table will be created on the default filegroup

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Table1', 
@role_name = 'Role1', --To control access to the change data
@filegroup_name = 'Filegroup1', --CDC table will be created on Filegroup1
@supports_net_changes = 1 --Generate this function: cdc.fn_cdc_get_net_changes_<capture_instance>. When a row had several changes applied to it, this function returns the final state of the row.

-- 59. Disable CDC on a table
IF EXISTS 
( SELECT * FROM sys.tables 
 WHERE object_id = object_id('dbo.Table1') AND is_tracked_by_cdc = 1 --Check if the table is enabled for CDC
)
EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'Table1', @capture_instance = 'dbo_Table1'

--Get the capture instance:
SELECT capture_instance FROM cdc.change_tables WHERE object_id = object_id('cdc.dbo_Table1_CT')

-- 59a. View the changed rows in the CDC table
select * from cdc.dbo_Table1_CT --$operation: 1 = delete, 2 = insert, 3 = before update, 4 = after update

-- 60. CDC functions
DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Table1') --dbo.Table1 will return 0 hexa, which in turn will cause the get_all_changes function to say: An insufficient number of arguments were supplied.
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Table1 (@from_lsn, @to_lsn, 'all') --Replacing 'all' with 'all update old' gives us the old values as well, on updated rows.

select sys.fn_cdc_has_column_changed ( 'dbo_Table1','Column3' , 0x0F )

-- 61. CDC stored procedures
EXEC sys.sp_cdc_add_job @job_type = 'capture'
EXEC sys.sp_cdc_add_job @job_type = 'cleanup', @start_job = 0, @retention = 1440 --Retention period is in minutes
EXECUTE sys.sp_cdc_change_job @job_type = 'capture', @maxscans = 1000, @maxtrans = 15 --Update the job parameters
EXEC sys.sp_cdc_help_jobs --List CDC jobs
EXEC sys.sp_cdc_start_job --Starts the capture job
EXEC sys.sp_cdc_stop_job --Stops the capture job
EXEC sys.sp_cdc_drop_job @job_type = 'capture' --Drop the CDC Capture job
EXECUTE sys.sp_cdc_get_captured_columns @capture_instance = 'dbo_Table1'
EXECUTE sys.sp_cdc_get_ddl_history @capture_instance = 'dbo_Table1'

-- 62. Enable Change Tracking on the database
ALTER DATABASE Database1 SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)

-- 63. Disable Change Tracking on the database
ALTER DATABASE Database1 SET CHANGE_TRACKING = OFF

-- 64. Enable Change Tracking on a table
ALTER TABLE dbo.Table1 ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)

-- 65. Disable Change Tracking on a table
ALTER TABLE dbo.Table1 DISABLE CHANGE_TRACKING

-- 66. Change Tracking functions
SELECT CHANGE_TRACKING_CURRENT_VERSION()

SELECT CT.Column1, T1.Column2, T1.Column3, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM dbo.Table1 AS T1
RIGHT JOIN CHANGETABLE(CHANGES dbo.Table1, 0) AS CT --Change 0 with the output of CHANGE_TRACKING_CURRENT_VERSION(). CHANGETABLE must be aliased.
 ON T1.Column1 = CT.Column1 -- The PK of Table1

-- 67. Create a trigger
CREATE TRIGGER Trigger1 ON dbo.Table1 AFTER INSERT, UPDATE, DELETE AS
INSERT INTO Table1_Audit SELECT * FROM Inserted

-- 68. Delete a trigger
DROP TRIGGER Trigger1

-- 69. Disable a trigger
DISABLE TRIGGER Trigger1 on dbo.Table1

-- 70. Enable a trigger
ENABLE TRIGGER Trigger1 on dbo.Table1


-- PART 2. DATABASE DEVELOPMENT

-- 1. Select rows
SELECT Column1, Column2 FROM Table1 WHERE Column1 = 3

-- 2. Update rows
UPDATE Table1 SET Column2 = 'A' WHERE Column1 = 3

-- 3. Insert rows
INSERT INTO Table1 (Column1, Column2) VALUES (1,'A'), (2,'B'), (3,'C')

-- 4. Delete rows
DELETE FROM Table1 WHERE Column1 = 3
DELETE TOP(1000) FROM Table1 WHERE Column1 < '31 Dec 2015'
-- 5. Grouping rows
SELECT Column2, sum(Column3) FROM Table1 GROUP BY Column2 ORDER BY Column2 Desc

-- 6. Inner join
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 --Only include rows exist in both tables

-- 7. Left and right join
SELECT T1.Column1, T1.Column2, T2.Column2, T3.Column3 FROM Table1 T1
LEFT JOIN Table2 T2 on T1.Column1 = T2.Column1 --Only include rows exists in Table1
RIGHT JOIN Table3 T3 on T1.Column1 = T3.Column1 --Only inlcude rows exists in Table3

-- 8. Full outer join
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
FULL OUTER JOIN Table2 T2 on T1.Column1 = T2.Column1 --Include all rows from Table1, and all rows from Table2

-- 9. Join hint
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER HASH JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Hash Join (good for large table)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (HASH JOIN) --Alternative way of forcing Hash join

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER MERGE JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Merge Join (good for large table which is sorted)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (MERGE JOIN) --Alternative way of forcing Merge join

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER LOOP JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Loop Join (good for small tables)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (LOOP JOIN) --Alternative way of forcing Loop join

-- 10. Like operator
SELECT Column1, Column2 FROM Table1 WHERE Column2 LIKE '%MOTOR%'

-- 11. Collation
SELECT T1.Column1, T2.Column2 FROM DB1.dbo.Table1 T1
JOIN DB2.dbo.Table2 T2 ON T1.Column1 COLLATE SQL_Latin1_General_CP1_CI_AS = T2.Column1 --Joining tables from different DBs with different collation

-- 12. Correlated subquery (a subquery which depends on the outer query)
SELECT * FROM Table1 T1 WHERE NOT EXISTS 
(SELECT T1.Column1 FROM Table2 T2 WHERE T1.Column1 = T2.Column1)

-- 13. Select into
SELECT * INTO Table2 FROM Table1 WHERE Column1 = 2 --Table2 will be created if it doesn't exist. If it exists we get an error.

-- 14. Insert Select
INSERT INTO Table2 (Column1, Column2)
SELECT Column1, Column2 FROM Table1

-- 14a. Output an SP into a table
sp_configure 'show advanced options', 1; --Ad Hoc Distributed Queries is an advanced option. We need to set Show Advanced Option to Yes so we can see the Ad Hoc Distributed Queries option.
RECONFIGURE;
sp_configure 'Ad Hoc Distributed Queries', 1; --To set SQL Server to allow ad hoc access to distributed query
RECONFIGURE;

SELECT * INTO #Temp
FROM OPENROWSET('SQLNCLI', 'Server=Server1;Trusted_Connection=yes;', 'EXEC Database1.dbo.SP1') --OpenRowSet uses Ad Hoc distributed query

-- 15. Update from
UPDATE Table1
SET Column2 = T2.Column2
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column1
WHERE T1.Column3 = 'A'

-- 16. Delete from
DELETE T1
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column1
WHERE T2.Column2 = 'A'

-- 17. Truncate table
TRUNCATE TABLE Table1

-- 18. Cast
SELECT cast(Column1 as Varchar) FROM Table1

-- 19. Convert
SELECT convert(Varchar, Column1) FROM Table1
SELECT convert(Varchar(50),NULL) FROM Table1 --To set the data type of a temp table column
SELECT convert(Varchar,DateColumn,120) FROM Table1 --We can't specify the Style using Cast (120 means YYYY-MM-DD hh:mi:ss)

-- 20. NULLIF
SELECT 1/nullif(Column1,0) FROM Table1 --To avoid division by 0

-- 21. ISNULL
SELECT isnull(Column1,0) FROM Table1 --Change NULL with 0

-- 22. Coalesce
SELECT coalesce(col1, col2, col3) FROM Table1 --Returns the first not null column

-- 23. Union and union all
SELECT Col1, Col2 FROM Table1
UNION 
SELECT Col1, Col2 FROM Table2 --Rows with same values are only displayed once. Rows with all NULL are displayed.

SELECT Col1, Col2 FROM Table1
UNION 
SELECT Col1, Col2 FROM Table2 --Rows with same values are only displayed many times. Rows with all NULL are displayed.

-- 24. Declare a variable
DECLARE @Var1 Varchar(20) = 'String1'
SELECT @Var1

-- 25. Create a temp table
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1
SELECT Column1 as NewColumn1, Convert(Varchar(50),Column2) as NewColumn2 INTO #TempTable1 FROM Table1 -- To dictate the data type of NewColumn2
-- Or specify columns and data types:
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1
CREATE TABLE #TempTable1
( Column1 Int,
 Column2 Varchar(20)
)

-- 26. Drop a temp table
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1

-- 27. Create global temp table
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1
SELECT Col1, Col2 INTO ##TempTable1 FROM Table1
-- Or specify columns and data types:
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1
CREATE TABLE ##TempTable1
( Col1 Int,
 Col2 Varchar(20)
)

-- 28. Drop global temp table
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1

-- 29. Create a table variable
DECLARE @Var1 TABLE
( Column1 Int,
 Column2 Varchar(20)
)
INSERT INTO @Var1 (Column1, Column2) VALUES (1,2)
SELECT * FROM @Var1

-- 30. Memory-optimized table variables (2014)
CREATE TYPE dbo.Table1 AS TABLE
( Column1 Int NOT NULL,
 Column2 Varchar(20) NOT NULL,
 INDEX Index1 HASH (Column1) WITH (BUCKET_COUNT = 128)
) WITH (MEMORY_OPTIMIZED = ON)
DECLARE @Var1 dbo.Table1
INSERT @Var1 (Column1, Column2) VALUES (1,2)
SELECT * FROM @Var1

-- 31. Looping on a table variable
DECLARE @RowCount Int, @LoopCount Int, @Column1 Int, @Column2 Varchar(20)
SELECT @RowCount = Count(*) FROM @Var1
SET @LoopCount = 1
WHILE @LoopCount <= @RowCount
BEGIN
 SELECT TOP 1 @Column1=Column1, @Column2=Column2 FROM @Var1 ORDER BY Column1
 SELECT @Column1, @Column2
 SET @LoopCount = @LoopCount + 1
 DELETE FROM @Var1 WHERE Column1 = @Column1
END

-- 32. Delete from a table variable
DELETE FROM @Var1

23 November 2014

SQL Server 2014 Installation

Filed under: Business Intelligence,SQL Server — Vincent Rainardi @ 4:42 pm
Tags: ,

I’m installing SQL Server 2014 on a new laptop and it has not been an easy experience.

First of all, the Books Online is not included and when I tried to “Install Content from Online” I had the issue below (I include the solution).

Secondly, there is no BIDS! BIDS needs to be downloaded from this link, and installed into Visual Studio 2013. But my Visual Studio is 2015 (Preview), so I had to install VS 2013 first.

After installing VS 2013 and downloading SSDT BI (new name for BIDS), when I installed SSDT BI it said that the architecture was incorrect: my SQL Server 2014 was 64-bit, and the SSDT BI download page said that it could do 32 or 64 bit. Bit in fact, the SSDT BI installation file was only for 32 bit (SSDTBI_x86_ENU.exe). There is no SSDT installation for 64 bit, because Visual Studio is only available in 32 bit.

The trick is, as “g2server” pointed out here: when installing SSDT BI, on the Installation Type we need to choose “new instance”, not “existing instance” (even though you already installed both SQL Server 2014 and VS 2013).

All sorted now. Hopefully with this article those of you who want to install SQL Server 2014 documentation and SSDT BI don’t have to spend time as much as me.

Symptom: open SQL Server 2014 documentation, Microsoft Help Viewer 1.1 opened up, Install Content from Online, chose the content (SQL Server 2014 Books Online, click Update button. Help Library Manager said “Updating Local Library, Analyzing package delta” for 2 mins then displayed this error message: An exception has occured. See event log for details.

The Event log:

An error occurred while updating local content: Microsoft.Help.CacheLib.CacheLibBitsException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). —> System.Runtime.InteropServices.COMException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). at Microsoft.Help.CacheLib.Downloader.CreateManager() — End of inner exception stack trace — at Microsoft.Help.CacheLib.Downloader.CreateManager() at Microsoft.Help.CacheLib.Downloader.Jobs() at Microsoft.Help.CacheLib.Downloader.CancelAllPendingJobs() at Microsoft.Help.CacheLib.DocumentationCache.DownloadWebContent(IEnumerable`1 webDownloads, Int32 totalDownloads, Int32 completedDownloads, String workingDirectory, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Download(ChangeDescription change, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Update(VendorName vendorName, Boolean checkForOnlineUpdates, StatusCallback`1 statusCallback, CertificateCheckCallback certCheckCallback) at Microsoft.Help.CacheLib.CacheManager.<>c__DisplayClass24.<UpdateAsync>b__23() at Microsoft.Help.CacheLib.AsyncOperationRunner.Run(Object state)

Source: https://support.microsoft.com/kb/870655?wa=wsignin1.0

In Register, HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16

Added 512 like this: Windows SharedSection=1024,20480,768,512

Still the same error.

Solution: download 2012 and 2014 documentation mhsa files from

SQL Server 2014: http://www.microsoft.com/en-gb/download/confirmation.aspx?id=42557

Product Documentation for Microsoft SQL Server 2014 for firewall and proxy restricted environments

SQL Server 2012: https://www.microsoft.com/en-us/download/confirmation.aspx?id=347

Product Documentation for Microsoft SQL Server 2012 for firewall and proxy restricted environments

30 August 2012

Make yourself sysadmin on SQL 2008 Express

Filed under: SQL Server — Vincent Rainardi @ 4:20 pm
Tags:

How to give yourself sysadmin permission on SQL Server 2008 Express

Caviat: you need to be a member of local admin in the computer where the SQL Express is installed.

Here’s how:
1. Stop SQL Express using Configuration Manager (or Windows services)

2. Start SQL Express in single user mode:
cd \Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Binn
sqlserv.exe -sSQLEXPRESS -m
It will say “SQL Server is now ready for client connections”

3. Open another cmd box, go to the SQL Express binn folder like point 2 above and type
osql -S ComputerName\SQLExpress -E
(you do need the ComputerName\ in front)
This logs you in using Windows Credential.
Note: you can’t login using Mgt Studio (if you have install Mgt Studio 2008).

4. create a login for your windows login
1> CREATE LOGIN [domain1\windowslogin1] FROM WINDOWS
2> GO

5. Assign your login to sysadmin role
1> exec sp_addsrvrolemember @loginname=’domain1\windowslogin1′, @rolename=’sysadmin’
2> go

6. Shutdown SQL Express
1> shutdown
2> go

7. exit/close cmd prompt Window

8. Start SQL Express as normal using SQL Server Configuration Manager, i.e. start up parameters:
-dc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\master.mdf;-ec:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Log\ERRORLOG;-lc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\mastlog.ldf

9. Connect to SQL Express as normal using SSSM

Voila, you are a sysadmin

Thanks to Chris Randall:
http://blogs.ameriteach.com/chris-randall/2009/12/11/sql-server-2008-forgot-to-add-an-administrator-account.html (pity the screenshots are cropped)

Ref: Connect to SQL Server When System Administrators Are Locked Out
http://msdn.microsoft.com/en-us/library/dd207004.aspx

22 April 2011

T-SQL: Exists versus In – Performance Comparison

Filed under: SQL Server — Vincent Rainardi @ 4:25 pm
Tags:

In this test I’m going to query which numbers exist in Table1 but not Table2, using “Exist” and “In”.

Exist Query:

select * from Table1 A where not exists
( select * from Table2 B where B.col1 = A.col1 )

In Query:

select * from Table1 where col1 not in
( select col1 from Table2 )

There will be 2 tests:

  • Test 1: Table2 is large
  • Test 2: Table2 is small

I believe that in Test 1 “exist” is a lot faster, but in Test 2 “in” is slightly faster than “exist”

Test 1: Table 2 is large

Data Setup:
Table1 contains numbers from 1 to 100,000.
Table2 contains numbers from 10 to 100,000.

Expected results: Exist is faster than In.

Script to populate both tables with data:

truncate table table1;

declare @i int
set @i = 1
while @i <= 100000
begin
insert into Table1 values (@i, @i)
set @i = @i + 1
end;

truncate table table2;

declare @i int
set @i = 10
while @i <= 100000
begin
insert into Table2 values (@i, @i)
set @i = @i + 1
end;

Results:

  • Exist Query: average of 167 ms.
  • In Query: average of 1892 ms.

This is as I predicted.

Test 2: Table 2 is small

Data Setup:
Table1 contains 100,000 rows. Col 1 contains numbers from 0 to 50.
Table2 contains 49 rows. Col 1 contains numbers from 2 to 50.

Expected results: In is slightly faster than Exist.

Script to populate both tables with data:

truncate table table1;

declare @i int
set @i = 1
while @i <= 100000
begin
insert into Table1 values (@i%50, @i)
set @i = @i + 1
end;

select * from Table1

truncate table table2;

declare @i int
set @i = 2
while @i <= 50
begin
insert into Table2 values (@i, @i)
set @i = @i + 1
end;

select * from Table2

Results:

  • Exist Query: average of 219 ms.
  • In Query: average of 253 ms.

So I was wrong. Even if Table 2 is small, “Exist” is still faster than “in” (slightly). But if Table 2 is large, “Exist” is a lot faster than “in”. So it is better to use Exist. Always.

Null in Date Column

Filed under: SQL Server — Vincent Rainardi @ 6:14 am
Tags:

I was having one of those blunder moment when a few days ago I said that date column is not nullable. I was in SSIS, in a Derived Column transform and was writing something one of those conditional statements: “a ? b : c”. I put “null” in c and got an error hence my silly conclusion that date column is not nullable. But of course it is. Any column is nullable. And in SSIS it’s not “null”, but we write it using NULL functions, e.g. “NULL(DT_DBDATETIME)”.

20 April 2011

T-SQL: Exists versus In

Filed under: SQL Server — Vincent Rainardi @ 6:31 am
Tags:

I keep forgetting the syntax for “Not Exists” so I thought I’d write a post for it.

select * from Table1 A where not exists
( select * from Table2 B where A.Col1 = B.Col1 ) 

The above query is known as “correlation” query.

You need to have the “A” and “B” aliases in the code above. Otherwise you will have to type in the whole table name:

select * from Table1 where not exists
( select * from Table2 where Table2.col1 = Table1.col1 ) 

Well, not really. You can get away with not specifying the Table2 alias. But you have to specify Table1’s:

select * from Table1 where not exists
( select * from Table2 where col1 = Table1.col1 ) 

If you take away the Table1, it will return nothing, because there is no correlation between the two tables:

select * from Table1 where not exists
( select * from Table2 where col1 = col1 ) 

Exists versus In

And while I’m at it I might as well explain the benefit of “Exists” compared to “In”. And the disadvantage.

Here’s “In”: (it’s always easier to explain by example)

select * from Table1 where col1 not in
(select col1 from Table2) 

If we want to compare 2 columns:

select * from Table1
where col1 not in (select col1 from Table2)
  and col2 not in (select col2 from table2) 

Whereas using “Exists” to compare 2 columns:

select * from Table1 A where not exists
( select * from Table2 B where A.Col1 = B.Col1 and A.Col2 = B.Col2 ) 

So comparing 2 columns using “In” is not the same as using “Exists”. The “In” is not really comparing 2 columns.

To make the “In” really comparing 2 columns we need to do this:

select * from Table1
where convert(varchar,col1) + '|' + convert(varchar,col2) not in
(select convert(varchar,col1) + '|' + convert(varchar,col2) from Table2)

Above, we concatenate the two columns and compare the concatenated string.

Query Performance

Due to time limitation I thought I’d write the performance in another post. But considering that performance is the most important factor for some people, I will write the principles here, and leave the proofing of it for another time.

It is my believe that if the result set to compare to is small, then In is faster. But if the result set to compare to is large, then join is faster.

For example, compare these 2 SQL Statements, one for Exists, one for In:

 select * from Table1 A where not exists
( select * from Table2 B where A.Col1 = B.Col1 ) -- Query1

select * from Table1 where col1 not in
( select col1 from Table2 ) -- Query2 

If Table2 is small (say 10 rows), then the subquery in Query 2 will run fast, SQL will pin the result in memory and the lookup happens in memory, hence fast. But if Table2 is large (say 4 million rows) then SQL will put the result on disk, and the lookup will be searching the disk, hence slow.

On the other hand, the correlation is implemented as join and is very efficient dealing with 2 large set. Say both Table1 and Table2 contains 4 million rows. If (and that’s a big if) Col1 is indexed on both tables, Query 1 will run fast because Col1 is the join criteria.

The question I had for years is the efficiency of Query 2 to deal with small data set. Say Table 1 is 4 million rows and Table 2 is 10 rows. Can Query 2 beat Query 1? In SQL 7 and 2000 I believe this was the case. But in 2008 I don’t believe that is the case. It depends on the index. And the logic for hash join in SQL Server 2008 has improved a lot. We have 3 types of hash join in SQL Server: In-Memory hash join, Grace hash join and Recursive hash join (see here). Quote: “SQL Server starts by using an in-memory hash join and gradually transitions to grace hash join, and recursive hash join, depending on the size of the build input.” Note that SQL Server doesn’t understand the size of the table if the statistics are out of date, hence it is important to keep the statistics up-to-date.

17 April 2011

Inserting Multiple Rows in One SQL Statement

Filed under: Analysis Services — Vincent Rainardi @ 6:42 am
Tags:

In SQL Server 2008 we can insert multiple rows in one SQL statement:

create table table1
( col1 int, col2 int );

insert into table1 (col1, col2)
values (1, 1), (2, 2), (3, 3);

Don’t forget the semicolon. This will be required in SQL Server 2011.

2 April 2011

Collation Conflict When Querying a Linked Server

Filed under: SQL Server — Vincent Rainardi @ 7:45 am
Tags:

Sometimes we want to query a table on a linked server, to compare with data from the local SQL Server like this:

select * from LinkServer1.Database1.dbo.Table1
where Column1 in (select Column2 from Schema1.Table2)

When we do that we may get this error:

Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict between “SQL_Latin1_General_CP1_CI_AS” and “Latin1_General_CI_AS” in the equal to operation.

This is because the collation is different between the two SQL Servers. To solve this convert the collation using “collate database default” like this:

select * from LinkServer1.Database1.dbo.Table1
where Column1 collate database_default in
(select Column2 from Schema1.Table2)

5 March 2011

SSIS: Importing a File with Dynamic Columns

Filed under: SQL Server,SSIS — Vincent Rainardi @ 9:25 am
Tags: ,

One of the most difficult things to deal with in the ETL is importing a file with dynamic columns. The file does not have a set number of columns. The number of columns changes from time to time. For example, the file can have 5 columns, or 20 columns, or anything in between.

Below are some examples of such a file:

File1:
Col1,Col2,Col3t1,Col3t2,Col3t3
a,a,1,1,1
b,b,2,2,2
c,c,3,3,3

In the above example the number of columns for Column3 is 3. But it could be 4, 5, 6 or more, we don’t know.

Before we look at the ETL side, we need to look at the data modelling side first. What kind of target table should we create to host these files.

Data Modelling

Essentially there are 2 choices.

1. We could make it normalised like this:

Table1:
Col1,Col2,Col3Type,Col3Value
a,a,t1,1
b,b,t2,1
c,c,t3,1
a,a,t1,2
b,b,t2,2
c,c,t3,2
a,a,t1,3
b,b,t2,3
c,c,t3,3

In principle we design the structure so that we have a fixed number of columns. It doesn’t matter how many columns in the source file, we will always have 4 columns in the table.

The drawback of this design is when we query it. If we need to produce File1 from Table1, we would need to pivot the table like this:

select Col1, Col2, t1 as Col3t1, t2 as Col3t2, t3 as Col3t3
from
( select Col1, Col2, Col3Type, Col3Value
from Table3
) P1
pivot
( sum(Col3Value) for Col3Type in (t1, t2, t3)
) as P2

Output:
Col1,Col2,Col3t1,Col3t2,Col3t3
b,a,1,1,1
a,b,2,2,2
c,c,3,3,3

If we have 2 tables like that we need to join them together on the common column(s) and it becomes even trickier. If we are dealing with 1 million rows column, we risk a serious performance degradation. This might not be a problem for a data mart, but if it is for a transaction system where millisecond response time is expected, this design does not serve the purpose.

Alternative Design

The alternative is to find out how many columns Col3 is. We know it could be 3 to 5, but could it be 100 columns. Spending 5 minutes with the subject matter expert to understand the characteristic of Col3, is well worth it. I have seen this dynamic columns case 3 times (the first time was about 12 years ago) and on every case they followed the book/theory (to normalise) and everytime they regretted it. On the second case (about 8 years ago) when I was involved in troubleshooting the performance issue, in the end we changed the table structure to denormalised the table structure.

So, I would recommend to find out how many columns Col3 is. 90% of the chance is: it is not that many. The response I usually get from the SME is something like this: “Most likely Col3 varies between 3 and 5 columns, it certainly won’t exceed 8”. There is a natural meaning of this data, that’s why the range is limited. For example: Col3 could be the number of teams, or the number of projects, or the number of web sites, or the number of product types. In the business world most of the data is within a certain range. You can’t have 100 teams for example. It’s physically impossible for a company as it is limited by budget constraints.

So once we have that information (the maximum number of columns), then we create the table according to this limitation. Let’s assume that the SME said that the max is 8 columns. So we design Table1 with 10 columns (just in case) as follows:

create table table1
( Col1 varchar(10),
  Col2 varchar(10),
  Col3t1 int,
  Col3t2 int,
  Col3t3 int,
  Col3t4 int,
  Col3t5 int,
  Col3t6 int,
  Col3t7 int,
  Col3t8 int,
  Col3t9 int,
  Col3t10 int
)

Unless it is a staging table, don’t forget to add a PK column. It is a good design to be able to uniquely identify a row at all times. In this example I don’t add a PK column because the target is a staging table.

A table structure with a fixed number of columns like this makes our lives a lot easier. It directly mimic the structure of the source file, and able to accommodate the dynamic nature of Col3.

The Data Load

I’m going to explain the ETL for the denormalised table, i.e. the one with a fixed 10 columns for Col3. The ETL for the normalised table, i.e. the table with 4 columns will have to wait until next time I’m afraid, as I have a few other posts to write next week, plus a presentation for SQLBits 8, Advanced Dimensional Modelling. If you need the ETL for the normalised table with 4 columns please contact me.

Because of the dynamic nature of Col3 in the source file, the best way is to use the Script Component as a Source. There are a lot of sources on the internet Script Component. This post shows you the basics. This post (by Lee Everest) shows you how to use a Script Component to split the rows in the source file. Similarly, this MSDN post shows how to import a delimited source file using a Script Component.

The main concept is as follows: First create a Connection Manager for a File Connection (no need to use Flat File connection). Name this connection File1, like this:

Then create a workflow, and drag the Script Component into the workflow. When asked, specify as source.

Then:

  • On the Connection Managers tab, set the connection to File1 (that’s the middle column). On the first column, name it as “Conn1”.

  • On the Inputs and Outputs tab, rename the “Output 0” as File1Output. Then create the Output Columns one by one, as per the structure of Table2:

    Col1, Col2, Col3t1, Col3t2, … Col3t10.


  • In this example we will convert the data type for col3 from string to int not in the Script Component, but later on using Derived Column / Data Conversion transformation. So for all columns, set the data type to String, length 10.

  • On the Script tab, click on the Edit Script and edit it as follows. Don’t forget to add “Using System.IO;” and if you want to use MessageBox for debugging you need to add “using System.Windows.Forms;”. Remember that you can’t debug an SSIS Script Component (see my post here), but you can debug SSIS Script Task.
private StreamReader SR;
private string File1;

public override void AcquireConnections(object Transaction)
{
  // Get the connection for File1
  IDTSConnectionManager100 CM = this.Connections.File1Conn;
  File1 = (string)CM.AcquireConnection(null);
}

public override void PreExecute()
{
  // Create a reader for File1
  base.PreExecute();
  SR = new StreamReader(File1);
}

public override void CreateNewOutputRows()
{
  // Declare variables
  string nextLine;
  string[] columns;
  char[] delimiters;
  int Col3Count;
  String[] Col3Value = new string[10];

  // Set the delimiter
  delimiters = ",".ToCharArray();

  // Read the first line (header)
  nextLine = SR.ReadLine();

  // Split the line into columns
  columns = nextLine.Split(delimiters);

  // Find out how many Col3 there are in the file
  Col3Count = columns.Length - 2;
  // MessageBox.Show(Col3Count.ToString());

  // Read the second line and loop until the end of the file
  nextLine = SR.ReadLine();
  while (nextLine != null)
  {
    // Split the line into columns
    columns = nextLine.Split(delimiters);
    {
      // Add a row
      File1OutputBuffer.AddRow();

      // Set the Col3Value variable to the values in the file
      for (int i = 0; i <= Col3Count - 1; i++)
      {
        Col3Value[i] = columns[i + 2];
      }

      // Set the values of the Script Component output according to the file content
      File1OutputBuffer.Col1 = columns[0];
      File1OutputBuffer.Col2 = columns[1];
      File1OutputBuffer.Col3t1 = Col3Value[0];
      File1OutputBuffer.Col3t2 = Col3Value[1];
      File1OutputBuffer.Col3t3 = Col3Value[2];
      File1OutputBuffer.Col3t4 = Col3Value[3];
      File1OutputBuffer.Col3t5 = Col3Value[4];
      File1OutputBuffer.Col3t6 = Col3Value[5];
      File1OutputBuffer.Col3t7 = Col3Value[6];
      File1OutputBuffer.Col3t8 = Col3Value[7];
      File1OutputBuffer.Col3t9 = Col3Value[8];
      File1OutputBuffer.Col3t10 = Col3Value[9];

    }

    // Read the next line
    nextLine = SR.ReadLine();
  }
}

public override void PostExecute()
{
  // Close the reader
  base.PostExecute();
  SR.Close();
}
  • Save the script, close SSIS script editor, click OK to get out of the Script Transformation Editor, back to the Data Flow.
  • I would suggest not to put the output into a table in the database directly, but put the output to a file first to check the result. This way you don’t need to worry about data conversion between string and int etc. Create a flat file connection with 12 columns. All the columns are string, with length = 10, as follows:

  • Run it. If it fails, check the source file and make it like this:

    Also check the script, make sure it is like above.

  • Create an OLE DB Destination or SQL Server Destination (see my post here re why practically speaking we can’t use SQL Server Destination). But create the table first by executing the DDL I wrote above. Note that if you are using SQL Server Destination (because you are doing it to your local instance) then notice that on Windows 7 or Vista you need to run BIDS as Administrator, otherwise you’ll get this error: Unable to prepare SSIS bulk insert, so something like that.
  • Create a data conversion to convert Col3 columns from string to int, like this:

  • Connect the output of the data conversion to the DB and map the DC output columns to the DB columns like this:

  • Now execute it again, and check the target table:

  • Overall the Data Flow looks like this:

As usual I welcome any question and discussion at vrainardi@gmail.com. Vincent, 5/3/11.


Next Page »

Blog at WordPress.com.