Data Warehousing and Data Science

14 January 2016

Temporal Tables in SQL Server 2016

Filed under: Data Warehousing,SQL Server — Vincent Rainardi @ 8:40 am

The temporal table feature in SQL Server 2016 is a breakthrough in data warehousing, because we no longer need to do anything to store historical values. This cuts down the development effort of a data warehouse by approximately 15%: half of the ETL development effort, which is about 30% of the total (the other 70% being analysis, design, database development, BI/report development and management).

Usage scenarios: https://msdn.microsoft.com/en-gb/library/mt631669.aspx

  • Data audit/forensic
  • Point-in-time analysis (time travel)
  • Data anomaly detection
  • Slowly changing dimension (type 4!)
  • Repairing data corruption at row level

Up to now we have always had to build a data warehouse, which stores a) the historical changes of attribute values from the source system, and b) fact snapshots over time. With temporal tables we no longer need to create a data warehouse: we can do a) and b) within the source system itself, as demonstrated by the above article. This effectively kills the idea of creating a data warehouse, increasing the time saved from 15% to 95%*.

*The remaining 5% of effort reflects the extra time required to set up the temporal tables in the source system’s SQL Server, along with memory-optimized tables to keep the current data in memory and the full history of changes on disk. That arrangement gives us the optimum balance between performance and cost.

The above article (https://msdn.microsoft.com/en-gb/library/mt631669.aspx) demonstrates that using temporal tables we can do SCD, and we can do snapshot fact tables (called “time travel”). The SCD that temporal tables implement is not type 2 but type 4. Type 4 is an old technique where the data changes are kept in a separate history table; this keeps the original table small and compact, as it only contains the latest version.
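
To give a flavour of the feature, here is a minimal sketch (the table, column and history table names are mine, not from the MSDN article): a system-versioned temporal table declares two period columns, and SQL Server maintains the type 4 history table automatically.

CREATE TABLE dbo.Customer
( CustomerId INT NOT NULL PRIMARY KEY CLUSTERED,
 CustomerName NVARCHAR(100) NOT NULL,
 ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
 ValidTo DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
 PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory))
GO

-- Point-in-time analysis ("time travel"): the table as it was on a given date
SELECT * FROM dbo.Customer FOR SYSTEM_TIME AS OF '2016-01-01'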

18 December 2015

SQLBits XV Liverpool 4-7th May 2016

Filed under: Data Warehousing,Event,SQL Server — Vincent Rainardi @ 10:17 pm

I’ve posted 8 sessions for SQLBits XV in Liverpool. Hopefully one of them gets chosen so I can present it.

  1. Populating a Fact Table using SSIS
  2. Testing a SQL Server Data Warehouse
  3. Indexing Fact Tables in SQL Server
  4. SQL Server 2016 Data Warehousing & BI Features (withdrawn as it duplicated another speaker’s session)
  5. Using a SQL Server Data Warehouse for CRM
  6. Using a SQL Server Data Warehouse for BPM
  7. Loading a Dimension Table in SSIS
  8. T-SQL: Back to Basic

1. Populating a Fact Table using SSIS

In this session I will be doing a demo of my article “Populating a Fact Table using SSIS” (https://dwbi1.wordpress.com/2012/05/26/how-to-populate-a-fact-table-using-ssis-part1/), going through it step by step on the screen and giving the audience plenty of time to understand the mechanics and to ask questions.

When populating a fact table, people come across issues like these:

  • Where do I populate my fact table from?
  • How do I get the dimension keys to put into my fact table?
  • Where can I get the data for the measures columns?
  • With what do I populate the snapshot date column?
  • What is the primary key of my fact table?
  • The source table doesn’t have a primary key. What should I do?
  • I can’t find the row in the dimension table, what should I do?

As always, the best way to explain is by using an example. So in this session I’m going to do the following, and hopefully by the end of the session the above issues and questions will be answered (a sketch of the dimension key lookup follows the list below).

  • Describe the background on the company and the data warehouse
  • Create the source tables and populate them
  • Create the dimension tables and populate them
  • Create the fact table (empty)
  • Build an SSIS package to populate the fact table, step by step
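
To give a flavour of the dimension key lookup, here is a minimal sketch (Stg.Sales, dbo.DimCustomer and dbo.FactSales are hypothetical names, not from the article): the staging rows are joined to the dimension on the business key, and unmatched rows get the unknown member.

INSERT INTO dbo.FactSales (SnapshotDateKey, CustomerKey, SalesAmount)
SELECT CONVERT(INT, CONVERT(CHAR(8), S.SnapshotDate, 112)), --YYYYMMDD snapshot date key
 COALESCE(D.CustomerKey, 0), --unmatched rows go to the unknown member (key 0)
 S.SalesAmount
FROM Stg.Sales S
LEFT JOIN dbo.DimCustomer D ON D.CustomerId = S.CustomerId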

2. Testing a SQL Server Data Warehouse

The usual issue when we test a data warehouse is the sheer volume of data, which means the normal testing methods don’t work. Over the years various solutions to this problem have been developed, from manual processes to automated ones.

In this session I will be demo-ing 3 approaches for testing SQL Server-based data warehouses:

  1. Using Manual Comparison
  2. Using Excel
  3. Using SSIS – SQL Server – SSAS

Along the way I will show the challenges, the mechanics, and the solutions.

Using Excel, we need to sort the data first. We then use different formulas to compare string, date, logical and numeric columns. We also need to incorporate tolerance levels. Finally we can present the data quality for each column.

To verify the data in the data warehouse using the third approach, first we match the rows to the source system, then we verify the attribute columns, then the measure columns. SSIS facilitates the flow of data from both the source system and the warehouse into the test area, which is in SQL Server. A series of stored procedures then does the matching in SQL Server and compares the attribute and measure columns. SSAS enables the testers to dissect the compared data, to find out which specific data areas are causing the issues.
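
As an illustration of the comparison step, here is a minimal sketch (the Test schema and the table and column names are hypothetical): rows are matched on the business key, then attribute and measure columns are compared, with a tolerance on the measures.

SELECT S.OrderId,
 CASE WHEN S.CustomerName = W.CustomerName THEN 'PASS' ELSE 'FAIL' END AS AttributeCheck,
 CASE WHEN ABS(S.Amount - W.Amount) <= 0.01 THEN 'PASS' ELSE 'FAIL' END AS MeasureCheck --0.01 tolerance
FROM Test.SourceExtract S
JOIN Test.DwExtract W ON W.OrderId = S.OrderId --matching on the business key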

3. Indexing Fact Tables in SQL Server

Indexing fact tables in SQL Server is different from Oracle or DB2 because of the clustered index. Some people say it is better to create a clustered index on the fact key and a non-clustered index on each dimension key. Some say it is better to put the clustered index on the snapshot date column. Of course the considerations are different between periodic snapshot fact tables, accumulating snapshot fact tables and transaction fact tables.

In this session I will go through the principles of indexing the 3 types of fact tables, including physical ordering, slimness of the clustered index, multi-column clustered indexes, why a PK doesn’t have to be a clustered index, which dimension key columns to index, when to include a column in an index, and of course partitioning, i.e. indexing partitioned fact tables. As always, it is better to explain by example than by theory, so I will give an example of each principle so that we can understand how it is applied in practice, for example a performance comparison. I will also add my own “lessons learned”, i.e. mistakes I’ve made in the past, so you can avoid making the same ones.

The title is “in SQL Server” because the principles I will be explaining in this session apply specifically to SQL Server data warehouses. They do not apply to Oracle, DB2 or Teradata data warehouses.

4. SQL Server 2016 Data Warehousing & BI Features

In this session I would like to go through the new features in SQL Server 2016 which are applicable to data warehousing and BI:

  • DB Engine: Stretch Database, JSON support, column-store indexes, memory-optimized tables, Live Query Statistics, Query Store, temporal tables.
  • SSAS: DBCC for SSAS, SSAS Extended Events GUI, new DAX functions, tabular model partitions, roles in tabular.
  • SSIS: custom logging in SSIS, Excel 2013 data sources, OData V4 data source, R in Execute SQL Task, HDFS connections, Auto Adjust Buffer Size, reusable control flow templates.
  • SSRS: Pin Report Items, project template for SSRS, PPTX format, PDF-based printing, custom parameter pane, HTML 5, Report Builder UI, Tree Map and Sunburst charts, High DPI, subscriptions.
  • MDS: data compression at entity level, super user function, CRUD permissions, transaction log.

I may not be able to fit all of the above points into the session, but I will try to cover as many as possible. It is essentially a session which goes through the new features in SQL 2016. Wherever possible I will demo the feature rather than talking about it on PowerPoint, because we learn a lot more by seeing than by discussing theory. For each feature I will also mention how it can be used in data warehousing or BI development. If a feature already exists in 2014 but is enhanced in 2016, I will show only the improvement, without going through the fundamentals of that feature.

5. Using a SQL Server Data Warehouse for CRM

This session is taken from my article “Using a Data Warehouse for CRM”, which I wrote with Amar Gaddam: https://dwbi1.wordpress.com/2010/01/28/using-data-warehouse-for-crm/

A data warehouse is not only for BI. It can also be used for Customer Relationship Management (CRM). In fact, a DW is the best platform for doing CRM. In this session I will show how to use a data warehouse built in SQL Server for core CRM activities such as: 1. Single Customer View, 2. Permission Management, 3. Campaign Segmentation, 4. Campaign Results, 5. Customer Analysis, 6. Personalisation, 7. Customer Insight, 8. Customer Loyalty Scheme.

If you don’t work in CRM and are not familiar with the above concepts, don’t worry: I will explain them one by one during the session. It would help a lot, though, if you read my article above before coming to this session.

For each point I will try to show the table design in the SQL Server DW, so we can all see how it is implemented, not just the theory. Due to the time limit I may not be able to cover all the above points, but I promise I will try to cover as many as possible.
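
As a flavour of what those demos might look like, here is a hypothetical sketch of campaign segmentation on the DW (the table and column names are illustrative, not from the article): segmentation is essentially a query over the dimension and fact tables, respecting permissions.

SELECT C.CustomerKey, C.Email
FROM dbo.DimCustomer C
JOIN dbo.FactPurchase F ON F.CustomerKey = C.CustomerKey
WHERE C.EmailPermission = 'Y' --permission management: only opted-in customers
GROUP BY C.CustomerKey, C.Email
HAVING SUM(F.PurchaseAmount) > 1000 --e.g. the high-value segment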

6. Using a SQL Server Data Warehouse for BPM

BPM = Business Performance Management. A Data Warehouse is not only used for BI. It is also used for CRM and BPM. In this session I will show how to use a Data Warehouse for BPM using Balanced Scorecard. Before the session I will have built a Data Warehouse on SQL Server, for BPM, and during the session I will show the design of this DW.

The Data Warehouse will contain Sales data, Financial data, Customer Service data, and Order Processing data. Each of these will form part of the Balanced Scorecard. In addition to fact and dimension tables, a Data Warehouse used for BPM contains one additional area which stores the KPI scores. I will show how this area is designed (see the sketch below), and how the KPI values and scores are calculated.
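
A minimal sketch of what such a KPI score area could look like (the table and column names are mine, not from the session):

CREATE TABLE dbo.FactKpiScore
( SnapshotDateKey INT NOT NULL, --the scoring period
 KpiKey INT NOT NULL, --which KPI on the Balanced Scorecard
 KpiValue DECIMAL(14,5), --the measured value, e.g. total sales
 KpiTarget DECIMAL(14,5), --the target for the period
 KpiScore DECIMAL(5,2) --e.g. 100.0 * KpiValue / KpiTarget
)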

As a takeaway, I hope the audience will learn how a DW is used outside BI, how the additional area is designed and built, and how it is all implemented on a SQL Server platform. I will also show the SSIS packages that populate the Data Warehouse from the source system, and the SSRS reports which show the KPIs, the Balanced Scorecard and the performance scoring calculation.

7. Loading a Dimension Table in SSIS

We can load a dimension table in SSIS using a) the SCD Transformation, b) Merge Join + Conditional Split, c) Change Data Capture, d) the Merge command, and e) an upsert using Execute SQL. In this session I will be showing/demoing these 5 approaches on the screen one by one, then comparing them in terms of efficiency/performance, code clarity/maintenance, and ease of build. It is based on my article: https://dwbi1.wordpress.com/2015/09/09/loading-a-dimension-table-using-ssis/

The SCD Transformation and Merge Join + Conditional Split both use row-by-row operations, hence they are not efficient compared to the upsert. CDC is a mechanism for extracting data changes; to load a dimension table we need to read the CDC output table and update or insert into the dimension table based on the __$operation column. The Merge command is buggy, has concurrency issues, requires an index to support performance, and does the insert twice.
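
A minimal sketch of approach e), the upsert (dbo.DimCustomer and Stg.Customer are hypothetical names): update the changed rows first, then insert the new ones, as two set-based statements instead of the Merge command.

UPDATE D
SET D.CustomerName = S.CustomerName
FROM dbo.DimCustomer D
JOIN Stg.Customer S ON S.CustomerId = D.CustomerId
WHERE D.CustomerName <> S.CustomerName --only touch rows that actually changed

INSERT INTO dbo.DimCustomer (CustomerId, CustomerName)
SELECT S.CustomerId, S.CustomerName
FROM Stg.Customer S
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer D WHERE D.CustomerId = S.CustomerId)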

In every data warehouse project we need to load many dimension tables, so this is fundamental knowledge for those of us who use SQL Server and SSIS for our warehouses.

8. T-SQL: Back to Basic

In SQL Server we usually look forward to working with the latest features, such as SQL 2016, Power BI, ML, Azure, etc. So much so that we often forget the basics such as creating/modifying a constraint, PK, FK, index, trigger, partitioned table, synonym; joins, correlated subquery, update from, output an SP into a table, cast, NULLIF, variables, temp tables, table variables, cross apply, while, case; string/date functions, row number, transaction, except, rank, find duplicate rows, etc.

This session is intended to be a refresher, i.e. we knew all the above but we forget. We will go back to basics. I will cover 3 sections: a) creating database objects, b) database development, c) database administration. This session is based on my article: https://dwbi1.wordpress.com/2014/12/27/sql-server-scripts/. So this session will consist of many short SQL scripts (T-SQL) and only T-SQL, i.e. I won’t be using the GUI. I won’t be able to cover every single script in that article (there are over 250 in total!), but I will pick the important ones and avoid the ones which are similar to others.

11 November 2015

Indexing Fact Tables

A bit on Primary Key

Yesterday a friend asked me why there was no primary key on a fact table. I explained that we did have a fact table surrogate key (FSK) on that fact table, but I implemented it as a unique non-clustered index, because we needed the physical ordering (the clustered index) to be on the snapshot date: it was a daily periodic snapshot fact table, queried mostly by the snapshot date.

The purpose of having a primary key (PK) is to enforce uniqueness on one column of the table. We can achieve the same thing by creating a unique index; hence we do not need a PK on that fact table.

We need to pause and think whether we need a unique identifier for the rows in the fact table. If we need to refer to a fact row from another fact row (self-referencing), then we do need a unique identifier, which is usually a single-column bigint FSK. But this single-column bigint FSK does not have to be a PK; it can be a non-clustered unique index.

The second purpose of having a PK is to enforce not null. This is really not the function of the PK, but more of a requirement of a PK. A PK requires that the column must be not-nullable. So not-nullable is a property of the column itself, not a property of the PK. And we implement this non-nullability when declaring the column on the table DDL.

We need to bear in mind that a PK has nothing to do with clustered or non-clustered indexes. SQL Server will automatically implement a PK as either a unique clustered index (UCI) or a unique non-clustered index (UNCI), depending on whether a clustered index already exists.
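
The arrangement described above, as a sketch (the table and column names are hypothetical): the clustered index goes on the snapshot date, and the FSK is enforced by a unique non-clustered index rather than a PK.

CREATE CLUSTERED INDEX CI_FactDailyBalance ON dbo.FactDailyBalance (SnapshotDateKey)
CREATE UNIQUE NONCLUSTERED INDEX UNCI_FactDailyBalance_FSK ON dbo.FactDailyBalance (FactKey)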

The Clustered Index

A clustered index does two things:

  a) Sort the table physically
  b) Act as the row locator in non-clustered indexes

Point a) is for the performance of the queries. If we don’t partition a periodic snapshot fact table on the snapshot date, the next best thing is to cluster the table on the snapshot date.

But point a) is also for the performance of updates and deletes. It is rare, but in some cases we need to update a periodic snapshot fact table (PSFT). So far I have only found 1 case where I needed to update a PSFT, out of about 150 PSFTs that I have created over the last 10 years. When updating a fact table, it is absolutely crucial that the partitioning key (or the clustered index, if the table is not partitioned) is on the business date, plus the columns used as the joining criteria between the fact staging table and the PSFT. The clustered index should not be on the query criteria columns; it is the job of the non-clustered indexes to support the query.

Point b) means that the narrower the clustered key, the smaller the non-clustered indexes. Some people think that a narrow clustered key also means the non-clustered indexes will perform better, but in my opinion and observation this performance increase is negligible.

So that’s the clustered index. For an insert-only PSFT we put the clustered index on the snapshot date plus the query criteria column to support the query performance. For an insert-and-update PSFT we put the clustered index on the joining criteria of the update statement.

For example, if the joining criteria of the update is snapshot date key + order ID (a degenerate dimension, the PK in the source table), whereas the query criteria is snapshot date key + account key, then for insert-only PSFT the clustered index should be snapshot date key + account key whereas for update PSFT the clustered index should be on snapshot date key + order ID.
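
The same example as index DDL (a sketch; dbo.FactOrderSnapshot is a hypothetical name, and a table can only have one clustered index, so these are alternatives):

-- Insert-only PSFT: cluster to support the query criteria
CREATE CLUSTERED INDEX CI_FactOrderSnapshot ON dbo.FactOrderSnapshot (SnapshotDateKey, AccountKey)
-- Update PSFT: cluster to support the update's joining criteria instead
-- CREATE CLUSTERED INDEX CI_FactOrderSnapshot ON dbo.FactOrderSnapshot (SnapshotDateKey, OrderId)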

The join strategy the SQL engine takes when updating the fact table depends not only on the clustered index of the PSFT, but also on the clustered index of the fact staging table (the source of the merge). If we use the Merge command, we should convert it to separate update and insert commands; see my article here (link) about the Merge command’s inefficiency.

Partitioning

We can replace the physical ordering functionality above with partitioning. It is common, and it makes sense, to partition a PSFT, I agree. But I would recommend not partitioning the fact table when we create it, but later on, for these reasons:

  1. We should not spend the time unless it is required, and when we create the fact table we don’t yet know if the partitioning is required.
  2. When the table is populated and queried we can learn about its condition and behaviour. Much, much better than imagining. For example, is the partitioning required to support query performance, or loading performance?
  3. We may have purging on the fact table, limiting the volume so that we don’t need to partition it because the performance is OK.
  4. We may need to create a new column for the partitioning key.
  5. When we create the fact table, we don’t yet know how the data will be distributed and queried. Purging can change the data distribution. We don’t know the query pattern, for example, 90% of the queries might be on today’s data so we should put it into a separate partition.
  6. Point 5 above dictates the partition maintenance strategy, i.e. whether we have a partition function splitting the table into 10,000 partitions, or into 100 partitions with “sliding window” maintenance (sketched below). At creation time we have very limited knowledge of this.
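
A minimal “sliding window” sketch, assuming a monthly partition function PF1 and scheme PS1 on an INT date key (hypothetical names, as in the scripts later in this blog):

ALTER PARTITION SCHEME PS1 NEXT USED [PRIMARY] --filegroup for the new partition
ALTER PARTITION FUNCTION PF1() SPLIT RANGE (20160201) --add the new month at the head
ALTER PARTITION FUNCTION PF1() MERGE RANGE (20140101) --retire the oldest month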

Non Clustered Indexes

Each surrogate key is ideally indexed. Not combined as 1 index, but as separate indexes. All as non-clustered indexes (NCI). Say we have order date key, customer key, product key, store key, ship date key. Then we create 1 NCI on order date key, 1 NCI on customer key, 1 NCI on product key, 1 NCI on store key and 1 NCI on ship date key.

We should not combine these NCIs into one NCI, because the second, third and fourth columns of the combined NCI would not be used: an index can only seek on its leading columns, so a query filtering on, say, product key alone cannot use an index that starts with order date key.
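
The example above as DDL (dbo.FactSales is a hypothetical name):

CREATE NONCLUSTERED INDEX NCI_FactSales_OrderDateKey ON dbo.FactSales (OrderDateKey)
CREATE NONCLUSTERED INDEX NCI_FactSales_CustomerKey ON dbo.FactSales (CustomerKey)
CREATE NONCLUSTERED INDEX NCI_FactSales_ProductKey ON dbo.FactSales (ProductKey)
CREATE NONCLUSTERED INDEX NCI_FactSales_StoreKey ON dbo.FactSales (StoreKey)
CREATE NONCLUSTERED INDEX NCI_FactSales_ShipDateKey ON dbo.FactSales (ShipDateKey)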

See also two of my articles which are related:

  • Indexing Fact Tables in SQL Server (link)
  • Primary Key and Clustered Index in the Fact Table (link)

27 December 2014

SQL Server Scripts Library

Filed under: SQL Server — Vincent Rainardi @ 8:05 am

A lot of times we need to find out how to do things in SQL Server: searching for a procedure that contains a certain word, finding indexes which are not used, fixing orphaned users, disabling an index. In this article I’m trying to list, as comprehensively as possible, the scripts for doing everyday tasks in SQL Server.


/* SQL Server Script Library
Author: Vincent Rainardi
Date written: 4th Dec 2014
Last updated: 2nd Feb 2015

CREATE DATABASE OBJECTS
1. Create a database
2. Create a filestream database
3. Drop a database
4. Rename a database
5. Create a login
6. Delete a login
7. Create a user
8. Delete a user
9. Create a schema
10. Drop a schema
11. Create a table
12. Create a memory-optimized table
13. Delete a table
14. Create a filetable
15. Delete a filetable
16. Create a view
17. Delete a view
18. Create an index
19. Create a filtered index
19a. Force using index
20. Create a spatial index
21. Delete an index
22. Create statistics
23. Show statistics
23a. When the statistic is last updated
24. Update statistics
25. Delete a statistics
26. Create a constraint
26a. Rename a constraint
27. Drop a constraint
28. Create a primary key
29. Delete a primary key
30. Create a foreign key
31. Delete a foreign key
32. Disable a foreign key
33. Enable a foreign key
34. Disable an index
35. Enable an index
36. Rename an index
37. Create a columnstore index
38. Delete a columnstore index
39. Add a column
40. Drop a column
41. Rename a column
42. Change data type of a column
43. Create a computed column
44. Create a synonym
45. Drop a synonym
46. Enable more than 15000 partitions (on 2008 R2)
47. Disable more than 15000 partitions (on 2008 R2)
48. Create a partitioned table
49. Drop a partitioned table
50. Create a partitioned index
51. Delete a partitioned index
52. Create an indexed view
53. Drop an indexed view
54. Add an extended property
55. Delete an extended property
56. Enable CDC on the server
57. Disable CDC on the server
58. Enable CDC on a table
59. Disable CDC on a table
59a. View the changed rows in the CDC table
60. CDC functions
61. CDC stored procedures
62. Enable Change Tracking on the database
63. Disable Change Tracking on the database
64. Enable Change Tracking on a table
65. Disable Change Tracking on a table
66. Change Tracking functions
67. Create a trigger
68. Delete a trigger
69. Disable a trigger
70. Enable a trigger

DATABASE DEVELOPMENT
1. Select rows
2. Update rows
3. Insert rows
4. Delete rows
5. Grouping rows
6. Inner join
7. Left and right join
8. Full outer join
9. Join hint
10. Like operator
11. Collation
12. Correlated subquery
13. Select into
14. Insert Select
14a. Output an SP into a table
15. Update from
16. Delete from
17. Truncate table
18. Cast
19. Convert
20. NULLIF
21. ISNULL
22. Coalesce
23. Union and union all
24. Declare a variable
25. Create a temp table
26. Drop a temp table
27. Create global temp table
28. Drop global temp table
29. Create a table variable
30. Memory-optimized table variables
31. Looping on a table variable
32. Delete from a table variable
33. Create a stored procedure
34. Execute a stored procedure
35. Drop a stored procedure
36. Create a scalar function
37. Create a table valued function
38. Create a CLR function
39. Delete a CLR function
40. Cross apply
41. Outer apply
42. While
43. Continue
44. Case when
45. If else
46. Try catch
47. Merge
48. Upsert
49. Checksum
50. String functions
51. Date functions
52. Set date format
53. Transaction
54. Pivot
55. Unpivot
56. Print the current time
57. Using row number
58. Find duplicate rows
59. Remove duplicate rows
60. Rank and dense rank
61. Choose function
62. IIF function
63. Except
64. Intersect
65. Dynamic SQL
66. Using Cursor
67. Cursor stored procedures
68. Put output of SP into a table
69. Select for XML
70. Prepare XML document
71. Remove XML document
72. Querying XML data
73. Option recompile
74. Bitwise operation
75. Waitfor
76. Applock

DATABASE ADMINISTRATION
1. Create a maintenance plan
2. Backup database
3. Restore database
4. Get database state
5. Get database size
6. Detach a database
7. Attach a database
8. Add a file group
9. Add a file to a database
10. Delete a file from a database
11. Move a database file
12. Increase the size of a database
13. Reorganise an index
14. Rebuild an index
15. Check for fragmentation
16. Check size of log file
17. Check Log Sequence Number (LSN)
18. Check the last time a job ran
19. Check the last time a table was accessed
20. Check the last time a user logged in
21. Create a database snapshot
22. Drop a database snapshot
23. Check for blocking
24. Check for locks
25. Check for deadlocks
26. Set deadlock priority
27. Set lock timeout
28. Check for orphaned users
29. Fix orphaned users
30. Find out the largest tables
31. Find out database space used
32. Get SQL Server version
33. Get memory used
34. Get number of connections
35. Create a job
36. Create a job step
37. Create a job schedule
38. Delete a job schedule
39. Delete a job step
40. Delete a job
41. Create a linked server to a SQL Server
42. Create a linked server to a SSAS
43. Drop a linked server
44. Find out who is connected
45. Create a role
46. Drop a role
47. Add a member to a role
48. Remove a member from a role
49. Grant permission to a stored procedure
50. Deny execution on a stored procedure
51. Grant permission to a table
52. Deny permission on a table
52a. Find out permissions
53. BCP
54. Create a trace
55. Delete a trace
56. Truncate transaction log
57. Bulk import
58. Bulk export
59. Enable data compression on a table
60. Disable data compression on a table
61. Checkpoint
62. Setup database mirroring
63. Monitoring database mirroring
64. Pausing a database mirroring session
65. Resuming a database mirroring session
66. Removing database mirroring
67. Create a server audit
68. View a SQL Server audit log
69. Cryptographic functions
70. List of all databases
71. List of columns in a table
72. List of columns in an index
73. List of statistics in an index
74. List of partitions in a table
75. Display view definition
76. Display table details
77. Find a column
78. Find an index
79. Find a view
80. Find a stored procedure
81. Find tables without a primary key
82. Find tables without an index
83. Find tables which were modified today
84. Find primary keys columns
85. Find foreign keys columns
86. Find columns with certain data types
87. Find a constraint
88. Find a trigger
89. Find a synonym
90. Find all stored procedures using a certain table
91. Find parameters in a stored procedure
92. Find owners of the tables in a database
93. Find tables owned by a certain user
94. Detect page splits
95. Change isolation levels
96. Delete data from a large table
97. Row count on a large table
98. Max is too slow
99. Set row count
100. Display how long it takes to run a query
101. Display disk activity when a query runs
102. Kill a connection
103. Policy stored procedures
104. DB engine stored procedures
105. Security stored procedures

*/

-- PART 1. CREATE DATABASE OBJECTS
-- 1. Create database
CREATE DATABASE Database1
ON PRIMARY
( NAME = 'Database1_Data',
FILENAME = 'E:\MSSQL\Data\Database1_Data.mdf',
SIZE = 10 GB, MAXSIZE = 100 GB, FILEGROWTH = 5 GB
),
FILEGROUP SECONDARY
( NAME = 'Database1_Index',
FILENAME = 'F:\MSSQL\Data\Database1_Index.mdf',
SIZE = 4 GB, MAXSIZE = 30 GB, FILEGROWTH = 2 GB
)
LOG ON
( NAME = 'Database1_Log',
FILENAME = 'G:\MSSQL\DATA\Database1_Log.ldf',
SIZE = 1 GB, FILEGROWTH = 512 MB
)
GO
-- Suffix: KB, MB, GB, TB
-- Data file: min 5 MB to accommodate Model DB, max 16 TB
-- Log file: min 512 KB, max 2 TB
-- File growth: min 64 KB, rounded to nearest 64 KB

-- 2. Create a filestream database

-- To enable filestream: Configuration Manager, SQL Server Properties, FILESTREAM tab, then:
EXEC sp_configure filestream_access_level, 2
RECONFIGURE
GO

CREATE DATABASE Database2
ON PRIMARY
( NAME = 'Database2_Data',
FILENAME = 'E:\MSSQL\Data\Database2_Data.mdf',
SIZE = 10 GB, MAXSIZE = 100 GB, FILEGROWTH = 5 GB
),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM
( NAME = 'Database2_Filestream',
FILENAME = 'F:\MSSQL\Filestream\Database2'
)
LOG ON
( NAME = 'Database2_Log',
FILENAME = 'G:\MSSQL\DATA\Database2_Log.ldf',
SIZE = 1 GB, FILEGROWTH = 512 MB
)
GO

ALTER DATABASE [Database2] SET FILESTREAM (DIRECTORY_NAME = N'Database2') WITH NO_WAIT
GO

-- 3. Drop database
IF EXISTS (SELECT * FROM SYS.DATABASES WHERE NAME = 'Database1')
DROP DATABASE Database1
-- Compatibility view: SYSDATABASES

-- 4. Rename a database
ALTER DATABASE Database1 MODIFY Name = Database2
GO

-- 5. Create a login
CREATE LOGIN [DOMAIN\SQL Guy] FROM WINDOWS WITH DEFAULT_DATABASE=[Database1]
GO

-- 6. Delete a login
IF EXISTS (SELECT * FROM SYS.SYSLOGINS WHERE NAME = 'DOMAIN\SQL Guy')
DROP LOGIN [DOMAIN\SQL Guy]
GO
-- Compatibility view: SYSLOGINS

-- 7. Create a user
CREATE USER [DOMAIN\SQL Guy] FOR LOGIN [DOMAIN\SQL Guy]
GO

-- 8. Delete a user
IF EXISTS (SELECT * FROM SYS.SYSUSERS WHERE NAME = 'DOMAIN\SQL Guy')
DROP USER [DOMAIN\SQL Guy]
GO
-- Dropping a user automatically drops their role memberships
-- Compatibility view: SYSUSERS

-- 9. Create schema
CREATE SCHEMA Schema1 AUTHORIZATION [dbo]
GO -- Create schema must be the only statement in the batch

-- 10. Drop a schema
IF EXISTS (SELECT * FROM SYS.SCHEMAS WHERE NAME = 'Schema1')
DROP SCHEMA Schema1
GO

-- 11. Create a table
CREATE TABLE dbo.Table1
( Column1 INT NOT NULL IDENTITY (1, 1),
Column2 NVARCHAR(10),
Column3 DATE,
Column4 DECIMAL(14, 5),
CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Column1 ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
) ON [PRIMARY]
GO

-- 12. Create a memory-optimized table
CREATE TABLE dbo.Table2
( Column1 INT NOT NULL IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
Column2 INT INDEX NC_Column2 NONCLUSTERED HASH WITH (BUCKET_COUNT=1000),
Column3 DATETIME2
) WITH (MEMORY_OPTIMIZED=ON)
GO

-- 13. Delete a table
IF OBJECT_ID('dbo.Table1') IS NOT NULL
DROP TABLE dbo.Table1
GO
-- Compatibility view: SYSOBJECTS

-- 14. Create Filetable
CREATE TABLE dbo.FileTable1 AS FILETABLE
WITH
( FILETABLE_DIRECTORY = 'E:\MSSQL\Folder1',
FILETABLE_COLLATE_FILENAME = database_default
)
GO

-- 15. Delete Filetable
IF OBJECT_ID('dbo.FileTable1') IS NOT NULL
DROP TABLE dbo.FileTable1
GO

-- 16. Create a view
CREATE VIEW View1 AS
SELECT T1.Column1, T2.Column2, T1.Column3
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column2
WHERE T2.Column3 = 'Active'
-- Create view must be the only statement in the batch

-- 17. Delete a view
IF OBJECT_ID('dbo.View1') IS NOT NULL
DROP VIEW dbo.View1

-- 18. Create an index
CREATE INDEX NC_Table1_Column2Column4 ON dbo.Table1 (Column2 DESC,Column4 ASC) --This is NONCLUSTERED
CREATE CLUSTERED INDEX CI_Table1_Column1 ON dbo.Table1 (Column1) --Can take a long time if the table has 1 million rows

-- 19. Create a filtered index
CREATE INDEX FI_Table1_Column2Column4WithColumn3NotNull
ON dbo.Table1 (Column2 DESC,Column4 ASC)
WHERE Column3 IS NOT NULL

-- 19a. Force using index
SELECT Column2, Column3 FROM TABLE1
WITH ( INDEX(FI_Table1_Column2Column4WithColumn3NotNull) )
WHERE Column3 = 'FKA'

-- 20. Create a spatial index
CREATE SPATIAL INDEX SI_Table1_Column1
ON Table1 (Column1)
USING GEOMETRY_GRID
WITH
( Bounding_Box = (0,0,100,100),
GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 64,
PAD_INDEX = ON
)

-- 21. Delete an index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'NC_Table1_Column2Column4')
DROP INDEX NC_Table1_Column2Column4 ON dbo.Table1
-- Compatibility view: sysindexes

-- 22. Create statistics
CREATE STATISTICS Stat1
ON Table1(Column1, Column2) --By default query optimizer determines the statistically significant sample size

CREATE STATISTICS Stat1
ON Table1(Column1, Column2)
WHERE Column3 > '2014-01-01' --Filtered Statistics

CREATE STATISTICS Stat1
ON Table1(Column1, Column2)
WITH SAMPLE 20 PERCENT --The optimizer scans all rows on a data page so the actual number is not 20%

-- 23. Show statistics
DBCC SHOW_STATISTICS ('dbo.Table1', Index1) --to include the schema, quote the name (single or double quotes)
DBCC SHOW_STATISTICS (Table1, Index1) WITH HISTOGRAM --RANGE_ROWS = number of rows in each step/bucket
DBCC SHOW_STATISTICS (Table1, Index1) WITH STAT_HEADER --Density is not used, Average key length is in bytes
DBCC SHOW_STATISTICS (Table1, Index1) WITH DENSITY_VECTOR --Density = 1 / (number of distinct values)

SELECT * FROM sys.stats WHERE object_id = object_id('Table1')
SELECT * FROM sys.stats_columns WHERE object_id = object_id('Table1')

-- 23a. When the statistic is last updated
SELECT Name, stats_date(object_id,stats_id) FROM sys.stats WHERE object_id = object_id('Table3') --Can be used for both index and statistic
SELECT Name, stats_date(object_id,index_id) FROM sys.indexes WHERE object_id = object_id('Table3') and Type_Desc <> 'HEAP' --For index

-- 24. Update statistics
UPDATE STATISTICS Table1(Index1)
EXEC sp_updatestats -- All tables

-- 25. Delete a statistics
DROP STATISTICS Table1.Stat1 --Can't drop statistics of an index

-- 26. Create a constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 UNIQUE (Column2, Column3) --Unique constraint. This creates a unique index.
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 CHECK (Column2 < 10) --Check constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 PRIMARY KEY (Column1) --Primary Key constraint
ALTER TABLE Table1 ADD CONSTRAINT Constraint1 FOREIGN KEY (Column1) REFERENCES Table2(Column1) --Foreign Key constraint
ALTER TABLE Table1 ALTER COLUMN Column1 INT NOT NULL --NOT NULL constraint

-- 26a. Rename a constraint
EXEC sp_rename 'Constraint1', 'Constraint2' --Single quotes are annoying in WordPress

-- 27. Drop a constraint
ALTER TABLE Table1 DROP CONSTRAINT Constraint1

-- 28. Create a primary key
ALTER TABLE Table1 ADD CONSTRAINT PK1 PRIMARY KEY CLUSTERED (Column1, Column2) --Best: clustered PK on an identity column

-- 29. Delete a primary key
ALTER TABLE Table1 DROP CONSTRAINT PK1 --Takes a long time to recreate a Clustered PK

-- 30. Create a foreign key
ALTER TABLE Table1 ADD CONSTRAINT FK1 FOREIGN KEY (Column1) REFERENCES Table2(Column1) --Table2 must have PK on Column1

-- 30a. View foreign keys information
SELECT * FROM sys.foreign_keys
--Compatibility view: sys.sysforeignkeys

SELECT FK.name AS FKName,
 object_name(FK.parent_object_id) AS ChildTable,
 col_name(FK.parent_object_id, FKC.constraint_column_id) as ChildColumn,
 object_name(FK.referenced_object_id) AS ParentTable, --In ANSI standard, "Parent table" is the table with the PK. But in SQL Server, "Parent table" is the table with the FK.
 col_name(FKC.referenced_object_id, FKC.referenced_column_id) AS ParentColumn
FROM sys.foreign_keys FK
JOIN sys.foreign_key_columns AS FKC ON FK.object_id = FKC.constraint_object_id 

-- 31. Delete a foreign key
ALTER TABLE Table1 DROP CONSTRAINT FK1

-- 32. Disable a foreign key
ALTER TABLE Table1 NOCHECK CONSTRAINT FK1 --This is just disabling for UPDATE and INSERT, but not for Replication
ALTER TABLE Table1 NOCHECK CONSTRAINT ALL --Disable all FKs on Table1 (not PK, Check & Unique constraints)
--Disable all FKs on all tables: EXECUTE sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL' 

-- 33. Enable a foreign key
ALTER TABLE Table1 WITH CHECK CHECK CONSTRAINT FK1 --Enabling for UPDATE and INSERT, not for Replication
ALTER TABLE Table1 WITH CHECK CHECK CONSTRAINT ALL --Enable all FKs on Table1 (not PK, Check & Unique constraints)
--Enable all FKs on all tables: EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'

-- 34. Disable an index
ALTER INDEX Index1 ON Table1 DISABLE --if the Clustered Index is disabled, NC indexes will be automatically disabled
ALTER INDEX ALL ON Table1 DISABLE --disable all indexes on Table1

-- 35. Enable an index
ALTER INDEX Index1 ON Table1 REBUILD
ALTER INDEX ALL on Table1 REBUILD --enable all indexes on Table1. Can be used for all indexes.
CREATE INDEX Index1 ON Table1(Column2) WITH (DROP_EXISTING = ON) --Cannot be used for indexes for Unique Constraint
DBCC DBREINDEX (Table1, Index1) --Can be used for all indexes

-- 36. Rename an index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 EXEC sp_rename 'Table1.Index1', 'Index2', 'INDEX'

-- 37. Create a columnstore index
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 CREATE COLUMNSTORE INDEX Index1 ON Table1(Column1)

-- 38. Delete a columnstore index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1')
 DROP INDEX Table1.Index1

-- 39. Add a column
IF NOT EXISTS 
( SELECT * FROM sys.columns 
 WHERE name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ADD Column1 INT

-- 40. Drop a column
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 DROP COLUMN Column1

-- 41. Rename a column 
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
EXEC sp_rename 'Table1.Column1', 'Column2', 'COLUMN'

-- 42. Change data type of a column
IF EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column1' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ALTER COLUMN Column1 DECIMAL (9,2)

-- 43. Create a computed column
IF NOT EXISTS 
( SELECT * FROM sys.columns 
 WHERE Name = 'Column3' and object_id = object_id('Table1')
)
ALTER TABLE Table1 ADD Column3 AS Column1 * Column2

-- 44. Create a synonym
IF NOT EXISTS (SELECT * FROM sys.synonyms WHERE name = 'Synonym1')
 CREATE SYNONYM Synonym1 FOR Server1.Database1.Schema1.Table1

-- 45. Drop a synonym
IF EXISTS (SELECT * FROM sys.synonyms WHERE name = 'Synonym1')
 DROP SYNONYM Synonym1 

-- 46. Enable more than 15000 partitions (on 2008 R2)
EXEC sp_db_increased_partitions 'Database1', 'ON' --We can replace ON with TRUE
EXEC sp_db_increased_partitions 'Database1' -- to check

-- 47. Disable more than 15000 partitions (on 2008 R2)
EXEC sp_db_increased_partitions 'Database1', 'OFF' --We can replace OFF with FALSE. Ensure no tables has more than 1000 partitions first. Set recovery model to simple to break the log chain.

-- 48. Create a partitioned table 
CREATE PARTITION FUNCTION PF1 (INT) AS RANGE LEFT FOR VALUES (20140101, 20140201, 20140301)
GO
CREATE PARTITION SCHEME PS1 AS PARTITION PF1 
ALL TO ([PRIMARY]) --Ideally on different file groups located on different disks
GO
CREATE TABLE Table1 
( Column1 INT, 
 Column2 VARCHAR(20)
) ON PS1 (Column1)
GO

-- 48a. Find out if a table is partitioned or not
SELECT * FROM sys.tables T
JOIN sys.indexes I on T.object_id = I.object_id and I.[TYPE] in (0,1) --0 is heap, 1 is clustered index
JOIN sys.partition_schemes PS on PS.data_space_id = I.data_space_id
WHERE T.Name = 'Table1'

-- 49. Drop a partitioned table
DROP TABLE Table1
GO
DROP PARTITION SCHEME PS1
GO
DROP PARTITION FUNCTION PF1
GO

-- 50. Create a partitioned index
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 CREATE INDEX Index1 ON Table1(Column1, Column2) --Must include the partitioning column. If we don't, SQL Server will add it automatically.
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 CREATE CLUSTERED INDEX Index1 ON Table1(Column1, Column2) --Must include the partitioning column. 

-- 51. Delete a partitioned index
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('Table1'))
 DROP INDEX Table1.Index1

-- 52. Create an indexed view
IF NOT EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('View1', 'VIEW'))
 CREATE UNIQUE CLUSTERED INDEX Index1 ON View1(Column1)
-- The view must not contain LEFT/RIGHT JOIN, SELECT in FROM, UNION, DISTINCT, MIN/MAX, CTE, COUNT, ORDER BY, PIVOT, OVER, APPLY.
-- The view must be created using WITH SCHEMABINDING. 
-- The first indexed view must be unique clustered.

-- 53. Drop an indexed view
IF EXISTS (SELECT * FROM SYS.INDEXES WHERE NAME = 'Index1' AND object_id = object_id('View1', 'VIEW'))
 DROP INDEX View1.Index1

-- 54. Add an extended property
IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description',NULL,NULL,NULL,NULL,NULL,NULL))
EXEC sp_addextendedproperty 
 @name = N'Description', --Add extended property on the database
 @value = 'Risk Data Warehouse';

IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Customer',NULL,NULL))
EXEC sp_addextendedproperty 
 @name = 'Description', --Add extended property on a table
 @value = 'This table stores the customer details', 
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Customer'

IF NOT EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Account','COLUMN','Balance'))
EXEC sp_addextendedproperty 
 @name = 'Description', --Add extended property on a column
 @value = 'This column contains the balance of the account on a particular date',
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Account',
 @level2type = 'COLUMN',@level2name = 'Balance'

-- 54a. View an extended property
SELECT O.Name as ObjectName, C.Name as ColumnName, EP.Name as ExtPropName, EP.Value as ExtPropValue
FROM sys.extended_properties EP
JOIN sys.objects O on O.object_id = EP.major_id
LEFT JOIN sys.columns C on C.object_id = EP.major_id and C.column_id = EP.minor_id
WHERE EP.class_desc = 'OBJECT_OR_COLUMN' --View the extended properties of all tables and columns

SELECT * FROM fn_listextendedproperty(NULL,NULL,NULL,NULL,NULL,NULL,NULL) -- View the extended properties of the database
SELECT * FROM fn_listextendedproperty(NULL,'SCHEMA','dbo','TABLE','Table1',NULL,NULL) --View the extended properties of Table1

-- 55. Delete an extended property
IF EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Customer',NULL,NULL))
EXEC sp_dropextendedproperty 
 @name = 'Description', --Delete an extended property on a table
 @level0type = 'SCHEMA', @level0name = 'dbo',
 @level1type = 'TABLE', @level1name = 'Customer'

IF EXISTS (SELECT * FROM fn_listextendedproperty('Description','SCHEMA','dbo','TABLE','Account','COLUMN','Balance'))
EXEC sp_dropextendedproperty 
 @name = 'Description' --Delete an extended property on a column
 ,@level0type = 'SCHEMA', @level0name = 'dbo'
 ,@level1type = 'TABLE', @level1name = 'Account'
 ,@level2type = 'COLUMN', @level2name = 'Balance';

-- 56. Enable CDC on the database
EXEC sys.sp_cdc_enable_db --Enable the current DB for CDC

-- 57. Disable CDC on the database
EXEC sys.sp_cdc_disable_db --Disable the current DB for CDC

-- 58. Enable CDC on a table
IF EXISTS 
( SELECT * FROM sys.tables 
 WHERE object_id = object_id('dbo.Table1') AND is_tracked_by_cdc = 0 --Check that the table has not been enabled for CDC
)
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Table1', @role_name = NULL
 --CDC table will be created on the default filegroup

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Table1', 
@role_name = 'Role1', --To control access to the change data
@filegroup_name = 'Filegroup1', --CDC table will be created on Filegroup1
@supports_net_changes = 1 --Generate this function: cdc.fn_cdc_get_net_changes_<capture_instance>. When a row had several changes applied to it, this function returns the final state of the row.

-- 59. Disable CDC on a table
IF EXISTS 
( SELECT * FROM sys.tables 
 WHERE object_id = object_id('dbo.Table1') AND is_tracked_by_cdc = 1 --Check if the table is enabled for CDC
)
EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'Table1', @capture_instance = 'dbo_Table1'

--Get the capture instance:
SELECT capture_instance FROM cdc.change_tables WHERE object_id = object_id('cdc.dbo_Table1_CT')

-- 59a. View the changed rows in the CDC table
select * from cdc.dbo_Table1_CT --__$operation: 1 = delete, 2 = insert, 3 = before update, 4 = after update

-- 60. CDC functions
DECLARE @from_lsn binary(10), @to_lsn binary(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Table1') --Passing 'dbo.Table1' (with a dot) returns 0x0, which causes the get_all_changes function to fail with: An insufficient number of arguments were supplied.
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Table1 (@from_lsn, @to_lsn, 'all') --Replacing 'all' with 'all update old' gives us the old values as well, on updated rows.

select sys.fn_cdc_has_column_changed ( 'dbo_Table1','Column3' , 0x0F )

-- 61. CDC stored procedures
EXEC sys.sp_cdc_add_job @job_type = 'capture'
EXEC sys.sp_cdc_add_job @job_type = 'cleanup', @start_job = 0, @retention = 1440 --Retention period is in minutes
EXECUTE sys.sp_cdc_change_job @job_type = 'capture', @maxscans = 1000, @maxtrans = 15 --Update the job parameters
EXEC sys.sp_cdc_help_jobs --List CDC jobs
EXEC sys.sp_cdc_start_job --Starts the capture job
EXEC sys.sp_cdc_stop_job --Stops the capture job
EXEC sys.sp_cdc_drop_job @job_type = 'capture' --Drop the CDC Capture job
EXECUTE sys.sp_cdc_get_captured_columns @capture_instance = 'dbo_Table1'
EXECUTE sys.sp_cdc_get_ddl_history @capture_instance = 'dbo_Table1'

-- 62. Enable Change Tracking on the database
ALTER DATABASE Database1 SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)

-- 63. Disable Change Tracking on the database
ALTER DATABASE Database1 SET CHANGE_TRACKING = OFF

-- 64. Enable Change Tracking on a table
ALTER TABLE dbo.Table1 ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)

-- 65. Disable Change Tracking on a table
ALTER TABLE dbo.Table1 DISABLE CHANGE_TRACKING

-- 66. Change Tracking functions
SELECT CHANGE_TRACKING_CURRENT_VERSION()

SELECT CT.Column1, T1.Column2, T1.Column3, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM dbo.Table1 AS T1
RIGHT JOIN CHANGETABLE(CHANGES dbo.Table1, 0) AS CT --Replace 0 with the output of CHANGE_TRACKING_CURRENT_VERSION() from the last sync. CHANGETABLE must be aliased.
 ON T1.Column1 = CT.Column1 -- The PK of Table1

-- 67. Create a trigger
CREATE TRIGGER Trigger1 ON dbo.Table1 AFTER INSERT, UPDATE, DELETE AS
INSERT INTO Table1_Audit SELECT * FROM Inserted

-- 68. Delete a trigger
DROP TRIGGER Trigger1

-- 69. Disable a trigger
DISABLE TRIGGER Trigger1 on dbo.Table1

-- 70. Enable a trigger
ENABLE TRIGGER Trigger1 on dbo.Table1


-- PART 2. DATABASE DEVELOPMENT

-- 1. Select rows
SELECT Column1, Column2 FROM Table1 WHERE Column1 = 3

-- 2. Update rows
UPDATE Table1 SET Column2 = 'A' WHERE Column1 = 3

-- 3. Insert rows
INSERT INTO Table1 (Column1, Column2) VALUES (1,'A'), (2,'B'), (3,'C')

-- 4. Delete rows
DELETE FROM Table1 WHERE Column1 = 3
DELETE TOP(1000) FROM Table1 WHERE Column1 < '31 Dec 2015'

-- 5. Grouping rows
SELECT Column2, sum(Column3) FROM Table1 GROUP BY Column2 ORDER BY Column2 Desc

-- 6. Inner join
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 --Only include rows that exist in both tables

-- 7. Left and right join
SELECT T1.Column1, T1.Column2, T2.Column2, T3.Column3 FROM Table1 T1
LEFT JOIN Table2 T2 on T1.Column1 = T2.Column1 --Keep all rows from Table1, even without a match in Table2
RIGHT JOIN Table3 T3 on T1.Column1 = T3.Column1 --Keep all rows from Table3, even without a match in Table1

-- 8. Full outer join
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
FULL OUTER JOIN Table2 T2 on T1.Column1 = T2.Column1 --Include all rows from Table1, and all rows from Table2

-- 9. Join hint
SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER HASH JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Hash Join (good for large table)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (HASH JOIN) --Alternative way of forcing Hash join

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER MERGE JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Merge Join (good for large table which is sorted)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (MERGE JOIN) --Alternative way of forcing Merge join

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER LOOP JOIN Table2 T2 on T1.Column1 = T2.Column1 --Forces SQL Server to use Loop Join (good for small tables)

SELECT T1.Column1, T1.Column2, T2.Column2 FROM Table1 T1
INNER JOIN Table2 T2 on T1.Column1 = T2.Column1 OPTION (LOOP JOIN) --Alternative way of forcing Loop join

-- 10. Like operator
SELECT Column1, Column2 FROM Table1 WHERE Column2 LIKE '%MOTOR%'

-- 11. Collation
SELECT T1.Column1, T2.Column2 FROM DB1.dbo.Table1 T1
JOIN DB2.dbo.Table2 T2 ON T1.Column1 COLLATE SQL_Latin1_General_CP1_CI_AS = T2.Column1 --Joining tables from different DBs with different collation

-- 12. Correlated subquery (a subquery which depends on the outer query)
SELECT * FROM Table1 T1 WHERE NOT EXISTS 
(SELECT T1.Column1 FROM Table2 T2 WHERE T1.Column1 = T2.Column1)

-- 13. Select into
SELECT * INTO Table2 FROM Table1 WHERE Column1 = 2 --Table2 will be created if it doesn't exist. If it exists we get an error.

-- 14. Insert Select
INSERT INTO Table2 (Column1, Column2)
SELECT Column1, Column2 FROM Table1

-- 14a. Output an SP into a table
sp_configure 'show advanced options', 1; --Ad Hoc Distributed Queries is an advanced option. We need to set Show Advanced Option to Yes so we can see the Ad Hoc Distributed Queries option.
RECONFIGURE;
sp_configure 'Ad Hoc Distributed Queries', 1; --To set SQL Server to allow ad hoc access to distributed query
RECONFIGURE;

SELECT * INTO #Temp
FROM OPENROWSET('SQLNCLI', 'Server=Server1;Trusted_Connection=yes;', 'EXEC Database1.dbo.SP1') --OpenRowSet uses Ad Hoc distributed query

-- 15. Update from
UPDATE Table1
SET Column2 = T2.Column2
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column1
WHERE T1.Column3 = 'A'

-- 16. Delete from
DELETE T1
FROM Table1 T1
JOIN Table2 T2 on T1.Column1 = T2.Column1
WHERE T2.Column2 = 'A'

-- 17. Truncate table
TRUNCATE TABLE Table1

-- 18. Cast
SELECT cast(Column1 as Varchar) FROM Table1

-- 19. Convert
SELECT convert(Varchar, Column1) FROM Table1
SELECT convert(Varchar(50),NULL) FROM Table1 --To set the data type of a temp table column
SELECT convert(Varchar,DateColumn,120) FROM Table1 --We can't specify the Style using Cast (120 means YYYY-MM-DD hh:mi:ss)

-- 20. NULLIF
SELECT 1/nullif(Column1,0) FROM Table1 --To avoid division by 0

-- 21. ISNULL
SELECT isnull(Column1,0) FROM Table1 --Change NULL with 0

-- 22. Coalesce
SELECT coalesce(col1, col2, col3) FROM Table1 --Returns the first not null column

-- 23. Union and union all
SELECT Col1, Col2 FROM Table1
UNION
SELECT Col1, Col2 FROM Table2 --Rows with the same values are displayed only once. Rows with all NULLs are displayed.

SELECT Col1, Col2 FROM Table1
UNION ALL
SELECT Col1, Col2 FROM Table2 --Rows with the same values are displayed as many times as they occur.

-- 24. Declare a variable
DECLARE @Var1 Varchar(20) = 'String1'
SELECT @Var1

-- 25. Create a temp table
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1
SELECT Column1 as NewColumn1, Convert(Varchar(50),Column2) as NewColumn2 INTO #TempTable1 FROM Table1 -- To dictate the data type of NewColumn2
-- Or specify columns and data types:
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1
CREATE TABLE #TempTable1
( Column1 Int,
 Column2 Varchar(20)
)

-- 26. Drop a temp table
IF object_id('TempDB..#TempTable1') is not null DROP TABLE #TempTable1

-- 27. Create global temp table
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1
SELECT Col1, Col2 INTO ##TempTable1 FROM Table1
-- Or specify columns and data types:
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1
CREATE TABLE ##TempTable1
( Col1 Int,
 Col2 Varchar(20)
)

-- 28. Drop global temp table
IF object_id('TempDB..##TempTable1') is not null DROP TABLE ##TempTable1

-- 29. Create a table variable
DECLARE @Var1 TABLE
( Column1 Int,
 Column2 Varchar(20)
)
INSERT INTO @Var1 (Column1, Column2) VALUES (1, 'A')
SELECT * FROM @Var1

-- 30. Memory-optimized table variables (2014)
CREATE TYPE dbo.Table1 AS TABLE
( Column1 Int NOT NULL,
 Column2 Varchar(20) NOT NULL,
 INDEX Index1 HASH (Column1) WITH (BUCKET_COUNT = 128)
) WITH (MEMORY_OPTIMIZED = ON)
DECLARE @Var1 dbo.Table1
INSERT @Var1 (Column1, Column2) VALUES (1, 'A')
SELECT * FROM @Var1

-- 31. Looping on a table variable
DECLARE @RowCount Int, @LoopCount Int, @Column1 Int, @Column2 Varchar(20)
SELECT @RowCount = Count(*) FROM @Var1
SET @LoopCount = 1
WHILE @LoopCount <= @RowCount
BEGIN
 SELECT TOP 1 @Column1=Column1, @Column2=Column2 FROM @Var1 ORDER BY Column1
 SELECT @Column1, @Column2
 SET @LoopCount = @LoopCount + 1
 DELETE FROM @Var1 WHERE Column1 = @Column1
END

-- 32. Delete from a table variable
DELETE FROM @Var1

23 November 2014

SQL Server 2014 Installation

Filed under: Business Intelligence,SQL Server — Vincent Rainardi @ 4:42 pm

I’m installing SQL Server 2014 on a new laptop and it has not been an easy experience.

First of all, the Books Online is not included and when I tried to “Install Content from Online” I had the issue below (I include the solution).

Secondly, there is no BIDS! BIDS needs to be downloaded from this link, and installed into Visual Studio 2013. But my Visual Studio is 2015 (Preview), so I had to install VS 2013 first.

After installing VS 2013 and downloading SSDT BI (the new name for BIDS), when I installed SSDT BI it said that the architecture was incorrect: my SQL Server 2014 was 64-bit, and the SSDT BI download page said it could do 32 or 64 bit. But in fact, the SSDT BI installation file was only for 32 bit (SSDTBI_x86_ENU.exe). There is no 64-bit SSDT installation, because Visual Studio is only available in 32 bit.

The trick is, as “g2server” pointed out here: when installing SSDT BI, on the Installation Type page we need to choose “new instance”, not “existing instance” (even though we have already installed both SQL Server 2014 and VS 2013).

All sorted now. Hopefully with this article those of you who want to install the SQL Server 2014 documentation and SSDT BI won’t have to spend as much time as I did.

Symptom: open the SQL Server 2014 documentation and Microsoft Help Viewer 1.1 opens up. Choose Install Content from Online, select the content (SQL Server 2014 Books Online) and click the Update button. Help Library Manager says “Updating Local Library, Analyzing package delta” for 2 minutes, then displays this error message: “An exception has occured. See event log for details.”

The Event log:

An error occurred while updating local content: Microsoft.Help.CacheLib.CacheLibBitsException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). —> System.Runtime.InteropServices.COMException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). at Microsoft.Help.CacheLib.Downloader.CreateManager() — End of inner exception stack trace — at Microsoft.Help.CacheLib.Downloader.CreateManager() at Microsoft.Help.CacheLib.Downloader.Jobs() at Microsoft.Help.CacheLib.Downloader.CancelAllPendingJobs() at Microsoft.Help.CacheLib.DocumentationCache.DownloadWebContent(IEnumerable`1 webDownloads, Int32 totalDownloads, Int32 completedDownloads, String workingDirectory, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Download(ChangeDescription change, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Update(VendorName vendorName, Boolean checkForOnlineUpdates, StatusCallback`1 statusCallback, CertificateCheckCallback certCheckCallback) at Microsoft.Help.CacheLib.CacheManager.<>c__DisplayClass24.<UpdateAsync>b__23() at Microsoft.Help.CacheLib.AsyncOperationRunner.Run(Object state)

Source: https://support.microsoft.com/kb/870655?wa=wsignin1.0

In the Registry, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16

I added 512 like this: Windows SharedSection=1024,20480,768,512

Still the same error.

Solution: download 2012 and 2014 documentation mhsa files from

SQL Server 2014: http://www.microsoft.com/en-gb/download/confirmation.aspx?id=42557

Product Documentation for Microsoft SQL Server 2014 for firewall and proxy restricted environments

SQL Server 2012: https://www.microsoft.com/en-us/download/confirmation.aspx?id=347

Product Documentation for Microsoft SQL Server 2012 for firewall and proxy restricted environments

29 November 2013

Data Consistency in Oracle and SQL Server

Filed under: SQL Server — Vincent Rainardi @ 8:09 am

I feel ashamed that I have not taken the time to really understand the data consistency mechanisms in Oracle and SQL Server. I’ve read about transaction isolation levels a few times, but didn’t really realise that Oracle has no dirty reads like SQL Server does (Oracle does not offer a Read Uncommitted mode).

Oracle SCN Mechanism

When a query begins, the current system change number (SCN) is determined; say the SCN is 10 (the original article illustrates this with a diagram). As data blocks are read, only blocks with the observed SCN or older are used; blocks with more recent SCNs, containing changed or uncommitted data, are not read. This guarantees that the data read does not change during the query. This is why there are no dirty reads (reading uncommitted data) in Oracle. Source: Link

In SQL Server, when data is being read, the table is locked. Even though the table is locked, other processes can modify the data: rows can be deleted and updated, and new rows can appear. The lock only prevents SQL Server from reading uncommitted transactions (dirty reads); it does read new and deleted rows (which is correct, as we want the latest data). With the NOLOCK hint, SQL Server will read uncommitted transactions. Source: Link

So, by default, both SQL Server and Oracle allow phantom reads and non-repeatable reads but prevent dirty reads. They just do it with different mechanisms: Oracle uses the SCN, SQL Server uses locks.

There are 3 classic issues with data consistency:

  • A dirty read is when a process reads a row written by an uncommitted transaction. If that transaction is rolled back, the read becomes incorrect.
  • A non-repeatable read is when a transaction rereads a row it has previously read (for example, to update it) and finds that another process has modified or deleted the row.
  • A phantom read is when a transaction reruns the same query and finds a new row inserted by another process.

Here are the 4 isolation levels; “Yes” means the issue can still occur at that level:

Isolation Level   | Dirty Read | Non-Repeatable Read | Phantom Read
Read uncommitted  | Yes        | Yes                 | Yes
Read committed    | No         | Yes                 | Yes
Repeatable read   | No         | No                  | Yes
Serializable      | No         | No                  | No
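In SQL Server the level is set per session; here is a sketch of moving up a level (table1 is again just an example table):

set transaction isolation level repeatable read
begin transaction
select * from table1 where column1 = 1
-- the shared lock is now held until commit, so another process cannot
-- update or delete this row (no non-repeatable read), but it can still
-- insert a new row matching the where clause (a phantom read)
commit transaction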

26 December 2012

Formatting Dates in SQL Server 2012

Filed under: SQL Server — Vincent Rainardi @ 1:46 pm

Finally, after a long wait, we have “Format” in SQL Server, which enables us to format dates. Oracle has had to_char to format a date into any format we want, e.g. to_char(date1, ‘YYYY-MM-DD’), since Oracle 9i (2001). Now in SQL Server 2012 we have the Format function, which we can use to format a date any way we want.

It’s quite straightforward to use, especially if you know .Net formatting, but a few notes will probably help:

  1. Month is M (m is minute). Year is yyyy, not YYYY.
  2. M is a single-digit month, MM is double-digit.
  3. MMM is the month short name (e.g. Dec), MMMM is the long name, e.g. December.
  4. Day of the week is ddd (Wed) and dddd (Wednesday).
  5. hh is the 12-hour clock, HH is 24-hour (h or H for a single digit).
  6. Milliseconds is fff; ff is hundredths of a second (f is 1/10, ffff is 1/10000).
    For trailing zeros, lowercase f prints the zero, uppercase F prints nothing.
  7. The escape character is \.

Examples:

select format(getdate(), 'yyyy-MM-dd')
2012-12-26

select format(getdate(), 'MM/dd/yyyy')
12/26/2012

declare @a datetime2 = '2012-12-26 13:30:27.1234'
select format(@a, 'yyyy-MM-dd hh:mm:ss.fff')
2012-12-26 01:30:27.123

declare @a datetime2 = '2012-12-26 13:30:27.0004'
select format(@a, 'yyyy-MM-dd hh:mm:ss.FFF')
2012-12-26 01:30:27

declare @a datetime2 = '2012-12-26 13:30:27.0004'
select format(@a, 'yyyy-MM-dd hh:mm:ss.fff')
2012-12-26 01:30:27.000

select format(getdate(), 'ddd')
Wed

select format(getdate(), 'dddd')
Wednesday

declare @a datetime2 = '2012-12-26 11:00:00'
select format(@a, 'HH:mm\h')
11:00h

The complete list of formatting patterns is here:
http://msdn.microsoft.com/en-US/library/8kb3ddd4.aspx

Note that we can also use the standard .Net formats, e.g. d, D, g, G:
http://msdn.microsoft.com/en-us/library/az4se3k1.aspx

declare @a datetime2 = '2012-12-26 13:30:27.0004'
select format(@a, 'd')
select format(@a, 'D')
select format(@a, 'g')
select format(@a, 'G')
12/26/2012
Wednesday, December 26, 2012
12/26/2012 1:30 PM
12/26/2012 1:30:27 PM
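Format also takes an optional third parameter for culture, which changes what the standard formats produce; a quick sketch (the outputs below are what I would expect for these cultures):

declare @a datetime2 = '2012-12-26 13:30:27'
select format(@a, 'D', 'en-US') -- Wednesday, December 26, 2012
select format(@a, 'd', 'en-GB') -- 26/12/2012
select format(@a, 'D', 'fr-FR') -- mercredi 26 décembre 2012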

30 August 2012

Make yourself sysadmin on SQL 2008 Express

Filed under: SQL Server — Vincent Rainardi @ 4:20 pm
Tags:

How to give yourself sysadmin permission on SQL Server 2008 Express

Caveat: you need to be a member of the local Administrators group on the computer where SQL Express is installed.

Here’s how:
1. Stop SQL Express using Configuration Manager (or Windows services)

2. Start SQL Express in single-user mode:
cd \Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Binn
sqlservr.exe -sSQLEXPRESS -m
It will say “SQL Server is now ready for client connections”.

3. Open another cmd window, go to the SQL Express Binn folder as in step 2 above, and type:
osql -S ComputerName\SQLExpress -E
(you do need the ComputerName\ in front)
This logs you in using your Windows credentials.
Note: you can’t log in using Management Studio (if you have installed Mgt Studio 2008), because single-user mode allows only one connection and Object Explorer takes it.

4. Create a login for your Windows login:
1> CREATE LOGIN [domain1\windowslogin1] FROM WINDOWS
2> GO

5. Assign your login to the sysadmin role:
1> EXEC sp_addsrvrolemember @loginname='domain1\windowslogin1', @rolename='sysadmin'
2> GO

6. Shutdown SQL Express
1> shutdown
2> go

7. Exit/close the cmd prompt window

8. Start SQL Express as normal using SQL Server Configuration Manager, i.e. with the normal startup parameters:
-dc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\master.mdf;-ec:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Log\ERRORLOG;-lc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\mastlog.ldf

9. Connect to SQL Express as normal using SSMS

Voila, you are a sysadmin.
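To double-check, after reconnecting this should return 1:

select is_srvrolemember('sysadmin')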

Thanks to Chris Randall:
http://blogs.ameriteach.com/chris-randall/2009/12/11/sql-server-2008-forgot-to-add-an-administrator-account.html (pity the screenshots are cropped)

Ref: Connect to SQL Server When System Administrators Are Locked Out
http://msdn.microsoft.com/en-us/library/dd207004.aspx

27 September 2011

Top 2 things I missed in SQL Server and Oracle

Filed under: Data Warehousing,Oracle,SQL Server — Vincent Rainardi @ 6:08 pm

In the last few months I have been using both SQL Server and Oracle every day, because the source system is in SQL Server and the data warehouse is in Oracle. I began to miss little things in SQL Server that I don’t get in Oracle, and vice versa. I thought I’d write this experience down as it could be useful for others.

In this article, when I say Oracle I mean using Toad, and when I say SQL Server I mean using SSMS. You can use Toad with SQL Server, and you can use other UIs for Oracle, but that’s for another article. Versions? SQL Server 2008 and Oracle 10g R2 (though it doesn’t really matter whether it’s 10g or 11g, as I use Toad as the UI).

Oracle: Control Enter

First is Control+Enter to run the SQL statement at the cursor. In SQL Server you need to highlight the code you want to run before pressing F5; in Oracle you just press Control+Enter, without highlighting anything. Very useful; I use it every day.

Oracle: No need for “top 1000”

You don’t need to type “top 1000” when selecting from a large table; Toad automatically brings back only 500 rows. That can be useful and can be annoying; strangely, I found it very useful.

Oracle: Column Selection

It is called IntelliSense in SSMS 2005/2008, but the Toad version is better: it lets you pick the columns you like. When you type a table alias followed by a dot, it shows the column names. But you have to alias the table first, i.e. type “select from fact_interest_rate_swap S”, then go back, put the cursor between “select” and “from”, and type “S.” (S dot); it will list all the columns of the fact_interest_rate_swap table.

Oracle: Count(*) Works

For the last 6 months or so I have been working on a project where we have a column called countrysomething, and in SSMS every time I type count(*), IntelliSense suggests this country column instead. It is so irritating that I have to carefully press Escape after typing count but before typing “(”. In Toad, count(*) just works.

SQL Server: Copy including Header

This SSMS feature is very useful for putting data into Excel. In Oracle/Toad we need to use Export to clipboard to get the header.

SQL Server: Object Explorer

The saying is true: you don’t know how valuable something is until you don’t have it. In Toad we have the Object Palette, but it’s not as good as SSMS’ Object Explorer, where you can browse columns, stored procedures, etc. In the Toad Object Palette you can only browse tables, not columns.

SQL Server: Top N

We won’t know how useful top N is until we don’t have it. In Oracle, a SQL Server guy will wonder why “select top 100 * from table1” doesn’t work; there it’s “select * from table1 where rownum <= 100”. There is one catch, shown in the sketch below.
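The catch (table1 and column1 are just example names): rownum is assigned before order by, so to get the top 100 by a column you need a subquery:

-- wrong: takes 100 arbitrary rows first, then sorts just those 100
select * from table1 where rownum <= 100 order by column1 desc;

-- right: sort in the inner query, then limit in the outer one
select * from (select * from table1 order by column1 desc)
where rownum <= 100;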

In Informix and Ingres it’s FIRST: select first 100 * from table1. Informix also has a very handy SKIP clause: select skip 10 first 100 * from table1.

In DB2 it’s FETCH FIRST: select * from table1 fetch first 100 rows only

Ending

Well, it’s not 2 things, it’s more like 11 things, I know. But who’s counting anyway? 🙂

20 May 2011

Create table error: Concat Null Yields Null

Filed under: SQL Server — Vincent Rainardi @ 10:45 pm

Today I was trying to create a table and got this message:

SQLServer1: DDL_DATABASE_LEVEL_EVENTS: Error: INSERT failed because the following SET options have incorrect settings: ‘CONCAT_NULL_YIELDS_NULL’. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
Msg 3616, Level 16, State 2, Line 1
An error was raised during trigger execution. The batch has been aborted and the user transaction, if any, has been rolled back.

The DDL was:

create table schema1.table1
( column1 int not null,
  column2 varchar(10),
  column3 varchar(20),
  constraint PK_table1 primary key clustered
  ( column1 )
)

Solution: Tools menu, Options, Query Execution, SQL Server, Advanced: make sure “SET CONCAT_NULL_YIELDS_NULL” is checked.

If it is not checked, the create index statement will fail. The above DDL tried to create a clustered index for the primary key, hence the error.
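Alternatively, rather than changing the SSMS option, we can set the option in the script itself before the DDL, along these lines:

set concat_null_yields_null on
go
-- the create table above (with its clustered primary key) now succeeds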

For further info see http://msdn.microsoft.com/en-us/library/ms176056.aspx
