Data Warehousing and Data Science

30 August 2012

Make yourself sysadmin on SQL 2008 Express

Filed under: SQL Server — Vincent Rainardi @ 4:20 pm

How to give yourself sysadmin permission on SQL Server 2008 Express

Caveat: you need to be a member of the local Administrators group on the computer where SQL Express is installed.

Here’s how:
1. Stop SQL Express using Configuration Manager (or Windows services)

2. Start SQL Express in single-user mode:
cd \Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Binn
sqlservr.exe -sSQLEXPRESS -m
It will say "SQL Server is now ready for client connections".

3. Open another cmd box, go to the SQL Express Binn folder as in step 2 above, and type:
osql -S ComputerName\SQLExpress -E
(you do need the ComputerName\ in front)
This logs you in using your Windows credentials.
Note: you can't log in using Management Studio (if you have installed Management Studio 2008); in single-user mode only one connection is allowed, and SSMS typically takes more than one.

4. Create a login for your Windows login
1> CREATE LOGIN [domain1\windowslogin1] FROM WINDOWS
2> GO

5. Assign your login to sysadmin role
1> exec sp_addsrvrolemember @loginname='domain1\windowslogin1', @rolename='sysadmin'
2> go

6. Shut down SQL Express
1> shutdown
2> go

7. Exit and close the cmd prompt windows

8. Start SQL Express as normal using SQL Server Configuration Manager, i.e. with the normal startup parameters:
-dc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\master.mdf;-ec:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Log\ERRORLOG;-lc:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\DATA\mastlog.ldf

9. Connect to SQL Express as normal using SSMS
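
To double-check, run this in a query window; IS_SRVROLEMEMBER is a built-in function that returns 1 when the current login is a member of the role:
SELECT IS_SRVROLEMEMBER('sysadmin')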

Voilà, you are a sysadmin.

Thanks to Chris Randall:
http://blogs.ameriteach.com/chris-randall/2009/12/11/sql-server-2008-forgot-to-add-an-administrator-account.html (pity the screenshots are cropped)

Ref: Connect to SQL Server When System Administrators Are Locked Out
http://msdn.microsoft.com/en-us/library/dd207004.aspx

17 August 2012

The Trio Maestro’s SSAS 2012 Tabular Book

Filed under: Analysis Services — Vincent Rainardi @ 6:08 pm

There are a lot of Microsoft BI developers in the world. There are also many SQL Server DBAs who have been trying to get their hands into Microsoft BI. If, back in 2005 when SSAS 2005 came out, you spent a month familiarising yourself with the new SSAS and MDX, your career would have been in a comfortable position between 2005 and 2012. In London this means £50k/year as a perm, or £500/day as a contractor. Have a look at itjobswatch.com for the numbers. I believe the pay is good in other countries too.

This year, 2012, SSAS changes dramatically again. It is shifting from the multidimensional cube to Tabular, à la PowerPivot. Well, not "shifting" exactly, but opening whole new ground. This is the time to invest a month of your time learning Tabular, and to grab that evolving new market. Tabular and DAX are Microsoft's answer to Spotfire and Qlikview. Self-service BI is a new, growing market and the skills are scarce. We can expect the Tabular pay rate to be as high as Qlikview's and Spotfire's. Qlikview is £50k/year perm, £450/day contract, and Spotfire is £55k/£525 (again, have a look at itjobswatch and jobserve!), but it would be more difficult for an MS BI developer to get into them. It is easier to get into SSAS Tabular. This is your opportunity, my fellow MS BI developer: grab a book, sit down every night for a month, and label yourself SSAS Tabular.

SSAS is 10 times more popular than Qlikview, and 35 times more popular than Spotfire (see itjobswatch), and yet the pay level is almost the same. This is why you need to grab the trio's Tabular book. Chris, Marco & Alberto are the top guys in SSAS. And they dedicated their time, putting pen to paper* for months, to write this book. They lay out the foundations bit by bit so brilliantly, guiding you step by step into the Tabular world. And they did it very systematically (I can see the editor's work here).

*Well, "got stuck into their laptops" is probably a better expression. Who uses pen and paper these days to write books? Let alone IT books?

And they really put their knowledge into it. When they wrote their first book I said it was a steal, and so is this one! When people like them write a book about SSAS, don't even think twice: just grab the book and read it from cover to cover. I don't care whether you pick Teo's book or the trio's book, but if you are in MS BI now and don't get into Tabular, you will miss a big wave. I've read the trio's book (they gave me a copy for review), and judging from his previous books, which I've read, I'm sure Teo's Tabular book is also brilliant; see "Who's who in SSAS", which I wrote 2 years ago.

So folks, here are the links to their books:
Chris, Marco, Alberto’s Tabular book: US CA UK FR DE IT ES JP CN
Teo Lachev’s Tabular book: US CA UK FR DE IT ES JP CN
(choose your country, gosh there are so many!)

One more thing about this trio: they are all SSAS Maestros. For those of you who don't know what an SSAS Maestro is, suffice it to say that
a) it means they are among the world's top guys in SSAS, and
b) it is very difficult to get, much more difficult than MCTS. You need to be invited, and only the experts are invited (for the training, followed by the exam). Read Marco's story about it here. And Vidas' experience here. Look at the list of books and material he used to prepare for it! And the passing rate is only 10%! Here is the list of the 11 maestros from the first wave. And these 3 authors are on that list! They are all SSAS Maestros! Need I say more to convince you to buy their book?

Think about it. It's a brand new world. Even your interviewers wouldn't know about Tabular. You could pass the interviews easily! This is the time to get into SSAS. Not as the old world knows it (multidimensional cubes), but the new way: Tabular. In the EU, London, Dublin and Zurich are the primary markets. The international HQs in these cities would be the first users.

For those of us who are already in SSAS, we need to get into Tabular too! No, not "need"; we HAVE to get into Tabular. If we don't, we will only know half of SSAS, and that's not good for our career. Not good at all! Any MDX expert out there, any AS cube expert, needs to learn DAX and Tabular. This is the way the market is going. This is what the jobs demand. They won't advertise like this: "Required: multidimensional AS developer. Note: we don't need the Tabular side." Any company advertising for an SSAS developer will require both sides. Therefore, we have to learn Tabular. It's a brand new world.

Vincent Rainardi, 17/8/2012

1 August 2012

Initial Data Load

Filed under: Data Warehousing — Vincent Rainardi @ 6:01 pm

After we migrate new ETL code into Production, we may need to load historical data, load past transactions, update/insert dimension rows, or update some attributes. In data warehousing this is known as initial data load (IDL). IDL doesn't only happen when a DW goes live for the first time; every single time new ETL code goes to Production, we may need to do an IDL.

In data warehousing projects, people tend to underestimate the effort and time involved in the initial load. In this article I am going to explain why the Initial Data Load (IDL) can take significant effort.

How big are we talking? I've seen a project where they estimated that the IDL would take 2 man-days of effort and 3 days of elapsed time, but in reality it took 20 man-days (10 times the estimate) and 2 months of elapsed time (20 times longer than the estimate). If we are not careful, IDL can blow the project budget and ruin the whole project.

The following things need to be prepared for an IDL:

  1. Need to Write IDL Scripts
  2. Need to Prepare the Data to be Loaded
  3. Need to Test in Production-Like Environment
  4. Need to Prepare the Required Capacity in Production
  5. Need to Schedule Production Down Time
  6. Need to Schedule Who’s Going to Run it in Production
  7. Need to Schedule Who’s Going to Sign it off in Production

1. Need to Write IDL Scripts

We may need to load fact rows. We may need to load dimension rows. Or we may need to update the dim/fact table(s) with SQL scripts. These scripts need to be designed and confirmed with the business, coded, tested in Dev, run in the SIT/UAT environment using SIT/UAT data, and validated/checked. When it's all good, we run them again in the SIT/UAT env, using real Production data.

You should never run a SQL script in Production unless that script has been tested/run with real Production data in a SIT/UAT environment. It is suicidal and you could pay a very high price. The more complex the scripts, the more mandatory this "test with production data" step becomes.

Because of this "test with production data" task, the development effort for the IDL script can easily increase by 10 or even 20 times.

Typical IDL scripts are:

a) New attribute: set all historical values to a constant/fixed value
b) New attribute: set all historical values based on a file containing 2 columns: the business key and the attribute value.
c) New attribute: like b), but with certain business rules for a type 2 attribute, i.e. only look at the effective date and ignore the expiry date, etc.
d) Change the data type of an attribute/measure: recalculate the past versions of the attribute/measure to a higher precision.
e) New measure: load the past values from files (the file names contain dates).

Basically, every time we create a new attribute or measure in an existing dim or fact table, we need to ask ourselves the important question: what should the values of this attribute (or measure) be for past records?
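
As an illustration, here is a minimal sketch of script type b): backfilling a new attribute from the 2-column file after it has been bulk-loaded into a staging table. The table and column names (dim_customer, stg_customer_segment and so on) are hypothetical:

-- Assumes the file has been loaded into stg_customer_segment (business_key, segment)
UPDATE d
SET d.segment = s.segment
FROM dbo.dim_customer d
JOIN dbo.stg_customer_segment s ON s.business_key = d.customer_business_key
WHERE d.segment IS NULL;  -- only touch rows that haven't been set yet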

2. Need to Prepare the Data to be Loaded

In the cases of 1b, 1c and 1e above, we need to prepare the data files to be loaded into the data warehouse. The preparation of these data files can take significant effort and/or time.

a) The person assigned to create the files may not have access to the production data. Getting access can take a week or two and delay the process. The production data may contain confidential information, and the developer may need to be cleared first (for example, in an investment bank where the FSA requires public/private separation, i.e. Chinese walls). The developer might be a contractor (rather than a permanent employee), who in big financial companies may have restricted access to production data.

b) The data warehouse team may require SIT data files in addition to the Production data files. And they may require the SIT data files to contain "boundary values", "unknown values" and "exception values", which take more effort to produce.

c) We may not have access to the source system and have to ask the source system team to produce the data. The source system team may not have the data for certain time periods, forcing you to go back to the drawing board. Or, if they do have the data, they may not have the resources to do it until next month, which will delay your DW project.

d) To extract the data from the Prod source system, you may have to prepare a SQL script or an ETL package (a sketch follows below). This script/package will need to be tested first in SIT or UAT before it is run in Production. This takes time. Do not run any script or package in Production without first testing it successfully in a test env (SIT/UAT). If your company has a good IT organisation, there will be processes in place to ensure that all scripts/packages are tested and signed off before they are run in Production. And if one causes a production issue, the person who signed it off will be held responsible.

Because of all the points above, what you thought would be "ah, it only takes a day to create that IDL data file" may take weeks of elapsed time and effort, causing significant project delays.
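
For point d), here is a minimal sketch of such an extract using bcp (the server name, database and query are hypothetical; -T uses your Windows credentials, -c writes character data, -t sets the column delimiter):

bcp "SELECT customer_id, segment FROM SourceDB.dbo.customer WHERE status = 'A'" queryout customer_segment.txt -c -t"|" -S SourceServer -T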

3. Need to Test in Production-Like Environment

What I mean by this is that we need to test the IDL script or ETL package which updates the production data in a Production-like environment. Let's call this environment the "IDL Test Env" (ITE). Say your Production ETL code is version 3.5 but your SIT is version 3.6. Obviously you cannot run/test the IDL script/package in the SIT env. You need to create an ITE containing Prod data and version 3.5 of the ETL code. If you run it in SIT and it runs successfully, that doesn't mean anything: it might still fail in Production, because Prod is on version 3.5 whereas SIT is on version 3.6.

Preparing an ITE with version 3.5 of the ETL may take weeks, and again this can cause significant delay to the project. The DBA needs to prepare DW tables which are exactly like Prod. The infrastructure people need to deploy the correct version of the ETL code, and they need to "dry run" the ITE first. Then they need to reset the ITE, ready for the real test.
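
A common way to get DW tables exactly like Prod into the ITE is to restore the latest Production backup. A minimal sketch, where the database name, logical file names and paths are all hypothetical:

RESTORE DATABASE DW_ITE
FROM DISK = N'\\backupserver\backups\DW_Prod_full.bak'
WITH MOVE 'DW_Data' TO N'E:\ITE\DW_ITE.mdf',
MOVE 'DW_Log' TO N'F:\ITE\DW_ITE_log.ldf',
REPLACE, STATS = 10;  -- REPLACE overwrites any previous ITE copy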

4. Need to Prepare the Required Capacity in Production

Say next week you are going live for the first time (say it's Tuesday 1/8/12), and as part of the business requirements you need to load the last 2 years of data into all the fact tables. You need to calculate how much space is required to do the IDL for these 2 years of data. Is it 500 GB? Is it 1 TB? You then need to discuss with the DBA/infrastructure team the data volume you are about to load into the Production environment.
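
A rough way to estimate the volume is to take the current average bytes per row of each fact table (measured on a SIT/UAT copy with representative data) and multiply by the expected number of historical rows. A minimal sketch, with a hypothetical fact table name:

SELECT t.name AS table_name,
SUM(ps.row_count) AS row_count,
SUM(ps.used_page_count) * 8 AS used_kb,
SUM(ps.used_page_count) * 8.0 * 1024 / NULLIF(SUM(ps.row_count), 0) AS avg_bytes_per_row
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
WHERE t.name = 'fact_sales'  -- hypothetical fact table
AND ps.index_id IN (0, 1)  -- heap or clustered index only
GROUP BY t.name;
-- projected IDL volume = avg_bytes_per_row x expected historical rows, plus extra for nonclustered indexes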

The infrastructure team may tell you that they need to order more disks and that this takes a few weeks. To avoid this disaster, it's best to actively manage the data warehouse capacity as a routine, regular exercise, not just when we do an IDL. For example:

Suppose the infrastructure or DW manager expands the DW capacity every January, to 1,400 GB of available space. The average consumption is about 50 GB/month. In Dec 2010 the available DW space was 420 GB and in Dec 2011 it was 610 GB. The go-live in Aug 2012 brings the available DW space down to 600 GB, and there are still 4 months of the year to go.

The infrastructure / DW manager would not be happy, because this means that by Dec 2012 the available space in the DW would be only 600 − 4×50 = 400 GB. Meaning that a) there's little room for another initiative, and b) there is a risk of running out of disk space. So the manager would need to order extra disk space asap (mid-year), breaking the normal January schedule.

So, tell whoever manages the DW capacity your estimate of the data volume (500 GB in the above case) well before you do it for real in Production, so that they can prepare in advance and manage the capacity properly.

5. Need to Schedule Production Down Time

When we load data into a Production DW, we often need extra downtime. For example, if we schedule the IDL from Saturday morning to Sunday night, we may need to reserve Monday morning as a precaution in case the IDL overruns. If the IDL is estimated to take 4 days, we may want to break it across 2 weekends.

Some IDLs need to be run mid-week, particularly when the DW is already live. In this case we may need to schedule Production downtime. For example, you need to load new data on Wednesday next week, so you send out a downtime notification to the affected users this Friday. If you use materialised views, it is not certain downtime, only the possibility of downtime. But if you don't use MVs, then you'll need to revoke all user access and have DW/BI production downtime.

To that end, it is important to make the IDL script "runnable bit by bit". We need to run the IDL script bit by bit, not all in one go. For example, if you need to load 25 months of data, load just 1 month first and check how long that 1 month takes. If it takes 1 hour, then load 3 months of data. These 3 months can take 5 hours or 2 hours (it's not linear), and you need to adjust the timing of the subsequent loads accordingly. Loading 25 months' worth of data in one go is asking for disaster.
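
Here is a minimal sketch of such a bit-by-bit load, one month per iteration, printing the duration so you can review each slice before letting the next one run (the table, columns and date range are hypothetical):

DECLARE @month date = '2010-07-01'  -- first of the 25 months
DECLARE @last date = '2012-07-01'  -- last month to load
WHILE @month <= @last
BEGIN
    DECLARE @t0 datetime2 = SYSUTCDATETIME()
    INSERT INTO dbo.fact_sales (date_key, product_key, amount)
    SELECT s.date_key, s.product_key, s.amount
    FROM staging.sales_history s
    WHERE s.transaction_date >= @month
      AND s.transaction_date < DATEADD(MONTH, 1, @month)
    PRINT 'Loaded ' + CONVERT(varchar(10), @month, 120) + ' in '
        + CAST(DATEDIFF(SECOND, @t0, SYSUTCDATETIME()) AS varchar(10)) + ' s'
    SET @month = DATEADD(MONTH, 1, @month)
END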

6. Need to Schedule Who’s Going to Run it in Production

Who will run the IDL script? Is it the DBA? Is it the developer? In large corporations it's usually the Production DBA, with the developer on stand-by. In small companies it's usually the developer. Whoever it is, it is usually done out of office hours, i.e. at the weekend or at night. The DW manager needs to arrange time off in lieu or overtime for the DBA/developer.

7. Need to Schedule Who’s Going to Sign it off in Production

In large corporations there is usually a system that tracks production deployments. The deployment steps, the back-out steps and the contact persons are all detailed there. Good corporations also require proof of testing and a sign-off document. Before they schedule a production deployment, they will insist on having the UAT sign-off.

This sign-off can be politically risky, particularly for a large deployment. The person signing off the UAT will be a prime target for blame if any functional error is found in Production. "Why did you miss that important piece of functionality?" is the question they will have to face. Hence, as part of preparing for the IDL, we need to clearly indicate who will be signing off the UAT.

After the UAT sign-off is obtained, production support will need to sign off the production deployment. They understand the least risky time for deployment. For example, it may be the development team's wish (and the business' wish) to run the IDL on Wednesday, but if the Production Support team decides that it is too risky, they can veto it (for the right reasons) and insist on running it the following weekend instead.
