Data Platform and Data Science

16 November 2011

SSIS: True/False and 1/0 values on Bit Columns

Filed under: SSIS — Vincent Rainardi @ 11:13 pm

The column is defined as bit on a SQL Server table. You created an SSIS package to export the table to a text file. In Mgt Studio when you query the table, the bit column displays 1 and 0 as its value, but in the output text file they are true and false.

You looked everywhere for what might have caused it to be true/false. You suspected the text file connection. But it is defined as string[50] there. So you changed the SQL query inside the workflow from

select column1 from table1

to: select convert(int, column1) from table1

But it is still the same. The output is still true/false not 1/0 as you see in Mgt Studio.

So you convert to varchar like this:

select convert(varchar, convert(int, column1)) from table1

You hoped that it will now be 1 and 0, but alas no. It is still true and false.

So you changed the data type in the text file connection from string to four byte signed integer.

What you get is -1 for true and 0 for false. Yes! Not 1 and 0 but -1 and 0.

So you changed the data type in the text file connection to four byte unsigned integer. Ha this will force it to be 1 and 0 you thought.

But alas, no! What you get is a very big number for true and 0 for false.f

So here the secret: right click on the OLE DB Source component in the data flow, and choose Show Advanced Editor. Go to the last tab, Input and Output Properties:

Go to OLE DB Source Output, Output Column (NOT External Columns!), then select the bit column. Change the DataType from Boolean to Four Byte Signed Integer.

Change your query to:

select convert(int, column1) as column1 from table1

Note: If you don’t the above the output will be -1 and 0, not 1 and 0.

And execute the data flow.

Voila, you get 1 and 0 in the output text file.

Note: whether in the flat file connection you defined the output columns as boolean or string or signed int, it will still be 1 and 0.

Hope it helps. As usual I welcome comments and corrections.

Vincent Rainardi, 16/11/2011

14 November 2011

WhereScape RED

Filed under: Data Warehousing — Vincent Rainardi @ 7:41 am

About 4 years ago I had an idea (well, more of a dream really) of “building a data warehouse at the press of a button”. This means that everything is automated, including the development of the reports/cubes and the development of ETL. I understand that we can build SSIS, SSRS, and SSAS programmatically (and Informatica PowerCenter mappings too). I also understand that it is painfully slow and tedious process. But we can make the process easier by creating a few templates then at run time we choose the nearest match template and modify it.

This works for ETL, reports and cube. But how about tables? Corresponding dim and fact tables could be generated on the fly by analysing the source tables based on its data types: numerical columns become measures and textual columns become attributes. Yet, I admit it is a difficult process, definitely requires human involvement. Still, it’s easier than doing it from scratch. We present the “machine made” fact & dim tables to the designer, who will then “fix” the tables. And the “which source column goes where” would be automatically fed to ETL creation module.

This was I think what Kalido does. But it does not create the cube and reports. It only creates the ETL. If I’m mistaken here I’d be happy to be corrected. I do feel strange that someone automate the DW build but not the BI part.

Wherescape RED

Like Kalido (and Insource Data Academy), Wherescape RED creates the ETL automatically. It also minimizes the time for designing the warehouse tables. Dimensions can be built by drag-and-drop the source table. SCD type 2 is also automated. RED also builds the cube automatically and the MicroStrategy project too. Documentation is also automated.

This article is about Wherescape RED so I’m going to go into a little bit more detail about how it is done.

In high level, the steps are:

  1. Load source tables
  2. Create dimensions
  3. Create dimension views
  4. Create staging tables
  5. Create fact tables
  6. Create SSAS cubes
  7. Create MicroStrategy reports

Having used it myself (evaluation copy) I found that it is easy to use. Step by step tutorial with screenshots is provided to it’s easy to do step 1 to 6 above (I have tried number 7 as it requires MicroStrategy extension to be installed – not included in the evaluation copy).

This is how it looks:

Lots of operations are done by drag and drop, and fill in the properties. It’s user friendly, easy to use, and quite simple to understand (note: you need to understand data warehousing of course).

You can use SQL Server, Oracle, DB2 or Teradata for your warehouse platform. These four are very popular platform for data warehousing. In the next release they include Netezza. No RED does not do Informix, Sybase or MySQL data warehouses.

The question that some people have about RED is: “Is it only suitable for simple DWs?” No it isn’t. Whether it’s complex fact table or complex dimension, we can use RED to build it. It can build aggregate fact table, SCD, hierarchy, KPI. We can also modify the generated SQL script if we need to do any custom stuff. You can also define indexes and create SQL agent jobs/scheduler.

The other questions are about the platform: Can it do BO? Informatica? Cognos reports? No, unfortunately. Any reporting tool such as BO/Cognos/RS can be used, but not automatically. You have to build all the reports manually. The ETL is SQL Script (T-SQL in SQL Server, PL/SQL in Oracle), or SSIS package. Yes RED can generate SSIS packages to load the data from source to stage/load tables, but not from stage to fact/dim tables. It doesn’t generate code for any other ETL tool: Ab Initio, OWB, ODI, PowerCenter, etc. Only SSIS.

The main disadvantage of RED (this is what people often ask when evaluating), is that it is ELT. It uses SQL Server T-SQL/Oracle PL-SQL to load data from the load tables to the DW/dim model (SSIS only load to stage/”load tables”). It doesn’t do ETL. Only ELT. Celinio explains the difference between ETL and ELT (link). Rob Davenport from Insource /Data Academy provides excellent white paper explaining advantage/disadvantages of ETL and ELT in great length.

ELT has its own advantages: flexible, easy to maintain programmatically*, simpler, lower cost (use DB server, not separate server). *difficult to maintain manually because it’s lots of SQL code. But of course once RED generates the SSIS package you can add all the transformation you like.

Second disadvantage is obviously the platform. It does not do Informix, or Sybase data warehouses. Or Ingress, MySQL or Greenplum. RED will be able to do Netezza DWs in the next release.

The main advantage of RED is fast development time. It surely cuts a lot of development time. My estimate is that a warehouse which takes 3 months to build using custom development (say using SQL Server + SSIS or Oracle + Informatica), can be built in 1 months using RED. And after it is built, changing or expanding is also quick.

Second advantage is ease of use. It’s a mature product, been there for years, so it’s easy to use. I found by following the tutorial within a day I’m able to use the product to build a DW. Very short training time. Proper ETL tools such as SSIS/Informatica requires weeks of basic training, plus weeks of on the job training (or buy the skills at £400-£500 / day).

Third advantage is once it’s built you can leave RED and continue to do it manually, maintaining the generated tables and ELT scripts or SSIS packages yourself. Expanding it as you need to, and developing the SSIS packages as you need to. And building the reports using your tool of choice, e.g. RS, Cognos or BO.

4th advantage and this is what I like best as I work quite a lot with cube: RED generates the SSAS cube. And RED also builds MicroStrategy project, enabling excellent reporting capability which is very flexible.

Conclusion

In my opinion Wherescape lives up to its reputation. We can use it build a data warehouse in much shorter amount of time. And this, in my experience, is what every BI consultant need. Reports only take a month to build but the DW takes 6 months* to build. If you can cut it down to 2 months it will be a big advantage, money wise and time/PM too. *that’s a small one, a big one is 18 months. The biggest plus point is that we can expand it later on: further develop and customise the SQL Scripts/SSIS package, SSAS cube, MicroStrategy projects for reporting. Unfortunately it does not generate Informatica or SSRS/BO/Cognos, some of the most popular tools in DWBI.

As always I’ll be glad to receive comments or corrections.
Vincent Rainardi 14/11/2011

Blog at WordPress.com.