Data Warehousing and Data Science

31 May 2011

Differences Between Data Warehousing and Business Intelligence

Filed under: Business Intelligence,Data Warehousing — Vincent Rainardi @ 6:38 pm
Tags: ,

Try asking your colleague what is the difference between business intelligence and a data warehouse. I find that a lot of people, even those who work in BI projects and BI industry, do not understand the difference. A lot of people use these 2 terms interchangeably. Some people even prefer to use 1 term instead of the other because it simply “sounds better”. Many people think that business intelligence is not just a data warehouse, but there is more to it. But when asked “what business intelligence systems are not data warehouse systems?” or “what part of business intelligence systems are not data warehouses?”, most of them have difficulties explaining the answer.

These days, “business intelligence” is the norm used by most vendors in the industry, rather than “data warehouse”. Most of them call / classify their tools as business intelligence software, not data warehouse software. The name of Cognos product is “Cognos 8 Business Intelligence”. BusinessObjects label themselves as “BI software company” and “global leader in BI software”. The name of one of Hyperionproducts is “Hyperion System 9 BI+”. SAS Enterprise BI Server provides a fully integrated and comprehensive suite of business intelligence software. Microsoft promotes SQL Server 2005 as the end-to-end business intelligence platform. It seems that only Kimball Group who consistently use the term data warehouse. Bill Inmon, as the inventor of this term, also uses the term data warehouse.

So, let’s get into the details. This is an example of a data warehouse system:


It includes ETL from the source system, front end applications (those 10 boxes on the right hand side), and everything in between. It has a control system, an audit system and a data quality system (also known as data firewall). Not all data warehouse systems have all the components pictured above, for example, some data warehouse system may not have operational data stored (ODS), see this article for details.

The 2 blue items are data warehouse databases. The cylinder is in relational format (labelled as dimensional data store, DDS for short), the box is in multidimensional format (labelled as cubes in the picture above). This blue cube is also known as on line analytical processing cube, or OLAP cube for short.

The yellow items are business intelligence applications. Most business intelligence applications take data from multidimensional format data warehouse, but some do take data from the relational format. The whole diagram above is also known as business intelligence system.

Some business intelligence applications take data directly from the source system. For example, some dashboard systems may get sales summary data from the source system and display it in gauge meter format. In this case, we can not call the system a data warehouse system. It is still a business intelligence system, but it is not a data warehouse system, because it does not have a data warehouse database behind the gauge meter application.

Business intelligence systems, in the past also known as executive information systems, or decision support systems, are a non-transactional IT system used to support business decision making and solve management problems, normally used by top executives and managers. Many varied definitions exist in the market place today about the business intelligence system; one from Dr. Jay Liebowitz is arguably one of the better ones. Most people agree that OLAP and data warehouse systems are a major and important part of business intelligence systems. Most business intelligence systems are in the form of a data warehouse systems. Yes, there are business intelligence systems that do not use OLAP or data warehouses, as illustrated in the example of gauge meter application above, but they are more rare than the ones with OLAP or a data warehouse.

According to Ralph Kimball, in his book The Data Warehouse ETL Toolkit, a data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making. He stressed that a data warehouse is not a product, a language, a project, a data model or a copy of transaction system. In an interview with Professional Association for SQL Server (PASS) on 30th April 2004, he explained about the relationship between data warehousing and business intelligence.

In their latest book, The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren Thornthwaite do not differentiate data warehouse systems and business intelligence systems. They consistently use the term DW/BI system throughout the book. This is understandable because, as I describe above, most business intelligence systems are in the form of a data warehouse system.

Bill Inmon, who invented the term data warehouse, defines data warehouse as a source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management’s decision processes. He pointed that the term data warehouse was never trademarked or copyrighted. As a result, anyone can call anything a data warehouse. He recently defined a new term, DW 2.0, and this one is trademarked so nobody can change the definition. He explained the architecture in his article in dmreview, along with the differences between the first generation of data warehouses and DW 2.0 and its advantages.

So, as a summary, back to the original question, what is the difference between data warehouse and business intelligence? Most business intelligence systems are based on data warehouse systems (the one with dimensional model, fact tables, dimension, etc), but some business intelligence systems are not data warehousing, i.e. taking data directly from the source system, like the example described above. Business intelligence application (as opposed to business intelligence system) is the yellow boxes on the diagram above, i.e. the front end applications. The data warehouse database (or sometimes people dropped the word database, so it becomes just ‘data warehouse’) is the blue cylinder and blue box on the diagram above, i.e. the dimensional storage, whether in relational database format or in multidimensional database format.

If people say ‘data warehouse’, be careful because it can mean either data warehouse system (the whole diagram above) or data warehouse database (just the blue items). If people say ‘business intelligence’, it can mean either business intelligence system (the whole diagram above, or a BI system without data warehouse) or business intelligence application (the yellow boxes).

I hope this article makes the terms clearer, but I am open to comments and suggestions. As Ralph Kimball said, if you ask 10 different people what data warehouse is you are likely to get 10 different answers.

Vincent Rainardi
1st May 2006

This is a repost from SQLServerCentral.

28 May 2011

Many-to-Many: Which Dimension is Used?

Filed under: Analysis Services — Vincent Rainardi @ 6:19 am

In SSAS, it is possible for a M2M to have 2 paths.  The first one is through Dim1 and second one is through Dim2. In this case, which path will be used by M2M? I’ve come across this question about 2 years ago, and have been delaying finding the answer until now. So here it is.

Ah before I start, 2 things. 1) This post is related this post I wrote 2 days ago: Many to Many Is Not Always Right. 2) Thank you to Robert [last name] for bringing this up. I came across it for the first time 2 years ago, and I’ve always wanted to find out.

As always, it is easier to explain (and learn) by example.

On the top we have Fact Account Balances which list how much money is in each account.

The second fact table (Fact Customer Account) links each customer to the Fact Account Balances in two ways: via Dim_Account and Dim_Account2. For example, these 2 account dims could be Mortgage Account dimension and Saving Account dimension.

If we create a cube with M2M like this, which dim will M2M use, Dim Account or Dim Account2?

The answer is: both, it depends on the data.

Let’s populate the tables like this:

<pre>select * from M2M.Dim_Customer
select * from M2M.Dim_Account
select * from M2M.Dim_Account2
select * from M2M.Fact_Customer_Account
select * from M2M.Fact_Account_Balances

As you can see above, 3 customers, 3 accounts with balance = £20, £30, £40. Customer1 owns account1, customer2 owns account2, customer3 owns account3. Ideal situation.

As we expect, the cube displays this:

Now’s let’s change the relationship between customer and account:

On the second path, both customer 1 and 3 owns account 1.

The cube is now showing:

What does it mean? It means that only the rows where account1 = account2 are shown. SSAS only shows the rows when both paths are the same. It does not show the rows where the paths are different.

Here’s the rest of the experiment:

-- Rule 1. Only when account1=account2 it's shown, otherwise it's not shown
(1, 1, 1)
(2, 2, 2)
(3, 3, 3)
Output: all 3 customers are shown, balance: 20,30,40

(1, 1, 2)
(2, 2, 3)
(3, 3, 1)
Output: none is shown

(1, null, 1)
(2, 2, 2)
(3, 3, null)
Output: only customer2 balance 30 is shown

(1, 1, 1)
(2, 2, 2)
(3, 3, 1)
Output: only customer1&2 are shown, balance 20 & 30

-- Rule 2. null is not shown (even though they are the same, i.e. both are null):
(1, null, null)
(2, 2, 2)
(3, 3, 3)
Output: only customer2 & 3 are shown

-- By the way: the total is always 90 in all cases above

So my conclusion is: if M2M to have 2 paths, SSAS only shows the rows when both paths are the same. As always, I welcome comments and questions on vrainardi@gmail.com.

Vincent Rainardi, 27/05/11.

PS. SQL Script is below

-- Find out which dim is used by M2M

-- Create account balances fact table
create table M2M.Fact_Account_Balances
( Account_Key int, Balance money )

insert into M2M.Fact_Account_Balances values (1, 20)
insert into M2M.Fact_Account_Balances values (2, 30)
insert into M2M.Fact_Account_Balances values (3, 40)

-- Create 2 Dim Accounts
create table M2M.Dim_Account
( Account_Key int, Account_Number varchar(20) )

insert into M2M.Dim_Account values (1, 'Account1')
insert into M2M.Dim_Account values (2, 'Account2')
insert into M2M.Dim_Account values (3, 'Account3')

create table M2M.Dim_Account2
( Account2_Key int, Account2_Number varchar(20) )

insert into M2M.Dim_Account2 values (1, 'Account1')
insert into M2M.Dim_Account2 values (2, 'Account2')
insert into M2M.Dim_Account2 values (3, 'Account3')

-- Create customer-account factless fact table
create table M2M.Fact_Customer_Account
( Customer_Key int, Account_Key int, Account2_Key int )

truncate table M2M.Fact_Customer_Account
insert into M2M.Fact_Customer_Account values (1, 1, 1)
insert into M2M.Fact_Customer_Account values (2, 2, 2)
insert into M2M.Fact_Customer_Account values (3, 3, 1)

-- Create customer dimension table
create table M2M.Dim_Customer
( Customer_Key int, Customer_Name varchar(20) )

insert into M2M.Dim_Customer values (1, 'Customer1')
insert into M2M.Dim_Customer values (2, 'Customer2')
insert into M2M.Dim_Customer values (3, 'Customer3')

select * from M2M.Dim_Customer
select * from M2M.Dim_Account
select * from M2M.Dim_Account2
select * from M2M.Fact_Customer_Account
select * from M2M.Fact_Account_Balances

-- Rule 1. Only when account1=account2 it's shown, otherwise it's not shown
(1, 1, 1)
(2, 2, 2)
(3, 3, 3)
Output: all 3 customers are shown, balance: 20,30,40

(1, 1, 2)
(2, 2, 3)
(3, 3, 1)
Output: none is shown

(1, null, 1)
(2, 2, 2)
(3, 3, null)
Output: only customer2 balance 30 is shown

(1, 1, 1)
(2, 2, 2)
(3, 3, 1)
Output: only customer1&2 are shown, balance 20 & 30

-- Rule 2. null is not shown (even though they are the same, i.e. both are null):
(1, null, null)
(2, 2, 2)
(3, 3, 3)
Output: only customer2 & 3 are shown

-- By the way: the total is always 90 in all cases above

26 May 2011

Many to Many Is Not Always Right

Filed under: Analysis Services — Vincent Rainardi @ 7:10 am

Suppose you have an account balance table:

Account    Balance
Account1    £10
Account2    £10
Account3    £10

And you have “Which customer owns which account” table:

Customer    Account
Customer1    Account1
Customer2    Account1
Customer1    Account2
Customer2    Account2

In other words, Account1 are owned by customer 1 and 2. Same with Account2.

How much money does customer 1 have?
How much money does customer 2 have?

Assuming that customer1 doesn’t have money in another bank or at home or in his wallet, then the correct answer is: we don’t know. The £10 in Account1 could be split £3-£7 between customer1 and customer2. Or it could be £2.50-£7.50. We don’t know.

If we do this as M2M in SSAS, what does SSAS think?

Cube structure:

Why £20 for customer1? Well customer1 owns both account1 and account2, and there’s £10 on each account hence £20.

Why is the total £30? Hmm, that’s where I can’t understand. SSAS sums all the accounts, 3 x £10 = £30. SSAS should be summing account1 and account2 only (2x£10=£20) or summing what on display, i.e. customer1 & customer2 (2x£20=£40). But no, it displays £30.

To proof that it sums all 3 accounts, I’ll change the balance of account3 from £10 to £7, and reprocess the cube.

The total is now 27.

What does account3 have to do with customer1 and 2? I don’t know. Hence the title of this article, Many to Many Is Not Always Right.

If anybody can explain this, I’d be glad to hear from them. In the mean time I’ll also try to find out.

One more thing, if it’s not many-to-one (many customers own one account), but one-to-many instead (one customer owns many account), then the balance for each customer is correct. Of course it is! There’s no dispute about how to proportion the balance of an account to 2 people! For example:

Customer    Account
Customer1    Account1
Customer1    Account2
Customer2    Account3

In this case SSAS displays:

But, as you notice, the total is still wrong! Well it doesn’t make sense that the numbers above it are 20 and 10 and the total is 27, does it?

Vincent Rainardi, 26/5/11, vrainardi@gmail.com

PS. The script to create and populate the tables is below.

use Test
go

if exists (select * from sys.schemas where name = 'M2M')
drop schema M2M
go

create schema M2M
go

-- Create account balances fact table
if object_id('M2M.Fact_Account_Balances') is not null
  drop table M2M.Fact_Account_Balances
create table M2M.Fact_Account_Balances
( Account_Key int,
  Balance money
)
go

insert into M2M.Fact_Account_Balances values (1, 10)
insert into M2M.Fact_Account_Balances values (2, 10)
insert into M2M.Fact_Account_Balances values (3, 10)
go

-- Create account dimension table
if object_id('M2M.Dim_Account') is not null
  drop table M2M.Dim_Account
create table M2M.Dim_Account
( Account_Key int,
  Account_Number varchar(20)
)
go

insert into M2M.Dim_Account values (1, 'Account1')
insert into M2M.Dim_Account values (2, 'Account2')
insert into M2M.Dim_Account values (3, 'Account3')
go

-- Create customer-account factless fact table
if object_id('Fact_Customer_Account') is not null
  drop table M2M.Fact_Customer_Account
create table M2M.Fact_Customer_Account
( Customer_Key int,
  Account_Key int
)
go

insert into M2M.Fact_Customer_Account (Customer_Key, Account_Key) values (1, 1)
insert into M2M.Fact_Customer_Account (Customer_Key, Account_Key) values (1, 2)
insert into M2M.Fact_Customer_Account (Customer_Key, Account_Key) values (2, 1)
insert into M2M.Fact_Customer_Account (Customer_Key, Account_Key) values (2, 2)
go

-- Create customer dimension table
if object_id('M2M.Dim_Customer') is not null
  drop table M2M.Dim_Customer
create table M2M.Dim_Customer
( Customer_Key int,
  Customer_Name varchar(20)
)
go

insert into M2M.Dim_Customer values (1, 'Customer1')
insert into M2M.Dim_Customer values (2, 'Customer2')
go

select * from M2M.Dim_Customer
select * from M2M.Dim_Account
select * from M2M.Fact_Customer_Account
select * from M2M.Fact_Account_Balances
update M2M.Fact_Account_Balances set Balance = 7 where Account_Key = 3

20 May 2011

Create table error: Concat Null Yields Null

Filed under: SQL Server — Vincent Rainardi @ 10:45 pm

Today I was trying to create a table and got this message:

SQLServer1: DDL_DATABASE_LEVEL_EVENTS: Error: INSERT failed because the following SET options have incorrect settings: ‘CONCAT_NULL_YIELDS_NULL’. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
Msg 3616, Level 16, State 2, Line 1
An error was raised during trigger execution. The batch has been aborted and the user transaction, if any, has been rolled back.

The DDL was:

create table schema1.table1
( column1 int not null,
  column2 varchar(10),
  column3 varchar(20),
  constraint PK_table1 primary key clustered
  ( column1 )
)

Solution: Tools menu, option, Query Execution, SQL Server, Advanced: make sure “SET CONCAT_NULL_YIELDS_NULL” is checked.

If it is not checked, the create index statement will error. In the above DDL it tried to create a clustered index for the PK hence it failed.

For further info see http://msdn.microsoft.com/en-us/library/ms176056.aspx

LINQ2SQL and TSQL and CLR

Filed under: SQL Server — Vincent Rainardi @ 10:33 pm

Somebody just asked me today about the differences between LINQ 2 SQL, TSQL and CLR. I thought I’d post my short answer as it might be useful for other people.

Language Integrated Query, or LINQ is an application-oriented database query language, for example:

var q =
from c in db.Customers
where c.City == “London”
select c;
(source: http://msdn.microsoft.com/en-us/library/bb425822.aspx)

SQL is a row-oriented database query language, for example:
select * from Customers where City = ‘London’

TSQL is the Microsoft version of SQL, PL/SQL is the Oracle version, NZSQL is the Netezza version.

LINQ is good for application-database integration, but the problem is: databases accept SQL language, they don’t accept LINQ.
LINQ 2 SQL is a tool where developer can write LINQ in their application, and this tool will translate it into SQL automatically so that the database can accept it. The output from the database (rows) is then translated back to LINQ Objects.

A Stored Procedure is a collection of SQL queries e.g. TSQL, PL/SQL, NZSQL.
CLR is a way of writing Stored Procedure using application-oriented languages such as C#, C++ and VB.

19 May 2011

Data Warehousing & BI Companies

Filed under: Business Intelligence — Vincent Rainardi @ 10:57 pm

As a professional in data warehousing and BI, I need to know the companies in DWBI. And I do intend to stay in data warehousing and BI in the next 10 years, may be 20, making this need even more important. That’s the first purpose of writing this page. Secondly, it’s for you. Everybody knows (or they think they know) who IBM are. But who knows who Adatis are? Not many. And, for those who think they know IBM (in data warehousing/BI context that is), they don’t know that well. For example, did you know what Initiate is? Even people who work for IBM might not know what Initiate is. (it is a MDM service) That was the 2nd purpose of this page. The 3rd purpose: people forget. I often ask myself “What was the name of that company again?” If I know the name I can Google it. But I forgot the name! With this page I can browse and find it. And you can too. The 4th and last purpose is to share the DW/BI companies I have come across throughout the years.

I think that’s enough intro. I’ll get on with it. This page will contain the companies in data warehouse/BI world, e.g. solution providers, product providers, consultancy firms, education/training companies, etc. That is how I will group them. For each company, I will describe who they are in 1 or 2 lines. I will also write the URL. Later on, if I need to write more than 1 paragraph about a company, I will provide a separate page. I will limit myself to data warehousing and BI. A lot of companies are not just doing DW/BI; they do other businesses as well. I will not write about their non-DWBI businesses. If a company is mixed, e.g. they are a product provider, and they are a consultancy firms, I will put them in 1 area only, i.e. the area where they are most known for.

Some people whom I discussed the idea of this page with, responded with “How about individuals?” Why don’t you put famous DW/BI people here as well? We need to know them too? Some of the readers of my “Analysis Services Who’s Who” page said “Are you doing to do the same for data warehousing people, i.e. Ralph Kimball, Bill Inmon, etc?” The answer is I will. I have, in fact. I started writing it 12 months ago, got only 2-3 names then I never finished it. But I will. It would be on a separate page though, not on this companies page. What I will do is probably write about 30 or so companies, then 10 or so people, then publish them. And I will add to them as we go along. But I will finish this page first, before I do the individuals.

Before I start, the credit first. There are many people, web pages, and books (on Safari) that inspired me to write this page. But the most influential of them all is Andrea Vincenzi’s BI & DW Resource Centre. So thank you Andrea, for the inspiration.

So here we go. If you find there’s something is incorrect, I would be grateful if you could let me know and I will correct them.

DWBI Consultancy – UK

  1. IM Group: A well known and respected name in London for their expertise in MS BI. One of the very few who are still specialising in MS BI. About 70 consultants.
  2. Conchango: Conchango was a well known and respected name in London for their expertise in MS BI. Now part of EMC, a worldwide IT company (not just consulting).
  3. Hitachi Consulting (was Edenbrook): A worldwide consulting company. Not a BI consulting specialist, but a general IT consulting practice. Edenbrook was a BI specialist.
  4. Altius Consulting: 2 areas: Enterprise Performance Management and BI. A respectable old name in London in MS BI. Now also do Oracle BI (Hyperion and OBIEE).
  5. Adatis: A DW/BI consultancy specialising in MS BI.
  6. Contemporary: A specialist in BO and MS BI. Also gives training.
  7. Business & Decision: A worldwide company, 19 countries, 2800 consultants. Public company (listed on Euronext). Also does lots of non-BI stuff.
  8. bIntelligent: Moore Stephens’ consulting arm. Well known in London for MS BI in insurance sector. Its parent company, Moore Stephens, is an accountancy firm.
  9. Insource: offering a product called Data Academy which builds a SQL Server DW fast (like Red). Analyzer is the front end client. Pity they chose to specialise on Health Care.
  10. Clarity Integration: a generalist BI consulting practice (not specialist in certain platform, well if I must specify: MS BI, Informatica, BO). Also offer their own product (ClearView).
  11. Peak Consulting: A DW/BI consulting company, but generalist. In London, NY, Dubai and Mumbai.
  12. IT Performs: An BI consulting firm specialising in BO and MS BI. Also QlikView. Based near Birmingham, UK.

Companies Producing DWBI Software:

  1. Microsoft: SQL Server, PDW, SSAS, SSIS, SSRS, Crescent, Performance Point, PowerPivot. The most widely used DWBI products on the planet, in my observation.
  2. Oracle: Oracle 11g, OBIEE (was Siebel), ODI, Hyperion, Essbase, Exadata, Golden Gate, Oracle OLAP. Widely used, good quality, good support. One of the oldest, top, respectable names in DWBI.
  3. IBM: Data Stage, Infosphere, Cognos, TM1, DB/2, Express. Now include Netezza. Giant in DWBI, possibly in top 3 if I have to answer “Who’s the biggest in DWBI?”
  4. Informatica: PowerCentre, IDE, IDQ. Best ETL/DI tool on the planet, features wise (and performance). One of the oldest, well respected names in BI that hasn’t been bought yet by the big companies.
  5. SAP: BusinessObjects, Business Warehouse, Netweaver BI. Giant in DWBI, worldwide.
  6. Strategy Companion: offering a product called Analyzer, which is a SSAS client (cube browser) and dashboard. Very user friendly.
  7. Tableau: the best SSAS client (appearance wise), in my opinion. Good charting/visualisation.
  8. QlikTech: appearance wise, QlikView is the slickest analytics tool on the planet. Quick to build (1-2 month).
  9. MicroStrategy: comprehensive reporting solution and integrated BI platform. Multidimensional and relational. One of the oldest, well respected names in BI that is still going strong as individual company (not yet acquisitioned).
  10. Ab Initio: ETL/DI tool. Very secretive. I think feature wise much less than Informatica, hence the reason of not revealing.
  11. Teradata: a very good MPP, feature wise. Long history, proven in many customers. Wide range of products (in terms of capacity & price).
  12. Wherescape: offering a DW product called Red, which is known for its speed in building a DW in any RDBMS. In UK, US and NZ.
  13. SSAS clients: various companies producing SSAS tools. The most comprehensive list on the planet (by far), thanks to Vidas. 74 tools (!) and counting.
  14. Jaspersoft: growing rapidly in popularity. Widely known for their very low cost. Complete BI platform: DI/ETL, Reporting, Analytics.
  15. SAS: analytics, data mining, ETL/DI, DQ (DataFlux), MDM. One of the most complete BI platform. Historically respected as world #1 in data mining.
  16. Actuate: a product suite called BIRT (stands for BI Reporting Tool), which is known for its visualisation. No ETL/DI.
  17. Inetsoft: a product suite called Style, a reporting tool known for its visualisation. No ETL/DI.
  18. iQ4bis: reporting/analytics solution. Prebuilt plugins for JD Edwards & MS ERPs (Dynamics, Axapta, Great Plain, Dynamics NAV) for reporting/analysis via SSAS, SSRS, Excel Pivot tables and iQ4bis own tool.
  19. Board: reporting/analytics plus EPM (budget vs performance, etc). Award winning (BI Survey). Quick to build (1-2 months).
  20. Trillium: considered as one of the best and widely used data quality tool in DWBI world. The company is a DQ specialist, not acquisitioned by anyone yet.

DWBI Consultancies – World Wide

  1. SQLPower: BI, XBRL and data migration. Based in Canada. Many platforms: BO, Microstrategy, Oracle, Informatica, etc.
  2. KDNuggets: list of many consulting companies in data mining and analytics, probably 100 companies from all over the world.
  3. McKnight: a specialist in DWBI consulting based in Texas. Various DWBI platform & tools. Also do training.
  4. Data Warehouse Consultants: DWBI specialist based in Pittsburgh. Founded in 2004 by John Shantz.

DWBI Recruitment Agencies

  1. Business Intelligence Recruitment: DWBI recruitment specialist in Australia, based in Brisbane.
  2. The Data Warehouse Company (TDWC): DWBI recruitment specialist in UK, based in London.

DWBI Education and Training

  1. The Data Warehouse Institute (TDWI): provides education, training and certification in data warehousing, based in US. The most well-known and influential certification program in data warehousing.
  2. Toronto College of Technology: offering Postgraduate Diploma in DWBI for CAD 9950. Covers Oracle, SQL Server, Erwin, Cognos, Crystal, Informatica and SAS.
  3. Koenig: 1 week Oracle Data Warehousing boot camp in India for £790 incl. accommodation, meal and transport.
  4. Axioms Semantics: offers DWBI courses in Chennai, India. Microstrategy, MS BI, Informatica, OWB, Cognos and Crystal training courses.
  5. CCCITM (Canadian Career College of Innovative Technology and Management): offers a Post Graduate Diploma in DWBI in Canada.
  6. Aroha: a training company in Bengalore, India, specialising in DWBI. Offers a Diploma in Data Warehousing, covering SQL, DW concepts, Informatica, BO.
  7. University of Dundee: the School of Computing offers MSc degree in Business Intelligence. Taught by Dr Mark Whitehorn, one of well-known, long standing, very respectable names in data warehousing in UK. To my knowledge, this is the only one in the world (formal MSc title in BI), and it’s from a reputable university.

4/1/16 Update:

Apologies I missed Kimball University in the Education section. This is the best data warehouse training course in the world. Unfortunately they stopped doing DWBI courses in December 2015.

I was recently asked to specify DWBI companies in India. In India, the big 6 (TCS, Infosys, Tech Mahendra, Wipro, HCL, Cognizant) all have BI practice.

And these are the top BI companies in India:

  • Aditi, Bengalore & Chennai, aditi.com
  • Birla, Noida, birlasoft.com
  • Exilant, Bengalore, exilant.com
  • Advantics, Mumbai, advantics.co.in
  • Board, Mumbai, board.com
  • Birst, Ahmedabad, birst.com
  • Datawise, Hyderabad, mydatawise.com
  • Deflytics, Mumbai, Deflytics.com
  • ElegantJ, Admedabad, elegantjbi.com
  • Graymatter, Bengalore, graymatter.co.in
  • Helical, Hyderabad, helicaltech.com
  • Inteliment, Pune, inteliment.com
  • Ixsight, Mumbai, ixsight.com (look at the MDM)
  • Jaspersoft, Bengalore, jaspersoft.com
  • MAIA, Mumbai, maia-intelligence.com
  • Nalgan, Hyderabad, nalgan.com
  • OTSI, Hyderabad, OTSI.co.in
  • Progen, Hyderabad, progenbusiness.com
  • Sryas, Chennai, sryas.com
  • Symtrax, Mumbai, symtrax.com
  • Techaxes, Gurgaon, techaxes.com

(source: cioreview.in, removing companies which I think are not DWBI enough)

We must also look at the IT centres of Investment Banks as they do a lot of DWBI development in India, such as: Royal Bank of Scotland (Gurgaon), Bank of America (Hyderabad), Citi (Pune). Or other business sector such as telecom: Verizon (Chennai), T-Mobile, Sprint.

And my favorite DWBI company in India is: Capgemini, Bengalore, https://www.in.capgemini.com/insights-data

17 May 2011

Don’t Worry About the Sign Off

Filed under: Project Management — Vincent Rainardi @ 5:26 pm

A traditional DW/BI project has requirement, design, build and test phases. And between each phase we have deliverables. Dimensional Model for example, is one of the early deliverables, produced at design phase. If you look into the design phase in more detail, it consists of several activities: ETL Design, Dim Model Design (aka Warehouse Design), etc. At some point, a design document (or business requirement/analysis document) needs to be signed off, and it is used as a base for the next phase.

Here’s an example: business requirement document (BRD) –> functional requirement document (FRD) –> technical specification (TS) –> coding/build. The person writing the FRD will use the requirements defined in BRD. And the person writing the TS will use the functionalities defined in the FRD. Of course there are other things like quality requirement, non functional requirement, etc but that’s beside the point.

So a BRD needs to be signed off, before the FRD can be written. And the FRD needs to be signed off, before the TS can be written. And the TS needs to be signed off, before the code can be written. Right? Wrong. That’s the old concept. In these days and age, in BI projects, we don’t wait until a document is signed off, before we start the next stage. A soon as the author declared “I think I have the guts of it written”, then the next stage begins. Of course, when the upstream document changes, the downstream document needs to be updated. But that’s fine, because usually it’s minor changes. This method can save the project delivery time by a quarter. Now we are talking huh?

The second reason is: because no body knows the requirement in details. They can only specify the requirement to the best of their knowledge at that time. Of course it will change. Especially in BI project. Once you show the cube/report only then the business understand it better, and from that understanding comes out new requirements. Of course. That’s natural. Best systems are the ones which achieved version 5. This means that the software have incorporated feedback from the first version, second version, etc. It is a mature product.

The same goes with BI. It’s not a good idea to do it like 20 years ago: spending a lot of time interrogating the business users to get the requirements and then ask them to sign it off. So that if something goes wrong, you have it on paper. That’s a thing of the past. We don’t do that any more in 2011. Why? Because 20 years ago, the pace is very slow. You have 3 years to design a system. Now? A year is probably too long. Especially in BI and data warehousing.

Agile? Yes. Should we use Sprint? Absolutely. This actually removes the responsibility for the project team to “deliver on time”. Because we are not measured by time. We are measured by the user satisfaction. What’s the point delivering the system on time and on budget but doesn’t satisfy the business? We delivered according to the requirements, and they signed off the requirements. That’s why it’s not difficult to sell the Agile concept to the business, because what they need is satisfaction, not a system that is delivered on time.

Blog at WordPress.com.