Data Warehousing, BI and Data Science

7 November 2009

Informatica Metadata Manager can now read SSAS cubes and SSRS reports metadata

Filed under: Analysis Services,Data Warehousing — Vincent Rainardi @ 9:06 pm
Tags: ,

In Informatica version 9, PowerCenter Metadata Manager can now read SSAS and SSRS metadata. Metadata Manager (MM) is a tool to document the data lineage. It interogates the ETL tool, the RDMBS, the reporting tools and the analytics tools to get their metadata. Then it finds out (automatically) where each piece of data is coming from. For example, if your DW/BI system consists of:

  • Source System in Oracle
  • ETL in Informatica
  • Data Warehouse (DW) in SQL Server
  • Data Models in ERWin
  • Reports in SSRS
  • Cubes in SSAS

then using Informatica 9 Metadata Manager you can create a diagram that explains:

  1. From which column in the Source System each column in the DW is derived from (and vice versa)
  2. Which Transformation in which Informatica mapping does it #1 above (and vice versa)
  3. Which ERWin object is associated with each DW table (and vice versa)
  4. From which column in DW each attribute in the cube is derived from (and vice versa)
  5. From which column in DW each column in the SSRS report is derived from (and vice versa)

At the highest level the diagram probably consists of 5 icons: Oracle – PowerCenter – DW – ERWin – SSRS – SSAS. If click the + sign on any icon, it opens up and show you the details. For example if you click on the DW it shows the tables, and if you click on a table, it shows the column. If you click on the SSRS icon, it shows the relevant reports, click on the reports and it shows the columns. Go here and scroll down to page 2 to see the screenshot of that diagram.

Data lineage diagram like that is indispensible and a must have in any data warehousing project. Sorry, any large data warehousing project. It’s not for the sake of documentation, but more importantly it’s for impact analysis. You get to know in detail which objects will be affected if you modify the data type of column1 in table1 in the warehouse. The problem with data lineage, as with any documentation, is incorporating changes. Sure it’s up to scratch when your BI system went live, but after just 3 releases my experience shows that the documentation would usually lack behind. Hence we had to trace those columns manually. With data lineage tool like Metadata Manager we can point it to our system and it creates the diagram. It shows where the data flows to, in detail. Up to the reports and cubes.

Informatica 9 PowerCenter Metadata Manager can read BusinessObjects, Cognos, Netezza, MicroStrategy, Hyperion Essbase, DB2 Cube View, ERWin, PowerCenter, SQL Server, Oracle, Sybase, Teradata, Informix, Sybase, DB2 UDB, Oracle BIEE and SAP, to get their metadata. Up to version 8.6, it could not read SSRS and SSAS. Now in version 9 it can.

No SSIS is not in there yet. Hopefully soon.

2 November 2009

SQL Server Editions

Just a short post today about SQL Server editions. A quick reminder for all of us about the editions of SQL Server to use for our development and test servers. Many DW & BI developers don’t get involved in licensing, but even though it is not our decision at least we can point it out to the appropriate people. My point is: if we use Enterprise edition for Production, we need to use Developer edition for our Dev/UAT server, not Standard or Enterprise edition. This is because:

  1. Developer edition is 100% the same as Enterprise. Standard edition isn’t. There are many features that Standard edition doesn’t have. See list below. If we use Enterprise in Production and Standard edition for our Dev/UAT server, we can never be sure that it will work in Prod even though it works in Dev/UAT.
  2. Developer edition is a lot cheaper than Standard. 2008 developer edition retails at £50 in UK, whereas Standard is £4,325 (per processor, ex VAT*). Yes, it’s true that Developer edition is per developer, not per server like Standard and Enterprise, so if we have 10 developers we pay £500, but it’s still cheaper than Standard. A SQL Server with two dual-core processors costs £8,650. Even if we get 30% discount for volume licensing it’s still a lot more than the developer edition. There is a 10 CAL options (CAL=named user*) for £1,825, but BI systems tend to be used widely (hundreds of users) so it will cost more.
  3. The worse scenario is if we use Enterprise edition for Dev/Test server. In the UK, Enterprise retails at £17,285, per processor. A dev server with two dual-core processors will costs double of that, £34,570. Even with 30% discount it’s still a lot more than developer.

Notes: (those marked with *)

  1. All the prices I quoted are excluding VAT (Value Added Tax). This is because generally speaking companies will claim the VAT back. VAT rate is currently 15% but will be back to 17.5% on 1st Jan.
  2. CAL (Client Access Licence) can be a user or a device. The word “user” means a named user, not a concurrent user. The word “device” practically speaking means a PC or a laptop. A mobile phone running Windows is also a device. Practically speaking, some companies use “per device” rather than “per user” because it’s easier to count the number of PCs than the number of users. Note that to use CAL we need to buy the server licence first, see here.
  3. The prices above are from eCostSoftware. Dabs (ex VAT): £18,346 for Enterprise, £4,557 for Standard and £34 for Developer.
  4. The word “per processor” is exactly that: the chip. It’s not “per core”. We don’t count the number of cores or threads. See here and here for Microsoft multi-core policy on SQL Server licensing.
  5. For data warehousing and BI, companies tend to use powerful server. Well, naturally. Not only for the database engine, but also for the ETL server (SSIS) and OLAP server (SSAS). Because of the SQL Server licensing, server with 4 or 6-cores processors tend to be used more than single- or dual-core, for example on HP ProLiant DL580 G5, DELL PowerEdge R900 and Unisys 7400.

A feature comparison between Enterprise and Standard edition is here and here. In Data Warehousing and BI, there are many Enterprise features that we need to use, and they are not available in Standard edition (I’m talking about 2008 here): table and index partitioning, parallel index operations, database snapshots, index views, online operations, resource governor, backup compression, hot-add memory and CPU support, fast recovery, online page and file restore, parallel index operations, updated distribution partitioned views, data encryption, partitioned table parallelism, data compression, star join optimisations, change data capture. Hmmm, it’s probably useful if I write how these Enterprise features (DB engine and SSAS) are used in data warehousing and BI. I mean what they are used for, rather than repeating Books Online explaining what they are. But that’s for another post, not here.

For Analysis Services there are many features that we often take for granted, but they are not available in Standard edition: partitioned cubes, proactive caching, perspective, semi-additive measures, auto parallel processing, account intelligence, writeback, and linked dimensions. For SSIS, features that are not available in Standard are: fuzzy grouping and lookup, term extraction and lookup, dimension & partition processing destination adapters. And finally for SSRS: data-driven report subscriptions, report scale-out deployment, and infinite click-through in ad-hoc reports.

Some people have the impression that Developer edition can only have 5 connections. Perhaps this impression was caused by the restriction in MSDE in SQL Server 2000 where the workloads were throttled to 5 concurrent workloads. In SQL Server 2005 & 2008, the Developer edition has the same feature set as the Enterprise Edition; the only difference is the licensing policies.

Not exactly a short post, is it? 🙂

« Previous Page

Blog at WordPress.com.