Data Warehousing and Business Intelligence

1 December 2014

Using BI Tools as ETL Tools

Filed under: BI Tools — Vincent Rainardi @ 6:59 pm

A few years ago a colleague remarked that some users use the BI tool only as an ETL tool: they use a BI tool such as Tableau or QlikView just to get the data, just to bring it into Excel. Then they do everything in Excel.

Today I realise that it is true. Some users have all of their data in Excel, which they get from various sources: ERP, trading system, settlement system, reference data system, etc. The data from the data warehouse is just one additional source to them. If we give them the DW/BI data in SSAS cubes via an Excel pivot table, great! They can "integrate" the DW data straight away.

But if we give them a BI tool such as Tableau, QlikView, Spotfire, Hana, Cognos, BO or MicroStrategy, they export the data into Excel and use the data there. This is particularly true in the financial industry, where the users are very advanced with Excel. Investment banks, retail banks, insurance and asset management: those are the industries where there are a lot of Excel wizards amongst the underwriters, claims adjusters, financial analysts, debt traders and fund managers. Yes, in the financial industry, Excel is the de facto tool that every analyst uses to do their day-to-day job.

16 July 2014

iDashboards

Filed under: BI Tools — Vincent Rainardi @ 7:22 pm

A lot of people have heard about QlikView and Tableau, which in my opinion are the top two in dashboard software. But I have found three dashboard products which are quite good yet much less well known: Dundas, iDashboards and Datazen. Today I would like to write about iDashboards; perhaps on another occasion I'll write about the other two. I want to write about them because I feel that these three products need more exposure in the BI community: they are functionally good, but most BI developers don't know them.

What I write here is purely my opinion. I will start with what a dashboard is, and what features are important for one to have. Then I will look at iDashboards with regard to visualisation, development environment, data sourcing, collaboration and architecture. After the conclusion, I'll mention 43 other dashboard software products outside the above 5.

What is a Dashboard?

A dashboard is a BI tool used for visualising and analysing data (BI tools which are not dashboards are reporting tools, OLAP cubes and data mining tools). Using a dashboard, business users can view their data at a high level to understand trends and anomalies; they can then drill down, but still at a high level (not at transaction level as in a BI reporting tool).

From the user's point of view, a dashboard consists of a collection of controls, i.e. widgets such as tables, charts/graphs (bar, pie, line, combi), maps, pictures, videos, gauges, thermometers, sparklines, KPI indicators (traffic lights, smileys, thumbs up/down, arrows), clocks, heatmaps, dropdowns, buttons, text, labels, tabs, scroll bars and sliders.

From the developer's point of view, a dashboard system (I'm talking about a good one here, one which has undergone at least 5 years of evolution) consists of a development environment (IDE = Integrated Dev Env), a database and a rendering engine. When we create a dashboard, we use the IDE to combine all those controls, connect them to our source database (or XML/text files, Excel spreadsheets, cubes), define KPIs, and set up interactivity and drilldown paths. When we click save, all that work is saved into an internal database containing 40-50 different tables (controls, KPIs, users, dashboards, chart settings, permissions, data sources, formatting, hierarchies, tabs, etc). We then click view (or publish) and the rendering engine reads those tables and displays the dashboard on the screen. The rendering engine also responds to mouse hovers and mouse clicks from the users, showing different things and different data accordingly.
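
To make this concrete, here is a minimal sketch of that save/render cycle in Python. All names are hypothetical, not any vendor's actual schema: two sqlite tables stand in for the 40-50 repository tables, save_dashboard plays the IDE's "save", and render_dashboard plays the rendering engine.

    import json
    import sqlite3

    # Hypothetical miniature repository: real products use 40-50 tables,
    # here two tables stand in for the whole schema.
    repo = sqlite3.connect(":memory:")
    repo.execute("CREATE TABLE dashboards (id INTEGER PRIMARY KEY, name TEXT)")
    repo.execute("""CREATE TABLE controls (
        dashboard_id INTEGER, control_type TEXT, settings TEXT)""")

    def save_dashboard(name, controls):
        """What the IDE does on 'save': persist the design into repository tables."""
        cur = repo.execute("INSERT INTO dashboards (name) VALUES (?)", (name,))
        for ctrl in controls:
            repo.execute(
                "INSERT INTO controls VALUES (?, ?, ?)",
                (cur.lastrowid, ctrl["type"], json.dumps(ctrl["settings"])))
        return cur.lastrowid

    def render_dashboard(dashboard_id):
        """What the rendering engine does on 'view': read the tables and draw."""
        for ctrl_type, settings in repo.execute(
                "SELECT control_type, settings FROM controls WHERE dashboard_id = ?",
                (dashboard_id,)):
            print(f"drawing {ctrl_type} with {json.loads(settings)}")

    dash_id = save_dashboard("Sales Overview", [
        {"type": "bar_chart", "settings": {"measure": "Revenue", "by": "Region"}},
        {"type": "gauge", "settings": {"kpi": "Margin %", "red_below": 0.1}},
    ])
    render_dashboard(dash_id)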

A scorecard used to be a particular type of dashboard without interactivity: a static dashboard where people could see the high-level scores or snapshots of each key area of the business, so that at a glance the business users could understand the status and position of their business. But that was 10 years ago. Nowadays scorecards are interactive, and users can filter, drill down and highlight certain areas of the scorecard; no different from any other dashboard, apart from their function, which is to report a particular state and status of the business.

What features does a Dashboard need to have?

A good dashboard will have these features:

  1. Very slick, modern-looking visualisation controls with good interactivity, including drilldown/drillup, hyperlinks and mouse click/hover.
  2. A good, easy-to-use development environment, complete with version control, layout tools, calculations, KPIs, undo, a scripting language and a migration tool.
  3. Data sourcing: a strong ability to combine data from different tables and files (mash-ups), i.e. multi-table, multi-column joins, filtering, sorting, grouping, XML reading, executing stored procedures, accepting parameters and variables, and data type conversion.
  4. Collaboration: different users see different data depending on their permissions; users can copy and modify dashboards, save and share their versions within their team, and collaborate and share comments; single sign-on, integration with SharePoint, templates, export/import, printing, alerts.
  5. Architecture: rendering engine, web-based platform (thin client, not a Windows-based app), mobile access, web server, database storage, API, custom extensions, flexibility.

Dashboards are used by 3 groups of users, each using the dashboards in different ways:

  1. Administrators: install the software, create data sources, create user groups, manage user permissions, manage licences, upgrade and patching.
  2. Developers: create KPIs and charts, join data tables, feed them to controls and compose them into dashboards.
  3. Business users: view the dashboards, modify and save them, share their comments within their teams.

In the next 5 sections I will describe how iDashboards fares on the 5 dashboard features above: visualisation, development environment, data sourcing, collaboration, and architecture.

1. Visualisation

iDashboards has a stunning visualisation front end. If you download and install the evaluation copy, you will see various controls and images such as thermometer-like controls, speedometer-like gauges, sliders, bar charts and line charts, maps, clocks, cones, KPIs, bubble charts, month choosers, calendar controls, videos, colourful pie charts and ring charts, pyramids, butterfly charts, calculators, data tables, buttons, dropdown lists, rolling marquees, Gantt charts, heatmaps, countdown timers, images, two-axis charts, traffic lights, texts and labels.

The interactivity is basic but useful: we can filter charts based on input values (parameters); choosing a specific data point in one control limits the data in other controls; we can drill down and go back up; when we click on a chart it drills down to a lower-level chart or data table; if we hover the mouse it explains the numbers; and we can choose the value which acts as a parameter using a dropdown list, a calendar input, or by selecting a particular section of a chart, which automatically selects the corresponding part in another chart.

I said basic, but there is not much more interactivity you can expect to get from a dashboard. So from that angle I think it is quite good. Plus we still have the following interactivity features: we can add/open bookmarks, enter dashboard parameters, print the dashboard, change language, change skin colour, create alerts, receive alerts by email, change zoom levels, and import from and export to Excel.

2. Development Environment

I feel that iDashboards is designed with the business users as the developers. It is not difficult for business users to create a dashboard; it is almost as easy as creating charts in Excel: choose a layout, create charts, customise a little bit and we are done. Unfortunately the development environment has a non-intuitive user interface. For example, to discard a dashboard we click on the left arrow, and to return to the application we click on the right arrow. Another example: to set the thickness of a border, we use a slider. It is not the standard Windows interface like the one we find in Excel or PowerPoint. Throughout the development process I didn't feel comfortable with the user interface.

Putting a chart into a dashboard is quite easy, i.e. drag-and-drop. We can read data from the output of a stored procedure, an Excel file, a text file, or database tables and views. We can create various charts and set the title size, colours, number formatting and margins. We can define legends and define drilldown to a different chart, a different dashboard or a URL. We can create hierarchies for drilldown, we can pass a parameter when drilling down, and we can create macros.

Users can create reports based on the dashboard. The report can be in HTML or PDF format. Reports contain the data snapshot of a particular dashboard at a particular point in time. Again, this "create report" feature seems to be intended for the user community rather than IT developers. Users can create reports for a particular chart or for the whole dashboard. The feature is very basic, almost like a printing facility (to PDF).

We can build a custom application and place it into our dashboard. This is called a VIPanel, which stands for Visually Interactive Panel. On the iDashboards Support website, developers can download VIPanel applications along with their documentation. But VIPanel does not work on iOS (understandably, as iDashboards is built on Flash) and may not work on Android.

Unfortunately there is no facility for source control (check-in, check-out, version control), a feature which is important if more than one developer/user is working on the same dashboard. But there is a feature for migrating a dashboard from a Dev/UAT environment to Prod and vice versa: we can export any chart or dashboard from one environment to another.

3. Data Sourcing

iDashboards reads RDBMS tables via ODBC or JDBC. It also reads Excel. It can read SQL Server using the native driver (versions 6.5 to 2014), but the UI is not user friendly: we need to type the server name, database name, instance name, port number and table name pattern. It is not obvious that iDashboards can join multiple data sources. It seems (very vaguely) that using the "Custom SQL" data source we can join multiple tables, but this is neither obvious nor certain. As a workaround when joining tables is not possible, we can create a view on the RDBMS. We can (definitely) use a stored procedure as a data source, and when doing so we can supply parameters. We can create a custom drop-down list for the users to choose a parameter from.
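
To illustrate the view workaround (a sketch only; the DSN, table and view names below are hypothetical), the multi-table join is pushed into the database so that the dashboard only ever sees a single table-like object:

    import pyodbc  # assumes an ODBC driver and a DSN are configured

    conn = pyodbc.connect("DSN=SalesDW")  # hypothetical DSN
    conn.execute("""
        CREATE VIEW dbo.v_SalesByRegion AS
        SELECT r.RegionName, p.ProductName, s.Amount
        FROM dbo.FactSales s
        JOIN dbo.DimRegion  r ON r.RegionKey  = s.RegionKey
        JOIN dbo.DimProduct p ON p.ProductKey = s.ProductKey
    """)
    conn.commit()
    # The dashboard is then pointed at dbo.v_SalesByRegion as if it were one table.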

iDashboards has 3 data types: string, number and datetime. Data from RDBMSs and spreadsheets is mapped onto these 3 types. I don't see that iDashboards can read XML files, with or without an XSD/schema. It cannot read Analysis Services cubes, or any other OLAP cubes; it can only read tables and files. There is a facility to pivot data, which I found very useful when I was developing dashboards using Spotfire.

Users can create calculations, for example column1 / column2. The functions are very basic; only a few are available: min, max, avg, total, median, standard deviation. iDashboards also has an if statement to enable branching. We cannot filter on a calculated column. We can read a text file or a stored procedure. iDashboards seems to have quite an extensive range of date functions (called "date macros"), from "current month" to "beginning of current month + 3, MMMM d, yyyy", which means that if today is 15th July 2014 the result is "October 1, 2014" (the first day of the month, 3 months from now).
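
To check that date macro arithmetic, here is the same computation in Python (using the third-party dateutil library): "beginning of current month + 3" is the first day of the current month moved three months ahead.

    from datetime import date
    from dateutil.relativedelta import relativedelta

    today = date(2014, 7, 15)
    # first day of the current month, then add 3 months
    result = today.replace(day=1) + relativedelta(months=3)
    print(f"{result:%B} {result.day}, {result.year}")  # October 1, 2014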

4. User Collaboration

iDashboards' security is Single Sign-On (SSO) friendly, meaning that users don't have to log in to iDashboards. When users access a dashboard, the dashboard server automatically recognises who they are and whether they are allowed to access that dashboard. "Who can access what" is stored in the iDashboards database on SQL Server (or Oracle, DB2, Sybase); this database is called the "repository". Different users see different data depending on their permissions. Depending on their roles, users can copy and modify dashboards, and save and share their versions with their colleagues. A dashboard consists of several charts. Charts and dashboards are grouped, the users are grouped, and the permissioning then assigns which user groups can access which dashboard groups.

In iDashboards we have a function to get the current user, which is very useful to pass into the RDBMS to get "different data for different users". The "filter on user" facility is more direct: we set the chart itself to present different data to different users.
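
A sketch of the first pattern (the DSN and procedure names are hypothetical, not iDashboards' API): the current user name is passed as a parameter to a stored procedure, and the database decides which rows that user may see.

    import pyodbc

    def chart_data_for(username):
        """Fetch chart rows entitled to this user; the filtering logic
        lives in the database, not in the dashboard."""
        conn = pyodbc.connect("DSN=SalesDW")  # hypothetical DSN
        cursor = conn.cursor()
        # hypothetical stored procedure that joins sales to a user-entitlement table
        cursor.execute("EXEC dbo.usp_SalesForUser @UserName = ?", username)
        return cursor.fetchall()

    rows = chart_data_for("jsmith")  # each user gets a different result set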

In iDashboards, users cannot write comments about a particular chart within a dashboard and collaborate using those comments. There is no integration with SharePoint, i.e. the permissioning model does not inherit SharePoint security, such as folders and sites, so users can't collaborate the SharePoint way, i.e. via web parts. Still on collaboration: in iDashboards we don't have "corporate templates", i.e. a place where all standard styling is stored to ensure dashboards adhere to the corporate standard. All standard appearance and styles need to be implemented manually on each dashboard.

Excel is always a big topic in the user community when it comes to collaboration. "Can I export my dashboard to Excel?" is the most frequent question every BI user asks (because they want to share the data with their colleagues using Excel, merged with other data that they have). "Can I import my Excel file into my dashboard?" is the second most popular question. In iDashboards the answer to both questions is yes. A drop-down list in a dashboard can be populated from a table in Excel, and chart data can be sourced from a table in Excel. We can right-click on any chart and export the chart data to Excel. It is not native Excel format but CSV, which can be opened in Excel.

Printing: users can print a single chart or the entire dashboard. Most dashboard software is web based and we can't rely on the IE print feature, because that would include the browser's menu bar. So an important question to ask is: "Does the print feature print just the dashboard/chart, or the whole browser window?" The answer is just the chart/dashboard: iDashboards has a "Print Dashboard" button which prints just the dashboard.

5. Architecture

iDashboards, like all other good dashboard software, consists of three components: a repository/database, a development environment, and a dashboard engine/server. On the server side, we have the repository (on SQL Server or Oracle), the dashboard server (on a Java web server such as Tomcat or WebSphere), and connections to all the source systems. On the client side, we have the user's PC; on this PC we have a browser (IE, Chrome, Firefox), in the browser we have Flash, and within Flash we have the dashboards. On the admin PC, we have the admin application, which is an HTML application.

It needs a browser with the Adobe Flash plug-in installed. This is not a problem on a PC, but it is a problem for tablets and mobile devices. Flash gives iDashboards an edge with regard to appearance, but it limits the platforms it can run on. There are mobiles/tablets with Flash, but they are limited.

Conclusion

As always, there are pluses and minuses. The top 3 strengths of iDashboards are: richness of visualisation, self-service (easy for business users to use) and, I believe, cost. Its top 3 weaknesses are: data integration capability (the ability to combine data and perform calculations), a development environment which looks outdated, and being too simple/basic.

Would I recommend it? Yes, if your requirements are simple / not too complex and if self-service is very important to your business users (they want to create the dashboards themselves). No, if you need to read cubes or perform complex calculations.

Beside the 2+3 I mentioned at the beginning of this article, there are also 5 other dashboard software products which have strong visualisation abilities: MicroStrategy, Birst, Roambi, Klipfolio, InetSoft. Outside these ten, if you want a comprehensive list of dashboard software and would like to evaluate them yourself, they are: Prognoz, Targit, Actuate BIRT, LogiAnalytics, Cognos Insight, Spotfire, Datameer, Information Builders, Cognos Express, PowerView, SiSense, BO Xcelsius, SAS, Panorama NovaView, Jaspersoft, Salient, Board, Strategy Companion, BO Webi, CubePlayer, XLCubed, BO Voyager, PowerOLAP, Bitam, VisualMining, PureShare, Varigence Vivid, dminebi, Arcplan, GoodData, InforSense, Pentaho, Alteryx, Zoho, Domo, Xtraction, Target Dashboard, CartoDB. Some of them are not just dashboard software but BI software with a lot of other capabilities besides creating dashboards. Most of their websites can be found by googling "Dashboard" plus their name, but if you need their URLs please let me know. Good sources for finding dashboard software are Dashboard Insight and Dashboard Spy.

17 December 2012

Composite

Filed under: BI Tools — Vincent Rainardi @ 8:19 pm

VR, Update 11/10/2015: Composite is now formally called Cisco Data Virtualisation Suite

Composite enables us to integrate data from various sources to provide BI. This is called data virtualisation (DV). It is the exact opposite of a data warehouse (DW), where we collect data from various sources and store it in a new data store.

The biggest advantage of using Composite is its speed to market. Without building a data warehouse, we can provide the BI in 2-3 months, as opposed to 12-18 months. We don’t need to design the DW (3-6 months), we don’t need to design the ETL (6-9 months), and we don’t need to spend a lot of time testing the ETL (2-3 months).

As a DV tool, Composite is very good. It has its own SQL-like language. It has many functions. It has adapters for most RDBMSs, as well as Hadoop and Teradata, plus flat files and Excel. We can create views and procedures which integrate many sources. We can cache them, so that the BI tool experiences fast query performance. The cache can be a database table, or we can cache it in Hadoop, so that the query is super fast.

Rather than caching the data, we can also build loading procedures to retrieve data from the source systems incrementally. This way, we lighten the burden on the source systems. We can also set up various triggers to invoke those loading procedures, either on a timer basis or event based. And we can create a "batch" procedure which calls the loading procedures one by one, so there is no "waiting and delay" of the kind that happens when we arrange them on a timer basis.
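
A minimal sketch of that batch pattern in Python (Composite's own procedures are written in its SQL-like language; all the names here are hypothetical): each loader pulls only the rows changed since its last watermark, and the batch runs the loaders back to back so no timer gaps are wasted.

    from datetime import datetime

    watermarks = {}  # last successfully loaded timestamp per source

    def load_incrementally(source_name, fetch_since):
        """One 'loading procedure': pull only rows changed since the watermark."""
        since = watermarks.get(source_name, datetime.min)
        rows = fetch_since(since)           # hits the source system once
        # ... write rows into the cache table here ...
        watermarks[source_name] = datetime.utcnow()
        return len(rows)

    def run_batch(loaders):
        """The 'batch procedure': call each loader one by one, no timer gaps."""
        for name, fetch in loaders:
            print(f"{name}: {load_incrementally(name, fetch)} rows loaded")

    # hypothetical fetch functions, one per source system
    run_batch([
        ("trading",    lambda since: []),   # stand-ins for real source queries
        ("settlement", lambda since: []),
    ])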

On the publication side, there is a facility to publish the views and procedures we have created as web services. These can then be consumed by BI tools, for example by Spotfire Information Designer.

Those are the strong points. What are the weaknesses? The scripting language is primitive; compared to Java, C# or even PL/SQL there is a lot of gap to close. The data manipulation functions are poor; in this area Informatica's functionality is far superior. The programming environment is poor; it is like going back to 1992: no debugging, no breakpoints, no watches.

The database functionality is also weak. For example, we can't truncate a partition, something that is very basic in data loading. Data inspection (and DQ in general) is also weak: we can't easily generate statistics of the source data, e.g. the distribution of values.

Would I recommend Composite for BI projects? Yes, of course, because it provides an excellent, easy-to-use tool to integrate data from many places, to be consumed by the reporting/BI tool. It can cut development time significantly, e.g. 3 months instead of 12.

Disclosure: I receive no financial benefit from any company for saying/writing this. The opinion is entirely mine, and it doesn't necessarily reflect the opinion of my employer or my clients. I have used both Informatica and Composite hands-on, as well as SSIS, BO, Oracle and Spotfire. As always I may be wrong and I'm open to suggestions. If I have said something incorrect above, I would be happy to be corrected.

Vincent Rainardi, 17/12/2012

2 December 2012

Tibco Spotfire

Filed under: BI Tools — Vincent Rainardi @ 12:49 am

Like QlikView, Spotfire provides the capability to serve BI users with analytical reports, with which users can slice and dice the data and export it to Excel/PDF, in various types of visualisations including pivot tables and charts.

Unlike SSAS, Spotfire doesn't require an OLAP cube to be built first. We define a set of data tables which are then used by the visualisations.

Spotfire has good back-end capability, called Information Designer. In there we can create columns and procedures, from which we then create Information Links which feed the data to the reports. Info Links accept parameters and filters, which makes them flexible enough to retrieve only the data we need.

Most of a Spotfire developer's time is spent on 3 things:

  1. Create information links (20%)
  2. Create data tables (30%)
  3. Create visualisations (30%)

Once the Info Links are set up, they usually repeat 2 & 3 with the users in an iterative loop. Occasionally, they also do some automation (using Automation Services) and administration (arranging library content/folders and administering users/access).

Spotfire also has a development kit, which plugs into Visual Studio (using a .NET language such as C#), where we can write programs to add customisations to the visualisations and data tables. Alternatively, we can do the programming inside Spotfire itself, i.e. as code behind a button in a text area, using IronPython as the scripting language. The scripting capability has variables, branching and looping as usual, plus API access to all visualisation and data table objects within Spotfire.
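
For flavour, here is the kind of small IronPython script that sits behind a text-area button. This is a sketch: it assumes a script parameter named vis bound to a visualisation and a document property named Region, both of which you would have to set up yourself.

    # IronPython inside Spotfire: retitle a visualisation from a document property.
    from Spotfire.Dxp.Application.Visuals import VisualContent

    chart = vis.As[VisualContent]()          # 'vis' is a script parameter (assumed)
    region = Document.Properties["Region"]   # document property, e.g. set by a dropdown
    chart.Title = "Sales for " + region      # the API exposes the visual objects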

Like most BI tools, Spotfire is a rapid development tool. An analytic report is typically developed within a week: from the point where we sit down with the business (or a BA) detailing the requirements, to presenting the first draft back to the business. From then on there are of course many iterations to fine-tune it until we get the finished product. This fine-tuning typically takes 2-3 weeks, depending on how widely the business community is involved and how fast or slow their responses are. And of course it depends on how complex the report is: a 1-pager with 2 visualisations and 3 data tables is about 2 days of development time; four pages with 7 different visualisations and 15 data tables take about a week.

Most of the issues in Spotfire development lie in the data tables. This is not specific to Spotfire; it is a general phenomenon in BI. The data is a very difficult thing to get right. The reports, on the other hand, are easy; they just reflect the numbers in the data. If somebody could guarantee that the data in the database/data warehouse was 100% correct and accurate, then the reports would be accurate too. I would say that 90% of the BI effort goes into preparing a database that is accurate, and only 10% into the reporting.

Overall, Spotfire is a good BI tool for presenting data to the users and letting them analyse it. It has a Web Player and an Enterprise Player. The Web Player (WP) runs in a web browser (thin client) and the EP is a thick client. Like other BI tools, the thick client has more features than the web client. Your slice-and-dice capability is more limited if you use the Web Player; for example, you can't change the axes.

Strengths of Spotfire? I would say the back end, i.e. the Information Designer and data tables. Also the calculated columns: there are so many things we can do there, with very rich functions. Another strength is the API. Weaknesses? The visualisations, in my opinion. For example, arranging visualisations on a page is unbelievably difficult. Bar charts: we can't put the numbers on each bar. Using filters to slice data is difficult. Tableau and Strategy Companion are a lot easier to use when it comes to slicing and dicing the data.

Would I recommend it as a BI tool? Absolutely. Along with QlikView, I think Spotfire is one of the best visualisation/reporting tools in the BI world.

11 July 2012

Business Objects Voyager

Filed under: BI Tools,Business Intelligence — Vincent Rainardi @ 1:00 am

What is it?

BO Voyager is a tool for "browsing" cubes. It can read 3 OLAP cubes: SSAS, Hyperion Essbase and SAP BW (link, page 52). DB2 OLAP (8.2) uses the same provider as Essbase 7, hence Voyager can open DB2 OLAP. Oracle OLAP can only be accessed using the Simba MDX provider (link), not natively. SAP BW (or BIW) can only be accessed using the BO Integration Kit for SAP.

The formal name of Voyager today (10/7/12) is “SAP BusinessObjects Analysis, edition for OLAP” (link). The user guide is here, the admin guide is here, the product sheet is here.

There are 2 versions of “BO Analysis”, the other one is “edition for Microsoft Office”, which is an Excel plugin (link). This is the new name of Business Explorer Analyzer (link).

This is what Voyager looks like: (link, page 51)

What is it for?

Voyager has been used for many years (probably more than 7) to browse SSAS cubes. Now that Excel can browse SSAS cubes, the value of Voyager for browsing them is a question mark. Tableau and XLCubed have better capabilities than Voyager; we can't even create a hyperlink in Voyager.

Unlike Tableau, Excel or XLCubed, however, Voyager can browse Hyperion/Essbase and SAP BW as well. Small companies usually have only 1 OLAP platform, but large corporations like global banks usually have multiple OLAP platforms, and this is where Voyager kicks in.

Apart from “browsing”/querying the cube and creating charts from it, we can use Voyager to do the following:

  • Highlight exceptions, e.g. make the cell red (or any colour) if the number is bigger or smaller than a certain limit.
  • Sort a column or a row, i.e. ascending, descending or by hierarchy.
  • Filter a column or a row, e.g. display only the top 10, or display only where revenue exceeds a certain threshold.
  • Do calculations, i.e. sum, sumproduct, +, -, /, x, % of total.
  • Useful OLAP functions: prior period, period to date, moving average, rank.
  • Link to a BO report (but not to a URL).

Is it good?

No. In my opinion, Voyager is a last-generation product. It feels like ProClarity to me. It needs a lot of investment to beat modern BI products such as Tableau, XLCubed, QlikView, Roambi and Spotfire.

It provides basic functionality to explore Essbase & SSAS cubes, but nothing more. If this were 2006, Voyager would have a chance. But in 2012, when the BI tools have grown strong through competition, there's no chance.

Prominent areas for improvements:

  1. Port to a true thin client, i.e. HTML5/JavaScript
  2. Visualisation, e.g. charts, dashboards
  3. Interactivity, e.g. dashboards, drilldown, maps, alerts
  4. More integration with SSAS & Essbase, e.g. displaying descriptions of measures & attributes, drilldown, KPIs
  5. Scripting to support queries
  6. Automation and scheduling
  7. Mobile platforms, e.g. iPad, iPhone, Galaxy, Android, BB
  8. Cloud interfaces (S/I/PaaS), streaming data
  9. Big data integration, e.g. Hadoop
  10. Silverlight integration
  11. Data mashup integration, e.g. Composite

SAP is a giant corporation with good R&D. I admire Hana, which is a modern (even futuristic) platform, i.e. an in-memory database & analytics. I wrote an article about Hana (link). I also admire BO Webi, which in my opinion is the best in its class, i.e. dynamic reporting; BO's functionality is a lot richer than SSRS Report Builder's.

But I don't admire Voyager. I believe SAP has the capacity to improve it, which is why I took the time to write the 11 points above. It needs good investment to match today's standards. SAP is one of the top 5 giants in BI, along with Oracle, Microsoft, SAS and IBM (link). SAP has the financial & technical capability to refurbish Voyager into one of the best analytic tools.

7 July 2012

SAP Hana, an in-Memory Database

Filed under: BI Tools,Business Intelligence,Data Warehousing — Vincent Rainardi @ 7:19 am

What is it?

Hana is an in-memory database from SAP. It enables us to do lightning-fast OLAP analytics on relational databases. We can access Hana data using SQL, MDX or from BO; or from Cognos, Report Builder and PowerPivot. We can create calculated attributes and measures. We can create our own data models and hierarchies.

A must read technical introduction from SAP is here, the official web site home page is here, and the technical operation manual is here, the data modelling manual is here (very bad, no screenshots at all) and the Wikipedia page is here.

HANA stands for HAsso’s New Architecture. Hasso Plattner is a cofounder of SAP.
HANA also stands for High performance ANalytic Appliance.
Hana is also a name, rather than acronym.

Actually Hana is not just an in-memory database. It is also an analytics appliance.
Cisco provides a Hana appliance on Intel Xeon 7500 (link).
HP provides a Hana appliance on ProLiant DL580 and DL980 (link).
IBM provides a Hana appliance on x3690 X5 and x3950 X5 (link).
Violin provides a Hana appliance on Intel Xeon 5600 (link).

HANA is also a cloud application: a PaaS, Platform as a Service. EMC and Cisco provide a Hana cloud app on Cisco UCS and EMC VNX 5300 (link).

Hana was released in Nov 2010. Version 3 followed in Nov 2011, and version 4 is due in Q2 2012.

How does it work?

Hana has a parallel calculation engine. Hana takes in a SQL query and runs it in parallel across many partitioned data sections in memory, and across many multicore processors.

Hana also has business functions to speed up calculations, for example currency conversion and calendar functions, as well as a predictive analysis library.

The Hana database is both row based and column based. If a calculation involves only a few columns, those columns are stored in the columnar store (compressed) in memory. Hana also supports temporal tables, which speed up queries on historical data.

The key to Hana's speed is the first sentence I wrote above: it is an in-memory database. Memory is 1000 times faster than disk, and these days we can have a server with 2 TB of memory.

But that is the story of the past; everybody now stores data in memory. What Hana has that the crowd doesn't is the technique for loading data from memory into the CPU cache: partitioning a query and its data, and loading them into different CPU caches so that each CPU can process its share in parallel. That is what differentiates Hana from the others.

This is how Hana works (from the must-read technical introduction that I mentioned earlier, link, page 10):

Hana does parallel SQL execution and parallel aggregation. This parallel execution and aggregation happens on multiple threads. Each thread has a hash table which stores its aggregation results. These hash tables are then merged by merger threads, as we can see below (from the technical introduction doc, link, page 12):
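
The idea translates directly into code. Below is a minimal sketch in Python of the same pattern: each worker aggregates its own partition into a private hash table, and the partial tables are then merged. (Real Hana does this across threads and CPU caches, not Python processes; this only illustrates the shape of the algorithm.)

    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor

    def aggregate_partition(rows):
        """One worker: build a private hash table of group -> sum."""
        table = Counter()
        for group_key, amount in rows:
            table[group_key] += amount
        return table

    def parallel_sum(rows, workers=4):
        # partition the data, aggregate each partition in parallel,
        # then merge the per-worker hash tables (the 'merger thread' step)
        chunks = [rows[i::workers] for i in range(workers)]
        merged = Counter()
        with ProcessPoolExecutor(max_workers=workers) as pool:
            for partial in pool.map(aggregate_partition, chunks):
                merged.update(partial)  # Counter.update adds the values
        return merged

    if __name__ == "__main__":
        data = [("UK", 10), ("US", 20), ("UK", 5), ("DE", 7)] * 1000
        print(parallel_sum(data))  # UK: 15000, US: 20000, DE: 7000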

What is it for?

How do we use Hana? What can we use it for? Using Hana, we don't have to create summary tables in the data warehouse. This removes one basic reason for having a data warehouse: we need a DW to aggregate data fast, and now we can do it using Hana. It also eliminates the need for OLAP cubes: we use OLAP cubes such as SSAS to aggregate data fast, and now we can do it using Hana.

We still need a SQL interface though, which is a non-starter on its own. Who wants to type SQL queries? Definitely not the business users! Hence we need a BI tool, such as BO Universe, Cognos, SSRS Report Builder or PowerPivot. The major weakness of BO Universe, RS Report Builder, Cognos and other flexible reporting technologies when operating on a DW is that they are slow, because the DW is disk based. Now that we have Hana, an in-memory database which is both row based and columnar (and temporal!), these flexible reporting tools will be incredibly fast.

4 January 2010

Comparing Excel 2007 and ProClarity

Filed under: BI Tools — Vincent Rainardi @ 9:03 pm

I use Excel 2007 and ProClarity to browse SSAS cubes. This is what I experienced.

  1. In ProClarity you can put measures on the background so the user can pick which measure they want to use. In Excel 2007 you can’t; you can only put measures on the column or row.
  2. If an attribute has 5 members, in ProClarity you can select [member3, member4 and All]. In Excel 2007 you can’t; you can only select All, or select several members, but not both.
  3. If an attribute has 100,000 members, in ProClarity you can search for a member, to display only a certain member in the row. In Excel 2007 you can’t; you need to display all 100,000 members, then deselect All and select the one you want to display. Or, highlight several members, right click, select Filter, choose Keep Only Selected Item. But you can’t search without first displaying every single member, which can take a long time.
  4. In ProClarity you can create a set of members. You can either pick the members manually, or define criteria for inclusion/exclusion using Selector. In Excel 2007 you can create a group containing several members. You can only pick the members manually. You can’t define criteria for inclusion/exclusion. Note on performance: creating a set containing 5 members out of 50 by picking them manually takes 1 second in ProClarity. In Excel 2007 it takes 55 seconds.
  5. In both ProClarity and Excel 2007 you can link to a URL (action). However, I found that if we need to log in to reach that URL, Excel 2007 failed to pass the parameter to the URL, so it landed on the front page of the app, whereas ProClarity, after logging in, opened the appropriate page.
    Note: after using ProClarity, if we use Excel 2007 again, Excel will open the correct page.
  6. In SSAS we can attach an action to a measure: target type = hierarchy members, target object = measures. For example, the action expression could point to a URL explaining the meaning of the measure. In ProClarity, we can right click the measure name and select the action. In Excel 2007 you can't do this; the action is greyed out.
    Note: if the action is attached to dimension members, attribute members or hierarchy members, it works in both Excel 2007 and ProClarity.
  7. In ProClarity you can drilldown: right click on a member, select Drill Down, and then select the attribute. In Excel you can’t. You need to manually move the attribute to background, set the background attribute to a certain member, and then put the attribute you want to drill down on the row or column.
  8. In ProClarity you can create a calculated measure, such as a ratio or percentage difference (A/B and (A-B)/A). Date-based calculations are also available, such as year-on-year growth, rolling averages and year-to-date totals. We can then use this measure just like a normal measure, e.g. we can drill down on any attribute. In Excel 2007 we can't define a calculated measure. If you format the area as a table and define a new column to the right of the pivot table, the column will just be calculating the cells; it is not a measure. You can't slice it.
  9. In ProClarity there are no row headers. If you put 5 attributes on the Rows (order number, date ordered, date delivered, date invoiced, date paid), you will have no idea which dates are in which columns. Viewing it in ProClarity is not too bad, as you can see the attribute names on the Rows. But when you export the page to Excel, there is no indication at all of which attributes were put on the Rows. See below. Column attributes are fine, by the way (measure1 to measure5 below).

    In ProClarity, Flattened Row Headers (on the Grid Properties) has nothing to do with row headers: it is for when you put a hierarchy (rather than an attribute) on the Rows and you want the values displayed not as a tree but left-justified.
    In Excel we have row headers, so we can clearly see which attributes are in which column.

10 December 2009

Qlikview vs PowerPivot: connecting to SSAS cube

Filed under: Analysis Services,BI Tools — Vincent Rainardi @ 11:22 pm

In QlikView we can't connect directly to SSAS as we can in PowerPivot. We need to create an OLE DB linked server first, then use OpenQuery, so the output is flattened rowsets. In PowerPivot we can specify an MDX statement and get cellsets, i.e. results which have axes. For the differences between flattened rowsets and cellsets, see this post by Brian Welcker.
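
To make the difference concrete, here is a sketch of the QlikView-style route in Python (the DSN, linked server and cube names are hypothetical): the MDX is wrapped in OPENQUERY against the linked server, and what comes back is a flat table of rows and columns rather than a cellset.

    import pyodbc

    conn = pyodbc.connect("DSN=SqlServer")  # hypothetical DSN to the relational server
    cursor = conn.cursor()
    # SSAS_LINKED is a hypothetical OLE DB linked server pointing at the cube;
    # the MDX result comes back flattened into ordinary rows and columns.
    cursor.execute("""
        SELECT * FROM OPENQUERY(SSAS_LINKED,
            'SELECT [Measures].[Sales Amount] ON COLUMNS,
                    [Product].[Category].Members ON ROWS
             FROM [Adventure Works]')
    """)
    for row in cursor.fetchall():
        print(row)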

PS. For a normal Excel table or pivot table (not PowerPivot) we can specify an MDX query by editing the .odc file; see Chris Webb's post here.

8 December 2009

PowerPivot

Filed under: BI Tools — Vincent Rainardi @ 10:45 pm

PowerPivot is an in-memory OLAP tool, like TM1 from IBM and QlikView from QlikTech. It is a SQL Server 2008 R2 add-on installed in Excel. Like Analysis Services, using PowerPivot we can create a 'cube' in Excel, which we can then slice and dice. The data for this 'cube' can come from database tables, Excel files, SSAS cubes or a combination of them. That, I think, is the most important feature of PowerPivot: it allows us to combine data from databases, Excel and cubes so that we can analyse the data.
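
As an analogy in Python terms (pandas standing in for PowerPivot's engine; the file, table and column names below are made up), this is the combine-then-analyse step that PowerPivot performs inside Excel:

    import pandas as pd
    import sqlite3

    # One source from Excel, one from a database (hypothetical names)
    budgets = pd.read_excel("budgets.xlsx")            # columns: Region, Budget
    conn = sqlite3.connect("sales.db")
    actuals = pd.read_sql("SELECT Region, SUM(Amount) AS Actual "
                          "FROM FactSales GROUP BY Region", conn)

    # 'Relate' the two tables and analyse the combined data
    combined = budgets.merge(actuals, on="Region")
    combined["Variance"] = combined["Actual"] - combined["Budget"]
    print(combined)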

Users open Excel, fill in a few sheets with data (or import it from a database or files), then build the 'cube'. It looks like a normal PivotTable, but it has slicers, which enable us to 'cut' the cube. The 'cube' can be published on SharePoint so other users can 'browse' it. It is striking how similar that 'cube' is to a pivot table. Under the concept of self-service BI, PowerPivot is aimed not at the IT department but at business users (data analysts/power users), e.g. an MIS/MI department. It is so important I will repeat it here: PowerPivot is not for BI developers, it is for business users.

The 'cube' is called a "PowerPivot pivot table". To construct a PowerPivot pivot table we need to a) create some tables in Excel, and b) relate those tables. A 'table' in Excel is an area of cells that we mark as a table. There are 2 ways of creating a 'table' in Excel: a) type it in Excel then 'Format as Table' (it's on the Home toolbar), or b) import it from a database, file or other source.

I am of the opinion that PowerPivot will increase the demand for SSAS rather than reduce it. This is because:

a) it encourages users to create OLAP and use it. Then, after a few months, there will be a mess: cubes everywhere. And there will be a need for a project to pull that scattered mess into one big SSAS cube (or two, or three).

b) you don't need to spend 6 months (or a year) building a data warehouse. If the users have the data in Excel, they can immediately analyse it with PowerPivot. They can even combine it with data from any database. How powerful is that? This can only go in one direction: more need for a data warehouse. Imagine a company with a hundred data islands, from New York to Singapore, in the form of PowerPivot tables and charts sitting in users' Excel files. As sure as eggs is eggs, they will have a greater need for a data warehouse. Then, naturally, an SSAS cube follows, sitting on top of the DW, accessed via http from everywhere, from New York to Singapore.

c) because of the limitations of PowerPivot: there is no hierarchy. If you can't compose a hierarchy, how do you drill down? True, we can create calculated members using DAX, but the capability is very limited. As soon as this limit is reached, there will be a conversation similar to this: [user to IT]: how can I do this in PowerPivot? [IT to user]: erm, you need MDX.

PowerPivot is deliberately intended/targeted at business users, probably with no background in warehousing; people who have never heard of Kimball or Inmon (that debate is really getting more religious than rational, isn't it?). Listen well to this: "Create a lookup relationship between two tables". That is the term PowerPivot uses to refer to the relationship between a fact and a dimension. Perhaps it's me who has got the wrong end of the stick: I have a data warehousing background, but PowerPivot is not for DW. It can take any tables, not just facts and dims (and so can SSAS, by the way; see my post here). If I have got it wrong, please remind me, correct me (vrainardi@gmail.com, or via comments). I'd be happy to be corrected.

Below I'll illustrate how to compose a 'PowerPivot pivot table'. Aah, can't we have a shorter name? Four words! Sod it, I'm going to use the word 'cube', to differentiate it from a normal pivot table. To give a flavour of what PowerPivot is, I'll illustrate how to compose a cube in PowerPivot from Excel data. Of course the source can be replaced by a query from DB tables, etc. Please don't think of it as a 'how to', but as a way to see at a glance what PowerPivot is. For 'how to's there are lots of resources; please refer to the links at the bottom of this post.

First, install Excel 2010 (download from here), then install PowerPivot (download from here). Open Excel, create some data, like this for example:

Format each 'table' as a table by clicking 'Format as Table' on the Home toolbar. Then go to the PowerPivot tab (see below) and for each table click 'Create Linked Table', which means 'make this Excel table available in PowerPivot'.

Then we click on the 'PowerPivot window' button and rename the tables at the bottom of the screen (this is so that we can identify which is which when we create the pivot table later on):



Then create relationships between the tables by clicking the Table tab, Create Relationship:

PowerPivot resources:

  1. PowerPivotPro: one of the first sites on PowerPivot. It contains an introduction to PowerPivot, case studies, examples and lots of blog posts by Rob Collie.
  2. PowerPivot-info.com: the most comprehensive portal about PowerPivot, containing links to (almost) every single article, post, news item, author, book, etc. about PowerPivot. Thanks to Vidas Matelis for creating this site.
  3. PowerPivot.com: the Microsoft site. We can download the November CTP version of PowerPivot, try it in the virtual lab, and watch the video clips.
  4. PowerPivot on the Excel team blog: an introduction to PowerPivot by the Excel team.
  5. Bob Duffy presented PowerPivot at SQLBits 5. His presentation is here (2.5 GB, pptx).
