Data Platform and Data Science

13 July 2023

It’s not BI any more, it’s Analytics!

Filed under: Business Intelligence — Vincent Rainardi @ 8:16 am

5 years ago everybody called it BI. Business Intelligence. It sounds great (intelligent, smart) and it has a great meaning too: it means that the company has business insight. They understand the trends happening in their business. Which customer segments need attention, which product lines are in demand. It means that the management knows how the business performs. Keep their finger on the pulse, as they say. And more importantly, that company is perceived by the market as a company that makes decisions based on data, not based on instinct.

In the job market, we say “BI Developer”. In the software market, we say “BI Tools”. It is called Power BI*. Tableau is a BI tool. In IT department we call it “BI reports”. BI Developers are people who develop BI reports, using BI tools.

Note: Microsoft is well known to name their products using English words with the same meaning. For example: Microsoft Word, SQL Server, SQL Data Warehouse and Power BI.

But today that word is obsolete. The right word is Analytics.

With the advance of AI, we now have 4 types of analytics (2D2P). They are: descriptive analytics, diagnostic analytics, predictive analytics and prescriptive analytics. The 2 Ds are the old BI (descriptive and diagnostic analytics). The 2 Ps are AI-based (predictive and prescriptive analytics).

What are those 4 types of analytics? Please read Ref #2 below which provides a good explanation. Basically:

  • Descriptive analytics: describes the past
  • Diagnostic analytics: explains why an event happened
  • Predictive analytics: what will happen in the future
  • Prescriptive analytics: actions we should take

The 3rd one (predictive analytics) is extremely valuable. It predicts the future. Which customers are likely to leave your business this year? Which products are likely to decrease in sales in the next few months? Every business wants to know the future.

One of the jobs in JobServe I’m looking at this morning is from a automotive retailer. They are hiring a data scientist to do machine learning forecasting based on the weather! It is important for retailers (shops) to forecast sales based on the weather, economic indicator (like interest rates), and other factors.

My dissertation is about stock forecasting. For investment companies it is important to forecast the value of their holdings based on various factors, including the past prices (it’s called time series forecasting, see Ref #3).

Another predictive analytics that banks and investment companies do is credit rating. Which companies will move from credit rating AA plus to AA minus in the next 12 months? Which industry sector will experience a general decline next year? You get the picture.

Descriptive and diagnostic analytics are BI. But predictive and prescriptive analytics are AI-based. BI explains the past, whereas the 2P analytics explains the future (using machine learning or AI). So it’s up to you, if you just want to explain the past, use BI. But if you want to predict the future, use the 2P. So when you say BI, it means just the 2D. When you say Analytics, it means all four. And in many cases what we mean is all four. That’s why we should say Analytics, not BI.

So there you have it. It is not BI any more, it is Analytics!

Reference:

  1. BI vs Business Analytics by Tableau: link
  2. Four Types of Analytics by Analytics8: link
  3. Complete Guide to Time Series Analysis and Forecasting by Marco Peixeiro: link

1 November 2022

Power BI Datamart

Filed under: Business Intelligence,Data Warehousing — Vincent Rainardi @ 8:39 am
Tags: , ,

Like SQL Data Warehouse, Microsoft once again named a product by what it does: Power BI Datamart. It is an Azure SQL database, with a visual query designer so users don’t need to write SQL to get the data out to BI tools, Excel or ML algorithms. It also generates a Power BI dataset automatically, which we can use to create Power BI reports.

Loading data into Datamart

Power BI Datamart uses a built-in Power BI Dataflow for loading data into it. After we create a Datamart, we click on Get Data and use PowerQuery to connect to various data sources such as Azure SQL database, Azure Data Lake Storage, Databricks (Spark), Azure Synapse, HDInsight, Sharepoint, Excel files, text files, JSON files, Parquet files, API, Salesforce, Snowflake, Redshift, BigQuery, Oracle DB, IBM DB2, Teradata, PostgreSQL, MySQL, Access, SAP, Analysis Services and ODBC (link):

The above list of data sources is so comprehensive, I could not believe it at first. This is a game changer. Gone are the days when we struggling in SSIS, Informatica or ADF to load data from those varied sources. PowerQuery has all the connectors and interfaces to every single one of those databases, files, data lakes and big data.

We then select the tables, specify transformations and PowerQuery would build the ETL rules in that built-in Dataflow, create the relationships and load the data from those tables in the data source into the Datamart. We can then see all those tables and relationships in the Table Tools tab:

Afterwards, we can create relationships (again) and measures. We can also hide the tables and columns that we don’t need.

In the workspace we automatically get a dataset:

We can schedule when we want to reload data from those data sources into the Datamart (called Scheduled Refresh). And we can also setup incremental load, i.e. only changed rows are loaded into the Datamart, based on the column that we want (known as “watermark column”, link):

Security and access control

Power BI Datamart uses row level security and roles to restrict user access to the Datamart. In this example below (link), the users are restricted to only be able to access rows with category ID of 1 and 2. Once that role is defined, we can assign users to that role, as we can see below:

Querying data

As I said at the start of this article, we can use a visual query designer to get the data out without writing SQL (link). As we can see below, the Visual Query Designer is like Markit EDM, SSIS and Informatica.

We can select rows, filter rows, select columns, split columns, remove duplicates, join tables, change data types, transpose table and pivot columns. We can also do error handling, such as removing errors or replacing errors (link). And we can also create custom functions (link).

Of course we can also write SQL:

Accessing Power BI Datamart from SSMS

And we can also query the data in the Power BI Datamart using SSMS (SQL Server Management Studio). And using Azure Data Studio too (link):

Not only we can use SSMS for querying data in the Datamart, but also adding users and configuring security. And we can also use SSMS for looking at the query statistics. This is how it looks from SSMS (link):

We can see above that the Power BI Datamart tables are listed as views, under “model” schema. The relationships in the Power BI Datamart is stored in two views called metadata.relationships and relationshipColumns, as shown above (link).

Conclusion

I think Power BI Datamart is a game changer.

  • It enables us to gather data from databases, data lakes, files and APIs into one data mart.
  • We can reload data every day/week and we can do incremental loading.
  • We can query the data using SSMS (can also do admin stuff).
  • We can build Power BI reports easily (the dataset is automatically created).
  • We query the data using visual query designer or by writing SQL.
  • We can easily get the data out to BI or ML tools, and to Excel.

That is a very powerful tool. Anyone in data warehousing and BI should consider this tool. At least take a look.

And I think the name suits it well. The Power BI Datamart is indeed a good data mart. With all its bells and whistles.

27 June 2018

Can Machine Learning replace BI?

Filed under: Business Intelligence,Data Science,Machine Learning — Vincent Rainardi @ 6:50 am

In the last 20 years companies have been slicing and dicing data, looking for patterns in the data manually. They then use the patterns that they find to make business decisions. But in the last 2 years companies have been using machine learning to find the patterns in the data, and use it for predictive analytics.

Business Intelligence

This is an example of a typical BI dashboard for sales, source: https://www.cbsi-corp.com/software/business-intelligence-reporting. The dashboard presents the sales data. On the top it is sales and profit over time and by product. And on the bottom it is sales by salesperson and profit by customer segment and product group.

Example of BI Dashboard

Based on this data the company can make business decisions such as:

  • Increase or decrease the profit margin for a particular product group.
  • Focus the marketing effort on a particular customer segment in a bid to bump the sales.
  • Restructure the sales teams in order to improve sales performance.

So BI is supporting the management to manage the business better by making them better understand the current and past business situations.

Machine Learning

A typical machine learning is fraud detection. Thousand of transactions a day (say credit card or payment/bank transfer) are analysed by ML algorithm and a few which are suspected are automatically put on-hold and manually checked. Date, time, amount, customer profile, payee, and transaction/payment type are “features” of the neural network algorithm which scores each transaction, and transactions which over certain limits are put on-hold.

In this case ML is replacing BI. The ML system automatically does the actions, which is holding off the transaction, and putting it in the queue for manual review. The fraud analyst then review and check with the customer. They then release the transaction if it is genuinely initiated by the customer or block the transaction if it isn’t. BI can’t do this. We need ML to do this.

But can ML do the sales dashboard above? No it can’t. ML is build to analyse the current situation, but it can’t suggest that Corporate Gift is the one which we should focus our marketing effort on, not Mugs and Drinkware. There are a lot of information which are not included in the data, and it is this information that enables the business users to be superior than any ML algorithms.

Can ML replace BI?

So can ML replace BI? Only in certain areas I believe, but not in its entirety. As I wrote in my book there are 3 areas of BI: Reporting, Analytics and Data Mining. It is the Data Mining area which is being replaced by ML. But the reporting and analytics are not.

15 April 2018

BI vs Data Science

Filed under: Business Intelligence,Data Science,Machine Learning — Vincent Rainardi @ 5:57 am

Last night there was a talk about Business Intelligence (BI) vs Data Science at Dataiku London. Unfortunately I was not able to attend but that event inspired me to write about this topic as it has always been an interesting topic for me. And I think it is very fitting to this blog as it is about Data Warehousing and BI, and recently I added about Data Science too.

Definition

Before we go too much further let’s remind ourselves what BI and Data Science are. Data Science is about scientific approaches to manage and analyse data using statistics, machine learning and visualisation (link). BI (Business Intelligence) is also about managing and analysing data and visualisation, but using business approaches, not scientific approaches (link, link).

I have been creating business intelligence and analytics applications across many industries including travel, banking, pharmaceutical, manufacturing, insurance. Both with a data warehouse/mart, and without. They are used for analysing the business, for reporting, and for dashboarding.

I have done Big Data development for a few years, including Hadoop, Hive, Data Lake, Spark, graph and document databases. In the last 6 months I have been learning Machine Learning and Neural Network, using Jupyter, Matlab and R, and recently Tensorflow. I understand how they are used for visual recognition, predicting prices, network security, marketing and playing games.

This Article

I have written a few articles about “Choosing between Big Data and Data Warehousing” (link, link, link) which differentiate the back-end part of BI and Data Science. So in this article I would like to focus on the front-end part, i.e. how they are used.

One of the best approaches to compare the usages of BI vs Data Science is probably using use cases from a particular industry. So in a particular industry we compare what BI can do and what Data Science can do. For example, in insurance BI is used for analysing customers, earnings, risks, financials and claims, and for reporting (operational, management and regulatory reports). Whereas Data Science is used to forecast losses, income and profitability (both account and product), adjusting rates, classifying customers for renewal, and grouping potential customers.

If we dive into other industry, say travel, we will understand the differences between BI and Data Science a bit more. The more industries we study, the more we understand the differences of how BI and Data Science are used. One is using business approach, and the other is using scientific approach.

With the above background I think we are ready to dive into core of this article, so let’s start with the first industry: insurance. I will then do one more industry: travel. I am a beginner in data science and particularly in machine learning. Suggestions and advice from readers will be most welcome, with regards to both about the approach and the content at vrainardi@gmail.com or via the comment box below. Once again in this article I’m not discussing the how or technical aspect, but the usage or business advantages, comparing the BI vs Data Science.

Insurance

General insurance is about insuring buildings, ships and businesses from risks like storms, sinking and frauds. There are about 20 classes in general insurance including property, aviation, marine, construction, oil & gas, professional indemnity, public liability and riots. It is a little bit different from life and pension business, and from retail business, i.e. home, car, health, pet and travel insurance.

I guess in this example we need to pick one line out of the above 3 main insurance businesses, which branches into many lines. Let us use car insurance for this example, because it is retail business so many people can relate to, and it’s easier to understand. It has many attributes which provide good illustration for the BI and Data Science usage.

The main usage of BI in car insurance is to analyse premium income and claim costs. These two numbers (and many of their derivatives, i.e. net, gross, subs, etc.) are analysed with regards to vehicle, time, office and customer attributes. Two of most common methods of this analysis is by using OLAP cubes and dashboards. The dashboards presents the most important findings such as income growth by vehicle types and profitability by customer groups, with guided drilldown along chosen pathways. The cubes provide a free-to-roam exploration platform, enabling drilling any numbers to the lowest details on any available pathways.

The business values of these activities are from knowing when and where exactly the revenues and costs are coming from and what factors influence them. This knowledge gives the business greater control and ability to grow the profitability across all business segments. Understanding claims and premiums profile for each type of vehicle (age, van/SUV, fuel, gearbox, engine size, parking, usage, milage) and each type policy holders / drivers (location, years of licence, profession, accident history, health, penalty points) enables the business to target the sales and marketing budgets on the more promising customers, as well as pricing the policy at the correct level. It also enables more precise filtering with regards to following up the leads from the Aggregates such as GoCompare and MoneySupermarket, and from brokers.

The Data Science is used to analyse customer churns, establishing rates, and analyse black box telematics with regards to risks and impact to premium levels. Standard actuarial rates give the business non-competitive advantage as they use standard factors such as driver age, engine size, mileage and parking location. Data Science enables insurers to factor-in new attributes such as driving behaviour, driving routes and driver’s social media, calculating the underwriting rates more specifically and more precisely for each policy. This enables the business to win new businesses, both commercial and retail, as quotes/pricing is the most important factor influencing new customers. Also, machine learning is used to improve the efficiency of online advertising, by predicting which how many users (and of which types) would click-through on different types of ads and by optimising bidding.

So BI and Data Science give different business advantages. It is not true that Data Science will replace BI in car insurance business, and in insurance in general. On the contrary Data Science will complement the business advantages that BI currently delivers.

Travel

Selling flights, hotel, holidays and cruises are the core businesses of travel companies. Car hire, travel insurance, excursions, airport transfer, airport parking, tours, restaurants, attractions, shows, villa, currency exchange and rail passes are the secondary businesses. They have retail and commercial business lines, both heavily utilising internet as a marketing platform because it reaches many countries and very cost efficient.

The BI is used for analysing web traffic such as funnels and conversion rates, revenue breakdown, customer segmentation, customer loyalty programs, and marketing campaigns such as email alerts and newsletters. Funnels enable the business to understand how much traffic filters trough each step of the purchasing process, and at which page each customer stops, as well as the sequence of pages they viewed and for how long each. This ultimately enables us to improve the website content and the products, resulting in higher conversion rates. Revenue breakdown is by product, by customer types, by country, by platform, by campaign, and by time. Marketing campaigns enable the business to drive more sales, with the content of each email/newsletter tailored differently to each individual customer. Each image on the newsletter is uniquely tagged enabling us to track which click from which customer triggered each sale.

The business value of these activities are: increased revenue, wider customer base, and increased customer loyalty. The revenue is higher because of higher conversion rates on web traffic and because the marketing campaigns drive more sales to the websites. Also because as a result of breakdown analysis we can focus our budget and time on the more profitable and promising product types and customer segments. The customer base is wider because market penetration from the campaigns. Customer loyalty increased because the offer is tailored specifically for each customer segment, sometimes down to each individual customer.

The Data Science can be used for customer segmentation, predicting campaign results, and analysing web traffic. It seems overlapping with the above BI activities, but if we dive into the detail levels they are actually complementing. By associating real time web traffic data and past purchase data using machine learning, travel companies can find out which customers have the best potential for purchasing products (using cookies to identify customers). The learned parameters are then used to display offers on the web pages the customers are currently on. The offers are specifically tailored to the highest potential that they are interested, based on their browsing behaviours (current and past sessions).

Predicting campaign results is done by first tracking back each sale to find its origin and contributing factors. The origin can be a click on an image on a particular email sent by a particular campaign. This can be identified by the tag on that image, which has the customer ID, campaign ID and product ID. The origin can also be a Google or social media advert from a particular campaign that the travel company runs, giving us the ID of the ad and possibly the user/audience ID. The contributing factors of a successful campaign can be the product, creative, layout, timing, pricing, audience and origin. Knowing which factors contribute the most to the success or failure of each campaign can be very revealing, from which we can then improve the subsequent campaigns. Various machine learning algorithms can be used for this including support vector machine, random forest and neural network.

1 December 2014

Using BI Tools as ETL Tools

Filed under: Business Intelligence — Vincent Rainardi @ 6:59 pm
Tags:

A few years ago a colleague said that sometimes users were only using the BI tool as an ETL tool, i.e. they use the BI tool such as Tableau and QlikView, just to get the data. Just to bring the data into Excel. Then they do everything in Excel.

Today I realise that it is true. For some users, they have all of their data in Excel, which they get from various sources. ERP, trading system, settlement system, reference data system, etc. And the data from data warehouse is just one additional source to them. If we give them the DW/BI data in SSAS cubes via Excel pivot table, great! They can “integrate” the DW data straight away.

But if we give them a BI tool such as Tableau, QlikView, Spotfire, Hana, Cognos, BO or Microstrategy, then they export the data into Excel and they use the data. This is particularly true in financial industry where the users are so advanced with Excel. Investment banks, retail banks, insurance and asset management, those are the industries where there are a lot of Excel wizards amongst the underwriters, claim adjusters, financial analysts, debt traders and fund managers. Yes, in financial industry, Excel is the defacto tool that every analyst uses to do their day-to-day job.

23 November 2014

SQL Server 2014 Installation

Filed under: Business Intelligence,SQL Server — Vincent Rainardi @ 4:42 pm
Tags: ,

I’m installing SQL Server 2014 on a new laptop and it has not been an easy experience.

First of all, the Books Online is not included and when I tried to “Install Content from Online” I had the issue below (I include the solution).

Secondly, there is no BIDS! BIDS needs to be downloaded from this link, and installed into Visual Studio 2013. But my Visual Studio is 2015 (Preview), so I had to install VS 2013 first.

After installing VS 2013 and downloading SSDT BI (new name for BIDS), when I installed SSDT BI it said that the architecture was incorrect: my SQL Server 2014 was 64-bit, and the SSDT BI download page said that it could do 32 or 64 bit. Bit in fact, the SSDT BI installation file was only for 32 bit (SSDTBI_x86_ENU.exe). There is no SSDT installation for 64 bit, because Visual Studio is only available in 32 bit.

The trick is, as “g2server” pointed out here: when installing SSDT BI, on the Installation Type we need to choose “new instance”, not “existing instance” (even though you already installed both SQL Server 2014 and VS 2013).

All sorted now. Hopefully with this article those of you who want to install SQL Server 2014 documentation and SSDT BI don’t have to spend time as much as me.

Symptom: open SQL Server 2014 documentation, Microsoft Help Viewer 1.1 opened up, Install Content from Online, chose the content (SQL Server 2014 Books Online, click Update button. Help Library Manager said “Updating Local Library, Analyzing package delta” for 2 mins then displayed this error message: An exception has occured. See event log for details.

The Event log:

An error occurred while updating local content: Microsoft.Help.CacheLib.CacheLibBitsException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). —> System.Runtime.InteropServices.COMException: Retrieving the COM class factory for component with CLSID {4991D34B-80A1-4291-83B6-3328366B9097} failed due to the following error: 80080005 Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE)). at Microsoft.Help.CacheLib.Downloader.CreateManager() — End of inner exception stack trace — at Microsoft.Help.CacheLib.Downloader.CreateManager() at Microsoft.Help.CacheLib.Downloader.Jobs() at Microsoft.Help.CacheLib.Downloader.CancelAllPendingJobs() at Microsoft.Help.CacheLib.DocumentationCache.DownloadWebContent(IEnumerable`1 webDownloads, Int32 totalDownloads, Int32 completedDownloads, String workingDirectory, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Download(ChangeDescription change, StatusCallback`1 statusCallback) at Microsoft.Help.CacheLib.DocumentationCache.Update(VendorName vendorName, Boolean checkForOnlineUpdates, StatusCallback`1 statusCallback, CertificateCheckCallback certCheckCallback) at Microsoft.Help.CacheLib.CacheManager.<>c__DisplayClass24.<UpdateAsync>b__23() at Microsoft.Help.CacheLib.AsyncOperationRunner.Run(Object state)

Source: https://support.microsoft.com/kb/870655?wa=wsignin1.0

In Register, HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,768 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16

Added 512 like this: Windows SharedSection=1024,20480,768,512

Still the same error.

Solution: download 2012 and 2014 documentation mhsa files from

SQL Server 2014: http://www.microsoft.com/en-gb/download/confirmation.aspx?id=42557

Product Documentation for Microsoft SQL Server 2014 for firewall and proxy restricted environments

SQL Server 2012: https://www.microsoft.com/en-us/download/confirmation.aspx?id=347

Product Documentation for Microsoft SQL Server 2012 for firewall and proxy restricted environments

16 July 2014

iDashboards

Filed under: Business Intelligence — Vincent Rainardi @ 7:22 pm

A lot of people have heard about QlikView and Tableau, which in my opinion are the top two in dashboard software. But I found there are 3 dashboard software which are quite good, but much less well known: Dundas, iDashboards, Datazen. Today I would like to write about iDashboards, perhaps in other opportunity I’ll write about the other two. I want to write about them because I feel that these 3 software need more exposure in the BI community, because they are functionaly good, but most BI developers don’t know them.

What I write here is purely my opinion. I will start with what a dashboard is, and what features are important for them to have. Then I will be investigating iDashboards software with regards to the visualisation, development environment, data integration, security & collaboration, miscellaneous. After the conclusion, I’ll mention 43 other dashboard software which are outside the above 5.

What is a Dashboard?

A dashboard is a BI tool used for visualising and analysing data (BI tools which are not dashboard are reporting tools, OLAP cubes and data mining). Using a dashboard, business users can view their data at a high level to understand the trends and anomalies, then they can drilldown but still kept at high level (not at transaction level like in a BI reporting tool).

From the user point of view, a dashboards consists of a collection of controls i.e. widgets such as tables, charts/graphs (bar, pie, line, combi), maps, pictures, videos, gauges, thermometers, sparklines, KPI indicators (traffic lights, smileys, thumbs up/down, arrows), clocks, heatmap, dropdown, button, texts, labels, tabs, scrolling bars, sliders.

From the developer point of view, a dashboard system (I’m talking about a good one here, one which has undergone at least 5 years of evolution) consists of a development environtment (IDE = Integrated Dev Env), a database and a rendering engine. When we create a dashboard, we use the IDE to combine all those controls, connect them with our source database (or XML/text files, Excel spreadsheets, cubes), define KPIs, setup interactivity and drilldown path. When we click save, all those work is then saved into an internal database containing 40-50 different tables (controls, KPIs, users, dashboards, chart settings, permissions, data sources, formatting, hierarchies, tabs, etc). We then click view (or publish) and the rendering engine then read those tables and display the dashboard on the screen. The rendering engine also responds to mouse hover and mouse clicks from the users, showing different things and different data accordingly.

A scorecard was a particular type of dashboard which doesn’t have interactivity. It is a static dashboard where people can see the high level scores of snapshots of each key areas of the business so that at a glance, the business users will be able understand the status and position of their businesses. But that was 10 years ago. Now a days scorecards are interactives and users can filter, drilldown, and highlight certain areas of the scorecard; no difference to any other dashboards, apart from their function which is to report a particular state and status of the business.

What features does a Dashboard need to have?

A good dashboard will have these features:

  1. Very slick, modern looking visualisation controls with good interactivity abilities, including drilldown/up, hyperlinks, mouse click/hover.
  2. Good, easy to use development environment, complete with version control, layout tool, calculation, KPI, undo, scripting language, migration tool.
  3. Data sourcing: fantastic ability combine data from different tables and files (mesh up), i.e. multi-table, multi-column joins, filtering, sorting, grouping, XML reading, executing SPs, able to accept parameters and variables, data type conversion.
  4. Collaboration: different users see different data depending on their permissions, users can copy and modify dashboards, saves and share their version within their team, collaborate and share comments, single sign on, integration with SharePoint, templates, export/import, printing, alert.
  5. Architecture: rendering engine, web-based platform (thin client not Windows-based app), mobile access, web server, database storage, API, custom extension, flexibility.

Dashboards are used by 3 groups of users, each using the dashboards in different ways:

  1. Administrators: install the software, create data sources, create user groups, manage user permissions, manage licences, upgrade and patching.
  2. Developers: create KPIs, charts, join data tables, feed to controls and compose them into dashboards.
  3. Business users: view the dashboards, modify and save them, share their comments within their teams.

In the next 5 sections I will describe how iDashboards is doing with the 5 dashboard features above: visualisation, development environment, data sourcing, collaboration, and architecture.

1. Visualisation

iDashboards has a stunning visualisation front end. If you download and install the evaluation copy, you will be able to see various controls and images such as thermometer-like controls, speedometer-like (gauges), sliders, bar charts and line charts, maps, clocks, cones, KPIs, bubble charts, month-chooser, calendar controls, videos, sliders, colourful pie charts and ring charts, pyramids, butterfly charts, calculators, data tables, buttons, dropdown lists, rolling marques, Gantt chart, heatmaps, count down timer, images, videos, two-axis charts, traffic lights, texts and labels.

The interactivity is basic but useful, i.e. filter charts based on input values (parameters), choosing an specific data in one control limits the data in other controls, we can drilldown and go back up, when we click on a chart it drilled own to lower level chart or data tables, if we hover the mouse it explains the numbers, we can choose the value which act as a parameter using a dropdownlist, calendar input or selecting a particular section of a chart which will automatically select the corresponding part in another chart.

I said basic but there is not much more interactivity you can expect to get from a dashboard. So from that angle I think it is quite good. Plus we still have the following interactivity features: we can add/open bookmark, enter dashboard parameters, print the dashboard, change language, change skin colour, create alerts, receive alerts by email, change zoom levels, import and export to Excel.

2. Development Environment

I feel that iDashboards is designed with the business users as the developers. It is not difficult for business users to create a dashboard. It is almost as easy as creating charts in Excel: choose a layout, create charts, customize a little bit and we are done. Unfortunately the development environment has a non-intuitive user interface. For example, to discard a dashboard, we click on the left arrow, and to return to the application we click on the right arrow. Another example: to set the thickness of the border, we use a slider. It is not the standard Windows interface like what we find in Excel or PowerPoint. Throughout the development process I don’t feel comfortable with the user interface.

Putting a chart into a dashboard is quite easy, i.e. using drag-and-drop. We can read data from the output of a stored procedure, Excel file, text file, or database tables and views. We can create various charts, set the title size, colours, number formatting and margins. We can define legends and define drilldown to a different chart, different dashboard or a URL. We can create hierarchy for drilldown, we can pass a parameter when drilling down, and we can create macros.

Users can create reports based on the dashboard. The report can be in HTML format or in PDF. Reports contain the data snapshot which was in a particular dashboard at a particular point in time. Again, this “create report” feature seems to be intended for the user community, rather than IT developer. User can create reports for a particular chart, or the whole dashboard. This feature is very basic, almost like printing facility (to a PDF).

We can build custom application and place it into our dashboard. It is called VIPanel, stands for Visually Interactive Panel. In iDashboards Support’s website, developers can download VIPanel applications, along with their documentation. But VIPanel does not work on iOS (understandably as it’s iDashboards is built on Flash) and may not work on Android.

Unfortunately there is no facility for source control (check in, check out, version control), a feature which is important if more than 1 developers/users are working on the same dashboard. But there is a feature for migrating a dashboard from Dev/UAT environment to Prod environment and vice versa. We can export any chart or dashboard, from one environment to another.

3. Data Sourcing

iDashboards read RDBMS tables via ODBC or JDBC. It also read Excel. It can read SQL Server using native driver (version 6.5 to 2014), but the UI is not user friendly, i.e. we need to type server name, database name, instance name, port number, and table name pattern. It is not obvious that iDashboards can join multiple data sources. It seems (very vaguely) that using the “Custom SQL” data source we can join multiple tables, but it is not obvious nor certain. But as a work around, joining tables is not possible, we can create a view on the RDBMS. We can (definitely) use stored procedure as a data source, and when doing so we can supply parameters. We can create a custom drop down list for the users to choose as a parameter.

iDashboards has 3 data types: string, number and datetime. Data from RDBMS and spreadsheets are mapped into there 3 types. I don’t see that iDashboards can read XML file, with or without XSD/schema. It can not ready Analysis Services cube, or any other OLAP cubes. It can only read tables and files. There is a facility to pivot data, which I found very useful when I was developing dashboards using Spotfire.

User can create calculations, for example: colum1 / column2. The functions are very basic. Only a few functions are available: min, max, avg, total, median, standard deviation. iDashboards also have if statement to enable branching. We cannot filter on a calculated column.  We can read text file, or stared procedure. iDashboards seems to have quite an extensive range of date function (they are called “date macros”), from “current month” to “beginning of current month + 3, MMMM d, yyyy” which means if today si 15th July 2014 it is “October 1, 2014” (3 months from now, the first day).

4. User Collaboration

iDashboards’ security is Single Sign On (SSO) friendly, meaning that users don’t have to login to iDashboards. When users access the dashboard, the dashboard server automatically recognise who the users are, and whether they are allowed to access that dashboard or not. “who can access what” are stored in iDashboards database in SQL Server (or Oracle, DB2, Sybase). This database is called the “repository”. Different users see different data depending on their permissions. Depending on their roles, users can copy and modify dashboards, saves and share their version with their colleagues. A dashboard consists of several charts. Charts and dashboards are grouped, and the users are grouped, then the permissioning assign which user groups can access which dashboard groups.

In iDashboards we have a function to get the current user, which is very useful pass into RDBMS to get “different data for different users”. The “Filter on user” facility is more direct: we set the chart itself to present different data to different users.

In iDashboards, users can not write comments about a particular chart within a dashboard, and collaborate using these comments. There is no integration with Sharepoint, i.e. the permissioning model does not inherit Sharepoint security, such as folders and sites, so users can’t collaborate the sharepoint way, i.e. via webparts. Still about collaboration, in iDashboards we don’t have “corporate templates”, i.e. where all standard styling are stored to ensure that they adhere corporate standard. All standard appearance and stayles need to be implemented manually on each dashboard.

Excel is always a big topic in the user community when it comes to collaboration. “Can I export my dashboard to Excel?” is the most frequent question every BI user ask (because they want to share the data with their colleagues using Excel, merged with other data that they have). “Can I import my Excel file into my dashboard” is the 2nd most popular question. In iDashboards the answer is Yes to both questions. A dropdown list in a dashboard can be populated from a table in Excel, and the chart data can be sourced from a table in Excel. We can right click on any chart, and export the chart data to Excel. It is not native Excel format, but CSV, which can be opened in Excel.

Printing: users can print a single chart or the entire dashboard. Most dashboard software are web based and we can’t rely on IE printing feature, because this will include the menu bar of the browser. So an important question to ask is: “Does the print feature prints just the dashboard/chart, or the browser window?” The answer is just the chart/dashboard, not the browser window. iDashboards has “Print Dashboard” button which prints just the dashboard.

5. Architecture

iDashboards, like all other good dashboard software, consists of 4 components: a repository/database, a development environment, a dashboard engine/server. On the server side, we have the repository (which is on SQL Server or Oracle), the dashboard server (which is on Java web server such as Tomcat or Websphere), and connections to all the source systems. On the client side, we have the users PC, and in this PC we have a browser (IE, Chrome, Firefox), and in this browser we have the Flash, and within this Flash we have the dashboards. On the admin PC, we have the admin application, which is an HTML application.

It needs a browser with a Macromedia Flash add-on installed. This is not a problem on PC, but it is a problem for tablets and mobile device. Flash gives iDashboards an edge with regards to appearance, but it does limit the platform it can operate on. There are mobile/tablets with Flash, but it’s limited.

Conclusion

As always there are always pluses and minuses. The top 3 strengths of iDashboards are: visualisation richness, self service (easy to use by business users), and I believe the cost as well. Its top 3 weaknesses are data integration capability (ability to combine data and perform calculation), development environment which looks outdated, it is too simple/basic.

Would I recommend it? Yes, if your requirements simple / not too complex and if self service is very important to your business users (they want to create the dashboard themselves). No if you need to read cubes, or perform complex calculations.

Beside the 2+3 I mentioned at the beginning of this article, there are also 5 other dashboard software which have strong visualisation ability: MicroStategy, Birst, Roambi, Klipfolio, Inetsoft. Outside these ten, if you want a comprehensive list of dashboard software and you would like to evaluate them yourself, they are: Prognoz, Targit, Actuate BIRT, LogiAnalytics, Cognos Insight, Spotfire, Datameer, Information Builders, Cognos Express, PowerView, SiSense, BO Exelcius, SAS, Panorama Novaview, Jaspersoft, Salient, Board, Strategy Companion, BO Webi, CubePlayer, XLCubed, BO Voyager, PowerOLAP, Bitam, VisualMining, PureShare, Varigence Vivid, dminebi, Arcplan, Gooddata, InforSense, Pentaho, Alteryx, Zoho, Domo, Xtraction, Target Dashboard, Cartodb. Some of them are not just a dashboard software, but a BI software which has a lot of other capabilities apart from just creating dashboards. Most of their websites can be found by googling “Dashboard + their name”, but if you need their URLs please let me know. A good source for finding dashboard software are Dashboard Insight and Dashboard Spy.

15 February 2014

Building a Data Warehouse and BI on Low Budget

Filed under: Business Intelligence,Data Warehousing — Vincent Rainardi @ 11:34 am

The core activities in building a data warehouse (DW) and BI are:

  1. Design the dimensional DW (fact and dimension tables)
  2. Identify the data source for each column in the DW
  3. Build the ETL to populate the DW
  4. Build a cube on top of the DW

A lot of companies want to know how to build a DWBI on small budget. A medium size DWBI perhaps cost about £400k to build: £100k on hardware, £100k on software, £200k on labour. Many companies don’t have £400k. They want to spend £25k for POC, then another £75k to make the real one. Some even only have £50k.

Building a DW + BI with £400k

The £400k DW is as follows. The £100k hardware is probably something like this:

  • 4 DW servers 32 GB RAM (Dev, UAT, Staging/DR, Production), 6 cores. Blade or rack with cabinet. Dev & UAT could be 16 GB 4 cores, or VM.
  • 5-10 TB disk storage (1-2 TB per environment), RAID 5, 146 GB 15k rpm, e.g. NAS or SAN including enclosure.
  • 2 ETL servers 16 GB RAM, 4 cores (1 for DEV+UAT, 1 for Production + Staging/DR)
  • 2 Reporting & OLAP servers 16 GB RAM

The £100k software is to purchase ETL and BI software, such as Informatica, QlikView or Cognos, and the database software, such as SQL Server or Oracle. Also OS e.g. Windows Server 2012, Spotlite and VMWare, team collaboration tools e.g. TFS, and other software such as JIRA, Beyond Compare, Visio, Office, MS Project, SnagIt and RedGate.

The £200k labour is for paying 5-6 FTE for 1 year: 1 DW architect & data modeller, 1 BA, 1 ETL Developer, 1 Report Developer, 1 tester, ½ PM, ½ Infra.

Building a DW and BI with £50k

To get this down to £50k we need to minimize all 3 fronts: we need to use only 2 servers. One for Dev + UAT, 1 for Prod. The server will be installed with everything: ETL, BI software and RDBMS. We won’t have DR. This could get the H/W and S/W cost down to 20k including storage and OS.

The remaining £30k is to pay 1 FTE for 1 year who does everything from requirement analysis, design, build, test, manage. From data models, create databases and tables on the database server, stored procedure, ETL packages, reports, and analytic cubes. It will be difficult to find this 1 person because the variety of skills required probably won’t exist in 1 person.

With 1 server functioning as both RDBMS and ETL and Reporting and Analytics, it means that we can only have a small DWBI system. Say 1 TB total net storage, which means that the DW size is probably about 100 GB max, including Staging tables.

We may able to spend £3k on software, but not more than that. SQL Server is about £3k for 2012 Standard edition for 10 users (link). And that includes RDMBS, Reporting and Analytics. Oracle BI Standard Edition is $460 (about £345 including VAT, link) per named user, so 10 users is about £3450. You would probably need to consider open source BI and databases such as MySQL, JasperSoft and Pentaho

I think it is possible to build a DW with £50k, but it has to be a small, simple DW, so that a) the small infra can host it, and b) a single developer can build it within 1 year. Simple, i.e. type 1 dimensions, no accumulative snapshot fact table (only transactional or periodic)

Building a DW and BI with £100k

With £100k we can definitely buy Microsoft BI (£3k for 10 users), or Oracle BI (£3.5k for 10 users), but not Cognos or Informatica. Cognos 10 BI for 10 users is about £30k (link). This is just Analysis Studio + Report Studio (assuming the same price) and 1 Admin pack, including VAT. BO Edge is about £27k (link). That’s for 5 users and 1 report developer. Informatica PowerCenter starts at $140k (link).

The allocation of £100k is probably something like: £30k hardware, £10k software, £60k labour. To me this sounds right for a small company. The £60k is for 2 developers for a year, with 1 slightly more senior (say £25k – £35k split). With 2 developers we can be built a lot more. The first version can be release after 6 months, and we have another 6 months for enhancing it to the users’ needs.

The £30k hardware will buy us 2 servers with ok-ish spec (something like 4 cores, 32 GB) with good amount of storage, probably about 5 TB. We won’t be able to effort SAN, but we can effort RAID 5 on DAS.

Building a DW and BI with less than £50k

If the budget is below £50k, or the time frame is less than 6 months, I would recommend not to build a proper data warehouse. We can still deliver BI without having a DW. One way is using data virtualisation such as Composite. Another way is using QlikView, Tableau, or Spotfire which reads the data sources directly. From cost, the latter is lower than the former.

You can attempt to build a DW (say MS BI) but not proper Kimball DW. I call this Pseudo Dimensional DW. It is fact and dim tables, but no surrogate key. The dimension tables have natural key as their PK. This is a lot simpler and faster to build than a proper Kimball DW. For a small scale (5 dims + 2 facts), instead of taking 3 months, a Pseudo DW could be done in 1 month.

8 November 2013

The 5 Stages of DWBI Journey

Filed under: Business Intelligence — Vincent Rainardi @ 7:39 am
Tags:

During my years working in MS BI, I saw that the companies’ journeys in DWBI, particularly in MS BI, can be grouped into a few stages. I’m going to write that view, which I hope can be useful.

Stage 1: Report Developer (Cost: £25k/year. Duration: 2-3 years)

In most companies, the “BI Team” usually started with just 1 report developer. The business asked IT to produce report from “the database” (i.e. the back end database of the business application), and this report developer sat down with the business to “define” the requirements and write the report, say using SSRS, Jasper or Crystal. At this time they were not called BI. The word “BI” was never heard anywhere in the company. It was “Report”, not BI, not even MI. There were probably 5 people in IT (out of 100 employees, annual revenue of £5m) and 90% of the IT business was about desktop support, i.e. Exchange/email, file server, network, SQL server. And this report business was only a part time job. Then the business asked for the second report. And then the third. An the report developer became busier and busier. If the first report took him 3 weeks. The 5th report only took him 3 days. He became conversant with the tool, and with the data structure in the database (of the main business application, e.g. EPOS). No formal report specification was written. Just rough sketches of the report layout when sitting down with the business users.

How much? Annual spending at this stage was probably in the region of £20-25k. £14k for half FTE (Full Time Employee) for the junior developer (cause he’s doing desktop support too), plus £6k for training, PC, office, expenses and pension. No cost for software. For SQL Server shops, the RS came with SQL. For MySQL shop Jasper is free. May be £500 for Crystal, for the Oracle shops. I don’t like the perception that RS is free, so let’s calculate it. Say RS is 25% of SQL Server (the other 75% is Relational DB, IS, AS). And SQL Server is say £15k (server hardware + licences). So RS is £4k (25% of £15k). So the annual spending is 14k + 6k + 4k = £24k. In US the numbers are probably the same, but in $, i.e. $24k. How long? A typical company would probably spend about 2-3 years at this stage. Depends on the business growth of course. And the industry sector. A law firm would be quite different from an asset management firm. The former is not IT intensive whereas the latter is IT intensive.

Stage 2: More Reports (Cost: £150k/year. Duration: 1-2 years)

As the business grew, the business users would requested more and more reports from IT. This is especially true if the business application doesn’t have good reporting tool. If it can only report at transaction-by-transaction level, or customer-by-customer level, then the pressure to have reports developed by IT is bigger. If the business application doesn’t have the capability to report things at the company level, or provide “time series” information which shows how the company progresses from week to week, from month to month, then the pressure to create reports will be bigger. The number of reports produced by the IT department would probably exceeds 50 now. Probably even more than 80. Many divisions now are requesting reports. Not just the sales department, but also stock management and finance. And IT also need their own reports too, out of the helpdesk calls. The report development is no longer a part time job. It is a full time job now. Probably even grow from 0.5 person to 2 persons. Number of IT staff has grown from 5 to 15. The company size is probably about 200 employees. Or even 400 employees, depending on which industry sector. The annual revenue is probably £30m now, with total IT spend of £0.5m. Again depending on which industry. They don’t just use 1 business application, but probably 4-5 applications, most of them are off-the-shelf. The maintenance of these apps are outsourced. They have 1 or 2 full time report developers, and probably 1 or 2 full time application developer, most probably Web-.NET-SQL developers, or PHP-Java-PL/SQL.

I’m going to give an example in MS BI now, but it’s probably the similar situation in other tech stack. The report developer is now called MI Report developer. He/she uses SSRS to develop the reports, and host them in SharePoint. There are a few reports which are scheduled and emailed out. Reports provide values to the business. The quantity of the reports and the workload are both manageable. The business loves IT department because they can get the information out of these 4-5 business application, and get this valuable information to the business. It took 2-3 days to produce a new report, but sometimes 2 weeks because there are a queue. Prioritisation works quite well. Some of the reports beginning to be not used, particularly those which were developed 2 years ago. They are now superseded by newer reports. So they have initiative to archive reports which are not used. They monitor the usage of the reports. The IT manager want to make sure that the reports are worth the time invested to create them. These usage report is a worthwhile tool for the IT manager to put forward good arguments to those business users who ordered reports but not using them actively.

How much? The cost of MI is probably about £150k now. The salary of 2 report developers = 2 x £28k = £56k. May be 3 developers = £84k. Plus 3 x £6k for expenses & overhead (PC, office, training, pension) = 18k + 84k = £102k. Now IT has dedicated SQL Server for MI (£15k for H/W + S/W). Probably 2 servers (£30k). Plus some SharePoint licences to host the RS reports (say £10k). Probably plus £5k for some consultant fee to setup/configure RS/SharePoint. So the annual spending is 102k + 30k + 10k + 5k = £147k. How long? This stage is surprisingly fast. Probably only 1-2 year, at which point the company then will experience stage 3: too many reports.

Stage 3: Too Many Reports – Self Service BI (Cost: £150k/year + £50k one-off. Duration 1 year)

By the time the number of report development workload achieved 3 FTEs (Full Time Employees), or 150k per year, the IT manager is very much interested to the idea of Self Service BI. The idea of creating a data warehouse and cubes on top of it becomes a good business proposition from the cost point of view. The users will be able to use Excel to query the cubes. They may consider the idea of buying BI software like BO, Cognos, MicroStrategy, QlikView but the cost is probably becomes a barrier. It’s incredibly difficult to get the prices (if you are not buying) but I’ll take a stab, otherwise we won’t have an idea about the cost. If I’m way off, I’ll be glad if the corresponding vendors can correct me. QlikView is probably about £150k for 100 users, i.e. £1500 per named user including document licence. Plus £50k for live & test server licences. Plus £10k for SharePoint component + workbench. Plus £10k for server/infra cost. Plus training £10k for training, £20k for consulting services. So total of 150k + 50k + 10k + 10k + 10k + 20k = £250k one off cost. Plus 20% annual cost for support (£50k). BO is £35k per 25 CAL, x4 for 100 users = £140k. Plus 3 x £400 = £1200 for developer licences. Say £50k for training, consultant, servers. So £200k one off, plus probably 20% = £40k annually. Cognos and MicroStrategy is probably similar, i.e. in the region of £200-250k one off cost, plus £40-50k annually, for 100 business users. So it’s a no-goer if the Stage 2 cost is £150/year, because these developer cost won’t go away, but increase as they have to hire a QlikView/BO/Cognos developer, or retrain existing staff.

Hence the idea of doing self service using a data warehouse + SSAS cubes starting to take off, in the head of the IT manager. The numbers look good: SSAS is free. At least that was the initial perception. SharePoint is already used for SSRS, and the Excel is widely deployed to each desktop, so no additional cost then! And this Data Warehousing stuff will only cost 1 new server, with a big storage, so £20k. Say 2 new servers, plus storage = £40k. Much more palatable than the £250k of the popular off-the-shelf BI software above. There is a training cost, i.e. SSAS, SSIS, Modelling, say £10k. This initiative is double-acting as a “retainer” too, from HR point of view. Good people stay because they can develop their skills. Getting them ready to go to the next level in their career. And the whole DW OLAP thing is a new world for them. Of course they’d stay. Eagerly. At this point they learn about dimensional modelling, about ETL, and about data quality. They learn about metadata, about the technicality of building cubes. After struggling about 6 months, in their spare time, they managed to construct 1-2 fact tables and a few dimensions in the data warehouse, and built a cube on top of it. The demo was a great success, the business absolutely loved it. I’ve been attending “first business demo” (on SSAS) for about 8 times now and I haven’t experienced a single event where the business is not excited seeing their numbers being sliced and diced in front of them, turned into time series charts and gadgets.

Now, with all data being delivered via the cubes to them, the business can do self service BI. What they loved in the first few weeks was that they don’t need to wait 3 weeks for IT to create a report for them. They can directly get the information they need themselves. And what the IT manager loves in the first few months is that the number of report requests was down a bit. The number of reports each quarter was something like: 100, 120, 150, 200, 160. They managed to knock some reports of the list (in this example 40) because the information need can be satisfied/fulfilled from the cubes, i.e. self service BI.

At this point they would also consider 2 kinds of software which helps the development of the data warehouse: a one click warehouse builder (such as RED and Kalido), which can build a data warehouse very quickly, and a data virtualisation software (such as Composite), which can deliver BI without the pain of building a data warehouse. So they tried out 1 or 2 of these software. They would hope that these software could enlighten their workload. A few companies might switch and abandon the MS BI, but majority would stay with MS BI and only use tools that complement to MS BI.

How much and how long? The one off cost is about £50k. £40k for servers and £10k for training. The annual spend is the same as Stage 2, £150k/year (£100 for 3 people, £50 for infra). This is for 100 business users. How long? Usually companies stay at this stage only for a short period, i.e. 1 year. This is the period when the initial data warehouse is constructed, and the cubes developed. After the initial version is deployed, the IT Department will have many requests to put more and more data into the DW and cubes. And that’s Stage 4.

Stage 4: Growth of DW and Cubes (Cost: £200k/year. Duration: 2-3 years)

The were many requests to put various attributes and measure into the cubes and warehouse. Can you put the Sales Person into the cube please? Oh and supplier please. And the store manager names please? There will be a change of product grouping next month, can you accommodate that please? We have a new price list coming this quarter, that needs to go into the warehouse. Can we have the employee hierarchy in the cube? You get the picture. Lots of changes and new requests to add more data into the warehouse and the cubes.

The danger is, the “BI Team” is still new to DW. They just learned yesterday what a dimension is, what a fact table is, and what grain is. They are inexperienced in data modelling. And if they got it wrong, it cost a lot of working days to correct it. Most probably weeks, not days. They read Kimball’s data modelling book, and expect to see a design that suitable for their case, but there are a lot of specific points which are not mentioned in the book. They tried to browse the internet to get information about data modelling, but got even more confused. And they need even more guidance on the ETL side. They are clueless how to do SCD type 2. They hope that SSIS SCD2 task will solve it for them, and it did to some degree, but alas, the population of the fact tables now becomes an issue. Plus the performance is an issue, and the metadata is an issue. At this point, they seriously thinking to hire a data architect. Someone who have been designing and building data warehouses and cubes in different companies. Someone who knows how the ETL should be built. So they hire a data architect, say £50k/year. He knows about data modelling, ETL, and SSAS. And they hire an ETL developer too. An SSIS + T-SQL guy who has some exposure to cubes. This guy has a lot of experience moving data from 1 place to another. Data integration experience. 1 of the 3 report developer now becomes an ETL developer. They are now called a BI Team, with 5 people: 2 report developers, 2 ETL developers, 1 data architect who also does SSAS.

Some of the SSRS reports are now decommissioned. They managed to get the reports down. Say from 200 to 100. Some of these reports are re-pointed to the data warehouse. Instead of reading from a copy of the business application database, the SSRS reports read from the data warehouse tables. Initially the DW needs to be reconciles with a copy of the business application (of course!), but as time moves on, there are measures which are created in the warehouse which don’t exist in the live app, for example: ranking measures such as decile and quartile, and various ratios between original measures such as financial ratios.

Business users enjoys the benefit of having a lot of data at their finger tips. They can have the information instantly, without waiting for 3 weeks like before, for the reports to be created by the report writer.

How much and how long? Surprisingly, instead of decreasing, the cost of BI is increasing. It’s now 5 people instead of 3, so 4 x £28k + 1 x £50k = £162k. Plus £40k of annual infra cost such as additional capacity, replacement servers, etc. So about £200k/year. How long? A typical period is 2-3 years, after which the business users become dependant of BI. The data coming from the DW via cubes and reports really drive the business forward.

Stage 5: Users Became Dependant of BI (Cost: £50k one off + 10k/year. Duration: 3-4 years)

One by one, each business area uses the new technology (DW & cubes). Sales department, manufacturing department, finance department, marketing department, even HR department. In insurance industry: underwriting, claim, reinsurance, actuarial, finance. In utility industry: sales, customer service, operation, finance. The information from BI affects the business decision in each area: premium at renewal, pricing policies, CRM strategy, product design, marketing strategy, production cost control, etc. Even more requests to put more and more data into the warehouse and cubes and reports.

The companies at stage 5 usually looked for BI tools. Visualisation tools. And data warehouse tools such as DQ and ETL. They want to make the lives of their users easier with BI and visualisation tools. They want to make their own lives (IT) easier, using various development tools. As development work grows, they need to tidy up on the governance, version control & code repository, request & bug management, release management and data quality. Software like Red Gate, SQL Sentry, JIRA, TFS and would come into their consideration horizon. Red Gate & SQL Sentry is SQL Server development tools (and server monitoring). JIRA is request and bug management tool. TFS is code repository and version control tool. It also does task management such as Sprint planning.

With regards to data visualisation, software like Cube Player, Power BI, Strategy Companion, Panorama, XLCubed, Pyramid, Verigence Vivid, Bonavista Dimensions and Tableau would come into the picture of any MS BI shop. Out of about 15-20 SSAS client out there, most companies only know 1 or 2. Probably 30 actually (I mean SSAS client), complete list is here at Vidas Matelis’ ssas-info website, the most content intensive website on SSAS. At this stage, companies are trying out these SSAS client, and eventually hooked with 1 of them. I can’t recommend which one (even though I have personal preference), hmm… probably good to write about SSAS clients actually. I just browsed for them and very little is written about their comparison. Most notably ones are here, here and here.

Apart from SSAS clients, companies also frequently thought about reporting tool, i.e. what’s better than RS? If their budget is big enough, they will probably be approached by (or think about) QlikView and Spotfire. If their budget is minimal they probably think about Jasper and Pentaho. QlikView, Spotfire, Hana, TM1, PowerPivot are all “in memory OLAP”. Unlike SSAS which stores the Multidim structure on disk, these tools read relational tables, and create the multidimensional structure in memory. Much like MicroStrategy. TM1 is not like that, it stores the proprietary Multidim structure as files. Ditto QlikView.

How much and how long? It depends on the tools they select and the industry they are in. For retail, telco, shipment, manufacturing, legal, mining, transport, utilities, it is about 2-3 years. For banks, fund managers, healthcare, insurance, probably 3-4 years. Could be 5. For costing, let’s use Tableau. The cost of implementing Tableau on top of DW + SSAS is surprisingly low: 50k one off + 10k/year. Developers uses Desktop Pro which is $1999 + $199/year maintenance (source: link). Business users will become report creators, they uses Tableau Online Licences which is $500/year. And report reader is free. So if we have 3 developers and 15 report creators the cost is about (assuming the price is same in the UK but in £): 3 x £2k = £6k one off + annual cost of 3x£200 + (15+3) x £500 = £9600/year. Plus £40k for training, servers and consultant it’s around £50k one off project cost and £10k/year on-going cost.

After

What’s after stage 5? BI is stable, matured and managed well. Operation are governed. What usually happened at this stage is a complete change of technology. For example: implementation of Cognos or Business Objects. Yes, changing all the Microsoft BI stack with Cognos or BO stack, leaving just the SQL Server DB engine, which could be replaced with Oracle too. The other approach is to stay in Microsoft BI path, and look at a few things:

a. Parallel Database

For high volume industry such as retail and telecom, as the business grow, the number of transactions per day could be in the million. This requires some kind of MPP (Massively Parallel Processing), meaning parallel database engine. Either Netezza, PDW (Parallel Data Warehouse, Microsoft new MPP), Exadata, Greenplum or Teradata. Or column oriented database such as Vertica, Sybase IQ.

b. Data mining
For certain industries, CRM and marketing is vital, particularly in the retail sectors. They could look into integrating CRM and marketing campaign management with their data warehouse. This requires data mining and forecasting, which can be done in SSAS, but some companies uses SAS, SPSS, BIRT or even manually using .Net or R.

c. Big Data

This is where most company after stage 5 are going. At least trying. Map Reduce, Hadoop, Mongo DB, Hive, and things like that. The project cost is not high, because it’s common hardware and the software is mostly free. But the benefit is potentially big. Hence the motivation.

d. Text Mining

For certain industries such as healthcare, insurance and retail, the proposal of able to crawl over millions of patents documents, emails correspondence with customers, and thousands of policies, have business good values. Software like Clarabridge, Angoss, Attensity, NetOwl can be implemented for this purpose (mining text within documents or emails for information).

For IT managers, I hope this helps broadening the DWBI horizon, whatever stage you are in. For vendors, I’d be grateful if you could advise if I’m wrong, particularly about the cost. For developers, I hope this article could clarify a bit about the future of your career path, if you stay within DW BI world. There are many experts out there and I’m not even an expert so I would be grateful for your comments.

VR, 9/11/14:

  • Andrei Padrei on QlikView and Tableau prices: link
  • Tableau prices: link

17 December 2012

Composite

Filed under: Business Intelligence — Vincent Rainardi @ 8:19 pm

VR, Update 11/10/2015: Composite is now formally called Cisco Data Virtualisation Suite

Composite enables us to integrate data from various sources to provide BI. It is called data virtualisation (DV). It’s the exact opposite of a data warehouse (DW), where we collect data from various sources and store it in a new data store.

The biggest advantage of using Composite is its speed to market. Without building a data warehouse, we can provide the BI in 2-3 months, as opposed to 12-18 months. We don’t need to design the DW (3-6 months), we don’t need to design the ETL (6-9 months), and we don’t need to spend a lot of time testing the ETL (2-3 months).

As a DV tool, Composite is very good. It has it’s own SQL-like language. It has many functions. It has a adapter to most RDBMS as well as Hadoop and Teradata. Also flat files and Excel. We can create views and procedures which integrate many sources. We can cache them, so that the BI tool can experience fast query performance. The cache can be a database table, or we can also cache it in Hadoop, so that the query is super fast.

Rather caching the data, we can also build loading procs to retrieve data from the source system incrementally. This way, we lighten the burden to the source systems. We can also setup various triggers to invoke those loading procedures either on timer basis or event-based. And we can create a “batch” procedure which calls the loading procedures one-by-one so there’s no “waiting and delay” which happens if we arrange them on timer basis.

On the publication side, there is a facility to publish the views and procedures that we created as web services. These can then be consumed by BI tools, for example by Spotfire Information Designer.

That’s the strong points. What’s the weakness? The scripting language is primitive. Compared to Java, C# or even PL/SQL there are a lot of gap to close. Data manipulation functions are poor. In this area, Informatica functionality is far more superior. Programming environment is poor. It’s like going back to 1992, no debugging, no break point, no watch.

The database function is also weak. For example, we can’t truncate a partition. Something that is very basic in data loading. Data inspection (and DQ in general) is also weak. We can’t easily generate the statistics of the source data, e.g. distribution of values.

Would I recommend Composite for BI projects? Yes of course. Because it provides an excellent easy-to-use tool to integrate data from many places, to be consumed by the reporting/BI tool. It can cut development time significantly, as in 3 months instead of 12 months.

Disclosure: I receive no financial benefit from any company saying/writing this. The opinion is entirely mine, and it doesn’t necessarily reflect opinion of my employer or my clients. I have used both Informatica and Composite, hands on, as well as SSIS, BO, Oracle, Spotfire. As always I may be wrong and I’m open to suggestion. If I said something incorrect above, I would be happy to be corrected.

Vincent Rainardi, 17/12/2012

Next Page »

Blog at WordPress.com.