Data Warehousing and Data Science

15 April 2018

BI vs Data Science

Filed under: Business Intelligence,Data Science,Machine Learning — Vincent Rainardi @ 5:57 am

Last night there was a talk about Business Intelligence (BI) vs Data Science at Dataiku London. Unfortunately I was not able to attend but that event inspired me to write about this topic as it has always been an interesting topic for me. And I think it is very fitting to this blog as it is about Data Warehousing and BI, and recently I added about Data Science too.

Definition

Before we go too much further let’s remind ourselves what BI and Data Science are. Data Science is about scientific approaches to manage and analyse data using statistics, machine learning and visualisation (link). BI (Business Intelligence) is also about managing and analysing data and visualisation, but using business approaches, not scientific approaches (link, link).

I have been creating business intelligence and analytics applications across many industries including travel, banking, pharmaceutical, manufacturing, insurance. Both with a data warehouse/mart, and without. They are used for analysing the business, for reporting, and for dashboarding.

I have done Big Data development for a few years, including Hadoop, Hive, Data Lake, Spark, graph and document databases. In the last 6 months I have been learning Machine Learning and Neural Network, using Jupyter, Matlab and R, and recently Tensorflow. I understand how they are used for visual recognition, predicting prices, network security, marketing and playing games.

This Article

I have written a few articles about “Choosing between Big Data and Data Warehousing” (link, link, link) which differentiate the back-end part of BI and Data Science. So in this article I would like to focus on the front-end part, i.e. how they are used.

One of the best approaches to compare the usages of BI vs Data Science is probably using use cases from a particular industry. So in a particular industry we compare what BI can do and what Data Science can do. For example, in insurance BI is used for analysing customers, earnings, risks, financials and claims, and for reporting (operational, management and regulatory reports). Whereas Data Science is used to forecast losses, income and profitability (both account and product), adjusting rates, classifying customers for renewal, and grouping potential customers.

If we dive into other industry, say travel, we will understand the differences between BI and Data Science a bit more. The more industries we study, the more we understand the differences of how BI and Data Science are used. One is using business approach, and the other is using scientific approach.

With the above background I think we are ready to dive into core of this article, so let’s start with the first industry: insurance. I will then do one more industry: travel. I am a beginner in data science and particularly in machine learning. Suggestions and advice from readers will be most welcome, with regards to both about the approach and the content at vrainardi@gmail.com or via the comment box below. Once again in this article I’m not discussing the how or technical aspect, but the usage or business advantages, comparing the BI vs Data Science.

Insurance

General insurance is about insuring buildings, ships and businesses from risks like storms, sinking and frauds. There are about 20 classes in general insurance including property, aviation, marine, construction, oil & gas, professional indemnity, public liability and riots. It is a little bit different from life and pension business, and from retail business, i.e. home, car, health, pet and travel insurance.

I guess in this example we need to pick one line out of the above 3 main insurance businesses, which branches into many lines. Let us use car insurance for this example, because it is retail business so many people can relate to, and it’s easier to understand. It has many attributes which provide good illustration for the BI and Data Science usage.

The main usage of BI in car insurance is to analyse premium income and claim costs. These two numbers (and many of their derivatives, i.e. net, gross, subs, etc.) are analysed with regards to vehicle, time, office and customer attributes. Two of most common methods of this analysis is by using OLAP cubes and dashboards. The dashboards presents the most important findings such as income growth by vehicle types and profitability by customer groups, with guided drilldown along chosen pathways. The cubes provide a free-to-roam exploration platform, enabling drilling any numbers to the lowest details on any available pathways.

The business values of these activities are from knowing when and where exactly the revenues and costs are coming from and what factors influence them. This knowledge gives the business greater control and ability to grow the profitability across all business segments. Understanding claims and premiums profile for each type of vehicle (age, van/SUV, fuel, gearbox, engine size, parking, usage, milage) and each type policy holders / drivers (location, years of licence, profession, accident history, health, penalty points) enables the business to target the sales and marketing budgets on the more promising customers, as well as pricing the policy at the correct level. It also enables more precise filtering with regards to following up the leads from the Aggregates such as GoCompare and MoneySupermarket, and from brokers.

The Data Science is used to analyse customer churns, establishing rates, and analyse black box telematics with regards to risks and impact to premium levels. Standard actuarial rates give the business non-competitive advantage as they use standard factors such as driver age, engine size, mileage and parking location. Data Science enables insurers to factor-in new attributes such as driving behaviour, driving routes and driver’s social media, calculating the underwriting rates more specifically and more precisely for each policy. This enables the business to win new businesses, both commercial and retail, as quotes/pricing is the most important factor influencing new customers. Also, machine learning is used to improve the efficiency of online advertising, by predicting which how many users (and of which types) would click-through on different types of ads and by optimising bidding.

So BI and Data Science give different business advantages. It is not true that Data Science will replace BI in car insurance business, and in insurance in general. On the contrary Data Science will complement the business advantages that BI currently delivers.

Travel

Selling flights, hotel, holidays and cruises are the core businesses of travel companies. Car hire, travel insurance, excursions, airport transfer, airport parking, tours, restaurants, attractions, shows, villa, currency exchange and rail passes are the secondary businesses. They have retail and commercial business lines, both heavily utilising internet as a marketing platform because it reaches many countries and very cost efficient.

The BI is used for analysing web traffic such as funnels and conversion rates, revenue breakdown, customer segmentation, customer loyalty programs, and marketing campaigns such as email alerts and newsletters. Funnels enable the business to understand how much traffic filters trough each step of the purchasing process, and at which page each customer stops, as well as the sequence of pages they viewed and for how long each. This ultimately enables us to improve the website content and the products, resulting in higher conversion rates. Revenue breakdown is by product, by customer types, by country, by platform, by campaign, and by time. Marketing campaigns enable the business to drive more sales, with the content of each email/newsletter tailored differently to each individual customer. Each image on the newsletter is uniquely tagged enabling us to track which click from which customer triggered each sale.

The business value of these activities are: increased revenue, wider customer base, and increased customer loyalty. The revenue is higher because of higher conversion rates on web traffic and because the marketing campaigns drive more sales to the websites. Also because as a result of breakdown analysis we can focus our budget and time on the more profitable and promising product types and customer segments. The customer base is wider because market penetration from the campaigns. Customer loyalty increased because the offer is tailored specifically for each customer segment, sometimes down to each individual customer.

The Data Science can be used for customer segmentation, predicting campaign results, and analysing web traffic. It seems overlapping with the above BI activities, but if we dive into the detail levels they are actually complementing. By associating real time web traffic data and past purchase data using machine learning, travel companies can find out which customers have the best potential for purchasing products (using cookies to identify customers). The learned parameters are then used to display offers on the web pages the customers are currently on. The offers are specifically tailored to the highest potential that they are interested, based on their browsing behaviours (current and past sessions).

Predicting campaign results is done by first tracking back each sale to find its origin and contributing factors. The origin can be a click on an image on a particular email sent by a particular campaign. This can be identified by the tag on that image, which has the customer ID, campaign ID and product ID. The origin can also be a Google or social media advert from a particular campaign that the travel company runs, giving us the ID of the ad and possibly the user/audience ID. The contributing factors of a successful campaign can be the product, creative, layout, timing, pricing, audience and origin. Knowing which factors contribute the most to the success or failure of each campaign can be very revealing, from which we can then improve the subsequent campaigns. Various machine learning algorithms can be used for this including support vector machine, random forest and neural network.

Blog at WordPress.com.