What is it?
Hana is an in-memory database from SAP. It enables us to do lightning OLAP analytics from relational databases. We can access Hana data using SQL, MDX or from BO. Or from Cognos, Report Builder and PowerPivot. We can create calculated attributes and measures. We can create our own data model and hierarchies.
A must read technical introduction from SAP is here, the official web site home page is here, and the technical operation manual is here, the data modelling manual is here (very bad, no screenshots at all) and the Wikipedia page is here.
Actually Hana is not just an in-memory database. It is also an analytics appliance.
Cisco provides a Hana appliance on Intel Xeon 7500 (link).
HP provides a Hana appliance on ProLiant DL580 and DL980 (link).
IBM provides a Hana appliance on x3690 X5 and x3950 X5 (link).
Violin provides a Hana appliance on Intel Xeon 5600 (link).
HANA is also a cloud application. It is a PaaS, Platform as a Service. EMC and Cisco provides a Hana could app on Cisco UCS and EMC VNX 5300 (link).
Hana was released in Nov 2010. Version 3 was Nov 2011, and version 4 will be in Q2 2012.
How does it work?
Hana has a parallel calculation engine. Hana takes in a SQL query and run the query in parallel across many partitioned data sections in memory, and across many multicore processors.
Hana also has business functions to speed up calculations, for example currency conversion and calendar functions, as well as predictive analysis library.
Hana database is both row based and column based. If a calculation involved only a few columns, the data columns are stored in columnar store (compressed) in memory. Hana also support temporal tables which speed up queries for historical data.
The key to Hana speed is the first sentence I wrote above: in-memory database. Memory is 1000 times faster than disk, and these days we can have a server with 2 TB of memory.
But that is the story of the past. Everybody now store in memory. What Hana has that the crowd doesn’t is the technique to load data from memory into CPU cache. To partition a query and data, and load them into different CPU cache so that each CPU can process them in parallel. That’s what differentiates Hana from the others.
This is how Hana works: (from the must read technical introduction that I mentioned earlier, link, page 10)
Hana does parallel SQL execution and parallel aggregation. These parallel execution and aggregation happens on multiple threads. Each thread has a hash table which stores the aggregation result. These hash tables are then merged by the merged threads, as we can see below: (from the technical introduction doc, link, page 12)
What is it for?
How do we use Hana? What can we use it for? Using Hana, we don’t have to create summary tables in the data warehouse. This removes the basic need for a data warehouse. We need a DW to aggregate data fast. Now we can do it using Hana. This also eliminates the need for having OLAP cubes. We use OLAP cubes such as SSAS to aggregate the data fast. Now we can do it using Hana.
Still need a SQL interface though, which is a no go-er from the start. Who wants to type in the SQL queries? Definitely not the business users! Hence we need a BI tool, such as BO Universe, Cognos, SSRS Report Builder and PowerPivot. The major weakness of BO Universe, RS Report Builder, Cognos and other flexible reporting technology when operating on a DW is that it is slow. Because the DW is disk based. Now that we have Hana, an in-memory database which is both row-based and columnar (and temporal!), these flexible reporting tool will be incredibly fast.