In some cases, several fact tables need to be joined, for the purpose of reporting. Cubes join different fact tables, but reports can’t do that.
There are 2 approaches to do this:
- Create a “fact table bridge” table. The bridge table contains the fact key columns of each fact table, plus any filtering columns required.
- Create a new fact table containing all the required measures at the combined grain.
For approach a) some considerations are:
- The fact tables have different grains (if they have the same grain they should be merged). If there are 2 fact tables, it is possible that fact table 1 has 1 row but fact table 2 has 2 rows.
- The fact tables need to have a fact key column, which is a unique row identifier. This column needs to be indexed to support the join operation.
- During DW loading, the bridge table needs to be populated after both fact tables are populated.
- If this approach is taken, the downstream reporting system needs to do daily incremental extraction from this bridge table + fact tables, and persist the joined-up fact data in a wide fact table.
For approach b) some considerations are:
- It will require extra disk space as fact data is duplicated.
- Report will have shorter query time because there is no need to join any tables.
- It is possible to create a calculated measure involving facts from different fact table and persist it in the combined fact table.
For example, in retail, we have order fact table and delivery fact table. Order fact table contains the customer order, with the grain of 1 row for each order line. Delivery fact table contains the delivery details. One order can be sent in multiple deliveries, and several orders can be sent in one delivery.
To get all the cost information of an order, we need to join these 2 fact tables. We group by order ID, and sum up the cost of goods sales column (COGS), and the delivery cost column (DC). Joining the 2 fact tables is a lot easier because we have this bridge table, which contains the row identifier for both fact table (the fact key column).
Example of a calculation involving the two fact table is the total cost of an order, which is the sum of COGS and DC. Another example is the delivery lead times, defined as the elapsed days from the order date to the delivery date. Note: it’s the first order date and the last delivery date for that order. Meaning that, if an order is sent in multiple deliveries, then we take the last delivery date. If several orders are sent in one delivery, then we take the first order date.
The bridge table contains the fact key from both fact tables, i.e. OrderFactKey and DeliveryFactKey. The bridge table also contains several filtering columns, such as: OrderDateKey, DeliveryDateKey, CustomerKey, ProductKey, Order Number. These filtering columns allow the reports to filter the big bridge table (could be 500 million rows) to just rows for a particular date or a particular order.
The filtering columns need to be indexed. Alternatively, if all fact tables are periodic snapshot fact tables, the bridge table can be partitioned on the SnapshotDateKey. Giving the bridge table a PK is optional, but the advantage is to be able to uniquely identify a row.