# Data Warehousing and Machine Learning

## 22 March 2019

### Bridge table

Filed under: Data Warehousing — Vincent Rainardi @ 8:22 am

In dimensional modelling, a bridge table is a table which connects a fact table to a dimension, in order to bring the grain of the fact table down to the grain of the dimension.

The best way to learn the complexity of this bridge table is using an example, so let’s get down to it.

Driver Bridge Table

An example of a bridge table is the insurance claim fact table in the car insurance industry. The grain of this fact table is one row for each claim (this is different to claim payment fact table where the grain is one row for each claim payment). One of the dimension keys in this claim fact table is the policy_key. Another dimension key is the insured_driver_key.

In car insurance there is only one policy per claim so there is no problem with the policy key. There is usually one driver per claim, but there could be two drivers per claim. It is rare but occasionally there is third driver. I mean drivers covered by our policy, not the third party drivers. The second driver on the policy is called the named driver (ditto the third driver). In other company there could be a fourth driver, but for this company/case let us set that the maximum number of driver to three.

The function of the bridge table is to connect the claim fact table to the driver dimension. Because there could be more than one driver per claim, the surrogate key column is not called driver_key, but driver_group_key. The bridge table is called Bridge_Driver_Group, which has only two columns:

• driver_group_key: connects to the Insured_Driver_Group_Key column on the claim fact table
• driver_key: connects to the driver_key column on the driver dimension table

A Sample Claim

Now consider a claim where the claim fact table is like this:

Fact_Claim table:
1178|20100518|298|32839|88|16000|1200|14800|500|14300
3122|20180725|397|45578|88|32000|4000|28000|1000|27000
3123|20181108|267|45642|89|51000|3000|48000|1500|46500

Bridge_Driver_Group:
driver_group_key|driver_key
88|188
89|302
89|304

Dim_Driver:
driver_key|driver_name|date_of_birth|driving_licence_number
188|Chloe Dunn|1980-08-08|3824920
302|Peter Keller|1971-06-05|4503532
304|Samantha Keller|1970-12-07|4507651

The key question in car insurance claim is how much claim has a driver made, across all policy, across call insurer (insurers share claim data). This figure is used to determine the premium discount for the following year. The more recent a claim is the more weight it has to the next year premium. Chloe Dunn has 2 claims totalling £42800 (before excess). The first one was on 18th May 2010 and the second one was on 25th July 2018.

Type and Weight Columns on the Bridge Table

But Peter Keller shares a policy with his wife, Samantha Keller. We know that both of them has claimed £48000 but how much each is what we need to know.

To do this we add a weight column and driver_type on the bridge table. There is a business rule to determine the weight, for example: the main driver get 2/3 and the second driver get 1/3. If there are 3 drivers, the main get 50%, the second and third get 25%. The rule depends on the company. Each company has a different rule.

Bridge_Driver_Group:
driver_group_key|driver_key|driver_type|weight
88|188|single driver|1.000000
89|302|main driver|0.666666
89|304|second driver|0.333333
90|405|main driver|0.500000
90|406|second driver|0.250000
90|407|third driver|0.250000

Now we know how much claim is attributed to Peter Keller. It is 2/3 of £48,000 = £32,000.

The Group Dimension

Note that the bridge table above has many-to-many relationship with the fact table. Analysis Services doesn’t like many-to-many relationships. In Analysis Services modelling, a fact table can’t be linked to a bridge table like above. Instead, a fact table must be linked to a dimension table in a one-to-many relationship. This dimension table is called a Group dimension, like this:

Fact_Claim table: (same as before)
1178|20100518|298|32839|88|16000|1200|14800|500|14300
3122|20180725|397|45578|88|32000|4000|28000|1000|27000
3123|20181108|267|45642|89|51000|3000|48000|1500|46500

Dim_Driver_Group: (this is the Group dimension)
driver_group_key|driver_group_name
88|88
89|89

Bridge_Driver_Group: (same as before)
driver_group_key|driver_key|driver_type|weight
88|188|single driver|1.000000
89|302|main driver|0.666666
89|304|second driver|0.333333

Dim_Driver: (same as before)
driver_key|driver_name|date_of_birth|driving_licence_number
188|Chloe Dunn|1980-08-08|3824920
302|Peter Keller|1971-06-05|4503532
304|Samantha Keller|1970-12-07|4507651

Account and Diagnosis Bridge Tables

Another classic example of a bridge table is in retail banking. It is the bridge table from the account to customer. Another famous example by Ralph Kimball and Margy Ross is the multi value diagnosis. For this please read Kimball Data Warehouse toolkit chapter 13. For the account to customer bridge table it is in chapter 9 of the same book. I don’t want to repeat them here as they have done an excellent job in explaining it there. Note the book I quote above is the second edition. For other editions please look at the index for “bridge tables”.

Policy Dimension

I just realised that the Kimball Data Warehouse toolkit book also mentions the bridge table for insured driver (in chapter 15). But the difference is that the driver dimension is linked to the policy dimension, not driver group dimension. Also that the insurance example in Kimball book is about a premium fact table, not claim fact table.

The reason it is possible to use a policy dimension instead of driver group dimension is: in the car insurance the driver is attached to the policy. So I think Kimball’s approach is better, because it eliminates the need to create the driver group dimension, which in reality such thing didn’t exist. But the alternative of not using any dimension at all is also possible, I mean we attach the bridge table directly to the fact table, like the first example I mentioned above.

If we replace the driver group dimension with a policy dimension, what we get is something like this:

Fact_Claim table: (same as before)
1178|20100518|298|32839|88|16000|1200|14800|500|14300
3122|20180725|397|45578|88|32000|4000|28000|1000|27000
3123|20181108|267|45642|89|51000|3000|48000|1500|46500

Dim_Policy: (previously this was Dim_Driver_Group)
policy_key|policy_number|inception_date|last_renewal_date|expiry_date|policy_holder|cover_type|european_cover|effective_date|expiry_date|active_flag
48523|NK98402|2015-08-12|2018-08-12|2019-08-11|Chloe Dunn|comprehensive|Y|2015-08-07 00:00:00|9999-12-31 23:59:59|Y
63291|NK84826|2014-11-01|2018-11-01|2019-10-31|Peter Keller|third party|N|2014-10-26 00:00:00|9999-12-31 23:59:59|Y

Bridge_Driver_Group: (replace the first column with policy_key)
policy_key|driver_key|driver_type|weight
48523|188|single driver|1.000000
63291|302|main driver|0.666666
63291|304|second driver|0.333333

Dim_Driver: (same as before)
driver_key|driver_name|date_of_birth|driving_licence_number
188|Chloe Dunn|1980-08-08|3824920
302|Peter Keller|1971-06-05|4503532
304|Samantha Keller|1970-12-07|4507651

We need to bear in mind that in car insurance, the policy dimension is the biggest dimension. It has a lot of columns and a lot of rows. And it is a type 2 dimension. The last renewal date and the expiry date change every year, and therefore the policy_key column in the bridge table will need to be updated. This is really not suitable for the purpose of the bridge table (which is to link the fact table to the driver dimension).

Group Dimension Created from Policy Dimension

The best solution I found is to create a group dimension as a type 0 dimension, based on the policy dimension. We cut away all the attributes, leaving just the policy number, like this:

Fact_Claim table: (same as before)
1178|20100518|298|32839|88|16000|1200|14800|500|14300
3122|20180725|397|45578|88|32000|4000|28000|1000|27000
3123|20181108|267|45642|89|51000|3000|48000|1500|46500

Dim_Driver_Group:
driver_group_key|policy_number|inserted_date
88|NK98402|2015-08-07 04:12:05
89|NK84826|2014-10-26 04:14:09

Bridge_Driver_Group: (same as before)
driver_group_key|driver_key|driver_type|weight
88|188|single driver|1.000000
89|302|main driver|0.666666
89|304|second driver|0.333333

Dim_Driver: (same as before)
driver_key|driver_name|date_of_birth|driving_licence_number
188|Chloe Dunn|1980-08-08|3824920
302|Peter Keller|1971-06-05|4503532
304|Samantha Keller|1970-12-07|4507651

Note that Dim_Driver_Group is type 1 so it doesn’t have effective_date, expiry_date or active_flag columns. Instead it only has inserted_date column because it is type 0. Type 0 means that it is fixed. Once a row is inserted it will never get changed or updated.

Tracking Changes in the Bridge Table

What if Peter and Samantha Keller add their 18 year old daugther (Karina) to the policy, and now the number of insured driver in their policy becomes three?

Fact_Claim table: (same as before)
1178|20100518|298|32839|88|16000|1200|14800|500|14300
3122|20180725|397|45578|88|32000|4000|28000|1000|27000
3123|20181108|267|45642|89|51000|3000|48000|1500|46500

Dim_Driver_Group: (same as before)
driver_group_key|policy_number|inserted_date
88|NK98402|2015-08-07 04:12:05
89|NK84826|2014-10-26 04:14:09

Bridge_Driver_Group:
driver_group_key|driver_key|driver_type|weight|start_date|end_date|active_flag
88|188|single driver|1.000000|1900-01-01|9999-12-31|Y
89|302|main driver|0.666666|1900-01-01|2019-03-22|N
89|304|second driver|0.333333|1900-01-01|2019-03-22|N
89|302|main driver|0.500000|2019-03-22|9999-12-31|Y
89|304|second driver|0.250000|2019-03-22|9999-12-31|Y
89|375|third driver|0.250000|2019-03-22|9999-12-31|Y

Dim_Driver: (same as before)
driver_key|driver_name|date_of_birth|driving_licence_number
188|Chloe Dunn|1980-08-08|3824920
302|Peter Keller|1971-06-05|4503532
304|Samantha Keller|1970-12-07|4507651
375|Karina Keller|2001-04-15|9302583

Notice that Karina does not contribute to her parents’ claim on 18th Nov 2018, because she was put on the policy today (22nd March 2019). But Karina is responsible for any claim happen after today. It is quite complex to calculate the claim amount attributed to each driver, so I would recommend calculating it overnight and putting the output in a separate fact table, or an output table. If the business needs to browse this data (drilling up and down, slicing and dicing) then put it in a fact table. If this “claim per driver” data is required for producing a report, then put it in an output table.