Data Warehousing, BI and Data Science

22 March 2011

Linking 2 Fact Tables

Filed under: Data Warehousing — Vincent Rainardi @ 7:05 pm
Tags:

It’s a weather forecast. You have “measurement sets” consisting of area, temperature, humidity, wind directions, etc every 5 mins. Each area has different set. For example if A=Area, T=temp, H=humidity, WD=Wind Direction, WS=Wind Speed, the forecast for Area1 is like this.

Input:
13:05 A1, Tmin1, Tmax1, Hmin1, Hmax1, WD1, WS1
13:10 A1, Tmin2, Tmax2, Hmin2, Hmax2, WD2, WS2
13:15 A1, Tmin3, Tmax3, Hmin3, Hmax3, WD3, WS3
13:20 etc.

Then you take a group of measurements in the last N days over certain area (say area 1 to 100) and make a forecast for a certain day in the future. Example: take the last 7 days for area 1 to 100, and forecast for the next 7 days. The forecast is for a specific area (the grain is “day” or “hour”), and the input is from the “close by” areas. The forecast contain 3 variables: temp (min & max), cloudiness, humidity, wind direction, wind speed.

Output (Forecast):
Day1 A1, Tmin1, Tmax1, H1, WD1, WS1
Day2 A2, Tmin2, Tmax2, H2, WD2, WS2
Day3 A3, Tmin3, Tmax3, H3, WD3, WS3
Day4 etc.

So the FactMeasurement (input) is like this:

SetKey, DateKey, TimeKey, AreaKey, MinTemp, MaxTemp, MinHum, MaxHum, WinDir, WinSpd

And the FactForecast (output) is like this:

ForecastSetKey, DateKey, TimeKey, AreaKey, MinTemp, MaxTemp, Hum, WinDir, WinSpd

Now, the question is, where would you record the fact that Forecast Set 38 (which is say 200 rows) is predicted from row 3100 to 3800 in Measurement fact table?

This is where the FactKey comes in. If we have FactKey on both fact tables:

FactMeasurement: FactKey, DateKey, TimeKey, AreaKey, MinTemp, MaxTemp, MinHum, MaxHum, WinDir, WinSpd

FactForecast: FactKey, DateKey, TimeKey, AreaKey, MinTemp, MaxTemp, Hum, WinDir, WinSpd

Then we create a bridge table:

BridgeKey, MeasurementFactKey, ForecastFactKey

In some lucky cases, we don’t need to create a bridge table, because the relationship between the input and the output is 1 to 1, i.e. each output row is determined from 1 input row. In this case we can just include the Input fact key on the output fact table. But this is very rare. The majority of the case is many to many.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: