Data Warehousing, BI and Data Science

27 November 2020

Python: List, Tuple, Dictionary, Set

Filed under: Data Warehousing — Vincent Rainardi @ 4:54 am

Use append function to add an element.
Use pop and remove functions to remove an element.
Operator: *, +, len, max, sorted

a = ['A','B',1,2]
print(a)
Output: ['A', 'B', 1, 2]

a.pop()
print(a)
Output: ['A', 'B', 1]

a.append('C')
print(a)
Output: ['A', 'B', 1, 'C']

a.remove('B')
print(a)
Output: ['A', 1, 'C']

print(a*2)
Output: ['A', 1, 'C', 'A', 1, 'C']

print(a+a)
Output: ['A', 1, 'C', 'A', 1, 'C']

a = [2,3,5,1]
print(len(a), max(a), sorted(a), a[0:3], a.index(5))
Output: 4 5 [1, 2, 3, 5] [2, 3, 5] 2

b = [[1,2],[3,4],[5,6]]
print(b[1], '|', b[0][1])
Output: [3, 4] | 2

a[n][m] means list n, element m
a[n:m] means list n to list m-1

a = [[1,2,3],[4,5,6],[7,8,9]]
print(a[0][0])
Output: 1

print(a[:])
Output: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(a[0:1]) 
Output: [[1, 2, 3]]

print(a[0:2]) 
Output: [[1, 2, 3], [4, 5, 6]]

print(a[1:3]) 
Output: [[4, 5, 6], [7, 8, 9]]

a = [1,2]
b = [3,4]
a.extend(b)
print(a)
Output: [1, 2, 3, 4]

a.insert(3,1)
print(a, a.count(1))
Output: [1, 2, 3, 1, 4] 2

Tuple is like a list but once created it can’t be changed.

a = ("A", 1, "B", 2)
print(a, a[1])
Output: ('A', 1, 'B', 2) 1

b = list(a)
print(b)
Output: ['A', 1, 'B', 2]

c = tuple(b)
print(c)
Output: ('A', 1, 'B', 2)

d = ([1,2],"A")
print(d)
Output: ([1, 2], 'A')

Dictionary uses labels instead of index.
To get the value of a key which may not exist, use get.
To add a key value pair, use =.
We can convert a dictionary to a list. Use values() to get the values, keys() to get the keys.
We can use dict to create a dictionary.
Use update to add a key value pair and del to remove a key value pair.

a = {'Name': 'Harry', 'Age': 11}
print(a, a['Name'])
Output: {'Name': 'Harry', 'Age': 11} Harry

Name = a.get('Name')
City = a.get('City','N/A')
print(Name, City)
Output: Harry N/A

a['City'] = 'London'
print(a)
Output: {'Name': 'Harry', 'Age': 11, 'City': 'London'}

print(list(a.values()), list(a.keys()), 'Harry' in list(a.values()))
Output: ['Harry', 11, 'London'] ['Name', 'Age', 'City'] True

a = dict(Name = 'Harry', Age = 11)
a.update({'City':'London'})
print(a, len(a))
Output: {'Name': 'Harry', 'Age': 11, 'City': 'London'} 3

del (a['City'])
print(a, len(a))
Output: {'Name': 'Harry', 'Age': 11} 2

a = {1: ['Harry',11], 2:['Ron',10]}
print(a)
Output: {1: ['Harry', 11], 2: ['Ron', 10]}

Using set we can do Venn diagram operations such as union, intersection and difference

a = [1,2,2,3,1]
b = set(a)
c = {3,4}
print(a,b,c)
Output: [1, 2, 2, 3, 1] {1, 2, 3} {3, 4}

print(b.union(c), b.intersection(c), b.difference(c), b.symmetric_difference(c))
Output: {1, 2, 3, 4} {3} {1, 2} {1, 2, 4}

a = [1,2,3,4]
print(a[-1])
Output: 4

S = "I love Python"
print(S[2:6], S[-11:-7])
Output: love love

L = [1,2,3]
print(L*2)
Output: [1, 2, 3, 1, 2, 3]

L = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(L[2:])
Output: [[7, 8, 9, 10]]

C = [2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
F = [2, 4, 5, 6, 7, 9, 13, 16]
H = [1, 2, 5, 9, 10, 11, 12, 13, 15]
print(sorted(set(C) & set(F) & set(H)))
Output: [2, 5, 9, 13]

print(sorted(set(C) & set(F) - set(H)))
Output: [16]

d = C + F + H
print(d)
Output: [2, 5, 9, 12, 13, 15, 16, 17, 18, 19, 2, 4, 5, 6, 7, 9, 13, 16, 1, 2, 5, 9, 10, 11, 12, 13, 15]

L = []
for i in d:
    if d.count(i) == 2: L.append(i)
print(sorted(list(set(L))))
Output: [12, 15, 16]

L = []
for i in range(1,21):
    if d.count(i) == 0: L.append(i)
print(sorted(list(set(L))))
Output: [3, 8, 14, 20]

Python: String and Array

Filed under: Data Warehousing — Vincent Rainardi @ 4:23 am

To remove spaces from the left or right, use strip, lstrip and rstrip functions.

Example: a = " Test%%"
print(a.lstrip().rstrip("%"), a.strip("%").strip())
Output: Test Test

To remove all spaces (including in the middle) use the replace function.

Example: b = " %Te st %% A"
print(b.replace(" ","").replace("%",""))
Output: TestA

To split string use the split function.

Example: c = "A,B,C"
d = c.split(",")
print(d, d[1])
Output: ['A', 'B', 'C'] B

To join strings use the join function.

Example: e = " & ".join(d)
print(e)"
Output: A & B & C

To declare and initialise 3 variables at the same time use commas:

Example: a,b,c = "A", "B", "C"

To get a portion of the string use [a:b:c], which means from position a to position b, skipping c characters (a or b or c can be omitted).
Note: including b (b is included).
A negative index means counting from the right. The right most character is -1.

Example: a = "0123456789"
print(a[0:4], a[1:7:2], a[-2])
Output: 0123 135 8

Example: print(a[:5], a[:], a[::3])
Output: 01234 0123456789 0369

Example: a = "I love Python programming"
print(a[7:13], a[-18:-12], len(a))
Output: Python Python 25

Arithmetic operators are +, -, /, %, **, //
(the last 3 are modulo, power, floor division)

Example: a,b = 5,2
print(a%b, a**b, a//b)
Output: 1 25 2

Comparison operators are >, <, ==, !=

Example: print(a>b, a<b, a==b, a!=b)
Output: True False False True

The print function accepts parameters.

Example: a,b = "A","B"
print("{0} test {1}".format(a,b))
Output: A test B

To concatenate 2 strings use the + operator.

Example: print(a + " & " + b, "Line1\nLine2")
Output: A & B Line1
Line2

To print a double quote, enclose the string with a single quote.
Or escape it with a backslash.

Example: print('a"b', "a\"b")
Output: a"b  a"b

To print a single quote, inclose the string with a double quote.
Or escape it with a backslash.

Example: print("a'b", 'a\'b')
Output: a'b   a'b

To change line use \n.
If we use r (means raw string), \n doesn’t have an affect.

Example: print(r"a\n")
Output:  a\n

Set means distinct members.

Example: a = set('aba')
print(a)
Output: {'a', 'b'}

Set operations are -,|,&,^.

  • a-b: in a but not in b.
  • a|b: in a or b or both.
  • a&b: in both a and b.
  • a^b: in a or b but not both.
Example: a,b = set('ab'), set('ac')
print(a-b, a|b, a&b, a^b)
Output: {'b'} {'a', 'b', 'c'} {'a'} {'b', 'c'}

8 November 2020

Day Measures

Filed under: Data Warehousing — Vincent Rainardi @ 3:04 am

Sometimes the measures we want to store is the number of days. For example:

  • the number of days from when we received an order until we shipped it
  • the number of days from when a booking was made until the engineer visited
  • the number of days from today until a bond matures
  • the number of days since the policy was incepted (began) until today

I call these measures Day Measures.
When a day measure is “from today” or “until today”, then the measure changes everyday.
For example: the maturity of a bond is the number of years from today until a bond matures. If the bond matures in 31st December 2030, and today is 9th Nov 2020, then the maturity is 10.1451 years (10 + 52/365.25).
Tomorrow it is 10.1396. The maturity of a bond is important because the bigger the maturity, the higher the risk that the bond value can be impacted by the interest rate.

Days to shipment (the number of days from when the order is received until shipment) does not change if it was a last year order. But if the order was placed yesterday and it usually takes about 3 weeks for us to fulfill an order, then in the next 3 weeks the “days to shipment” changes every day.

If we have a daily periodic snapshot fact table, and we have a day measure in this fact table, then we need to calculate this measures every day.

The question is: is that right?
Couldn’t we just store the maturity date of the bond, rather than calculating the Maturity daily?
Couldn’t we just store the shipment date and the order date, rather than calculating “days to shipment” daily?

We can and we should. But “in addition to” not “instead of”.
It would be silly to store just the date because everytime we want to use it we need to calculate it. The point of having a data warehouse is not to save storage, but to make it easy to use. Easy to query.
When we calculate the Maturity or Days To Shipment, we don’t need to calculate anything. It is there ready for us to use.

So yes it is worth calculating Days Measures every day, and store them in the daily periodic snapshot fact table.

Blog at WordPress.com.