Data Warehousing and Data Science

29 March 2021

Plot

Filed under: Python — Vincent Rainardi @ 6:41 am
import matplotlib.pyplot as plt
import seaborn as sns

# 2x3 box + count (bar) plots for categorical variables (df is the data frame being explored):
plt.figure(figsize=(18,8))
plt.subplot(2,3,1)
sns.boxplot(data=df, x="season", y="count")
plt.subplot(2,3,2)
sns.boxplot(data=df, x="month", y="count")
plt.subplot(2,3,3)
sns.boxplot(data=df, x="weekday", y="count")
plt.subplot(2,3,4)
sns.countplot(data=df, x="season")
plt.subplot(2,3,5)
sns.countplot(data=df, x="month")
plt.subplot(2,3,6)
sns.countplot(data=df, x="weekday")
plt.show()
# 1x2 bar plots, the second one without the confidence interval (error bar) line:
plt.figure(figsize=(10,3))
plt.subplot(1,2,1)
sns.barplot(data=df, x="month", y="count", hue="year")
plt.subplot(1,2,2)
# seaborn 0.12+ uses errorbar=None; older versions use ci=None
sns.barplot(data=df, x="weekday", y="count", hue="year", errorbar=None)
plt.show()
# 3x3 pair plot of continuous variables
# (pairplot creates its own figure, so set the size of each panel with height= instead of plt.figure)
sns.pairplot(df[["count", "temp", "windspeed"]], height=2)
plt.show()
# View the heat map of correlations between the numeric variables (as percentages)
plt.figure(figsize=(12,7))
correlation_data = df.corr(numeric_only=True)
correlation_data = (correlation_data * 100).round(0).astype(int)
sns.heatmap(correlation_data, annot=True, fmt="d", cmap="YlGnBu")
plt.show()
# Setting box plot title, x and y axis labels, range and format
import matplotlib.ticker as mticker

ax = sns.boxplot(data=df, width=0.5, palette="Blues")
ax.set_title("Plot Title")
ax.set_xlabel("Label for x axis")
ax.set_ylabel("Label for y axis")
# Show the y tick labels in millions, e.g. 15000000 -> $15m
# (a formatter keeps the labels correct when the y limits change later)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda y, pos: "${:,}m".format(int(y/1000000))))
ax.grid(which="major", linestyle=":", linewidth=1, color="grey")
for item in [ax.xaxis.label, ax.yaxis.label]:
    item.set_fontsize(14)
    item.set_color("blue")  # set the x and y labels to blue

# Draw 2 red lines showing the preferred range
ax.axhline(5000000, ls="-", linewidth=1, color="red")
ax.axhline(15000000, ls="-", linewidth=1, color="red")
ax.set_ylim(0, 80000000)  # set the range of the y axis
plt.show()

Pandas

Filed under: Python — Vincent Rainardi @ 6:29 am
#Import pandas:
import pandas as pd

#Rename columns:
df = df.rename(columns={"col1": "column1", "col2": "column2"})

#Drop columns:
df = df.drop(columns=["column1", "column2"])

#Unique values in each column:
df.nunique()

#Number of missing values in each column:
df.isnull().sum()

#Convert a string column to date:
df["date_column"] = pd.to_datetime(df["string_column"], dayfirst = True)

#Get the day element from a date column:
df["day"] = df["date_column"].dt.day

#Get the weekday name (e.g. Monday, Tuesday) from a date column (English names by default):
df["weekday_name"] = df["date_column"].dt.day_name()

#Group by 2 columns and get the row count:
df.groupby(["column1","column2"]).size()

#Combine 2 data frames side by side (axis=1 adds columns; axis=0 stacks rows):
df = pd.concat([df1, df2], axis=1)
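To see several of the snippets above working together, here is a small sketch on a made-up three-row data frame. The column names and values are invented for illustration only.

```python
import pandas as pd

# A tiny made-up data frame (hypothetical columns and values)
df = pd.DataFrame({
    "col1": ["01/02/2021", "16/03/2021", "16/03/2021"],  # dd/mm/yyyy strings
    "col2": ["a", "b", "b"],
    "col3": [1.0, None, 3.0],
})

# Rename columns, then derive date parts
df = df.rename(columns={"col1": "date_string", "col2": "category"})
df["date_column"] = pd.to_datetime(df["date_string"], dayfirst=True)
df["day"] = df["date_column"].dt.day
df["weekday_name"] = df["date_column"].dt.day_name()
df = df.drop(columns=["date_string"])

print(df.nunique())         # distinct values per column
print(df.isnull().sum())    # missing values per column (col3 has one)

counts = df.groupby(["category", "weekday_name"]).size()
print(counts)               # ('a', 'Monday') -> 1, ('b', 'Tuesday') -> 2

# Side by side (axis=1) versus stacked (axis=0)
wide = pd.concat([df[["category"]], df[["day"]]], axis=1)
tall = pd.concat([df, df], axis=0)
print(wide.shape, tall.shape)   # (3, 2) (6, 5)
```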

2 December 2020

Python: Function, Lambda, Map, Filter, Reduce

Filed under: Python — Vincent Rainardi @ 6:43 am

A function with 2 arguments:

def f(a,b):
    c = a * b
    return c
print(f(2,3))
Output: 6

Recursive function:

def factorial(n):
    if n>1: return n*factorial(n-1)
    else: return 1  # base case: 0! = 1! = 1
print(factorial(3))
Output: 6
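The base case matters: for n of 0 or 1 the function should return 1, so that factorial(0) is 1 as well. A quick way to check a hand-written version is to compare it with math.factorial from the standard library:

```python
import math

def factorial(n):
    # Recursive case: n! = n * (n-1)!
    if n > 1:
        return n * factorial(n - 1)
    # Base case: 0! and 1! are both 1
    return 1

# Compare against the standard library for the first few values
for n in range(8):
    assert factorial(n) == math.factorial(n)

print(factorial(0), factorial(5))   # 1 120
```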

A star argument (*args) is for when we don’t know how many arguments will be passed. The arguments are collected into a tuple.

def total(*args):
    return(sum(args))
print(total(1,2,3))
Output: 6

A function that capitalises the first letter of every word:

def proper_case(a):
    return " ".join([word[0].upper() + word[1:] for word in a.split()])
print(proper_case("going to town"))
Output: Going To Town
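Inside the function, args really is a tuple, and an existing list can be passed in by unpacking it with a star at the call site. A small sketch (the function name describe is just for illustration):

```python
def describe(*args):
    # args collects all positional arguments into a tuple
    return type(args).__name__, len(args), sum(args)

numbers = [1, 2, 3]
print(describe(1, 2, 3))    # ('tuple', 3, 6): three separate arguments
print(describe(*numbers))   # ('tuple', 3, 6): the list is unpacked into 3 arguments
print(describe())           # ('tuple', 0, 0): no arguments gives an empty tuple
```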

A lambda is a shortcut to create a small, unnamed, single-expression function on the fly.

f = lambda a,b: a*b
print(f(2,3))
Output: 6

Use map to apply a function to every element of a list (or any other iterable).
Use filter to keep only the elements of a list which satisfy a condition.
Use reduce to combine a list into a single value by applying a function to a pair of values each time, repeatedly: first to the first two elements, then to that result and the next element, and so on (reduce is in the functools module).

# Using map and lambda to flag words beginning with a (1 = yes, 0 = no)
L = ['Apple', 'Andy', 'Banana', 'Ben']
list(map(lambda x: 1 if x[0].lower() == 'a' else 0, L))
Output: [1, 1, 0, 0]
# Using map and lambda to produce cube numbers
input_list = [1,2,3]
list(map(lambda x: x**3, input_list))
Output: [1, 8, 27]
# Using map to pair 2 lists
def v_add(x,y): return(x+y)
list1 = [1,2,3] #Argument1 is a list
list2 = (4,5,6) #Argument2 is a tuple
print(list(map(v_add, list1, list2))) #We pass 2 arguments to map
Output: [5, 7, 9]
# Using map and lambda to join 2 lists element by element
L1 = ['P','O']
L2 = ['X','Y']
list(map(lambda x,y: x + ' ' + y, L1, L2))
Output: ['P X', 'O Y']
# Using filter to find the even numbers 
f = lambda x: x % 2 == 0
L = [1,2,3,4,5,6]
list(filter(f, L))
Output: [2, 4, 6]

# The same filter, with the lambda written inline
list(filter(lambda x: x % 2 == 0, L))
Output: [2, 4, 6]
# Using filter to find the words that start with a and end with y
L = ['Apple', 'Andy', 'Banana', 'Ben']
list(filter(lambda x: x[0].lower()=='a' and x[-1].lower()=='y', L))
Output: ['Andy']
# Using reduce to sum the input
from functools import reduce
def v_add(x,y): return(x+y) 
reduce(v_add, range(1,4))
Output: 6

# Now using lambda
reduce(lambda x, y: x+y, range(1,4))
Output: 6
# Using reduce to find the largest number
L = [22,45,32,20,87,94,30]
def v_max(x,y):
    if x>y: return x
    else: return y
#or v_max = lambda x,y: x if x>y else y
reduce(v_max,L)
Output: 94
# Using reduce to concatenate letters
L = ['A','B','C']
v_concat = lambda x,y: x+y
reduce(v_concat, L)
Output: 'ABC'
# Using reduce to calculate 1 x 2 x 3 x 4 x ...
def f(x,y): return(x*y)
n = 4
L = list(range(1,n+1))
print(1 if n == 0 else reduce(f,L))
Output: 24
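map and filter can usually be written as list comprehensions, and reduce accepts an optional third argument, an initial value, which also handles an empty list (so the n == 0 special case above is not needed). A short comparison, using operator.mul from the standard library in place of the hand-written f:

```python
from functools import reduce
import operator

L = [1, 2, 3, 4, 5, 6]

# map versus a list comprehension: cube every element
cubes_map = list(map(lambda x: x**3, L))
cubes_comp = [x**3 for x in L]

# filter versus a list comprehension with an if
evens_filter = list(filter(lambda x: x % 2 == 0, L))
evens_comp = [x for x in L if x % 2 == 0]

# reduce with an initial value of 1: the product of an empty list is 1
def product(values):
    return reduce(operator.mul, values, 1)

print(cubes_map == cubes_comp, evens_filter == evens_comp)   # True True
print(product([1, 2, 3, 4]), product([]))                    # 24 1
```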

1 December 2020

Python: List comprehension

Filed under: Python — Vincent Rainardi @ 7:05 am

List comprehension means using a for loop inside square brackets to create a list:

L = [i*2 for i in range(1,5)] #5 not included
print(L)
Output: [2, 4, 6, 8]

We can use 2 for loops to get the words in a paragraph (split() splits on whitespace only, so the full stops stay attached to the words):

paragraph = ["This is sentence one.", "This is sentence two."]
result = [word for sentence in paragraph for word in sentence.split()]
print(result)
Output:
['This', 'is', 'sentence', 'one.', 'This', 'is', 'sentence', 'two.']

We can use 2 for loops and an if to get the words beginning with certain letters
(this example uses the paragraph defined above):

letters = ['s','o']
result = [word for sentence in paragraph for word in sentence.split() if word[0].lower() in letters]
print(result)
Output:
['sentence', 'one.', 'sentence']
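The for clauses in a comprehension are read left to right, in the same order as nested for loops. The sketch below rebuilds the filtered word list with ordinary loops to show that the two forms are equivalent, and then removes the trailing full stop with strip('.'):

```python
paragraph = ["This is sentence one.", "This is sentence two."]
letters = ['s', 'o']

# Comprehension: outer loop first, then the inner loop, then the if
comp = [word for sentence in paragraph for word in sentence.split()
        if word[0].lower() in letters]

# The same thing written as nested loops
loops = []
for sentence in paragraph:
    for word in sentence.split():
        if word[0].lower() in letters:
            loops.append(word)

# Remove the trailing full stop that split() leaves on 'one.'
clean = [word.strip('.') for word in comp]

print(comp == loops)   # True
print(clean)           # ['sentence', 'one', 'sentence']
```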

We can use an if with any condition we like:

numbers = [100,200,300,400]
result = [n for n in numbers if n > 200]
print(result)
Output:
[300, 400]

We can use two for clauses, one over i and one over j, to make a list of all the products:

product_list = [ i*j for i in range(1,3) for j in range(10,20,5) ] #3 and 20 are not included
print(product_list)
Output:
[10, 15, 20, 30]

We can make a dictionary with for too:

result = {i:i*3 for i in range(1,7,2)}
print(result)
Output:
{1: 3, 3: 9, 5: 15}

We can use for to loop over the keys and values of a dictionary.
In this example I get the best Hogwarts pupils (marks of 70 or more) from a dictionary:

Hogwarts = {1:['Harry',80], 2:['Ron',70], 3:['Hermione',90], 4:['Neville',60], 5:['Seamus',50]}
best_pupils = {key:value[0] for key,value in Hogwarts.items() if value[1] >= 70}
print(best_pupils)
Output:
{1: 'Harry', 2: 'Ron', 3: 'Hermione'}

Various examples of creating lists and dictionaries using for:

#times 2 if i is divisible by 3, otherwise plus 2
result = [i*2 if i%3==0 else i+2 for i in range(1,7)]
print(result)
Output: 
[3, 4, 6, 6, 7, 12]

#create a dictionary containing cube numbers from 1 to 4
n = 4
result = {i:i**3 for i in range(1,n+1)}
print(result)
Output: 
{1: 1, 2: 8, 3: 27, 4: 64}

#A list containing combinations of letters from 2 words
result = [i+j for i in "po" for j in "he"]
print(result)
Output: 
['ph', 'pe', 'oh', 'oe']

#Create a dictionary from a word. The key is in upper case, the value is double letter.
result = {x.upper(): x*2 for x in 'potter'}
print(result)
Output: 
{'P': 'pp', 'O': 'oo', 'T': 'tt', 'E': 'ee', 'R': 'rr'}
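'potter' has six letters but the result above has only five keys: a dictionary cannot hold duplicate keys, so the second 't' simply overwrites the first. A quick check, plus a letter count built the same way:

```python
word = 'potter'
result = {x.upper(): x * 2 for x in word}

print(len(word), len(result))   # 6 5: the repeated 't' produces the same key 'T'

# Overwriting is harmless here: both 't' entries store the same count
counts = {x: word.count(x) for x in word}
print(counts)                   # {'p': 1, 'o': 1, 't': 2, 'e': 1, 'r': 1}
```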

Python: If and For

Filed under: Python — Vincent Rainardi @ 6:11 am

Each if, elif and else line ends with a colon (:), and the body on the next line must be indented (a one-line body can also go on the same line).
Equal is ==, not equal is !=.
Use and, or to combine conditions.

a = 1
if a == 0: print("A") 
elif a != 1: print("B")
elif 4 > a == 8: print("C")  # chained comparison: 4 > a and a == 8
else: print("D")
Output: D
a,b = 5,6
if a==5 and b==7: print("A") 
else: print("B")
if a==5 or b==6: print("A")
else: print("B")
Output:
B
A

for can be used with range, a string, a list, a tuple, a dictionary or enumerate.
range(a,b,c) means from a up to (but not including) b, in steps of c.

for i in range(1,5,2): print(i)
Output:
1
3
for i in reversed(range(1,5,2)): print(i)
Output:
3
1
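Converting a range to a list is a handy way to see exactly which numbers it produces. A negative step counts down, and as before the end value is excluded:

```python
up = list(range(1, 5, 2))                   # [1, 3]
default = list(range(5))                    # start defaults to 0, step to 1: [0, 1, 2, 3, 4]
down = list(range(5, 0, -2))                # negative step counts down: [5, 3, 1]
backwards = list(reversed(range(1, 5, 2)))  # [3, 1]
print(up, default, down, backwards)
```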

for ... in can also loop directly over a string, a list or a tuple (without range):

for i in "abc": print(i)
for i in [1,4,2]: print(i)
for i in (1,4,2): print(i)

It can loop over the key value pairs of a dictionary using items():

for i,j in {"name":"Harry","age":11}.items(): print(i,j)
Output:
name Harry
age 11

D = {1:["Harry", 11], 2:["Ron", 11], 3:["Hermione", 10]}
for i,j in D.items(): print(i,j)
Output:
1 ['Harry', 11]
2 ['Ron', 11]
3 ['Hermione', 10]

With enumerate, for i,j gives the counter in i and the element in j, like this:

for i,j in enumerate("abc"): print(j, i)
Output:
a 0
b 1
c 2

In a for loop we can use break, continue and pass:

a = ''
b = 'abcdef'
for i in range(0,6):
    a += b[i]
    if i == 2: break  # with pass or continue here instead, the loop runs to the end and prints abcdef
print(a)
Output: abc

Use zip to loop over 2 lists together:

list1,list2 = [1,2,3],[10,20,30]
for i,j in zip(list1,list2): print('{0}  {1}'.format(i,j))
Output:
1  10
2  20
3  30
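If the two lists have different lengths, zip stops at the end of the shorter one, so the extra elements are silently ignored. itertools.zip_longest from the standard library pads the shorter one instead:

```python
from itertools import zip_longest

list1, list2 = [1, 2, 3], [10, 20]

pairs = list(zip(list1, list2))                         # stops after 2 pairs
padded = list(zip_longest(list1, list2, fillvalue=0))   # pads list2 with 0
print(pairs)    # [(1, 10), (2, 20)]
print(padded)   # [(1, 10), (2, 20), (3, 0)]
```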

We can use while to make a loop too:

i = 0
while i <= 3:
    print(i)
    i += 1
Output:
0
1
2
3

Use add to add an element to a set. Here the for loop adds 3 and 4 to the set a.
Note: add returns None, so printing its result shows None.

a = {1, 2}
for i in range(3,5): #5 is not included
    print(a.add(i)) #add returns None, so this prints None
print(a)
Output:
None
None
{1, 2, 3, 4}

Combining for, if, modulo and pass:

for i in range(1,4): #4 is not included
    if (i % 2 == 0): #modulo means "the remainder"
        print(str(i) + " is divisible by 2")
        pass #pass does nothing: execution simply carries on to the next line
    print(str(i) + " is not divisible by 2")
Output:
1 is not divisible by 2
2 is divisible by 2
2 is not divisible by 2
3 is not divisible by 2

Because pass does nothing, 2 is also reported as "not divisible by 2". Using continue instead of pass, or putting the last print under an else, avoids that.
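To see the difference between break, continue and pass side by side, the sketch below (the helper name run is illustrative) runs the same loop three ways and records which values reach the last line of the loop body:

```python
def run(keyword):
    reached = []
    for i in range(1, 5):
        if i == 2:
            if keyword == "break":
                break        # leave the loop completely
            elif keyword == "continue":
                continue     # skip the rest of this iteration only
            else:
                pass         # do nothing and carry on with the next line
        reached.append(i)
    return reached

for keyword in ("break", "continue", "pass"):
    print(keyword, run(keyword))
# break [1]
# continue [1, 3, 4]
# pass [1, 2, 3, 4]
```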

27 November 2020

Python: List, Tuple, Dictionary, Set

Filed under: Python — Vincent Rainardi @ 4:54 am

List: use the append method to add an element at the end.
Use pop to remove an element by position (the last one by default) and remove to delete the first element with a given value.

a = ['A','B',1,2]
print(a)
Output: ['A', 'B', 1, 2]

a.pop()
print(a)
Output: ['A', 'B', 1]

a.append('C')
print(a)
Output: ['A', 'B', 1, 'C']

a.remove('B')
print(a)
Output: ['A', 1, 'C']

List operators: * (repeat) and + (concatenate).
Functions: len, max, sorted; method: index (the position of a value).

print(a*2)
Output: ['A', 1, 'C', 'A', 1, 'C']

print(a+a)
Output: ['A', 1, 'C', 'A', 1, 'C']

a = [2,3,5,1]
print(len(a), max(a), sorted(a), a[0:3], a.index(5))
Output: 4 5 [1, 2, 3, 5] [2, 3, 5] 2

b = [[1,2],[3,4],[5,6]]
print(b[1], '|', b[0][1])
Output: [3, 4] | 2

For a list of lists, a[n][m] means element m of list n,
and a[n:m] means lists n to m-1 (m is not included).

a = [[1,2,3],[4,5,6],[7,8,9]]
print(a[0][0])
Output: 1

print(a[:])
Output: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(a[0:1]) 
Output: [[1, 2, 3]]

print(a[0:2]) 
Output: [[1, 2, 3], [4, 5, 6]]

print(a[1:3]) 
Output: [[4, 5, 6], [7, 8, 9]]

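Use extend to add all the elements of another list, insert(i, x) to put x at position i, and count to count how many times a value appears.

a = [1,2]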
b = [3,4]
a.extend(b)
print(a)
Output: [1, 2, 3, 4]

a.insert(3,1)
print(a, a.count(1))
Output: [1, 2, 3, 1, 4] 2
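append and extend are easy to mix up: append adds its argument as a single element (so adding a list creates a nested list), while extend adds each element of the argument separately:

```python
a = [1, 2]
b = [1, 2]

a.append([3, 4])   # one new element, which is itself a list
b.extend([3, 4])   # two new elements

print(a, len(a))   # [1, 2, [3, 4]] 3
print(b, len(b))   # [1, 2, 3, 4] 4
```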

A tuple is like a list, but once created it can’t be changed.

a = ("A", 1, "B", 2)
print(a, a[1])
Output: ('A', 1, 'B', 2) 1

b = list(a)
print(b)
Output: ['A', 1, 'B', 2]

c = tuple(b)
print(c)
Output: ('A', 1, 'B', 2)

d = ([1,2],"A")
print(d)
Output: ([1, 2], 'A')
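Trying to change an element of a tuple raises a TypeError. Note that a tuple only fixes which objects it holds: a list stored inside a tuple, like d above, can itself still be modified.

```python
d = ([1, 2], "A")

try:
    d[1] = "B"          # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

d[0].append(3)          # but the list inside the tuple can still change
print(d)                # ([1, 2, 3], 'A')
```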

A dictionary looks values up by key (label) instead of by index.
To get the value of a key which may not exist, use get (optionally with a default value).
To add a key value pair, assign to the new key with =.
We can convert a dictionary to a list: use values() to get the values and keys() to get the keys.
We can use dict to create a dictionary.
Use update to add key value pairs and del to remove a key value pair.

a = {'Name': 'Harry', 'Age': 11}
print(a, a['Name'])
Output: {'Name': 'Harry', 'Age': 11} Harry

Name = a.get('Name')
City = a.get('City','N/A')
print(Name, City)
Output: Harry N/A

a['City'] = 'London'
print(a)
Output: {'Name': 'Harry', 'Age': 11, 'City': 'London'}

print(list(a.values()), list(a.keys()), 'Harry' in list(a.values()))
Output: ['Harry', 11, 'London'] ['Name', 'Age', 'City'] True

a = dict(Name = 'Harry', Age = 11)
a.update({'City':'London'})
print(a, len(a))
Output: {'Name': 'Harry', 'Age': 11, 'City': 'London'} 3

del a['City']
print(a, len(a))
Output: {'Name': 'Harry', 'Age': 11} 2

a = {1: ['Harry',11], 2:['Ron',10]}
print(a)
Output: {1: ['Harry', 11], 2: ['Ron', 10]}
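The difference between square brackets and get shows up when the key is missing: a['City'] raises a KeyError, while get returns None or the default we give. pop with a default is a similarly safe way to remove a key that may not exist:

```python
a = {'Name': 'Harry', 'Age': 11}

try:
    a['City']
except KeyError:
    print("KeyError: no 'City' key")

print(a.get('City'), a.get('City', 'N/A'))   # None N/A
removed = a.pop('City', None)                 # no error even though the key is missing
print(removed, a)                             # None {'Name': 'Harry', 'Age': 11}
```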

Using sets we can do Venn diagram operations such as union, intersection, difference and symmetric difference.

a = [1,2,2,3,1]
b = set(a)
c = {3,4}
print(a,b,c)
Output: [1, 2, 2, 3, 1] {1, 2, 3} {3, 4}

print(b.union(c), b.intersection(c), b.difference(c), b.symmetric_difference(c))
Output: {1, 2, 3, 4} {3} {1, 2} {1, 2, 4}
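The same four operations are also available as the operators |, &, - and ^. A quick check that both forms agree, with the same b and c:

```python
b, c = {1, 2, 3}, {3, 4}

checks = [
    (b | c) == b.union(c),
    (b & c) == b.intersection(c),
    (b - c) == b.difference(c),
    (b ^ c) == b.symmetric_difference(c),
]
print(checks)   # [True, True, True, True]
print(b ^ c)    # {1, 2, 4}
```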

a = [1,2,3,4]
print(a[-1])
Output: 4

S = "I love Python"
print(S[2:6], S[-11:-7])
Output: love love

L = [1,2,3]
print(L*2)
Output: [1, 2, 3, 1, 2, 3]

L = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(L[2:])
Output: [[7, 8, 9, 10]]

C = [2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
F = [2, 4, 5, 6, 7, 9, 13, 16]
H = [1, 2, 5, 9, 10, 11, 12, 13, 15]
print(sorted(set(C) & set(F) & set(H)))
Output: [2, 5, 9, 13]

print(sorted(set(C) & set(F) - set(H)))  # - is applied first: in C and in F, but not in H
Output: [16]

d = C + F + H
print(d)
Output: [2, 5, 9, 12, 13, 15, 16, 17, 18, 19, 2, 4, 5, 6, 7, 9, 13, 16, 1, 2, 5, 9, 10, 11, 12, 13, 15]

L = []
for i in d:
    if d.count(i) == 2: L.append(i)
print(sorted(list(set(L))))
Output: [12, 15, 16]

L = []
for i in range(1,21):
    if d.count(i) == 0: L.append(i)
print(sorted(list(set(L))))
Output: [3, 8, 14, 20]
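Calling d.count(i) inside a loop scans the whole list once per element. collections.Counter from the standard library counts everything in a single pass and gives the same answers more directly (this relies on each of C, F and H having no repeated values, which is the case here):

```python
from collections import Counter

C = [2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
F = [2, 4, 5, 6, 7, 9, 13, 16]
H = [1, 2, 5, 9, 10, 11, 12, 13, 15]

counts = Counter(C + F + H)

# Numbers that appear in exactly two of the lists
twice = sorted(n for n, k in counts.items() if k == 2)
# Numbers from 1 to 20 that appear in none of the lists (a missing key counts as 0)
missing = [n for n in range(1, 21) if counts[n] == 0]

print(twice)     # [12, 15, 16]
print(missing)   # [3, 8, 14, 20]
```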

Python: String and Array

Filed under: Python — Vincent Rainardi @ 4:23 am

To remove spaces (or other given characters) from both ends, from the left or from the right, use the strip, lstrip and rstrip methods.

Example: a = " Test%%"
print(a.lstrip().rstrip("%"), a.strip("%").strip())
Output: Test Test

To remove all spaces (including in the middle) use the replace function.

Example: b = " %Te st %% A"
print(b.replace(" ","").replace("%",""))
Output: TestA

To split string use the split function.

Example: c = "A,B,C"
d = c.split(",")
print(d, d[1])
Output: ['A', 'B', 'C'] B

To join strings use the join function.

Example: e = " & ".join(d)
print(e)"
Output: A & B & C

To declare and initialise 3 variables at the same time use commas:

Example: a,b,c = "A", "B", "C"

To get a portion of the string use [a:b:c], which means from position a up to but not including position b, in steps of c (a, b and c can each be omitted).
Note: position a is included, position b is not.
A negative index means counting from the right. The rightmost character is -1.

Example: a = "0123456789"
print(a[0:4], a[1:7:2], a[-2])
Output: 0123 135 8

Example: print(a[:5], a[:], a[::3])
Output: 01234 0123456789 0369
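The step c can also be negative, which walks the string from right to left; a[::-1] is a common idiom for reversing a string:

```python
a = "0123456789"
print(a[::-1])     # 9876543210: the whole string, reversed
print(a[8:2:-2])   # 864: from position 8 down to (not including) 2, step -2
print(a[-3:])      # 789: the last three characters
```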

Example: a = "I love Python programming"
print(a[7:13], a[-18:-12], len(a))
Output: Python Python 25

Arithmetic operators are +, -, *, /, %, **, //
(the last 3 are modulo, power and floor division)

Example: a,b = 5,2
print(a%b, a**b, a//b)
Output: 1 25 2
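Note that / always returns a float, even when the division is exact, while // and % give the quotient and the remainder. The built-in divmod returns both at once:

```python
a, b = 5, 2
print(a / b, 4 / 2)     # 2.5 2.0 (true division always gives a float)
print(a // b, a % b)    # 2 1
q, r = divmod(a, b)     # quotient and remainder in one call
print(q, r)             # 2 1
print(q * b + r == a)   # True: quotient * divisor + remainder gives back a
```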

Comparison operators are >, <, >=, <=, ==, !=

Example: print(a>b, a<b, a==b, a!=b)
Output: True False False True

The format method of a string fills numbered placeholders with values:

Example: a,b = "A","B"
print("{0} test {1}".format(a,b))
Output: A test B

To concatenate 2 strings use the + operator.

Example: print(a + " & " + b, "Line1\nLine2")
Output: A & B Line1
Line2

To print a double quote, enclose the string in single quotes,
or escape it with a backslash.

Example: print('a"b', "a\"b")
Output: a"b a"b

To print a single quote, enclose the string in double quotes,
or escape it with a backslash.

Example: print("a'b", 'a\'b')
Output: a'b a'b

To start a new line use \n.
If we put r in front of the string (a raw string), backslashes are kept as they are, so \n does not have an effect.

Example: print(r"a\n")
Output: a\n
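One way to see what the r prefix does is to compare lengths: in a normal string \n is a single newline character, while in a raw string it is two characters, a backslash and an n. Raw strings are often used for Windows paths and regular expressions.

```python
normal = "a\n"
raw = r"a\n"

print(len(normal), len(raw))   # 2 3
print(raw == "a\\n")           # True: same as escaping the backslash by hand
```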

A set holds distinct members (duplicates are removed).

Example: a = set('aba')
print(a)
Output: {'a', 'b'}

Set operations are -,|,&,^.

  • a-b: in a but not in b.
  • a|b: in a or b or both.
  • a&b: in both a and b.
  • a^b: in a or b but not both.
Example: a,b = set('ab'), set('ac')
print(a-b, a|b, a&b, a^b)
Output: {'b'} {'a', 'b', 'c'} {'a'} {'b', 'c'}

10 September 2020

Turtle

Filed under: Python — Vincent Rainardi @ 5:07 am

My son asked me to show graphics in programming and I thought it would be a good idea to show Turtle. In Python of course, as it is one of the most popular programming languages, serving all kinds of communities, not just AI. Turtle is a simple way to draw lines and circles, and it’s already built into Python, i.e. you don’t need to install anything.

To start, just type “import turtle”. This imports the turtle library into the Python environment.

Then type “t = turtle.Turtle()”. This creates a turtle and stores it in the variable t, so that we don’t have to type turtle.Turtle again and again. It opens a window with an arrow facing to the right. That arrow is the cursor, and the window is our canvas, where we make our drawing.

To draw a line, type “t.forward(50)”. This moves the arrow 50 pixels forward (to the right).

To change the shape of the cursor, type “t.shape("turtle")”.

To draw a circle, type “t.circle(50)”.

To turn the turtle to the right by 90 degrees, type “t.right(90)”.

And you can type “t.forward(50)” to draw a line again.

To lift the pen up so it won’t draw any line, type “t.penup()”. Now we can move the turtle without drawing anything, for example with “t.setposition(0,-50)”.

(0,0) is where we began. It’s x,y so 0,-50 means the x is 0 and the y is -50 (50 down).

Similarly, 50,0 means x = 50 and y = 0.

Remember that our pen is still up at the moment. To start drawing again, let’s put the pen down by typing “t.pendown()”. Then we can draw a circle again with “t.circle(25)”.

To undo what we did last, type “t.undo()”.

To fill with colour, type this:

  • t.fillcolor("green")
  • t.begin_fill()
  • t.circle(25)
  • t.end_fill()

And finally, to clear the screen, type: “t.clear()”.

As an example, to make a square we use a “for” loop. When typing it into the interactive shell, press Shift+Enter at the end of each line inside the loop, and Enter twice to run it:

for _ in range(4):
    t.forward(25)
    t.right(90)

Another example: a filled hexagon (the lines outside the loop are each followed by a normal Enter).

t.fillcolor("green")
t.begin_fill()
for _ in range(6):
    t.forward(25)
    t.right(60)
t.end_fill()
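The square and the hexagon follow the same pattern: for a regular polygon with n sides, the turtle turns 360/n degrees at each corner (90 for a square, 60 for a hexagon). Here is a sketch of a general helper; draw_polygon and exterior_angle are made-up names, and draw_polygon works with any object that has forward() and right() methods, such as a turtle.Turtle():

```python
def exterior_angle(sides):
    # The turns at all the corners add up to one full circle (360 degrees)
    return 360 / sides

def draw_polygon(t, sides, length):
    # t is expected to be a turtle.Turtle() (or anything with forward and right)
    for _ in range(sides):
        t.forward(length)
        t.right(exterior_angle(sides))

print(exterior_angle(4), exterior_angle(6))   # 90.0 60.0
```

For example, after “import turtle” and “t = turtle.Turtle()”, typing “draw_polygon(t, 5, 40)” should draw a pentagon, and wrapping the call in t.begin_fill() and t.end_fill() fills it.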

Have fun!
