Decision Tree is one of the most popular algorithms in machine learning. It is relatively simple, yet able to produce good accuracy. But the main reason it is widely used is its interpretability: we can see clearly how it arrives at each prediction.

Decision Tree is a supervised machine learning algorithm, so we train the model on a labelled dataset and then use it to predict the output for new data. It can be used for both regression and classification: regression is when the output is a number, and classification is when the output is a category.

In this article I would like to focus on Entropy and Information Gain, using investment funds as an example. Entropy is the level of disorder in the data.

__Entropy__

In thermodynamics, Entropy is the level of disorder or randomness in a system. Similarly, in data analytics, entropy is the level of disorder or randomness in the data. If we have 100 numbers and all of them are 5, then the data is perfectly ordered. There is no randomness: everywhere you look you get 5. The entropy is zero.

If these 100 numbers contain many different values, then the data is in a disordered state and the level of randomness is high. When you pick a number, you might get 4, or 7, or any other number; you don’t know what you are going to get. The closer the data is to “completely” random, the higher its entropy.

The distribution of these different numbers in the data determines the entropy. If there are 4 possible numbers and each appears 25% of the time, the entropy is at its highest. If they are distributed 97%, 1%, 1%, 1%, the entropy is very low. And if they are distributed 70%, 10%, 10%, 10%, the entropy is somewhere in between (medium).

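When entropy is measured in bits (using log base 2, as in the calculations below), the minimum value is 0 and the maximum is log2 of the number of possible values. For two possible outcomes, such as “Up” and “Down”, the maximum is 1; for the 4 numbers above, it is 2.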
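To make the distributions above concrete, here is a minimal Python sketch that counts how often each value appears and computes the Shannon entropy in bits (the sum of -p x log2(p) over the distinct values). The helper name `entropy_of_values` and the example lists are illustrative assumptions, not data from this article.

```python
from collections import Counter
from math import log2


def entropy_of_values(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    # -p * log2(p) written as p * log2(1/p), with p = c / n for each distinct value.
    return sum((c / n) * log2(n / c) for c in counts.values())


all_fives = [5] * 100                                    # perfectly ordered
even = [4] * 25 + [5] * 25 + [6] * 25 + [7] * 25         # 25% each
skewed = [4] * 97 + [5, 6, 7]                            # 97%, 1%, 1%, 1%
medium = [4] * 70 + [5] * 10 + [6] * 10 + [7] * 10       # 70%, 10%, 10%, 10%
coin = ["Up"] * 50 + ["Down"] * 50                       # two outcomes, 50% each

for name, data in [("all fives", all_fives), ("25% each", even),
                   ("97/1/1/1", skewed), ("70/10/10/10", medium),
                   ("50/50 Up/Down", coin)]:
    print(f"{name:>14}: {entropy_of_values(data):.2f} bits")
```

Running it prints 0.00, 2.00, 0.24, 1.36 and 1.00 bits for the five lists, in that order: the evenly spread list reaches the maximum of log2(4) = 2 bits, while two equally likely outcomes reach 1 bit.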

__Information Gain__

Now that we have a rough idea of what entropy is, let’s try to understand Information Gain.

A Decision Tree consists of one or more levels. The tree in the picture below has 2 levels: level 1 contains node A, and level 2 contains nodes B and C.

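Information Gain is the decrease in entropy from one level to the next. Node B has an entropy of 0.85, a decrease of 0.1 from Node A’s entropy of 0.95, so Node B has an information gain of 0.1. Similarly, Node C has an information gain of 0.95 - 0.75 = 0.2. Strictly speaking, Information Gain belongs to the split as a whole: it is the parent’s entropy minus the average of the children’s entropies, each weighted by its share of the samples, which is how it is calculated for the fund example below.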

When the entropy goes down from 0.95 to 0.75, why do we say that we are gaining information? Higher entropy means the outcomes in a node are more mixed (closer to evenly spread across the classes), so we are less certain what we will get. Lower entropy means one outcome dominates, so we are more certain. When the entropy decreases after a split, our uncertainty about the outcome has decreased, and that reduction is the “additional” information the split gives us. That is Information Gain.
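The Python sketch below contrasts the per-node decrease described above with the split-level definition, in which each child’s entropy is weighted by the fraction of the parent’s samples that reach it. The 50/50 split of samples between nodes B and C is a hypothetical assumption, since only the entropies are given; `information_gain` is an illustrative helper name.

```python
def information_gain(parent_entropy, child_entropies, child_weights):
    """Parent entropy minus the weighted average entropy of the children.

    `child_weights` are the fractions of the parent's samples that land in
    each child, so they must sum to 1.
    """
    if abs(sum(child_weights) - 1.0) > 1e-9:
        raise ValueError("child weights must sum to 1")
    weighted = sum(w * h for w, h in zip(child_weights, child_entropies))
    return parent_entropy - weighted


h_a, h_b, h_c = 0.95, 0.85, 0.75  # entropies of nodes A, B and C

# Per-node decreases, as described in the text.
print(round(h_a - h_b, 2), round(h_a - h_c, 2))  # 0.1 0.2

# Split-level gain, ASSUMING (hypothetically) half the samples go to each child.
print(round(information_gain(h_a, [h_b, h_c], [0.5, 0.5]), 2))  # 0.15
```

With equal weights the split-level gain is simply the parent’s entropy minus the mean of the two child entropies; unequal sample shares would pull the result towards the larger child’s decrease.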

__Calculating Entropy__

Now we know what Entropy is, and what Information Gain is. Let us now calculate the entropy.

First let’s find the formula for entropy. In thermodynamics, entropy is a logarithmic measure of the number of possible states of a system.

In information theory, entropy is the average information content. The information content I of an event E1 is the log of 1/(the probability of E1), so I1 = log(1/p(E1)); the less likely an event is, the more information it carries when it happens.

If we have another event (E2), its information content is I2 = log(1/p(E2)).

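The average of the information content I1 and I2 (that is, the entropy H) is the sum of the information content of each event multiplied by the probability of that event occurring:

H = I1 x p(E1) + I2 x p(E2)

= log(1/p(E1)) x p(E1) + log(1/p(E2)) x p(E2)

= -p(E1) x log p(E1) - p(E2) x log p(E2)

If we have n events E1, E2, ..., En, the entropy is:

H = -sum over i = 1..n of (p(Ei) x log p(Ei))

In decision trees the log is usually taken in base 2, so entropy is measured in bits; this is the Log(x,2) used in the calculations below.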
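The derivation above can be checked numerically. This Python sketch, assuming log base 2, computes the information content of each event, averages it weighted by probability, and compares the result with the -sum of p x log p form. The function names are illustrative, not from any library.

```python
from math import log2


def information_content(p):
    """I = log2(1 / p): rarer events carry more information (in bits)."""
    return log2(1 / p)


def entropy(probabilities):
    """Average information content, weighted by each event's probability.

    Events with probability 0 contribute nothing (p * log(1/p) -> 0 as p -> 0),
    so they are skipped to avoid dividing by zero.
    """
    return sum(p * information_content(p) for p in probabilities if p > 0)


def entropy_neg_sum(probabilities):
    """Equivalent form: -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


probs = [0.25, 0.75]
print(information_content(0.25))  # 2.0 (a 1-in-4 event carries 2 bits)
print(round(entropy(probs), 3), round(entropy_neg_sum(probs), 3))  # 0.811 0.811
```

Both functions agree because p x log(1/p) and -p x log p are the same quantity, which is exactly the step from the second to the third line of the derivation.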

__Fund Price__

Now that we know how to calculate entropy, let us calculate the entropy of the price of a fund going up or down over the next year.

In the above table, the last column shows whether the price of a fund 1 year from now is higher or lower than today, denoted “Up” or “Down”. We try to predict this outcome from 4 factors (features):

- The performance of the fund over the last 3 years (annualised, gross of fees). This past performance is divided into 3 buckets: “Down” (less than zero), “Up between 0 and 2%”, and “Up more than 2%”.
- The interest rate, for example the 1-year GBP LIBOR rate today. Today’s rate is compared with the rate 1 year ago and divided into 3 buckets: higher than 1 year ago, lower than 1 year ago, or the same (constant).
- The value of the companies the fund invests in, assessed by comparing each company’s book value to its share price today, and also its earnings (income) relative to its share price (cyclically adjusted). This company value factor is divided into 3 buckets: overvalued, undervalued and fair value.
- The ESG factors, i.e. Environmental, Social and Governance factors such as pollution, remuneration, the board of directors, employee rights, etc. This is also divided into 3 buckets: high (good), medium, and low (bad).

__The Four Factors__

1. Past performance

Funds which have been going up a lot generally tend to revert to the mean, meaning that their price is going to go down. But another theory says that if a fund’s price has been going up, it tends to keep going up because of momentum. Who is right is up for debate. In my opinion, the momentum effect is stronger than the “reversion to the mean” effect.

2. Interest rate

The value/price of a fund is affected not only by the companies or shares in the fund, but also by external factors, which the interest rate represents here. When the interest rate is high, share price growth is usually constrained because more investors keep their money in cash. Conversely, when the interest rate is low, people move money out of cash and into shares (or bonds) instead.

The factor we are considering here, however, is the change in the interest rate, but its impact is generally similar: if the interest rate is going up, investment in equities tends to decrease, putting pressure on share prices and resulting in lower prices.

3. Value

If a company’s valuation is too high, investors become psychologically concerned that the price will fall. This concern puts pressure on the share price, which eventually goes down.

Conversely, if a company’s valuation is low compared with similar companies in the same industry sector and country (and of similar size), investors feel that the stock is cheap and are more inclined to buy it, which naturally pushes the price up.

4. ESG

Factors like climate change, energy management, health & safety, compensation, product quality and employee relations can affect a company’s value. Good ESG scores usually increase the value of the companies in the fund, and therefore collectively increase the value of the fund.

Conversely, concerns such as accidents, controversies, pollution, excessive CEO compensation and issues with auditability or control of the board of directors are real risks to a company’s future and therefore affect its share price.

__Entropy at Top Level__

Now that we know the factors, let us calculate the Information Gain for each factor (feature). This Information Gain is the entropy at the top level minus the weighted entropy at the branch level.

Of the total of 30 events, there are 12 “Price is down” events and 18 “Price is up” events.

The probability of a “Down” event is therefore 12/30 = 0.4, and the probability of an “Up” event is 18/30 = 0.6.

The entropy at the top level is therefore:

-U*Log(U,2) -D*Log(D,2), where U is the probability of Up and D is the probability of Down

= -0.6*Log(0.6,2) -0.4*Log(0.4,2)

= 0.97
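The same top-level calculation can be sketched in Python, working from the raw counts (18 Up and 12 Down events) rather than rounded probabilities. The helper name `entropy_from_counts` is an assumption for illustration.

```python
from math import log2


def entropy_from_counts(*counts):
    """Entropy in bits of a node, given how many events fall in each class."""
    total = sum(counts)
    # -p * log2(p) written as p * log2(1/p); empty classes contribute nothing.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


up, down = 18, 12
print(round(entropy_from_counts(up, down), 2))  # 0.97
```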

__Information Gain of the Performance branch__

The Information Gain of the Performance branch is calculated as follows:

First we calculate the entropy of the performance branch for “Less than 0”, which is:

-U*Log(U,2) -D*Log(D,2), where U is the probability that the price goes up when past performance is less than zero, and D is the probability that it goes down when past performance is less than zero.

= -0.5 * Log(0.5,2) -0.5 * Log(0.5,2)

= 1

Then we calculate the entropy of the performance branch for “0 to 5%”, which is:

= -0.56 * Log(0.56,2) -0.44 * Log(0.44,2)

= 0.99

Then we calculate the entropy of the performance branch for “More than 5%”, which is:

= -0.69 * Log(0.69,2) -0.31 * Log(0.31,2)

= 0.89

Then we calculate the proportion of events in the “Less than 0”, “0 to 5%” and “More than 5%” buckets, which are:

8/30 = 0.27, 9/30 = 0.3 and 13/30 = 0.43

So if Performance were the first branch, it would look like this:

Then we sum the weighted entropy for “Less than 0”, “0 to 5%” and “More than 5%”, to get the total entropy for the Performance branch:

1 * 0.27 + 0.99 * 0.3 + 0.89 * 0.43 = 0.95

So the Information Gain for the Performance branch is 0.97 – 0.95 = 0.02
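Putting the Performance-branch steps together gives the Python sketch below. The Up/Down counts per bucket (4/4, 5/4 and 9/4) are not listed explicitly in the text; they are inferred from the stated bucket sizes (8, 9 and 13) and probabilities (0.5, 0.56 and 0.69), and they add up to the 18 Up and 12 Down events at the top level. Working from unrounded values gives an Information Gain of about 0.021, consistent with the 0.02 above.

```python
from math import log2


def entropy_from_counts(*counts):
    """Entropy in bits of a node, given how many events fall in each class."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def information_gain(parent_counts, branches):
    """Parent entropy minus the size-weighted entropy of each branch.

    `parent_counts` is a tuple of class counts at the parent node;
    `branches` maps each bucket name to its tuple of class counts.
    """
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy_from_counts(*c) for c in branches.values())
    return entropy_from_counts(*parent_counts) - weighted


# (Up, Down) counts per past-performance bucket, inferred from the text.
performance = {
    "Less than 0": (4, 4),
    "0 to 5%": (5, 4),
    "More than 5%": (9, 4),
}

for bucket, counts in performance.items():
    print(f"{bucket:>12}: entropy {entropy_from_counts(*counts):.2f}")

print(f"Information Gain: {information_gain((18, 12), performance):.2f}")
```

The loop prints entropies of 1.00, 0.99 and 0.89 for the three buckets, followed by an Information Gain of 0.02, matching the hand calculation.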

__Information Gain for the Interest Rate, Value and ESG branches__

We can calculate the Information Gain for the Interest Rate branch, the Value branch and the ESG branch the same way:

Why do we calculate the entropy? Because we need entropy to know the Information Gain.

But why do we need to know the Information Gain? Because the decision tree tends to separate the “Up” and “Down” events more quickly, with fewer levels, if we put the factor with the largest Information Gain as the first branch (the highest level).

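In this case, the factor with the largest Information Gain is Value, with an Information Gain of 0.31, so Value should be the first branch, followed by ESG, Interest Rate and, last, Performance. Strictly speaking, once the events are split by Value, a decision tree algorithm such as ID3 recalculates the Information Gain of the remaining factors within each Value bucket, so the order below the first branch can differ from bucket to bucket.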
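Choosing the first branch, and then repeating the choice inside each branch, can be sketched as a small ID3-style procedure in Python. It assumes each event is a dict of factor buckets plus an `"outcome"` label (“Up” or “Down”); the field names, the stopping rule and the function names are illustrative assumptions, not this article’s own implementation, and the 30-event table itself is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(rows, target="outcome"):
    """Entropy in bits of the target labels in `rows`."""
    counts = Counter(row[target] for row in rows)
    n = len(rows)
    return sum((c / n) * log2(n / c) for c in counts.values())


def split(rows, feature):
    """Group rows by their bucket for `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def information_gain(rows, feature, target="outcome"):
    """Entropy of `rows` minus the size-weighted entropy after splitting on `feature`."""
    n = len(rows)
    groups = split(rows, feature)
    weighted = sum(len(g) / n * entropy(g, target) for g in groups.values())
    return entropy(rows, target) - weighted


def build_tree(rows, features, target="outcome"):
    """Greedily split on the highest-gain feature, then recurse into each bucket."""
    labels = Counter(row[target] for row in rows)
    # Stop when the node is pure or no factors are left; predict the majority label.
    if len(labels) == 1 or not features:
        return labels.most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    # Information Gain is recomputed on each bucket's own subset of events.
    return {best: {bucket: build_tree(subset, remaining, target)
                   for bucket, subset in split(rows, best).items()}}
```

Called on the full table with the four factors, for example `build_tree(events, ["Performance", "Interest Rate", "Value", "ESG"])` where `events` is a hypothetical list of row dicts, the root key should be Value, the factor with the largest Information Gain according to the figures above; the factors chosen below it depend on the events in each Value bucket.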