In machine learning we have a technique called ensembling, i.e. we combine multiple models. The more models we use, the higher the chance of getting the prediction right. That is understandable and expected. But whether the number of models is odd or even has a significant effect too. I didn’t expect that, and in this short article I would like to share it.

I’ll start from the end. Below are the probabilities of an ensemble of 2 to 7 models getting a binary output right or wrong. Each model has a 75% chance of getting it right, i.e. of correctly predicting the output.

If we look at the top row (2, 4 and 6 models), the probability of the ensemble getting it right increases from 56% to 74% to 83%. If we look at the bottom row (3, 5 and 7 models), it also increases, from 84% to 90% to 93%.

But from 3 models to 4 models it goes down from 84% to 74%, because with 4 models we have a 21% chance of “Not sure”. This 21% is the case where 2 models are right and 2 models are wrong, so the vote is tied and the output is “Not sure”. In terms of the chance of correctly predicting the output, we would therefore rather use 3 models than 4.

The same thing happens between 5 and 6 models. The probability of the ensemble getting it right decreases from 90% to 83%, because with 6 models we have a 13% chance of “Not sure”. This is the case where 3 models are right and 3 models are wrong, so the output is “Not sure”.
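The percentages above can be checked with the binomial distribution. Here is a minimal sketch, assuming the models are independent and the ensemble decides by simple majority vote; the function name `vote_probabilities` is my own and not from any library.

```python
from math import comb


def vote_probabilities(n_models: int, p_right: float) -> dict:
    """Return P(right), P(not sure) and P(wrong) for a majority vote.

    Assumes every model is independently right with probability p_right.
    """
    # P(exactly k of the n models are right), from the binomial distribution
    pmf = [
        comb(n_models, k) * p_right**k * (1 - p_right) ** (n_models - k)
        for k in range(n_models + 1)
    ]
    right = sum(pmf[k] for k in range(n_models + 1) if 2 * k > n_models)
    not_sure = sum(pmf[k] for k in range(n_models + 1) if 2 * k == n_models)
    wrong = sum(pmf[k] for k in range(n_models + 1) if 2 * k < n_models)
    return {"right": right, "not_sure": not_sure, "wrong": wrong}


for n in range(2, 8):
    probs = vote_probabilities(n, 0.75)
    print(
        f"{n} models: right {probs['right']:.0%}, "
        f"not sure {probs['not_sure']:.0%}, wrong {probs['wrong']:.0%}"
    )
```

For an odd number of models the “not sure” probability is always zero, because a tie needs exactly half of the models to be right.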

So when using ensembles to predict a binary output we need to use an odd number of models, because an odd number can never produce a “Not sure”, where equal numbers of models are right and wrong.

We also need to remember that each model must have a >50% chance of predicting the correct result. If not, the ensemble is weaker than the individual model. For example, if each model has only a 40% chance of predicting the correct output, then using 3 models gives us 35%, 5 models 32% and 7 models 29% (see below).
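To see where the 50% threshold comes from, here is a small sketch that compares weak and strong models for odd ensemble sizes. It again assumes independent models and majority voting. The 50% and 60% accuracies are my own additions for comparison, and `majority_right` is a hypothetical helper name.

```python
from math import comb


def majority_right(n_models: int, p_right: float) -> float:
    """P(more than half of n independent models are right)."""
    return sum(
        comb(n_models, k) * p_right**k * (1 - p_right) ** (n_models - k)
        for k in range(n_models // 2 + 1, n_models + 1)
    )


# 40% is the example from the text; 50% and 60% are added for comparison
for p in (0.40, 0.50, 0.60):
    row = ", ".join(
        f"{n} models {majority_right(n, p):.0%}" for n in (1, 3, 5, 7)
    )
    print(f"each model {p:.0%}: {row}")
```

Below 50% every extra pair of models makes the ensemble worse, at exactly 50% nothing changes, and above 50% the ensemble keeps improving.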

The second thing that we need to remember when making an ensemble of models is that the models need to be independent, meaning that their errors are not correlated: they have different areas of strength and tend to get different cases wrong.

We can see that this “independent” principle is reflected in the calculation of each ensemble. For example, for 3 models, the probability that all 3 models get it right is 75% x 75% x 75% = 42% (see below). Multiplying the probabilities like this is only valid if the 3 models are completely independent of each other.
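The 3-model calculation can be spelled out by listing all 8 combinations of right and wrong. This is only an illustrative sketch: it multiplies the per-model probabilities, which is exactly the independence assumption, and counts the ensemble as right when at least 2 of the 3 models are right.

```python
from itertools import product

p_right = 0.75  # per-model accuracy used in this article

ensemble_right = 0.0
for outcome in product([True, False], repeat=3):  # True = that model is right
    # Independence: the joint probability is the product of the
    # per-model probabilities
    prob = 1.0
    for is_right in outcome:
        prob *= p_right if is_right else 1 - p_right
    if sum(outcome) >= 2:  # majority of the 3 models is right
        ensemble_right += prob
    print(outcome, f"{prob:.4f}")

print(f"ensemble right: {ensemble_right:.4f}")
```

The first line printed is the all-right case, 75% x 75% x 75%, and the final line is the 84% quoted for 3 models.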

This “completely independent” is a perfect condition and it doesn’t happen in reality. Real models tend to make some of the same mistakes, so in the above case the probability of all 3 models getting it right is not exactly 42%, and the gain from the ensemble is typically smaller than the calculation suggests. But we have to try to get the models as independent as we can, meaning that we need to make them as different as possible, with each model having its own areas of speciality, its own areas of strength.
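To get a feel for how much correlation matters, here is a toy sketch. The correlation model is entirely my own assumption, chosen because it is easy to calculate: each model copies one shared prediction with probability `copy_prob` and otherwise predicts independently, so every model is still right 75% of the time. The names `correlated_majority_right` and `copy_prob` are hypothetical.

```python
from math import comb


def majority_right(n_models: int, p_right: float) -> float:
    """P(more than half of n independent models are right)."""
    return sum(
        comb(n_models, k) * p_right**k * (1 - p_right) ** (n_models - k)
        for k in range(n_models // 2 + 1, n_models + 1)
    )


def correlated_majority_right(n_models: int, p_right: float, copy_prob: float) -> float:
    """Majority-vote accuracy under a toy correlation model (an assumption).

    Each model copies one shared prediction with probability copy_prob,
    otherwise it predicts independently. The shared prediction is right
    with probability p_right, so each model's own accuracy stays p_right.
    """
    # Once we know whether the shared prediction is right or wrong,
    # the models are independent again, with these accuracies:
    p_if_shared_right = copy_prob + (1 - copy_prob) * p_right
    p_if_shared_wrong = (1 - copy_prob) * p_right
    return (
        p_right * majority_right(n_models, p_if_shared_right)
        + (1 - p_right) * majority_right(n_models, p_if_shared_wrong)
    )


for copy_prob in (0.0, 0.25, 0.5, 0.75, 1.0):
    acc = correlated_majority_right(3, 0.75, copy_prob)
    print(f"copy_prob {copy_prob:.2f}: 3-model ensemble right {acc:.1%}")
```

With `copy_prob` at 0 the models are fully independent and we get the 84% from before. With `copy_prob` at 1 the three models are really one model, and the ensemble is no better than a single 75% model.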
