Data Warehousing and Data Science

26 November 2021

Automating Machine Learning using Azure ML

Filed under: Data Warehousing — Vincent Rainardi @ 8:06 am

I have been using Google Colab (see my article here) and Jupyter to build and train various machine learning models including LR, KNN, PCA, SVM, XGBoost, RF, NLP, ANN, CNN, RNN, RL. I have been using Azure (Databricks, Data Lake, Data Factory, SQL, etc.) so I’m intrique to try Azure Machine Learning to see if it is as good as Colab.

The first thing I notice in Azure ML is Automated ML, which enables us to train an ML model without doing any coding. We specify the data, and Azure ML will try various algorithms, build various models, and evaluate them according to the criteria that we give.

This sounds too good to be true, but entirely possible. One of my ML projects is about credit card transactions. In that project I used 6 algorithms (LR, KNN, SVM, DT, RF, XGB) and each model has many hyperparameter. Each of these hyperparameters have many values to try. So to find the best model I had to do are a lot of hyperparameter tuning using GridSearch cross validation on training data. Once I found the best parameter for a model, I had to evaluate the performance on the test data using Area Under ROC Curve, or AUC. Then I had to select the best model based on that evaluation. And on top of that I need to find out the top features. Can all this be automated in Azure ML? Sounds too good to be true, but entirely possible.

First, I loaded the data using a procedure similar to this demo: link. Set the evaluation metric to AUC, set the train-test split using K-fold cross validation with K=3, set the ML algorithm to auto, set explain the best model = True, and set maximum concurrent session to 5. For the compute node use DS12 V2 with 4 CPUs, 28 GB memory and 200 GB SSD space (16×500 IOPS)

The top 10 models came out like this:

I expected XG Boost classifier (XGB) to be the top model and it is (I didn’t enable Neural Network in the AutoML). The top XGB model is using SparseNormalizer, which is expected because the data is skewed on many features. 2m 24s training time on 30k observations/examples on 4 CPUs/28 GB is not quick.

The eta (learning rate) is the step size shrinkage used in update to prevents overfitting. In this case it is 0.4 (the default is 0.3, range is from 0 to 1, see link, link). Gamma is the minimum loss reduction required to make a further partition on a leaf node of the tree, ranging from 0 to infinity (default is 0). It is a regularisation measure and in this case it is conversative (the larger the gamma the more conserative the model is). The maximum depth is 10. For comparison when I tuned by XGB model for credit card fraud data, the eta was 0.2, the gamma was 0 and the max depth was 6.

We can see it in more details by clicking the Algorithm Name, then click View Hyperparameters:

We can see the top influencing features like this:

F is the feature number, such as account age, location, or customer behaviour.

We also get a chart of the top feature against the probability of the predicted variable, like this: (I would prefer charting the top and second top features on the x and y axis but as this is out of the box it looks good and useful)

And we get the Precision-Recall chart out of the box too (you can choose which version of AUC to use, i.e. weighted, macro or micro:

The ROC is True Positive Rate (TPR) on the Y axis against False Positive Rate (FPR) on the X axis, so the above is not an ROC curve. But it gives us a good sense on how we can maximise recall or precision.

We want to recall to be as large as possible, and precision to be as large as possible but the AUC line limit them so it will always be a trade off between them. For example if you take the Weighted Average AUC line, the maximum of (recall – precision) might be point A. But in the case of credit card fraud you would want high recall, so we would choose point B instead of point C which is for high precision.

And AutoML in Azure ML also gives us the data transformation, such below:

We can see above that during the preprocessing of the data, for numerical features AutoML uses MeanInputer to mitigate missing values, whereas for categorical features CharGram count vectoriser and ModeCatInputter label encoder. Then it uses maximum absolute scaler before feeding the preprocessed data to the XGBoost model.

Overall I found that AutoML is useful. It tried various algorithms including Random Forest, Logistic Regression and XG Boost. Over 60 models it tried, in under 2 hours! The test AUC is 94.8% which is a good result for this data. And it gives us features importance as well. It tried various values of hyperparameters for each model, and chose the best values for us, automatically. Very, very easy to use. Welldone Microsoft! Of course, once we get the top models AutoML, then we can tune it further ourselves to get higher AUC. It is finding the top models which is very time consuming (it took me a week, but with AutoML it only took 2 hours).

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at

%d bloggers like this: