What Machine Learning Can Be Used For
Machine Learning is a field of computer science that gives computers the ability to learn without being explicitly programmed (Wikipedia). That is a very good definition but isn’t worth much without knowing concrete examples, so in this article I will focus on giving examples of what machine learning is currently used for, and can be used for.
Two Types of Usage
Broadly speaking Machine Learning can be used to do 2 things:
- If we have a set of data, Machine Learning can be used to classify that data into groups, or find associations between groups. This is called Unsupervised Learning.
For example: grouping galaxies using astronomy images, grouping customers using purchase data, users who like news A also like news B. - We have pairs of input and output Machine Learning can be used to predict the output of a new input. This is called Supervised Learning.
For example: recognising symptoms of skin cancer based on the skin lesion images, identifying hacking attempts in cyber security based on network data, predicting the price movements of equity indices, forecasting the weather.
A. Image Recognition
By far the most widely use of machine learning is image recognition. For example, recognising hand written letters or numbers, vehicle plate numbers, and faces.
A1. Handwriting Recognition
Machine Learning can be used to recognise hand written letters or numbers. This article (link) from Adam Geitley explains the how. How Machine Learning can recognise a hand written number 8. An image of number 8 is a series of numbers which represents how dark each pixel is. Using a machine learning algorithm called Convolutional Neural Network (CNN), computers can be trained to recognise a number 8. And other numbers. And letters. Convolutional Neural Network breaks the image into small tiles and pass the tiles into a small neural network. We then sample the output and pass the sample to another neural network.
We use handwriting recognition in the post office to sort thousands of letters into post code areas by reading the address written on the envelopes. It is also used to read the amounts and recipient names on the handwritten bank cheques.
A2. Plate Number Recognition
The same technique (CNN) is used for character recognition. Not handwritten character but the ones on images. For example, to read the licence plate numbers in the petrol stations. It is called ANPR (Automatic Number Plate Recognition). It is used at toll gates like Dartford Crossing, so drivers does not have to queue to pay. Instead, they can pay from their mobiles later (within 24 hours). It is also used in apartments, i.e. only recognised cars are allowed in (and out!). It is also used to reduce congestions in big cities like London, i.e. we have to pay £11.50 to go into central London. Thousands of cameras at the borders of central London detect which cars are coming in and out so the system can match the payments and the cars, and issue penalties to those who didn’t pay.
A3. Facial Recognition
We can use Machine Learning to recognise faces. This article (link, also from Adam Geitley) explains the how. It uses a series of 3 images to find 128 measurements for each person using HOG algorithm (Histogram of Oriented Gradients). 2 of the 3 images are the photos of that person, the third one is a photo of a different person. This way, each face is converted into a set of 128 numbers, which called an encoding.
It is used in Facebook. We used to tag photos in Facebook and the tagged person get notified “hey you are tagged in such and such photo”. But now we don’t need to tag the faces and the person still get notified. This is because Facebook recognises who the faces belong to. The notified person can then ask the photo to be taken down, or like the photo, or tag it.
Facial recognition is also used for payment. Faces can be converted into a set of 128 numbers, which uniquely identify each person. Well, almost. Alipay uses face recognition for payment in KFC in China (link, link).
Banks now uses face recognition identify customers. Instead of passwords or PIN, to login to their bank accounts customers use facial recognition using the camera on their laptop, tablet or phone. For example, Lloyds bank and HSBC do this (link), and so do First Direct and Tesco bank (link). In some banks, customer can also use their voice or finger print (on their mobile) to login.
Facial recognition is also used by the police. In a beer festival in China security cameras were installed at entrance gates and sounded an alarm when it recognised a face is found in the police database (link). At the Notting Hill Carnival in London, the police used it to match the faces in the crowd to the database of people previously arrested or under bail conditions. In Germany the police uses facial recognition to catch terrorists (link).
A4. Searching Photos
What I mean here is not search the text/label that accompanies the photos, but searching inside the pixels of the photos to find for example if there is a flower in those photos.
- Shutterstock allows us to search photos based on the composition (link).
- Facebook Lumos will (not yet) enable us to search photos (link).
- The best I think at the moment is Google Photos. We can now search, without labelling our photos first (link). I tried searching “curry”, “flower” and “snow” and it correctly displayed photos containing curry, flower and snow. Amazing!
The basic application is what Google Photos does above. Shutterstock, BBC, FB and every media company from should have this technology, but not yet. Media companies like newspapers and TV have millions of photos which take a lot of time to label them all. The cost saving would be enormous if employees can search inside those collection of photos, without labelling them.
The reason it takes the time is that we need to train the Machine Learning behind the photo search engine to recognised millions of words, which takes time. It’s probably a mini project to train a certain word such as flower, but to train millions of word would take years.
A5. Astronomy
Machine Learning can be used to group galaxies according to their shapes, i.e. spiral, elliptical, lenticular and irregular (link). Why don’t we group them manually? Because there are billions of them. We can also use Machine Learning to detect exoplanets using S4 algorithm (link) and CNN (link), rather than random forest (link). We can also use it to analyse galaxy spectrum (link, link).
Most of the application of Machine Learning in Astronomy is to analyse thousands of galaxy images using image recognition algorithms such as Convolutional Neural Network (CNN). Astronomy has a long history of using machine learning from the 90s. But it was using the “old” algorithm such as decision trees and random forest. From 2015 most of the image analysis in Astronomy is using deep learning algorithm such as CNN.
A6. Manufacturing
In manufacturing, Machine Learning is used to automatically detect overheating in motors, and defect in glass products (link). To detect overheating, the thermal images produced by infrared cameras are analysed using CNN. The glass defect is detected using normal-light photographs.
There are many Machine Learning application in manufacturing outside Image Recognition. Unlike Astronomy, manufacturing uses many different sensors, not just visual/images. Temperature, pressure, flow, weight, and many other measurements are analysed using Machine Learning for predictive maintenance and other purposes, see below for details.
A7. Health Care
Machine Learning is used to detect cancer by classifying the FNA images. It is used to detect breast cancer (link), salivary gland cancer, liver cancer and thyroid cancer.
Machine Learning is used to identify tuberculosis from chest x-ray images (link). They used Deep Convolutional Neural Network models to learn TB positive and TB negative x-rays. This is particularly useful in areas where there is no cardiothoracic radiologist.
A8. Retail
Lowe’s Home Improvement uses a robot to answer customer’s questions such as where is item X, and take the customer to the aisle where the item is located (link). It uses computer vision powered by Machine Learning to navigate the aisle in the store autonomously using camera and odometry. It also does inventory monitoring in real time by scanning the shelves in the aisle.
When we find a pair of jeans, glasses or handbag that we like, we can take a photo of it (or a screenshot), then uses an app called SnapShop from GoFIND to find the product in thousands of shops across many brands (link).
When we shop for clothes online, we can see how various garments fits us by trying it online using MeModel from Metail (link). We input our height, weight, bra size, waist, hips, hair style, gender, age and build so MeModel can predict and recommend various dresses, tops, skirts, trousers that fit and suit us. Retailers can see the demographic and body data of their customers, enabling them to provide suitable fashion that their customer like. The recommendation is a computer vision based, powered by Machine Learning algorithms.
B. Other Media Data
Non-image media data are video and audio. There are a lot of Machine Learning application for video data and audio data, e.g. natural language processing, self driving car. In this section I will list just the video and audio data, and will list the text and numeric data in section C.
B1. Self Driving Car
This is probably the one receiving the biggest media attention. It is not just the video feed that the deep learning algorithm needs to analyse, but also data from Radar, Lidar, GPS, and the controllers-actuators in the car itself (link). Kalman Filter is used to do combine data from different sensors to understand what’s around the car, including pedestrian, traffic signs and other vehicles. PID and MPC controllers are used with deep learning algorithm to regulate the engine RPM, breaks, gear, accelerator and wheels of the car so that it moves at the correct speed and direction, with self-tuning. Finally there is an element of routing as well, partly using neural network (link).
B2. Natural Language Processing (NLP)
NLP is about understanding human speech, both oral and written. It is also about producing speech, both oral and written. Speech recognition (link) is a major part of NLP, to translate voice into text. Stemming algorithms (link) are used the base form of each word. Word2Vec (link, link) is used for understanding the meaning of words (link). Dependency parsing is used to understand how each word in a sentence relates to each other (link). The Machine Learning algorithms used for NLP includes Deep Bi-directional RNN, Gated Recurrent Units and Long Short Term Memories (link). Convolutional Neural Networks is used for sentence classification (link) and Neural Machine Translation is used for large scale translation tasks (link).
Text Analytic (or Text Mining) is an important part of NLP. In the paragraph above, all items from Stemming onwards are in the Text Analytic domain. Basically, once the voices are translated into text, we are entering Text Analytic.
B3. Speech Processing
Rather than trying to understand the full meaning of human speech, Speech Processing focuses on just one particular aspect of it. For example:
- Converting Speech to Text: it is like a Dictaphone crossed with OCR scanner. It can translate what we said to it into a document.
- Speaker Identification: like a finger print, our voice can be used to identify us. It can be used in telephone banking to identify the customer (HSBC, Barclays), and in forensic to identify the criminal in offences such as kidnapping, extortion, blackmail threats, terrorist calls, ransom calls, etc (link).
- Speech Generation (or Speech Synthesis): it is concatenation of many recorded voice of different words read by human. If the word in the text does not exist in the “recorded word database” then the word is converted into phoneme (one unit of sound), such as /k/ in car, cool and kit. Each phoneme is converted into voice and concatenated.
- Noise Reduction: using a deep recurrent auto encoder neural network, we can reduce the noise on an input audio (link). This is done by training the network on stereo (noisy and clean) audio features to predict the clean features given noisy input.
- Emotional Recognition: given voices as input, a deep recurrent neural network can determine the emotion the speaker is in, for example: excited, angry, sad or neutral (link). This can also be done using facial image processing.
- Stress and Lie Detector: given a speech, using artificial neural network we can detect if the speaker is under stress or is lying (link). Similar result can also be achieved using facial image processing.
B4. Music
Machine Learning can be used to automatically detect the genre of a piece of music. The best approach is probably using Convolutional Neural Network and Bi-RNN block in parallel, which focus on the spatial features and temporal frame orders extraction (link).
Other application of Machine Learning in music include predicting the region of origin of a piece of folk/traditional music (link) and making music (link), usually using RNN and CNN algorithms.
B5. Video
Machine Learning can be used to recognise objects in video (link). It is done using CNN. As a team from Google and Stanford demonstrated (link) Machine Learning can be used to automatically classify a large set of videos into many classes. Using CNN they classified 1 million YouTube videos into 487 classes.
When we watch streaming videos on mobile phones or computers over 4G or WiFi, sometimes we experience buffering, i.e. long wait or break times before the video continues, or pixelation. The MIT CSAIL team used machine learning to switch between rate and buffer-based Adaptive Bit Rate, to minimize buffering (link).
Machine Learning can also be used for behaviour analysis in animal research. For example, using a video feed we can determine whether Drosophila (fruit flies) were on or off the egg-laying substrates for each frame of the videos (link).
C. Numerical Data
C1. Health Care
Last year Edward Choi et al from Georgia Institute of Technology and Sutter Health used Machine Learning to process data from Sutter-PAMF primary care patients to predict heart failure (link). They used 4 Machine Learning methods: Regularised Logistic Regression, Multilayer Perceptron, Support Vector Machine, K-Nearest Neighbour, and a Recurrent Neural Network called Gated Recurrent Unit. They used Theano (a deep learning software for Python) as well as Python Scikit-Learn.
Machine Learning was also used for predicting HIV/AIDS (link).
There are a lot of ML application with numerical data, such as Predictive Maintenance, Monitoring Efficiency, CRM, Robotic (link, link) as well as in Financial Services. But due to time limitation (I just started doing Andrew Ng’s Deep Learning course) I will have to write them another time.