Rise of CoreML
We iOS developers are a lucky bunch. Apart from the usual holidays in December, we enjoy a special Christmas every June, thanks to the Worldwide Developers Conference organized by Apple. 2017 was no exception. When Apple unwrapped the boxes for us, out came the new HomePod speaker, the new beast called iMac Pro and macOS High Sierra, and everything was awesome! Then there were the toys for developers. I must have been a very nice guy all year, because it was a pleasant surprise when Apple revealed CoreML, its new machine learning integration framework: out of professional curiosity I had been dabbling with machine learning for the past few months. So, given the opportunity to put the power of machine learning to work in iOS, I could not wait to get my hands dirty! Here’s an outline of what I learned:
What is Machine Learning
Before we jump right off the cliff, let’s talk a little about what lies beneath.
You see, when a human child takes her first step through the doorway of learning, she cannot learn by herself. Instead, she needs her hand carefully held by a teacher, and with intensive guidance she is steered along the path of acquiring knowledge. As she learns, she also gains experience: the trusted friend who will, one day, take the job off her teacher’s careful hands and become her lifelong guide and companion, growing with her as she passes through the oft-perilous ways of life.
And exactly there, dear reader, a machine has differed from a human being thus far. A machine could be taught, but it could not teach itself, until machine learning evolved. Machine learning adds the experience factor to a machine’s intelligence system, which is also known as Artificial Intelligence. It’s the science of getting computers to act without being explicitly programmed.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
– Mitchell, T. (1997). Machine Learning. McGraw-Hill. p. 2. ISBN 0-07-042807-7.
Types of Learning
Based on the algorithms used to train a machine, machine learning can be grouped into two major categories: supervised learning and unsupervised learning. In supervised learning, the machine is trained with a complete set of labeled training data, that is, inputs paired with their known outcomes. In unsupervised learning, no labels are available, and the machine has to find structure in the data on its own. Supervised learning is used to solve classification and regression problems, while unsupervised learning is used for clustering and some other types of problems. Here are some examples:
- Classification: The machine is given a dataset and, based on specific parameters, sorts the items into different categories. For example, based on the size and shape of a tumor, the machine can classify it as malignant or benign.
- Regression: Based on historical data for parameters such as product price, demand, distribution and other factors, a machine can predict the profit for the current or future years.
- Clustering: The best example of clustering is probably Google News, which uses an algorithm to group news items with the same topic and content and show them together. Pattern recognition plays a key part in clustering solutions.
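To make the classification case a bit more concrete, here is a toy sketch in Swift: a nearest-neighbour classifier for the tumor example. The `Sample` type, the `classify` function and all the numbers are made up for illustration. A real model would be trained on far more data with a proper machine learning tool.

```swift
// Toy supervised classification: label a new tumor by its closest labeled example.
// All names and numbers here are hypothetical illustration data.
enum Diagnosis: String {
    case benign, malignant
}

struct Sample {
    let size: Double          // e.g. diameter in cm
    let irregularity: Double  // 0 = perfectly round, 1 = very irregular
    let label: Diagnosis
}

// Labeled training data: this is the "experience" the machine learns from.
let trainingSet = [
    Sample(size: 1.0, irregularity: 0.1, label: .benign),
    Sample(size: 1.5, irregularity: 0.2, label: .benign),
    Sample(size: 4.0, irregularity: 0.8, label: .malignant),
    Sample(size: 5.0, irregularity: 0.9, label: .malignant),
]

// Returns the label of the training sample nearest to the input
// (1-nearest-neighbour), or nil if there is no training data.
func classify(size: Double, irregularity: Double, using samples: [Sample]) -> Diagnosis? {
    func squaredDistance(to sample: Sample) -> Double {
        let dx = sample.size - size
        let dy = sample.irregularity - irregularity
        return dx * dx + dy * dy
    }
    return samples.min(by: { squaredDistance(to: $0) < squaredDistance(to: $1) })?.label
}

let prediction = classify(size: 4.5, irregularity: 0.7, using: trainingSet)
print(prediction?.rawValue ?? "unknown")  // prints "malignant"
```

The labels in `trainingSet` are what make this supervised: without them, the best the machine could do is notice that the samples fall into two groups, which is the clustering case.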
Once such an algorithm has been trained, it produces a model that the machine can refer to when making further predictions, inferences and deductions. Machine learning tools can generate such a model, but the result cannot be used in iOS apps as is. It first needs to be converted to the .mlmodel format that Xcode supports.
Apple provides links to a few open source CoreML models that solve some classification problems, like detecting the major object(s) in a picture or detecting the scene from a picture.
Apart from these, models generated by other supported machine learning tools can be converted with CoreML Tools into CoreML models that can be used in the app.
Core ML lets you integrate a broad variety of machine learning model types into your app. In addition to supporting extensive deep learning with over 30 layer types, it also supports standard models such as tree ensembles, SVMs, and generalized linear models. Because it’s built on top of low level technologies like Metal and Accelerate, Core ML seamlessly takes advantage of the CPU and GPU to provide maximum performance and efficiency. You can run machine learning models on the device so data doesn’t need to leave the device to be analyzed.
Using a CoreML model and the Vision framework, it’s really easy to build an iOS app that, given a photo, can detect the scene or the major objects in it and display the result. I won’t go into the details of building this app from scratch, but will rather discuss the heart of the application, the fun part, and it’s just a few steps.
The Photo Recognition App
I will assume that the app is set up to provide an image, either by picking a photo from the native photo picker or by taking one with the camera.
Step 1. The first step is to download a machine learning model from Apple’s website and include it in the app. Here I am using the Inceptionv3 model listed on Apple’s website. It seems to be a very good model with much better accuracy than the others, although it is a bit heavy in size. Now Xcode does some heavy lifting for you: as soon as the model is added, Xcode generates a model class named after the model. To see it, just select the model in Xcode’s project navigator.
In the next steps we will refer to this class as Inceptionv3.
Step 2. Now it’s time for some code. Import the Vision and CoreML frameworks, which will aid us in our journey. Then set up the model for use with Vision.
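A minimal sketch of this step could look like the following. It assumes that Inceptionv3.mlmodel has been added to the app target, so that the Xcode-generated `Inceptionv3` class exists. The helper name `makeVisionModel` is my own and not part of any framework.

```swift
import CoreML
import Vision

// Assumption: Inceptionv3.mlmodel is part of the Xcode target, so Xcode has
// generated the `Inceptionv3` class. `makeVisionModel` is a hypothetical helper.
func makeVisionModel() throws -> VNCoreMLModel {
    // Wrap the underlying MLModel so that the Vision framework can drive it.
    // (Newer SDKs prefer Inceptionv3(configuration:) over the plain initializer.)
    return try VNCoreMLModel(for: Inceptionv3().model)
}
```

Calling this once, for example from `viewDidLoad`, and keeping the result in a property avoids paying the model loading cost every time a photo is picked.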
Here we create an instance of the VNCoreMLModel class for the CoreML model Inceptionv3. It’s sometimes recommended to initialize this early so that recognition is faster once an image is selected.
Step 3. Now we need to create a VNCoreMLRequest, which will query the MLModel with our image to find out what it is.
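As a rough sketch, such a request could be built like this. The function names `describe` and `makeClassificationRequest` and the `onResult` callback are my own inventions for the example. Only `VNCoreMLRequest`, `VNCoreMLModel` and `VNClassificationObservation` come from the Vision framework.

```swift
import Foundation
import Vision

// Hypothetical helper: formats a prediction for display, e.g. "zebra (50.0%)".
func describe(identifier: String, confidence: Float) -> String {
    let percent = String(format: "%.1f", confidence * 100)
    return "\(identifier) (\(percent)%)"
}

// Hypothetical helper: builds a request that reports the first classification.
func makeClassificationRequest(model: VNCoreMLModel,
                               onResult: @escaping (String) -> Void) -> VNCoreMLRequest {
    return VNCoreMLRequest(model: model) { request, error in
        // For a classifier model the results arrive as VNClassificationObservation
        // objects; we take the first one as the top prediction.
        guard let observations = request.results as? [VNClassificationObservation],
              let best = observations.first else {
            onResult("Unable to classify: \(error?.localizedDescription ?? "no results")")
            return
        }
        onResult(describe(identifier: best.identifier, confidence: best.confidence))
    }
}
```

The completion block is not guaranteed to run on the main thread, so a caller that updates UI from `onResult` would want to hop to the main queue first, for instance with `DispatchQueue.main.async { self.answerLabel.text = text }`.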
Here, we first create a VNCoreMLRequest and specify a completion block to run once it finishes execution. The completion block just takes the first result from the prediction set, which is received as an array of VNClassificationObservation objects. As I discussed before, classification is only one type of problem, and other types, like regression, yield other types of observations. Notice that VNClassificationObservation is a subclass of VNObservation. A VNCoreMLRequest uses a VNCoreMLModel, which is based on a CoreML model, to run predictions with that model. Depending on the model, the returned observation is either a VNClassificationObservation for classifier models, a VNPixelBufferObservation for image-to-image models, or a VNCoreMLFeatureValueObservation for everything else.
Step 4. We are almost there. The last and final step is to actually execute the request, a job well suited for the one and only VNImageRequestHandler.
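A sketch of this last step, assuming the picked photo is already available as a `CIImage`, might look like this. The function name `perform(_:on:)` is hypothetical, while `VNImageRequestHandler` and its `perform` method come from Vision.

```swift
import CoreImage
import Foundation
import Vision

// Hypothetical helper: runs the given Vision requests against one image.
func perform(_ requests: [VNRequest], on image: CIImage) {
    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    // The handler's perform(_:) call is synchronous, so we run it on a
    // background queue to keep the UI responsive.
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform(requests)
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```

When the handler finishes, each request calls its own completion block, which is where the prediction is read.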
All the code listed above can be included in one method, and once it is executed, the answerLabel shows the name of the major object in the picture along with the accuracy.
A note on accuracy
From the above screenshot, it might appear that the world of machine learning and prediction is all rainbows and unicorns, but in reality it’s far from that. Machine learning is still in its infancy and has much room for improvement. As for the iOS app, it all depends on the model used, and it’s very easy to miss the optimal sweet spot and instead under-train or over-train the model. In case of over-training, the model starts focusing on the quirks of the training set, and hence its accuracy on new data diminishes.
Using a CoreML model and the Vision framework to leverage machine learning and build a perception of the outside world opens up endless possibilities. Once the machine recognizes an object, the next obvious step is probably to respond to it. In iOS 11, ARKit provides Augmented Reality, one of many options for doing something with this new superpower the iPhones have got. I intend to touch upon that in my next post. Meanwhile, have fun and learn how to train your machine!!
All copyrights of images belong to their respective owners.