Computer Vision: using AI to detect plane models in real-time
- armstrongWebb
- Apr 10, 2021
- 2 min read

Challenge
Imagine that you had to build a system to visually identify plane models (say, for an airport). Well, you could engage a human 'system', also known as a 'plane spotter'. But, of course, people go on holiday and fall ill. No, an automated solution is required.
But will it be quick enough?
One of the challenges of real-time computer vision is processing images quickly enough for the analysis to be delivered in real time. With a typical video generating 24 frames per second, and each frame potentially containing multiple objects to classify, this is a demanding task.
You Only Look Once
You Only Look Once (YOLO) refers to an algorithm originally developed by Joe Redmon.
YOLO (v4) is considered a state-of-the-art technique for object detection.
It makes use of a branch of AI, known as Deep Learning; specifically, multi-layer Neural Networks, to detect objects in an image and classify them.
A simple Deep Learning NN might have, for example, 5-10 neural network layers. YOLO has more than 250!
Because YOLO is very efficient at scanning an image, it can complete its Deep Learning assessment very rapidly. How rapidly? Well, that depends on whether YOLO is coupled with hardware that can carry out analyses in parallel. If it is, an image can be processed within a fiftieth of a second, which is fast enough to process a live video feed.
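To make the 'fast enough' claim concrete, here is a minimal sketch of the arithmetic, using only the two figures quoted in this post (24 frames per second and a fiftieth of a second per image). The function names are my own, purely for illustration, and are not part of YOLO.

```python
def frame_budget_seconds(fps: float) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1.0 / fps


def keeps_up(fps: float, seconds_per_frame: float) -> bool:
    """True if per-frame processing fits inside the frame budget."""
    return seconds_per_frame <= frame_budget_seconds(fps)


VIDEO_FPS = 24           # frame rate quoted in this post
INFERENCE_TIME = 1 / 50  # "a fiftieth of a second" per image

budget = frame_budget_seconds(VIDEO_FPS)
print(f"Budget per frame: {budget * 1000:.1f} ms")
print(f"Inference time:   {INFERENCE_TIME * 1000:.1f} ms")
print(f"Keeps up with the feed: {keeps_up(VIDEO_FPS, INFERENCE_TIME)}")
```

At 24 frames per second there are roughly 42 milliseconds available per frame, so a 20 millisecond assessment leaves some headroom.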
Training YOLO
To maximise the predictive capability of a Deep Learning NN, it is often trained on many hundreds - or thousands - of images. But even with far fewer images, such networks can produce pretty good results.
I created a Deep Learning model, using YOLO, and trained it to distinguish between images of the Airbus-A320 and Boeing-737 families. And, because I occasionally have a life outside AI, I only used 85 images.
It's worth bearing in mind that, for example, the A320 family contains different models - eg A318 and the A320neo - with a different physical appearance. This adds to the complexity of the classification task.
After the training phase was completed, I ran the model against a set of photos that it hadn't 'seen' before, with the following results:

The YOLO model drew a bounding box around the plane in each image (blue for Airbus-A320, yellow for Boeing-737) and provided a confidence level (between 0 and 1) that the plane shown was indeed what had been predicted. As you can see, most images were predicted with 0.9 (ie 90%) certainty, with one dropping to 0.5 - that of the Boeing-737 amongst a bunch of other planes. And all of the predictions - in this set of images - are accurate.
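One common way to put confidence levels like these to work is to discard any prediction that falls below a chosen threshold. The sketch below shows the idea. It is not the code used for the results above: the `Detection` class, the `filter_by_confidence` function and the sample values are all made up for illustration, and I am not claiming that any particular threshold was applied here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # predicted class, e.g. "Airbus-A320"
    confidence: float  # score between 0 and 1
    box: tuple         # (x, y, width, height) in pixels


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up sample values for illustration, not the actual results.
detections = [
    Detection("Airbus-A320", 0.9, (40, 60, 320, 140)),
    Detection("Boeing-737", 0.5, (210, 95, 180, 70)),
    Detection("Boeing-737", 0.2, (10, 10, 50, 20)),
]

for d in filter_by_confidence(detections, threshold=0.5):
    print(f"{d.label}: {d.confidence:.0%} at {d.box}")
```

Raising the threshold trades missed planes for fewer false alarms, so the right value depends on which mistake is more costly.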
The YOLO model created predictions for all 12 images in a fifth of a second.
Bringing it all together - 'real-time' object classification
A YOLO Deep Learning model - trained on the same 85 images as the model used to predict the static images above - was used to process a 23-second video clip of the two airplane models. The prediction was completed within 23 seconds, so qualifying as real-time performance - the video clip is below. As you can see, YOLO updates its prediction confidence score many times per second.
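The 'real-time' test here boils down to one ratio: processing time divided by clip length, which must be 1 or less. A minimal sketch follows. The clip's frame rate is an assumption (I have reused the 24 frames per second figure from earlier), 23 seconds is treated as an upper bound on the processing time, and the function names are illustrative only.

```python
def real_time_factor(processing_seconds: float, clip_seconds: float) -> float:
    """Processing time divided by clip length; 1.0 or less means real-time."""
    return processing_seconds / clip_seconds


def min_throughput_fps(clip_seconds: float, fps: float,
                       processing_seconds: float) -> float:
    """Frames per second the model must have sustained to finish in time."""
    total_frames = clip_seconds * fps
    return total_frames / processing_seconds


CLIP_SECONDS = 23        # clip length stated in this post
PROCESSING_SECONDS = 23  # upper bound: "within 23 seconds"
ASSUMED_FPS = 24         # assumption: the clip's frame rate is not stated

rtf = real_time_factor(PROCESSING_SECONDS, CLIP_SECONDS)
throughput = min_throughput_fps(CLIP_SECONDS, ASSUMED_FPS, PROCESSING_SECONDS)

print(f"Real-time factor: {rtf:.2f} (real-time if <= 1.00)")
print(f"Sustained throughput: at least {throughput:.0f} frames per second")
```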