Computer vision is a subset of Artificial Intelligence. It is an interdisciplinary field that deals with helping computers understand the contents of images and videos.
Our daily lives are awash with digital images. There is no shortage of image content, especially with the rise of smartphone cameras and photo/video sharing communities like Facebook, Instagram, YouTube, and many others. This is partly why computer vision is one of the hottest fields within AI at the moment.
The other reason why it’s an in-demand field is its complexity. In essence, computer vision tries to replicate human vision. However, what is literally child’s play for a human requires sophisticated machine learning algorithms such as Convolutional Neural Networks. What’s the deal? Why is this task so complex for machines? The primary reason is that machines don’t have the abstract models that a human brain applies to an image, models developed by processing billions of life experiences. To a machine, an image is simply a set of pixels that need to be processed mathematically.
For instance, if I were to show you the image below and ask you what objects are in it, you’d say pens, clips, a highlighter, cello tape, and sticky notes. But if I were to show you the same image and ask what it’s about, you’d say “a pen stand” without thinking. The machine, however, would only have a matrix of pixels, each identified by its coordinates and carrying attributes such as color.
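To make the “matrix of pixels” idea concrete, here is a minimal sketch in plain Python. The tiny 2x3 image and its color values are made up for illustration, and `to_grayscale` is a hypothetical helper. Real images have millions of pixels and are normally handled with an imaging library.

```python
# A tiny 2x3 "image": each pixel is an (R, G, B) tuple with values 0-255.
# The values are made up for illustration.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (128, 128, 128), (0, 0, 0)],
]

def to_grayscale(rgb_image):
    """Turn each RGB pixel into one brightness value (ITU-R BT.601 luma weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

height = len(image)      # number of rows
width = len(image[0])    # number of columns
gray = to_grayscale(image)

print(f"{width}x{height} pixels")
print("pixel at row 0, column 2:", image[0][2])
for row in gray:
    print(row)
```

Nothing in these numbers says “pen stand”. Any meaning has to be inferred from patterns across the pixel values.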
To do what you did without thinking, we’d have to train a machine learning model to recognize the different kinds of pen stands and classify an object as one. And that model would work only for pen stands. Now imagine how many kinds of objects there are in the universe. That, in a nutshell, is how image recognition works.
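The train-then-classify idea can be sketched with a toy example. This is not how production systems do it: they typically train Convolutional Neural Networks on thousands of labeled photos. The sketch below swaps in a much simpler nearest-centroid classifier, and the 2x2 “images”, the labels, and the `train` and `classify` helpers are all hypothetical.

```python
import math

# Hypothetical training data: flattened 2x2 grayscale images (values 0-255)
# paired with a label. Real datasets hold thousands of much larger images.
training_data = [
    ([200, 210, 190, 205], "pen_stand"),
    ([220, 215, 200, 210], "pen_stand"),
    ([30, 40, 20, 35], "other"),
    ([25, 50, 30, 45], "other"),
]

def train(examples):
    """Compute the average image (centroid) for each label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def classify(centroids, pixels):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], pixels))

centroids = train(training_data)
print(classify(centroids, [210, 200, 195, 215]))  # a bright, unseen image
print(classify(centroids, [35, 45, 25, 30]))      # a dark, unseen image
```

Even this toy model only knows the two labels it was trained on, which is exactly the limitation described above.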
The impact of computer vision in our daily lives is palpable. If you’ve seen (or heard of) the “auto-tag” feature on Facebook, videos of driverless cars, or even the new Amazon Go outlet, you’ve experienced the magnificence of computer vision. However, these aren’t the only areas where it’s applicable. Below are some of the most popular applications of computer vision:
#1 – Content Organization
Computer vision isn’t a thing of the future. It’s right here on your phone. Apps such as Apple Photos and Google Photos tag images to structure your albums. This is typical image classification at work. Custom image recognition software also exists for more specific purposes. YouTube, meanwhile, is experimenting successfully with annotating video content. A lot of work is also going into helping users dig through hours of video by typing in the content they’re looking for instead of manually watching the entire video.
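Once a classifier has tagged each photo, organizing albums is mostly bookkeeping. Here is a minimal sketch, assuming the labels already exist. The file names, labels, and the `build_albums` helper are hypothetical and do not reflect any vendor’s actual API.

```python
from collections import defaultdict

# Hypothetical classifier output: (file name, predicted label) pairs.
predictions = [
    ("IMG_001.jpg", "beach"),
    ("IMG_002.jpg", "dog"),
    ("IMG_003.jpg", "beach"),
    ("IMG_004.jpg", "food"),
    ("IMG_005.jpg", "dog"),
]

def build_albums(tagged_photos):
    """Group file names by their predicted label, keeping the original order."""
    albums = defaultdict(list)
    for filename, label in tagged_photos:
        albums[label].append(filename)
    return dict(albums)

albums = build_albums(predictions)
for label, files in albums.items():
    print(label, files)
```

Searching by typed content then becomes a dictionary lookup, for example `albums.get("dog", [])`.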
#2 – Facial Recognition
If your smartphone or your laptop has a Face ID feature, you’ve already experienced facial recognition. But this isn’t the only application of facial recognition.
Facial recognition is also being used to aid forensic investigations by recognizing individuals in security footage. It’s even being tested at ATMs and other high-risk facilities to validate identity (not unlike a futuristic TV show).
Today, it’s being used in the retail sector to identify shoplifters and prevent theft. Take AI Guardman, a machine learning system created by the Japanese company NTT East. It attempts to catch crooks in the act. How does it do that? It scans the live video stream for suspicious activities such as a thief looking for blind spots or nervously checking their surroundings. As soon as it detects such behavior, it alerts the management. Accurate face recognition is still a challenge for these systems, but substantial progress has been made.
#3 – Autonomous Vehicles
Computer vision can help cars make sense of their surroundings. A smart vehicle has cameras that capture videos from different angles. The system processes these videos in real-time and recognizes objects close to the car, which it steers away from.
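The “recognize nearby objects and steer away” step can be sketched as follows. This is a heavily simplified illustration, not how any real vehicle works: the `Detection` record, the 10-meter threshold, and both helper functions are assumptions, and the detections themselves would come from a vision model that is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    distance_m: float   # estimated distance from the car, in meters
    bearing_deg: float  # negative = left of the car's heading, positive = right

def nearby_obstacles(detections, max_distance_m=10.0):
    """Keep only detections within max_distance_m, nearest first."""
    close = [d for d in detections if d.distance_m <= max_distance_m]
    return sorted(close, key=lambda d: d.distance_m)

def steering_hint(detections, max_distance_m=10.0):
    """Suggest steering away from the nearest close obstacle, if any."""
    close = nearby_obstacles(detections, max_distance_m)
    if not close:
        return "keep course"
    nearest = close[0]
    return "steer right" if nearest.bearing_deg < 0 else "steer left"

# Hypothetical detections for one video frame.
frame = [
    Detection("pedestrian", 6.5, -15.0),
    Detection("truck", 40.0, 5.0),
    Detection("cyclist", 9.0, 20.0),
]

print([d.label for d in nearby_obstacles(frame)])
print(steering_hint(frame))
```

A real system would repeat this for every frame, fuse several cameras and other sensors, and plan a full trajectory rather than emit a one-word hint.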
There is still a long way to go before automobiles become completely autonomous. However, computer vision is already disrupting the automotive industry.
While these are some of the popular applications of computer vision, there are dozens of other applications:
- Optical character recognition (OCR) aka recognizing and converting text from images like scanned documents.
- Automated checkouts in retail which use object detection algorithms to make shopping more convenient and hassle-free.
- Medical imaging aka scanning and comparing the condition of internal organs with their healthy counterparts to spot diseases.
- Match moving aka inserting computer graphics into live-action footage (popularly called CGI in the entertainment industry).
- Motion capture aka recording the process of movement of people or objects. This is often used in video games to authentically animate athletes or martial artists.
To Sum Up
The applications of computer vision are far-reaching, and yet, despite being 40 years in the making, it’s still a rapidly progressing field. As the algorithms evolve, computer vision models will not only become easier to train but also more efficient. They will be able to extract more information from images and, as a result, become more accurate. Computer vision will also be combined with existing technologies to create even more innovative solutions. For example, Natural Language Generation can work in conjunction with Computer Vision to interpret the surroundings for the visually impaired. And of course, Computer Vision will be a key contributor to Artificial General Intelligence, i.e., to actually helping machines think, reason, plan, and learn like a human.