Convolutional Neural Network: Updated Guide 2020

Anshul Jain

Oct 03, 2020

In our previous article, we discussed one type of neural network, the Recurrent Neural Network, and how deep learning and machine learning use it for data assessment and for solving real-world problems. In this article, our focus is on another type of neural network, the Convolutional Neural Network, which is used by organizations like Facebook, Amazon, Google, Instagram, and Pinterest for a variety of functions. Convolutional Neural Networks are one of the critical branches of artificial neural networks that have allowed researchers worldwide to explore more complex technologies like deep learning and to further enhance their functionality and features.

But before we talk about their impact, it is crucial that we unravel how convolutional neural networks work and why they are often preferred over other types of neural networks, among other things.

So, if you want to get an in-depth understanding of Convolutional Neural Networks and their architecture, layers, and more, this article is for you.

What is a Convolutional Neural Network?

A technology underlying advances in Computer Vision, Image Classification, and Deep Learning, Convolutional Neural Networks have a special place in the ever-expanding field of artificial intelligence and machine learning. Considered a class of deep neural networks, Convolutional Neural Networks, also referred to as CNNs or ConvNets, are most commonly applied to analyzing visual imagery. They are also used in areas such as natural language processing for text classification, medical image classification, financial time series, etc.

ConvNets take inspiration from biological processes in the brain and are regularized versions of multilayer perceptrons. A multilayer perceptron is usually fully connected, i.e. each neuron in one layer is connected to all neurons in the next layer, whereas a ConvNet restricts and shares these connections. Because of this shared-weights architecture and its translation invariance characteristics, a ConvNet is also known as a shift invariant or Space Invariant Artificial Neural Network (SIANN).

Today, combined with GPUs and parallel computing, convolutional neural networks have proven successful in powering vision in robots and self-driving cars through their ability to identify objects, faces, and traffic signs in the machines' surrounding environments.

Features of Convolutional Neural Network:

Though traditional multilayer perceptron (MLP) models had long been used for image recognition, they had certain drawbacks that made them unsuitable for various tasks, such as their inability to scale well with higher resolution images. Convolutional neural networks mitigate these drawbacks with the help of features that allow them to emulate the behavior of the visual cortex.

Hence, CNN features that mitigate the challenges posed by MLP architecture are:

3D Volume of Neurons: In a CNN, the neurons of a layer are arranged in 3 dimensions: width, height, and depth. Distinct types of layers, both locally and completely connected, are stacked to form the CNN architecture.
Local Connectivity: The connectivity pattern between neurons of adjacent layers enforces spatial locality, which ensures that the learned filters produce their strongest response to a spatially local input pattern.
Shared Weights: Each filter uses the same parameters at every position of the input, which allows for translational equivariance under changes in the locations of input features. This is another important feature that sets CNN apart from the traditional multilayer perceptron model.
Pooling: A distinctive feature of CNN, pooling reduces the size of the feature maps in its layers and offers a degree of translational invariance to the features contained therein, which makes the network more robust to variations in their position.
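
To make the scaling drawback of MLPs and the benefit of shared weights concrete, here is a minimal sketch that counts the learnable parameters of one fully connected layer and one convolutional layer. The layer sizes (64 units, 64 filters of size 3x3, RGB input) are illustrative assumptions, not figures from any particular network.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases for a fully connected layer on a flattened image."""
    inputs = height * width * channels
    return inputs * units + units


def conv_params(kernel_size, in_channels, filters):
    """Weights plus biases for a convolutional layer with square kernels.

    The count does not depend on the image height or width, because each
    filter's weights are shared across every spatial position.
    """
    return kernel_size * kernel_size * in_channels * filters + filters


# Compare a small and a larger RGB image (sizes chosen for illustration).
for side in (32, 224):
    print(side, dense_params(side, side, 3, 64), conv_params(3, 3, 64))
```

The fully connected count grows with the number of pixels, while the convolutional count stays the same at every resolution.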

What are Convolutional Neural Networks Used For?

The popularity of convolutional neural networks has increased drastically in recent years due to their ability to eliminate the need for manual feature extraction as well as to produce highly accurate recognition results. Together, these make CNN useful for the following:

Classification: Describe and classify visual content.
Recognize: Help machines recognize objects within the scenery/environment.
Gather: Put together the recognized objects into a group or cluster.

Convolutional Neural Networks Architecture:

As stated earlier, the architecture of convolutional neural networks takes its inspiration from the visual cortex and is designed to mimic the connectivity patterns of the neurons in the human brain. Moreover, like the Recurrent Neural Network (RNN), CNN follows the basic architecture of neural networks: an input layer, hidden layers (here, a series of convolutional layers), and an output layer.

The objective of these layers is to perform operations that alter the data with the intent of learning data-specific features. What differentiates CNN's architecture from RNN's is that it is composed of several distinct types of layers, which together form an architecture well suited to image recognition as well as pattern recognition.

These layers of CNN are further discussed in detail below.

Layers in Convolutional Neural Networks:

The CNN architecture is made up of various distinct layers that each use a differentiable function to transform an input volume into an output volume. These layers enable convolutional networks to process more complex images than regular neural networks, as they help deliver a labeled output that can be easily classified and recognized.

These distinct layers include:

Input: This layer consists of the raw pixel values of the input image.
Convolutional Layers (CONV): The first and most important layer of a CNN, the convolutional layer extracts features from the input image. It consists of a set of learnable filters, known as kernels, which are convolved across the width and height of the input volume. At each position, the convolution operation computes the dot product between the entries of the filter and the region of the input it covers, giving one number that summarizes all the pixels the filter observed there. The numbers from all positions form that filter's activation map, and the activation maps of all filters are stacked along the depth dimension to form the output volume.

Convolutional layer further involves:
- Local Connectivity: With high-dimensional inputs such as images, connecting each neuron to all neurons in the previous volume is impractical, and such a design would not take the spatial structure of the data into account. CONV instead enforces a sparse local connectivity pattern that connects each neuron to only a small region of the input volume.
- Spatial Arrangement: The size of the output volume of the convolutional layer is controlled by three hyperparameters:
  - Depth: It controls the number of neurons in a layer that are connected to the same region of the input volume.
  - Stride: Controls how the depth columns are allocated around the spatial dimensions. E.g., if the stride is 2, we move the filters 2 pixels at a time.
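  - Zero Padding: Pads the border of the input volume with zeros, which provides control of the output volume's spatial size.
- Parameter Sharing: It is used in convolutional layers to control the number of free parameters, by having all neurons in the same depth slice use the same weights and bias.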
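
The convolution operation and the effect of stride and zero padding can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions (one filter, a single-channel square image, a square kernel), not how a production library implements it. The helper names `output_size` and `conv2d` are made up for this example. The output size follows the commonly used relation (W - F + 2P) / S + 1 for input size W, filter size F, padding P, and stride S.

```python
def output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size: (W - F + 2P) / S + 1."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input")
    return span // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide one square filter over a single-channel square image.

    As in most deep learning libraries, the kernel is not flipped, so
    strictly speaking this computes a cross-correlation.
    """
    size = len(image) + 2 * padding
    # Zero padding: surround the image with a border of zeros.
    padded = [[0] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            padded[r + padding][c + padding] = value

    k = len(kernel)
    out = output_size(len(image), k, stride, padding)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            top, left = i * stride, j * stride
            # Dot product between the filter and the patch it covers.
            row.append(sum(
                kernel[a][b] * padded[top + a][left + b]
                for a in range(k) for b in range(k)
            ))
        result.append(row)
    return result


image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 0, 1, 1],
    [1, 3, 0, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
print(conv2d(image, kernel))             # 3x3 activation map
print(conv2d(image, kernel, stride=2))   # 2x2 activation map
print(conv2d(image, kernel, padding=1))  # 5x5 activation map
```

The same four kernel weights are reused at every position, which is the parameter sharing described above.
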
Pooling Layers (POOL): A form of non-linear down-sampling, the pooling layer reduces the dimensions of the data by combining the outputs of a cluster of neurons at one layer into a single neuron in the next layer. Among the several functions used for pooling, the following two are most popular:
- Max Pooling: Uses the maximum value from each cluster of neurons at the previous layer.
- Average Pooling: Uses the average value from each cluster of neurons at the previous layer.
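
Both pooling variants can be sketched with one small function. This is a minimal example assuming non-overlapping square windows (window size equal to the stride), which is a common setup but not the only one. The function name `pool2d` and its `mode` argument are illustrative choices.

```python
def average(values):
    return sum(values) / len(values)


def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map over non-overlapping size x size windows."""
    if mode == "max":
        combine = max
    elif mode == "average":
        combine = average
    else:
        raise ValueError("mode must be 'max' or 'average'")

    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, size):
        pooled_row = []
        for left in range(0, cols - size + 1, size):
            # Collect the cluster of neurons covered by this window.
            window = [
                feature_map[top + a][left + b]
                for a in range(size) for b in range(size)
            ]
            pooled_row.append(combine(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 1, 7, 8],
]
print(pool2d(feature_map, mode="max"))      # [[4, 2], [2, 8]]
print(pool2d(feature_map, mode="average"))  # [[2.5, 1.0], [1.0, 6.5]]
```
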
Rectified Linear Unit Layer: Abbreviated as ReLU Layer, Rectified Linear Unit layer is used to apply the non-saturating activation function:

$f(x) = \max(0, x)$

By applying this function, the ReLU layer removes negative values from an activation map by setting them to zero. This increases the nonlinear properties of the decision function and of the overall network without affecting the receptive fields of the convolutional layers. Moreover, ReLU is often preferred over other activation functions because it trains the neural network several times faster, without a significant penalty to generalization accuracy.
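
Applied to an activation map, the function above works element by element. A minimal sketch (the helper name `relu_map` is made up for this example):

```python
def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)


def relu_map(activation_map):
    """Apply ReLU to every entry; the shape of the map is unchanged."""
    return [[relu(value) for value in row] for row in activation_map]


activation_map = [
    [-2.0, 0.5],
    [3.0, -0.1],
]
print(relu_map(activation_map))  # [[0.0, 0.5], [3.0, 0.0]]
```
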
Fully Connected Layers (FC): Similar to the traditional multi-layer perceptron neural network (MLP), the FC layer connects each neuron in one layer to every neuron in another layer and is responsible for the high-level reasoning in the neural network. Since its neurons are connected to all activations in the previous layer, the flattened matrix has to go through these fully connected layers for the images to be classified accurately.
Loss layer: Normally the final layer of the Convolutional Neural Network, the Loss Layer specifies how training penalizes the deviation between the predicted output and the true labels. Different loss functions are appropriate for different tasks.
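
As one example of such a loss function, the sketch below uses softmax cross-entropy, a common choice when each image belongs to exactly one of several classes. The article does not prescribe a specific loss, so treat this as an assumption for illustration. The scores are made-up values.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(scores, true_index):
    """Penalty that grows as the probability of the true class shrinks."""
    return -math.log(softmax(scores)[true_index])


scores = [2.0, 1.0, 0.1]         # hypothetical network outputs for 3 classes
print(cross_entropy(scores, 0))  # small loss: class 0 has the highest score
print(cross_entropy(scores, 2))  # larger loss: class 2 has the lowest score
```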

Now that we understand the basic structure and architecture of CNN, let’s move on to understanding how CNN actually works.

How Do Convolutional Neural Networks Work?

To perform accurate feature identification and classification, a CNN can have tens or hundreds of layers, each of which learns to detect different features of the input image/object.

Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input for the next layer. The filters start out detecting very simple features, such as brightness and edges, and increase in complexity in later layers to identify features that uniquely define the image/object. This process is followed until the final output is reached.

In short, each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity to perform precise image classification.
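
The one-sentence summary above can be sketched directly: a neuron is a dot product plus an optional non-linearity, and the outputs of one layer become the inputs of the next. The weights and inputs below are arbitrary illustrative numbers, and the helper names `neuron` and `layer` are made up for this example.

```python
def relu(x):
    return max(0.0, x)


def neuron(inputs, weights, bias=0.0, activation=None):
    """Dot product of inputs and weights, optionally followed by a non-linearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total) if activation else total


def layer(inputs, weight_rows, biases):
    """One neuron per row of weights, each followed by ReLU."""
    return [neuron(inputs, w, b, relu) for w, b in zip(weight_rows, biases)]


# The output of the first layer is the input of the second one.
hidden = layer([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
output = layer(hidden, [[2.0, 1.0]], [0.5])
print(hidden)  # [0.0, 1.5]
print(output)  # [2.0]
```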

Examples of Convolutional Neural Networks:

Have you ever wondered how recommendations work on e-commerce websites like Amazon or how object detection plays an integral role in creating better self-driving cars? Well, convolutional neural networks help make these possible. Let us take an example to further understand how this works:

Let’s assume that you have an image of a black panther and you want to identify whether it is really a black panther or any other animal or object. To reach an accurate conclusion using Convolutional Neural Network, you will need to:

Input Layer: Feed the image pixels in the form of arrays to the input layer or the convolutional layer of the neural network, which is used to classify the object in the image.
Hidden Layer: Multiple hidden layers carry out feature extraction through different calculations and manipulations.
Output Layer: Finally, the fully connected layer identifies the object in the image as a black panther.
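
The last step of this example can be sketched as a fully connected layer that scores each candidate label, followed by picking the label with the highest score. Everything concrete here is hypothetical: the tiny feature map, the hand-picked weights and biases (a real network would learn them during training), and the three labels.

```python
def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one flat list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def fully_connected(features, weights, biases):
    """One score per class: every feature is connected to every output."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def classify(feature_maps, weights, biases, labels):
    scores = fully_connected(flatten(feature_maps), weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores


# Hypothetical output of the hidden layers: one 2x2 feature map.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]]]
labels = ["black panther", "house cat", "dog"]
# Hand-picked weights and biases, for illustration only.
weights = [
    [1.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
biases = [0.0, 0.2, 0.1]

label, scores = classify(feature_maps, weights, biases, labels)
print(label, scores)
```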

Applications of Convolutional Neural Networks:

With the plethora of advantages associated with CNN, it is no surprise that this technique has become a primary choice for organizations in various sectors. From classification and feature extraction applications to industrial surface inspection, CNNs have been successfully applied to numerous applications in miscellaneous industries. A few of these include:

Healthcare: The healthcare industry is leveraging the capabilities of CNN to diagnose diseases like pneumonia, breast cancer, diabetes, etc. as well as for drug discovery through computer vision, AtomNet (introduced by Atomwise in 2015), and more.
Surveillance: Modern security systems are using CNN’s computer vision abilities to further enhance surveillance and make identification of theft, crime, or violence in video footage possible in real-time.
Advertising: An area that has been already revolutionized by CNN is advertising. Today, marketing experts and popular organizations like Amazon use programmatic buying and data-driven personalized advertising to better target users.
Time Series Forecasting: CNNs can perform comparably to, and in some cases better than, RNNs for time series forecasting, as convolutions can be implemented more efficiently than RNN-based solutions to learn time series dependencies. They also do not suffer from the vanishing gradients over long sequences that are a major problem for RNNs.
Agriculture: To determine the health and viability of their crops, the agricultural industry now relies on hyperspectral or multispectral sensors powered by CNN to take images of crops and analyze them with computer vision.
Computer Games: From checkers to AlphaGo, computer games use CNN for a variety of functions such as for choosing moves, evaluating positions, identifying the location and type of pieces, understanding the difference in the number of pieces between the two sides, among other things.

Advantages of Convolutional Neural Networks:

From enabling machines and systems to perform accurate visual assessment to helping them produce state-of-the-art recognition results, convolutional neural networks are a highly beneficial type of artificial neural network, frequently used along with their counterpart, Recurrent Neural Networks, to deliver appropriate solutions to problems. That’s not all!

It offers various other advantages like:

Follows the concept of parameter sharing.
Captures the spatial features from an image.
Helps minimize and simplify computation, without losing the essence of the data.
Suitable for handling image classification, as it uses the same knowledge across all image locations.
Requires less pre-processing compared to other image classification algorithms.
The network can process the standard MNIST dataset, containing images of handwritten digits.

Disadvantages of Convolutional Neural Networks:

As one of the most favored neural network types, convolutional neural networks are a strong option for various visual assessment problems. However, there are certain disadvantages associated with this neural network that make it difficult for some to use, like:

It does not encode the position and orientation of the objects.
CNNs require a large dataset to process and train the neural network.

Conclusion:

Central to deep learning and machine learning, Convolutional Neural Networks have reached a new level of sophistication and importance in the past few years. From being a basic technique, they have advanced to a technology that is now responsible for advances in a range of applications, the implementation of computer vision, effortless identification of images and videos, and more. Today, be it analyzing security footage or enabling the automation of vehicles and machines, this network underlies many futuristic applications and technologies.