An Introduction to Neural Networks
Neural networks, also known as Artificial Neural Networks (ANNs), are computing systems that use mathematical models to process information. They are modeled on the way neurons and synapses function in the human brain. Like the brain, a neural network connects simple nodes, also known as neurons or units, and a collection of such nodes forms a network, hence the name "neural network."
In a neural network, a set of algorithms is used to recognize relationships in data sets. Neural networks are designed to adapt to changing input, so the network can produce the best possible result without the output criteria having to be redesigned.
Neural networks are used in a variety of technologies and applications, such as video games, computer vision, speech recognition, social network filtering, board games, machine translation, and medical diagnosis. Neural networks are also being used for activities traditionally regarded as creative human pursuits, such as painting and art.
Components of a Neural Network
At this point, it is important to know and understand what constitutes a neural network and its components.
1. Neurons
Neural Networks consist of artificial neurons, which are conceptually similar to biological neurons. Each neuron receives input data and combines it with its internal activation state and an optional threshold using an activation function. It then produces its output using an output function.
The initial inputs are data from external sources, such as voice files, images, and documents. The final outputs accomplish the task, such as recognizing a voice input, an object in an image, or text. The significance of the activation function is that it provides a smooth, differentiable transition as input values change, so a small change in input produces a small change in output.
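To make the point about smooth transitions concrete, here is a minimal sketch that compares a hard threshold with a sigmoid. The sigmoid is only one common choice of differentiable activation and is an assumption here, as are the function names and sample values.

```python
import math

def step(z, threshold=0.0):
    """Hard threshold: the output jumps from 0 to 1 at the threshold."""
    return 1.0 if z >= threshold else 0.0

def sigmoid(z):
    """Smooth, differentiable activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Nudge the input slightly across zero and compare how far each output moves.
before, after = -0.01, 0.01
step_jump = step(after) - step(before)
sigmoid_shift = sigmoid(after) - sigmoid(before)

print("step output changed by", step_jump)
print("sigmoid output changed by", sigmoid_shift)
```

The threshold function flips its output completely for a tiny change in input, while the sigmoid output moves only slightly, which is the property that gradient-based learning relies on.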
2. Connections and Weights
A neural network consists of connections, each of which carries the output of one neuron as an input to another neuron. Each connection is assigned a weight that represents its relative importance in the network. A given neuron can have multiple input and output connections.
3. Organization
The neurons are typically organized into multiple layers, especially in Deep Learning. Neurons of one layer connect only to neurons of the immediately preceding and immediately following layers. The layer that receives external data is the input layer, and the layer that delivers the final result is the output layer. Between them there can be zero or more hidden layers. Single-layer and unlayered networks are also possible.
Multiple connection patterns are possible between two layers. They can be fully connected, where every neuron in one layer connects to every neuron in the next layer. They can also be pooling, where a group of neurons in one layer connects to a single neuron in the next layer, thereby reducing the number of neurons in that layer. Networks with only such forward connections form a Directed Acyclic Graph and are known as Feedforward Networks. Networks that permit connections between neurons in the same or previous layers are called Recurrent Networks. We shall see the types of Neural Networks later in this article.
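As an illustration of the layered, fully connected arrangement described above, the following sketch passes an input through one hidden layer and one output layer. The layer sizes, weight values, and the tanh activation are all hypothetical choices made for the example.

```python
import math

def dense_layer(inputs, weights, biases):
    """Fully connected layer: every input feeds every neuron in this layer.

    weights[j][i] is the weight on the connection from input i to neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(z))  # tanh is used only for illustration
    return outputs

# Hypothetical network: 3 inputs, one hidden layer of 2 neurons, 1 output neuron.
hidden_w = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
hidden_b = [0.0, 0.1]
output_w = [[0.6, -0.8]]
output_b = [0.05]

x = [1.0, 0.5, -1.0]
hidden = dense_layer(x, hidden_w, hidden_b)   # input layer -> hidden layer
y = dense_layer(hidden, output_w, output_b)   # hidden layer -> output layer
print(hidden, y)
```

Each layer only reads the outputs of the layer immediately before it, so the data flow has no cycles.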
4. Hyperparameter
A hyperparameter is a constant parameter whose value is set before the learning process begins. The values of the other parameters are derived through learning. Examples of hyperparameters include the learning rate, the number of hidden layers, and the batch size. The values of some hyperparameters can depend on those of other hyperparameters. For example, the overall number of layers can depend on the size of some layers.
5. Learning
Learning is the process by which the network adapts itself to handle a task better by considering sample observations. Learning involves adjusting the weights, and optionally the thresholds, of the network to obtain more accurate results. This is done by minimizing the observed errors. Learning is complete when examining additional observations no longer reduces the error rate. Even after learning, the error rate typically does not reach "0". If the error rate is still too high after learning, the network needs to be redesigned.
In practice, a cost function is defined and evaluated periodically during learning. Learning continues as long as the cost continues to decline. The cost is frequently defined as a statistic whose value can only be approximated. Consider a network learning to recognize the word 'Dog.' The outputs are numbers, so when the error is low, the difference between the output (almost certainly a dog) and the correct answer (dog) is minuscule. Learning attempts to reduce the total of these differences across the observations. Most learning models use optimization theory and statistical estimation.
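A short sketch can show what "the cost declines" means in numbers. Mean squared error is assumed here purely as an example cost; the prediction values and the 'dog' encoding (1.0 for dog, 0.0 for not dog) are made up for illustration.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between outputs and correct answers."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical targets: 1.0 means "dog", 0.0 means "not dog".
targets = [1.0, 1.0, 0.0]

# Hypothetical network outputs early and late in training.
early_outputs = [0.6, 0.4, 0.5]
late_outputs = [0.95, 0.9, 0.1]

cost_early = mean_squared_error(early_outputs, targets)
cost_late = mean_squared_error(late_outputs, targets)
print(cost_early, cost_late)
```

As the outputs move closer to the correct answers, the aggregate difference shrinks, and that single number is what the learning process tries to push down.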
6. Learning Rate
The learning rate defines the size of the corrective steps that the model takes to adjust for errors in each observation. A high learning rate shortens the training time, but the result can be less accurate. A lower learning rate takes longer but has the potential to deliver greater accuracy. Optimization techniques such as Quickprop are primarily aimed at speeding up error minimization, while other improvements mainly try to increase reliability.
To avoid oscillation inside the network, such as alternating connection weights, and to improve the rate of convergence so that the error settles toward its minimum, adaptive learning rates are used that increase or decrease as required. The concept of 'Momentum' allows the balance between the gradient and the previous change to be weighted, so that the weight adjustment depends to some degree on the previous change. A momentum close to '0' emphasizes the gradient, while a value close to '1' emphasizes the last change.
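The weighting between the gradient and the previous change can be sketched as follows. This is one possible formulation, chosen so that a momentum of 0 uses only the gradient step and a momentum near 1 mostly repeats the last change; the toy cost (w - 3)^2, the function names, and the parameter values are assumptions for the example.

```python
def gradient(w):
    """Derivative of the toy cost (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def train(momentum, learning_rate=0.1, steps=200):
    """Gradient descent where each change blends the gradient step with the last change."""
    w = 0.0
    previous_change = 0.0
    for _ in range(steps):
        gradient_step = -learning_rate * gradient(w)
        # momentum = 0 -> only the gradient; momentum near 1 -> mostly the last change
        change = (1.0 - momentum) * gradient_step + momentum * previous_change
        w += change
        previous_change = change
    return w

w_plain = train(momentum=0.0)
w_momentum = train(momentum=0.9)
print(w_plain, w_momentum)
```

Both runs end close to the minimum at 3; the momentum setting only changes how much each update leans on the previous one.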
7. Cost Function
In simple terms, a cost function measures the performance of a neural network model. It depends on variables such as the weights and biases. A cost function is a single value, not a vector, and it measures how well the neural network performs as a whole. It is possible to define an ad hoc cost function. Still, in most situations, the choice of cost function is determined by the function's desirable properties, such as convexity, or because it arises from the model. In a probabilistic model, for example, the posterior probability can be used as an inverse cost.
8. Propagation Function
The propagation function computes the input to a neuron as the weighted sum of the outputs of its predecessor neurons, using the weights of their connections. A bias term can be added to the result of the propagation function.
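A minimal sketch of this computation, with hypothetical names and values:

```python
def propagate(predecessor_outputs, connection_weights, bias=0.0):
    """Weighted sum of predecessor outputs, plus an optional bias term."""
    weighted_sum = sum(o * w for o, w in zip(predecessor_outputs, connection_weights))
    return weighted_sum + bias

# Three predecessor neurons feeding one neuron (values are made up).
net_input = propagate([0.5, 1.0, -1.0], [0.4, 0.2, 0.1], bias=0.3)
print(net_input)
```

The result is the value that the neuron's activation function would then be applied to.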
9. Backpropagation
Backpropagation is a method for adjusting the connection weights to compensate for each error found during learning. The error is effectively divided among the connections according to how much each contributed to it. Technically, backpropagation calculates the gradient, that is, the derivative of the cost function associated with a given state with respect to the weights. The weights are then typically updated using stochastic gradient descent or other methods.
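The gradient calculation and weight update can be sketched for the smallest possible case, a single sigmoid neuron with one weight and one bias. This is an illustrative simplification, not a full multi-layer implementation: the squared-error cost, the two training samples, and the fixed visiting order (true stochastic gradient descent would shuffle the samples) are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_cost(samples, w, b):
    """Sum of squared-error costs 0.5 * (output - target)^2 over all samples."""
    return sum(0.5 * (sigmoid(w * x + b) - y) ** 2 for x, y in samples)

def train(samples, learning_rate=0.5, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:              # one update per observation
            a = sigmoid(w * x + b)        # forward pass
            # Chain rule: dC/dz = dC/da * da/dz = (a - y) * a * (1 - a)
            delta = (a - y) * a * (1.0 - a)
            w -= learning_rate * delta * x   # dC/dw = delta * x
            b -= learning_rate * delta       # dC/db = delta
    return w, b

samples = [(0.0, 0.0), (1.0, 1.0)]        # hypothetical (input, target) pairs
cost_before = total_cost(samples, 0.0, 0.0)
w, b = train(samples)
cost_after = total_cost(samples, w, b)
print(cost_before, cost_after)
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back toward the input, which is where the name comes from.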
Types Of Neural Networks
1. Feed-forward Neural Network
This is the simplest type of neural network. Feed-forward neural networks are fast to use once trained; training them, however, is comparatively slow. Many vision and speech recognition applications use some form of feed-forward neural network.
Feed-forward networks are non-linear. They are called feed-forward because data flows in one direction only, from input to output. Such a network can be described as a composition of functions and depicted as a graph showing how the functions are chained together. For example, with three functions, f(1) is the input layer, f(2) is the second layer, and f(3) is the output layer. Information is passed from the input layer to the next layer, where the computation takes place, and the result is passed on to the output layer.
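The three-function example can be written as a literal composition. The bodies of f1, f2, and f3 below are hypothetical stand-ins (an identity input layer, a ReLU hidden layer with made-up weights, and a summing output layer); only the chaining pattern is the point.

```python
def f1(x):
    """Input layer: passes the raw values on unchanged."""
    return list(x)

def f2(values):
    """Second layer: a hypothetical non-linear computation (ReLU of weighted sums)."""
    weights = [[1.0, -1.0], [0.5, 0.5]]
    return [max(0.0, sum(w * v for w, v in zip(row, values))) for row in weights]

def f3(values):
    """Output layer: combines the second layer's values into a single number."""
    return sum(values)

def network(x):
    # Data flows strictly forward: f1, then f2, then f3.
    return f3(f2(f1(x)))

print(network([2.0, 1.0]))
```

No function ever feeds its result back to an earlier one, which is what distinguishes this arrangement from a recurrent network.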
2. Radial Basis Function (RBF) Neural Network
In this type of neural network, the data is grouped based on its distance from a center point. In situations where there is no training data, the data is grouped, and a center point is created. This network is designed to look for data points that are similar to each other and then group the data. An example application of this type of neural network is Power Restoration Systems.
A Radial Basis Function (RBF) neural network has three layers: an input layer, a hidden layer, and an output layer. The hidden layer is non-linear, and the output layer is linear. Applications of RBF networks include image processing, speech recognition, and medical diagnosis.
RBF Networks: The Three Layers in Detail
1. Input Layer
There is one neuron in the input layer for each predictor variable. For categorical variables, N-1 neurons are used, where N is the number of categories. The input neurons standardize the range of the values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden layer.
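A sketch of the two input-layer steps, under stated assumptions: the quartiles come from Python's `statistics.quantiles` with its default method (other quartile conventions give slightly different ranges), and the N-1 encoding treats the first category as the all-zeros reference, which is a choice made for this example.

```python
import statistics

def robust_standardize(values):
    """Subtract the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default quartile convention
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

def encode_category(value, categories):
    """N-1 encoding: the first category is the reference and maps to all zeros."""
    return [1.0 if value == c else 0.0 for c in categories[1:]]

scaled = robust_standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
colors = ["red", "green", "blue"]          # hypothetical categorical variable
print(scaled)
print(encode_category("green", colors))    # two neurons for three categories
```

Using the median and interquartile range rather than the mean and standard deviation makes the scaling less sensitive to outliers.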
2. Hidden Layer
This layer has a variable number of neurons, and the training process determines the exact number. Each neuron consists of a radial basis function centered on a point that has as many dimensions as there are predictor variables. The spread, or radius, of the RBF function may be different for each dimension. The training process determines the centers and spreads. When presented with a test case, a hidden neuron computes the Euclidean distance of the case from the neuron's center point and then applies the RBF kernel function to this distance using the spread values. The resulting value is passed to the summation layer.
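The hidden neuron's computation can be sketched as below. A Gaussian kernel is assumed because it is a common choice; the text does not name a specific kernel. For brevity the sketch uses a single spread for all dimensions, although the spread may differ per dimension as noted above.

```python
import math

def rbf_neuron(case, center, spread):
    """Gaussian RBF applied to the Euclidean distance between case and center."""
    distance = math.dist(case, center)
    return math.exp(-(distance ** 2) / (2.0 * spread ** 2))

center = [0.0, 0.0]                 # hypothetical center found during training
near = rbf_neuron([0.1, 0.1], center, spread=1.0)
far = rbf_neuron([3.0, 4.0], center, spread=1.0)
print(near, far)
```

The neuron responds strongly to cases near its center and its response falls toward zero as the distance grows, with the spread controlling how quickly.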
3. Summation Layer
The value coming out of a neuron in the hidden layer is multiplied by a weight associated with that neuron and passed to the summation function, which adds up the weighted values and presents the sum as the output of the network. For classification problems, there is one output, with its own set of weights and summation unit, for each target category. The value output for a category is the probability that the case being evaluated belongs to that category.
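A minimal sketch of the summation step for a two-category classification, using made-up hidden-layer outputs and weights. The raw sums here are not normalized, so treating them as probabilities would require weights trained for that purpose or an extra normalization step, neither of which is shown.

```python
def summation_layer(hidden_outputs, weights_per_category):
    """One weighted sum per target category, each with its own set of weights."""
    return {
        category: sum(w * h for w, h in zip(weights, hidden_outputs))
        for category, weights in weights_per_category.items()
    }

hidden_outputs = [0.9, 0.2, 0.1]    # hypothetical values from three hidden neurons
weights_per_category = {
    "dog": [0.8, 0.1, 0.1],
    "cat": [0.1, 0.6, 0.3],
}

scores = summation_layer(hidden_outputs, weights_per_category)
predicted = max(scores, key=scores.get)
print(scores, predicted)
```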
RBF networks are similar to K-Means clustering and to PNN and GRNN networks. The main differences are:
- PNN/GRNN networks have one neuron for each point in the training file, whereas RBF networks have a variable number of neurons that is generally smaller than the number of training points.
- For small to medium-sized training sets, PNN or GRNN networks are generally more accurate than RBF networks. The downside is that PNN or GRNN networks are impractical for large training sets.
3. Kohonen Self-organizing Neural Network
According to Scholarpedia, "Kohonen Network, which is also called Self-Organizing Map (SOM), is used for the visualization and analysis of high-dimensional data, specifically experimentally acquired information. It is a computational method where it defines an ordered mapping and projects on to a regular two-dimensional grid from a set of given data points.
The SOM was primarily developed for visualization of distributions of metric vectors, like ordered sets of measurement values and statistical attributes. It can practically be shown that the mutual pairwise distances of data can be defined by utilizing a SOM-type mapping for any data set or items. SOM computational methods can be applied to non-vectorial data sets as well, such as strings of symbols and sequences of segments in organic molecules."
4. Recurrent Neural Network
In a Recurrent Neural Network (RNN), the output of the previous step is given as input to the current step. In conventional neural networks, all inputs and outputs are independent of each other. However, predicting the next word of a sentence requires the previous words, so the network needs a way to remember them. The RNN was designed to solve this problem of remembering previous inputs with the help of a hidden layer.
The most important feature of an RNN is its hidden state, which remembers information about a sequence. It acts as a "memory" that retains information about what has been calculated in the previous steps. An RNN uses the same parameters for every input, since it performs the same task on each input and hidden state to produce the output. This reduces the number of parameters compared with other neural networks. RNNs have a wide variety of applications, one of which is TTS (text-to-speech) synthesis.
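The role of the hidden state and the shared parameters can be sketched with a scalar recurrence. The update rule h = tanh(w_input * x + w_hidden * h_previous + bias) is a standard simple form assumed here; the parameter names and values are hypothetical.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias):
    """Run a one-unit RNN over a sequence, reusing the same parameters at every step."""
    hidden = 0.0                      # initial hidden state: nothing remembered yet
    states = []
    for x in sequence:
        # The new state depends on the current input and the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# A single non-zero input followed by zeros: later states still reflect it.
states = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
print(states)

# The same three parameters handle a sequence of any length.
longer = rnn_forward([0.2] * 10, w_input=1.0, w_hidden=0.5, bias=0.0)
print(len(longer))
```

The hidden state carries a fading trace of the first input through the later steps, which is the "memory" described above.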
Advantages of Recurrent Neural Network
- An RNN retains information from earlier steps in a sequence, which makes it very useful for time series prediction, where previous inputs matter.
- Inputs of any length can be processed in this model.
Disadvantages of Recurrent Neural Network
- Exploding and vanishing gradients are common in this model.
- Training an RNN is quite a challenging task.
- It struggles to process very long sequences when 'tanh' or 'relu' is used as the activation function.
5. Convolutional Neural Network
The Convolutional Neural Network (CNN or ConvNet) is one of the best-known deep learning algorithms. A CNN learns to perform tasks directly from images, video, text, or sound. CNNs find patterns in images to recognize objects, faces, and scenes. They learn directly from image data, using patterns to classify images and eliminating the need for manual feature extraction.
CNNs are powerful in applications that require object recognition and computer vision, such as self-driving vehicles and face recognition. Depending on the application, one can build a CNN from scratch or use a pre-trained model with the data set. CNNs are well suited to image recognition and pattern detection. Advances in GPUs and parallel computing have made CNNs capable of delivering high-quality results in automated driving and facial recognition. For example, a CNN can learn to tell the difference between a traffic signal and a pedestrian.
Why are CNNs useful?
- CNNs eliminate the need for manual feature extraction.
- CNNs deliver very high-quality recognition results.
- CNNs can be built on pre-existing networks, which makes retraining for new recognition tasks possible.
- Input features can be handled in batches, which permits the network to process an image in several parts.
6. Modular Neural Network (MNN)
In a Modular neural network, the result is produced collectively by several independent networks. Each of these independent networks receives its own set of inputs, distinct from those of the others, and performs its own sub-task.
Modularity reduces the complexity of the problem to be solved, since the modular networks break the computation down into small components. Computation speed also improves, because splitting the network reduces the number of connections and the networks do not need to interact with one another. The total processing time still depends on the number of neurons involved in the computation.
Modular Neural Networks are regarded as one of the fastest-growing areas in Artificial Intelligence.
As we have seen, there are several types of Neural Networks, and each is applied to different requirements to achieve the desired outcome. The significant thing about neural networks is that they are designed with the workings of the neurons in the brain in mind. So, what can be expected is that these networks will learn and improve with more data and use. Traditional Machine Learning (ML) algorithms tend to stagnate after a point, whereas neural networks can continue to improve in performance with more data and usage.
That is one of the reasons why many industry experts believe that neural networks will be the basic framework on which next-generation Artificial Intelligence is built. By now, you should have a good understanding of the concept of Neural Networks and their types.