Hello learners! Welcome to the next episode of Neural Networks. Today, we are learning about a neural network architecture named Vision Transformer, or ViT. It is specially designed for image classification. Neural networks have been the trending topic in deep learning in the last decade and it seems that the studies and application of these networks are going to continue because they are now used even in daily life. The role of neural network architecture in this regard is important.
In this session, we will start our study with the introduction of the Vision Transformer. We’ll see how it works and for this, we’ll see the step-by-step introduction of each point about the vision transformer. After that, we’ll move towards the difference between ViT and CNN and in the end, we’ll discuss the applications of vision transformers. If you want to know all of these then let’s start reading.
The vision transformer is a type of neural network architecture that is designed for the field of image recognition. It is the latest achievement in deep learning and it has revolutionized image processing and recognition. This architecture has challenged the dominance of convolutional neural networks (CNN), which is a great success because we know that CNN has been the standard in image recognition systems.
The ViT works in the following way:
It divides the image into patches of a fixed size
Each patch is linearly embedded
Position embeddings are added to the patch embeddings
The resulting sequence of vectors is fed into a standard transformer encoder
We will talk more about how it works, but first, let’s look at how ViT was introduced to understand its importance in image recognition.
The vision transformer was introduced in 2020 in a paper titled “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” The paper was written by a team of researchers, including Alexey Dosovitskiy, Lucas Beyer, and Alexander Kolesnikov, and was presented at the International Conference on Learning Representations (ICLR) in 2021. The paper covers several key concepts, including:
Image Tokenization
Transformer Encoder for Images
Positional Embeddings
Scalability
Comparison with CNNs
Pre-training and Fine-tuning
Some of these features will be discussed in this article.
The vision transformer is one of the latest architectures but it has dominated other techniques because of its remarkable performance. Here are some features that make it unique, among others:
ViT uses the transformer architecture to do its work. The transformer architecture is based on the self-attention mechanism; therefore, it can capture relationships between the different parts of the input sequence. ViT first divides the image into patches, and the transformer architecture then relates the information from the different patches of the image.
This is an important feature of ViT that allows it to extract and represent global information effectively. This information is extracted from the patches made during the implementation of ViT.
The classification token is considered a placeholder in the whole sequence created through the patch embeddings. The main purpose of the classification token is to act as the central point of all the patches. Here, the information from these patches is connected in the form of a single vector of the image.
The classification token takes part in the self-attention mechanism in the transformer encoder. This is the point where each patch interacts with the classification token and, as a result, the token gathers information about the whole image.
After the encoder layers, the classification token holds the final representation of the image.
The vision transformer architecture can be trained on large datasets, which makes it more useful and efficient. The ViT is pre-trained on large datasets such as ImageNet, which helps it learn the general features of images. Once pre-training is complete, it is fine-tuned on a smaller dataset to get it working on the targeted domain.
One of the best features of ViT is its scalability, which makes it a perfect choice for image recognition. When the resolution of the images increases during the training process, the architecture does not change. The ViT has the working mechanisms to work in such scenarios. This makes it possible to work on high-resolution images and provide fine-grained information about them.
Now that we know the basic terms and working style of vision transformers, we can move forward with the step-by-step process of how the vision transformer architecture works. Here are these steps:
The first step in the vision transformer is to get the input image and divide it into non-overlapping patches of a fixed size. This is called image tokenization and here, each patch is called a token. When reconnected together, these patches can create the original input image. This step provides the basis for the next steps.
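The tokenization step above can be sketched with NumPy. This is only a minimal illustration: the function name `image_to_patches`, the 32x32 image, and the patch size of 16 are assumptions chosen for the example, not values taken from any particular ViT implementation.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Hypothetical 32x32 RGB image filled with increasing numbers
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
patches = image_to_patches(image, patch_size=16)
print(patches.shape)  # (4, 768): 4 tokens, each holding 16 * 16 * 3 values
```

Each row of `patches` is one token, and putting the rows back in grid order reproduces the original image.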
Until now, the information in the ViT has been in pictorial format. Now, each patch is flattened and linearly projected into a vector (its embedding) to convert the information into a transformer-compatible format. This helps with smooth and effective working.
The next step is to assign the patches all spatial information and for this, positional embeddings are required. These are added to the token embeddings and help the model understand the position of all the patches of images.
These embeddings are an important part of ViT because, in this case, the spatial relationship among the image pixels is not inherently present. This step allows the model to understand the detailed information in the input.
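The embedding steps above can be sketched as follows. This is a rough illustration under stated assumptions: the sizes are made up, and the weights, classification token, and positional embeddings are random placeholders, whereas in a trained model they are learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 4 patches of 768 values, embedded into 64 dimensions
num_patches, patch_dim, embed_dim = 4, 768, 64
patches = rng.normal(size=(num_patches, patch_dim))

# Learned parameters in a real model; random placeholders here
W_embed = rng.normal(scale=0.02, size=(patch_dim, embed_dim))
b_embed = np.zeros(embed_dim)
cls_token = rng.normal(scale=0.02, size=(1, embed_dim))
pos_embed = rng.normal(scale=0.02, size=(num_patches + 1, embed_dim))

tokens = patches @ W_embed + b_embed          # linear patch embedding
tokens = np.concatenate([cls_token, tokens])  # prepend the classification token
tokens = tokens + pos_embed                   # add positional information
print(tokens.shape)  # (5, 64)
```

The sequence has one more entry than the number of patches because the classification token occupies position 0.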
Once the above steps are complete, the tokenized and embedded image patches are then passed to the transformer encoder for processing. It consists of multiple layers and each of them has a self-attention mechanism and feed forward neural network.
Here, the self-attention mechanism is able to capture the relationship between the different parts of the input. As a result, it takes the following features into consideration:
The global context of the image
Long dependencies of the image
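To make the self-attention idea more concrete, here is a minimal single-head sketch in NumPy. Real transformer encoders use several heads, layer normalization, residual connections, and a feed-forward network; those are left out here, and the weight matrices are random placeholders rather than trained values.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a (sequence, dim) array."""
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every token pair
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 64))            # e.g. 1 classification token + 4 patches
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(64, 64)) for _ in range(3))
out, weights = self_attention(tokens, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (5, 64) (5, 5)
```

Because `weights` has one entry for every pair of tokens, every patch can draw on every other patch in a single layer, which is where the global context comes from.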
As we have discussed before, the classification token has information on all the patches. It is a central point that gets information from all other parts and it represents the entire image. This representation is fed into a linear classifier (the classification head) to get the class labels. At the end of this step, the information from all the parts of the image is present for further action.
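The classification step can be sketched as below. The encoder output, the head weights, and the choice of 10 classes are all hypothetical placeholders; the sketch also assumes the classification token sits at index 0 of the sequence.

```python
import numpy as np

def classify_from_cls(encoded_tokens, W_head, b_head):
    """Turn the classification token into class probabilities."""
    cls_vector = encoded_tokens[0]         # assumption: classification token is at index 0
    logits = cls_vector @ W_head + b_head  # linear classifier
    exp = np.exp(logits - logits.max())    # softmax, shifted for stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
encoded_tokens = rng.normal(size=(5, 64))      # stand-in for the encoder output
W_head = rng.normal(scale=0.1, size=(64, 10))  # 10 hypothetical classes
b_head = np.zeros(10)

probs = classify_from_cls(encoded_tokens, W_head, b_head)
predicted_class = int(np.argmax(probs))
```

Only the first row of the encoder output is used; the patch rows influence the prediction indirectly, through the attention layers that came before.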
The vision transformers are pre-trained on large data sets, which not only makes the training process easy but also more efficient. Here are two phases of training for ViT:
The pre-training process is where large datasets are used. Here, the model learns the basic features of the images.
The fine-tuning process, in which a smaller, task-related dataset is used to train the model on the specific features.
This step also involves the self-attention mechanism. Here, the model is now able to get all the information about the relationship among the token pairs of the images. In this way, it better captures the long dependencies and gets information about the global context.
All these steps are important in the process and the training process is incomplete without any of them.
The importance and features of the vision transformer can be understood by comparing it with the convolutional neural network. CNNs are one of the most effective and useful neural networks for image recognition and related tasks, but since the introduction of the vision transformer, they are no longer the default choice for every task. Here are the key differences between these two:
The core difference between ViT and CNN is the way they adopt feature extraction. The ViT utilizes the self-attention mechanism for feature extraction. This helps it identify long-range dependencies. Here, the relationship between the patches is understood more efficiently and information on the global context is also known in a better way.
In CNN, feature extraction is done with the help of convolutional filters. These filters are applied to the small overlapping regions of the images and local features are successfully extracted. All the local textures and patterns are obtained in this way.
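For contrast with self-attention, the local filtering done by a CNN can be sketched with a hand-written filter. This is only an illustration: the 5x5 image and the vertical-edge filter are made up, and, as in most deep learning frameworks, the operation is technically a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small filter over a 2-D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on a small local window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                              # dark left side, bright right side
edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to vertical edges
response = conv2d_valid(image, edge_filter)
print(response)
```

The response is large where the window covers the edge and zero where the window sees a uniform region, which is the local texture and pattern extraction described above.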
The ViT uses a transformer-based architecture, which is similar to natural language processing. As mentioned before, the ViT has the following:
Encoder with multiple self-attention layers and a final classifier head. These multiple layers allow the ViT to provide better performance.
CNN uses a feed-forward architecture and the main components of the networks are:
Convolutional layers
Pooling layers
Activation functions
Both of these have some important points that must be kept in mind when choosing them. Here are the positive points of both of these:
The ViT has the following features that make it useful:
ViT can handle global context effectively
It is less sensitive to image size and resolution
It is efficient for parallel processing, making it fast
CNN, on the other hand, has some features that ViT lacks, such as:
It learns local features efficiently
Its filters are explicit, which makes it more interpretable
It is well-established and computationally efficient
So these were the basic differences. The following table will allow you to compare both of these side by side:
Feature | Convolutional Neural Network | Vision Transformer
Feature Extraction | Convolutional filters | Self-attention mechanism
Architecture | Feedforward | Transformer-based
Strengths | Local features; interpretability; computational efficiency | Global context; less sensitive to image size; parallel processing
Weaknesses | Long-range dependencies; image size and resolution; filter design | More computational resources; interpretability; small images
Applications | Image classification; object detection; image recognition; video recognition; medical imaging | Image classification; object detection; image segmentation
Current Trends | N/A | Increasing popularity; ViT and CNN combinations; interpretability and efficiency improvements
The introduction of the ViT is not old and it has already been implemented in different fields. Here is the overview of some applications of the ViT where it is currently used:
The most common and prominent use of ViT is in image classification. It has delivered remarkable performance on datasets like ImageNet and CIFAR-100, classifying images into their categories with high accuracy.
The pre-training process of the vision transformer has allowed it to perform object detection in the images. This network is trained specially to detect objects from large datasets. It does it with the help of an additional detection head that makes it able to predict bounding boxes and confidence scores for the required objects from the images.
Images can be segmented into different regions using the vision transformer. It provides pixel-level predictions, which allow it to make decisions in great detail. This makes it suitable for applications such as medical imaging and autonomous driving.
The vision transformer is used for the generation of realistic images using the existing data sets. This is useful for applications such as image editing, content creation, artistic exploration, etc.
Hence, we have read a lot about the vision transformer neural network architecture. We have started with the basic introduction, where we see the core concepts and the flow of the vision transformer’s work. After that, we saw the details of the steps that are used in ViT and then we compared it with CNN to understand why it is considered better than CNN in many aspects. In the end, we have seen the applications of ViT to understand its scope. I hope you liked the content and if you are confused at any point, you can ask in the comment section.
Hello pupils! Welcome to the next session of the neural network series. I hope you are doing good. In the previous part of this series, I showed the double deep Q networks and discussed their differences from the deep Q network to make things clear. Today, I am going to visit a very popular neural network with you. This is the spiking neural network that mimics the functionality of the biological neurons with the help of spikes. This is a different neural network than the traditional networks and you will see the details of each point.
In this lecture, we’ll understand the introduction of the spiking neural network. We’ll discuss all the basic terms that are used while studying the SNN. After that, we’ll move on to the steps of using SNN in detail. In the end, we’ll move towards the applications of the SNN and understand how its similar structure to the brain helps to improve different applications.
The spiking neural networks (SNN) show a unique and inspiring neural network approach that is a perfect combination of deep learning neural networks, biological structure, and computational neuroscience. For their performance, the SNN uses spikes or pulses of electrical conductivity to communicate the information from one place to another. It is defined as:
"The spiking neural networks (SNN) are deep learning artificial neural networks that are inspired by biological structure and mechanisms and work with the help of discrete and precisely designed events known as spikes."
Traditional neural networks represent activations as continuous values, which are smooth and easy to implement. An SNN, in contrast, communicates through discrete spikes.
The last decade has witnessed the seamless applications and features of artificial neural networks. But the history of these networks is older than this. The spiking neural networks can be traced back to the early neural networks. Here are some important highlights of the introduction and growth of SNN:
In 1952, Alan Hodgkin and Andrew Huxley published their research on the action potential of the squid giant axon. This helped others understand the biophysical basis of neural signalling and laid the foundation for the idea of spiking.
Earlier, in 1943, Warren McCulloch and Walter Pitts had presented the McCulloch-Pitts neuron, which is the first mathematical neuron model. This model is the foundation of early artificial neural networks. It uses binary activation values.
In the late 1950s, Frank Rosenblatt developed the perceptron. It is a single-layer artificial neural network that is able to perform simple and basic tasks. It was well received at first, but it was later criticized because it was useful only for a narrow range of problems.
In 1960, Bernard Widrow and Ted Hoff presented the Adaptive Linear Neuron (ADALINE). It is also a single-layer neural network but it works on continuously valued activation functions. Other people worked more on its improvements and as a result, better networks and outputs were seen during this time.
In the 2000s, research was performed on the neurons and this gave rise to mimicking structure in SNN. It resulted in the interest of other scientists in these techniques and the work on the spikings was boosted. This was the time when new algorithms and techniques were introduced for the SNN, and the improved performance not only showed more interest among the people but also broadened the domains of the SNN.
Currently, SNN is being used in different fields such as robotics, healthcare, artificial intelligence, etc. You will see the details of applications at the end of this article.
It's better to understand the basic concepts to understand the working principles and applications of SNN. These are the terms often used when dealing with spiking neural networks:
The spikes are the fundamental unit of communication in the spiking neural networks. These are also known as action potentials and are the brief pulses of electrical activity.
A spike is a sudden, rapid, and transient change that represents the output of the neuron.
Spikes are produced when neurons fire and are responsible for carrying signals between the neurons of the whole network.
The SNN relies on the spikes for the transmission of the data. This point is different from the traditional neural network where continuous activation functions are required for this purpose.
The properties of the spikes, such as their timing and frequency, are important factors in the network.
If the spikes have precise timing relative to each other, they can encode temporal information. Hence, the SNN captures the dynamic nature of the biological neural system.
Spikes also play a fundamental role in the computational capabilities of the network. They allow an SNN to:
Process temporal data more effectively
Handle complex spatiotemporal patterns
Potentially operate in a more energy-efficient manner (as compared to traditional artificial neural networks)
The advancement in the spikes research is resulting in more powerful SNNs.
In biological neurons, the cell membrane is responsible for maintaining the difference between the intracellular and extracellular environments. A similar concept is also present in the membrane potential of the spiking neural networks. Usually, the membrane potential is different in both these environments.
The membrane potential is the key concept in SNN that describes the electric potential difference across the cell membrane.
This is a dynamic quantity; therefore, it changes with time and determines whether the neuron generates a spike or not.
The neuron in SNN has a threshold membrane potential (discussed below). If the potential is less than this, no spike occurs; otherwise, a spike is generated.
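The membrane-potential rule above can be sketched as a simple leaky integrate-and-fire neuron. This model is a common simplification and an assumption here: the leak factor, the reset to zero after a spike, and the input values are illustrative choices that the text does not specify.

```python
def simulate_neuron(input_currents, threshold=1.0, leak=0.9, reset=0.0):
    """Return (spikes, trace) for a leaky integrate-and-fire neuron."""
    potential = reset
    spikes, trace = [], []
    for current in input_currents:
        potential = leak * potential + current  # decay a little, then add the input
        if potential >= threshold:              # threshold reached: fire a spike
            spikes.append(1)
            potential = reset                   # assumption: reset after firing
        else:
            spikes.append(0)
        trace.append(potential)
    return spikes, trace

spikes, trace = simulate_neuron([0.5, 0.5, 0.5, 0.0, 0.5, 0.5])
print(spikes)  # [0, 0, 1, 0, 0, 0]
```

The neuron stays silent while its potential builds up, fires once on the third input, and then starts accumulating again from the reset value.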
The threshold potential is a specific minimum voltage level that a neuron must reach to generate the action potential (spike). Hence, it can be considered as a border of potential values and this is described as:
If potential value < threshold value, then the neuron does not produce a spike
If potential value >= threshold value, then the neuron produces a spike
Synaptic Weight in Spiking Neural Network
In SNN, the synaptic weight is the measure of the connection strength between two neurons. It determines the influence of one neuron on the other. A strong synaptic weight means a more substantial effect on the receiving neuron. As a result, there are more chances of firing a spike because of the incoming signal from such a neuron. The opposite is the case for a weak synaptic weight.
Excitatory Input in SNN
As the name suggests, the excitatory input of the SNN is the type of input signal that results in more firing of spikes. The excitatory input results in the following processes in SNN:
The input results in the depolarization of the neuron
The membrane potential increases because of depolarization
The potential may reach the threshold potential value
Reaching this value can result in the firing of a spike
Inhibitory Input in SNN
The inhibitory input is the opposite of the excitatory input. This results in the inhibition of the firing of spikes. The following processes occur in neurons when inhibitory input is added:
The inhibitory input results in the hyperpolarization of the neuron
The overall membrane potential decreases
The neuron moves farther from the threshold potential value
There are fewer chances of spike firing
Post-Synaptic Potential (PSP) in SNN
A better understanding of this concept will be achieved when you know the following terms:
A presynaptic neuron is one that sends the signal to the other neuron.
A postsynaptic neuron is one that receives the signal from the presynaptic neuron.
A postsynaptic potential is any change in the membrane potential of the postsynaptic neuron caused by the presynaptic neuron. It is the combined effect of the excitatory input and inhibitory input. The collective effect of both of these changes the value of the membrane potential; if it reaches the threshold potential, a spike is generated, and otherwise it is not.
Temporal Coding in SNN
Temporal coding is a process of encoding information in the neurons of an SNN. It is a more reliable method in SNN because it does not rely only on the firing rate of spikes but also uses the timing of their occurrence. In this way, more precise and detailed information about the data is encoded.
Rate Coding in SNN
Rate coding is another type of coding, in which information is carried by the average firing rate of spikes, that is, the number of spikes over a given time. It is a different coding method from temporal coding.
Synaptic Plasticity
The synapses are an important concept in SNN and are defined as:
"The synapses in SNN are the specialized junctions between two neurons and these play a crucial role in the communication between these two."
Synaptic plasticity is the ability of synapses to change their strength according to the experience in the SNN. It is done by making changes in the weights of synapses and as a result, the connection is made stronger or weaker according to the case.
Learning in SNNs
This is an important feature to understand. Just like the biological learning principles that move towards the optimization of the whole system according to the environment, the learning process of SNN is intelligent enough to provide the best performance. It means the modification of the synaptic weights according to the current condition of the network. As a result, the system of SNN works to move towards stability and optimization according to the environment.
Through the basic concepts of the spiking neural network, the working principle of the spiking neural network is clear to you. Now, there is a need to discuss the flow of all the processes occurring in SNN.
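Among the concepts above, the difference between rate coding and temporal coding can be made concrete with a small sketch. The spike times are made-up data, and time-to-first-spike is used here as just one simple example of a temporal code; it is an assumption for illustration, not the only way timing can carry information.

```python
def firing_rate(spike_times, window):
    """Rate coding: average number of spikes per unit time in the window."""
    return len(spike_times) / window

def first_spike_latency(spike_times):
    """One simple temporal code: the time of the first spike (None if silent)."""
    return min(spike_times) if spike_times else None

# Hypothetical spike times in milliseconds over a 40 ms window
neuron_a = [2.0, 12.0, 22.0, 32.0]
neuron_b = [9.0, 10.0, 11.0, 30.0]

print(firing_rate(neuron_a, window=40.0), firing_rate(neuron_b, window=40.0))
print(first_spike_latency(neuron_a), first_spike_latency(neuron_b))
```

Both neurons have the same firing rate, so a rate code cannot tell them apart, while the timing of their spikes clearly differs.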
Working of Spiking Neural Networks
The working in SNN is accomplished in five steps given next:
The setting of input and Synaptic Weights
Membrane Potential Update process
Spike Generation in SNN
Spike Propagation in SNN
Learning and Plasticity for the final results in SNN
Here are the details of each step that will be easy for you to understand:
Initialization of Neurons in SNN
The first step is to initialize the neurons to create the network. Each neuron has its specific features such as membrane potential, threshold values, etc. The information of a specific neuron is based on the spikes. These have synaptic weights that determine the strength of the presynaptic neuron to the postsynaptic neuron.
Update in the Membrane Potential of SNN
Once the network is arranged successfully according to the requirements, the firing of the spikes occurs. Here, when the presynaptic neuron generates spikes, it transmits the signals. There is an effect on the potential difference of postsynaptic neurons. The nature of synapses decides if the signal is an inhibitory input or an excitatory input (as discussed above). The membrane potential continuously updates throughout the whole process. The overall effect of both these inputs results in the final membrane potential of neurons at a specific point.
Spike Generation in SNN
The membrane potential has a specific threshold value. If the potential reaches this value, the postsynaptic neuron fires the spikes. The inhibitory and excitatory inputs collectively influence the timing of the spikes. Every neuron can encode information like spiking frequency, etc.
Spike Propagation in SNN
The firing of spikes results in the propagation of the signal to the next neuron in the network. This process is continuous throughout the network and results in the influence of the signal on sending and receiving neurons.
Learning and Plasticity for the final results in SNN
The propagation of the spikes occurs throughout the network and after some time, the weight of the neuron is modified in the process of synaptic plasticity. This process depends on the multiple values in neurons and it affects the learning process of the network. This not only helps in the growth and learning of the network but allows it to adopt new information and stimulate multiple processes throughout the network.
Applications of Spiking Neural Networks
Spiking neural networks are one of the most popular emerging techniques in deep learning. The working of these networks is different from that of traditional neural networks; therefore, they have a little bit different and complex applications. Here are some of the main domains where SNN is being used along with other neural networks but the output of the SNN is different from others:
Neuromorphic Computation with SNN
In neuromorphic computations, the SNN is used for the development of specialized hardware and software systems. These are the copies or mimicry of the structure and features of the human brain. These computing chips are used for different purposes where memory and related features are required. For instance, the SNN is used in neuromorphic chips that offer high processing speed and efficiency in energy usage.
Sensory Processing Using SNN
The SNN plays a role in areas where sensory information is required to get better output. For instance, in fields where vision or audio recognition is required for the output, SNN is used for better processing because these can work on the spatiotemporal patterns. As a result, SNN has major applications in speech, voice, and vision recognition systems.
Spiking Neural Networks in Event-based Cameras
The spiking neural networks are used in the specialized cameras. These are called event-based cameras and are designed to capture the changes of the event in the frame, unlike traditional cameras. These cameras have applications such as:
Object tracking
Motion analysis
Gesture recognition
Motion detection
Brain-Computer Interface (BCI) and SNN
There are different processes in the field of brain-computer interfaces that can be improved with the help of SNN. For instance, communication or control processes are made better using this neural network because it has the feature of temporal dynamics. This allows it to do better with spiking behaviours, just like the human brain.
Cognitive Modeling Process using SNN
The brain-like working of SNN is suitable for cognitive modeling. Usually, the researchers use SNN to understand the functionality and working of the neural networks and learn how they deal with cognitive mechanisms and learning tasks. SNN can work on the temporal aspects that help them in processes like:
Information processing
Decision making
Human cognition
This helps to improve the functionality of the system.
Use of SNN in Neuroprosthetics
One of the important applications of SNN is in neuroprosthetics, where it is implemented on specialized hardware chips. These chips are designed to be used in processes like edge computation and processing using sensors. As a result, these present parallelism and efficiency.
Hence, today we have seen the details of spiking neural networks. These are the modern networks that are based on a similar structure of the brain. We started with the basic definition of SNN and saw the core concept that helped us understand the flow of the spiking neural network. After that, we have seen the details of the application of SNN to understand that it is widely used in domains where human brain-like behavior is required. I hope you find this article useful. If you have any questions, you can ask them in the comment section.
Hey pupils! Welcome to the next session on modern neural networks. We are studying the basic neural networks that are revolutionizing different domains of life. In the previous session, we read about Deep Q Networks (DQN) in reinforcement learning. There, the basic concepts and applications were discussed in detail. Today, we will move towards another neural network, which is an improvement in the deep Q network and is named the double deep Q network.
In this article, we will point towards the basic workings of DQN as well so I recommend you read the deep Q networks if you don’t have a grip on this topic. We will introduce the DDQN in detail and will know the basic needs for improvement in the deep Q network. After that, we’ll discuss the history of these networks and learn about the evolution of this process. In the end, we will see the details of each step in the double-deep Q network. The comparison between DQN and DDQN will be helpful for you to understand the basic concepts. This is going to be very informative so let’s start with our first topic.
The double deep Q network (DDQN) is an advanced form of the Deep Q Network (DQN). We know that DQN was a revolutionary approach on Atari 2600 games because it uses deep learning to learn directly from raw game input. As a result, it provides superhuman performance in some of the games. Yet, in some situations, it overestimates the values of actions, which leads to suboptimal behaviour. After further research, the Double Deep Q Learning method was introduced to address this. The need for the double deep Q network will be understood by studying the history of the whole process.
The history of the double deep Q network is interwoven with the evolution process of deep reinforcement learning. Here is the step-by-step history of how the double deep Q network emerged from the DQN.
In 2013, a researcher from DeepMind named Volodymyr Mnih and his team published a paper in which they introduced deep Q networks. According to the paper, the Deep Q network (DQN) is a revolutionary network that combines neural networks and reinforcement learning.
The DQN made an immediate impact on the game industry because it was so powerful that it could surpass all the human players. Different researchers moved towards this network and created different applications and algorithms related to it.
The DQN gained fame soon and attracted a large audience, but there were some limitations to this neural network. As discussed before, the overestimation bias of DQN was the problem in some cases that led the researchers to make improvements in the algorithm. The overestimation was in the case of action values and it resulted in slow convergence in some specific scenarios.
In 2015, a team of scientists from Google DeepMind introduced the Double Deep Q Network as an improvement of its first version, in the paper “Deep Reinforcement Learning with Double Q-learning.” The authors of this research are listed below:
Hado van Hasselt
Arthur Guez
David Silver
They improved it by decoupling the action selection and action evaluation processes. Moreover, they paid attention to deep reinforcement learning and tried to provide more effective performance.
The DDQN was successful in making a solid impact on different fields. The DQN was impactful mainly on the Atari 2600 games, but this version has applications in other domains of life as well. We will discuss the applications in detail soon in this article.
The details of evolution at every step can be examined through the table given here:
Event | Date | Description
Deep Q-Networks (DQN) Introduction | 2013 |
DQN Limitations Identified | Late 2010s |
Double Deep Q-Networks (DDQN) Proposed | 2015 | To address DQN's overestimation bias, Hado van Hasselt, Arthur Guez, and David Silver propose DDQN.
DDQN Methodology | 2015 | DDQN employs two Q-networks. It effectively reduces overestimation bias through decoupling.
DDQN Evaluation | 2015-2016 |
DDQN Applications | 2016-Present | DDQN's success paves the way for its application in various domains.
DDQN Legacy | Ongoing | DDQN's contributions have established deep reinforcement learning (DRL) as a powerful tool for solving complex decision-making problems in real-world applications.
The working mechanism of the DDQN is divided into different steps. These are listed below:
Action Selection and Action Evaluation
Q value Estimation Process
Replay and Target Q-network Update
Main Q-network Update
Let’s find the details of each step:
The DDQN improves on DQN because it separates the action selection and action evaluation processes. For this, the DDQN uses two separate Q networks. Here are the details of these networks:
The main Q network is responsible for selecting the action that has the highest predicted Q value. This value matters because it is the expected future reward for taking that action in the particular state.
The target Q network is a copy of the main Q network and is used to evaluate the Q values of the actions the main network selects. In this way, the Q values pass through two separate networks. The difference between them is that the target network is updated less frequently, which makes its values more stable and therefore less overestimated.
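To make the split between the two networks concrete, here is a minimal sketch of how the learning target could be computed in each case. The function names and the example Q values are made up for illustration, and real implementations work on batches of network outputs rather than small arrays.

```python
import numpy as np

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN-style target: one network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * np.max(next_q_target)

def ddqn_target(reward, next_q_main, next_q_target, gamma=0.99, done=False):
    """Double DQN target: main network picks the action, target network scores it."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_main))          # action selection
    return reward + gamma * next_q_target[best_action]  # action evaluation

# Hypothetical Q values for the next state (three possible actions).
next_q_main = np.array([1.0, 2.5, 2.0])
next_q_target = np.array([1.2, 1.8, 3.0])

print(dqn_target(1.0, next_q_target, gamma=0.9))
print(ddqn_target(1.0, next_q_main, next_q_target, gamma=0.9))
```

In this toy case the DQN-style target uses the largest target value (3.0), while the double version uses the target network's value for the action the main network prefers (1.8), so its target is lower.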
The following steps are carried out in the Q value estimation process:
The first step is obtaining the state representation. The agent gets the state representation from the environment, usually in the form of visual input or numerical parameters that will be used for further processing.
This state representation is fed into the main Q network as an input. The network outputs a Q value for each possible action.
Among all these values, the agent selects the action whose Q value, as predicted by the main Q network, is the highest.
The values from the previous step are not yet reliable. To refine them, the DDQN applies experience replay. It uses a replay memory to store past experiences and random sampling from that memory to update the Q networks. Here are the details:
First of all, the agent interacts with the environment and collects a stream of experiences. Each experience has the following information:
The current state
Action taken
The reward received
The next state
These experiences are stored in replay memory.
A random batch of experiences is sampled from the memory at regular intervals. For each sampled experience, the estimate of the action's value is updated, which yields the Q values of the actions.
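The replay steps above can be sketched with a small buffer class. This is only an illustration: the class name `ReplayMemory` and its methods are assumptions, not part of any particular library.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state) experiences."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest experience when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, reward=1.0, next_state=step + 1)

print(len(memory))       # only the three most recent experiences remain
print(memory.sample(2))  # a random batch of two experiences
```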
The target Q network supplies the target values against which the main Q network's errors are computed. The main Q network is therefore updated frequently and learns continuously, while the target Q network is refreshed only periodically from the main network's parameters. This results in better Q value updates.
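Here is a rough sketch of the periodic target update, assuming the network weights are held in a plain dictionary. The helper name `maybe_sync_target` and the `sync_every` parameter are hypothetical, and the "training" is faked by nudging a single weight.

```python
import copy

def maybe_sync_target(step, main_weights, target_weights, sync_every=1000):
    """Return the weights the target network should hold after this step."""
    if step % sync_every == 0:
        # Copy, so later changes to the main weights do not leak into the target.
        return copy.deepcopy(main_weights)
    return target_weights

main_weights = {"w": [0.0]}
target_weights = copy.deepcopy(main_weights)

for step in range(1, 6):
    main_weights["w"][0] += 0.1  # stand-in for one training update
    target_weights = maybe_sync_target(step, main_weights, target_weights, sync_every=3)

print(main_weights, target_weights)
```

After five steps the main weight has moved on every step, while the target weight still holds the value copied at step 3.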
Both of these networks are widely used in different applications of life but the main purpose of this article is to provide the best information regarding the double deep Q networks. This can be understood by comparing it with its previous version which is a deep Q network. In research, the difference between the cumulative reward at periodic intervals is shown through the image given next:
Here is the comparison of these two on the basis of fundamental parameters that will allow you to understand the need of DDQN:
As discussed before, the basic point where these two networks are differentiated is the overestimation bias. Here is a short recap of how these two networks work with respect to this parameter:
The traditional DQN is susceptible to overestimation bias; therefore, Q values are overestimated, which results in suboptimal policies.
The double deep Q networks are designed to deal with the overestimation and provide a more accurate estimation of Q values. Using separate networks for action selection and action evaluation is what reduces the overestimation.
The presence of two networks not only helps in the overestimation but also in problems such as action selection and evaluation, Q value estimation, etc.
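A small numerical experiment can show where the overestimation comes from. This is a simplified sketch, not the DDQN algorithm itself: it assumes every action has a true value of 0 and that the two sets of estimates carry independent noise, which is stronger than what a lagged target network gives in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_actions = 10_000, 10

# True Q value of every action is 0; each estimate adds unit Gaussian noise.
noisy_a = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
noisy_b = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single estimator: pick and evaluate the best action with the same noisy values.
single = noisy_a.max(axis=1).mean()

# Double estimator: pick with one set of estimates, evaluate with the other.
best = noisy_a.argmax(axis=1)
double = noisy_b[np.arange(n_trials), best].mean()

print(f"single estimator: {single:.2f}, double estimator: {double:.2f}")
```

The single estimator reports a clearly positive value even though the true value is 0, because taking the maximum favours whichever action happened to get the luckiest noise. The double estimator stays close to 0.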
In DQN, the overestimation makes the results unstable at different stages, which can slow or prevent convergence.
To overcome this situation, DDQN's decoupled target computation improves stability and, as a result, better convergence is seen.
The deep Q networks employ a target network for training stabilisation. However, this target network is used directly for both action selection and evaluation; therefore, its estimates are less accurate.
The issue is addressed in DDQN: the target network is used only for evaluation and is updated periodically with the parameters of the online network. As a result, a more stable training process provides better output in DDQN.
The performance of DQN is appreciable in different fields of real life, but overestimation causes errors in some cases. So, it performs remarkably compared with many other approaches, but not as well as the DDQN.
In DDQN, fewer errors are shown because of the better network structure and working principle.
Here is the table that will highlight all the points given above in just a glance:
Feature | DQN | DDQN |
---|---|---|
Overestimation Bias | Prone to overestimation bias | Effectively reduces overestimation bias |
Stability and Convergence | Less stable due to overestimation bias | More stable due to target Q-network |
Target Network Update in Q Networks | Direct use of target network for action selection and evaluation | Periodic updates of the target network using online network parameters |
Overall Performance | Remarkable performance but prone to errors due to overestimation | Superior performance with fewer errors |
Additional Parameters | N/A | Reduced overestimation bias leads to more accurate Q-value estimates |
The applications of both these networks seem alike but the basic difference is the performance and accuracy.
Hence, the double deep Q network is an improvement over the deep Q networks. The main difference between these two is that the DDQN has less overestimation of the action’s value. This makes it more suitable for different fields of life. We started with the basic introduction of the DDQN and then tried to compare it with the DQN so that you may understand the need for this improvement. After that, we read the details of the process carried out in DDQN from start to finish. In the end, we saw the details of the comparison between these two networks. I hope it was a helpful article for you. If you have any questions, you can ask them in the comment section.
Hello readers! Welcome to the next episode of the Deep Learning Algorithm. We are studying modern neural networks and today we will see the details of a reinforcement learning algorithm named Deep Q networks or, in short, DQN. This is one of the popular modern neural networks that combines deep learning and the principles of Q learning and provides complex control policies.
Today, we are studying the basic introduction of deep Q Networks. For this, we have to understand the basic concepts that are reinforcement learning and Q learning. After that, we’ll understand how these two collectively are used in an effective neural network. In the end, we’ll discuss how DQN is extensively used in different fields of daily life. Let’s start with the basic concepts.
Unlike reinforcement learning, supervised learning is done with the help of labeled data. Here are some important components of the reinforcement learning method that will help you understand the workings of deep Q networks:
Fundamental Components of Reinforcement Learning
Name of Component | Detail |
---|---|
Agent | An agent is a software program, robot, human, or any other entity that learns and makes decisions within the environment. |
Environment | In reinforcement learning, the environment is the world in which the agent operates, and which the agent interacts with and perceives. |
Action | The decision or the movement the agent takes within the environment at the given state. |
State | At any specific time, the complete set of all the information the agent has is called the state of the system. |
Reward | |
Policy | A policy is a strategy that maps states to actions. The main purpose of reinforcement learning is to design policies that maximize the long-term reward of the agent. |
Value Function | It is the expectation of future rewards for the agent from a given state. |
Q learning is a type of reinforcement learning algorithm whose central function is denoted by Q(s,a). Here,
Q= Q learning function
s= state of the learning
a= action of the learning
This is called the action value function of the learning algorithm. The main purpose of Q learning is to find the optimal policy to maximize the expected cumulative reward. Here are the basic concepts of Q learning:
In Q learning, the agent and environment interaction is done through the state action pair. We defined the state and action in the previous section. The interaction between these two is important in the learning process in different ways.
The core update rule for Q learning is the Bellman equation. This updates the Q values iteratively on the basis of rewards received during the process. Moreover, future values are also estimated through this equation. The Bellman equation is given next:
$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left[R(s,a) + \gamma \max_{a'} Q(s',a')\right]$$
Here,
$\gamma$ = discount factor of the function, which is used to balance between immediate and future rewards.
$R(s,a)$ = immediate reward of taking the action “a” within the state “s”.
$\alpha$ = the learning rate that controls the step size of the update. It is always between 0 and 1.
$\max_{a'} Q(s',a')$ = the prediction of the maximum Q value over the actions a′ available in the next state s′.
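To see the update rule at work, here is a minimal tabular sketch with one worked step. The state names, action names, and the helper `q_update` are invented for the example; the arithmetic follows the equation above.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q learning update to the table Q and return the new Q(s, a)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # max over a' of Q(s', a')
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]

Q = defaultdict(float)            # every unseen (state, action) pair starts at 0
actions = ["left", "right"]
Q[("s1", "right")] = 2.0          # pretend we already learned this value

new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)
```

With these numbers the update is 0.5 × 0 + 0.5 × (1 + 0.9 × 2.0), which works out to 1.4.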
Deep Q networks are neural networks that apply the Q learning we have just discussed to tasks such as playing video games. They use reinforcement learning for problems in which the agent makes decisions sequentially and aims to maximize the cumulative reward. Combining Q learning with a deep neural network makes the method efficient enough to deal with high dimensional input spaces.
This is considered the off-policy temporal difference method because it considers the future rewards and updates the value function of the present state-action pair. It is considered a successful neural network because it can solve complex reinforcement problems efficiently.
The Deep Q network finds applications in different domains of life where the optimization of the results and decision-making is the basic step. Usually, the optimized outputs are obtained in this network therefore, it is used in different ways. Here are some highlighted applications of the Deep Q Networks:
The Atari 2600 is also known as the Atari Video Computer System (VCS). It was released in 1977 and is a home video game console. The Atari 2600 and the Deep Q Network come from two different fields, and when they were brought together, they sparked a revolution in artificial intelligence.
The Deep Q network learns to play Atari games in different ways. Here are some of the ways in which DQN uses the Atari 2600 as a training ground:
Learning from pixels
Q learning with deep learning
Overcoming Sparse Rewards
Just like reinforcement learning, DQN is used in the field of robotics for the robotic control and manipulation of different processes.
It is used for learning specific processes in the robots such as:
Grasping objects
Navigating environments
Tool manipulation
The ability of DQN to handle high dimensional sensory inputs makes it a good option in robotic training, where robots have to perceive and interact with their complex surroundings.
The DQN is used in autonomous vehicles through which the vehicles can make complex decisions even in a heavy traffic flow.
Different techniques used with the deep Q network in these vehicles allow them to perform basic tasks efficiently such as:
Navigation of the road
Decision-making in heavy traffic
Avoiding obstacles on the road
DQN can learn policies adaptively and consider various factors for better performance. In this way, it helps to provide a safe and intelligent vehicular system.
Just like other neural networks, the DQN is revolutionizing the medical health field. It assists the experts in different tasks and makes sure they get the perfect results. Some of such tasks where DQN is used are:
Medical diagnosis
Treatment optimization
Drug discovery
DQN can analyze the medical record history and help the doctors to have a more informed background of the patient and diseases.
It is used for the personalized treatment plans for the individual patients.
Deep Q learning helps with resource management by learning policies that allocate resources optimally.
It is used in fields like energy management systems usually for renewable energy sources.
In video streaming, deep Q networks are used for a better experience. The agents of the Q network learn to adjust the video quality on the basis of different scenarios such as the network speed, type of network, user’s preference, etc.
Moreover, it can be applied in different fields of life where complex learning is required based on current and past situations to predict future outcomes. Some other examples are the implementation of deep Q learning in the educational system, supply chain management, finance, and related fields.
Hence in this way, we have learned the basic concepts of Deep Q learning. We started with some basic concepts that are helpful in understanding the introduction of the DQN. These included reinforcement learning and Q learning. After that, when we saw the introduction of the Deep Q network it was easy for us to understand the working. In the end, we saw the application of DQN in detail to understand its working. Now, I hope you know the basic introduction of DQN and if you want to know details of any point mentioned above, you can ask in the comment section.
Hello students! I hope you are doing great. Today, we are talking about decoders in Proteus. We know that decoders are the building blocks of many digital electronic devices. These electronic circuits are used for different purposes, such as memory addressing, signal demultiplexing, and control signal generation. Decoders come in different types, and we are discussing the 3 to 8 line decoder.
In this tutorial, we will start by learning the basic concept of decoders. We’ll also understand what 3 to 8 line decoders are and how we connect this concept with the 74LS138 IC in Proteus. We’ll discuss this IC in detail and use it in a project to present the detailed work.
A three to eight line decoder is an electronic device that takes three inputs and, based on their combination, activates one of its eight outputs. In simple words, the 3 to 8 line decoder reads the binary combination on its three inputs and, as a result, exactly one of its output lines becomes active. Here are the basic concepts to understand its working:
A 3 to 8 line decoder has three input pins which are usually denoted as A, B and C. These correspond to the three bits of the binary code. The term binary means these can only be 0 or 1 and no other digits are allowed. The inputs can be raw bits from the user or the output signal of another device in the circuit.
The 3 to 8 decoder has eight possible output pins. These are usually denoted as Y0, Y1, Y2,..., Y7 and the output is obtained only at one of these pins. The output depends on the binary combination of the input provided to it. In large circuits, its output is fed into any other component and the circuit works.
As mentioned before, the combination of the binary input decides the output. Only one of the eight output pins of the decoder gets high which means, only one output has the value of one and all others are zero. The high pin is considered active and all other pins are said to be inactive.
The truth table of all the inputs and possible outputs of a 3 to 8 decoder is given here:
Input MSB (A) | Input B | Input LSB (C) | Active Output | Y0 | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | Y7 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | Y0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | Y1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | Y2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 1 | Y3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | Y4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | Y5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | Y6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | Y7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Here,
MSB= Most significant bit
LSB= Least significant bit
I hope the above concepts are now clear with the help of this truth table.
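If you like to check such tables in code, the following short sketch reproduces the truth table above. The function name `decode_3_to_8` is made up for this example, and it follows the table's convention that A is the most significant bit and the active output is 1.

```python
def decode_3_to_8(a, b, c):
    """Return the list [Y0, ..., Y7] for inputs a (MSB), b, c (LSB)."""
    for bit in (a, b, c):
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
    index = (a << 2) | (b << 1) | c  # binary value of the three input bits
    return [1 if i == index else 0 for i in range(8)]

# Print every row of the truth table.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, decode_3_to_8(a, b, c))
```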
The 74LS138 is a popular integrated circuit (IC) that is commonly used as a 3 to 8 line decoder. It is a member of the 74LS family, hence its name. The 74LS family is a group of transistor-transistor logic (TTL) chips. The basic function of this IC is to take three inputs and automatically provide a signal on only one of its output pins, based on the binary inputs. In addition to the input, output, and functionality of the 74LS138, there are some additional features listed below:
The 74LS138 has a cascading feature, which means two or more 74LS138 ICs can be connected together to increase the number of output lines. The circuit is arranged in such a way that a signal, such as an output of one 74LS138, drives the enable input of another, and as a result, more than one IC can work together.
The structure of this IC is designed in such a way that it provides high-speed operation. This matters because a decoder is expected to decode its input quickly enough for its output to trigger other functions of the circuit.
The 74LS138 is TTL compatible. The LS in its name indicates that it is part of the low-power Schottky series, so it operates from a 5V power supply. This makes it ideal for many electronic circuits that already run on 5V, since no additional device is needed to supply it.
These ICs are versatile because they come in different packages, and users can pick the right one depending on the circuit they are building. Two common packages of this IC are given next:
DIP (Dual Inline Package)
SOP (Small Outline Package)
It has multiple modes of operation therefore, it has versatile applications.
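The cascading feature mentioned above can be sketched in code as two 3 to 8 decoders forming a 4 to 16 decoder. This is a simplified logical model with hypothetical function names: it uses a single active-high enable per decoder and active-high outputs, whereas the real chip has three enable pins with their own polarities.

```python
def decode_with_enable(index, enable):
    """One 3-to-8 decoder: all outputs stay 0 unless the decoder is enabled."""
    return [1 if (enable and i == index) else 0 for i in range(8)]

def decode_4_to_16(d, a, b, c):
    """Cascade two decoders; the extra bit d chooses which one is enabled."""
    index = (a << 2) | (b << 1) | c
    low = decode_with_enable(index, enable=(d == 0))   # drives outputs 0..7
    high = decode_with_enable(index, enable=(d == 1))  # drives outputs 8..15
    return low + high

outputs = decode_4_to_16(1, 0, 1, 1)
print(outputs.index(1))  # position of the single active output
```

Both decoders see the same three input bits; the fourth bit only decides which of them is allowed to respond.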
Before using any IC in the circuit, it is important to understand its pinouts. The 74LS138 has a 16-pin structure, which is shown here:
The detailed names and features of these pins can be matched with the table given below:
Pin Number | Pin Name | Pin Function |
---|---|---|
1 | A | Address input pin |
2 | B | Address input pin |
3 | C | Address input pin |
4 | G2A | Active low enable pin |
5 | G2B | Active low enable pin |
6 | G1 | Active high enable pin |
7 | Y7 | Output pin 7 |
8 | GND | Ground pin |
9 | Y6 | Output pin 6 |
10 | Y5 | Output pin 5 |
11 | Y4 | Output pin 4 |
12 | Y3 | Output pin 3 |
13 | Y2 | Output pin 2 |
14 | Y1 | Output pin 1 |
15 | Y0 | Output pin 0 |
16 | VCC | Power supply pin |
The structure and working of this IC can be understood by creating a project with it, and for this, we have chosen Proteus to show the detailed working. Here are the steps to create the project of a 3 to 8 line decoder in Proteus:
Open your Proteus software.
Create a new project.
Go to the pick library by clicking the “P” button at the left side of the screen. It will show you a search box with details of the components.
Here, type 74LS138 and you will see the following search:
Double click on the IC to add it to your devices list.
With this IC selected, click on the working sheet to place it there.
You can see the pins and labels of this IC.
The 74LS138 requires some additional components to be used as a decoder. Here are the components of the project where we are using it as a 3 to 8 line decoder:
74LS138 IC
8 LEDs of different colors
Switch SPDT
Switch SPST
Switch Mom
Switch (simple)
Connecting wires
Go to the pick library and get all the components of the circuits one after the other.
Set the 74LS138 IC in the working area.
On the left side of the IC, arrange the switches to be used as the input devices.
On the right side of the IC, arrange the LEDs that will indicate the output.
Go to the terminal mode from the left side of the screen and arrange the ground and power terminals with the required devices.
The circuit at this point must look like the following image:
Connect all of these with the help of connecting wires. For convenience, I am using labels to keep the wiring tidy:
Once you have connected all the components, the circuit is ready to use. Find the play button in the bottom left corner and run the project.
Change the input with the help of switches and check for the output LEDs. You will see the circuit works exactly according to the truth table.
The 74LS138 is designed to be used as a 3 to 8 line decoder, so there is no need to connect different ICs and components to build this decoder.
The input and output pins are present on this IC; therefore, the user simply connects the switches as input devices. A switch has only two possible states, on or off, so it is an ideal way to present a binary input.
Usually, LEDs are used as the output devices so that when they get the signal, they are turned on and vice versa.
The ground and power terminals are used to complete the circuit.
Pins 4, 5, and 6 are called the enable pins. In Proteus, these are labeled as E1, E2, and E3 (G2A, G2B, and G1 in the pin table above). Out of these, E1 and E2 are active low pins, which means they are active only when they are pulled down. On the other hand, E3 is active high; hence it enables the output only when it is pulled high.
Once the circuit is complete, the user can change the binary inputs through the switches and check for the output LEDs.
The combination of inputs results in the required output hence the user can easily design the circuit without making any technical changes.
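The role of the enable pins described above can be summed up in a few lines of code. This is only a logical sketch with a hypothetical function name: it reports which output line is selected and does not model voltage levels. On the physical 74LS138, the datasheet describes the selected output as being driven low while the others stay high.

```python
def selected_output(select, e1, e2, e3):
    """Return the index (0-7) of the selected output, or None if the chip is disabled.

    e1 and e2 are the active low enables (pins 4 and 5);
    e3 is the active high enable (pin 6).
    """
    if select not in range(8):
        raise ValueError("select must be between 0 and 7")
    enabled = (e1 == 0) and (e2 == 0) and (e3 == 1)
    return select if enabled else None

print(selected_output(5, e1=0, e2=0, e3=1))  # enabled: line 5 is selected
print(selected_output(5, e1=1, e2=0, e3=1))  # E1 pulled high: nothing is selected
```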
Today, we have seen the details of the 74LS138 decoder IC in Proteus. We started with the basic introduction of a decoder and saw what the 3 to 8 line decoder is. After that, we saw the truth table and the features of a 3 to 8 line decoder. We saw how the 74LS138 works and in the end, we designed the circuit of a 3 to 8 line decoder using the 74LS138. The circuit was easy and we saw it working in detail. If you have any questions, you can ask in the comment section.
Step into the world of precision engineering—where custom CNC machined parts transform raw materials into the sinews and bones of your next big project. Like a tailor crafting a bespoke suit, CNC machining offers an unparalleled fit for your specific requirements.
The prospect of holding your idea in your hands, not just on paper, is the realm where imagination meets implementation. But what options lie at your fingertips? Let's explore the paths to turning those digital blueprints into tangible assets.
Before the whirring of machines begins, your quest starts with choosing the right material—a decision as critical as selecting the foundation for a skyscraper. Each material whispers its own strengths and secrets, waiting to align with your project's demands.
For starters, aluminum stands out as a front-runner in popularity due to its lightweight yet robust nature —an ally for components in aerospace or portable devices. Imagine the sleek body of a drone or the frame of a prototype sports car; they likely share an aluminum heartbeat.
Stainless steel steps forward for projects where endurance and rust resistance are paramount. Think of medical devices that can withstand repetitive sterilization or marine parts whispering secrets to ocean waves without fear of corrosion.
Delving deeper into specialties, titanium emerges when the strength-to-weight ratio is not just a preference but a necessity—ideal for high-performance sectors such as motorsports or prosthetics.
Brass occupies a niche where electrical conductivity must dance elegantly with malleability—perhaps in custom electronic connectors or intricate musical instruments.
Each material imparts its essence to your project, shaping not just function but also future possibilities. Which one will be the bedrock for your engineering aspirations?
The next step on our journey approaches like the unveiling of a trail in dense fog—selecting the appropriate CNC machining process that will breathe life into your vision. Each method manifests its prowess through sparks and shavings, ready to tackle complexity with finesse.
Better yet, since there are a variety of machines from Revelation Machinery on offer, with second-hand units representing better value than new equivalents, you can pick one of the following without breaking the bank or limiting yourself in terms of functionality and features.
3-axis milling is like the steadfast hiker; it's reliable and perfect for parts with fairly simple geometries. If your project involves creating a prototype bracket or a basic gear, this could be your marching tune. But when contours call for more intricate choreography, 5-axis milling pirouettes onto the stage. It invites you to envision turbine blades sculpted with aerodynamic grace or an ergonomic joystick that fits into hands as naturally as pebbles on a beach.
Turning—the spinning dance between material and tool—offers cylindrical mastery manifested in objects rotating around their own axis. This is where items such as shafts for motors or precision rollers for conveyor systems are born from rotation's embrace.
But what if your piece hides complex internal features, akin to secret passages within a castle? Enter EDM—Electrical Discharge Machining —a process where electrical sparks rather than physical cutting tools unlock hidden gems. Ideal perhaps for making intricate molds used in injection molding machines that will churn out hundreds of thousands of perfectly replicated plastic knights.
As if wielding a magic wand, wire EDM carves with finesse where traditional tools cannot tread, slicing through hardened steel as easily as a hot knife through butter. Consider the labyrinthine path of a lightweight gear or the delicate framework of an instrument sensor—wire EDM is your guide through these intricate landscapes.
Then there’s the level-headed sibling in this family, plunge/sinker EDM—an ace up your sleeve when three-dimensional complexity calls. It's perfect for forming punch and die combinations used in manufacturing presses that shape sheet metal into automotive body panels or appliance housings with clockwork precision.
The truth nestled within these processes promises tailored solutions to even the most enigmatic engineering puzzles. Your custom CNC machined part will emerge from its fiery birthright not just created, but crafted with intent. In this emporium of efficiency and accuracy, which CNC sorcery will you enlist to transform your concept into creation?
Now that the form has been forged, it's time for the maestro—finishing—to step up and conduct a symphony of surfaces. This is where rough edges soften and exteriors gleam, ready for their grand debut.
Anodizing tiptoes onto stage left, offering its protective embrace to aluminum parts. It’s a finish that doesn't just add a splash of color but also bolsters resistance to wear and corrosion. Picture an aerospace fitting beaming with radiant blue or a fire engine red bicycle frame standing resilient against scratches and weathering.
Powder coating strides in with its own brand of rugged beauty—a finish that cloaks objects in a uniform, durable skin impervious to the elements. Outdoor machinery basks in its shielding layer, flaunting colors that withstand sun, rain, and the passage of seasons.
For components that need to glide together as smoothly as ballroom dancers, you’ll want to consider precision grinding. Imagine automotive pistons or mechanical bearing races, their surfaces ground down to microscopic levels for tolerances tighter than a drum skin.
Perhaps your masterpiece calls for an understated elegance; then bead blasting might brush across the scene. It leaves behind a matte texture that diffuses light and speaks to sophistication. Its application speaks volumes on products where glare is the enemy and understated aesthetics are paramount—like the dashboard of a luxury car or the casing of high-end audio equipment, where touch and sight merge into user experience.
Let's not forget electroplating—the alchemist's choice that transmutes base metals into gold, well, in appearance at least. Here we witness components such as plumbing fixtures or electronic connectors being vested in extra layers for improved conductivity and aesthetic appeal, shimmering with purpose and resilience.
If subtlety is your aim, then passivation is your unassuming guardian. Stainless steel medical instruments or food processing parts bask in this chemical bath, emerging more stoic against rust and degradation—an invisible shield for an unspoken duty.
As the encore approaches with laser etching taking center stage, customization reaches another level. It allows you to adorn surfaces with serial numbers, logos, or intricate patterns—turning each part into a storyteller of its own journey from concept to finality.
All this info should set you up to make smart decisions ahead of creating custom CNC machined parts for any engineering project you have in the pipeline. And it’s worth restating that as well as choosing carefully, buying used machinery is another way to get great results that will make your budget manageable.