Perceptron: The Genesis
In the summer of 1958, a 30-year-old research psychologist named Frank Rosenblatt invented a machine that could be fed punch cards and tell which cards were marked on the left and which were marked on the right [1]. He called the machine the Perceptron. Although the perceptron was built as a physical machine at the time, the main idea behind it was a single-layer neural network that could solve linearly separable classification problems. Rosenblatt described the perceptron as “a machine capable of perceiving, recognizing and identifying its surroundings without any human training or control.” [1]. That was the start, and the rest is the history of Artificial Intelligence.
Well, not so smooth actually. His claims were fiercely debated in the fledgling AI community. In 1969, two famous researchers, Marvin Minsky and Seymour Papert, wrote a book in which they showed that the perceptron was unable to solve the nonlinear XOR function [2]. This single result dealt a huge blow to interest in AI research and contributed to an AI winter that lasted over a decade. Most of the community missed the fact that just one additional layer could solve the XOR function easily. Nonetheless, the majority dismissed the field as hype and largely forgot about AI for ten years. The 1980s brought renewed interest in AI research, but the field saw another winter from the late 80s to the early 90s. Then Geoffrey Hinton and some of his famous students reignited interest in AI and turbocharged its progress, culminating in 2012 when Hinton and computer scientist Alex Krizhevsky won the ImageNet challenge with an error rate of 16%, a humongous 9-percentage-point improvement over the previous best of 25% [3]. This achievement showed the power of Deep Learning, and interest in AI has grown broader and broader ever since.
Now, I do not want to write the entire history of AI here, but it is fascinating and worth learning. The perceptron is such an important algorithm in the history of AI that I wanted to give some context before showing how it works. It is too important to go without an introduction! Let’s jump into the details of this simple algorithm and discuss how it works.
Perceptron Algorithm – Under the Hood:
The perceptron is the simplest form of a neural network. It has only one layer (the output layer) and a step (threshold) activation function. It performs binary classification on linearly separable data.
Following is a diagram of a perceptron model. It has inputs and an output layer; there is no hidden layer in a perceptron. The output neuron uses a threshold activation function that predicts one specific value if the weighted sum is at or below a threshold and another specific value if it is above the threshold.
Let’s start with a simple example to show how the perceptron algorithm works.
Assume we have three input neurons: x1, x2, and x3.
Initial weights connecting input neurons to the output neuron: w1, w2, w3
Desired output: 0 (this is the output we want the model to predict)
Assume, threshold = 0
Threshold activation function:
f(x) = 1, if w·x + b > 0 … eq(1)
f(x) = 0, otherwise … eq(2)
In the above, we are using an activation function that produces a value of 1 when the weighted sum w·x + b is greater than the threshold of 0, and a value of 0 when it is less than or equal to the threshold.
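To make the threshold activation concrete, here is a minimal Python sketch of eq(1) and eq(2). NumPy and the names step_activation, x, w, and b are my own illustrative choices, not part of any standard library.

```python
import numpy as np

def step_activation(x, w, b, threshold=0.0):
    """Return 1 if the weighted sum w·x + b exceeds the threshold, otherwise 0."""
    return 1 if np.dot(w, x) + b > threshold else 0
```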
We update the weights of the perceptron according to the following formula:
wnew = wold + α * (desired – output) * input [2]
Here, α is the learning rate, which determines how quickly the model moves toward the correct weights.
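A hedged sketch of this update rule in Python might look like the following. The function name update_weights is illustrative, and w and x are assumed to be NumPy arrays so the update applies element-wise.

```python
import numpy as np

def update_weights(w, x, desired, output, lr=1.0):
    """Perceptron learning rule: w_new = w_old + lr * (desired - output) * input."""
    return w + lr * (desired - output) * x

# Example: the update from the walkthrough below (weights 0.6, 0.3, 0.9; inputs 1, 0, 1)
w_new = update_weights(np.array([0.6, 0.3, 0.9]), np.array([1.0, 0.0, 1.0]), desired=0, output=1)
print(w_new)  # approximately [-0.4  0.3 -0.1]
```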
A Simple Example Demonstration of Perceptron Learning:
Let’s assume the following is our example problem:
Here, the inputs are 1, 0, 1 and the initial weights w1, w2, w3 are 0.6, 0.3, 0.9 respectively.
Assume desired output = 0
Threshold = 0
Learning rate, α = 1
And bias, b = 0.1
The perceptron algorithm works like this: if the model’s activation value is greater than the threshold value of 0, the model will predict 1, as we mentioned in eq(1). If it is less than or equal to the threshold value of 0, the model will predict 0, as in eq(2).
Now, with the given input values and initial weights we can calculate the value of the activation function:
f(x) = x1*w1 + x2*w2 + x3*w3 + b
f(x) = 1*0.6 + 0*0.3 + 1*0.9 + 0.1 = 1.6
Now, here the activation value is 1.6 which is greater than the threshold value of 0. Thus, with these weights, according to our eq(1), the model will predict 1. But, remember our desired output is 0.
So, we need to update the weights so that the activation value falls below the threshold and the model predicts 0.
w1new = w1 + α * (desired – output) * input
Thus, w1new = 0.6 + 1 * (0 – 1) * 1 = -0.4
Similarly, w2new = 0.3 + 1 * (0 – 1) * 0 = 0.3
And w3new = 0.9 + 1 * (0 – 1) * 1 = -0.1
Now, we have updated all three weights for the next iteration. Let’s check what activation function value we get with these updated weights:
f(x) = x1*w1new + x2*w2new + x3*w3new + b
Or, f(x) = 1 * (-0.4) + 0 * (0.3) + 1 * (-0.1) + 0.1 = -0.4
Now, this new value of the activation is less than the threshold value of 0.
So, the model will now predict 0, our desired output. Our model has learned the weights (w1 = -0.4, w2 = 0.3, w3 = -0.1) that solve this problem!
We have trained our model in just one iteration, thanks to a large learning rate of 1.0. But that is the perceptron algorithm in a nutshell: we start with some weights, update them, recalculate the activation, and continue until the model gives us the desired output.
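To tie everything together, here is a short, self-contained Python sketch that reproduces the numbers from the walkthrough above. The use of NumPy, the loop structure, and the variable names are my own choices, not anything prescribed by the original perceptron formulation.

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0])   # inputs x1, x2, x3
w = np.array([0.6, 0.3, 0.9])   # initial weights w1, w2, w3
b = 0.1                         # bias
desired = 0                     # desired output
lr = 1.0                        # learning rate, α

for iteration in range(10):                # a few iterations are plenty for this example
    activation = np.dot(w, x) + b          # weighted sum plus bias
    output = 1 if activation > 0 else 0    # threshold activation, eq(1) and eq(2)
    print(f"iteration {iteration}: activation = {activation:.1f}, prediction = {output}")
    if output == desired:                  # stop once the model predicts the desired output
        break
    w = w + lr * (desired - output) * x    # perceptron weight update

print("learned weights:", w)               # expected: approximately [-0.4  0.3 -0.1]
```

Running this prints an activation of 1.6 on the first pass, applies one weight update, and then produces an activation of -0.4 and the desired prediction of 0, matching the hand calculation above.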
We have seen from the above demonstration how the perceptron works and how it can solve linearly separable classification problems. The perceptron is a very simple yet very important algorithm. It is kind of the grand-daddy of the more powerful artificial neural networks. Since it is so simple yet carries much of the essence of those more powerful networks, it is always worth learning the perceptron first before delving into the details of more intricate neural networks. One of the main limitations of the perceptron is that it cannot solve problems that are not linearly separable, such as the XOR function. However, by adding just one hidden layer (with a certain number of neurons, depending on the problem) between the input and output layers, a neural network can learn nonlinear functions, including the XOR function, gracefully. In the next article we will add that coveted hidden layer and turn the perceptron into a neural network that can learn nonlinear functions. Until then, stay tuned!
References:
[1] Professor’s perceptron paved the way for AI – 60 years too soon. https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon
[2] Perceptron. https://en.wikipedia.org/wiki/Perceptron
[3] ImageNet. https://en.wikipedia.org/wiki/ImageNet#ImageNet_Challenge