In this post I hope to quickly lay some of the foundations for readers who are new to AI. To begin, I will demystify the "Neural Network" analogy.
What is a neuron?
When we say "neuron" or "unit" we are referring to the fundamental building block of a neural network. Each unit is a mathematical function that receives an input and transforms it into an output. We will call the input x, the function f(x) and the output y. The so-called "neuron" is simply this:
x → f(x) → y
The function f(x) is called the activation function. A common choice is the sigmoid function, which squashes any input into a value between 0 and 1, but I'll explain that choice in a later post.
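To make the x → f(x) → y picture concrete, here is a minimal sketch of a single unit in Python. It assumes the sigmoid is the activation function; the names `sigmoid` and `neuron` are mine, chosen for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(x: float) -> float:
    """A single unit: take the input x, apply the activation function, return y."""
    return sigmoid(x)


# Feed a few sample inputs through the unit and print the outputs.
for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.1f} -> y = {neuron(x):.3f}")
```

Negative inputs land below 0.5, positive inputs land above it, and an input of exactly 0 gives 0.5.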
What is a Neural Network?
A neural network is nothing but a collection of neurons linked together, in the sense that the output of one becomes the input of another. We say that a unit "passes" its output to another unit. The units are organized into "layers", where every unit in one layer is connected to every unit in another (usually the next) layer. The classic diagram of a basic neural network is shown below. The units on the left are input units (they each receive a numerical input) and together they form the "input layer" of the network. The middle layer is called a "hidden layer". Each unit in the input layer passes a value to each unit in the hidden layer. The hidden layer, in turn, passes values to each of the output units, which together form the "output layer".
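The "every unit connects to every unit in the next layer" idea can be sketched by simply listing the connections. The layer sizes below (3 input, 4 hidden, 2 output units) and the unit names are hypothetical, picked only to keep the printout small.

```python
from itertools import product

# Hypothetical layer sizes, chosen only for illustration.
layer_sizes = {"input": 3, "hidden": 4, "output": 2}


def fully_connect(from_layer: str, to_layer: str) -> list[tuple[str, str]]:
    """Return every (source unit, destination unit) pair between two layers."""
    sources = [f"{from_layer}_{i}" for i in range(layer_sizes[from_layer])]
    targets = [f"{to_layer}_{j}" for j in range(layer_sizes[to_layer])]
    return list(product(sources, targets))


input_to_hidden = fully_connect("input", "hidden")
hidden_to_output = fully_connect("hidden", "output")

print(len(input_to_hidden))   # 3 * 4 connections
print(len(hidden_to_output))  # 4 * 2 connections
print(input_to_hidden[:4])    # the connections leaving the first input unit
```

Each pair in these lists is one arrow in the usual network diagram.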
A few notes regarding these networks:
What is the network Input?
A feature is some measurable characteristic of the data you input to the network. For example, if the data is the television series someone has watched, a relevant feature might be the genre. The features of the input data contain the information used to determine the output of the network. Features are most commonly numbers in a vector, but they could also be text, such as letters and strings. They can also be something derived from the raw data, such as the number of black pixels in an image. The key point is that they must carry information that is relevant to the output of the network.
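As a rough sketch of how such features might become numbers, the snippet below turns a genre into a vector and counts the black pixels in a tiny image. The genre list, the one-hot encoding scheme and the helper names are my own assumptions for illustration; they are one common way to do it, not the only one.

```python
# Hypothetical genre list; a real dataset would define its own categories.
GENRES = ["comedy", "drama", "documentary", "sci-fi"]


def encode_genre(genre: str) -> list[float]:
    """One-hot encode a genre: 1.0 at the matching position, 0.0 elsewhere."""
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre!r}")
    return [1.0 if g == genre else 0.0 for g in GENRES]


def count_black_pixels(image: list[list[int]]) -> int:
    """Count the pixels equal to 0 in a small grayscale image (0 means black)."""
    return sum(pixel == 0 for row in image for pixel in row)


print(encode_genre("drama"))

tiny_image = [[0, 255, 0], [255, 255, 0]]
print(count_black_pixels(tiny_image))
```

Either result could then be handed to the input layer, one number per input unit.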
What is the network Output?
The output can be whatever information the programmer finds relevant. Generally speaking, the output is a statistical prediction. Examples include what an image would look like without noise, the odds that a sports team will win, or a probability or confidence level for a classification problem (is this an image of a cat or not?).
Both the input and the output will depend on the specific purpose of the neural network. The key is that the output is sufficient for the purpose of the network and that the input contains sufficient information to determine the output. For example, images of clouds might be a suitable input for a network whose output is the probability of rain. On the other hand, a poor choice would be to use the number of times the letter "s" appears in an article to predict the age of its readers. It is plausible that the appearance of clouds contains the information required to estimate the probability of rain. It is, however, implausible that the number of times "s" is used has anything to do with the age of the readers.
Now you have a basic understanding of what a neural network is! Hopefully it seems a lot less spooky, but no less interesting; these things are incredible.
So, we have established that each unit in the input layer is fed a value and that each passes its output to every unit in the next layer. If you were thinking critically, you may be wondering whether all networks therefore do the same thing. Of course, the answer is no!
The reason we do not have this problem is a set of parameters called "weights". Every connection between two units is assigned a numeric weight, which acts as a coefficient that multiplies the value being passed between the two units. These weights can be modified, which is what allows the network to "learn". By adjusting these weights strategically, we are able to make neural networks do all the crazy things we see in the media, such as driving cars and making paintings. Of course, how we arrange units into layers, and layers into networks, is somewhat complicated, but fundamentally they are as described above.
Here is a diagram showing four inputs being multiplied by their respective weights before they are summed to form the value which is input into the activation function. In the previous diagram, each unit in the hidden and output layers would act like this, even though the weights weren't shown.
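The same idea in code, as a minimal sketch: four inputs are multiplied by their weights, summed, and the sum is passed through the activation function. The sigmoid activation and all of the numbers below are illustrative assumptions, not values from any trained network.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_neuron(inputs: list[float], weights: list[float]) -> float:
    """Multiply each input by its weight, sum the products, then activate."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    z = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(z)


inputs = [1.0, 2.0, 3.0, 4.0]

# Two hypothetical weight settings for the same four inputs.
weights_a = [0.5, -0.5, 0.25, 0.0]  # weighted sum = 0.5 - 1.0 + 0.75 + 0.0 = 0.25
weights_b = [0.0, 0.0, 0.0, 0.0]    # weighted sum = 0.0

print(f"{weighted_neuron(inputs, weights_a):.3f}")
print(f"{weighted_neuron(inputs, weights_b):.3f}")
```

The inputs are identical in both calls; only the weights change, and so does the output. That is why two networks with the same layout can behave differently.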
I hope that this article was able to explain the fundamental building blocks of AI and clarify a couple of the associated buzzwords. I want to keep these fundamentals posts short so that readers will not feel bogged down in long strings of ideas. That said, at least for the first few posts, each one will rely heavily on the previous one. For example, in the next article I am going to explain the steps the network follows to strategically modify the weights so that a network of functions can perform a useful task, such as de-noising an image.