# Neural Network in Never

## Introduction

Never is a functional programming language which includes matrices as first class objects. It is very likely that you hear about Never for the first time. I will demonstrate its major functions by implementing a simple neural network. In fact it is an example of neural network basic component known as perceptron. I will try to explain everything step by step so you can follow the article easily.

## Sigmoid

``````func sigmoid(x : float) -> float
{
1.0 / (1.0 + exp(-x))
}
``````

Almost all programming languages include a concept of a function. In the above example function `sigmoid` takes `x` as formal parameter and returns function value. The function transforms almost all values below zero to zero and almost all values above zero to one. It is used in neural networks to have zero-one results for parameters.

## Linear Congruential Generator

``````func randomize(seed : int) -> () -> int
{
let v = seed;
func rand() -> int
{
v = (v * 1103515245 + 12345) % 2147483648
}
rand
}
``````

Functions in Never can be nested to any level. They can also access parameters or local variables defined in their syntactic scope. In the above example function `rand` can access and modify variable `v` defined in the scope of function `randomize`. Each time the function is invoked variable `v` will be modified and can be treated as local counter.

Functions are also first class objects which means they can be accepted or returned by other functions. In the above example function `rand` is returned by function `randomize`. What is also noticeable is that `v` becomes closed in the scope of function returned. Because of that sometimes functions such as `rand` are called closures.

In particular function `rand` is a linear congruential generator or in simple words a function which generates pseud-random values. It will be used later to initialize neural network.

## Matrix Algebra

``````func print_matrix(W[D1, D2] : float) -> int
{
let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
prints(W[r, c] + " ")
};
prints("\n")
}
}
``````

Never supports matrices which can be passed and returned by functions as any other type. In the above example function `print_matrix` takes two dimensional array `W` as its formal parameter. Function dimensions are accessible by parameters `D1` and `D2`. Such matrix passing style is known as conformant arrays and lets to avoid problems of accessing matrces elements out of their bounds. The above function prints matrix elements in row order. First all elements in columns of row 0, next of row 1, and so on. Function `prints` takes string as parameter and prints it on console. Addition operator is overloaded (which means it can take arguments of different types) and in the above case adds float and string parameter by changing float into its string representation.

``````func one_matrix(W[D1, D2] : float) -> int
{
let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
W[r, c] = 1.0
}
}
}
``````

Functions can also modify passed parameters. The above function initializes all matrix elements to one. Later this matrix will be used to perform matrix subtraction.

``````func rand_matrix(W[D1, D2] : float, rand() -> int) -> int
{
let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
W[r, c] = (rand() % 1000) / 1000.0
}
}
}
``````
``````func sigmoid_matrix(W[D1, D2] : float) -> [_,_] : float
{
let S = {[ D1, D2 ]} : float;

let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
S[r, c] = sigmoid(W[r, c])
}
};
S
}
``````

Of course we may initialize matrix elements to any value. Function `rand_matrix` assigns values generated by function `rand` to matrix elements. The example also presents how function `rand` is passed as a parameter. Second example shows how function `sigmoid_matrix` calculates sigmoid values for each matrix elements.

``````func T_matrix(W[D1, D2] : float) -> [_,_] : float
{
let T = {[ D2, D1 ]} : float;

let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
T[c, r] = W[r, c]
}
};
T
}
``````

One of common matrix operation is transposition. The function `T_matrix` returns matrix whose element at index `[r, c]` is placed at index `[c, r]` in the returned matrix.

``````func Hadamard_matrix(W1[D1, D2] : float, W2[D3, D4] : float) -> [_,_] : float
{
let H = {[ D1, D2 ]} : float;

let r = 0;
for (r = 0; r < D1; r = r + 1)
{
let c = 0;
for (c = 0; c < D2; c = c + 1)
{
H[r, c] = W1[r, c] * W2[r, c]
}
};
H
}
``````

Matrices can also be multiplied. Hadamard multiplication is a special kind of multiplication where each element of matrix `W1` is multiplied by corresponding element of matrix `W2`. Of course each matrix should have same dimensions size. Matrix multiplication which uses dot product to calculate values of its elements is implemented in Never by operator `*`.

As all basic building operations are described we can implement neural network.

## Forward Propagation Supervised learning is one of methods used to generate neural network. The algorithm has set of inputs and set of expected results. The difference between expected result and output generated by the network is used to change neural network parameters. The algorithm may be formalized in matrix algebra.

As usual lets start with basic components which are needed. As stated above we need set of inputs `x` and expected results `y`. They may be expressed in Never as follows:

``````    let x = [ [0, 1, 0],
[1, 0, 0],
[1, 1, 1],
[0, 0, 1] ] : float;
``````
``````    let y = [ [1, 0, 1, 0] ] : float;
let yT = T_matrix(y);
``````

Expected results `y` are single values. To simplify notation instead of typing `y = [ , , ,  ]` it is more convenient to define vector `y = [ 1, 0, 1, 0 ]` and transpose it. As you may notice it is expected the the network will ignore boundary values `x0`, `x2` and return only value of `x1`.

Input weights `W` are first initialized to random values. The following code can set `[3, 1]` matrix.

``````    let W = {[ 3, 1 ]} : float;
let rand = randomize(165);
rand_matrix(W, rand);
``````

Now output calculation formula is simple. What is also worth noticing is that array `s` contains four values for each input sample.

``````    let z = {[ 4, 1 ]} : float;
let s = {[ 4, 1 ]} : float;

z = x * W;
s = sigmoid_matrix(z);
``````

This step finishes forward propagation. Actually, if values in matrix `W` were correct then we would get also proper output values. Unfortunatelly they need to adjusted by backpropagation phase.

## Backpropagation

The network needs to pass several learning cycles to change random values `W` set in initial phase into such which will give smallest error and thus expected results.

Lets recall that matrix `s` contains output values for each input value. Error can be calculated by subtracting each element of `yT` from `s`. Of course we keep these values in a matrix.

``````    let err = {[ 4, 1 ]} : float;
err = yT - s;
``````

Now we should correct weights in matrix `W` so that values for each input will lead to smallest error. For that sigmoid derivation is used. One may easily check that first derivative of function `s = 1.0 / (1.0 + exp(-x))` is `sD = s * (1 - s)`. First derivative multiplied by error (or function gradient) lets to move towards minimum of a function. This method is known as gradient descent. Please note that we want to correct values in `W` by subtracting error. Also we want to move in gradient's opposite direction. Thus plus sign in the equation. To summarize backpropagation equation is `W = W + xT * err * sD`.

All calculation are done for each input value at once. Matrix of input values is given by transpose of matrix `T`. To multiply values by their corresponding elements matrix Hadamard multiplication is used. Value `1` is replaced by matrix with all elements initialized to `1`.

``````    let sD = {[ 4, 1 ]} : float;
let one = {[ 4, 1 ]} : float;

sD = Hadamard_matrix(s, one - s);
W = W + xT * Hadamard_matrix(err, sD)
``````

Learning cycles are executed using loop over forward and backpropagation phases.

``````    let i = 0;
for (i = 0; i < 1000; i = i + 1)
``````

## Listing

``````func nn() -> int
{
let W = {[ 3, 1 ]} : float;
let one = {[ 4, 1 ]} : float;
let rand = randomize(165);

one_matrix(one);
rand_matrix(W, rand);

let z = {[ 4, 1 ]} : float;
let s = {[ 4, 1 ]} : float;

let i = 0;
for (i = 0; i < 1000; i = i + 1)
{
let x = [ [0.0f, 1.0f, 0.0f],
[1.0f, 0.0f, 0.0f],
[1.0f, 1.0f, 1.0f],
[0.0f, 0.0f, 1.0f] ] : float;

z = x * W;
s = sigmoid_matrix(z);

let y = [ [1.0f, 0.0f, 1.0f, 0.0f] ] : float;
let yT = T_matrix(y);
let err = yT - s;
let sD = Hadamard_matrix(s, one - s);

W = W + T_matrix(x) * Hadamard_matrix(err, sD)
};

z = ([[ 0.0f, 1.0f, 0.0f ]] : float) * W;
s = sigmoid_matrix(z);
print_matrix(s);

z = ([[ 0.0f, 1.0f, 0.0f ]] : float) * W;
s = sigmoid_matrix(z);
print_matrix(s);

0
}
``````

The above code presents the whole network learning algorithm on one listing.

## Beginning at the End

``````func main() -> int
{
nn();
}
``````

All code in Never begins in `main` function. Thus we only execute function `nn` inside it.

[ X ] y expected
[ 1, 1, 1 ] 0.96 1.00
[ 1, 1, 0 ] 1.00 1.00
[ 1, 0, 1 ] 0.00 0.00
[ 1, 0, 0 ] 0.05 0.00
[ 0, 1, 1 ] 1.00 1.00
[ 0, 1, 0 ] 1.00 1.00
[ 0, 0, 1 ] 0.05 0.00
[ 0, 0, 0 ] 0.50 0.00

Of course it is interesting to verify what neural network responses are. I prepared a code which outputs values for every possible input. As you may see in seven out of eight cases the network returns expected values.

## Summary

The article demonstrates how Never language can be used to define a simple neural network. Also supervised neural network learning algorithm is demonstrated. I hope you liked the article and learnt something useful about neural networks, matrix algebra or Never programming language.