From a neuron to a layer · NorthGradient

A single neuron turns a list of inputs into one number. That is rarely enough. A layer is simply many neurons placed side by side, all reading the same inputs at the same time, each producing its own output. The useful trick is that the whole layer can be written as a single equation.

A layer is many neurons reading the same inputs at once. Stacking their weights into a matrix lets one equation describe the entire layer.

Many neurons, same inputs

Give the same input vector $\mathbf{x}$ to $m$ different neurons. Neuron $j$ has its own weights $\mathbf{w}_j$ and its own bias $b_j$, so it computes its own sum exactly as before:

$$z_j = \mathbf{w}_j \cdot \mathbf{x} + b_j$$

Here $z_j$ is neuron $j$’s raw sum, $\mathbf{w}_j$ is that neuron’s weight vector, and $b_j$ is its bias. Nothing here is new: it is lesson 1 repeated once per neuron.
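To make "lesson 1 repeated once per neuron" concrete, here is a minimal sketch that computes each $z_j$ separately with a plain dot product. The helper names `dot` and `neuron_sum` are hypothetical, chosen only for illustration, and the weights and biases are the ones from the worked example later in this lesson.

```python
# Hypothetical helpers for illustration: one neuron's raw sum, z_j = w_j . x + b_j

def dot(u, v):
    """Dot product of two equal-length lists of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def neuron_sum(w_j, b_j, x):
    """Raw sum of a single neuron: its weights dotted with the input, plus its bias."""
    return dot(w_j, x) + b_j

x = [2.0, 3.0]                     # one input vector, shared by every neuron
neurons = [([0.5, -1.0], 1.0),     # (w_1, b_1)
           ([1.0, 0.5], -2.0)]     # (w_2, b_2)

# The same formula, repeated once per neuron
z = [neuron_sum(w_j, b_j, x) for w_j, b_j in neurons]
print(z)  # [-1.0, 1.5]
```

Each neuron is handled independently here; the only thing the neurons share is `x`.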

One equation for the whole layer

Writing $m$ separate sums is tedious. Instead, stack every neuron’s weights as the rows of a single weight matrix $W$, and stack the biases into one bias vector $\mathbf{b}$. The whole layer becomes:

$$\mathbf{z} = W\mathbf{x} + \mathbf{b}, \qquad \mathbf{a} = \sigma(\mathbf{z})$$

Symbol by symbol:

$\mathbf{x} \in \mathbb{R}^{n}$ is the input vector, with $n$ inputs shared by every neuron.
$W \in \mathbb{R}^{m \times n}$ is the weight matrix. It has $m$ rows (one per neuron) and $n$ columns (one per input). The entry $W_{ji}$ is the weight from input $i$ into neuron $j$.
$\mathbf{b} \in \mathbb{R}^{m}$ is the bias vector, one bias per neuron.
$\mathbf{z} \in \mathbb{R}^{m}$ is the vector of raw sums. Its entry $j$ is exactly $z_j = \sum_{i=1}^{n} W_{ji} x_i + b_j$, which equals $\mathbf{w}_j \cdot \mathbf{x} + b_j$ because row $j$ of $W$ is $\mathbf{w}_j$.
$\sigma$ is the activation function (the sigmoid in the example below), applied to each entry on its own, so $a_j = \sigma(z_j)$.
$\mathbf{a} \in \mathbb{R}^{m}$ is the layer’s output: $m$ numbers, one per neuron.
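The matrix equation can be sketched in plain Python with no libraries. The function names `matvec` and `dense_layer` are hypothetical, and the small $3 \times 2$ matrix is made up so that $m \neq n$ and the shapes are easy to tell apart. The sketch checks that every row of $W$ has one weight per input and that there is one bias per neuron, then applies $\sigma$ entry by entry.

```python
import math

def matvec(W, x):
    """Compute W x for an m-by-n matrix W (a list of m rows) and a length-n vector x."""
    n = len(x)
    for row in W:
        if len(row) != n:
            raise ValueError("each row of W needs one weight per input")
    # entry j is sum_i W[j][i] * x[i]
    return [sum(W_ji * x_i for W_ji, x_i in zip(row, x)) for row in W]

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def dense_layer(W, b, x, activation=sigmoid):
    """z = W x + b, then a = activation(z) applied to each entry on its own."""
    if len(b) != len(W):
        raise ValueError("need exactly one bias per neuron (row of W)")
    z = [s + b_j for s, b_j in zip(matvec(W, x), b)]
    a = [activation(z_j) for z_j in z]
    return z, a

# Illustrative values: m = 3 neurons, n = 2 inputs, so W is 3x2 and b, z, a have 3 entries
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.0, 0.0, -1.0]
z, a = dense_layer(W, b, [2.0, 3.0])
print(z)       # [2.0, 3.0, 4.0]
print(len(a))  # 3
```

The output length follows the number of rows of $W$ (neurons), not the number of inputs.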

A layer: every input connects to every neuron, and each connection is one entry of the weight matrix $W$.
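The caption's claim that each connection is one entry of $W$ can be checked by listing the connections explicitly. This sketch uses the worked example's weights and prints one line per input-to-neuron connection. Note the index order: `W[j][i]` is row $j$ (the neuron) and column $i$ (the input), matching $W_{ji}$. Python lists count from 0, so the sketch adds 1 when printing to match the 1-based indices in the math.

```python
# Weights from the worked example: m = 2 neurons (rows), n = 2 inputs (columns)
W = [[0.5, -1.0],
     [1.0,  0.5]]
b = [1.0, -2.0]

m, n = len(W), len(W[0])

# One connection per (input i, neuron j) pair; its weight is W[j][i] (0-based)
connections = [(i, j, W[j][i]) for j in range(m) for i in range(n)]
for i, j, w in connections:
    # print 1-based indices to match the math notation W_ji
    print(f"input {i + 1} -> neuron {j + 1}: weight {w}")

# Every connection is one entry of W, so there are m * n weights, plus m biases
num_params = m * n + len(b)
print(len(connections), num_params)  # 4 6
```

A layer with $n$ inputs and $m$ neurons therefore has $mn$ weights and $m$ biases to learn.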

A worked example

Take two inputs $\mathbf{x} = [2, 3]$ and a layer of two neurons. Neuron 1 reuses lesson 1’s weights, and neuron 2 gets its own:

$$W = \begin{bmatrix} 0.5 & -1 \\ 1 & 0.5 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

The raw sums are $z_1 = (0.5)(2) + (-1)(3) + 1 = -1$ and $z_2 = (1)(2) + (0.5)(3) - 2 = 1.5$. Applying the sigmoid to each gives the layer output $\mathbf{a} = [\sigma(-1), \sigma(1.5)] \approx [0.269, 0.818]$. The same calculation in code:

import math

# two inputs, shared by every neuron in the layer
x = [2.0, 3.0]

# each row is one neuron's weights; this layer has two neurons
W = [[0.5, -1.0],
     [1.0,  0.5]]
# one bias per neuron
b = [1.0, -2.0]

# sigmoid activation, applied to one number
def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# for each neuron: weighted sum of inputs, add its bias, then activate
a = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
     for row, b_j in zip(W, b)]

print(a)  # [0.2689414213699951, 0.8175744761936437]

The first output matches the sigmoid value computed in lesson 2, which illustrates that a layer really is just the single neuron repeated and gathered into one equation.
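The claim that a layer is the single neuron repeated can be checked directly: define one neuron as a function, build the layer by calling it once per row of $W$, and compare with the values printed above. The names `neuron` and `layer` are hypothetical, used only for this sketch.

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

def neuron(w, b, x):
    """One neuron: weighted sum of the inputs, plus bias, then sigmoid."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def layer(W, b, x):
    """A layer: the same neuron function applied once per row of W."""
    return [neuron(row, b_j, x) for row, b_j in zip(W, b)]

x = [2.0, 3.0]
W = [[0.5, -1.0],
     [1.0,  0.5]]
b = [1.0, -2.0]

a = layer(W, b, x)
# Neuron 1 on its own gives the same number as the first entry of the layer output
print(a[0] == neuron(W[0], b[0], x))       # True
print(math.isclose(a[0], sigmoid(-1.0)))  # True
print(math.isclose(a[1], sigmoid(1.5)))   # True
```

Nothing in `layer` is new beyond the loop over rows, which is exactly what the matrix equation packages up.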

In the next lesson, we will feed one layer’s output into another layer, which is what makes a network deep.