What the network represents

The last four lessons built the machine. Now the question is what it can express. When a network ends in an output used to pick a class, it divides the input space into regions, one per class. The surface separating those regions is the decision boundary, and its shape determines which problems the network can solve.

A single neuron can only split the input space with a straight line. Stacking layers with nonlinear activations bends that boundary into curves.

A neuron draws a line

A neuron labels a point by the sign of its sum $z = \mathbf{w}\cdot\mathbf{x} + b$. The points where it switches sign are the points where $z = 0$. In two dimensions:

$$w_1 x_1 + w_2 x_2 + b = 0$$

$x_1, x_2$ are the two coordinates of the input point.
$w_1, w_2$ are the weights, and $b$ is the bias.
The set of points satisfying this equation is a straight line. On one side $z > 0$, on the other $z < 0$.
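To see why this is a line, the equation can be solved for $x_2$. This is a short worked step rather than anything new, and it assumes $w_2 \neq 0$:

$$x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{b}{w_2}$$

This is the familiar slope-intercept form, with slope $-w_1/w_2$ and intercept $-b/w_2$. For the illustrative values $w_1 = w_2 = 1$ and $b = -1.5$, it reads $x_2 = -x_1 + 1.5$. If instead $w_2 = 0$ and $w_1 \neq 0$, the boundary is the vertical line $x_1 = -b/w_1$.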

If the neuron uses a sigmoid and we threshold its output at $0.5$, the boundary is the same line, because $\sigma(z) = 0.5$ exactly when $z = 0$. A single neuron is a straight-line classifier and nothing more. Here it is implementing the AND pattern, where only the point $(1,1)$ is class 1:

# a single neuron used as a linear classifier
w = [1.0, 1.0]
b = -1.5

# z > 0 means the point lies on the positive side of the line w . x + b = 0
def classify(point):
    z = sum(w_i * x_i for w_i, x_i in zip(w, point)) + b
    return 1 if z > 0 else 0

# test the four corners of the unit square
for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, "->", classify(point))
# (0, 0) -> 0
# (0, 1) -> 0
# (1, 0) -> 0
# (1, 1) -> 1

One straight line cleanly separates that pattern, because the single class-1 corner sits on its own side of a line.
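The earlier claim that thresholding a sigmoid at $0.5$ gives the same boundary as checking the sign of $z$ can be verified numerically. The sketch below reuses the AND weights from the example above. The helper names `sigmoid`, `weighted_sum`, `classify_by_sign`, and `classify_by_sigmoid` are introduced here for illustration only.

```python
import math

# same hand-picked AND weights as in the example above
w = [1.0, 1.0]
b = -1.5

def sigmoid(z):
    # logistic function: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def weighted_sum(point):
    # z = w . x + b
    return sum(w_i * x_i for w_i, x_i in zip(w, point)) + b

def classify_by_sign(point):
    # class 1 when z is positive
    return 1 if weighted_sum(point) > 0 else 0

def classify_by_sigmoid(point):
    # class 1 when the sigmoid output exceeds 0.5
    return 1 if sigmoid(weighted_sum(point)) > 0.5 else 0

corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
for point in corners:
    print(point, classify_by_sign(point), classify_by_sigmoid(point))
```

Both rules assign the same label to every corner, because the sigmoid is increasing and passes through $0.5$ at $z = 0$.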

Curves need depth

Now change the target to XOR, which is class 1 when the two inputs differ: $(0,1)$ and $(1,0)$ are class 1, while $(0,0)$ and $(1,1)$ are class 0. The two class-1 points sit on one diagonal and the two class-0 points on the other. Any straight line you draw keeps one class-1 point together with a class-0 point, so a single neuron cannot do it.
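The same conclusion follows algebraically. As a short worked argument, suppose a single neuron with the rule "class 1 when $z > 0$" did classify XOR correctly. The four points would require:

$$
\begin{aligned}
(0,0) \text{ is class 0:} \quad & b \le 0 \\
(0,1) \text{ is class 1:} \quad & w_2 + b > 0 \\
(1,0) \text{ is class 1:} \quad & w_1 + b > 0 \\
(1,1) \text{ is class 0:} \quad & w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the two class-1 conditions gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$. That contradicts the last condition, so no choice of $w_1, w_2, b$ works.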

This is the geometric payoff of lesson 2. Without a nonlinear activation, a stack of layers is still a single linear function of the input, so its boundary collapses back to one line and the extra layers change nothing. With a nonlinearity between layers, the network composes several lines into a bent boundary that a single line cannot produce.
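The collapse can be checked directly. The sketch below stacks two layers with no activation between them and compares the result with one equivalent neuron. All weights here (`W1`, `b1`, `W2`, `b2`) are arbitrary values chosen for illustration, not taken from the lesson.

```python
# hypothetical weights for two stacked layers with NO activation between them
W1 = [[2.0, -1.0], [0.5, 3.0]]  # first layer: 2 inputs -> 2 hidden units
b1 = [1.0, -2.0]
W2 = [1.0, -1.0]                # second layer: 2 hidden units -> 1 output
b2 = 0.5

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def two_linear_layers(x):
    # hidden values are plain weighted sums, with no nonlinearity applied
    hidden = [dot(row, x) + b_j for row, b_j in zip(W1, b1)]
    return dot(W2, hidden) + b2

# fold both layers into one neuron: w = W2 . W1 and b = W2 . b1 + b2
w_single = [sum(W2[j] * W1[j][i] for j in range(2)) for i in range(2)]
b_single = dot(W2, b1) + b2

def single_neuron(x):
    return dot(w_single, x) + b_single

for x in [(0, 0), (0, 1), (1, 0), (1, 1), (2.5, -4.0)]:
    print(x, two_linear_layers(x), single_neuron(x))
```

The two functions agree on every input, so the two-layer stack has the same straight-line boundary as the single neuron with weights `w_single` and bias `b_single`.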

One straight line cannot separate XOR (left); stacking layers bends the boundary into a region that holds both class-1 points and excludes both class-0 points (right).

The right panel shows a boundary made of two lines forming a diagonal band, the kind of shape a small hidden layer can produce. It holds both class-1 points inside and leaves both class-0 points outside.
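One way to build such a band is with two hidden neurons, each drawing one of the two lines, and an output neuron that fires only between them. The sketch below is a minimal illustration under stated assumptions: the weights are picked by hand, a hard step stands in for the activation, and the names `step` and `xor_net` are hypothetical.

```python
def step(z):
    # hard threshold used as the nonlinearity (an assumption for this sketch)
    return 1 if z > 0 else 0

def xor_net(point):
    x1, x2 = point
    # hidden neuron 1: fires above the line x1 + x2 = 0.5
    h_lower = step(x1 + x2 - 0.5)
    # hidden neuron 2: fires above the line x1 + x2 = 1.5
    h_upper = step(x1 + x2 - 1.5)
    # output: fires only when the point is above the lower line
    # and not above the upper line, i.e. inside the band
    return step(h_lower - h_upper - 0.5)

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, "->", xor_net(point))
# (0, 0) -> 0
# (0, 1) -> 1
# (1, 0) -> 1
# (1, 1) -> 0
```

The two hidden neurons are each straight-line classifiers like the AND neuron. The step between the layers is what lets the output combine them into a band instead of collapsing to a single line.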

There is still a gap. We chose the AND weights by hand, but we have not said how a network finds the weights that produce a boundary like the one on the right.

In the next lesson, we will define a loss function: a single number that measures how wrong the network’s outputs are, which is the first step toward learning those weights.