The Smallest Brain You Can Build

A perceptron is the smallest brain you can build. A number goes in. A yes-or-no answer comes out. That’s the whole idea.

This seems very simple. But this small idea is the seed of every neural network running today. In this post we build a perceptron from scratch in Python, and we watch it learn, live, in your browser. No heavy math. No big library. Just a weight, a bias and a loop.

I’m not a native English speaker, and I’m still learning this area myself. So I’ll explain it like I needed someone to explain it to me. Slowly, and from the ground up.

What is a perceptron?#

In 1958, a researcher named Frank Rosenblatt built a machine he called the perceptron.

It was inspired by a brain cell, a neuron. A neuron receives signals, and if those signals are strong enough, it activates. Rosenblatt copied that idea into mathematics:

output = 1   if (w · x + b) > 0
         0   otherwise

Here x is the input, w is the weight, and b is the bias. Don’t worry about those words right now. We’ll meet each of them by building something real.
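If you read code more easily than formulas, here is the same rule as a tiny Python function. It is only a sketch: the name perceptron_output and the example numbers are my own, picked for illustration.

```python
def perceptron_output(x: float, w: float, b: float) -> int:
    """Return 1 if w * x + b is above zero, otherwise 0."""
    return 1 if (w * x + b) > 0 else 0


# Hypothetical values, chosen only to show both outcomes.
print(perceptron_output(x=2.0, w=0.5, b=-0.5))  # 0.5 * 2.0 - 0.5 = 0.5, above zero
print(perceptron_output(x=0.5, w=0.5, b=-0.5))  # 0.5 * 0.5 - 0.5 = -0.25, not above zero
```

The first call gives 1 and the second gives 0. Same weight, same bias, only the input changed.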

Think like a person first#

Before a machine decides anything, let us watch a human decide. Meet John Doe. He has a job offer, and he must answer one question: should he take it?

John does not toss a coin. He weighs things. Some factors matter more to him than others.

Factor (input)          | Value               | How much John cares (weight)
Extra salary            | High                | Very much
Lives in the same city  | No, he has to move  | Very much

John multiplies each factor by how much he cares about it, then adds everything up. If the total is high enough, he says yes. If not, he says no.

That’s a perceptron. The factors are the inputs. How much he cares is the weight. And “high enough” is a threshold he keeps in his mind. Hold on to that threshold. Later we will give it a name: bias.

[Figure: a perceptron modeled as a neuron. Two inputs, Salary and Same City, are each multiplied by a weight (w₁, w₂), summed (Σ) with a bias b, and turned into a single yes-or-no output: take the offer?]
How John Doe makes decisions: Each input is multiplied by a weight, the results are summed with a bias, and the total becomes a yes-or-no answer.
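To make John’s decision concrete, here is a small sketch in Python. The post never gives numbers for his factors, so every value below is an assumption of mine: 1.0 means “yes/high”, 0.0 means “no”, and the bias of -0.5 stands for “the total must be above 0.5”.

```python
# Hypothetical numbers: none of these values come from John's table.
inputs = {"extra_salary": 1.0, "same_city": 0.0}    # big raise, but he has to move
weights = {"extra_salary": 0.8, "same_city": 0.8}   # he cares a lot about both
bias = -0.5                                         # the threshold, written as a bias

# Multiply each factor by its weight, add everything up, then add the bias.
total = sum(weights[name] * inputs[name] for name in inputs) + bias
decision = total > 0

print(f"total = {total:.2f}, take the offer? {'yes' if decision else 'no'}")
```

With these made-up numbers the raise alone is enough to push the total above zero, so John says yes. Lower the salary weight to 0.4 and he says no.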

The simplest possible decision: is this number positive?#

Let us shrink the problem until almost nothing is left. One input. One question.

Is this number positive?

This is it. Feed a number to the machine. It should answer true for positive and false for negative.

The machine makes its guess as follows:

prediction = (weight * value + bias) > 0

Multiply the input by the weight, add the bias, and check whether the result is above zero. If so, the machine predicts true. If not, it predicts false. This short formula is the classifier, also called the decision function.

Initially, the weight and bias are just random numbers. So the machine makes bad guesses. Now comes the only clever part: it learns from its mistakes.

if prediction != result:
    error = result - prediction      # True - False = 1, False - True = -1
    weight += learning_rate * error * value
    bias   += learning_rate * error

When the prediction is wrong, we nudge the weight and bias in the right direction. The error tells us which way to push. The learning rate decides how big each push is. We do this for each example, then repeat the entire pass. One complete pass over the data is called an epoch. Repeating epochs is training.

Here is that exact machine. Press Train and watch it learn. Each green dot is a positive number (true), each red dot is negative (false), and the blue dashed line is where it decided to split them.
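It helps to follow the update rule by hand on one example. This sketch uses starting values I made up (weight -0.5, bias 0.2) and repeats the correction on a single positive number until the machine gets it right.

```python
learning_rate = 0.1
weight, bias = -0.5, 0.2       # hypothetical starting values, deliberately bad
value, result = 3.0, True      # 3.0 is positive, so the right answer is True

updates = 0
while ((weight * value + bias) > 0) != result:
    prediction = (weight * value + bias) > 0
    error = result - prediction                 # True - False = 1
    weight += learning_rate * error * value     # each push adds 0.3 to the weight
    bias += learning_rate * error               # each push adds 0.1 to the bias
    updates += 1
    print(f"update {updates}: weight = {weight:.2f}, bias = {bias:.2f}")

print(f"correct after {updates} updates")
```

The first push is not enough (the weight is still negative), and the second one flips the answer. Real training does the same thing, just spread over many examples.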

[Interactive demo: “Is this number positive?” The readout shows epoch, weight, bias, boundary, and accuracy.]



It snaps into place almost instantly. Look at the readout: the boundary sits right around 0, and the bias stays close to 0 too.

That is no accident. For this problem, we never needed the bias. Which is strange, because bias is supposed to be important. To see why it matters, we need a harder question.

What is the decision boundary?#

That blue line has a name: the decision boundary. This is the exact point where the machine switches from answering false to answering true.

We can calculate it. The boundary sits where w · x + b = 0. Solve for x:

decision_boundary = -bias / weight

The boundary for “is this number positive” should be at 0, and it is. Now see what happens when the right answer is not zero.
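A quick sketch to check that formula in code. The helper names and the numbers (weight 2.0, bias -3.0) are hypothetical. I also added a guard for a weight of zero, because then there is no boundary to compute.

```python
def decision_boundary(weight: float, bias: float) -> float:
    """Solve weight * x + bias = 0 for x."""
    if weight == 0:
        raise ValueError("a weight of zero has no decision boundary")
    return -bias / weight


def predict(value: float, weight: float, bias: float) -> bool:
    return (weight * value + bias) > 0


weight, bias = 2.0, -3.0                      # hypothetical trained values
boundary = decision_boundary(weight, bias)    # 3.0 / 2.0 = 1.5
print(f"boundary = {boundary}")

# Just below the boundary the answer is False, just above it is True.
print(predict(1.4, weight, bias), predict(1.6, weight, bias))
```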

Why do we need a bias? The student-pass example#

New problem. Same machine. We give it a test score from 0 to 100, and we ask:

Did the student pass?

The rule is simple: a score of 50 or higher passes. So the decision boundary should sit at 50, not at 0.

Let us try to solve it the same way we solved the last one, using only a weight. In the demo below, turn off “Use Bias” and press Train.

[Interactive demo: “Did the student pass?” with a “Use Bias” toggle. The readout shows epoch, weight, bias, boundary, and accuracy.]




Watch the accuracy. It climbs to about 50 percent and then gets stuck. It can’t do better, no matter how long you train it.

Here’s why. Without a bias, the formula is just weight * score. Every test score is a positive number. So if the weight is positive, the machine says every student passes. If the weight is negative, it says everyone fails. The boundary is stuck at 0 and cannot move. A line through zero cannot separate “below 50” from “50 and above.”

Now turn “Use Bias” back on and press Train again. Accuracy climbs to 100 percent, and the boundary moves over and parks close to 50.

That’s the whole job of the bias. The weight sets the slope. The bias slides the boundary left or right so that it sits where the answer actually is. Remember decision_boundary = -bias / weight. With a bias, the boundary can be anywhere. Without one, it’s stuck at zero forever.

One sentence to remember: when your inputs sit far from zero, you need a bias to move the line to them.
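You can check this without training anything. The sketch below assumes whole-number scores from 0 to 100 (my choice, the post does not say). It tries many weights with the bias frozen at 0, then tries one hand-picked weight and bias whose boundary is 49.5.

```python
# Assumption: one student for every whole score from 0 to 100.
data = [(score, score >= 50) for score in range(0, 101)]


def accuracy(weight: float, bias: float) -> float:
    correct = sum(((weight * score + bias) > 0) == passed for score, passed in data)
    return correct / len(data)


# Without a bias: try weights from -5.0 to 5.0 in steps of 0.1.
candidate_weights = [w / 10 for w in range(-50, 51)]
best_no_bias = max(accuracy(w, 0.0) for w in candidate_weights)

# With a bias: hand-picked values, boundary = -(-49.5) / 1.0 = 49.5.
with_bias = accuracy(1.0, -49.5)

print(f"best accuracy without bias: {best_no_bias:.1%}")
print(f"accuracy with weight=1.0, bias=-49.5: {with_bias:.1%}")
```

No weight gets past roughly half of the students on its own. The tiny bit above 50 percent comes from the score of exactly 0, which a positive weight still marks as a fail. The moment a bias is allowed, a perfect split exists.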

How does a perceptron learn? Epochs and learning rate#

During training you saw two dials: epochs and learning rate.

One epoch is one full pass over all the data. The machine rarely gets everything right in one go, so we go over the data again and again. More epochs means more chances to correct mistakes. This is why accuracy increases as you keep training.

The learning rate sets the size of each correction. In the code it is the learning_rate multiplier:

weight += learning_rate * error * value

Small steps are careful but slow. Big steps are fast but can overshoot and bounce around. Choosing it well is part of the craft. Here we used 0.1, which is gentle enough to stay stable.
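Here is a small sketch of what “too big” looks like. It applies one single correction with three different learning rates and prints where the boundary ends up. The starting weight and bias are made up, and the value 2.0 is there only to show a step that is clearly too large. For “is this number positive”, the right boundary is 0.

```python
def boundary_after_one_update(learning_rate: float) -> float:
    weight, bias = 0.5, -1.0      # hypothetical start: boundary at 2.0
    value, result = 1.0, True     # a positive number the machine gets wrong

    prediction = (weight * value + bias) > 0    # 0.5 - 1.0 is below zero: False
    error = result - prediction                 # True - False = 1
    weight += learning_rate * error * value
    bias += learning_rate * error
    return -bias / weight


for rate in (0.01, 0.1, 2.0):
    moved_to = boundary_after_one_update(rate)
    print(f"learning_rate = {rate}: boundary moves from 2.000 to {moved_to:.3f}")
```

With 0.01 the boundary barely moves. With 0.1 it takes a solid step toward 0. With 2.0 it jumps right past 0 and lands on the negative side, so the next mistake has to pull it back.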

Why do we normalize data?#

There is a quiet problem hidden in the student-pass example. Look at the update line again:

weight += learning_rate * error * value

The correction is multiplied by value. For exam scores, value can be as big as 100. So even a single wrong guess can swing the weight by a huge amount. The machine still learns, but it wanders around instead of settling smoothly.

The fix is normalization: shrink the input to a small, well-defined range before training. The simplest version is to divide each score by the largest possible score, so 0 to 100 becomes 0 to 1.

In the demo below, first press Train with normalization off and watch the accuracy line wobble its way upward. Then turn on “Normalize data”, reset, and train again. Same machine, same answer, but it gets there in a fraction of the epochs, and the climb is smooth.
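In code, dividing by the maximum is a one-liner. This sketch uses a few sample scores I invented and also prints how big a single correction is before and after.

```python
MAX_SCORE = 100
learning_rate = 0.1

scores = [0, 35, 50, 72, 100]                       # hypothetical sample scores
normalized = [score / MAX_SCORE for score in scores]
print(normalized)

# Size of one weight correction (error = 1) for the largest input.
raw_step = learning_rate * 1 * max(scores)
normalized_step = learning_rate * 1 * max(normalized)
print(f"raw step: {raw_step:.2f}, normalized step: {normalized_step:.2f}")
```

One thing to keep in mind: after normalizing, the pass mark is 0.5 instead of 50, so the boundary the machine learns lives on the 0 to 1 scale too.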

[Interactive demo: “Did the student pass?” with a “Normalize data” toggle. The readout shows epoch, weight, bias, boundary, and accuracy.]




An honest note. With a single input like this, normalization mostly gives you speed and peace of mind. It becomes essential when your inputs live on very different scales. Think about John Doe: his salary is measured in thousands of dollars, but “same city” is only 0 or 1. Without normalization, the dollars would drown out everything else, and the machine would basically ignore the city. Putting both on the same scale gives each factor a fair hearing. (Dividing by the maximum is the easy version; a common general method is to subtract the mean and divide by the spread, called standardization.)
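For the curious, standardization can be sketched with Python’s built-in statistics module. I am assuming “spread” means the population standard deviation here, and the salaries are invented.

```python
import statistics


def standardize(values):
    """Subtract the mean, then divide by the population standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # Every value is identical, so there is nothing to scale.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


salaries = [40_000, 50_000, 60_000]    # hypothetical salaries in dollars
same_city = [0, 1, 1]                  # hypothetical yes/no answers

print([round(z, 3) for z in standardize(salaries)])
print([round(z, 3) for z in standardize(same_city)])
```

After this step both lists are centered on zero with a spread of one, so dollars no longer shout louder than a simple 0 or 1.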

Full Perceptron in Python#

Here is the complete program for “Is this number positive”, with nothing hidden. It is short enough to read in one go.

import random

learning_rate = 0.1
EPOCHS = 100

weight = random.uniform(-1, 1)
bias   = random.uniform(-1, 1)

# positive numbers are True, negative numbers are False
data  = [(i * 0.1, True)  for i in range(1, 501)]
data += [(i * 0.1, False) for i in range(-500, 0)]
random.shuffle(data)

for epoch in range(EPOCHS):
    for value, result in data:
        prediction = (weight * value + bias) > 0
        if prediction != result:
            error = result - prediction          # +1 or -1
            weight += learning_rate * error * value
            bias   += learning_rate * error

decision_boundary = -bias / weight
print(f"weight = {weight:.3f}")
print(f"bias   = {bias:.3f}")
print(f"decision boundary = {decision_boundary:.3f}")

To turn it into a student-pass machine, you change two things: build the data from test scores with result = score >= 50, and, if you want to feel the pain of a missing bias, freeze the bias at 0. Everything else stays the same.
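Those two changes could look like the sketch below. The function name, the use_bias flag, the fixed seed, and the choice of whole-number scores from 0 to 100 are my assumptions. The scores are left unnormalized on purpose, so I am not promising any particular final numbers for the run with bias.

```python
import random


def train_student_pass(use_bias: bool, epochs: int = 100, seed: int = 0):
    """Train on raw scores 0..100 and return (weight, bias, accuracy)."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    learning_rate = 0.1
    weight = rng.uniform(-1, 1)
    bias = rng.uniform(-1, 1) if use_bias else 0.0   # change 2: freeze at 0

    # Change 1: the data is test scores, and 50 or higher passes.
    data = [(score, score >= 50) for score in range(0, 101)]
    rng.shuffle(data)

    for _ in range(epochs):
        for value, result in data:
            prediction = (weight * value + bias) > 0
            if prediction != result:
                error = result - prediction          # +1 or -1
                weight += learning_rate * error * value
                if use_bias:
                    bias += learning_rate * error

    correct = sum(((weight * v + bias) > 0) == r for v, r in data)
    return weight, bias, correct / len(data)


for use_bias in (False, True):
    weight, bias, accuracy = train_student_pass(use_bias)
    print(f"use_bias={use_bias}: weight={weight:.3f}, bias={bias:.3f}, accuracy={accuracy:.1%}")
```

With the bias frozen, the accuracy cannot rise much above one half, whatever the weight ends up being. With the bias on, try more epochs or normalized scores and watch how the result changes.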

Acknowledgements#

The main inspiration for this post came from the fantastic video “ChatGPT is made from 100 million of these [The Perceptron]” by Welch Labs. If you’re a visual learner and want to see the rich history and hardware behind these concepts, I highly recommend checking it out!

What comes next?#

You have just built a working perceptron. It takes an input, weighs it, adds a bias, and makes a decision. It learns from its mistakes one epoch at a time.

A single neuron can only draw one straight line. The magic starts when you stack them: the output of one neuron becomes the input of the next. Put enough layers of them together and you get a neural network that can learn shapes far more complex than a single line. But every single one of those neurons is doing exactly what you just saw. A weight, a bias, a decision.

If you want the non-technical story of how I ended up writing code in Canada, I wrote about it here: The Outsider Who Ships Anyway.

Thanks for building this with me. Now go change the numbers and break it. That is the fastest way to learn.


