Building Your First Neural Network in Python from Scratch
The best way to understand neural networks is to build one yourself. In this tutorial, we'll create a complete neural network from scratch using only NumPy — no PyTorch, no TensorFlow, just pure math and Python.
Prerequisites
Basic Python knowledge and an understanding of linear algebra (matrix multiplication). Familiarity with derivatives is helpful but not required.
Step 1: Define the Network Structure
We'll build a simple 3-layer network: an input layer with two units, one hidden layer with four neurons, and an output layer with a single neuron. Our goal is to train it to learn the XOR function, a classic problem that a single perceptron cannot solve.
import numpy as np
# XOR dataset
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([[0], [1], [1], [0]])
# Initialize weights randomly
np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.5
b2 = np.zeros((1, 1))
Step 2: Define Activation Functions
We need the sigmoid function for our activations and its derivative for backpropagation:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)
Step 3: The Training Loop
Now we put it all together — forward pass, loss computation, backward pass, and weight update — repeated for many iterations:
learning_rate = 1.0
for epoch in range(10000):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)
    # Compute loss (MSE)
    loss = np.mean((a2 - y) ** 2)
    # Backward pass
    dz2 = (a2 - y) * sigmoid_derivative(z2)
    dW2 = a1.T @ dz2 / 4
    db2 = np.sum(dz2, axis=0) / 4
    dz1 = (dz2 @ W2.T) * sigmoid_derivative(z1)
    dW1 = X.T @ dz1 / 4
    db1 = np.sum(dz1, axis=0) / 4
    # Update weights
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
print(f"Predictions: {a2.round(2).flatten()}")
# Output: [0.02 0.98 0.98 0.02] ≈ XOR!
Understanding What Happened
After 10,000 iterations, our network has learned a near-perfect approximation of XOR: every prediction is within a few hundredths of its target. The hidden layer learned intermediate features that, when combined by the output layer, produce the correct XOR output.
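To make the idea of intermediate features concrete, here is a hand-constructed (not learned) set of weights for the same kind of network, shrunk to two hidden neurons: one approximates OR, the other AND, and the output combines them as "OR and not AND". The specific weight values are illustrative choices for saturating the sigmoid, not what training finds:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden neuron 1 approximates OR(x1, x2); neuron 2 approximates AND(x1, x2).
# Weight magnitude 20 just pushes the sigmoid into saturation.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([[-10.0, -30.0]])   # OR threshold, AND threshold

# Output fires when OR is on and AND is off: XOR = OR AND (NOT AND)
W2 = np.array([[20.0], [-20.0]])
b2 = np.array([[-10.0]])

h = sigmoid(X @ W1 + b1)
out = sigmoid(h @ W2 + b2)
print(out.round().flatten())  # -> [0. 1. 1. 0.]
```

Training discovers features like these on its own; they usually won't be this clean, but the division of labor between layers is the same.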
This is the fundamental power of neural networks: by stacking layers of simple operations, they can learn to represent complex, non-linear functions. A single layer couldn't solve XOR, but two layers can — and deeper networks can solve far more complex problems.
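You can check the single-layer limitation directly: the same training loop with the hidden layer removed (one sigmoid unit reading the raw inputs) gets stuck, because no linear decision boundary separates the XOR classes. A minimal sketch:

```python
import numpy as np

# XOR dataset, as above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(42)
W = np.random.randn(2, 1) * 0.5   # single layer: inputs straight to output
b = np.zeros((1, 1))

for epoch in range(10000):
    a = sigmoid(X @ W + b)
    dz = (a - y) * a * (1 - a)    # a * (1 - a) is the sigmoid derivative
    W -= 1.0 * (X.T @ dz / 4)
    b -= 1.0 * (np.sum(dz, axis=0) / 4)

loss = np.mean((a - y) ** 2)
print(loss)  # stuck near 0.25 -- the best a linear boundary can do on XOR
```

An MSE of 0.25 is what you get by predicting 0.5 for everything, i.e. no better than guessing.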
Key Takeaways
- Neural networks are just chains of matrix multiplications and non-linear activation functions.
- Weight initialization matters — random initialization breaks symmetry and allows different neurons to learn different features.
- The training loop (forward → loss → backward → update) is the same whether you have 4 parameters or 175 billion.
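One habit worth adopting early is verifying backpropagation with a numerical gradient check: perturb one weight, measure how the loss changes, and compare against the analytic gradient. One detail to notice: the update rules above compute the gradient of ½ · MSE (the factor of 2 from differentiating the square was dropped, which is harmless because it is absorbed into the learning rate), so the sketch below checks against that scaled loss:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.5
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(W2):
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return 0.5 * np.mean((a2 - y) ** 2)  # the loss our updates actually descend

# Analytic gradient for W2, as in the training loop (sigmoid' = a2 * (1 - a2))
a1 = sigmoid(X @ W1 + b1)
a2 = sigmoid(a1 @ W2 + b2)
dz2 = (a2 - y) * a2 * (1 - a2)
dW2 = a1.T @ dz2 / 4

# Central-difference estimate for one entry of W2
eps = 1e-5
Wp, Wm = W2.copy(), W2.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numerical = (loss_fn(Wp) - loss_fn(Wm)) / (2 * eps)

print(abs(numerical - dW2[0, 0]))  # agreement to many decimal places
```

If the two values disagree by more than roughly 1e-6, there is almost certainly a bug in the backward pass.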
Next Steps
Now that you've built a network from scratch, try modifying it! Add more hidden layers, experiment with different learning rates, or try a different dataset. Then explore our backpropagation deep-dive for the mathematical foundations.
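As a starting point for the first suggestion, here is a sketch with a second hidden layer; the width of 4 for both hidden layers is an arbitrary choice. Notice that each extra layer adds exactly one matmul-plus-activation to the forward pass and one propagation step to the backward pass:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.5; b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 4) * 0.5; b2 = np.zeros((1, 4))
W3 = np.random.randn(4, 1) * 0.5; b3 = np.zeros((1, 1))

learning_rate = 1.0
for epoch in range(10000):
    # Forward: one extra layer means one extra matmul + activation
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    a3 = sigmoid(a2 @ W3 + b3)
    # Backward: the same dz -> dW -> propagate pattern, once per layer
    dz3 = (a3 - y) * a3 * (1 - a3)
    dz2 = (dz3 @ W3.T) * a2 * (1 - a2)
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
    W3 -= learning_rate * (a2.T @ dz3 / 4); b3 -= learning_rate * dz3.sum(0) / 4
    W2 -= learning_rate * (a1.T @ dz2 / 4); b2 -= learning_rate * dz2.sum(0) / 4
    W1 -= learning_rate * (X.T @ dz1 / 4);  b1 -= learning_rate * dz1.sum(0) / 4

print(np.mean((a3 - y) ** 2))
```

Convergence isn't guaranteed for every seed, depth, or learning rate, so expect to tune those as you experiment; that tuning is itself a useful lesson.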