
Creating a Handwriting Recognition Model Using CNNs and the MNIST Dataset

Have you ever wondered how AI-powered scanners can look at messy handwriting and instantly convert it into clean, editable text? What if you could build one yourself, a system that not only reads handwritten digits but also recognizes the full alphabet? That's exactly what I set out to do in this project: build a deep learning-powered OCR system from scratch, trained on the MNIST and EMNIST datasets.


WHY HANDWRITTEN CHARACTER RECOGNITION?

OCR for handwritten text is one of the oldest and most practical applications of AI. It powers everything from digitizing old forms to scanning handwritten notes into digital text. And yet many OCR systems struggle when the input isn't perfect: distorted digits, cursive letters, or variations in writing style.


I wanted to build a lightweight yet accurate neural network that could learn to recognize both digits (0–9) and letters (A–Z, a–z) using only raw image data.

Imagine an AI that looks at a 28x28 grayscale image and confidently tells you: “That’s a 7.” Or even better, “That’s a lowercase g.” That’s the kind of AI I built with just Python, TensorFlow, and a lot of curiosity.


THE BUILDING BLOCKS

To bring this project to life, I relied on a small stack of proven tools:

  • TensorFlow / Keras — to build and train deep learning models (CNNs)

  • MNIST & EMNIST datasets — for digits and letters, respectively

  • Matplotlib — to visualize predictions

  • OpenCV (optional) — to add camera-based inference or real-time OCR later

The core idea: build a Convolutional Neural Network (CNN) that detects the patterns in handwriting (loops, strokes, curves) and classifies them as the correct characters.
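
To make the input concrete, here is a minimal sketch of loading and normalizing the digit data, using the MNIST loader that ships with Keras (the EMNIST side needs a separate download, covered below):

    import tensorflow as tf

    # Load MNIST: 60,000 training and 10,000 test images,
    # each a 28x28 grayscale array with pixel values in [0, 255].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

    # Scale pixels to [0, 1] and add a channel axis so the shape
    # becomes (N, 28, 28, 1), which is what Conv2D layers expect.
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0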


THE CHALLENGES I FACED

1. EMNIST ≠ MNIST

While MNIST is super clean and well-known, EMNIST is more complex:

  • It’s split into multiple subsets (letters, balanced, byclass, etc.).

  • Characters are often tilted or faint.

  • Labels aren’t just 0–9 but map to ASCII characters.

How I fixed it: I used the EMNIST "Balanced" subset, which has 47 merged classes, and decoded the label indices into actual characters using the dataset's mapping file. I also preprocessed the images (rotating and flipping them), since raw EMNIST images are stored sideways relative to MNIST.
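
As a concrete illustration, here is one way such a pipeline can look. The post doesn't name a specific loader, so this sketch assumes the tensorflow_datasets package; the hard-coded class string mirrors the official emnist-balanced-mapping.txt file.

    import tensorflow as tf
    import tensorflow_datasets as tfds  # assumed loader, not named in the post

    # Load the EMNIST "balanced" split: 47 classes (digits, uppercase letters,
    # and the lowercase letters that don't merge with their uppercase forms).
    ds_train = tfds.load("emnist/balanced", split="train", as_supervised=True)

    # Raw EMNIST images are stored rotated and flipped relative to MNIST;
    # a transpose puts them upright. Inspect a few samples first, since
    # some loaders already correct the orientation.
    def fix_orientation(image, label):
        image = tf.transpose(image, perm=[1, 0, 2])  # swap height and width axes
        return tf.cast(image, tf.float32) / 255.0, label

    ds_train = ds_train.map(fix_orientation)

    # Label index -> character, following the official balanced mapping.
    CLASSES = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"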


2. CNN Architecture Tuning

I initially used a very deep CNN with lots of filters. While it performed well on MNIST, it overfit on EMNIST due to its complexity.

Solution: I reduced the number of layers, added dropout, and used early stopping. The final architecture was a compact but powerful CNN with just enough capacity to generalize.
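
The post doesn't list the exact layers, so the following is an illustrative sketch of a compact CNN in that spirit, with dropout and early stopping wired in:

    from tensorflow import keras
    from tensorflow.keras import layers

    NUM_CLASSES = 47  # EMNIST Balanced; use 10 for plain MNIST

    # Two small conv blocks, then dropout to curb overfitting.
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Stop once validation loss stops improving, keeping the best weights.
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)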


3. Prediction Confidence

When classifying the full range of digits and letters, some characters looked extremely similar: O vs 0, for example, or l vs 1.

Fix: I added softmax confidence filtering and printed the top-3 predictions for ambiguous cases, just as modern OCR engines do.
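
Here is a minimal sketch of that filtering, reusing the model and CLASSES lookup from the earlier sketches; the 0.90 cutoff is an illustrative choice, not a value from the project.

    import numpy as np

    CONFIDENCE_THRESHOLD = 0.90  # illustrative cutoff

    # images stands for any preprocessed batch shaped (N, 28, 28, 1).
    probs = model.predict(images)[0]   # softmax scores for one image
    top3 = np.argsort(probs)[::-1][:3] # indices of the 3 highest scores

    if probs[top3[0]] >= CONFIDENCE_THRESHOLD:
        print(f"Prediction: {CLASSES[top3[0]]} ({probs[top3[0]]:.2%})")
    else:
        # Ambiguous case: surface the runners-up instead of a single guess.
        for idx in top3:
            print(f"  candidate {CLASSES[idx]}: {probs[idx]:.2%}")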


4. Training Time and Memory

Training on EMNIST takes longer and uses more memory than training on MNIST.

Fix: I trained on smaller batches, used data generators, and saved checkpoints so interrupted training could resume.
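
Continuing the earlier sketches: a tf.data pipeline plays the role of the data generator here, and Keras's ModelCheckpoint callback handles resumable training. The batch size and file path are illustrative.

    # Shuffle, batch small, and prefetch so images stream through memory
    # instead of being loaded all at once.
    train_batches = ds_train.shuffle(10_000).batch(64).prefetch(tf.data.AUTOTUNE)

    # Save the best weights seen so far; after an interruption,
    # model.load_weights("emnist_ckpt.weights.h5") picks up from there.
    checkpoint = keras.callbacks.ModelCheckpoint(
        "emnist_ckpt.weights.h5",   # illustrative path
        save_weights_only=True,
        save_best_only=True,
        monitor="val_loss",
    )

    model.fit(
        train_batches,
        validation_data=val_batches,  # a validation split built the same way
        epochs=20,
        callbacks=[early_stop, checkpoint],
    )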


WHAT I LEARNED

  • CNNs are incredibly good at picking up subtle patterns in low-res images.

  • The quality of your preprocessing can dramatically affect model accuracy — especially with handwritten data.

  • EMNIST is underused but highly valuable for character-level OCR research.

  • Combining digit and letter recognition in a single model maps to real-world tasks, like reading alphanumeric license plates or serial codes.


WHY IT MATTERS

This kind of OCR system can power:

  • Form digitization for hospitals, banks, and schools

  • Handwriting-to-text converters for note-taking apps

  • AI readers for visually impaired users

  • Even fun tools like AI-powered typing trainers that learn from your handwriting!


By training on both MNIST and EMNIST, I built a foundation for full-fledged OCR engines — the kind used in document scanners, postal automation, or AI reading assistants.

