Lesson 1

Understanding the Problem: Why We Need a Better Way to Store Mappings

Understanding the Problem

Before learning about a hash table, it helps to first see a problem that programmers run into all the time. Once you feel the pain of this problem, you will understand exactly why hash tables were invented.

Mapping one kind of data to another

When we write programs, we often need to connect two pieces of information together. For example, think about a classroom. Every student has:

A name, like Ram or Neha — this is a piece of text (a *string*).
A roll number, like 14551 — this is a whole number (an *integer*).

We want to link each name to its roll number:

Ram → 14551
Neha → 14555
Prakhar → 14567
Shikhar → 14564

This kind of "this goes with that" connection is called a mapping. The name is the thing we look up — we call it the key. The roll number is the answer we get back — we call it the value.

So the real question is: *how do we store these key–value pairs in our program?*

A simple idea: two separate arrays

An *array* is just a row of boxes that hold data, where each box has a number called an index (0, 1, 2, 3, and so on).

One natural idea is to use two arrays side by side:

A names array that holds the strings.
A roll num array that holds the integers.

We line them up so that the same index in both arrays belongs to the same student:

index:      0       1        2          3
names:    Ram     Neha    Prakhar    Shikhar
roll num: 14551   14555    14567      14564

Because Shikhar sits at index 3 in names, his roll number also sits at index 3 in roll num, which is 14564. The matching index is the glue that holds the pair together.

This is easy to set up. But the real test of any storage idea is: *how easily can we get our data back out?*

The trouble: searching by name is slow

Suppose someone asks: "What is Shikhar's roll number?"

We know his name, but we do not know his index. So the computer has no choice but to start at the beginning and check the names one by one:

Look at index 0 → Ram. Not Shikhar. Keep going.
Look at index 1 → Neha. Not Shikhar. Keep going.
Look at index 2 → Prakhar. Not Shikhar. Keep going.
Look at index 3 → Shikhar. Found him! His index is 3.
Now read roll num[3] → the answer is 14564.

Checking every box from the start until you find the right one is called a linear scan (or *linear search*). It works, but notice the problem: to find one student, we may have to look at *all* of them.

With only 4 students this feels fine. But imagine the roll numbers of every student, in every class, in every school of a whole city — millions of names. Searching one by one through millions of boxes every single time is painfully slow. This is the heart of the problem.

Why this approach falls short

Storing the mapping in two arrays is intuitive, but it has two serious limitations:

Bad performance. Searching an array means scanning it from the start. In the worst case you check every single element. We describe this speed as O(N) time, which simply means: if you have N items, the work grows in step with N. Double the students, and the search can take twice as long.

Fixed size. A normal array's size is decided when it is created and cannot grow or shrink afterward. If you made room for 100 students and a 101st joins, you are stuck.

We could fix the second problem by using a dynamic array (one that can grow as needed). But that does *nothing* for the first problem — searching by name would still be a slow linear scan.

Where this is heading

So we are left wishing for something better. What we really want is a data structure that lets us jump straight to a student's roll number from their name — instantly, without scanning — and that can also grow as more data arrives.

That magical-sounding data structure exists. It is called a hash table, and solving this exact problem is what it was built for.

Understanding the Problem: Why We Need a Better Way to Store Mappings — Diagram — click to zoom (scroll / drag to pan)

← Start Hash Tables: Fast Key-Value Storage →