Lesson 3

Defining a Hash Function

A hash function sits at the heart of every hash table. Before we look at how a hash table works inside, we need to really understand what a hash function is. And to do that, we first need to understand a plain old mathematical function.

What is a mathematical function?

Think of a function as a simple rule that takes something in and gives one thing back. You put a value in, and you always get exactly one value out.

In math language, a function goes from a set K to a set V:

K is called the domain — the set of all values you are allowed to put *in*.
V is called the codomain — the set of all values that can come *out*.

The rule promises one important thing: every value in K is assigned exactly one value in V. Same input, same output, every single time.

These sets can hold any type of data — numbers, text strings, whole objects, anything.

Here are three simple number functions to make this concrete:

f(x) = x + 2
f(x) = x² + 4x + 3
f(x) = x % 5

A normal math function puts no limit on the size of its domain or codomain. Both can be small and fixed, or endlessly large (infinite). For example, f(x) = x + 2 can take any number you imagine, and it can also produce any number — both sides are infinite.

What makes a hash function special?

A hash function is just a special kind of mathematical function. Here is the one rule that makes it special:

> A hash function can take inputs from a set of *any* size (even infinite), but its outputs must come from a finite set — a set with a fixed, limited number of values.

So:

The domain (the inputs, called keys) can be infinite or fixed.
The codomain (the outputs) must be finite. These outputs are called hash values (also known as hash codes, digests, or just hashes).

This means every hash function is a mathematical function, but not every mathematical function is a hash function. Hash functions are a small group living inside the larger world of all functions.

Let's test our three examples:

f(x) = x + 2 → output grows forever as x grows, so the output set is infinite. Not a hash function.
f(x) = x² + 4x + 3 → same problem, the output keeps growing. Not a hash function.
f(x) = x % 5 → the remainder when you divide by 5 can only ever be 0, 1, 2, 3, or 4. That is a fixed, finite set of just 5 possible outputs. This one is a hash function!

The % (modulo) example is the key idea: no matter how huge the input number is, you are always squeezed into one of only 5 buckets.

Collisions: the unavoidable problem

Now think carefully. A hash function can have an infinite number of possible inputs, but only a finite number of possible outputs. If you keep pouring more and more inputs into a small number of output slots, some of them are forced to land on the same slot.

When two different inputs produce the same hash value, we call this a collision.

A quick example with f(x) = x % 5:

f(7) = 2
f(12) = 2

Two different keys, 7 and 12, both map to 2. That's a collision.

Here is the important takeaway: collisions are inevitable. No matter how clever your hash function is, mapping infinitely many keys into a finite space *must* cause some overlaps. You cannot make collisions disappear completely.

But you are not helpless. You can choose your hash function carefully to make collisions rare. So what does a good hash function look like?

It has a low chance of collision — it spreads inputs evenly across the output slots.
It is fast — it computes the hash value quickly.

Keep these two goals in mind. They will guide everything we do when we build real hash tables next.

Defining a Hash Function — Diagram — click to zoom (scroll / drag to pan)

← Hash Tables: Fast Key-Value Storage What Makes a Hash Function Good →