Before we learn what a graph is, let's understand *why* graphs even exist. The best way to do this is to look at a real problem that older, simpler data structures cannot handle well.
A data structure is just a way to store data so we can find it and use it again easily. There are two big families we have seen so far:
These two families are great for many jobs. But they share a hidden limit: they are good at simple relationships. A tree, for example, can show a parent-and-child link, but each child has only one parent. Real life is often messier than that.
Many real problems need many-to-many relationships. This means one item can be connected to many others, and those others can connect back to many more — in any direction, with no clear "top" or "bottom."
A tree cannot do this. To see why, let's build something real.
Imagine we want to build a travel website that shows flight connections between cities. Our raw data is just a list of which city flies to which city:
Notice the shape of this. City A connects to two cities (F and D). City D is reached from A and from C, and it also flies to E. The connections loop around and cross each other. There is no single "root" city and no neat parent-child layers. This is a many-to-many web of links.
We want our website to answer two kinds of user questions. Let's look at both and see where our old tools fail.
A user types in their current city and their destination city. We want to tell them the smallest number of flights (hops) to get there.
For example, going from City A to City H:
So the best answer is 2 hops. To find this, a computer must explore all the paths and pick the shortest one.
Can a tree store this? No. A tree must be acyclic, which means it is not allowed to have cycles (loops that come back to where they started). But our flight data *has* cycles — you can leave a city and, through other cities, come back to it. By its very definition, a tree forbids this, so the data simply does not fit.
What about linked lists? One idea is to store each connection as a tiny linked list of two cities, like this:
A next F next head null
A next D next head null
G next B next head null
B next C next head null
D next E next head null
E next H next head null
F next H next head null
C next D next head null
Each little list holds two cities and the link between them. This *can* store the data — but answering "what is the shortest route from A to H?" now needs a slow, complicated algorithm. The computer would have to dig through every separate list again and again to follow a path. It works, but it is inefficient.
Now we add prices. Each connection also has an airfare:
500200600100300200100700A user gives us a starting city and an amount of money. We want to tell them the maximum number of flights they can take without going over their budget. Here the destination does not matter — we only care about flying as much as possible for the money they have.
This is even harder. Now every connection carries two pieces of information: which cities it links *and* how much it costs. Neither a plain linked list nor a tree gives us a clean place to store that extra cost on each link, and exploring all the money-spending paths quickly becomes a mess.
We have a kind of data that is:
None of our linear structures (arrays, lists, stacks, queues) and none of our hierarchical structures (trees, heaps) can model and store this data both *easily* and *efficiently*. We are missing the right tool.
That missing tool is the graph — a data structure built exactly for connections like these. In the next lessons, we will see how a graph stores cities and flights naturally, and how it answers both of these questions efficiently.