Hierarchical data access in MySQL using nested sets
As you may know, MySQL is a relational database system. It consists of flat tables, which can be joined together in queries. Relations between these tables can only be specified in a way that is one-to-one/one-to-many.
This suits most situations, but when you start getting to hierarchical data, such as multiple level categories (as used on this site), these types of relations start to become non-optimal.
Take for instance the adjacency list method. Each entry in the table is given a column with the id of its parent node:
id | name | parent_id |
---|---|---|
1 | parent | 0 |
2 | child 1 | 1 |
3 | sub-child 1 | 2 |
In order to find all descendants of a node, there would need to be a query once for the parent, then once per node:
function getNodes($db, $parentId) {
$sql = 'SELECT * FROM nodes WHERE parent_id = ?';
$nodes = array()
foreach ($db->fetchAll($sql, array($parentId), Zend_Db::FETCH_OBJ) as $node) {
$node->subnodes = getNodes($db, $node->id);
$nodes[] = $node;
}
return $nodes;
}
An alternative method is to use nested sets. This takes advantage of MySQL’s index range selection by using two columns to specify a range in which children exist:
id | name | left | right |
---|---|---|---|
1 | parent | 1 | 6 |
2 | child 1 | 2 | 5 |
3 | subchild 1 | 3 | 4 |
In this case all descendants of a node can be found between the parent node’s left and right columns.
Combining this with the adjacent list method, you can optimise the conversion of a MySQL result set into a multi-level array of objects:
function getNodes($db, $parentId) {
$root = (object) array(
'id' => $parentId,
);
$stack = array($root);
$sql = 'SELECT nodes.*
FROM nodes, nodes AS parent
WHERE nodes.left > parent.left AND nodes.left < parent.right
AND parent.id = ?
ORDER BY nodes.left';
foreach ($db->fetchAll($sql, array($parentId), Zend_Db::FETCH_OBJ) as $node) {
while ($node->parent_id != $stack[count($stack)-1]->id) {
array_pop($stack);
}
$stack[count($stack)-1]->subnodes[] = $node;
array_push($stack, $node);
}
return $root->subnodes;
}
Of course this method is at the expense of insert/modification of nodes in the table, which would need more code to do, but usually these happen much less often than selects in production environments.