As you may know, MySQL is a relational database system. It consists of flat tables, which can be joined together in queries. Relations between these tables can only be specified in a way that is one-to-one/one-to-many.
This suits most situations, but when you start getting to hierarchical data, such as multiple level categories (as used on this site), these types of relations start to become non-optimal.
Take for instance the adjacency list method. Each entry in the table is given a column with the id of its parent node:
| id | name | parent_id |
| 1 | parent | 0 |
| 2 | child 1 | 1 |
| 3 | sub-child 1 | 2 |
In order to find all descendants of a node, there would need to be a query once for the parent, then once per node:
function getNodes($db, $parentId) { $sql = 'SELECT * FROM nodes WHERE parent_id = ?'; $nodes = array() foreach ($db->fetchAll($sql, array($parentId), Zend_Db::FETCH_OBJ) as $node) { $node->subnodes = getNodes($db, $node->id); $nodes[] = $node; } return $nodes; }
An alternative method is to use nested sets. This takes advantage of MySQL’s index range selection by using two columns to specify a range in which children exist:
| id | name | left | right |
| 1 | parent | 1 | 6 |
| 2 | child 1 | 2 | 5 |
| 3 | subchild 1 | 3 | 4 |
In this case all descendants of a node can be found between the parent node’s left and right columns.
Combining this with the adjacent list method, you can optimise the conversion of a MySQL result set into a multi-level array of objects:
function getNodes($db, $parentId) { $root = (object) array( 'id' => $parentId, ); $stack = array($root); $sql = 'SELECT nodes.* FROM nodes, nodes AS parent WHERE nodes.left > parent.left AND nodes.left < parent.right AND parent.id = ? ORDER BY nodes.left'; foreach ($db->fetchAll($sql, array($parentId), Zend_Db::FETCH_OBJ) as $node) { while ($node->parent_id != $stack[count($stack)-1]->id) { array_pop($stack); } $stack[count($stack)-1]->subnodes[] = $node; array_push($stack, $node); } return $root->subnodes; }
Of course this method is at the expense of insert/modification of nodes in the table, which would need more code to do, but usually these happen much less often than selects in production environments.