Maybe you have wondered how, with multiple inheritance, the computer figures out which methods to call. There are a number of different Method Resolution Order (MRO) algorithms, each with its own advantages and disadvantages. These algorithms take a class hierarchy and figure out a linear order in which to try the classes.
There are a number of important properties these algorithms should have:
- Inheritance: If \(C\) inherits from \(A\) and \(B\), it should come in the linearization before both \(A\) and \(B\). That sounds trivial, but there are algorithms that fail this property.
- Multiplicity: If \(C\) inherits from \(A\) and \(B\) in this order, \(A\) should come in the linearization before \(B\).
- Monotonicity: If you linearize a class \(C\) and the linearization places \(A\) before \(B\), then every linearization of the parents of \(C\) should preserve the relative order of \(A\) before \(B\). The idea behind this is that the linearization of the parents can never change when new child classes are added to the inheritance hierarchy. This is essential: otherwise, extending your code with new classes could change the behaviour of old code, because calls would suddenly resolve to methods in a different parent class.
There are a number of different algorithms like
- Leftmost Preorder Depth First Search (LPDFS), which searches, well, depth first. If there is more than one parent, it first takes the leftmost one before moving on to the next one, etc. Unfortunately, this one breaks inheritance, since examples can be constructed where grandparent classes are tried before parent classes.
- LPDFS with Duplicate Cancellation: It fixes that issue by first allowing duplicates to be found and then removing them, keeping only the last occurrence. Unfortunately, this one fails monotonicity.
- Reverse Postorder Rightmost DFS: Traverses the inheritance graph depth first, this time taking the rightmost parent first, and reverses the resulting postorder. This is equivalent to topological sorting, but still fails multiplicity.
- Refined Postorder Rightmost DFS: Run the regular RPRDFS and check if there are conflicts between the edges, e.g. a multiplicity violation. If that is the case, add an explicit edge and rerun. Boy, this algorithm is ugly, and monotonicity is still not guaranteed.
So in 1996 a new algorithm was published that does it differently: C3. The name refers to the three consistency properties the algorithm preserves. It is an algorithm that will generate a monotonic linearization! C3 has been taking the world by storm: originally published for Dylan, it was adopted for Python 2.3 (Python 2.2 used LPDFS with duplicate cancellation, earlier versions just LPDFS), and later also by Perl 6 and Perl 5.
It can be described by these formulæ, where \(L\) is the linearization function and \(C(B_1, B_2)\) means \(C\) inherits from \(B_1\) and \(B_2\) in that order. Bear with me, I’ll explain what these mean shortly.
\[ L[C(B_1…B_n)] = C \cdot \bigsqcup (L[B_1], …, L[B_n], B_1 … B_n) \]
\[ L(Object) = Object \]
This is the linearization. It is very simple: it just puts the class \(C\) itself first and delegates the rest to the “merging” function, \(\bigsqcup\):
\[ \underset{i}{\bigsqcup} (L_i) = \begin{cases} c \cdot \left( \bigsqcup_i (L_i \setminus c) \right) & \text{if } \exists_{\min k}\, \forall_j\; c = head(L_k) \notin tail(L_j) \\ \text{fail} & \text{otherwise} \end{cases} \]
What it does is check whether there is a \(c\) that is the \(head\) of one of the lists and does not occur in any tail of the lists. If there is one (there might be multiple, in which case it takes the first), it determines this to be the next “step” and adds it to the linearization list. It then removes the \(c\) that was found from all the lists and recurses to find the next step. If it doesn’t find one, it fails. This is important to know, because a linearization preserving all the properties mentioned above is not always possible. In this case, C3 opts for correctness and admits that there is no possible result.
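To make this concrete, here is a small worked example: take a class \(C(A, B)\) where both \(A\) and \(B\) inherit from \(O\) (short for \(Object\)), so \(L[A] = [A, O]\) and \(L[B] = [B, O]\). The merge then unfolds like this:

\[ \begin{aligned} L[C(A, B)] &= C \cdot \bigsqcup([A, O], [B, O], A\,B) \\ &= C \cdot A \cdot \bigsqcup([O], [B, O], B) \\ &= C \cdot A \cdot B \cdot \bigsqcup([O], [O]) \\ &= C \cdot A \cdot B \cdot O \end{aligned} \]

At every step the first head that does not occur in any tail is taken: first \(A\), then \(B\) (\(O\) is skipped because it still occurs in the tail of \([B, O]\)), and finally \(O\).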
How about we implement it? Sure, to understand it better, we can try. We could use a language that supports multiple inheritance, the concept that brought us this linearization problem to start with, like OCaml. Let’s use OCaml, but not use inheritance (or objects) at all, since there isn’t even any need for them.
First we need a type to represent classes.
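A class name together with its direct parents in declaration order is all the information C3 needs, so a single constructor will do (the names `cls` and `Class` here are just placeholders I picked):

```ocaml
(* a class is its name plus its direct parents in declaration order *)
type cls = Class of string * cls list
```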
Then let’s implement the \(L\) function:
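A sketch of it could look like this; it leans on the `merge` function that we define next (so in an actual source file `merge` and its helpers would have to come first):

```ocaml
(* L[C(B1 … Bn)] = C · ⊔(L[B1], …, L[Bn], [B1; …; Bn]);
   a class without parents is its own linearization, i.e. L(Object) = Object *)
let rec c3_exn = function
  | Class (_, []) as c -> [c]
  | Class (_, parents) as c ->
      c :: merge (List.map c3_exn parents @ [parents])
```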
So far so good, we named \(L\) `c3_exn` (because it implements C3 and might throw exceptions) and \(\bigsqcup\) `merge`. Let’s continue with `merge`.
This is the formula with the curly bracket, so let’s implement that case distinction here as well:
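One way to write it, using the two helpers `head_not_in_tails` and `remove` that we get to below (the exception name is just a placeholder of mine):

```ocaml
exception No_linearization

(* ⊔: pick the next class c, drop it from all lists, recurse until nothing is left *)
let rec merge lists =
  match head_not_in_tails lists with
  | None -> raise No_linearization
  | Some c ->
      (match remove c lists with
       | [] -> [c]                (* nothing left over: c ends the linearization *)
       | rest -> c :: merge rest)
```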
`head_not_in_tails` gives us the first element (remember the \(\min\) in the formula) of the heads of the lists that does not occur in any tail of any of the lists. There might be `None`, in which case we have to admit that there is no linearization. But if there is one, we remove this element from all the lists (\(L_i \setminus c\)) and continue on. At this point I deviate from the formula (I believe the formula is wrong at that point) and check if there is even something left, to terminate the recursion. If there is, we recurse, and if there isn’t, we have just found the end of the linearization list.
What is still missing is `remove`, which removes the element from all the lists:
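Something along these lines should do, relying on OCaml’s structural equality to compare classes:

```ocaml
(* L_i \ c: drop c from every list, and drop lists that become empty along the way *)
let remove c lists =
  List.filter (fun l -> l <> [])
    (List.map (List.filter (fun x -> x <> c)) lists)
```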
It also removes empty lists for convenience, although that makes no difference to the algorithm at all. Now we need to determine how to find the first \(c\) that is a \(head(L_k)\) but is not in any \(tail(L_j)\):
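A sketch, together with the small `head`/`tail` helpers that the next paragraph talks about (`List.concat_map` needs OCaml 4.10 or newer):

```ocaml
(* head and tail that return a (possibly empty) list instead of raising *)
let head = function [] -> [] | x :: _ -> [x]
let tail = function [] -> [] | _ :: xs -> xs

(* the first head(L_k) that does not occur in any tail(L_j), if there is one *)
let head_not_in_tails lists =
  let heads = List.concat_map head lists in
  let tails = List.concat_map tail lists in
  List.fold_left
    (fun found c ->
       match found with
       | Some _ -> found
       | None -> if List.mem c tails then None else Some c)
    None heads
```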
First we implement some helpers, because it is customary in OCaml to reinvent standard libraries all the time. `head` returns the head of a list (as a one-element list) and `tail` returns the tail; if the list is empty, both return `[]`. This is useful, because if we use `concat_map` on those, we get all heads and all tails, even if some lists are empty. So `heads` and `tails` will always be lists, though maybe empty ones. But that is not a problem. Then we `fold_left` over the `heads` with a function that looks for the first head that does not occur in any of the tails. If it doesn’t find one, it just returns `None`.
Lo and behold, our implementation is complete. If we test it with the example from Wikipedia, we get exactly the right order:
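This is my encoding of the Wikipedia hierarchy (\(O\) at the top, \(A\) through \(E\) in the middle, \(K1\), \(K2\), \(K3\) below them, and \(Z\) at the bottom); the variable names are of course arbitrary:

```ocaml
(* the example hierarchy from the Wikipedia article on C3 linearization *)
let o  = Class ("O", [])
let a  = Class ("A", [o])
let b  = Class ("B", [o])
let c  = Class ("C", [o])
let d  = Class ("D", [o])
let e  = Class ("E", [o])
let k1 = Class ("K1", [a; b; c])
let k2 = Class ("K2", [d; b; e])
let k3 = Class ("K3", [d; a])
let z  = Class ("Z", [k1; k2; k3])

(* print the linearization of Z *)
let () =
  c3_exn z
  |> List.map (fun (Class (name, _)) -> name)
  |> String.concat ", "
  |> Printf.printf "[%s]\n"
```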
We get `[Z, K1, K2, K3, D, A, B, C, E, O]`, which is correct.
If you are interested in more on this topic, I can recommend the slides of the programming languages lecture that inspired the whole implementation. The slides also include another set of example classes that demonstrates a linearization failing. Naturally, our code does the right thing.
If you feel like playing with the code, you can check out c3.ml. You can also check one or another implementation in Python.