Deriving The Y-Combinator
In the previous Mindshift blog, we looked at creating a recursive function using only anonymous functions. However, that function was only capable of performing a single task (calculating numbers in the Fibonacci Series).
What we need however is a general purpose iterator. We have now arrived at the point where we can derive the Y-Combinator.
Please remember that wherever you see the name "Y-Combinator", we could also use the name "Fixed Point Combinator". Please read Mindshift: Part 9 for an explanation of the fixed point of a function.
But Before We Start...
It is important to understand that there is no single version of the Y-Combinator; rather, it is a set of functions where any member
Y satisfies the criteria
Y(f) === f(Y(f)) for some function
f. So, strictly speaking, there are infinitely many Y-Combinators.
What we're going to deal with here however, is the version of the Y-Combinator derived by Haskell Curry. This is because his version is both the most widely cited and is also the least intuitive!
Haskell Curry's Y-Combinator is often quoted in the following form (using Lambda Calculus notation):
Y = λf·(λx·f(x x))
This expression is perfectly correct and has both an inherent beauty and, due to its symmetry, no small amount of mystery. However, it is nothing more than a description of the finished expression, and as such, is completely useless as the starting point for helping people understand the Y-Combinator.
So, Where Did We Get Up To?
By the end of the previous blog, we had successfully calculated the Fibonacci sequence using nothing but a recursive arrangement of anonymous functions. The conceptual breakthrough was that we were able to get a function to call itself recursively without it needing to know its own name.
Our task here is to turn this function from one that implements the specific recursive function
fib_next, into one that can handle any recursive function.
So more refactoring is needed.
The Most Basic Combinators
In the simplest sense, a combinator is any function that takes some number of functions as parameters and combines them in some way.
The simplest of all combinators is known as the
IDENTITY function - and it does nothing more than return its parameter completely unmodified.
Dave Keenan has rather humorously renamed the
IDENTITY function to be the
IDIOT function - because it is too dumb to perform any processing on its parameter.
Although we don't need to use it here, the
IDENTITY function is not as useless as it might at first appear.
We have however, already met two other very simple combinators; the
Admittedly, we are stretching the definition of the verb "to combine" to its breaking point here because neither
FALSE actually combine their parameter values. They simply ignore one and return the other - but second only to the
IDENTITY function, these are the simplest (dumbest?) of all combinators.
As far as recursion is concerned, the recursive equivalent of the
IDENTITY function is the Universal, or U-Combinator. This is a higher-order function that does nothing more than take a single function as a parameter and return whatever value is created by passing that function to itself. Under such circumstances, it is usual to expect a partial function as the return value.
FALSE, the universal combinator is not as useless as it looks; in fact, we've already been using it.
Deriving the Y-Combinator
Step 1: Apply the Universal Combinator
Now that we know about the universal combinator, we can optimise our code (slightly). This particular step is not a requirement and can be omitted if desired - since right at the very end, we will remove this step anyway.
Step 2: Encapsulate the Fibonacci Function
We know that the calculation of the Fibonacci series follows a step-wise execution pattern, so it would be good if we could isolate the parts of the function that encapsulate a single step of the calculation, then treat this in isolation from the parts that handle recursion.
So looking at our code, the Fibonacci function is where the actual calculation is performed, so lets isolate this code by applying Tennent's Correspondence Principle. This is where firstly, we take an expression and wrap it inside an anonymous function, then secondly, immediately call that function.
Now that we have encapsulated the Fibonacci function inside a wrapper function, what value should we pass this wrapper?
Step 3: Separate the Calculation Code from the Recursion Code
We said above that we would like to separate those parts of the code that handle the actual calculation, from those parts that handle recursion. Further to this, we know that the unit of code that takes us another step along the Fibonacci series is the call to
fib_next(fib_next) (which we've also identified as an example of the universal combinator
So, we can extract this recursive part of the code and make it a parameter to the new wrapper function. To achieve this, we need to do three things:
Firstly, the partial function created by calling
U(fib_next) becomes the parameter to our new wrapper function.
Secondly, we must create a variable name to identify the value of this parameter within the function. Since we know its going to be a partial function that takes the next step down the series, we can call it
Lastly, where we previously had the call to
U(fib_next), we can replace this with the parameter name
Uncaught RangeError: Maximum call stack size exceeded(…)
RATS!! We've broken it...
The call to
U(fib_next) generates the next step along the Fibonacci series and up until now, we have only ever called it after we tested that
n is not less than one. So this test was acting as a safeguard to prevent runaway recursion. Unfortunately, we have now moved the call to
U(fib_next) outside the safety of our termination condition.
No wonder it blows up in our face!
Step 4: Stopping the Runaway Recursion
What we need is a way to delay the call to
U(fib_next) so that it is only called when needed... Ah yes, we can use function wrapping.
We must be careful here though. Look at where
U(fib_next) is used inside the Fibonacci function.
Q: The call to
U(fib_next) generates a partial function that takes how many parameters?
Therefore, when we wrap the call to
U(fib_next), the wrapper function must also take three parameters. For simplicity, we'll use the same names as before:
Step 5: Just A Minute, Aren't We Going Round In Circles Here?
Look at this block of code embedded into the centre of function
fib_x - haven't we seen that before?
Yup, this is the Fibonacci function we started with... (Look back at the coding at the start of this blog if you're not sure). The only difference is that the call to
fib_next(fib_next) has been moved outside the function and is passed in as the parameter
Now let's see if we can extract the actual Fibonacci function itself from this whole mess. To do this, we'll follow exactly the same principle as before; we'll apply Tennent's Correspondence Principle on the entire function.
We will extract the code block shown above by doing three things all at once. We will:
- Cut out the code block shown above and make it the parameter to a new wrapper function
- Give this parameter the name
rec_fnmeaning "recursive function"
- Replace the code block with the parameter name
Step 6: Tidy Up The Pieces
Now that the function to be called recursively has been isolated from the coding that performs the recursion, we can start to pull out the various pieces and give them reasonable names.
First, the inline function passed to the recursion function does not need to be a hard-coded parameter; it can be extracted and given a name. Let's call it
Next, it is no longer appropriate to use the internal variable name
fib_next since we can now handle any recursive function. So let's rename
gen_rec, because this unit of code can "generate recursive" functions calls.
Now that we have transformed our Fibonacci recursive function into a generic recursive function, we must do two further things: first, the name
fib_x can be changed to
Y (as in the Y-Combinator), and second, we can drop the
(do_fib) parameter at the end of the definition of
These two changes will now (finally) give us a general purpose Y-Combinator function.
The only remnant left in this coding that betrays its prior usage for calculating the Fibonacci series is the fact that
rec_fn is passed a function in three parameters. But this won't cause anything to break.
In fact, since we've derived our Y-combinator from a recursive function that requires three parameters, this solution will work with any recursive function that requires up to three parameters.
But Is This Really a General Purpose Recursive Function?
Up till now, we've been working exclusively with the Fibonacci function. This was largely because everybody else who discusses recursion uses the Factorial function, and I just wanted to be different!
Nonetheless, let's create a factorial function and see how that works with our Y-Combinator.
But Is This Function Really The Y-Combinator?
We've got ourselves a cool function here that can take any other recursive function (taking up to three parameters), and it will work nicely. But is it really the Y-Combinator?
Well, we can check this by looking again at the discussion of Fixed Points. A function is said to have a "fixed point" if, when you put in a particular value, you get exactly the same value out again.
In the case of the cosine function (working in radians), the fixed point is
cos(0.739085133) = 0.739085133.
However, there is no reason why the fixed point of a function should be a simple number such as
0.739085133. It could just as easily be another function.
The Y-Combinator is also known as the fixed-point combinator; therefore, we need to test whether we can supply it with a function, and then get out exactly the same function. If we can, then we know that we have successfully created the Y-Combinator.
So using the relationship above, we already have a function called
Y, and the function called
f would be either the
As can be seen, calling
Y(do_fact) gives exactly the same function as calling
do_fact(Y(do_fact)). So we can see that our function satisfies the criteria for being a valid Y-Combinator.
fact1 is the fixed point of function
do_fact, and likewise, function
fib1 is the fixed point of function
But This Y-Combinator Doesn't Look Like Haskell Curry's Y-Combinator
That's true, and fortunately for me, there are infinitely many valid Y-Combinators so in that sense, it doesn't matter. However, at the start of this blog I promised to derive Haskell Curry's Y-Combinator.
I guess this means we need to do a little more refactoring...
However, for reasons that defy rational explanation, mathematicians seem to have a strong dislike for giving variables meaningful names. They seem to delight in concealing the meaning of variables by giving them intention-hiding names such as
x, or the letters from the Greek or Hebrew alphabets, or characters written upside down or in strange type faces...
So, let's follow mathematical convention and translate the meaningful variable names in our Y-Combinator into utterly meaningless, single character names.
At this point, it is very important that you suspend any desire to ask for rational explanations and just accept that
rec_fnshould now be called
Now remove the use of the universal combinator
U, and replace it with inline calls.
U(fib_next) as a parameter value back in step 3.
However, other languages such as Haskell perform lazy evaluation. This means that expressions are not evaluated until their value is actually needed; consequently, such languages do not require the use of this workaround to avoid hitting the runtime recursion limit.
U(fib_next) to delay its execution. But by doing this, we transformed the combinator from a "normal order" combinator to an "applicative order" combinator.
Normal order combinators can only be implemented in languages that perform lazy evaluation; whereas, for languages that perform strict evaluation, the applicative order combinator is required. (The applicative order combinator is also known as the Z-Combinator).
(n,a,b) => x(x)(n,a,b) used to delay the execution of
x(x) can be removed leaving a naked
x(x) as the parameter to
Lastly, the preferred mathematical style is to pass the results of the first call to
x(x) to the recursive function
f. As it happens, this turns out in this case to be a no-op.
Et voilá! The Y-Combinator, as shown in Wikipedia.
However, its fair to say that this, the most frequently cited version of the Y-Combinator, is far from intuitive - and its taken me several weeks to understand, and then a further three whole blogs to build up its derivation from first principles!
I trust however, that in spite of number and length of these blogs, these explanations have been clear enough for you to follow and understand.
Now that we know how the Y-Combinator works, we can proceed to define other mathematical functions such as
All source code from the Mindshift blog series can be found on GitHub