Mindshift: Part 11

Deriving The Y-Combinator

In the previous Mindshift blog, we looked at creating a recursive function using only anonymous functions. However, that function was only capable of performing a single task (calculating numbers in the Fibonacci Series).

What we need however is a general purpose iterator. We have now arrived at the point where we can derive the Y-Combinator.

Please remember that wherever you see the name "Y-Combinator", we could also use the name "Fixed Point Combinator". Please read Mindshift: Part 9 for an explanation of the fixed point of a function.

But Before We Start...

It is important to understand that there is no single version of the Y-Combinator; rather, it is a set of functions where any member Y satisfies the criteria Y(f) === f(Y(f)) for some function f. So, strictly speaking, there are infinitely many Y-Combinators.

What we're going to deal with here however, is the version of the Y-Combinator derived by Haskell Curry. This is because his version is both the most widely cited and is also the least intuitive!

Haskell Curry's Y-Combinator is often quoted in the following form (using Lambda Calculus notation):

Y = λf·(λx·f(x x))
       (λx·f(x x))

Or, if we translate this into JavaScript arrow functions...

This expression is perfectly correct and has both an inherent beauty and, due to its symmetry, no small amount of mystery. However, it is nothing more than a description of the finished expression, and as such, is completely useless as the starting point for helping people understand the Y-Combinator.

Therefore, I will not refer to this expression any more. However, by the end of this blog, we will have derived this combinator using the recursive functions defined in the previous blog to calculate the Fibonacci series. Also, I will not use any Lambda Calculus notation since this is of little help when working with the JavaScript implementation.

BTW, don't try running the JavaScript implementation shown above - it will blow up in your face for reasons that will soon be explained.

So, Where Did We Get Up To?

By the end of the previous blog, we had successfully calculated the Fibonacci sequence using nothing but a recursive arrangement of anonymous functions. The conceptual breakthrough was that we were able to get a function to call itself recursively without it needing to know its own name.

Our task here is to turn this function from one that implements the specific recursive function fib_next, into one that can handle any recursive function.

So more refactoring is needed.

The Most Basic Combinators

In the simplest sense, a combinator is any function that takes some number of functions as parameters and combines them in some way.

The simplest of all combinators is known as the IDENTITY function - and it does nothing more than return its parameter completely unmodified.

Dave Keenan has rather humorously renamed the IDENTITY function to be the IDIOT function - because it is too dumb to perform any processing on its parameter.

Although we don't need to use it here, the IDENTITY function is not as useless as it might at first appear.

We have however, already met two other very simple combinators; the TRUE and FALSE functions.

Admittedly, we are stretching the definition of the verb "to combine" to its breaking point here because neither TRUE nor FALSE actually combine their parameter values. They simply ignore one and return the other - but second only to the IDENTITY function, these are the simplest (dumbest?) of all combinators.

Recursive Combinators

As far as recursion is concerned, the recursive equivalent of the IDENTITY function is the Universal, or U-Combinator. This is a higher-order function that does nothing more than take a single function as a parameter and return whatever value is created by passing that function to itself. Under such circumstances, it is usual to expect a partial function as the return value.

As with IDENTITY, TRUE and FALSE, the universal combinator is not as useless as it looks; in fact, we've already been using it.

Deriving the Y-Combinator

Step 1: Apply the Universal Combinator

Now that we know about the universal combinator, we can optimise our code (slightly). This particular step is not a requirement and can be omitted if desired - since right at the very end, we will remove this step anyway.

Step 2: Encapsulate the Fibonacci Function

We know that the calculation of the Fibonacci series follows a step-wise execution pattern, so it would be good if we could isolate the parts of the function that encapsulate a single step of the calculation, then treat this in isolation from the parts that handle recursion.

So looking at our code, the Fibonacci function is where the actual calculation is performed, so lets isolate this code by applying Tennent's Correspondence Principle. This is where firstly, we take an expression and wrap it inside an anonymous function, then secondly, immediately call that function.

Now that we have encapsulated the Fibonacci function inside a wrapper function, what value should we pass this wrapper?

Step 3: Separate the Calculation Code from the Recursion Code

We said above that we would like to separate those parts of the code that handle the actual calculation, from those parts that handle recursion. Further to this, we know that the unit of code that takes us another step along the Fibonacci series is the call to fib_next(fib_next) (which we've also identified as an example of the universal combinator U(fib_next)).

So, we can extract this recursive part of the code and make it a parameter to the new wrapper function. To achieve this, we need to do three things:

Firstly, the partial function created by calling U(fib_next) becomes the parameter to our new wrapper function.

Secondly, we must create a variable name to identify the value of this parameter within the function. Since we know its going to be a partial function that takes the next step down the series, we can call it fib_next

Lastly, where we previously had the call to U(fib_next), we can replace this with the parameter name fib_next.

Uncaught RangeError: Maximum call stack size exceeded(…)

RATS!! We've broken it...

The call to U(fib_next) generates the next step along the Fibonacci series and up until now, we have only ever called it after we tested that n is not less than one. So this test was acting as a safeguard to prevent runaway recursion. Unfortunately, we have now moved the call to U(fib_next) outside the safety of our termination condition.

No wonder it blows up in our face!

Step 4: Stopping the Runaway Recursion

What we need is a way to delay the call to U(fib_next) so that it is only called when needed... Ah yes, we can use function wrapping.

We must be careful here though. Look at where U(fib_next) is used inside the Fibonacci function.

Q: The call to U(fib_next) generates a partial function that takes how many parameters?
A: Three

Therefore, when we wrap the call to U(fib_next), the wrapper function must also take three parameters. For simplicity, we'll use the same names as before: n, a and b.

Step 5: Just A Minute, Aren't We Going Round In Circles Here?

Look at this block of code embedded into the centre of function fib_x - haven't we seen that before?

Yup, this is the Fibonacci function we started with... (Look back at the coding at the start of this blog if you're not sure). The only difference is that the call to fib_next(fib_next) has been moved outside the function and is passed in as the parameter fib_next.

Now let's see if we can extract the actual Fibonacci function itself from this whole mess. To do this, we'll follow exactly the same principle as before; we'll apply Tennent's Correspondence Principle on the entire function.

We will extract the code block shown above by doing three things all at once. We will:

  1. Cut out the code block shown above and make it the parameter to a new wrapper function
  2. Give this parameter the name rec_fn meaning "recursive function"
  3. Replace the code block with the parameter name rec_fn

Step 6: Tidy Up The Pieces

Now that the function to be called recursively has been isolated from the coding that performs the recursion, we can start to pull out the various pieces and give them reasonable names.

First, the inline function passed to the recursion function does not need to be a hard-coded parameter; it can be extracted and given a name. Let's call it do_fib.

Next, it is no longer appropriate to use the internal variable name fib_next since we can now handle any recursive function. So let's rename fib_next to gen_rec, because this unit of code can "generate recursive" functions calls.

Now that we have transformed our Fibonacci recursive function into a generic recursive function, we must do two further things: first, the name fib_x can be changed to Y (as in the Y-Combinator), and second, we can drop the (do_fib) parameter at the end of the definition of fib_x.

These two changes will now (finally) give us a general purpose Y-Combinator function.

The only remnant left in this coding that betrays its prior usage for calculating the Fibonacci series is the fact that rec_fn is passed a function in three parameters. But this won't cause anything to break.

In fact, since we've derived our Y-combinator from a recursive function that requires three parameters, this solution will work with any recursive function that requires up to three parameters.

But Is This Really a General Purpose Recursive Function?

Up till now, we've been working exclusively with the Fibonacci function. This was largely because everybody else who discusses recursion uses the Factorial function, and I just wanted to be different!

Nonetheless, let's create a factorial function and see how that works with our Y-Combinator.

Cool!!

But Is This Function Really The Y-Combinator?

We've got ourselves a cool function here that can take any other recursive function (taking up to three parameters), and it will work nicely. But is it really the Y-Combinator?

Well, we can check this by looking again at the discussion of Fixed Points. A function is said to have a "fixed point" if, when you put in a particular value, you get exactly the same value out again.

In the case of the cosine function (working in radians), the fixed point is 0.739085133 because cos(0.739085133) = 0.739085133.

However, there is no reason why the fixed point of a function should be a simple number such as 0.739085133. It could just as easily be another function.

The Y-Combinator is also known as the fixed-point combinator; therefore, we need to test whether we can supply it with a function, and then get out exactly the same function. If we can, then we know that we have successfully created the Y-Combinator.

So using the relationship above, we already have a function called Y, and the function called f would be either the do_fib or do_fact functions.

As can be seen, calling Y(do_fib) or Y(do_fact) gives exactly the same function as calling do_fib(Y(do_fib)) or do_fact(Y(do_fact)). So we can see that our function satisfies the criteria for being a valid Y-Combinator.

Therefore function fact1 is the fixed point of function do_fact, and likewise, function fib1 is the fixed point of function do_fib.

But This Y-Combinator Doesn't Look Like Haskell Curry's Y-Combinator

That's true, and fortunately for me, there are infinitely many valid Y-Combinators so in that sense, it doesn't matter. However, at the start of this blog I promised to derive Haskell Curry's Y-Combinator.

I guess this means we need to do a little more refactoring...

However, for reasons that defy rational explanation, mathematicians seem to have a strong dislike for giving variables meaningful names. They seem to delight in concealing the meaning of variables by giving them intention-hiding names such as a or x, or the letters from the Greek or Hebrew alphabets, or characters written upside down or in strange type faces...

So, let's follow mathematical convention and translate the meaningful variable names in our Y-Combinator into utterly meaningless, single character names.

At this point, it is very important that you suspend any desire to ask for rational explanations and just accept that rec_fn should now be called f.

Now gen_rec becomes x.

Now remove the use of the universal combinator U, and replace it with inline calls.

The next point to understand is that since JavaScript is a strict evaluation language, all expressions are evaluated before they are passed as parameters to other functions. This is why we hit the recursion limit when we tried to pass U(fib_next) as a parameter value back in step 3.

However, other languages such as Haskell perform lazy evaluation. This means that expressions are not evaluated until their value is actually needed; consequently, such languages do not require the use of this workaround to avoid hitting the runtime recursion limit.

Since we are using JavaScript however, we had to account for this problem by creating a wrapper function around the call to U(fib_next) to delay its execution. But by doing this, we transformed the combinator from a "normal order" combinator to an "applicative order" combinator.

Normal order combinators can only be implemented in languages that perform lazy evaluation; whereas, for languages that perform strict evaluation, the applicative order combinator is required. (The applicative order combinator is also known as the Z-Combinator).

However, if we pretend that JavaScript has suddenly become a lazy evaluation language, then the wrapper function (n,a,b) => x(x)(n,a,b) used to delay the execution of x(x) can be removed leaving a naked x(x) as the parameter to f:

We must now understand that the above code can no longer be executed in JavaScript...

Lastly, the preferred mathematical style is to pass the results of the first call to x(x) to the recursive function f. As it happens, this turns out in this case to be a no-op.

Et voilá! The Y-Combinator, as shown in Wikipedia.

However, its fair to say that this, the most frequently cited version of the Y-Combinator, is far from intuitive - and its taken me several weeks to understand, and then a further three whole blogs to build up its derivation from first principles!

I trust however, that in spite of number and length of these blogs, these explanations have been clear enough for you to follow and understand.

Now that we know how the Y-Combinator works, we can proceed to define other mathematical functions such as DIVIDE and MODulus.

Chris W


All source code from the Mindshift blog series can be found on GitHub


Disclosure

The content of my Mindshift blogs 9, 10 and 11 were based on a YouTube video by the late (great) Jim Weirich in which he derives the Y-Combinator. In the video, Jim uses Ruby as his demonstration language, and I have translated his coding steps into JavaScript.