by Manfred von Thun
Abstract: Joy is a functional programming language which is not based on the application of functions to arguments but on the composition of functions. Many topics from the theory of computability are particularly easy to handle within Joy. They include the parameterisation theorem, the recursion theorem and Rice’s theorem. Since programs are data, it is possible to define a Y-combinator for recursion and several variants. It follows that there are self-reproducing and self-describing programs in Joy. Practical programs can be written without recursive definitions by using several general purpose recursion combinators which are more intuitive and more efficient than the classical ones.
Keywords: functional programming, functionals, computability, diagonalisation, program = data, diagonalisation, self-reproducing and self-describing programs, hierarchy of recursion combinators, elimination of recursive definitions.
This paper describes some aspects of the language from the perspective of recursive function theory.
A deep result in the theory of computability concerns the elimination of recursive definitions. To use the stock example, the factorial function can be defined recursively in Joy by:
factorial ==
[0 =] [pop 1] [dup 1 - factorial *] ifte
The definition is then used in programs like this:
5
factorial
Because in Joy programs can be manipulated as data, the factorial function can also be computed recursively without a recursive definition, as follows:
5
[ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
[dup cons] swap concat dup cons i
The second line in this program does much the same as the body of the
definition of factorial, but it is a quoted program. The third line
first transforms this into another longer quoted program which performs
“anonymous” recursion, and then the final i
combinator
essentially dequotes this program causing its execution.
The third line implements Joy’s counterpart of the Y combinator of the lambda calculus and of combinatory logic. Exactly the same line can be used to cause anonymous recursion of other functions which are normally defined recursively.
Joy has other combinators which make recursive execution of programs more succinct. (Of course it is also possible in Joy to compute the factorial function more efficiently with iteration instead of recursion.)
The remainder of this paper deals with various aspects of the theory of computability, in particular the theory of recursive functions. The next section gives a recursive definition of a recursion combinator. Following that is a section dealing with various well known and elementary theorems from recursive function theory. The next section then proves a fixpoint theorem for Joy. The theorem implies the existence of self-reproducing programs, as shown in the following section. There is also some discussion of the effect of evaluation order on termination. The topic of the next section is Rice’s theorem for Joy. Then there is a section describing further self-reproducing and self-describing programs. Another recursion combinator is constructed in the next section. The final section discusses the more practical recursion combinators of Joy. Previous knowledge of the field of recursion theory is not assumed.
The factorial function and the Fibonacci function are often used to illustrate two different recursion patterns, although both are most efficiently computed non-recursively. This section follows tradition by using these two functions. The ultimate aim is to show how recursive definitions of functions can be eliminated to obtain a (closed) form of a program for recursive execution.
In conventional notation the factorial function can be defined recursively like this:
r-fac(n) = if n = 0 then 1 else n * r-fac(n - 1)
It is obvious that in conventional notation the definitions need the
formal parameter n
. Joy was designed to eliminate formal
parameters in definitions, and the factorial function would be defined
in Joy like this:
r-fac1 == [0 =] [pop 1] [dup 1 - r-fac1 *] ifte
The suffix 1
serves to distinguish this definition from
a later version. The RHS of the definition contains three quotations: an
if-part, a then-part and an else-part. These serve as parameters to the
ifte combinator. The if-part tests for equality with 0
, and
if that is true then the then-part pops off the parameter and replaces
it with the result 1
. Otherwise the else-part is executed,
which duplicates the parameter, subtracts 1
from the top
copy, calls r-fac1
recursively on that and, when that call
has returned with a value, multiplies that with the original. Using a
more Joy-like idiom:
r-fac1 == [null] [succ] [dup pred r-fac1 *] ifte
This uses the null
predicate in the if-part, the
successor function in the then-part and the predecessor operator in the
else-part.
In conventional notation the Fibonacci function is defined recursively like this:
r-fib(n) = if n <= 1 then n else r-fib(n - 1) + r-fib(n - 2)
A more or less literal translation into Joy is the following:
r-fib1 == [small] [] [pred dup [r-fib1] dip pred r-fib1 +] ifte
The if-part uses the small predicate, which for numeric parameters
yields true
for 0
and for 1
. The
then-part is the empty quotation []
, when executed it does
nothing. In the else-part r-fib1
has to call itself twice,
once each for the two parameters that have been prepared by
dup
. The dip
combinator applies the quotation
[r-fib1]
to the earlier version, and the later version,
after pred
has done its job, is handled directly by
r-fib1
. A cleaner version is:
r-fib2 == [small] [] [pred dup pred [r-fib2] unary2 +] ifte (* app2 *)
The app2unary2
combinator applies the quoted
[r-fib2]
twice, to each of the two numbers on top of the
stack.
It will help if the minor differences between the definitions of the
factorial function and the Fibonacci function are eliminated. In
particular this concerns the patterns of the recursive calls. In the
body of r-fac1
the direct recursive call has been replaced
by its quotation and app1i
, which applies the
quotation just once, to the single number. So here are two other
versions, they have been aligned to make comparisons easier:
r-fac2 == [ null] [succ] [ dup pred [r-fac2] i *] ifte (* app1 *)
r-fib2 == [small] [ ] [pred dup pred [r-fib2] unary2 +] ifte (* app2 *)
The task of eliminating the recursion in the RHS of the definitions
amounts to this: The occurrences of the quoted programs
[r-fac2]
and [r-fib2]
have to be replaced by
the quoted RHS. But this will introduce those same quotes again, and
these have to be replaced by the RHS, and so on ad infinitum.
It seems impossible.
One part of the solution is that the programs have to be given an
extra parameter which is to be used when the else-part is executed. The
extra parameter will have to contain whatever is necessary to enable the
recursion inside the else-part. But this means that the extra parameter
will be somewhat in the way. Consequently the if-part and the then-part
need an extra pop
to remove the unneeded parameter.
Furthermore, in the else-part any preparatory work before the actual
recursion has to be done below the extra parameter using
dip
. The one or two recursions are then to be effected by
app1i
and app2unary2
,
respectively. The two programs now look like this:
?-fac: [[pop null] [pop succ] [[dup pred ] dip i *] ifte] ? (* app1 *)
?-fib: [[pop small] [pop ] [[pred dup pred] dip unary2 +] ifte] ? (* app2 *)
To indicate that these are not recursive definitions, they
are not given as definitions at all. The ?
-symbol is left
unanalysed at this point, only this much can be said about it:
So the ?
-symbol denotes a strange combinator.
For comparison the three versions of the factorial program are listed here:
r-fac1 == [ null] [ succ] [ dup pred r-fac1 *] ifte
r-fac2 == [ null] [ succ] [ dup pred [r-fac2] i *] ifte (* app1 *)
?-fac: [[pop null] [pop succ] [[dup pred ] dip i *] ifte] ?
And here are the three versions of the Fibonacci program:
r-fib1 == [ small] [ ] [ pred dup [r-fib1] dip r-fib1 +] ifte
r-fib2 == [ small] [ ] [ pred dup pred [r-fib2] unary2 +] ifte (* app2 *)
?-fib: [[pop small] [pop ] [[pred dup pred] dip unary2 +] ifte] ?
The recursion combinator must do this: At the point where
?-fac
uses app1i
and where
?fib
uses app2unary2
they expect
?-fac
and ?fib
on top of the stack. That is
why the if-parts and the then-parts each need an extra pop
,
and why the else-part has to do its initial work from a
dip
. It all means that a recursion combinator has to supply
the RHS to itself as an extra parameter. In general, a recursion
combinator expects a program [P]
and it executes it in a
special way. It must call P
but provide the extra
parameter. This is the defining law for a recursion combinator
?
:
[P] ? == [[P] ?] P
How can a recursion combinator be defined? This can be done
recursively and non-recursively. Once the ?
combinator is
defined, it can be used to eliminate all other recursive definitions. If
a recursion combinator is defined recursively, then this would be the
only recursive definition that is needed. Here is a step-by-step
development of such a definition. The development is a fairly typical
example of how one can write Joy programs systematically. Start with the
defining law:
[P] ?
== [[P] ?] P
In the second line P
occurred twice, in quoted and
unquoted form. It will be simpler if it occurs in only one form, and
that has to be the quoted form. So, using the i
combinator:
== [[P] ?] [P] i
The two quotations are not exactly the same, but they can be produced from two identical quotations:
== [[P] ?] [[P] ?] first i
Now the two identical quotations can be produced by the
dup
operator:
== [[P] ?] dup first i
All that is needed now is to extract [P]
to the
left:
== [P] [?] cons dup first i
In this construction each line was equivalent to its one or two neighbours. Hence the first and the last lines of this construction are equivalent:
[P] ? == [P] [?] cons dup first i
Hence the following is a suitable recursive definition of a recursion combinator:
? == [?] cons dup first i
It is useful to think of the RHS as being composed of two parts: the first is:
[?] cons dup first
and the second part is just i
. The first part is just a
function which takes one quoted program as a parameter and produces two
quoted programs as values:
[P] [?] cons dup first
[[P] ?] dup first (by cons)
[[P] ?] [[P] ?] first (by dup)
[[P] ?] [P] (by first)
The second part of the definition is i
, which
effectively dequotes the topmost [P]
and hence executes
P
. So the next reduction step depends on what
P
actually does.
An alternative recursive definition may be constructed:
[P] ?
== [[P] ?] P
== [[P] ?] [P] i
== [P] [?] cons [P] i
== [P] [P] [[?] cons] dip i
== [P] dup [[?] cons] dip i
The resulting alternative recursive definition is:
? == dup [[?] cons] dip i
It follows that the two RHSs of the two equivalent definitions are equal:
[?] cons dup first i == dup [[?] cons] dip i
(Note that cancellation of the trailing i
combinator on
both sides is not valid in general, though it would be valid here.)
We now have a recursive definition of recursion. Computationally this
is quite adequate, the recursive definition of any one of the
?
combinators really does make it possible to eliminate all
other recursive definitions. Some of the following sections deal with
the elimination of recursion even for the recursion combinators.
The recursive definition of a recursion combinator can of course also be given in other languages. For an example in the lambda calculus see {Henson87}. For an example in ML see {Sokolowski91}.
The theory of computability treats various formalisms which independently aim to capture the essence of computation. These formalisms include Turing machines, the lambda calculus, Markov algorithms, register machines, flow charts and any of the conventional programming languages. Each formalism deals with various specifications: for the Turing formalism the specifications are particular Turing machines, for the Markov formalism the specifications are particular Markov algorithms, for the programming languages they are particular programs, and so on. Some results in the theory concern connections between the formalisms, whereas others concern connections within the same formalism.
The main results in the theory can be divided into two groups: those concerned with the relationships between two such formalisms and those concerned with just one.
The principal results in the first group are that the formalisms are all equivalent, in this sense: Given any two formalisms, for any specification in the one formalism there is a specification in the other formalism such that the two specifications compute the same function. Proofs of such results are always constructive, by exhibiting an algorithm which converts any specification in the one formalism into a specification in the other.
Results in the second group concern just one formalism and often take this form: For any specification S1 having a certain property there is another specification S2 such that the two specifications S1 and S2 are related in some special way. Proofs are again constructive, by exhibiting algorithms for converting S1 into S2.
The algorithms of both groups can be expressed in any of the formalisms. The constructive proof then look like this: There is a specification S (in formalism F) with the capacity of transforming any specification S1 (in formalism F1) into a specification S2 (in formalism F2, not necessarily different from F1) such that the two specifications S1 and S2 are related in a special way. Since the formalisms are universal, the algorithms can be written in the formalism itself, as another specification S. The results then take this form: There is a specification S such that for every specification S1 the result of applying S to S1 yields the required S2.
Historically the theory of computability has been mainly couched in terms of recursive functions, a collection of functions built from a small base by means of a small number of constructors. These functions take natural numbers as arguments and give natural numbers as values. By means of a theoretically elegant (but computationally unfeasible) mapping called Gödel numbering, linguistic entities such as expressions can be assigned a unique number. In this way the functions can be made to “talk about themselves”. Any syntactic operation on linguistic entities has a counterpart purely arithmetical operation on natural numbers which are the Gödel numbers of these entities. The theorems are basic to any proper understanding of the foundations of computer science. But the proofs, when not hand-waiving by appeals to Church’s thesis, tend to be forbidding.
The ease of writing S depends crucially on how well the formalism can handle its own specifications. Often it is necessary to use encodings of S1 and S2 into a form which the formalism can handle. In the worst case the encodings are truly ghastly. In the best case the formalism can handle its own specifications very naturally. This is sometimes expressed by the slogan program = data. Most formalisms fare badly on this, including most programming languages.
Some languages can treat their own programs as data: they are Lisp, Snobol, Prolog and their descendants, some macro generators and the command languages based on them. Joy does it at least as well. In these languages programs can operate on other programs or on other data to produce other programs which may then be executed. Because of this, arithmetisation is not necessary, functions can take programs rather than Gödel numbers of programs as arguments, and metatheoretic proofs become much easier. {Phillips92} writes:
One might wonder how differently recursion theory might be viewed if it had arisen out of practical developments instead of predating them.
Indeed, one might speculate how much more natural computability theory would have been if Lisp has been invented thirty years earlier. Readers might want to pursue this topic, see the discussion in {Hofstadter85} starting on p 444, and his very startling conclusion on p 449.
In Joy many metatheoretic proofs are easier still because there are no named formal parameters. Consequently the difficult operation of substituting actual parameters for formal parameters does not occur, and everything is algebra. Reasoning about Joy programs will now be illustrated with a number of classical theorems.
A simple form of the parametrisation theorem states: there is a recursive function taking the Gödel number of an n-ary function F1 as one argument and taking the Gödel number of a numeral as a further argument such that the value of this recursive function is the Gödel number of an n-1-ary function F2 which is obtained by substituting the value of the numeral for the first argument of F1.
Here is the same theorem for Joy. An operand is a numeral (or any other constant) or a quotation:
For any program P and operand X
there is a program Q such that
Q == [Q] i == X P
The required program Q
is just the concatenation. But
the theorem can be strengthened:
There is a program O such that
for any program P and operand X
there is a program Q such that
X [P] O == [Q] and
Q == [Q] i == X [P] i == X P
The required program O
is cons
:
X [P] O == X [P] cons == [X P] == [Q]
The theorem is rather trivial in Joy. The formalisation of the proof is an overkill. However, many proofs later on will have the same structure, and it may help to get used to that.
The parametrisation theorem generalises to any number m of
arguments that are parametrised. Consequently it is often called the
S-m-n theorem, see {Rogers67}.
The “S” stands for “substitution” – it means that m of the
n formal parameters are replaced by fixed values. In Joy
notation repeated cons
operations can parameterise for any
m, simply by repeating cons
m times.
In recursive function theory it is possible for functions to take other functions as parameters, and since functions are untyped, they can take themselves as parameter. One consequence of the S-m-n theorem is the diagonalisation theorem: There is a recursive function taking as argument the Gödel number of a function which takes at least one parameter, and giving as value the Gödel number of the function obtained from the given one by substituting itself for the parameter.
Again, the counterpart in Joy is trivial:
For any program P
there is a program Q such that
[Q] i == Q == [P] P
The required quoted [Q]
is [[P] P]
. The
stronger form is:
There is a program O such that
for any program P
there is a program Q such that
[P] O == [Q] and
[Q] i == Q == [P] dup i == [P] P
The required program O
is dup cons
.
Proof:
[P] O == [P] dup cons == [P] [P] cons == [[P] P] == [Q]
[Q] i == [[P] P] i == [P] P == Q
So in Joy the short program:
dup cons
implements diagonalisation.
It may help to see diagonalisation in action for a small program:
[dup reverse concat] dup cons i
== [dup reverse concat] [dup reverse concat] cons i (by dup)
== [[dup reverse concat] dup reverse concat] i (by cons)
== [dup reverse concat] dup reverse concat (by i)
== [dup reverse concat] [dup reverse concat] reverse concat(by dup)
== [dup reverse concat] [concat reverse dup] concat (by reverse)
== [dup reverse concat concat reverse dup] (by concat)
So the program [dup reverse concat]
diagonalised and run
produces its own palindrome. Most programs cannot be self-applied
because it would breach typing. But here are a few, readers might like
to see what they do:
[]
[reverse 42]
[42 pop]
[pop]
[dup cons]
In {Smullyan61} theory of formal
systems of (character) strings, the norm or diagonalisation of
a string X
is defined to be X
followed by its
own Gödel number in dyadic notation. In Joy the detour via Gödel number
is eliminated by using quotations.
We now consider two reductions. The first is this:
[P] dup i
== [P] [P] i (by dup)
== [P] P (by i)
The second is this:
[P] dup cons i
== [P] [P] cons i (by dup)
== [[P] P] i (by cons)
== [P] P (by i)
So the two reductions come to the same, and hence their first lines are identical:
[P] dup i == [P] dup cons i
But this holds for all [P]
. So we may infer:
dup cons i == dup i
An important result in computability is the recursion theorem, see for example {Rogers67}. It states:
For every function P there is a Gödel number of a function Q such that the result of applying the function P to the Gödel number of Q is the Gödel number of a function identical with Q.
The theorem says that for every function P which is used to edit code
for computing functions, there is a program Q which will compute the
same function before and after the editing. So Q is a fixpoint, a
program not affected by P. Fixpoint theorems are about programs
P
(rather than about the functions which they compute).
Here is an equivalent version for Joy:
For any program P
there is a program Q such that
[Q] i == Q == [Q] P
The required program Q
is:
[dup cons P] dup cons P
The proof is as follows:
[ Q ] i
== [[dup cons P] dup cons P] i (def Q)
== [dup cons P] dup cons P (by i)
== [dup cons P] [dup cons P] cons P (by dup)
== [[dup cons P] dup cons P] P (by cons)
== [ Q ] P (def Q)
The theorem may be strengthened. The transformation from
P
to Q
can be done by a simple program
O
:
There is a program O such that
for any program P
there is a program Q such that
[P] O == [Q] and
[Q] i == Q == [Q] P
The required program O
is:
[dup cons] swap concat dup cons
The proof is as follows:
[P] O
== [P] [dup cons] swap concat dup cons (def O)
== [dup cons] [P] concat dup cons (by swap)
== [dup cons P] dup cons (by concat)
== [dup cons P] [dup cons P] cons (by dup)
== [[dup cons P] dup cons P] (by cons)
== [Q] (def Q)
The fixpoint finding program O
is useful: henceforth it
will be called the fix operator. We also define the y combinator for
Joy:
y == fix i
This is the Joy counterpart of Curry’s “paradoxical” Y combinator (uppercase), see {Curry58}. This combinator is well known in the literature on the lambda calculus and on combinatory logic.
The recursion theorem traces its ancestry to Epimenides, Russell and
Grelling (for [P]
put [i not]
), and to Gödel.
In its general form it is due to Kleene. Its proof is very cumbersome in
just about all formalisms. As can be seen from the above, in Joy the
proof is very simple.
The S-m-n theorem is the basis for partial evaluation or
program specialisation. A special case for this is the partial
evaluator mix which can transform an interpreter into a
compiler, and can transform itself into a compiler generator. {Jones92} shows that such transformations can
be done realistically only if the partial evaluator has primitives in
terms of which the fixpoint constructions of the recursion theorem can
be implemented efficiently. The two primitives chosen are in fact very
close to Joy’s i
combinator and quotation.
The recursion theorem leads in a few steps to Rice’s theorem, see {Rogers67}, which encapsulates all the bad news of computability theory: for example the halting problem, or the impossibility of writing programs which check other programs – implementation, student exercises – for correctness.
The theorem, in both forms, speaks of all programs
P
. It is of some interest to see what happens when some
simple actual programs are chosen. Two particularly simple programs
are:
the empty program, which computes the identity function, and
the i
combinator program, which dequotes and
executes a program on top of the stack.
A frequently paraded corollary of the recursion theorem is the
existence of self-reproducing programs. When run, these program
produce a replica of themselves. A suitably general definition of
self-reproducing programs is this: a program S
is
self-reproducing if this conditions holds:
S i == S
Note that S
is allowed to be arbitrarily complex, it
does not have to be just a quoted program. Such programs are obtained
from the recursion theorem by making P
compute the identity
function. In Joy this is represented most simply by the empty program,
in quoted form []
. (Alternatively, it is represented by the
program id
, in quoted form [id]
.):
There is a program Q such that
[Q] i == [Q]
Proof: let Q
be the program:
[dup cons] dup cons
Then the derivation is as follows:
[[dup cons] dup cons] i
== [dup cons] dup cons (by i)
== [dup cons] [dup cons] cons (by dup)
== [[dup cons] dup cons] (by cons)
Readers might like to be reminded of a self-replicating C-program:
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
(Author unknown. Proving correctness is not easy.)
Related to self-reproducing programs are self-describing
programs. A program S
is self-describing if running it
produces a description of it. In Joy a program S
is
self-describing if this condition holds:
S == [S]
Here is a self-describing program:
[dup cons] dup cons
== [dup cons] [dup cons] cons (by dup)
== [[dup cons] dup cons] (by cons)
The other simple program to investigate is the i
combinator. Then [P]
is [i]
, and
[Q]
is [[dup cons i] dup cons i]
. This is what
happens:
[[dup cons i] dup cons i] i
== [dup cons i] dup cons i (by i)
== [dup cons i] [dup cons i] cons i (by dup)
== [[dup cons i] dup cons i] i (by cons)
So this is a program that runs forever. This is the Joy counterpart of what in the lambda calculus is selfapplication selfapplied. In both the reductions do not terminate. In the lambda calculus the rule used is beta-reduction, the substitution of actual for formal parameters. In Joy there are no formal parameters, just algebraic simplification.
The dup cons
combination occurs so frequently that it is
worth introducing an operator duco defined by:
duco == dup cons
This will make proofs shorter both horizontally (because programs are shorter) and vertically (because two steps are condensed into one).
The next kind of program is with [P]
=
dup
. Then [Q]
is the quotation in the first
line below, and execution is as follows:
[[duco dup] duco dup] i
== [duco dup] duco dup (by i)
== [[duco dup] duco dup] dup (by duco)
== [[duco dup] duco dup] [[duco dup] duco dup] (by dup)
So this is an example of a program which when run produces two copies of itself. The program in the second line is another self-describing program, it produces two copies of itself.
The next program uses [P]
= dup i
. Then
execution starts like this:
[[duco dup i] duco dup i] i
== [duco dup i] duco dup i (by i)
== [[duco dup i] duco dup i] dup i (by duco)
== [[duco dup i] duco dup i] [[duco dup i] duco dup i] i (by dup)
So this is another program that runs forever. In addition it leaves earlier copies of itself on the stack. In an implementation it must run out of stack space. The program in the second line is again self-describing, but it never finishes what would be a description of infinitely many copies of itself.
Another program that does much the same is:
[[duco dup dip] duco dup dip]
In the introduction a program was given which computes the
factorial function without being defined recursively. Here are the first
few steps in the execution of that program, those that are independent
of the numeric parameter that is provided to the program. To save space,
the duco
operator is used:
[ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
y
== [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
fix i (by y)
== [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
[duco] swap concat duco i (def fix)
== [duco] [ [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
concat duco i (by swap)
== [ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
duco i (by concat)
== [[ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
i (by duco)
== [ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte (by i)
== [[ duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
duco [pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte ]
[pop 0 =] [pop pop 1] [[dup 1 -] dip i *] ifte (by duco)
At this point the numeric parameter for the computation will be
needed. The ifte
combinator will execute its if-part. That
results in the long two-line quotation being popped off the stack and
the parameter being compared with 0. If that evaluates to
true
, the then-part is executed with the parameters and the
two-line quotation still on top of the stack, The then-part pops both
and replaces them by 1. But if the if-part returns false
,
the else-part is executed, again with the parameter and the two-line
quotation on top of the stack.
First the two-line quotation is set aside by dip
, the
parameter is duplicated and the top copy decremented. Only then the
two-line quotation will be executed by i
. The effect is to
construct a clone of itself with duco
, pushing the same
if-part, then-part and else-part for the contained ifte
.
When that eventually returns, the old parameter and above it the
factorial of the decremented parameter copy are finally multiplied to
give the required factorial of the original parameter.
Most theoretical and practical programming languages use formal parameters in definitions of functions (and procedures). These functions are called with expressions as the actual parameters. There are two ways of doing this: normal order, also known as call by name, and applicative order, also known as call by value. In normal order the unevaluated expressions are substituted for the formal parameters, and the expressions will be evaluated only if the body of the function requires it.
In applicative order the expressions are evaluated first and that value is substituted for the formal parameter. If the value of a parameter expression is used repeatedly in the body of the function, then normal order requires repeated evaluation of the expression, whereas applicative order requires only one evaluation. On the other hand, if the body does not require the value at all, then applicative order will have wasted time trying to evaluate the expression. If that evaluation does not terminate at all, then the call of the function will fail to terminate under applicative order although it would have terminated under normal order. Consequently there are some functions that are less defined under applicative order evaluation than they are under normal order.
The Y combinator of the lambda calculus does not work at all for applicative order. No matter what the function is, the call with the Y combinator will try to do the unending sequence of substitutions and hence fail to terminate. Therefore a different version of the Y combinator has to be used, see {Stoy77}, {Henson87}, {Paulson92}, {Winskel93}.
In Joy there are no formal parameters, there is no substitution, and
hence strictly speaking there is no evaluation order. Joy functions take
their actual parameters as values from the stack, and in a way this
resembles applicative order. The y
combinator of
Joy always terminates correctly provided that the quoted program and any
of its actual parameters did get onto the stack, and provided that the
quoted program terminates. So there is a difference between the Y
combinator of the lambda calculus under applicative order, and the
y
combinator of Joy. What is the reason? Consider again,
for an arbitrary program [P]
:
[P] y
== [P] fix i
== [[duco P] duco P] i
== [duco P] duco P
== [[duco P] duco P] P
This point is always reached in the initial call, and it is
independent of what P
actually is. At this point there is a
(double) quotation (containing P
twice) on top of the
stack, and P
now has it available as a parameter.
Quotations are never evaluated further, although they can be explicitly
manipulated by operators or they can be explicitly called by
combinators. Consequently the second duco
inside the
quotation will not be executed to yield:
== [[[duco P] duco P] duco P] P
Here of course the second and third duco
would have been
the next candidates for execution, and so on. So the reason for the
difference is that quotations in Joy are never evaluated automatically,
whereas abstractions in the lambda calculus will be under applicative
order evaluation.
It is possible to generalise fixpoint combinators for mutual recursion. For the lambda calculus this is done for example by {Kogge91}, shows how pairs of mutually recursive definitions can be eliminated by using a pair of rather complicated mutually recursive combinators Y1 and Y2. A similar technique is explained in {Henson87}. Constructions such as these rely on the existence of double fixpoints, whose existence follows from a double recursion theorem (see for example {Smullyan94}). Presumably these can be translated into Joy, too.
Consider arbitrary sets of functions, and note that this does not
mean sets of programs. Call such a set non-trivial if it is
neither the universal set of all functions nor the null set of no
functions. Given a non-trivial set F
of partial functions
and a Joy program [Q]
, the question arises whether the
function computed by [Q]
is in F
or not. Is
there an algorithm for deciding such questions? More specifically, is
there a Joy program, say PF
, for deciding such questions?
Such a program would have to satisfy:
[Q] PF == true, if the function computed by [Q] is in F
== false, if the function computed by [Q] is not in F
The program PF
is also expected to terminate for all
inputs [Q]
. Can there be such a program?
No. This is Rice’s theorem:
For all non-trivial sets F of partial functions,
there is no program PF such that
for all programs Q
PF can decide whether
the function computed by Q is in F.
Proof: Let F
be a non-trivial set of functions. Since
F
is non-trivial, there are some functions in
F
and some functions not in F
. Let program
[E]
compute a function in F
and let
[E']
compute a function not in F
.
Now assume that a program PF
exists, and that it always
terminates. Next, consider the following Joy program:
[PF] [pop [E']] [pop [E]] ifte
The program expects an arbitrary program [Q]
on top of
the stack and then pushes the three small quotations. The ifte operator
then removes them again and executes the first short quotation
[PF]
. This will result in a truth value, true
or false
. In the first case the second short quotation is
executed, it pop
s the [PF]
and replaces it by
[E']
. In the second case the third short quotation is
executed, it pop
s the [PF]
and replaces it by
[E]
. So, if [Q]
is in F
, the
program returns a program not in F
, namely
[E']
. On the other hand, if [Q]
is not in
F
, the program returns a program that is in F
,
namely [E]
. In other words, no matter what the input
program [Q]
is, the output program is opposite as far as
membership in F
is concerned. Call the above program
OPF
.
By the fixpoint theorem, the program OPF
must have a
fixpoint, say FIXOPF
, satisfying:
FIXOPF == [FIXOPF] OPF
But if the program on the left computes a function in F
,
then the program on the right computes a function not in F
,
and vice versa. But this is a contradiction. So the program
OPF
fails for at least one program, its own fixpoint
FIXOPF
. We must conclude that the assumption was false. So
there cannot be a program PF
.
The above proof of Rice’s theorem for Joy is adapted from a proof for
recursive functions in {Phillips92}.
Several well-known paradoxes are instances of the recursion
theorem, for example the Liar paradox and Grelling’s
paradox use as the program [P]
the simple program
[i not]
. Recent discussions of the Liar and related
problems can be found in {Martin70} and in
{Barwise-Etchemendy87}.
There are variants of the programs in section 5 which are worth mentioning.
The following is a simplification of the first self-reproducing program. It is simpler because it uses an operand parameter which it leaves intact:
There is an operand Q1 and a program Q2 such that
Q1 [Q2] i == Q1 [Q2]
Proof: Let Q1
= [dup]
and let
Q2
= dup
. Then the derivation is:
[dup] [dup] i
== [dup] dup (by i)
== [dup] [dup] (by dup)
This seemingly trivial self-reproducing program will be used to
derive a version of recursion that is actually more efficient than the
one based on the y
combinator.
The next uses the b
combinator:
There are programs Q1 and Q2 such that
[Q1] [Q2] b == [Q1] [Q2]
Proof: let Q1 and Q2 be the programs:
[[] cons dup first]
[] cons dup first
Then the derivation is:
[[[] cons dup first]] [[] cons dup first] b
== [[] cons dup first] [] cons dup first (by b)
== [[[] cons dup first]] dup first (by cons)
== [[[] cons dup first]] [[[] cons dup first]] first (by dup)
== [[[] cons dup first]] [[] cons dup first] (by first)
It is useful to introduce an operator codufi
by the
definition:
codufi == cons dup first
The operator will be found useful in the next section.
The next example of a self-reproducing program again uses the
i
combinator:
There is a program Q such that
[Q] i i == [Q] =/= [Q] i
Proof: let Q
be the program:
[duco [] cons] duco [] cons
The execution now looks like this:
[[duco [] cons] duco [] cons] i i
== [duco [] cons] duco [] cons i (by i)
== [[duco [] cons] duco [] cons] [] cons i (by duco)
== [[[duco [] cons] duco [] cons]] i (by cons)
== [[duco [] cons] duco [] cons] (by i)
Observe that the quoted programs in the first and fourth line differ by just the extra quoting in line four.
There is another program with the property. Let Q
be the
program:
[false [not] infra dup rest cons] [not] infra dup rest cons
The combinator infra expects a program (here [not]
) on
top of the stack, and below that a quotation (here the first half of the
program). It temporarily turns the quotation into the stack and executes
the program (here it complements the truth value false
or
true
at the very beginning. An outline of the derivation
is:
[[false [not] infra dup rest cons] [not] infra dup rest cons] i i
[[true [not] infra dup rest cons] [not] infra dup rest cons] i
[[false [not] infra dup rest cons] [not] infra dup rest cons]
(Each by 5 steps)
In the quoted programs the first and second lines are examples of mutually describing programs, satisfying:
P = [Q] and Q == [P]
In detail:
P == [false [not] infra dup rest cons] [not] infra dup rest cons
Q == [true [not] infra dup rest cons] [not] infra dup rest cons
The previous program keeps an internal toggle (true
or false
) which is thrown every time it is called. The next
program does the same with a counter, and consequently every generation
is different from all previous ones: Let Q
be the
program:
[0 [1 +] infra dup rest cons] [1 +] infra dup rest cons
Successive executions using i
cause the 0
to be incremented to 1
, 2
and so on.
Almost all programs become ruined when maltreated. Cutting off
bits and pieces would generally cause malfunction. Worms of course can
regenerate from small pieces, but most programs are not like that. But
programs can be written so that after each call they become more and
more insensitive to mutilation. The mutilating operations are
first
and rest
or sequences of such
operations. There are programs which with every generation become less
and less sensitive to longer and longer sequences of mutilations. Two
such Q
are:
[duco duco] duco duco
[cons duco] [cons duco] cons duco
Kym Horsell commented that self-reproducing programs do not do
anything useful. The idea would be that for a given program another
program is to be found which is self-reproducing and at each generation
executes the original program first. So the problem is to find, for
arbitrary P
, a program Q
such that:
[Q] i i .. i == P P .. P [Q]
where the i
on the left and the P
on the
right are repeated the same number of times. Here is one answer:
There is a program O such that
for all programs P
there is a program Q such that
[P] O == [Q] and
[Q] i == Q == P [Q]
Proof: Let O
be the program:
[dup [first i] dip rest duco] cons duco
Then for any P
:
[P] O
== [P] [dup [first i] dip rest duco] cons duco (by O)
== [[P] dup [first i] dip rest duco] duco (by cons)
== [[[P] dup [first i] dip rest duco] [P] dup [first i] dip rest duco]
(by duco)
Apart from the y
combinator there are other recursion
combinators. One of them is given by the following theorem:
There is a program M such that
(1) [y] M == [y] i == y and
for some programs N and O
(2a) [M] [duco] swap concat == [N] and
(2b) [N] duco == [O] and
(3) [M] y == [N] dup i == O and
for all programs P
(4) [P] O == [P] y and
there is a program Q such that
(5) [P] [O] cons == [Q] and
(6) [Q] i == Q == [Q] P
The first line, (1), says that y
is a fixpoint for
M
. Lines (2a) and (2b) show how to construct two further
programs N
and O
. Line (3) expresses a
relationship between the three programs M
, N
and O
. Line (5) shows how to construct a program
Q
which depends on P
. The last line of the
theorem says that the Q
is a fixpoint for
P
.
Proof: Only M
need be given, because
N
and O
are constructed. The required
M
is actually a combinator already seen in section 2:
cons dup first i
But the first three operators can be replaced by codufi
defined in the previous section. So we set:
M == codufi i
This is used to construct N
:
[M] [duco] swap concat
== [codufi i] [duco] swap concat (def M)
== [duco] [codufi i] (by swap)
== [duco codufi i] (by concat)
== [N] (def N)
So we have:
N == duco codufi i
This program can be used to construct O
:
[N] duco
== [duco codufi i] duco (def N)
== [[duco codufi i] duco codufi i] (by duco)
== [O] (def O)
So we have:
O == [duco codufi i] duco codufi i
This program can now be used to construct, for arbitrary
[P]
a corresponding [Q]
:
[P] [O] cons
== [P] [[duco codufi i] duco codufi i] cons (def O)
== [[P] [duco codufi i] duco codufi i] (by cons)
== [Q] (def Q)
So we have, for arbitrary [P]
:
Q == [P] [duco codufi i] duco codufi i
It remains to be shown that Q
is a fixpoint for
P
:
[Q] i
== [[P] [duco codufi i] duco codufi i] i (def Q)
== [P] [duco codufi i] duco codufi i (by i)
== [P] [[duco codufi i] duco codufi i] codufi i (by duco)
== [[P] [duco codufi i] duco codufi i] [P] i (by codufi)
== [[P] [duco codufi i] duco codufi i] P (by i)
== [Q] P (def Q)
This shows that Q
is indeed a fixpoint for
P
.
Different recursion combinators, and indeed a whole hierarchy of them, are well known in the literature on lambda calculus and combinatory logic; see for example {Barendregt84}, {Henson87}, {Revesz88}. Possibly one of the most satisfying introductions to combinatory logic is to be found in the remarkable little book {Smullyan90} To Mock a Mockingbird in which he manages to combine humour and rigour. Part III is a self-contained exposition to combinatory logic in which fancyful names are given to lambda calculus combinators.
Here are three recursive definitions. The function
r-last
finds the last element of a list. The
function r-fac
computes the factorial of a number.
The function r-fib
computes the Fibonacci value of
a number:
r-last == [rest null] [first] [rest r-last] ifte
r-fac == [0 =] [succ] [dup pred r-fac *] ifte
r-fib == [small] [pop 1] [pred dup pred [r-fib] unary2 +] ifte (* app2 *)
The following three functions also compute the last, the factorial
and the Fibonacci of their parameter. Note that there is no definition:
the recursion is taken care of by the y
combinator:
[ [pop rest null] [pop first] [[rest] dip i] ifte ] y
[ [pop 0 =] [pop succ] [[dup pred] dip i *] ifte ] y
[ [pop small] [pop pop 1] [[pred dup pred] dip unary2 +] ifte] y (* app2 *)
But the y
combinator is intrinsically inefficient
because of the way it operates. On every recursive call a certain
program is popped off the stack to be executed. The first task of that
program is to construct a copy of itself, in readiness for any further
recursive calls. But this is really quite silly. It would be better more
efficient if the program to be executed was not popped off the
stack at all but simply left there. Whereas most combinators remove
their parameters from the stack, a new x
combinator leaves
it there as a parameter for itself.
The following programs use the x
combinator instead of
the y
combinator. They are obtained from the first two of
the previous programs by replacing the internal occurrences of the
i
combinator and the external occurrence of the
y
combinator by the x
combinator:
[ [pop rest null] [pop first] [[rest] dip x] ifte ] x
[ [pop 0 =] [pop succ] [[dup pred] dip x *] ifte ] x
The x
combinator might have been defined
by:
x == dup i
Similar lambda calculus constructions are discussed in {Tennent76}, {Bauer-Woessner82}, {Schmidt86} and {Tennent91}.
The remainder of this section describes some further general and particular combinators of Joy which can be used to avoid recursive definitions.
The genrec
combinator takes four program parameters
in addition to whatever data parameters it needs. Fourth from the top is
an if-part, followed by a then-part. If the if-part yields true, then
the then-part is executed and the combinator terminates. The other two
parameters are the rec1-part and the rec2part. If the if-part yields
false
, the rec1-part is executed. Following that the four
program parameters and the combinator are again pushed onto the stack
bundled up in a quoted form. Then the rec2-part is executed, where it
will find the bundled form. Typically it will then execute the bundled
form, either with i
or with
app2unary2
, or some other combinator:
g-fac == [null ] [succ] [dup pred ] [i * ] genrec;
g-fib == [small] [ ] [pred dup pred] [unary2 +] genrec; (* app2 *)
The linrec
combinator also takes four program
parameters and is otherwise very similar to the genrec
combinator. The essential difference is that the bundled up quotation is
immediately called before the rec2-part. Consequently it can only be
used for linear recursion. Here are programs for finding the
last
of an aggregate or the Fibonacci value of a natural
number:
l-last == [rest null] [first] [rest ] [ ] linrec;
l-fac == [null ] [succ ] [dup pred] [*] linrec
The binrec
combinator is again similar, but it
applies the bundled quotation twice, once to each of the two top values
which the rec1-part has left on the stack. It implements binary
recursion. Below it is used for the Fibonacci function and for a
quicksort program for lists (or strings):
b-fib == [small] [] [pred dup pred ] [+ ] binrec;
b-qsort == [small] [] [uncons [>] split] [swapd cons concat] binrec;
The tailrec
combinator is similar to the
linrec
combinator, except that it does not have a
rec2-part. It can only be used for tail recursion, such as in
the program below which returns the last element of an aggregate:
t-last == [rest null] [first] [rest] tailrec;
The primrec
combinator is for primitive
recursion, it has the typical if-part built in. For numeric
parameters and for aggregate parameters it tests for null
.
The rec1-part is also built in, for numeric parameters it returns the
parameter and its predecessor, for aggregate parameters it returns the
first
of the aggregate and the rest
of the
aggregate. Recursion then occurs respectively on the predecessor or the
rest, and the rudimentary rec2-part typically combines the results.
The first program below computes the factorial (again!). The second turns any aggregate into a list. The third turns a suitable list of small numbers into a set. The fourth and fifth capitalise lists or strings of lowercase letters and produce, respectively, a list or string of corresponding capital letters. The last program takes an aggregate as parameter and produces a list of every second element in the original order and then the other elements in the reverse order:
p-fac == [1 ] [* ] primrec
agg2list == [[]] [cons ] primrec
list2set == [{}] [cons ] primrec
capstring == [""] [[32 -] dip cons] primrec
caplist == [[]] [[32 -] dip cons] primrec
fancy == [[]] [reverse cons ] primrec