Design of a Joy interpreter written in Joy

by Manfred von Thun

This short paper contains the design of a Joy interpreter written in Joy itself.

Let L1 and L2 be two languages. An interpreter for language L1, written in language L2 is a program P which takes as a parameter any program written in L1 and executes it in the same way as a processor for L1 would have. If the two languages are the same language L, such an interpreter is called metacircular. Languages in which program = data make it particularly easy to write an interpreter in its own language. The best known example is the Lisp interpreter eval. A metacircular interpreter for Joy can also be written.

The remainder of this paper assumes some familiarity with Joy.

A Joy interpreter written in Joy

A Joy interpreter in Joy is a program which expects a quoted program on top of the stack and executes it. The following is the design of Joy interpreter joy written in Joy itself. The first version of the interpreter is merely a reminder that Joy already has a combinator, the i combinator, which removes a program from the top of the stack and executes it:

joy  ==  i

The next version makes explicit the fact that Joy programs are sequences which are executed by stepping through the members of the sequence. For each member in the sequence an appropriate action is taken. The step combinator will indiscriminately leave literals, operators and combinators on top of the stack. But any operators and combinators on top of the stack cannot actually be executed. However, their unitlist can be executed by the i combinator. So this is the next version of the interpreter:

joy  == 
        [ unitlist 
          i ] 
        step

The last interpreter does not actually specify what is to be done what a particular element of the sequence has appeared on top of the stack. A better one should say things like this for the operators:

If the top element is "+" 
    then pop off the "+" and add the two numbers below 
If the top element is "rest" 
    then pop off the "rest" and take the rest of the aggregate below

For the literals it is even simpler:

If the top element is any number 
    then leave it there and do not do anything 
If the top number is any list 
    then leave it there and do not do anything

Essentially we need a way of examining what is on top of the stack and executing the appropriate action. So in the previous version of the interpreter we must replace the unitlist by something more specific. It has to consist of several cases, for the various literals, operators and combinators. The opcase operator is suitable for just that. It expects any kind of item on top of the stack, and above that a list of cases. Each case is a list consisting of a test item and a (possibly) empty rest. The effect of the opcase operator is to remove the top item and the list of cases, and to leave behind the rest of case for which the item matched the test item. The last case in the list is the default, it does not have a test item. The default is returned if there was no match. For the present design step the default does nothing. The following is what has to replace the unitlist in the interpreter:

      [ [ 0 ]                           (* sample number *) 
        [ [] ]                          (* sample list *) 
        [ + pop + ]                     (* addition operator *) 
        [ rest pop rest ]               (* rest operator *) 
        [ ] ]                           (* default, do nothing *) 
      opcase

It is an easy matter to add the other cases for literals. They have to be treated just like numbers and list:

        [ 'A ]                          (* sample character *) 
        [ true ]                        (* sample truth value *) 
        [ "" ]                          (* sample string *) 
        [ {} ]                          (* sample set *)

Similarly, other operators have to be added such as:

        [ swap pop swap ]               (* swap operator *) 
        [ cons pop cons ]               (* cons operator *)

For the combinators it is tempting to treat them just like operators:

        [ dip pop dip ]                 (* dip combinator - WRONG *) 
        [ map pop map ]                 (* map combinator - WRONG *)

This will work correctly, but it just uses the Joy system inside the quoted program that is being called by i or map. Instead it should use the Joy-in-Joy interpreter that we are writing. To achieve that effect, the program parameter [P] for the combinator has to be replaced by [[P] joy]. For i, dip and map and other unary combinators this is quite easy: after the pop execute [joy] cons. This gives cases like the following:

        [ i    pop [joy] cons  i   ]    (* i combinator *) 
        [ dip  pop [joy] cons  dip ]    (* dip combinator *) 
        [ map  pop [joy] cons  map ]    (* map combinator *)

The case for the i combinator is unnecessarily inefficient, it could be optimised to:

        [ i    pop             joy ]    (* i combinator *)

However, for uniformity this optimisation will not be used here.

So that we do not lose track, here is the interpreter as designed so far:

joy  ==                                 (* literals *) 
    [ [ [ 0    ] 
        [ []   ] 
        [ true ] 
        [ 'A   ] 
        [ ""   ] 
        [ {}   ] 
                                        (* operators *) 
        [ +             pop     +    ] 
        [ rest          pop     rest ] 
        [ dup           pop     dup  ] 
        [ swap          pop     swap ] 
        [ pop           pop     pop  ] 
        [ -             pop     -    ] 
        [ and           pop     and  ] 
        [ cons          pop     cons ] 
                                        (* unary combinators *) 
        [ i             pop [joy] cons    i      ] 
        [ dip           pop [joy] cons    dip    ] 
        [ map           pop [joy] cons    map    ] 
        [ filter        pop [joy] cons    filter ] 
        [ ] ]                           (* provisional default *) 
      opcase 
      i ] 
    step

The interpreter is getting close to its final shape now, but several things need to be fixed. Obviously the binary combinators have to be treated in a way that is similar to the unary ones: The two program parameters [P] and [Q] have to be replaced by [[P] joy] and [[Q] joy]. This is best done by using the ~~app2~~unary2 combinator as follows:

        [[joy] cons]    unary2          (* app2 *)

So for the binary combinators the cases look like this:

                                        (* binary combinators *) 
        [ branch        pop [[joy] cons] unary2   branch ]  (* app2 *)
        [ cleave        pop [[joy] cons] unary2   cleave ]  (* app2 *)

For the ternary combinators and quaternary combinators the pattern is much the same, there are now three or four program parameters that need to be modified. This is easily done using the ~~app3~~unary3 and ~~app4~~unary4 combinators to effect the modification:

                                        (* ternary combinators *) 
        [ ifte          pop [[joy] cons] unary3   ifte   ]  (* app3 *)
                                        (* quaternary combinators *) 
        [ linrec        pop [[joy] cons] unary4   linrec ]  (* app4 *)
        [ binrec        pop [[joy] cons] unary4   binrec ]  (* app4 *)

There are still two major amendments needed for the interpreter. The first concerns user defined symbols as they might occur in the standard library, the personal library or in the preamble to a particular run. All Joy symbols have an internal tag, and the tags differ individually only for the operators and combinators. However, all numbers have the same internal tag, all characters have the same tag, all lists have the same tag and so on. Similarly all defined symbols have the same tag. The opcase operator looks at these tags, so all that is needed is one new case for user defined symbols. Just as any number will do as the prototype for numbers, so any user defined symbol will do as the prototype for defined symbols. We might as well choose “joy”. When a defined symbol has been encountered, it is necessary to find its definition, which is the program that constitutes the body of that definition. The body operator will find that, it expects a user defined symbol on top of the stack and returns the defining program in quoted form. The Joy interpreter now has to execute this. But it is essential that the Joy interpreter should execute inside that quotation. So it will not do to use the i combinator, but the joy interpreter itself has to be used. Hence the case for user defined symbols is just:

        [ joy           body joy ]

The single most common case will be the call of a defined atom rather than an inbuilt one. To improve efficiency this case (with joy) is placed to the front of the caselist.

It is always possible that the interpreter is used for programs which contain operators or combinators that are part of the language (not user defined) but have not been given cases in the interpreter itself, either intentionally or through neglect. The joy interpreter should be able to perform reasonably for those operators and combinators. So this is where the default clause for the opcase operator comes in handy. When none of the listed cases apply, treat the symbol as we did in the second version of the interpreter: take the unitlist and use the i combinator to execute that. Instead of using unitlist it is better to use its definition [] cons. So the default clause looks like this:

        [               [] cons i ]     (* default *)

This completes the working design of the interpreter. The following is its structure with the last two additions written out fully:

joy  == 
    [ [ [ joy           body joy ]      (* user defined *) 
        ...                             (* literals *) 
        ...                             (* operators *) 
        ...                             (* unary combinators *) 
        ...                             (* binary combinators *) 
        ...                             (* ternary combinators *) 
        ...                             (* quaternary combinators *) 
        [               [] cons i ] ]   (* default *) 
      opcase 
      i ] 
    step

Before we write out a more complete version of the interpreter, it is useful to make a number of changes to the design.

Firstly, the cases for the combinators of various arities contain common code which is repeated again and again. The interpreter becomes more readable if such instances are factored out and defined separately. There are cases for unary, binary, ternary and quaternary combinators, and for each of these the common code will be calledcr1,cr2,cr3andcr4. For uniformity the code forcr1is assimilated to that of the others, using ~~app1~~i`. Since they are not likely to be wanted anywhere else, their definitions are given inside a HIDE declaration.

Secondly, it could be useful if the default case does its job not silently but traces out each symbol that it hands over to the Joy system. Such symbols are duplicated and then written out by put. This now makes the default case:

        [               dup put [] cons i ]

The interpreter joy now looks like this:

HIDE 
  cr1  ==  pop [[joy] cons] i;      (* app1 *)
  cr2  ==  pop [[joy] cons] unary2; (* app2 *)
  cr3  ==  pop [[joy] cons] unary3; (* app3 *)
  cr4  ==  pop [[joy] cons] unary4  (* app4 *)
IN 
  joy  == 
    [ [ [ joy           body    joy       ] 
        [ []                              ] 
        [ 0                               ] 
        ... 
        [ dup           pop     dup       ] 
        [ +             pop     +         ] 
        [ cons          pop     cons      ] 
        [ put           pop     put       ] 
        ... 
        [ i             cr1     i         ] 
        [ dip           cr1     dip       ] 
        [ map           cr1     map       ] 
        [ filter        cr1     filter    ] 
        [ i             cr1     i         ]     (* app1 *)
        [ unary2        cr1     unary2    ]     (* app2 *)
        ... 
        [ ifte          cr3     ifte      ] 
        [ linrec        cr4     linrec    ] 
        [ binrec        cr4     binrec    ] 
        ... 
        [               dup put [] cons i ] ] 
      opcase 
      i ] 
    step 
END

The interpreter has appropriate cases for all literals and for quite a few operators and combinators. Most operators and combinators are still missing and will therefore be handled by the default clause. However, it is straightforward to make the interpreter more comprehensive and even complete.

It is of some interest to write an interpreter that is just adequate to interpret itself and leaves everything else to the default clause. The following is the minimal Joy interpreter joy0; for variety it uses the optimisation for the i combinator mentioned earlier:

joy0  == 
    [ [ [ joy0          body            joy0     ] 
        [ []                                     ] 
        [ pop           pop             pop      ] 
        [ cons          pop             cons     ] 
        [ opcase        pop             opcase   ] 
        [ body          pop             body     ] 
        [ i             pop             joy0     ] 
        [ step          pop [joy0] cons step     ] 
        [               [] cons         i        ] ] 
      opcase 
      i ] 
    step

The two versions joy and joy0 are in the file Joy in Joy.