Programming in Joy

by Manfred von Thun

Introduction

This paper shows how to write simple programs in Joy. Since Joy is very different from familiar programming languages, it takes a while to become used to writing programs in Joy. One way to start the learning process is by way of writing some simple generally useful library programs. In an implementation these may be part of an actual library, or they may be built into the language.

Some general utility programs include operators for manipulating the Joy stack just below the top element, further operators for manipulating aggregate values, and a few output programs. Generally useful are the stack type and the queue type. Values and operators of these two types are easily implemented as Joy lists and list operators.

Another collection of useful operators take an aggregate as parameter and produce a list of subaggregates. These operators are polymorphic in the sense that the aggregate parameter can be a (small) set, a string, or a list. For example, one such operator can take a set as parameter and produces a list of its subsets. All of these operators are definable without recursion by using the linrec combinator.

Some arithmetic operators are often used to illustrate recursive definitions, although it is well known that iterative execution is more efficient. In particular the use of accumulating parameters can often replace recursion. This is easily done in Joy using various iteration combinators.

Values of sequence types, such as strings and lists, can be sorted, and sorted sequences can be merged. Programs for doing this are easily written in Joy without recursive definitions by using appropriate combinators instead.

Joy’s inbuilt set type is implemented just as bitstrings, and hence it is limited to small sets of small numbers. The more useful big set type, which allows large sets of elements of any type, can be implemented in any language which has lists. It is simple to do in Joy, and the usual set-theoretic operations are easily provided. A similar implementation can be used for the dictionary type, which uses lookup tables for finite functions.

Also useful is the tree type, of lists, or lists of lists, or lists of lists of lists … of elements other than lists.

The remainder of this paper illustrates programming in Joy by way of simple examples. Many of the programs are first written in pseudo-code and then translated into Joy. Some of the Joy programs here have been adapted from the literature on ML or Miranda (“Miranda” is a trademark of Research Software Ltd). The next section defines some useful general purpose operators and combinators. This is followed by a section on two little collections of operators for two datatypes, stacks and queues. A longer section deals with programs for creating and manipulating lists of subaggregates. A shorter section then illustrates the use of accumulating parameters for the efficient implementation of numeric functions. Then there is another section on sorting sequences and on merging sorted sequences. Another section gives an implementation of unrestricted sets and of lookup dictionaries.

Utility operators and combinators

This section describes some very simple utilities which are useful in very different settings. Joy definitions are of the form:

        ATOM   ==   PROGRAM

where the long equals == means that the atom on the left is being defined to cause the execution of the program on the right.

Joy has three important operations for manipulating the top few items of the stack: pop for removing the top item, dup for creating a copy of the top item, and swap for interchanging the top two items. Often it is necessary to perform similar operations further below the stack. The following define three similar operators which leave the top element of the stack intact and perform the work just below that. All three use the dip combinator which takes as a parameter a quoted program and below that a further item. The item is saved, the quoted program is executed and the item then restored:

        popd   ==  [pop ] dip
        dupd   ==  [dup ] dip
        swapd  ==  [swap] dip

The popd operator removes the second item. The dupd operator duplicates the second item. The swapd operator interchanges the second and third item.

Three similar operators affect the order of the top three items on the stack. The rollup operator places items one, two and three from the top in the order two, three, one. The rolldown operator places items one, two and three in the order three, one, two. The rotate operator places them in the order three, two, one:

        rollup     ==  swap [swap] dip
        rolldown   ==  [swap] dip swap
        rotate     ==  swap [swap] dip swap

The next examples are for unary operators which expect an item on the stack and replace it with either a set or a string or a list containing just that item. All three operators work by first pushing the empty set {} or the empty string “” or the empty list [] on top of the stack. Then they use the cons operation to add the item below into the aggregate. The items have to be of the right type. The unitset operator requires a small number, the unitstring operator requires a character, and the unitlist operator requires anything. If the items are not of the right type, an error occurs when the cons is executed:

        unitset     ==  {} cons
        unitstring  ==  "" cons
        unitlist    ==  [] cons

The action of all three is reversed by the single standard operator first.

Analogously one may define three operators pairset, pairstring and pairlist, which form a set, string or list from two appropriate items on top of the stack:

        pairset     ==  {} cons cons
        pairstring  ==  "" cons cons
        pairlist    ==  [] cons cons

The action of all three is reversed by a single operator unpair, which may be defined in either of two equivalent ways:

        unpair  ==  uncons uncons pop
        unpair  ==  uncons first

Joy has two operators applicable to non-empty sets, strings and lists: The operator first extracts the first item of a string or list, and that item from a set which is the first in the underlying order. The operator rest removes the first item and returns the remaining set, string or list. Sometimes it is necessary to extract the second or third item. Suitable definitions are these:

        second  ==  rest first
        third   ==  rest rest first

The operators uncons and unswons undo what is done by cons and swons. Often it is useful to dissect not just one aggregate into its first and rest, but to dissect two aggregates. This can be done by uncons2 and unswons2, defined as follows:

        uncons2   ==  [uncons ] dip uncons  swapd
        unswons2  ==  [unswons] dip unswons swapd

Both expect two aggregates on top of the stack, both leave two firsts and two rests on the stack. For uncons2 the two firsts are items 3 and 4 on the stack, the two rests are items 1 and 2. For unswons2 it is the other way around.

Similarly, it is sometimes necessary to test whether at least one of two aggregates is empty, or whether at least one of two numeric values is equal to zero. For a single parameter this is done by null, for two parameters it is done by null2, defined in either of two equivalent ways:

        null2  ==  [null] [true] [pop null] ifte
        null2  ==  [ [[null] true] [pop null] ]  cond

Strings and lists are special kinds of aggregates, they are sequences. Sometimes it is necessary to reverse sequences. The naive way of doing this is recursively as follows:

          To reverse a sequence S:
                 If    S is empty
                       then return the empty sequence
                       else    remove the first element
                               reverse the rest of S
                               append the first element of S at the tail

It is easy to see that this is very inefficient because the append operation requires a lot of copying, and every element to be appended requires portions of the rest to be copied again and again. A well-known optimisation uses an extra parameter, an accumulating parameter, to obtain the same effect. The idea is to prepend the elements of the original list onto the accumulating parameter. Sometimes this is expressed by analogy with railways. The shunt operator takes two sequences as parameters and, starting at the front of the topmost parameter, moves all items onto the front of the second parameter. Joy has a combinator step for stepping through all items of its top parameter, and for each item executing a program that is given as a further parameter. That program has to take the item and add it to the accumulating parameter, so it is the swons operator. So this is how shunt can be defined:

        shunt  ==  [swons] step

To reverse a list or a string, an empty list or empty string has to be supplied as an accumulating parameter just below the list or string that is to be reversed. So here are definitions for reverselist and reversestring:

        reverselist    ==  [] swap shunt
        reversestring  ==  "" swap shunt

But there is something unsatisfactory about this, the reversal operation should be polymorphic. So the following version of reverse first tests whether the sequence to be reversed is a list or not, and inserts the appropriate accumulating parameter. The testing is done by the iflist combinator which takes two (here rather tiny) programs as parameters and below that one other item, the list or string to be reversed. If the latter happens to be a list, then the first quoted program is executed, and it will push an empty list. Otherwise the second program is executed, and it will push an empty string. In either case the two top items are now swapped and then shunted:

        reverse  ==  [[]] [""] iflist swap shunt

It comes as a surprise that lists can be reversed in another way. The idea is this: When a list is executed by a combinator, all the members of the list will be literals, so they will each be pushed onto the stack. The last element of the list will end up on top of the stack. So the elements of the list will then be in reverse order. To make use of this idea we have to arrange that an initially empty list is treated as a stack. This is what the infra combinator does. It takes a list as one parameter and a program as the second. It uses the list as a temporary stack and executes the program. The resultant stack is then pushed as a list. For the reversal problem the program is the list to be reversed, and the other parameter has to be the empty list. That empty list first has to be inserted below the list to be reversed. So another program to reverse a list is this:

        reverselist  ==  [] swap infra

What makes this version possible is that in Joy the principle that program = data is extended to program = data = memory. This version is actually more efficient than the one given earlier. Of course it cannot be adapted for reversing strings.

The two principal operators for explicit output are put, which prints a single value of any type, and putch, which prints a single stand alone character without quote symbol. Two useful little utility operators are worth defining. The putchars operator uses the step combinator to step through the characters in a list or string and writes them to the output file without the enclosing [] or "". The newline operator just outputs the newline character \n to terminate a line:

        putchars  ==  [putch] step
        newline   ==  '\n put

Using the step combinator it is easy to define several conversion operators which can be useful. The first two produce sets from aggregates of upper or lower case aggregates. The last two produce strings of upper or lower case characters from aggregates of small numbers:

        upper2set == {} swap [64 - swons] step
        lower2set == {} swap [96 - swons] step
        set2upper == "" swap [64 + swons] step
        set2lower == "" swap [96 + swons] step

The dip combinator expects a quotation and below that an item that will be saved before execution of the quotation and restored afterwards. Sometimes one wants to save and restore two or even three items, so it is useful to have further variants dip2 and dip3, defined as follows:

        dip2  ==  [dip] cons dip
        dip3  ==  [dip] cons dip2

Note that cons is being used to build a constructed program that is then supplied as a parameter to the last combinator.

Stacks and queues

This section and the next two contain implementations of two simple data types. Members of the stack type are linear structures which allow read and write access at one end only, and members of the queue type are linear structures which allow read access at one end and write access at the other end. Both have this much in common: the stack or queue remains on the Joy stack, and any stack or queue operations using it leave it there. Of course it can be explicitly removed with pop.

1. First, the stack type. A stack is a linear structure that can grow by having items added, inspected and removed all from the same end. The simplest way to implement stacks in Joy is as lists. One essential operator is st-new for the creation of a new empty stack, which is just an empty list. So the definition is:

        st-new  ==  []

Stacks can have additional items pushed onto them, and this is done by adding them to the front of the list. Since the stack is already there, the new item will typically be first pushed onto the Joy stack, and then it is to be pushed onto the stack. One way to do this is to swap the item and the stack and then perform a cons operation. But Joy has an operation which combines these two, namely swons. So the st-push operation can be defined by:

        st-push  ==  swons

An essential predicate is st-null for testing whether a stack is empty or null. However it will not do to just use null, since this will remove the stack – but typically the stack is intended for further applications. So, to avoid losing the stack, it has to be duplicated first, and then the null test can be applied to the duplicate:

        st-null  ==  dup null

The previous two operations both make sense when the stack is empty, but this is not the case for the stack operations to follow. The first of these is st-top for extracting the top element of the stack, while leaving the stack itself unchanged. The second is st-pop for removing the top element. The third is st-pull which combines the last two, it is the opposite of push. It extracts the top item and pops the stack. Ignoring the complication of an empty stack, the definitions would simply be:

        st-top   ==  dup first
        st-pop   ==  rest
        st-pull  ==  unswons

To guard against an empty stack, a test has to be performed to determine whether the stack is empty. If it is, then an error message should be given, otherwise the operation should be performed. So for all three operations the structure will be of the form:

        ==   [null]  [ERROR-MESSAGE]  [PERFORM OPERATION]  ifte

The error messages should state which of the operations was being attempted, but otherwise they should be the same. So the name of the operation is given as a string parameter to an error handling operation. That particular operation will be called _st-error, and we leave the details of its implementation till a little later. The leading underscore _ in the name has been added because this operation is not intended to be used by the programmer; in the current implementation the help command hides identifiers with a leading underscore. The remaining stack operations are then:

        st-top  == [null] ["st-top"  _st-error] [dup first] ifte
        st-pop  == [null] ["st-pop"  _st-error] [rest     ] ifte
        st-pull == [null] ["st-pull" _st-error] [unswons  ] ifte

As may be seen, the three operations still have a lot in common, and one might consider extracting that further. However, the result is likely to be less clear to the human reader. It remains to implement the error operation. It should state that an error has occurred due to an empty stack, and this part is the same for all three operations. It should also state which of the operations failed. So a minimal implementation of _st-error would simply write one string which is common for any call, and another string which is the specific parameter. This is crude but very easy to implement:

        _st-error  ==  "non_empty stack needed for " put put

A minor improvement is to concatenate the two strings (in the right order), so that only one string has to be written. But the double quotation marks in the output still look silly. So instead of writing the one or two strings with put, it looks nicer to write their constituent characters with putchars. Also, the line should be terminated with a newline. Finally, there is little sense in continuing the computation, so after the two parts of the error message have been displayed, it is best to abort, to return to the top level.

In a library implementation for the collection of definitions of stack operations might look like this:

LIBRA (* stack *)

_st-error == "non-empty stack needed for " putchars putchars newline abort;
st-new  == [];
st-push == swons;
st-null == dup null;
st-top  == [null] ["st-top"  _st-error] [dup first] ifte;
st-pop  == [null] ["st-pop"  _st-error] [rest     ] ifte;
st-pull == [null] ["st-pull" _st-error] [unswons  ] ifte.

As may be seen from the example, a library declaration begins with the word LIBRA and terminates with a period. In between is a sequence of definitions separated by semicolons ;. A definition consists of an atomic symbol and then the symbol == followed by a Joy program. Note again that the _st-error operator is not really intended to be used outside the remaining definitions. It could well be hidden completely from outside. A mechanism for this will be illustrated below.

2. Next, the queue type. A queue is a linear structure that can grow by having items added at one end, and inspected or removed from the other end. A simple minded implementation would consist of a single Joy list to which items are added at the end and removed from the front. But adding something at the end requires copying the entire list every time. Nothing would be gained by reversing the role of front and end, because in that case the removal requires copying of (almost) the entire list. A better implementation uses two lists. Conceptually one is the front of the queue, and items are removed at the front. Conceptually the other list is the back of the queue, but in reverse, and items are added to the front of this list. If at any time the list implementing the front of the queue becomes empty, the other list gets explicitly reversed and it becomes the front, and the empty list becomes the rear. There are two auxiliary operators that need only be visible to the remaining operators but not to the outside; in the following they are hidden in the private part of this module. Because they are hidden, there is no need to choose names which indicate the datatype on which they operate:

LIBRA (* queue *)

HIDE
  error   == "non_empty queue needed for " putchars putchars newline abort;
  prepare == [null] [swap reverse] [] ifte
IN
  q-new   == [] [];
  q-add   == swap [swons] dip;
  q-addl  == swap [shunt] dip;
  q-null  == prepare dup null;
  q-front == prepare [null] ["q-front" error] [dup first] ifte;
  q-rem   == prepare [null] ["q-rem "  error] [unswons  ] ifte
END.

As may be seen, such a declaration consists of the word HIDE, followed by a sequence of definitions, then the word IN followed by another sequence of definitions, and then the word END. A sequence of definitions is again separated by semicolons ;. The whole declaration can occur inside a library declaration where a single definition can occur. Any hiding declaration can occur wherever a single definition can occur, so they can be nested.

The first auxiliary error reporting procedure, error, is similar to the one for stacks. The second auxiliary operation, prepare, prepares the two lists: if the list implementing the front happens to be empty, the roles of the two lists are interchanged. If the front list is not empty, nothing is done.

A new queue is created by q-new in the form of two empty lists. An item can be added (to the “rear of the queue”) by q-add which adds it to the front of the second list. The members of a whole list can be added to the rear by q-addl. The operator q-null first prepares the two lists so that the list implementing the front is not empty, if that is possible at all. It then tests the front list. The operator q-front and q-rem extract respectively a copy of the front element or the front element itself. The copy or the original are left above the two lists. Both operators require the queue to be prepared so that the list implementing the front is not empty. Also, both operators need to check whether the front really is non-empty. If it is not, the error operator is called.

The definitions for stacks and queues are part of the library file TYPLIB.JOY.

Lists of subaggregates

The aggregate types of Joy comprises sets of small numbers, or strings of characters, or lists of items of any kind. Much of this section deals with lists of aggregates constructed from a given aggregate. The principal tool is the linrec combinator. It expects four program parameters on the stack, an if-part, a then-part, a rec1-part and a rec2-part. Execution begins by saving the four parts and then executing the if-part. If that produces the truth value true on top of the stack, the then-part is executed and the combinator exits. Otherwise the rec1-part is executed, then the combinator calls itself with the same four parts, and then the rec2-part is executed.

The first definition below is for an operator restlist which takes any aggregate as parameter and produces the list of all those subaggregates that would be formed by repeatedly taking the rests of the aggregate. Such an operator can of course be defined recursively and this could be done in any language. But in Joy it is possible to use a non-recursive definition using the linrec combinator. Here is some pseudocode:

1.      If    the aggregate is empty
2.            then  form its unitlist
3.            else    take a copy and the rest of that,
                      recurse using this rest,
                      eventually forming a list of aggregates,
4.                    use cons to add the original aggregate
                      to the front of this list

The above pseudocode translates directly into a recursive form in any language, but in Joy a non-recursive definition is also possible. The four program parameters for the linrec combinator correspond exactly to the labelled lines. Nothing corresponds to the unlabelled line, the linrec combinator recurses here automatically:

    restlist  ==
1.        [ null ]
2.        [ unitlist ]
3.        [ dup rest ]
4.        [ cons ]
          linrec

The next program also takes an aggregate as parameter and produces a list of subaggregates. But the subaggregates are those obtained by successively deleting the last elements. In analogy with the previous operator it will be called the frontlist operator. For empty aggregate parameters again the unitlist has to be returned, so the if-part and the then-part are the same as before. Also, for non-empty aggregates the aggregate has to be taken apart in the rec1-part. This can be done in two ways. We can take the front aggregate and the last element, but that would require defining a suitable operator, and it would require expensive copying in the case of list or string aggregates. Alternatively we can just uncons. This leaves only the rec2-part to be written. But it will be more complicated than for the previous operator. Let us ignore for the moment that the operator is intended to be used for aggregates of any of the three types. When the anonymous recursion has completed, the stack will contain the first item of the non-empty aggregate and above that the frontlist of its rest. The first item has to be consed into each member of the frontlist, and that is best done by:

        [cons]  map

Then that first item, which is now still the second element on the stack, has to be deleted. This can be done by a variant of pop, namely popd. Finally, assuming that the operator is to be used for lists, the empty list has to be added to the frontlist, and the easiest way is by [] swap cons, or simply by [] swons. This gives the following provisional rec2-part:

        [ [cons] map   popd   [] swons ]

But the assumption that the operator is to be used only for lists is unnecessarily restrictive. The final part, adding an empty aggregate, should depend on what the initial aggregate was, a set, a string or a list. This can be achieved by looking up the first element of the frontlist, it is a one element aggregate and taking its rest produces the required empty aggregate of that type. So the required rec2-part is:

        [ [cons] map   popd   dup first rest swons ]

The entire definition for frontlist, applicable to any aggregate, now is:

    frontlist ==
        [ null ]
        [ unitlist ]
        [ uncons ]
        [ [cons] map popd dup first rest swons ]
        linrec

The next program defines an operator subseqlist which is in some ways a combination of the preceding ones. Again it takes any aggregate as parameter and returns a list of subaggregates. This time the subaggregates are all those obtainable from the parameter aggregate by deleting first or last elements. For ordered aggregates, lists and strings, the resulting subaggregates will still contain elements in the same order as the parameter. It is tempting to define the operator very simply by:

        ==   frontlist  [restlist]  map

But this produces not a list of subsequences but a list of list of subsequences. This list of lists could then be flattened to a single list, even if this is somewhat inefficient. However, a different solution is possible.

The if-part and the then-part are as for restlist and frontlist, of course. The rec2-part is easy, it is only necessary to concatenate two lists that were produced by the rec1-part. But the rec1-part is rather complex, and this is what it has to do: the first element of the aggregate has to be extracted and later it has to be consed into every subaggregate of the frontlist of the rest of the aggregate. But also the rest of the aggregate has to be made available for the linrec combinator to work on. So the rec1-part must uncons the aggregate, and produce a second copy of the rest. The second copy has to be kept aside by using the dip combinator to work on the original copy. So an intermediate draft of the rec1-part looks like this:

        [ uncons  dup
          [ ... ]
          dip ]

The [...] program must take the frontlist (of the original copy of the rest) and then cons the first element into each of the members of the result. We already know how to do that, and how to delete the hidden first member. So the rec2-part is the following:

        [ uncons  dup
          [ frontlist  [cons] map  popd ]
          dip ]

The entire program now is this:

    subseqlist ==
        [ null ]
        [ unitlist ]
        [ uncons dup [frontlist [cons] map popd] dip ]
        [ concat ]
        linrec

The program uses frontlist, but because the latter is defined without recursion, it is possible to simply use the RHS of the definition of frontlist and insert that textually into the definition of subseqlist. The frontlist and subseqlist operators were adapted from recursive programs in {Thompson91 p 247}.

The next program defines a unary operator powerlist which for any aggregate returns a list of all subaggregates. If the parameter aggregate has N members, the resulting list of subaggregates has members:

    powerlist ==
        [ null ]
        [ unitlist ]
        [ uncons ]
        [ dup swapd [cons] map popd swoncat ]
        linrec

It uses the linrec combinator and the same if-part and then-part as the previous programs. It also uses the same rec1-part as the frontlist program: before recursing, the parameter is split into its first and its rest by uncons. The recursion then produces the powerlist of the rest. The rec2-part then uses that result to produce two copies, using dup. One copy is left untouched, the other has the original first element consed before every sublist using map. For this to work, the original first element has to be placed in the right position by swapd and eventually deleted by popd. The two resultant lists are finally combined with swoncat. This produces the list of subaggregates starting with the empty one; if concat is used instead of swoncat, the list ends with the empty one. Neither method yields the subsequences sorted according to size; but see the later section on sorted sequences.

The next program defines a binary operator insertlist. This operator expects a list or a string as a parameter and below that a potential list member or a character. It returns a list whose elements are either lists or strings, each with the potential member or character inserted once at all possible positions. So if the original list or string has N members, the resultant list has N+1 lists or strings as members:

    insertlist ==       (*   Item  Sequence   ->   List(Sequence) *)
        cons
        [ small ]
        [ unitlist ]
        [ dup                           (* keep original *)
          unswons [uncons] dip swons ]  (* take out second *)
        [ swap [swons] cons map         (* swons in second *)
          cons ]                        (* cons in original *)
        linrec

The operator can also be used, with a set parameter instead of a string or list. Then it produces a list of N+1 identical sets, each with the new member added. But such a use will rarely be wanted.

The permlist operator expects a sequence and returns the list of all permutations of that sequence. So if the original sequence has N members, the result list has factorial(N) members:

    permlist ==
        [ small ]
        [ unitlist ]
        [ uncons ]
        [ swap [swap insertlist] cons map
          flatten ]
        linrec

Note again that in the two preceding programs the map combinator uses a constructed program.

The zip operator expects two aggregate parameters, not necessarily of the same type, and not necessarily of the same size. It produces a list of two element lists by combining corresponding elements from the two aggregates. The result list contains as many pairs as the smaller of the two parameter aggregates. Here is the definition:

    zip ==
        [ null2 ]
        [ pop pop [] ]
        [ uncons2 ]
        [ [pairlist] dip cons ]
        linrec

This might be paraphrased as: If one or the other of the two parameter aggregates is null, then pop them both and return the empty list [], otherwise take out the two first elements and the two rests, recurse with the two rests producing a result list of two element lists of that, then dip below that result list to combine the two previous firsts with pairlist to form a two element list, and cons that into the front of the result list.

Related to this is the more general zipwith combinator, adapted from {Bird and Wadler 1988 p 57}. It takes three parameters, two aggregates and one quotation which can be used to combine members of the aggregates. The program again uses the linrec combinator which needs four quoted programs as parameters. The fourth quotation now has to be a constructed program, it is built from the program parameter of zipwith and the program stub [dip cons] already seen for zip. The other three program parameters for linrec first have to be dipped below the parameter of zipwith. The definition for this combinator thus is:

    zipwith ==
1.       [ [ null2 ]
2.         [ pop pop [] ]
3.         [ uncons2 ] ]
         dip
4.       [ dip cons ] cons
         linrec

A list of sequences can be concatenated into a single sequence by the unary operator flatten. The code is straightforward: if the parameter list is empty, then there is nothing to concatenate, leave it as it is. Otherwise use uncons to take out its first and its rest, recurse anonymously on the rest to produce the flattened result of that, and finally concatenate the saved first part to the front of the last result:

    flatten  ==  [null]  []  [uncons]  [concat] linrec

A two dimensional matrix can be implemented as a list of lists. One important matrix operation is the interchange of rows and columns, performed by the unary operator transpose. A draft of the program is the following:

          To transpose a list of lists LL :
1                If    LL is empty or some sublist of LL is empty
2                      then     pop  it off and return the empty list  [] 
                       else    (all sublists are non-empty)
3a                             construct a list of all the  first s of sublists
3b                             construct a list of all the  rest s of sublists
                               recurse anonymously on the list of  rests
4                               cons  the list of  firsts into the result.

This version has been adapted from {Reade 1989 p 133}. To test the disjunction whether LL is empty or some of its sublists are empty, we use the conditional combinator ifte. The if-part has to test whether LL is empty, and if it is, then the then-part has to return true. Otherwise the then-part will have to determine whether some sublists of LL are empty. This is best done with the combinator some. So part 1 of the Joy version is:

1        [null]  [true]  [[null] some]  ifte

For parts 3a and 3b it is necessary to use the parameter LL to produce two lists, of the firsts and the rests. Either of these two can be obtained with the map combinator. To obtain the two lists, the cleave combinator can be used, it takes two quotation parameters and a further parameter, and produces two values, one from each of the two quotations. So parts 3a and 3b are just:

3a      [ [first] map ]
3b      [ [rest ] map ]
        cleave

The entire Joy program thus is:

    transpose ==
1       [ [null] [true] [[null] some] ifte ]
2       [ pop [] ]
3       [ [[first] map] [[rest] map] cleave ]
4       [ cons ]
        linrec;

Alternatively, line 1 can be replaced by the following:

1        [ [null] [[null] some] disjoin i ]

Here the two tests are first disjoined to form a single test predicate. This is then called by the i combinator. The net effect is exactly the same as in the version given earlier.

A common binary operation on aggregates is that of forming the Cartesian product. It will take two aggregates as parameters and produce the list of two element lists which each contain one element from each of the two aggregates. If the two aggregates have M and N members respectively, then the resultant list has M × N elements. In order to form the Cartesian product, it is necessary to consider each of the members of one aggregate with each of the members of the other aggregate. This is like stepping through an aggregate, except that there are two aggregates to be stepped through. It will be useful first to define a combinator step2 which does just that, leaving at each step two items on top of the stack. Then the Cartesian product operator will just form their pairs, and then form a list of all these pairs. An application of the step2 combinator will look like this:

        A1   A2   [P]   step2

The implementation is based on the simple idea that A2 and [P] be used to construct a program which is then used by the ordinary step combinator to step through the elements of A1:

        A1   [ .. A2 .. [P] .. ]   step

We now fill in the dots. The program to be constructed has to step through the members of A2 using a program which depends on [P]. It has to do this for each member of A1, and when it has done that the current member of A1 can be popped off. So the program will have to look like this:

        A1   [ A2  [ .. P ]  step  pop ]   step

Since P must be allowed to consume a current member of A1 but this still has to be available for the next inner step, that member of A1 first has to be duplicated, below the current member of A2. So the dots are just [dup] dip. In sum, the required program is:

        A1   [ A2  [ [dup]  dip  P ]  step  pop ]   step

The definition of step2 must construct that program from A2 and [P] and then call step with just two parameters, the program just constructed and below that the other aggregate A1. The definition looks like this:

    step2  ==
        [[dup] dip]  swoncat            (* form inner quote *)
        [step pop]  cons                (* form outer quote *)
        cons                            (* insert A2 *)
        step                            (* through A1 *)

It is now relatively easy to define the Cartesian product operator as follows. First we need to insert an accumulating parameter, an empty list []. It has to be inserted below the two aggregates of which the Cartesian product is to be computed. This is easily done with [] rollup. The program which is then used by the step2 combinator has to form the pairlist of the two items on top of the stack. The resultant pair has to be inserted into the accumulator with swons. But between the accumulator and the just formed pair is the current original member of the first aggregate which must be left intact. So the pair and the member have to be swapped, and the swons has to be done below the member, by dip. This is the program that is given as the parameter for step2. So the definition for the cartproduct operator is:

    cartproduct  ==
        [] rollup
        [pairlist swap [swons] dip]
        step2

The program works for aggregates of any type, and the two aggregates do not have to be of the same type. If both are lists of numbers or sets of small numbers, other variations are possible: By changing the pairing operator pairlist to, say, multiplication *, we obtain a program which produces a list of all products. By further changing the accumulator [] to the number 0 and the insertion operation swons to, say, addition +, we obtain a program which produces the sum of all M × N products. For two numeric aggregates of the same size, say N, another binary operator can be defined, the scalarproduct:

        scalarproduct  ==
            0 rollup                             (* accumulator *)
            [ null2 ]
            [ popd ]
            [ uncons2 [* +] dip2 ]
            tailrec

It produces the sum of all N products of pairs taken from corresponding positions in the two aggregates.

Arithmetic operators

This section gives efficient implementation of several well-known functions which are often used in the literature for demonstration purposes: the factorial, the Fibonacci-function, exponentiation and the greatest common divisor. All of them are often defined recursively, and of course they can be defined recursively in Joy. Using one of several suitable recursion combinators they can be computed recursively in Joy without a recursive definition. But recursive execution in any language can be inefficient.

There are well known techniques for removing linear recursion, see for example {Bauer and Wössner 1982 Chapter 4}. The same topic is discussed in {Henson 1987 Chapter 4} using the useful concept of eureka definitions due to Burstall and Darlington. These involve creative steps in the production of more efficient versions of programs, and hence would be difficult to perform by an optimising program.

Several of the functions to be defined require a little program to be executed a number of times. A useful combinator for this is times. It requires the program to be repeated as the top element of the stack and the required number of repetitions to be the second element on the stack.

The factorial of a number n is simply the product of n factors from 1 to n. To compute it using times, a small program has to be pushed on top of the number n which is the parameter. The number itself will be consumed by times. The program works on two other numbers on the stack. One of these is the accumulating parameter, it has to start at 1. The other is the next factor to be used by the program with which to multiply the accumulator. The multiplication has to be done without losing the factor, so it has to be duplicated first. Apart from doing the multiplication, the program also has to increment the factor using the successor operator succ. The program which is the parameter to times thus is:

        [dup  [*]  dip  succ]

Before the times combinator can get to work on the parameter n and the quoted program, the accumulator and the first factor have to be placed in position, below the parameter n. Both begin with the value 1, so the rolldown operator can be used to push these two values below n. Finally, after times has completed, the stack will contain the required accumulator value but also on top of that the next factor. The latter is simply popped off. The entire definition of fact is:

        fact == 1 1 rolldown [dup [*] dip succ] times pop

The Fibonacci function can be computed in a similar way. Again there is a certain computation that has to be repeated a number of times as given by the parameter n. Again the computation involves two further numbers, the larger one is to be replaced by their sum, and the smaller one is to be replaced by the former larger one. Adding the two must not destroy the original larger number, so again it has to be duplicated. The addition is then performed under the control of dip. Then the two numbers are swapped. This describes the little program that serves as the parameter to the times combinator. Before it can start, the two initial values 0 and 1 have to be placed below the parameter n with rolldown. When it has completed, the required accumulated sum is the second element, and the top element, the useless next summand, is popped. So this is the definition of fib:

        fib  ==  0 1 rolldown [dup [+] dip swap] times pop

This version of the Fibonacci function requires a computation time which is a linear function of the parameter n.

The recursive version of the Fibonacci function requires quadratic computation time. Since the result values are not very large, it is often used as a test program. What is of interest is the number of recursive calls made during the computation, to be divided by the total time it took. To obtain the number of recursive calls it is often convenient to use a variant of the Fibonacci function, sometimes called nfib. It has the property that the value returned is the same as the number of calls made during recursive execution. The following are recursive definitions of Fibonacci and its variant:

        r-fib  ==  [small] []      [pred dup pred [r-fib ] unary2 +     ] ifte  (* app2 *)
        r-nfib ==  [small] [pop 1] [pred dup pred [r-nfib] unary2 + succ] ifte  (* app2 *)

These are recursive definitions which of course are intended to run in quadratic time. The following is a definition of nfib which uses accumulators to run in linear time. Of course it does not measure its own runtime, it is included here to illustrate a programming technique:

        nfib  ==  1 1 rolldown [dup [+ succ] dip swap] times pop

The next two programs are for binary operators which compute functions of two parameters: the exponentiation function and the greatest common divisor. Exponentiation can be computed by performing a certain operation as many times as given by the exponent. This description again suggests using the times combinator to execute a quoted program several times. The operation to be repeated consists in multiplying an accumulator by the base which is the second parameter. So it is necessary to construct a little program from the base which for every call will multiply by the base. Assuming that the base is in the right position on the stack, the program is easily constructed, by:

        [*] cons

Before the constructed program can be handed to times, the initial value 1 has to be placed as an accumulator below the two numbers which are the two parameters, the base and the exponent. This is done by 1 rollup. To get the two parameters into the order appropriate for cons it is necessary to perform a rotate first. So here is the exp operator:

        exp  ==  1 rotate [*] cons times

The technique of first constructing a program (here by cons) and then supplying it to a combinator (here times) is very useful in Joy.

The next program computes the greatest common divisor of two numbers, using Euclid’s algorithm. The algorithm uses two numbers and repeatedly takes the remainder after dividing one by the other. The remainder obtained is then used to replace the dividend. The process is repeated as long as the potential divisor is positive. So, unlike the previous programs, we cannot use the times combinator. Instead a combinator called while is used which resembles while-loops in imperative languages. It takes two parameters: the while-part is a quoted program which must return a truth value, and the do-part is a quoted program which can compute anything. The while-part in the following gcd program is of course very similar to a corresponding part in the fib program:

    gcd  ==  [0 >] [dup [rem] dip swap] while pop

Two other arithmetic functions that are sometimes useful are for computing the sum or the product of a set or a list of numbers. Both are best implemented by stepping through all members of the set or list, doing additions or multiplications with an accumulator every time. The initial accumulator value, 0 or 1, is first pushed onto the stack below the parameter set or list. For comparison, the third line below gives a definition of the size operator which is applicable to any aggregate. The fourth line below gives a definition of a similar operator size2 for determining the total number of elements in a list of aggregates. If these aggregates are themselves lists, then their members are counted but not the members of their sublists:

    sum     ==  0  swap  [+       ]  step
    product ==  1  swap  [*       ]  step
    size    ==  0  swap  [pop succ]  step
    size2   ==  0  swap  [size  + ]  step

A generalisation of size2 for counting the leaves in recursive lists or trees is treesize, defined later.

Sorted sequences

The sequence types of Joy are the string type and the list type. Values of these types can be ordered. Strings contain just characters, but lists may contain anything. So for lists it only makes sense to speak of ordering if the elements are characters or integers or something else that has an ordering defined on it.

An informal description of the quicksort algorithm is this:

To sort a sequence S :
1       If    S is small (has only 0 or 1 element)
2             then it is sorted already, leave it alone
              else    (S has at least one element)
3                     using     the first of S
                                as a "pivot" for comparison,
                                split the rest of S into two portions -
                                those that are less than the pivot
                                and those that are not
                      separately sort both portions P1 and P2
4                     concatenate    the now sorted lesser portion,
                                     the pivot, and
                                     the sorted other portion.

The following is a definition of an operator qsort which uses the above algorithm. But instead of using explicit binary recursion it uses the binrec combinator. This is like the linrec combinator except that it recurses twice, once each on the top two elements of the stack. The recursions again occur between the rec1-part and the rec2-part. The program also uses another combinator split which takes as parameter an aggregate and above that a quoted program which must return a truth value. The split combinator returns two aggregates, containing those elements for which the test yields false and those for which it yields true. The split combinator has access to the remainder of the stack which in this case contains the pivot. So the test determines whether the pivot is > than the element being examined:

    qsort  ==
1       [ small ]
2       [ ]
3       [ uncons [>] split ]
4       [ swapd cons concat ]
        binrec

Sometimes it is required to sort a list of aggregates on the basis of their first elements. In that case it is necessary to supply to the comparison operator > not the pivot and the element to be apportioned by split, but their first elements instead. This is conveniently done by the app2unary2 combinator which applies a quoted program to two elements on top of the stack and replaces them by whatever values the programs return:

    qsort1  ==
1       [ small ]
2       [ ]
3       [ uncons [[first] unary2 > ] split ]    (* app2 *)
4       [ swapd cons concat ]
        binrec

Note that in part 3 when the first element of the pivot has to be compared with the first element of the aggregate to be apportioned, the first element of the pivot is being extracted every time. It would perhaps be more efficient if the first element of the pivot is extracted just once, as soon as the pivot is available. In that case it is necessary to take the pivot apart with unswons, but this has to be done by dipping below the rest of the list still to be sorted. Then the quotation parameter to split just needs to take out the first of the current aggregate and compare it with the first of the pivot. After split has done its job, the pivot has to be re-assembled by swons, but this now has to be done below the two portions with dip2. So part 3 can be replaced by:

3       [ uncons [unswons] dip [first >] split [swons] dip2 ]

Sometimes it might be necessary to sort a list of items on the basis not of their first element but on the size or their second or third element or even the size of the second of the third element. For the last example it would only be necessary to use [third second size] instead of [first] in the qsort1 program. But it would be impossible to anticipate all alternative sorting bases for a library, and it would be awkward to have to write the appropriate sorting program on every special occasion. It is possible to write a general quicksort program which takes as an additional parameter something like [first] or [third second size]. The mk-qsort combinator does just that:

    mk_qsort ==
        [ [small] [] ] dip
        [ unary2 >] cons [split] cons [uncons] swoncat  (* app2 *)
        [ swapd cons concat ]
        binrec

It begins in line 1 by inserting the standard if-part and then-part below its parameters. In line 2 it uses the parameter to build a constructed program, the required rec1-part. Then in line 3 it pushes the standard rec2-part. At this point the top five elements of the stack are the list to be sorted and above that the four program parts needed for binrec. The latter now executes. For example the program:

        [third second size]  mk-qsort

will sort a list of lists of three or more elements whose third member are aggregates of two or more elements. It will sort according to the size of the second of the third element.

The binary operator insert takes a sorted sequence and a potential new member as parameters, it returns a new sequence with the additional member inserted in the appropriate position. Here is a draft program:

To insert an item into a sorted sequence :
1       If    the sequence is empty or
               its first element is >= than the item
2             then add the new item in the front of the sequence
3             else    set aside the first item of the sequence
                      recurse with the rest of the sequence and the new item
4                     add the previously set aside first item to the front

The disjunction in line 1 is best handled by the disjoin operator on programs. It expects two quoted programs which return a truth value, and it returns a single quoted program which computes their disjunction. So line 1 consists of two quoted programs one of them tests whether the sequence is empty, the other tests whether its first element is >= than the item to be inserted. The disjoin operator then produces their disjunction. The resulting program is the if-part for the linrec combinator. The other three parts are now quite obvious. So the definition in Joy is:

    insert ==
1       [ pop null ]  [ [first] dip ]  disjoin
2       [ swons ]
3       [ [uncons] dip ]
4       [ cons ]
        linrec

Two sorted sequences can be merged into a single sequence which respects the original ordering. Here is a very informal algorithm for a recursive version:

To merge two sorted sequences : 
        If    the first sequence is empty,
              then throw it away and return the second sequence.
        If the second sequence is empty,
              then throw it away and return the first sequence.
        (Both sequences are non-empty, so both have a first element:)
        If the first of the first sequence is less than the first of the second,
              then    set the lesser element aside,
                      recurse using the rest of the first sequence,
                      prepend the previously set aside element.
        If the first of the first sequence is greater than the first of the second,
              then set the lesser element aside,
                      recurse using the rest of the second sequence,
                      prepend the previously set aside element.
       (The two first elements of the sequences are equal:)
        Default  
              set both first elements aside,
              recurse using the rests of both sequences,
              prepend the two previously set aside elements.

Like just about all programming languages, Joy has an if-then-else construct (ifte) for two-way branching. Multiway branching can be achieved by nested iftes, but this can become difficult to read. Joy has another combinator for multi-way branching borrowed from Lisp. The combinator cond expects one parameter which is a list of cases. The last case is the default case, the other cases each consist of a condition or if-part and a program or then-part. The condition is a quoted program in front of the program. Execution of the cond combinator tests successive conditions, and for the first condition that yields true the associated program is executed. If none of the conditions is true, the default case is executed. The informal algorithm given earlier now translates into the following recursive definition of r-merge:

    r-merge  ==
        [ [ [null] pop]
          [ [pop null] swap pop]
          [ [unswons2 <] [uncons] dip r-merge cons]
          [ [unswons2 >] uncons swapd r-merge cons]
          [ uncons2 r-merge cons cons] ]
        cond

As may be seen from the earlier informal version and the above Joy version, for each case the program recurses at most once. Therefore the program has the pattern of linear recursion. However, because there are three cases in which recursion occurs, it is not possible to use the linrec combinator. However, Joy has a combinator condlinrec which has features of cond and linrec. The combinator condlinrec also expects one parameter which is a list of cases. Again the last case is the default case, and the other cases consist of a list of two or three quoted programs. If there are just two parts, then they are called the if-part and the then-part. Their meaning is as for cond. If there are three parts, then they are called the if-part, the rec1-part and the rec2-part. In that case linear recursion occurs between execution of the rec1-part and the rec2-part. The following is a non-recursive definition of merge:

    merge ==
        [ [ [null] [pop] ]
          [ [pop null] [swap pop] ]
          [ [unswons2 <] [[uncons] dip] [cons] ]
          [ [unswons2 >] [uncons swapd] [cons] ]
          [ [uncons2] [cons cons] ] ]
        condlinrec;

Sometimes it is necessary to merge two lists of aggregates on the basis of their first elements. In that case the comparisons < and > should not be applied to the elements of the sequences but to their first members. A simple solution is to replace the two comparisons respectively by the following two:

        [first] unary2 <              [first] unary2 >  (* app2 *)

So the definition of the merge1 operator could be:

    merge1 ==
        [ [ [null] [pop] ]
          [ [pop null] [swap pop] ]
          [ [unswons2 [first] unary2 <] [[uncons] dip] [cons] ]  (* app2 *)
          [ [unswons2 [first] unary2 >] [uncons swapd] [cons] ]  (* app2 *)
          [ [uncons2] [cons cons] ] ]
        condlinrec

The definition of merge (and especially merge1) could be optimised so that the unswons (and the first) is not done repeatedly for each comparison. As the definitions stand, they are easy to understand and work correctly.

Big sets and dictionaries

Computer words are short bit-sequences and a common size is 32. These can be used to implement small sets of small numbers 0..31, with a few common set operations implemented in hardware. Joy uses this in its set type. But often it is necessary to have either much larger sets or sets of larger elements. Such a big set type can be implemented in various ways: as unordered lists, as ordered lists, as unbalanced trees or as balanced trees. Each implementation method has its advantages and disadvantages. The following implementation of big sets in terms of ordered lists has been adapted from {Bird and Wadler 1988 p 230 ff}.

The empty set is represented as an empty list, in this library it is written as bs-new:

LIBRA (* big sets *)

bs-new == [];

One very important binary set operation is union. The two parameters are sorted lists, and the returned value also has to be a sorted list. It would appear that the two lists should be simply merged. But if they have an element in common, then the returned list would then contain the element twice. However, in sets any element should occur at most once. This consideration affects the default case, the last case of the program list which is the parameter. The case occurs when the first elements of the two parameter lists are equal. So in the definition of bs-union instead of saving and later restoring both, only one is saved and later restored:

bs-union ==
        [ [ [null] [pop] ]
          [ [pop null] [swap pop] ]
          [ [unswons2 <] [[uncons] dip] [cons] ]
          [ [unswons2 >] [uncons swapd] [cons] ]
          [ [rest [uncons] dip] [cons] ] ]
        condlinrec;

The same situation arises for inserting or adding a new member to a set. If the new member is already in the set, then it should not be inserted again. So if the first member of the current list is equal to the candidate new member, then the candidate is just popped off in the third line below. In the definition of bs-insert the only recursion occurs in the last, the default case:

bs-insert ==
        [ [ [pop null] [swons] ]
          [ [[first] dip >] [swons] ]
          [ [[first] dip =] [pop] ]
          [ [[uncons] dip] [cons] ] ]
        condlinrec;

The next operator tests for membership, so it must return a truth value. If the list is null or its first element is > than the candidate, then false is returned. If the first element is = to the candidate, then true is returned. In the default case, when the relation is <, the list has to be inspected recursively further down. So the case must contain two programs to effect the recursion. However, on the way back from the recursion, the last returned truth value is the one to be used. Hence no further action is required, and the second program is just []. This is the definition of bs-member:

bs-member ==
        [ [ [pop null] [pop2 false] ]
          [ [[first] dip >] [pop2 false] ]
          [ [[first] dip =] [pop2 true] ]
          [ [[rest] dip] [] ] ]
        condlinrec;

The same device is used in the default case of the definition of bs-differ for finding the difference between two sets. As may be seen, there are two further recursive cases, for < and >, and one of them uses the same device again:

bs-differ ==
        [ [ [null] [pop]]
          [ [pop null] [pop pop []] ]
          [ [unswons2 <] [[uncons] dip] [cons] ]
          [ [unswons2 >] [rest] [] ]
          [ [[rest] dip rest] [] ] ]
        condlinrec;

The next definition is for bs-delete, it deletes a specified member from a set, if it is a member at all. The only recursive case is the default case:

bs-delete ==
        [ [ [pop null] [pop] ]
          [ [[first] dip >] [pop] ]
          [ [[first] dip =] [pop rest] ]
          [ [[uncons] dip] [cons] ] ]
        condlinrec.

The operations of inserting or deleting members into or from a set are essentially special cases of taking unions or differences with unitsets. So the following definitions might have been given instead of the earlier, more efficient definitions:

    bs-insert  ==  unitlist bs-union;
    bs-delete  ==  unitlist bs-differ;

A dictionary is a way of implementing finite functions as argument-value pairs. A pair is best implemented in Joy as a two element list. The totality of pairs is then essentially a big set, and any of the ways of implementing these is suitable here. If the argument part of pairs is subject to an ordering relation, the sets of pairs can be implemented as lists ordered in accordance with the first element, the argument of the pairs. Not surprisingly then, some of the code to follow is reminiscent of code for qsort1 and merge1. The following is a library for the dictionary type. A new dictionary is created by d-new. A predicate d-null returns true or false according as the parameter dictionary is empty or not. New pairs are added by d-add, they are inserted in the correct place based on the ordering of the first member of the pairs. The union or difference of two dictionaries is given by the two binary operators d-union and d-differ. A single pair is removed by the binary operator d-rem, it removes the pair whose first member matches the given query parameter. Instead of a test for membership there is a binary operator d-look which extracts the first pair whose first element matches the query.

Only the program for one of the operators will be developed here, the program for d-union:

To form the union of two dictionaries D1 and D2:
1       If D2 is empty, pop it off and return just D1
2       If D1 is empty, retain D2, pop D1 and return D2
3       Extract    the first pairs from D1 and D2,
                   from both pairs compare their firsts with <
        If the comparison is true,
                   below D2, uncons D1 into its first
                   and rest
                   recurse anonymously on the rest of D1 and D2
                   cons the saved first pair from D1
                   into the result
4       Extract    the first pairs from D1 and D2,
                   from both pairs compare their firsts with >
        If the comparison is true,
                   uncons D2, put its first below D2
                   recurse anonymously on D1 and the rest of D2
                   cons the saved first pair from D2
                   into the result
        Default (the firsts of the first pairs of D1 and D2
                    are =):
                   uncons both D1 and D2 into their
                   first and rest,
                   recurse on the two rests to form their union,
                   cons the two saved firsts into the result.

In the default case both first pairs are retained, so that if one is deleted, the other one, which may well have a different second component, is still available.

As may be seen, the d-union operator is very similar to the bs-union operator. The other three operators d-differ, d-look and d-rem are similar to their counterparts for big sets. The entire library is the following:

LIBRA (* dictionary *)

d_new   == [];
d_null  == null;
d_add   ==
        [ [ [pop null] [swons] ]
          [ [[first] dip [first] unary2 >=] [swons] ]   (* app2 *)
          [ [[uncons] dip] [cons] ] ]
        condlinrec;
d_union ==
        [ [ [null] [pop] ]
          [ [pop null] [popd] ]
          [ [unswons2 [first] unary2 <] [[uncons] dip] [cons] ]  (* app2 *)
          [ [unswons2 [first] unary2 >] [uncons swapd] [cons] ]  (* app2 *)
          [ [uncons2] [cons cons] ] ]
        condlinrec;
d_differ ==
        [ [ [null] [pop]]
          [ [pop null] [pop pop []] ]
          [ [unswons2 [first] unary2 <] [[uncons] dip] [cons] ]  (* app2 *)
          [ [unswons2 [first] unary2 >] [rest] [] ]              (* app2 *)
          [ [[rest] dip rest] [] ] ]
        condlinrec;
d_look  == [dup] dip
        [ [ [pop null] [pop pop "not found"] ]
          [ [[first first] dip >] [pop pop "not found"] ]
          [ [[first first] dip =] [pop first] ]
          [ [[rest] dip] [] ] ]
        condlinrec;
d_rem   ==
        [ [ [pop null] [pop] ]
          [ [[first first] dip >] [pop] ]
          [ [[first first] dip =] [pop rest] ]
          [ [[uncons] dip] [cons] ] ]
        condlinrec.

The definitions of big sets and dictionaries are part of the library file TYPLIB.JOY.

Trees

Apart from the aggregate types it is useful to have another type, the tree type. These are lists which can contain lists as members which might contain lists as members and so on. Formally define a leaf to be anything which is not a list. Then a tree is defined to be either a leaf or a list of trees. Sometimes one needs the concept of a proper tree – this is just a list of trees. Trees are similar to other aggregates, but since the tree datatype is recursive, a special treatment is generally needed.

Just as there is the step combinator to step through the elements of an aggregate, so there is a treestep combinator to step through the leaves of a tree. For example, the following are the already familiar program for computing the sum of the numbers in an aggregate and a similar program for computing the sum of the numbers in a tree:

            sum  ==  0 swap [+]     step
        treesum  ==  0 swap [+] treestep

In the same way, the following are a familiar program and a new one for determining the size of an aggregate and the treesize of a tree:

            size  ==  0 swap [pop succ]     step
        treesize  ==  0 swap [pop succ] treestep

Similarly, the following are a familiar program and a new one for shunting members of an aggregate or a tree, respectively, into an initially empty list:

            shunt  ==  [swons]     step
        treeshunt  ==  [swons] treestep

For the binary operator treeshunt all leaves will appear in the result list, but in reverse order.

A tree may be flattened completely, losing its entire internal structure but retaining the order of the leaves by the unary operator treeflatten:

        treeflatten  ==  [] swap treeshunt reverse

From a given tree we can obtain the reverse list of its leaves by:

        []  swap  treeshunt

But this may not be what is wanted. To reverse the tree while retaining its structure it is necessary to reverse the top level list, reverse the second level lists, reverse the third level lists and so on. For tasks such as this Joy has a ternary combinator treegenrec for general recursion through trees. It is used like this:

        [O1]  [O2]  [C]  treerecgen

Here [O1] must be a program applicable to leaves, [O2] must be an operator applicable to lists, and [C] must be a combinator applicable to lists with operators such as [O2]. Different choices of the three quotation parameters yield surprisingly different operators for trees or combinators applicable to trees. Using this combinator the unary treereverse operator is defined by:

        treereverse  ==  [] [reverse] [map] treegenrec

The same treegenrec combinator can be used to define a unary combinator treemap which takes a tree and a quoted program as parameters and returns a tree of the same structure but with each leaf as modified by the program parameter:

        treemap  ==  [] [map] treegenrec

The same combinator can be used to define a unary combinator treefilter which expects a tree and a quoted predicate. What is returned is a tree of the same structure but with only those leaves which pass the test predicate:

        treefilter  ==  [] swap orlistfilter [map] treegenrec

The first portion, [] swap just inserts the required [O1] which in this case does nothing. Following that is a modification of the test predicate, to be explained presently. The rest of the definition is familiar:

        treefilter  ==  [] swap orlistfilter [map] treegenrec

The [O2] operator to be used here is constructed from the test predicate [P] by orlistfilter, which constructs:

        [ [ [list] [P] disjoin ]  filter ]

The orlistfilter is defined in two steps:

        orlist  ==  [list] swap disjoin
        orlistfilter  ==  orlist [filter] cons

An operator to remove all leaves from a tree, but retaining its list structure is treestrip, defined as follows:

        treestrip  ==  [list] treefilter

Trees cannot have lists as leaves, but otherwise they are very flexible. In particular they can be used as queues. The following is a small collection of operations for manipulating trees when the focus is only on their leaves. A new empty tree is generated by t-new. A new leaf or a whole tree of leaves is added to an existing tree by the operator t-add; it always ensures that the tree is of a form suitable for the remaining operators. The tree predicate t-null tests whether the tree is empty. It first has to prepare the tree by ensuring that it does not consist of lists of lists and so on which ultimately only contain the empty list. Since this is also required by two other operators, the preparing is done by a hidden unary operator. Two other operators t-front and t-rem produce, respectively, the first leaf together with the remainder of the tree, or just the remainder of the tree after removing the first leaf. Both operators first have to check that the tree is non-empty; if it is, then an error is reported. A leaf or proper tree can be turned into a suitable form by t-reset.

The implementation is as follows. A proper tree is always a list, and an empty tree starts off by t-new as an empty list. Anything can be added by t-add to an existing tree, and this has to ensure that the result has a suitable standard form. The same is true for t-reset which first makes a copy of an existing tree. The other operators, t-null, t-front and t-rem all require the tree to be in a suitable standard form. This is done by prepare which is defined using condlinrec. If the tree is null, it is left as it is. If the first is null, then the rest is taken and condlinrec recurses. If the first of the first is a list, then that is unswonsed, condlinrec recurses and on return does nothing further. In all other cases the tree is left as it is:

HIDE                                                    (* tree *)
    error        ==  "non-empty tree needed for" putchars putchars abort;
    prepare      ==  [ [ [null] [] ]
                     [ [first null] [rest] [] ]
                     [ [first first list] [[unswons] infra] [] ]
                     [ [] ] ]
                     condlinrec
IN
    t-new        ==  [];
    t-reset      ==  dup  unitlist unitlist;
    t-add        ==  unitlist unitlist cons;
    t-null       ==  prepare
                     dup null;
    t-front      ==  prepare
                     [null]
                     ["t-front\n" error]
                     [dup first first]
                     ifte;
    t-rem        ==  prepare
                     [null]
                     ["t-rem\n" error]
                     [unswons unswons [swons] dip]
                     ifte
END

The definitions of trees is part of the library file TYPLIB.JOY.