by Manfred von Thun
This paper describes a conditional term rewriting system for Joy based on the two constructors concatenation and quotation.
A rewriting system consists of a set of syntactic rules for
performing replacements on certain suitable entities. The best known
such system is the one we learnt at school for evaluating arithmetic
expressions. Any programming language can be given a rewriting system,
but for Joy it is particularly simple. The basic binary rewriting
relation will be written in infix notation as =>
,
pronounced “can be rewritten as”. The following are some sample rules
for the +
operator, the <
predicate and the
dip
combinator:
2 3 + => 5
2 3 < => true
a [P] dip => P a
In the last example, P
is any program and a
is any literal (such as a number) or a program whose net effect is to
push exactly one item onto the stack. The rewriting relation is extended
to allow rewriting in appropriate contexts, further extended to
accomodate several rewriting steps, and finally extended to become a
congruence relation, an equivalence relation compatible with program
concatenation. This congruence relation between programs is essentially
the same as the identity relation in the algebra of functions which the
programs denote. Although Joy functions take a stack as argument and
value, in the rewrite rules the stack is never mentioned.
The following are rewriting rules for arithmetic expressions in four different notations: infix, functional, prefix and postfix:
2 + 3 => 5 +(2,3) => 5
+ 2 3 => 5 2 3 + => 5
In each case on the left the operands are 2
and
3
, and the operator or constructor is
+
, so they all refer to the same arithmetic term. Since Joy
uses what looks like postfix notation, it might be thought that one
should attempt a term rewriting system with rules just like the second
one in the last line. That would treat the short program
2 3 +
as being composed of two operands and one operator or
constructor. It would also treat the gap between 2
and
3
as quite different from the gap between 3
and +
. The difference would be explained away as a
syntactic coincidence due to the choice of notation. Apart from
+
there would be very many term constructors.
However, Joy has operators for manipulating the top few elements of
the stack, such as swap
, dup
and
pop
. These are also found in the language Forth,
see for example {Salman et al
1984} and {Kelly et al 1986}.
These operators take a stack as argument and yield a stack as value, and
their presence forces all other operators to be of the same type. For
example, the following is a rewrite rule for swap:
a b swap => b a
Unlike Forth, Joy also has quotations and combinators. These features also force the conclusion that the appropriate rewriting system is a string rewriting system. Consider the following four programs:
[2] [3 +] b [2] [3 +] concat i
[2 3] [+] b [2 3] [+] concat i
They all eventually have to reduce to 5
, just like the
earlier Joy program 2 3 +
. It suggests that in the latter
the gaps have to be treated in the same way, the program is a
concatenation of three atomic symbols, and it denotes the composition of
three functions. So, at least for Joy programs without quotations and
combinators, the appropriate system is a string rewriting system. Such a
system is equivalent to a term rewriting system with a concatenation
constructor for programs as the only constructor. To handle
combinators, a quotation constructor has to be introduced as a
second constructor.
The remainder of this paper is organised as follows: The next section introduces rewriting systems in general. Then follows a section on the principal concepts of a rewriting system for Joy. The next two sections give details of rewriting rules for operators and for combinators. Two other sections re-examine the stack and the quotation constructor. It is argued that the stack is not just an optimisation useful in an implementation, but that it is almost essential for understanding the semantics of Joy. The possibility of an extensional version of the quotation constructor is discussed but dismissed as unnecessarily restrictive. A final section is an outline of a rewriting system for Joy types; the system resembles a categorial grammar.
Rewriting systems can be classified according to the entities that are being rewritten. In a string rewriting system those entities are linear sequences of symbols or strings of symbols. In a term rewriting system these entities are expressions or terms build from operands and operators. In a graph rewriting system they are graphs of various kinds.
A string rewriting system is based on an alphabet which is just a set of symbols. Strings over a given alphabet are arbitrary sequences of symbols, each taken from the alphabet. The empty sequence or null string is included. A rewriting rule is of the form:
x => y
where both x and y are strings. A string rewriting system consists of
an alphabet and a relation =>, a set of such pairs or rules. A wider
relation ==> is defined as follows: For strings w
,
x
, y
and z
:
w x z ==> w y z
if and only if x => y
. This relation allows
replacement of x
by y
in arbitrary contexts
w..z
.
A term rewriting system also requires an alphabet of
symbols. Each symbol has an associated arity which is a natural
number (0, 1, 2, …). Symbols of arity 0 are nullary symbols or operands,
symbols of positive arity are unary, binary, ternary and so on symbols
or operators. A term over such an alphabet is either an operand
or it is an operator of arity n
together with
n
further terms. Terms are really abstract syntax trees,
and various notations can be used for their concrete linear
representation. A rewriting rule is again a pair of the form:
x => y
where x
and y
now have to be terms. A term
rewriting system consists of an alphabet of symbols, each with their own
arity, and a set of such rules. A wider relation ==> is defined as
follows:
y ==> z
if and only if y
and z
are alike terms
except that y
contains a subterm u
where
z
contains a subterm w
such that
u => w
.
In any rewriting system it is useful to define ==>> as the
reflexive transitive closure of ==>
:
x ==>> z
if and only if x = z
or for some y
,
x ==> y
and y ==>> z
.
Here is an example for a fragment of a rewriting system for arithmetical expressions in Joy:
2 3 + => 5
7 2 3 + * ==> 7 5 *
7 2 3 + * ==>> 35
In the second line the subexpression 2 3 +
on the left
is called a reducible expression or redex since it can be
reduced using the rule in the first line.
There are two distinct ways in which a string rewriting system can be
interpreted as a term rewriting system. On the first interpretation, the
term system has exactly one operand. It has as unary constructors all
the symbols of the string system. It has no other constructors. The
single operand is just the null string, and any symbol, say
s
from the string alphabet is interpreted as a unary
operator append s
. The appending is either
uniformly on the left or uniformly on the right.
On the second interpretation, the term system has as operands all the symbols from the string system. It has only one binary constructor, concatenation. This interpretation is most useful for a rewriting system for Joy because it is now possible to add the unary quotation constructor which is needed for the combinators.
Rewriting systems can be based just on unconditional rules of the
form x => y
, but they can also have conditional
rules. Such rules state that certain rewritings are permitted
provided certain other rewritings are permitted. The next sections give
a conditional rewriting system for Joy.
A short general introduction to rewriting systems is in {Salomaa 1985}. String rewriting systems in particular are discussed in {Book 1985}. A general survey of rewriting systems is in {Schmitt 1987}.
This section describes the basis of a conditional rewriting system for Joy using a notation similar to Prolog.
A rewriting system for Joy will be a collection of syntactic rules for rewriting Joy programs. Such a system must be based on the two principal program constructors, program concatenation and program quotation. The system to be presented here uses (unconditional) axioms of the form:
P => Q
where P
and Q
are programs. There are also
conditional rules of the following forms, where R
,
S
, T
and U
are further
programs:
P => Q :- R => S.
P => Q :- R => S, T => U.
The rules are written in a Prolog-like syntax. The turnstyle
:- is pronounced “if”. On its left is the conclusion or
consequent. On the right is a premise or antecedent.
The antecedent can be a conjunction, as in the second form above, and
the comma ,
is pronounced “and”. So the second of the above
rules can be read as: “P
can be replaced by Q
if R
can be replaced by S
and T
can be replaced by U
”. Details of these rules are given in
the next two sections.
Using the same notation, the relation ==> is defined by the three rules:
P ==> Q :- P => Q.
P Q ==> P R :- Q ==> R.
P R ==> Q R :- P ==> Q.
The last two clauses allow rewriting in a context – P
on
the left or R
on the right.
The next rules for ==>
concern combinators. In the
following, C
i is any combinator expecting at least
i
quotation parameters:
[P] C1 ==> [Q] C1 :- P ==> Q.
[P] [R] C2 ==> [Q] [R] C2 :- P ==> Q.
[P] [S] [R] C3 ==> [Q] [S] [R] C3 :- P ==> Q.
[P] [T] [S] [R] C4 ==> [Q] [T] [S] [R] C4 :- P ==> Q.
Note that there is no rewrite rule:
[P] ==> [Q] :- P ==> Q.
The reason for this is further explained in section 7.
The final relation to be introduced is ==>>, the reflexive
transitive closure of ==>
. It is defined by:
P ==>> P.
P ==>> R :- P ==> Q, Q ==>> R.
The simplest examples of rewriting axioms are those generated by
definitions. If an atom name
has been defined
using == as program P
, in the form:
name == P
then name
may be rewritten as P
:
name => P
The stack is normally a sequence of values of various types. This
sequence is just a special list which is modified by programs. The first
general operator is newstack
, which clears the stack.
Clearing twice is the same as clearing just once. If literals were
pushed before the clearing, this has the same effect as just clearing.
So newstack
is in fact the right zero element for
program concatenation:
newstack newstack => newstack.
P newstack => newstack.
Since the stack is a list, it should be possible to put this list on
top of the stack – that is to say, on top of itself. Also, it should be
possible to make the list on top of the stack become the stack. There
are two operators that do just that: The stack
operator
pushes onto the stack a list containing all the elements of the stack.
The unstack
operator expects a list on top of the stack and
makes that the stack. The unstack
operator undoes what the
stack
operator does, but the reverse is true only in
special cases:
newstack stack => newstack [].
[] unstack => newstack.
newstack L => [L] reverse unstack.
In the last rule, L
has to be a list of literals. Also,
it should be noted that the stack is not always a sequence of values, it
can also contain operators and combinators. So, strictly speaking the
stack is always a quotation, and the stack
operator pushes
a quotation onto the stack, and the unstack
operator
expects a quotation on the stack and makes that the new stack.
Although the stack was mentioned in these informal explanations, it should be noted that it is not referred to at all in the rewrite rules. The same will be true in the sections to follow. Rewrite rules are purely syntactic, and the stack is a semantic entity. Joy symbols denote functions from stacks to stacks. But syntax does not concern semantic concepts such as denotation.
An operator denotes a function which expects as argument one stack whose top few elements satisfy some condition and which returns a stack in which the top few elements have been replaced by something else, and which otherwise is like the argument stack. This section gives rewrite rules for Joy operators. The first ones given are for general operators that can be used for any stack. Then follow rules for operators on simple types such as truth values, characters and integers. These are then followed by rules for operators on aggregate types such as sets, strings and lists.
The unary operators pop
and dup
are defined
on all stacks containing at least one element. In the rewrite rules to
follow, let a
be any literal or a program whose net effect
is to push exactly one value onto the stack:
a pop => id.
a dup => a a.
The generalisation that a
may be not just a literal but
can be a program whose effect is to push a single value is needed for
rare cases like the following:
[*] first dup => [*] first [*] first.
The two programs on the left and right of the arrow have the net
effect of pushing two occurrences of the multiplication operator
*
onto the stack.
The binary operators swap
, popd
, and
dupd
are defined on all stacks containing at least two
elements. Let a
and b
be any literals or
equivalent programs:
a b swap => b a.
a b popd => b.
a b dupd => a a b.
The ternary operators swapd
, rollup
,
rolldown
and rotate
are defined on all stacks
containing at least three elements. Let a
, b
and c
be any literals or equivalent programs:
a b c swapd => b a c.
a b c rollup => c a b.
a b c rolldown => b c a.
a b c rotate => c b a.
The ternary operator choice also expects three elements, but the
third element has to be a truth value. Let a
and
b
be any literals or equivalent:
true a b choice => a.
false a b choice => b.
The simple types of Joy are the truth value type, the character type and the integer type. The next rules are for the operators on these types.
Rewrite rules for the unary operators succ
,
pred
, abs
and sign
for integer
operands is given by the following rules. Since characters are just
small positive integers, the operators can also be applied to
characters. The last two operators can also be applied to truth values.
In what follows, let i
, j
and k
be any integers:
i succ => j. ( j = i+1 )
i pred => j. ( j = i-1 )
i abs => j. ( j = abs(i) )
i sign => j. ( j = sign(i) )
Rewrite rules for the binary operators +
,
-
, *
, /
, rem
,
max
and min
for integers operands are as
follows:
i j + => k. ( k = i+j )
i j - => k. ( k = i-j )
i j * => k. ( k = i*j )
i j / => k. ( k = i/j )
i j rem => k. ( k = i mod j )
i j max => k. ( k = max(i,j) )
i j min => k. ( k = min(i,j) )
Again these binary operators can be applied to characters as well as
integers. In the mixed case the type of the result k
is the
same as the type of the second parameter i
.
Most implementations of Joy will also provide many other arithmetical operations. Since these will be defined in a library, no reduction rules should be given here.
The type of truth values is one of the Boolean types. The binary
operators are and
, or
and
xor
(exclusive or). The unary operator is
not
. Let p
and q
be truth values
true
or false
:
p q and => r. ( r = p and q )
p q or => r. ( r = p or q )
p q xor => r. ( r = p xor q )
p not => r. ( r = not p )
A predicate is a function which leaves a truth value on the
stack. The unary predicates null
,
small
, odd
, even
,
positive
and negative
are defined for all
numeric types:
i null => p. ( p = (i=0) )
i small => p. ( p = (i<2) )
i odd => p. ( p = odd(i) )
i even => p. ( p = even(i) )
i positive => p. ( p = positive(i) )
i negative => p. ( p = negative(i) )
The binary predicates =
, !=
,
<
, <=
, >
and
>=
have the obvious rewrite rules:
i j = => p. ( p = (i = j) )
i j != => p. ( p = not(i = j) )
i j < => p. ( p = (i < j) )
i j <= => p. ( p = (i <= j) )
i j > => p. ( p = (i > j) )
i j >= => p. ( p = (i >= j) )
The remainder of this section deals with aggregate types:
sets, strings and quotations, with lists as a special case. The
unary operators first
, second
,
third
and rest
expect a non-empty aggregate on
top of the stack. The following are the rules for list aggregates:
[a L] first => a.
[a b L] second => b.
[a b c L] third => c.
[a L] rest => [L].
Here [a L]
is a non-empty list or quotation whose first
member is a
and whose rest is [L]
. For strings
an analogous notation can be used to obtain analogous rules. For
example:
"cS" first => 'c.
"cS" rest => "S".
Here “cS” denotes a non-empty string whose first character is
'c
and whose remaining characters are the string
"S"
.
For sets the rules are entirely analogous, except that the numeric ordering of the members is used. One possible notation is the following:
{a S} first => a.
{a S} rest => {S}.
Here {a S}
denotes a non-empty set whose smallest member
is a
and whose other members are those of
{S}
.
The binary operators cons
and
swons
expect an aggregate and a potential member on top of
the stack. These are the rules for list aggregates:
a [L] cons => [a L].
[L] a swons => [a L].
The rules for strings and sets are analogous.
The unary operators uncons
and
unswons
also expect a non-empty aggregate. The rules for
list aggregates are:
[a L] uncons => a [L].
[a L] unswons => [L] a.
So for strings and sets some of the rules are:
'c "S" cons => "cS".
{S} a swons => {a S}.
{a S} uncons => a {S}.
The two binary operators at
and of
are for indexing into aggregates. For list the rules might be
written:
[l1 l2 ... li ... ln] i at => li.
i [l1 l2 ... li ... ln] of => li.
So the two operators are converses of each other. For both operators
in the case of sequences the sequence ordering is used, and for sets the
underlying ordering is used. But the notation with the dots
...
is not satisfactory. Here is a better version for
at
applied to lists:
[a L] 1 at => a.
[a L] n at => b :- [L] (n-1) at => b.
And here is a version for of
applied to sets:
1 {a S} of => a.
n {a S} of => b :- (n-1) {S} of => b.
The unary operator size takes an aggregate and determines the number of elements:
[] size => 0.
[a L] size => (n+1) :- L size => n.
The unary operator reverse can be applied to any aggregate but it is useful only for sequences:
[] reverse => [].
[a L] reverse => [M a] :- L reverse => M.
These rules for reverse
are correct but inefficient
since appending to the right to produce [M a]
requires
copying – at least for the obvious implementation of lists. Most
implementations would use an accumulating parameter to optimise
the reverse
operator. It is of some interest that this
optimisation can be expressed in rewrite rules:
reverse => [] swap shunt.
[L] [] shunt => [L].
[L] [a M] shunt => [a L] [M] shunt.
The binary operator concat
can be applied to two
sequences which are either both lists or both strings:
[] [L] concat => [L].
[a L] [M] concat => [a N] :- [L] [M] concat => [N].
The operators and
, or
, xor
and
not
can be applied not only to truth values but also to
values of the set type. The reduction rules look exactly as for
the truth values, except that the operations have to be performed
bitwise. So they compute the intersection, union,
symmetric difference and complement with respect to
the largest set.
The two unary predicates null
and
small
can also be applied to aggregates. These are the
rules for lists, those for strings and sets are analogous:
[] null => true.
[a L] null => false.
[] small => true.
[a] small => true.
[a b L] small => false.
The two binary predicates in
and has
test
aggregates for members:
a [] in => false.
a [a L] in => true.
a [b L] in => a [L] in
The has
predicate is just the converse:
[] a has => false.
[a L] a has => true.
[b L] a has => [L] a has.
Most implementations of Joy will provide an operator for sorting a
sequence and a binary operator merge
for combining two
already sorted sequences. Since these will be implemented in a library,
no reduction rules are given here. The same applies to many other
operators for aggregates.
Sometimes it is necessary to test a parameter for its type. The
unary predicates logical
, char
,
integer
, set
, string
and
list
are true if the parameter is a truth value, character,
integer, set, string or list, respectively. The predicate
leaf
is true if the parameter is not a list:
false logical => true.
123 logical => false.
123 integer => true.
['A 'B 'C] leaf => false.
There is another operator for multi-choices. It expects a non-empty
list of non-empty lists on top of the stack and below that one further
item. The opcase
operator matches the type of the item with
the first
members of the lists. When a match is found, the
rest
of that list is pushed onto the stack. If no match is
found, then the last list is used as the default:
123 [ [0 P] ['a Q] ["" R] ... ] opcase => 123 [P].
'c [ [0 P] ['a Q] ["" R] ... ] opcase => 'c [Q].
"Hello" [ [0 P] ['a Q] ["" R] ... ] opcase => "Hello" [R].
The simplest unary combinators are i
and
x
, they require the top of the stack to be one
quotation. Let P
be any program:
[P] i => P.
[P] x => [P] P.
The next unary combinators, dip
, dip2
and
dip3
, allow manipulation of the stack below the top few
elements. Let P
be any program, let a
,
b
and c
be any literals or equivalent:
a [P] dip => P a.
a b [P] dip2 => P a b.
a b c [P] dip3 => P a b c.
Another unary combinator is nullary
. Its rewrite rule
has to be expressed conditionally. Let L
, M
and P
be any programs:
L [P] nullary => L a :- L P => M a.
Three similar unary combinators are unary
,
binary
and ternary
:
L b [P] unary => L a :- L b P => M a.
L b c [P] binary => L a :- L b c P => M a.
L b c d [P] ternary => L a :- L b c d P => M a.
Three further unary combinators are app1i
,
app2unary2
and
app3unary3
. Let a
, a'
,
b
, b'
, c
and c'
be
any literals or equivalent. Note that the primed versions are used as
the result of applying P
to the unprimed versions:
L a [P] i => L a' :- (* app1 *)
L a P => M a'.
L a b [P] unary2 => L a' b' :- (* app2 *)
L a P => M a', L b P => N b'
L a b c [P] unary3 => L a' b' c' :- (* app3 *)
L a P => M a', L b P => N b', L c P => O c'.
There is even an app4unary4
combinator which
applies [P]
to four parameters a
,
b
, c
and d
.
The binary combinators expect two quotations on top of the
stack. The b combinator expects two quotations [P]
and
[Q]
, with [Q]
on top:
[P] [Q] b => P Q.
The cleave
combinator also expects two quotations, and
below that an item a
:
L a [P] [Q] cleave => L b c :- L a P => M b, L a Q => N c.
The ternary combinators expect three quotations on top of
the stack. One of the most important is ifte
which performs
branching. Its third parameter is the if-part, its second parameter is
the then-part, its first parameter, on top, is the else-part:
L [I] [T] [E] ifte => T :-
L I => M true.
L [I] [T] [E] ifte => E :-
L I => M false.
The binary while
do combinator is similar to
the ifte
combinator in that it has a test, the while-part,
which is second on the stack. The combinator repeatedly executes the
while-part and while that yields true
it executes the other
part, the do-part:
L [W] [D] while => L :-
L W => M false.
L [W] [D] while => L D [W] [D] while :-
L W => M true.
The ternary tailrec
combinator for tail
recursion also has a test, the third parameter. If that yields
true
, the second parameter is executed and the combinator
exits, otherwise the top parameter is executed and after that the
process is repeated:
L [I] [T] [R] tailrec => L T :-
L I => M true.
L [I] [T] [R] tailrec => L R [I] [T] [R] tailrec :-
L I => M false.
The quaternary combinators expect four quotations on top of
the stack. The linrec
combinator for linear
recursion expects an if-part [I]
, a then-part
[T]
, and two recursion parts [R1]
and
[R2]
:
L [I] [T] [R1] [R2] linrec => L T :-
L I => M true.
L [I] [T] [R1] [R2] linrec => L R1 [I] [T] [R1] [R2] linrec R2 :-
L I => M false.
The binrec
combinator for binary recursion is
similar, except that the first recursion part has to produce two values.
The recursion with all four parts is applied to the two values
separately. The second recursion part then has available the two results
from these two applications:
L [I] [T] [R1] [R2] binrec => L T :-
L I => M true.
L [I] [T] [R1] [R2] binrec => L a b R2 :-
L T => M false,
L R1 [I] [T] [R1] [R2] binrec => N a b.
The genrec
combinator for general recursion
also has an if-part, a then-part and two recursion parts. It differs
from the other two combinators in that after the execution of the first
recursion part nothing in particular is executed, but a program
consisting of the four parts and the combinator is pushed onto the
stack. The second recursion part thus has it available as a
parameter:
L [I] [T] [R1] [R2] genrec => L T :-
L I => M true.
L [I] [T] [R1] [R2] genrec =>
L R1 [[I] [T] [R1] [R2] genrec] R2 :-
L I => M false.
There are several combinators which do not have a fixed number of
quotation parameters. Instead they use a list of quotations. The
cond
combinator is like the one in Lisp, it is a
generalisation of the ifte
combinator. It expects a
non-empty list of programs, each consisting of a quoted if-part followed
by a then-part. The various if-parts are executed until one is found
that returns true
, and then its corresponding then-part is
executed. The last program in the list is the default which is executed
if none of the if-parts yield true
:
L [ [[I1] T1] REST ] cond => L T1 :-
L I1 => M true.
L [ [[I1] T1] REST ] cond => L [ REST ] cond :-
L I1 => M false.
The condlinrec
combinator is similar, it expects a list
of pairs or triples of quoted programs. Pairs consist of an if-part and
a then1-part, and triples consist of an if-part, a rec1-part and a
rec2-part. Again the first if-part that yields true
selects
its corresponding then-part or rec1-part for execution. If there is a
rec2-part, the combinator first recurses and then executes the
rec2-part. The last program is the default, it does not have an
if-part.
The cleave
combinator also has a generalisation. The
construct
combinator expects two parameters, a quotation
and above that a list of quotations. Each quotation in the list will
produce a value that will eventually be pushed onto the stack, and the
first quotation determines the stack onto which these values will be
pushed:
L [P] [..[Qi]..] construct => L P ..qi.. :- L Qi => M qi.
Some combinators expect values of specific types below their quotation parameters. The next few combinators expect values of simple types.
The binary combinator branch
expects a truth value below
its two quotation parameters: The branch
combinator
resembles the choice
operator and the ifte
combinator. The truth value below the two quotations determines which of
the two quotations will be executed. If the truth value is
true
, then the if-part, the second parameter, is executed,
otherwise the then-part, the top parameter, is executed:
true [P] [Q] branch => P.
false [P] [Q] branch => Q.
The unary combinator times
expects a numeric value below
its quotation parameter: The times
combinator executes its
quotation parameter as many times as indicated by the numeric value; if
the value is zero or less, then the quotation is not executed at
all:
0 [P] times => id.
n [P] times => P (n-1) [P] times.
The stack is normally a list, so any list could serve as the stack,
including a list which happens to be on top of the stack. But the stack
can also contain operators and combinators, although this does not
happen often. So the stack is always a quotation, and any other
quotation could serve as the stack, including one on top of the stack.
The infra
combinator expects a quotation [P]
which will be executed and below that another quotation which normally
will be just a list [M]
. The infra
combinator
temporarily discards the remainder of the stack and takes the quotation
or list [M]
to be the stack. It then executes the top
quotation [P]
which yields a result stack. This resulting
stack is then pushed as a list [N]
onto the original stack
replacing the original quotation or list. Hence any quotation can serve
as a complex unary operation on other quotations or lists:
L [M] [P] infra => L [N] :- [M] unstack P => N.
For linear recursion over numeric types the if-part often is
[null]
and the first recursion part is
[dup pred]
. The primrec
combinator has this
built in. For integers the rewrite rules are:
0 [T] [R2] primrec => pop T.
i [T] [R2] primrec => i dup pred [T] [R2] primrec R2.
The primrec
combinator can also be used for aggregates.
The implicit if-part is again [null]
, and the implicit
first recursion part is [rest]
. Below is the version for
lists, the versions for sets and strings are analogous:
[] [T] [R2] primrec => pop T.
[a L] [T] [R2] primrec => a [L] [T] [R2] primrec R2.
The unary combinators step
, map
,
filter
and split
all expect an aggregate below
their quotation parameter.
For step
operating on lists the rewrite rule is:
[] [P] step => id.
K [a L] [P] step => M [L] [P] step :-
K a P => M.
For strings and sets the rules are analogous. The same is true of the
rules to follow. For map
operating on lists the rewrite
rule is:
[] [P] map => [].
K [a L] [P] map => K [b M] :-
K a P => K b, K [L] [P] map => K M.
The filter
combinator expects a predicate as its
quotation parameter:
[] [P] filter => [].
K [a L] [P] filter => K [a M] :-
K a P => J true, K [L] [P] filter => K [M].
K [a L] [P] filter => K [M] :-
K a P => J false, K [L] [P] filter => K [M].
The split
combinator is like filter
except
that it produces two lists. The first list is just like the one from
filter
, the second list is the list of those elements which
did not pass the predicate test [P]
and hence are not
members of the first list:
[] [P] split => [].
K [a L] [P] split => K [a M] [N] :-
K a P => J true, K [L] [P] split => K [M] [N].
K [a L] [P] split => K [M] [a N] :-
K a P => J false, K [L] [P] split => K [M] [N].
The unary combinator fold
expects a quotation which
computes a binary operation. Below that has to be a literal and below
that an aggregate. The literal is used as a start value to fold or
reduce the aggregate. Applied to lists the combinator has these
rules:
[] a [P] fold => a.
[b L] a [P] fold => d :-
a b P => c,
[L] a [P] fold c P => d.
The two unary combinators some
and
all
expect an aggregate below their quotation parameter.
The quotation must be a predicate, yielding a truth value. The
some
combinator returns true
if some members
of the aggregate pass the test of the quotation, otherwise it returns
false
. The all
combinator returns
true
if all members of the aggregate pass the test of the
quotation, otherwise it returns false
. For empty aggregates
some
returns false
and all
returns true
. The rules for some
are:
[] [P] some => false.
L [a A] [P] some => L true :-
L a P => M true.
L [a A] [P] some => L [A] [P] some :-
L a P => M false.
The rules for all
are:
[] [P] all => true.
L [a A] [P] all => L false :-
L a P => M false.
L [a A] [P] all => L [A] [P] all :-
L a P => M true.
The unary combinator zipwith
expects two aggregates and
above that a program suitable for combining their respective elements.
For lists the rules are:
[] [A] [P] zipwith => [].
[A] [] [P] zipwith => [].
L [a A] [b B] [P] zipwith => L [c C] :-
L a b P => M c,
L [A] [B] [P] zipwith => L [C].
This section deals with the role of the Joy stack from a syntactic and semantic point of view.
First, let us consider a quite small arithmetic expression in postfix notation:
2 3 + 8 5 - *
A reduction might begin by doing the addition first, or the
subtraction first, followed in a second step by the other operation. In
fact, the addition and the subtraction could be done in parallel in the
same step. Only when both reductions have been done will it be possible
to do the final multiplication. The final result is the value
20
, and it is independent of the order in which the
reductions have been applied. In detail, the first mentioned reduction
sequence will look like this:
2 3 + 8 5 - *
5 8 5 - *
5 3 *
15
One possible strategy for reductions is the following:
Scan the expression from left to right until a redex is found, an expression that can be replaced in accordance with a rewrite rule. Apply the rule. Repeat until no more rules can be applied.
This strategy is most efficient for reducing expressions in which redexes are found early. The following is an example. Again all operators are binary, but note that except at the beginning operators and literals alternate. In each step the first three symbols constitute a redex:
10 5 / 3 * 4 - 1 +
The strategy is least efficient when a redex is found late. In the example below, note that all operators occur towards the end:
3 2 6 8 6 - / + *
The strategy requires skipping the 3
, 2
and
6
and only then replacing 8 6 -
by
2
. The next step requires skipping 3
and
2
and only then replacing 6 2 /
by
3
. The next step requires skipping 3
and only
then replacing 2 3 +
by 5
. The final step
requires no skipping, 3 5 *
is replaced by 15
.
All this skipping is of course inefficient.
A better strategy would apply the next operator at the point of the most recent change, if that is possible. An obvious way to do this is to use a stack of values for intermediate results. As the expression is being processed, operands such as literal numbers are pushed, and operators pop their arguments off the stack and push their result. This is of course the method commonly used for evaluating postfix expressions. So we have the following situation: The rewriting rules for programs are purely syntactic, they do not mention the stack. But the stack can be used as an optimisation of the rewrite rules. On the other hand, the stack is apparently an essential semantic entity, it is the argument and value of the functions denoted by programs.
But this now raises the question whether the stack is just an optimisation for the rewriting system or whether it is really needed as a semantic object. In other words, is it possible to give a semantic characterisation of Joy which does not involve a stack at all? In such a semantics the programs will have to denote something, and presumably they will have to denote functions. But what might be the arguments and values of these functions?
It will help to review the stack based semantics of Joy: The literals such as numerals, characters, strings and quotations denote functions taking any stack as argument and producing another stack as value which is like the argument stack except that a single item has been pushed on top. The operators also denote unary functions from stacks to stacks, and the result stack is like the argument stack except that the top few items have been replaced by the result of applying some operation. Likewise, the combinators denote unary functions from stacks to stacks, and the result stack depends on the combinator and the top few quotations.
To obtain a Joy semantics without a stack we take our hint from the rewriting rules. The operators and combinators no longer denote functions from stacks to stacks. The rewrite rule for addition transforms a program ending with two numerals into a program ending with a numeral for their sum. This is the key for a semantics without a stack: Joy programs denote unary functions taking one program as arguments and giving one program as value. The literals denote append operations; the program returned as value is like the program given as argument, except that it has the literal appended to it. The operators denote replacement operations, the last few items in the argument program have to be replaced by the result of applying the operator. Similarly the combinators also denote (higher order) functions from programs to programs, the result program depends on the combinator and the last few quotations of the argument program.
It is clear that such a semantics without a stack is possible and that it is merely a rephrasing of the semantics with a stack. Purists would probably prefer a system with such a lean ontology in which there are essentially just programs operating on other programs. But most programmers are so familiar with stacks that it seems more helpful to give a semantics with a stack. It is of course irrelevant for the semantics that for efficiency reasons any implementation of Joy will in fact use a stack.
There is one other argument for a stack semantics. By a program one would normally mean one that can be run, at least when supplied with appropriate parameters. The stack, however, can sometimes contain sequences of items that make the stack a non-executable program because it violates type rules. Such situations arise for example by executing one of the following:
[ 3 * ] second
[ pop cons map ] [] step
The first results in the one operator *
being pushed.
The second results in two operators and one combinator to be pushed.
Such situations are required only rarely. But the possibility is needed,
for example for a Joy interpreter joy written in Joy itself. Such an
interpreter is described in another paper.
It was mention in section 3 that for quotations there is no rewrite rule of the form:
[P] ==> [Q] :- P ==> Q.
If there were such a rule, then the rewriting:
42 dup ==> 42 42.
would license:
[ 42 dup ] ==> [ 42 42 ].
and hence:
[ 42 dup ] second ==>> [ 42 42 ] second.
dup ==>> 42.
which is absurd. On the other hand:
[ 42 dup ] i + ==> [ 42 42 ] i +
is acceptable. So, quotations must not allow substitutions in all contexts, but only in those where the quotation is guaranteed to be undone by a dequoting operation, by a combinator. In other words, quotation is an intensional constructor.
There is a simple way out of this, and it is to treat quotations of
programs to be very different from lists. Notice that the absurdity
comes from taking the second
element of the quotations
[42 dup]
and [42 42]
. If it were forbidden to
treat quoted programs as data structures, then the fatal inference would
be blocked. In detail, such a treatment would look like this: If
P
is a program, then (P)
is its quotation, now
written inside round parentheses. Also, [P]
is its list, as
before written inside square brackets. Both (P)
and
[P]
can be pushed onto the stack, can be
swap
ped, dup
licated and pop
ped,
can be inserted into lists and later extracted. But only
(P)
can be used as a parameter for combinators, and it
cannot be treated as a list. On the other hand, [P]
cannot
be used as a parameter for combinators, but it can be treated as a list.
Importantly, there could then be a reduction rule:
(P) ==> (Q) :- P ==> Q.
and hence quotation would be an extensional constructor.
This is a draconian solution, it allows programs such as:
[2] cons reverse
but forbids:
(+) cons map
The latter uses map
to add a single number on top of the
stack to each member of a list that is second on the stack. If the
single number on top of the stack is, say 7
, then the
cons
operation produces (7 +)
to be used by
map
. The prohibition would rule out
parameterisation. In general, the prohibition would rule out
using constructed programs as parameters to combinators.
It is possible to make a less drastic compromise: As before,
quotations (P)
serve as parameters to combinators, but they
can also be built up by list operations such as
concat
enation or cons
ing further items into
their front. This would allow parameterisation as in the
map
example above. Quotations could be constructed and
built up further and further and eventually called by a combinator, but
quotations could not be destructed. On this proposal constructive
operations on quotations would be allowed, but destructive operations
would not. All list operations would need to be classified as
constructive or destructive. Even the size
operator would
turn out to be destructive.
This compromise solution has much in its favour. Quotation is extensional, combinators can use constructed programs, but the absurdity does not arise. On the other hand, the compromise requires a syntactic distinction between quotations and lists, and it requires a semantic distinction between operations that can be applied to lists and to quotations, and those that can be applied only to lists.
On the whole, then, it does seem preferable to have quotation as an intensional constructor.
The rewriting system described up to here concerned values. It is
also possible to give a rewriting system for Joy type
expressions. We shall need constant and variable symbols for these
types. The following will be used as type constants:
Log
for the truth values, Chr
for the
characters, Int
for the integers, Set
for the
sets, Str
for the strings and Lst
for possibly
heterogeneous lists. For lists whose member are all of the same type,
say Int
, the notation [Int]
will be used. As
variables we use T
, T1
, T2
and so
on. So [T]
is the type of lists whose members are all of
the type T
.
The following is a sample of one style of rules for operators:
T1 T2 swap => T2 T1.
Int Int + => Int.
Str size => Int.
The notation used above has been made as close as possible to the notation for the rewriting rules for values. In the following a different notation will be introduced which is more useful.
Literals have atomic types, operators and combinators have
compound types. There are three constructors for compound
types: type concatenation, type quotation and type
cancellation. The first two are derived from the corresponding
program constructors. The third is new and has no counterpart program
constructor. It uses the infix symbol -> to combine two types into a
new one. If T
is a type, then so is its quotation
[T]
. If T1
and T2
are types, then
so are their concatenation (T1 T2)
and their cancellation
T1->T2
. If P1
and P2
are
programs of types T1
and T2
, then the type of
their concatenation (P1 P2)
is the concatenation
(T1 T2)
of their types. Cancellation satisfies the law:
T1 T1->T2 => T2
For types with concatenated parameter types:
(T1 T2)->T3 => T1->(T2->T3)
The expression on the right of the arrow can be written without
parentheses on the convention that the cancellation operator
->
is taken to be right associative.
The three rules given above should now be rewritten in this style:
swap => (T1 T2) -> (T2 T1).
+ => (Int Int) -> Int.
size => Str -> Int.
The following are a sample of further rules in the two styles: Those in the left column are in the earlier style, those in the right are in the new style:
Chr succ => Chr. succ => Chr->Chr.
Int Int > => Log. > => (Int Int)->Log.
Log Log and => Log. and => (Log Log)->Log.
[T Lst] first => T. first => [T Lst]->T.
[T Lst] rest => Lst. rest => [T Lst]->Lst.
T Lst cons => [T Lst]. cons => (T Lst)->[T Lst].
[T Lst] uncons => T Lst. uncons => [T Lst]->(T Lst).
Chr Str cons => Str. cons => (Chr Str)->Str.
Set null => Log. null => Set->Log.
Consider now the program P
below. Its type is given by
the concatenation of the types of its parts, in the line just below. All
the types here are built from the atomic type Int
of
integers. By four applications of cancellation the type in line 1 is
simplified to the type Int
in line 4:
P: 2 3 + dup *
1. Int Int (Int Int)->Int Int->(Int Int) (Int Int)->Int
2. Int Int->(Int Int) (Int Int)->Int
3. (Int Int) (Int Int)->Int
4. Int
For combinators only a few examples will be given here, for
i
and map
:
i => [T] -> T.
map => ([T1 -> T2]) -> ([T1] -> [T2]).
The formalism used in this section is that of categorial grammars. These have their origin in the (simple) theory of types and as generating devices are as powerful as context free grammars. Expositions and applications are to be found in the volume edited by {Oehrle et al 1988}, see in particular the contributions by {Casadio} and {Lambek}. Another reference is the volume edited by {Buskzkowski et al 1988}.
Rewriting systems are purely syntactic. If the object language has a semantics, then the rewriting rules have to be shown to be correct with respect to this semantics. This is true of the rewriting rules of the previous sections which dealt with values. It is also true for the rewriting rules for types. The basic semantic notion here is that of assigning types to programs. These take the form:
P : T
which is not a rewriting rule but a statement which
says that program P
is of type T
. The basic
type statements are to atomic programs, literals, operators and
combinators. Here are some examples:
42 : Int 'A : Chr
succ : Chr -> Chr > : Int Int -> Log
first : [T Lst] -> T i : [T] -> T
This is the style adopted in the Joy manual. To obtain rewrite rules
using =>
a single conditional rule is needed which
converts the semantic predicate : into the syntactic =>
,
as follows:
X => T :- X : T.
The material in this section has very tentative, most of the details need to be worked out fully.