by Manfred von Thun
9-NOV-2005
Abstract: There is a subset of Joy in which quotations contain at most one element and nesting of quotations is not allowed. For every Joy program there is a corresponding program in this subset, and when the corresponding program is executed, the original is restored. A translation from full Joy to the subset can be written in full Joy (and, if one wanted to, also in the subset itself).
In most high-level programming languages programs contain nested structures, typically in conditionals and loops. Nested structures also arise in data as in nested arrays or nested lists. Concatenative languages such as Forth, Joy, Slava Pestov’s Factor, and Stevan Apters’s cK and XY are no exception. But nesting interrupts the concatenative semantics: a concatenative program must treat a quotation, a contained nested program, as an indivisible unit, but when the quotation is executed by a combinator, its concatenative semantics unfolds. In the "concatenative" mailing group Billy Tanksley raised the question whether it would be possible to design a concatenative language without nesting, a flat concatenative language.
The written representation of programs in any language consists of a sequence of symbols which themselves are sequences of characters. The characters are collected by a scanner or tokeniser to form symbols, and the symbols are examined by a parser to form a syntax tree and by the compiler to form a program which typically is nested. Note that the sequence of symbols is not nested, but it contains some special symbols such as brackets and IF, THEN, WHILE.. . These symbols direct the compiler to build the nested structures. So one can think of the sequence of symbols as a program that is executed by the compiler and produces an internal nested structure. Surprisingly the sequence of symbols constitutes a flat concatenative language. The compiler computes a function, and so does the interpreter. Hence the composition of the compiler and the interpreter is an interpreter for a flat concatenative language.
Stevan Apter has adopted this principle in his web page http://www/nsl.com/k/xy/xy.htm. There he uses his concatenative language XY to write a proof-of-concept implementation for a flat concatenative language. The XY language uses a stack X, like all concatenative languages, but also a queue Y. It so happens that this arrangement is suitable for implementing a flat concatenative language. Essentially it works by compiling the flat input language to form quotations prior to every execution. So the compilation is done by the XY interpreter, and the result is then again interpreted by XY. The alternative solution offered in the remainder of this note borrows some of these ideas. Instead of a queue Y an ordinary Joy list is used. The flat input language is actually a subset of Joy.
Let Floy ("FLat jOY") be that subset of Joy in which quotations are not allowed to contain more than one atom. This means that [] [3] [dup] [*] and [map] are allowed, but [[]] [1 2 3] [[3]] [dup *] and [[map]] are not. The aim of this note is to demonstrate the following theorem about Joy:
For every Joy program J
there is a Floy program F which
constructs the quotation [J] of the Joy program J:
F => [J]
and hence F i == [J] i == J
The proof uses a technique commonly used in the theory of computability: One shows that there is a translation function T such that T(J) = F. The translation function is computable, so the proof is constructive. The translation could be computed by a program in any Turing complete language.
Joy is Turing complete, so the translation function from Joy to Floy can be written in Joy itself. Let j2f be a Joy program which takes the quotation [J] of a Joy program J as a parameter and produces the quotation [F] of the corresponding Floy program F. Note that the quotation [F] itself will in general not be a Floy program. The translation program j2f must satisfy:
[J] j2f == [F]
and hence [J] j2f i == [F] i == [J]
and [J] j2f i i == [F] i i == [J] i == J
From the second line above [J] j2f i == [J] and hence j2f i == id. So
j2f is the left inverse of i
, and i
is the
right inverse of j2f.
The quotation [F] of the Floy program F, when executed by
i
, has to reconstruct the quotation [J] of the Joy program
J, including all internal quotations of J which may be nested. There are
two ways in which these quotations can be constructed by F. Both must
begin with the empty quotation []. Then, in the forward direction, start
with the first, second .. item and append them in that order to the
quotation so far. Alternatively, in the reverse direction, start with
the last, second-last .. item and prepend them in that order to the
quotation so far. Since there are two ways in which a Floy program could
operate, there are in fact two possible Floy programs corresponding to a
given Joy program. And, for sheer perversity, a Floy program could use a
mixture of these two strategies. But such mixtures will not be further
considered here. The forward or backward versions can be constructed by
two different Joy-to-Floy programs, j2f-forwards and j2f-reverse. Both
Joy-to-Floy programs traverse the original Joy quotation [J] and any of
its internal quotations in the forward direction. Both start building
[F] as an empty quotation []. Then j2f-forwards starts at the first,
second .. item of [J] or its internal quotation to build [F] in the
forward direction, so that when [F] is ever executed by the i
combinator, it will reconstruct [J] in the forward direction. On the
other hand, j2f-reverse also starts at the first, second .. item to
build [F] in the reverse direction, so that when [F] is ever executed by
i, it will reconstruct [J] in the reverse direction. Is your head
spinning yet? So was mine when writing the code.
The remainder of this note consists of the separately commented output flatjoy.out from running flatjoy.joy The first definitions define four quoted Joy programs that will later be used to test the two Joy-to-Floy translators. The first is the program that does nothing, the second contains some actual code without nesting, the third has two occurrances of one level of nesting, and the fourth has two occurrances of two levels:
DEFINE
P0 == []; # empty code
P1 == [2 3 + dup *]; # non-empty code
P2 == [[1 2 3] [dup *] map]; # nest data & code
P3 == [ [[1 2][3 4][5] []] [[dup *] map] map ]. # double nesting
Next comes the definition of j2f-forwards and its subsidiaries. The
forward version [F] has to use a lot of concatenation, so the first
definition abbreviates concat to c
. The second definition
is for j2f-f, a recursive operator, the real workhorse. If the item to
be translated is a list, then do some preparation, step recursively
through the list, and then do some finalisation. The preparation
consists of appending an empty list to what has been constructed so far,
and the finalisation consists of appending code which, when executed by
i
, will append the unit list of the reconstructed list to
the partially reconstructed [J]. On the other hand, if the item is not a
list, but an atom foo, build code and append code which, when executed
by i
, will append foo to the partially reconstructed [J].
The final definition is for j2f-forwards, which must provide an empty
starting list [] and then step through [J] by calling j2f-f:
DEFINE
c == concat;
j2f-f ==
[ list ]
[ [[[]] concat] dip [j2f-f] step [[] cons c] concat ]
[ [] cons [c] cons concat ]
ifte;
j2f-forwards == [[]] swap [j2f-f] step.
The following are the tests. For each of P0..P3, we see the original
quoted Joy program [J], then its translation [F] by j2f-forwards, then
the result of executing this by the i
combinator to yield
the original [J]. For each of P1..P3 we also see the result of executing
this by a further i
combinator. In [F], remember that
c
is short for concat:
P0 .
[]
P0 j2f-forwards .
[[]]
P0 j2f-forwards i .
[]
P1 .
[2 3 + dup *]
P1 j2f-forwards .
[[] [2] c [3] c [+] c [dup] c [*] c]
P1 j2f-forwards i .
[2 3 + dup *]
P1 j2f-forwards i i .
25
P2 .
[[1 2 3] [dup *] map]
P2 j2f-forwards .
[[] [] [1] c [2] c [3] c [] cons c [] [dup] c [*] c [] cons c [map] c]
P2 j2f-forwards i .
[[1 2 3] [dup *] map]
P2 j2f-forwards i i .
[1 4 9]
P3 .
[[[1 2] [3 4] [5] []] [[dup *] map] map]
P3 j2f-forwards .
[[] [] [] [1] c [2] c [] cons c [] [3] c [4] c [] cons c [] [5] c [] cons c [] []
cons c [] cons c [] [] [dup] c [*] c [] cons c [map] c [] cons c [map] c]
P3 j2f-forwards i .
[[[1 2] [3 4] [5] []] [[dup *] map] map]
P3 j2f-forwards i i .
[[1 4] [9 16] [25] []]
The Floy programs so constructed are almost readable, and with some
practice one might even learn to write that way – if one really had to.
But there is something else that is unsatisfactory. On inspecting the
code for j2f-f and the translations that it produces, we see that both
use a lot of concatenations: j2f-f uses concat directly, and the
translations use the abbreviation c
, which will be executed
when the translation is executed by the i
combinator. In
both cases the concatenation serves to append a quite short list to
another list which might already be quite long. But that means that the
other longer list has to be copied entirely to have the short list as
its new tail. This is inefficient, and a better solution would be to
work the other way, so that a short list is prepended rather than
appended.
The following definition for translator j2f-reverse addresses the
problem. From a given Joy program [J] it constructs in reverse an
equivalent Floy program [F] which when executed by the i
combinator reconstructs in reverse the original Joy program [J]. Instead
of c
for concat, define s for swoncat, to be used inside
the translation [F]. The workhorse is j2f-r. If the item is a list do
some preparation, recursively step through the list and then do some
finalisation. Finalisation consists in prepending [] to the [F] built so
far, it will be the starting point when the completed [F] is executed by
i
. Preparation consists in prepending to [F] code which
when executed by i
will prepend the constructed quotation
to the partially rebuilt [J]. If the item is not a list but say foo,
construct and prepend code which when executed by i
, will
prepend foo. The final j2f-reverse has to provide an inital empty list
[] and after stepping through [J] prepend another empty list to
that:
DEFINE
s == swoncat;
j2f-r ==
[ list ]
[ [[swons] swoncat] dip [j2f-r] step [] swons ]
[ [] cons [s] cons swoncat ]
ifte;
j2f-reverse == [] swap [j2f-r] step [] swons.
The tests for j2f-reverse are the same as for j2f-forwards: Show [J],
then translate to yield [F] , then i
the translation to
reconstruct [J], and where appropriate, i
the
reconstruction:
P0 .
[]
P0 j2f-reverse .
[[]]
P0 j2f-reverse i .
[]
P1 .
[2 3 + dup *]
P1 j2f-reverse .
[[] [*] s [dup] s [+] s [3] s [2] s]
P1 j2f-reverse i .
[2 3 + dup *]
P1 j2f-reverse i i .
25
P2 .
[[1 2 3] [dup *] map]
P2 j2f-reverse .
[[] [map] s [] [*] s [dup] s swons [] [3] s [2] s [1] s swons]
P2 j2f-reverse i .
[[1 2 3] [dup *] map]
P2 j2f-reverse i i .
[1 4 9]
P3 .
[[[1 2] [3 4] [5] []] [[dup *] map] map]
P3 j2f-reverse .
[[] [map] s [] [map] s [] [*] s [dup] s swons swons [] [] swons [] [5] s swons [] [4]
s [3] s swons [] [2] s [1] s swons swons]
P3 j2f-reverse i .
[[[1 2] [3 4] [5] []] [[dup *] map] map]
P3 j2f-reverse i i .
[[1 4] [9 16] [25] []]
As may be seen, the reverse Floy programs are much harder to read and write when one’s mindset is for postfix notation. Finally, I could not resist flattening the already flattened:
# Double flattening - an exercise in Joy obfuscation:
P3 j2f-forwards j2f-reverse .
[[] [c] s [] [map] s swons [c] s [cons] s [] swons [c] s [] [map] s swons [c] s
[cons] s [] swons [c] s [] [*] s swons [c] s [] [dup] s swons [] swons [] swons [c] s
[cons] s [] swons [c] s [cons] s [] swons [] swons [c] s [cons] s [] swons [c] s []
[5] s swons [] swons [c] s [cons] s [] swons [c] s [] [4] s swons [c] s [] [3] s
swons [] swons [c] s [cons] s [] swons [c] s [] [2] s swons [c] s [] [1] s swons []
swons [] swons [] swons]
# partial de-obfuscation - by Joy itself:
P3 j2f-forwards j2f-reverse i .
[[] [] [] [1] c [2] c [] cons c [] [3] c [4] c [] cons c [] [5] c [] cons c [] [] cons c [] cons c [] [] [dup] c [*] c [] cons c [map] c [] cons c [map] c]
# full de-obfuscation - again by Joy:
P3 j2f-forwards j2f-reverse i i .
gc - 191 nodes inspected, 126 nodes copied, clock: 1
[[[1 2] [3 4] [5] []] [[dup *] map] map]
# final run of de-obfuscated original:
P3 j2f-forwards j2f-reverse i i i .
[[1 4] [9 16] [25] []]
This ends the output file flatjoy.out from running the source file flatjoy.joy. The definitions of the two translators j2f-forwards and j2f-backwards have been added in slightly modified form to the Joy library for symbolic manipulation, symlib.joy.
Here again is the theorem already mentioned at the beginning:
For every Joy program J
there is a Floy program F which
constructs the quotation [J] of the Joy program J:
F => [J]
and hence F i == [J] i == J
The proof is by induction on the structure of the Joy program J, using either of the two Joy-to-Floy translators just shown. A informal version of the proof is as follows: The base case is for the quoted empty program []; the two tests for P0 have demonstrated the correctness of the translators. For the induction on the length of the program without any internal quotations, the two tests for P1 have demonstrated the correctness. For the induction on the depth of nested quotations, the two tests for each of P2 and P3 demonstrate the correctness. A more formal proof would need to proceed by analysing the actual code for the translators and even of the implementation of Joy, but this is outside the scope of this note.
The translations into the two versions of Floy contain many
occurrances of unitlists of the form [foo], each followed by a
c
(for concat) in the forward version, or by a
s
(for swoncat) in the reverse version. It might be thought
that unitlists already break the ideal of flatness. For Joy as it is,
there seems to be no other solution. But the syntax of Joy could be
extended by adding unary operators in prefix notation. Two kinds of
solutions suggest themselves: Writing say Q for one such operator, Q foo
would be the same as [foo]. Then for the forward and reverse versions of
Floy one might write Q foo concat, or Q foo swoncat. Then one might
combine the quoting operator with the concat or swoncat in two other
unary operators C or S, and then writing C foo, or S foo.
The other kind of solution would introduce an operator F, so that F foo would be the same as [foo] first, with the effect of pushing foo onto the stack. Then concat or swoncat would have to be replaced by append and prepend, possibly by combined unary operators A and P, and then writing A foo, or P foo.
Similar unary operators have been suggested before in different contexts for different purposes. But such a major addition to the syntax does not sit so well with the ideal of simple concatenativity, and no such addition is currently planned for Joy. However, it may so turn out that compelling arguments for additional syntax will emerge in the future.