
Building LISP

Introduction

The best way to understand how something works is to try to build it for yourself. Reading somebody else's explanation might satisfy your curiosity, but without the experience of falling into all the little traps it is difficult to get a feel for why something is designed a certain way.

It's been said that every would-be programmer should write a compiler. While I think this is good advice (although I haven't followed it myself), there is so much effort involved just in parsing a language such as C that any potential insights risk getting lost in a mire of details. Perhaps creating an interpreter for some simple language would be a good first step.

I first started playing around with LISP a good few years ago, yet much later than I should have. This led me to the classic lecture series Structure and Interpretation of Computer Programs. If you have the next 24 hours free and haven't seen the videos already, go watch them now.

The course covers many topics, but the second half shows in detail how to evaluate LISP, first by implementing a simple version of eval in LISP itself. I figured that this would translate well into C, and so decided to try creating my own implementation of LISP.

It was really easy.

This article is an attempt to share the process by which I built my implementation, and the chapters occur roughly in the order in which I did things. Why not follow along and create your own version in your language of choice? [*]

As a professional programmer (ha, ha), I spend the majority of my time writing C and C++. Most of the rest is Java. There are many languages out there, each with their own debatable merits, but I'd like to demonstrate just how simple a LISP machine can be, even built in as low-level a language as C. See John McCarthy's History of LISP for the story of the pioneers.

So here is my toy implementation of LISP. I've borrowed features from various dialects, but it's closer to Scheme than Common LISP. The differences are trivial enough that changing over would not require substantial changes to the interpreter. Don't worry if you're not familiar with LISP; I will define everything as I go along.

It is not meant to be the smallest possible implementation, nor the most efficient, nor the most complete; it could however be described as lazy. My goal was to write robust, easy-to-read code that does exactly what it needs to, and no more, and I hope that it conveys how little effort is required to construct an incredibly powerful environment like LISP.

[*] If you are using a fancy language which supports something like eval, it would be cool to expose the native datatypes to the LISP environment.

Data
We will define four kinds of object to begin with:

Integer
    A number. For example: 3, -9, 0.

Symbol
    A name consisting of a string of characters. For example: FOO, BAR, ADD-TWO. We will normalize characters to upper-case in this project, but this is not strictly necessary.

NIL
    Represents "nothing". A bit like NULL in C and other languages.

Pair
    A pair consists of two elements, which for historical reasons are called car and cdr. Both can hold either an integer, a symbol, NIL, or a reference to another pair. The types of each element may be different.

Integers, symbols and NIL are called simple data. The term atom can refer to either a simple datum or a pair (purists may disagree on this point).

Note that integers and symbols are immutable, so we can think of two integers with the same value as being the same object. This is particularly useful for symbols, because it allows us to test for equality by comparing pointers.

Implementation
Let's declare some C types to hold our data. There are many clever ways to store LISP objects efficiently, but for this implementation we will stick to a very simple scheme (please excuse the pun).

    struct Atom {
        enum {
            AtomType_Nil,
            AtomType_Pair,
            AtomType_Symbol,
            AtomType_Integer
        } type;

        union {
            struct Pair *pair;
            const char *symbol;
            long integer;
        } value;
    };

    struct Pair {
        struct Atom atom[2];
    };

    typedef struct Atom Atom;

A few macros will be handy:

    #define car(p) ((p).value.pair->atom[0])
    #define cdr(p) ((p).value.pair->atom[1])
    #define nilp(atom) ((atom).type == AtomType_Nil)

    static const Atom nil = { AtomType_Nil };

The "p" in nilp stands for "predicate". Identifiers in C may not contain question marks. There is no need to restrict our LISP implementation in that way, of course.

Integers and (pointers to) strings can be copied around, but we need to allocate pairs on the heap.
    Atom cons(Atom car_val, Atom cdr_val)
    {
        Atom p;

        p.type = AtomType_Pair;
        p.value.pair = malloc(sizeof(struct Pair));

        car(p) = car_val;
        cdr(p) = cdr_val;

        return p;
    }

cons is a function to allocate a pair on the heap and assign its two elements.

At this point you will have noticed that using cons will leak memory the moment its return value is discarded. We will deal with that later. Of course, if you are using a garbage-collected language then the problem is already taken care of.

Testing
Now we can start creating LISP objects. An integer:

    Atom make_int(long x)
    {
        Atom a;
        a.type = AtomType_Integer;
        a.value.integer = x;
        return a;
    }

And a symbol:

    Atom make_sym(const char *s)
    {
        Atom a;
        a.type = AtomType_Symbol;
        a.value.symbol = strdup(s);
        return a;
    }

Textual representation
We will write a pair like this:

    (a . b)

where a is the car and b is the cdr.

By using the cdr of a pair to reference another pair, we can create a chain:

    (a . (b . (c . (d . NIL))))

Notice that the cdr of the last pair is NIL. This signifies the end of the chain, and we call this structure a list. To avoid having to write a large number of brackets, we will write the previous list like this:

    (a b c d)

Finally, if the cdr of the last pair in a list is not NIL, we will write this:

    (p q . r)

which is equivalent to

    (p . (q . r))

This is called an improper list.

Implementation
Printing an atom or list is simple.

    void print_expr(Atom atom)
    {
        switch (atom.type) {
        case AtomType_Nil:
            printf("NIL");
            break;
        case AtomType_Pair:
            putchar('(');
            print_expr(car(atom));
            atom = cdr(atom);
            while (!nilp(atom)) {
                if (atom.type == AtomType_Pair) {
                    putchar(' ');
                    print_expr(car(atom));
                    atom = cdr(atom);
                } else {
                    printf(" . ");
                    print_expr(atom);
                    break;
                }
            }
            putchar(')');
            break;
        case AtomType_Symbol:
            printf("%s", atom.value.symbol);
            break;
        case AtomType_Integer:
            printf("%ld", atom.value.integer);
            break;
        }
    }

By using recursion we can print arbitrarily complex data structures. (Actually that's not true: for a very deeply nested structure we will run out of stack space, and a self-referencing tree will never finish printing.)

Testing
See what print_expr does with various atoms:

    Atom                                                            Output
    make_int(42)                                                    42
    make_sym("FOO")                                                 FOO
    cons(make_sym("X"), make_sym("Y"))                              (X . Y)
    cons(make_int(1), cons(make_int(2), cons(make_int(3), nil)))    (1 2 3)

All this is pretty trivial. We'll get on to some more interesting stuff in the next chapter.

One last thing


Remember we said that we would treat identical symbols as being the same object? We can enforce that by keeping track of all the symbols created, and returning the same atom if the same sequence of characters is requested subsequently.

Languages with a set or hashtable container make this easy, but we can use the LISP data structures already implemented to store the symbols in a list:

    static Atom sym_table = { AtomType_Nil };

    Atom make_sym(const char *s)
    {
        Atom a, p;

        p = sym_table;
        while (!nilp(p)) {
            a = car(p);
            if (strcmp(a.value.symbol, s) == 0)
                return a;
            p = cdr(p);
        }

        a.type = AtomType_Symbol;
        a.value.symbol = strdup(s);
        sym_table = cons(a, sym_table);

        return a;
    }

Neat, huh? It's not particularly efficient, but it will do fine for now.
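To make the payoff of the symbol table concrete, here is a minimal, self-contained sketch. It repeats the Atom definitions and the table-backed make_sym so that it compiles on its own. The helpers dup_string (a stand-in for strdup, so the sketch needs only standard C) and same_symbol are hypothetical names introduced for this illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Atom Atom;
struct Atom {
    enum { AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer } type;
    union { struct Pair *pair; const char *symbol; long integer; } value;
};
struct Pair { struct Atom atom[2]; };

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static Atom sym_table = { AtomType_Nil };

Atom cons(Atom car_val, Atom cdr_val)
{
    Atom p;
    p.type = AtomType_Pair;
    p.value.pair = malloc(sizeof(struct Pair));
    assert(p.value.pair != NULL);
    car(p) = car_val;
    cdr(p) = cdr_val;
    return p;
}

/* Hypothetical helper: a portable replacement for strdup(). */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Returns the existing atom if the name is already in the table,
 * otherwise creates a new symbol and records it. */
Atom make_sym(const char *s)
{
    Atom a, p;

    for (p = sym_table; !nilp(p); p = cdr(p)) {
        a = car(p);
        if (strcmp(a.value.symbol, s) == 0)
            return a;
    }

    a.type = AtomType_Symbol;
    a.value.symbol = dup_string(s);
    sym_table = cons(a, sym_table);
    return a;
}

/* Hypothetical helper: because every name is stored only once, two
 * symbols are equal exactly when their name pointers are equal. */
int same_symbol(Atom a, Atom b)
{
    return a.type == AtomType_Symbol && b.type == AtomType_Symbol
        && a.value.symbol == b.value.symbol;
}
```

Calling make_sym("FOO") twice should hand back atoms whose value.symbol pointers are identical, so same_symbol never needs strcmp. The string comparison cost is paid once, when the symbol is created.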

Parser
The next stage in our project is parsing: taking a line of text from the user (or elsewhere), and creating the data objects it represents. Naturally the user might type something which does not represent an object according to our definitions, in which case we must have some way to signal an error.

Errors
Here is a definition of an Error type:

    typedef enum {
        Error_OK = 0,
        Error_Syntax
    } Error;

If, like me, you learned to program in BASIC on microcomputers, you will be familiar with the dreaded SYNTAX ERROR. Now is our chance to see things from the other side of the fence. Most of our functions from now on will return an Error to indicate whether and how something went wrong.

Lexer
I have no formal training in CS, but as far as I understand it the idea is to split a string up into tokens, which are both "words" and "punctuation", and discard any insignificant white space. So if the input is:

    (foo bar)

Then the four tokens are:

    (    foo    bar    )

So let's start by creating a lexer, which will return the start and end of the next token in a string.

    int lex(const char *str, const char **start, const char **end)
    {
        const char *ws = " \t\n";
        const char *delim = "() \t\n";
        const char *prefix = "()";

        str += strspn(str, ws);

        if (str[0] == '\0') {
            *start = *end = NULL;
            return Error_Syntax;
        }

        *start = str;

        if (strchr(prefix, str[0]) != NULL)
            *end = str + 1;
        else
            *end = str + strcspn(str, delim);

        return Error_OK;
    }

If our lexer hits the end of the string without finding a token (i.e., the remainder of the string is entirely white space), then it will return a syntax error and set the start and end to NULL.
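As a quick way to convince yourself that the lexer behaves as described, the following self-contained sketch repeats lex and drives it in a loop. The function count_tokens is a hypothetical helper invented for this illustration; it simply resumes lexing from the end of the previous token until lex reports that nothing is left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    Error_OK = 0,
    Error_Syntax
} Error;

/* Finds the next token in str; *start and *end delimit it. */
int lex(const char *str, const char **start, const char **end)
{
    const char *ws = " \t\n";
    const char *delim = "() \t\n";
    const char *prefix = "()";

    str += strspn(str, ws);          /* skip leading white space */

    if (str[0] == '\0') {
        *start = *end = NULL;
        return Error_Syntax;
    }

    *start = str;

    if (strchr(prefix, str[0]) != NULL)
        *end = str + 1;              /* a bracket is a token by itself */
    else
        *end = str + strcspn(str, delim);

    return Error_OK;
}

/* Hypothetical helper: counts tokens by calling lex repeatedly,
 * each time continuing from where the previous token ended. */
int count_tokens(const char *input)
{
    const char *start, *end;
    int count = 0;

    assert(input != NULL);

    while (lex(input, &start, &end) == Error_OK) {
        ++count;
        input = end;
    }

    return count;
}
```

With this, count_tokens("(foo bar)") gives the four tokens listed above. Note that the lexer never copies anything: a token is just a pair of pointers into the original string, which is why the parser later has to copy symbol names into a buffer of its own.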

Parser
Now we can think about the parser itself. The entry point is read_expr, which will read a single (possibly complex) object and return the error status and a pointer to the remainder of the input.

    int read_expr(const char *input, const char **end, Atom *result);

We will first deal with the simple data: integers, symbols and NIL. If you have a regex library available then this is easy, but it's not too bad in plain C either.

    int parse_simple(const char *start, const char *end, Atom *result)
    {
        char *buf, *p;

        /* Is it an integer? */
        long val = strtol(start, &p, 10);
        if (p == end) {
            result->type = AtomType_Integer;
            result->value.integer = val;
            return Error_OK;
        }

        /* NIL or symbol */
        buf = malloc(end - start + 1);
        p = buf;
        while (start != end)
            *p++ = toupper(*start), ++start;
        *p = '\0';

        if (strcmp(buf, "NIL") == 0)
            *result = nil;
        else
            *result = make_sym(buf);

        free(buf);

        return Error_OK;
    }

Notice two things: first, we are converting the input to upper case. This isn't strictly necessary (there's nothing wrong with having a case-sensitive lisp) but it is the traditional behaviour. Secondly, NIL is a special case: it's parsed directly as AtomType_Nil, rather than leaving it as a symbol.

If you're familiar with the various dialects of LISP then you will know that NIL is not necessarily the same as (), the empty list. We could choose to treat NIL as a symbol which evaluates to itself, but for this project we will consider both representations to be exactly the same.
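The integer test in parse_simple relies on strtol reporting where it stopped. A token counts as an integer only if strtol consumed it right up to the token's end. The sketch below isolates that check; token_is_integer is a hypothetical name used only here, not part of the interpreter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *value if the
 * token [start, end) is entirely a decimal integer, otherwise returns 0
 * and leaves *value untouched. */
int token_is_integer(const char *start, const char *end, long *value)
{
    char *stop;
    long v;

    assert(start != NULL && end != NULL && start < end);

    /* strtol parses as many characters as it can and sets stop to the
     * first character it could not use. */
    v = strtol(start, &stop, 10);

    /* Only accept the token if strtol used every character of it. */
    if (stop != end)
        return 0;

    *value = v;
    return 1;
}
```

This is why a token such as 12abc falls through to the symbol branch (strtol stops after the 2), and why a lone + or - is a symbol too: strtol performs no conversion at all, so the stop pointer stays at the start of the token.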

Next up are lists (including improper lists and pairs). The simplified list syntax makes this a little complicated, so we'll stick it all in a helper function. Once again recursion allows us to deal with nested lists.

    int read_list(const char *start, const char **end, Atom *result)
    {
        Atom p;

        *end = start;
        p = *result = nil;

        for (;;) {
            const char *token;
            Atom item;
            Error err;

            err = lex(*end, &token, end);
            if (err)
                return err;

            if (token[0] == ')')
                return Error_OK;

            if (token[0] == '.' && *end - token == 1) {
                /* Improper list */
                if (nilp(p))
                    return Error_Syntax;

                err = read_expr(*end, end, &item);
                if (err)
                    return err;

                cdr(p) = item;

                /* Read the closing ')' */
                err = lex(*end, &token, end);
                if (!err && token[0] != ')')
                    err = Error_Syntax;

                return err;
            }

            err = read_expr(token, end, &item);
            if (err)
                return err;

            if (nilp(p)) {
                /* First item */
                *result = cons(item, nil);
                p = *result;
            } else {
                cdr(p) = cons(item, nil);
                p = cdr(p);
            }
        }
    }

I dislike writing infinite loops, but this is the clearest layout I have come up with so far. Let me know if you can write a better one!

Finally we have read_expr itself, which is very simple now that we have done all of the hard work:

    int read_expr(const char *input, const char **end, Atom *result)
    {
        const char *token;
        Error err;

        err = lex(input, &token, end);
        if (err)
            return err;

        if (token[0] == '(')
            return read_list(*end, end, result);
        else if (token[0] == ')')
            return Error_Syntax;
        else
            return parse_simple(token, *end, result);
    }

The check for a closing bracket will catch invalid forms such as

    )

and

    (X .)

Testing
If we use the parser to create a simple read-print loop, then we can type representations of objects on the console and check that they are parsed correctly.

    int main(int argc, char **argv)
    {
        char *input;

        while ((input = readline("> ")) != NULL) {
            const char *p = input;
            Error err;
            Atom expr;

            err = read_expr(p, &p, &expr);

            switch (err) {
            case Error_OK:
                print_expr(expr);
                putchar('\n');
                break;
            case Error_Syntax:
                puts("Syntax error");
                break;
            }

            free(input);
        }

        return 0;
    }

This version uses the readline library, which shows a prompt and reads a line of text from the console. It supports editing beyond what a dumb terminal can provide, but a simple wrapper around fgets() will do just as well.

    > 42
    42
    > (foo bar)
    (FOO BAR)
    > (s (t . u) v . (w . nil))
    (S (T . U) V W)
    > ()
    NIL

Looks good! Remember that () is exactly the same as NIL, and that (X Y) is just another way of writing (X . (Y . NIL)).
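If you don't have readline to hand, the fgets() wrapper mentioned above might look something like the sketch below. The names read_line_from and read_line are hypothetical, and the growing-buffer strategy is just one reasonable choice. Like readline, it returns a malloc'd string without the trailing newline, or NULL at end of input, so the free(input) in the loop still applies.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for readline(): prints the prompt, then reads
 * one line from `in`. The caller frees the returned string. */
char *read_line_from(FILE *in, const char *prompt)
{
    size_t cap = 64, len = 0;
    char *buf;

    assert(in != NULL && prompt != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    fputs(prompt, stdout);
    fflush(stdout);

    /* fgets stops at a newline or when the buffer is full, so keep
     * reading and growing until the newline (or end of input) arrives. */
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);

        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';     /* strip the newline */
            return buf;
        }

        if (len + 1 == cap) {        /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    if (len == 0) {                  /* nothing read: end of input */
        free(buf);
        return NULL;
    }

    return buf;                      /* last line had no newline */
}

/* Same calling convention as readline(prompt), reading from stdin. */
char *read_line(const char *prompt)
{
    return read_line_from(stdin, prompt);
}
```

Swapping readline("> ") for read_line("> ") in the loop above should then be the only change needed. You lose line editing and history, but the parser neither knows nor cares.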

Expressions, Environment and Evaluation


Expressions
LISP is all about expressions. An expression can be a literal, an identifier, or a list consisting of an operator and zero or more arguments.

A literal is an object with an intrinsic value. In our system, that's either an integer or NIL (if you consider "nothing" to be a value).

An identifier is a name for an object. Symbols can be identifiers.

Everything else is a list of the form (operator argument...) where argument... means zero or more arguments.

Environment
To associate identifiers with objects we need an environment. This is a collection of bindings, each of which consists of an identifier and its corresponding value. For example:

    Bindings

    Identifier    Value
    FOO           42
    BAR           NIL
    BAZ           (X Y Z)

Note that the identifiers are all symbols, but the values can be any object within our system of data: the value for BAZ is a list containing three symbols.

An environment can also have a parent environment. If there is no binding for a particular identifier in the environment, we can check the parent, the parent's parent and so on. In this way we can create a tree of environments which share bindings with their ancestors unless explicit replacements exist.

Implementation
There is a convenient way of representing environments using our LISP data types:

    (parent (identifier . value)...)

So the environment above (assuming it has no parent) is:

    (NIL (FOO . 42) (BAR . NIL) (BAZ . (X Y Z)))

Here is a function to create an empty environment with a specified parent (which could be NIL):

    Atom env_create(Atom parent)
    {
        return cons(parent, nil);
    }

Next we have two functions to retrieve and create bindings in an environment.

    int env_get(Atom env, Atom symbol, Atom *result)
    {
        Atom parent = car(env);
        Atom bs = cdr(env);

        while (!nilp(bs)) {
            Atom b = car(bs);
            if (car(b).value.symbol == symbol.value.symbol) {
                *result = cdr(b);
                return Error_OK;
            }
            bs = cdr(bs);
        }

        if (nilp(parent))
            return Error_Unbound;

        return env_get(parent, symbol, result);
    }

Disallowing duplicate symbols means that we don't have to call strcmp here, which should mean that this lookup function is not too slow.

    int env_set(Atom env, Atom symbol, Atom value)
    {
        Atom bs = cdr(env);
        Atom b = nil;

        while (!nilp(bs)) {
            b = car(bs);
            if (car(b).value.symbol == symbol.value.symbol) {
                cdr(b) = value;
                return Error_OK;
            }
            bs = cdr(bs);
        }

        b = cons(symbol, value);
        cdr(env) = cons(b, cdr(env));

        return Error_OK;
    }

Only env_get recursively checks the parent environments. We don't want to modify the bindings of parents.
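The difference between the two functions is easiest to see with a child environment that shadows a binding from its parent. The sketch below is self-contained and repeats the definitions it needs. The helper demo_sym is hypothetical: it skips the symbol table and points straight at a string literal, which is good enough here because the same atom is reused for every lookup.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { Error_OK = 0, Error_Syntax, Error_Unbound } Error;

typedef struct Atom Atom;
struct Atom {
    enum { AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer } type;
    union { struct Pair *pair; const char *symbol; long integer; } value;
};
struct Pair { struct Atom atom[2]; };

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static const Atom nil = { AtomType_Nil };

Atom cons(Atom car_val, Atom cdr_val)
{
    Atom p;
    p.type = AtomType_Pair;
    p.value.pair = malloc(sizeof(struct Pair));
    assert(p.value.pair != NULL);
    car(p) = car_val;
    cdr(p) = cdr_val;
    return p;
}

Atom make_int(long x)
{
    Atom a;
    a.type = AtomType_Integer;
    a.value.integer = x;
    return a;
}

/* Hypothetical helper for this demo only: no symbol table involved. */
Atom demo_sym(const char *name)
{
    Atom a;
    a.type = AtomType_Symbol;
    a.value.symbol = name;
    return a;
}

Atom env_create(Atom parent)
{
    return cons(parent, nil);
}

/* Looks in env first, then walks up through the parents. */
int env_get(Atom env, Atom symbol, Atom *result)
{
    Atom parent = car(env);
    Atom bs;

    for (bs = cdr(env); !nilp(bs); bs = cdr(bs)) {
        Atom b = car(bs);
        if (car(b).value.symbol == symbol.value.symbol) {
            *result = cdr(b);
            return Error_OK;
        }
    }

    if (nilp(parent))
        return Error_Unbound;

    return env_get(parent, symbol, result);
}

/* Only ever touches env itself, never a parent. */
int env_set(Atom env, Atom symbol, Atom value)
{
    Atom bs;

    for (bs = cdr(env); !nilp(bs); bs = cdr(bs)) {
        Atom b = car(bs);
        if (car(b).value.symbol == symbol.value.symbol) {
            cdr(b) = value;
            return Error_OK;
        }
    }

    cdr(env) = cons(cons(symbol, value), cdr(env));
    return Error_OK;
}
```

If FOO is bound to 1 in a parent and then set to 99 in a child, a lookup through the child should see 99 while the parent still sees 1. A symbol bound only in the child stays invisible to the parent, which is exactly the behaviour we will rely on when binding function arguments later.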

Evaluation
Now that we have expressions, we can start to evaluate them. Evaluation is a process which takes an expression and an environment, and produces a value (the result). Let's specify the rules.

A literal will evaluate to itself.

The environment allows us to determine a value for an identifier. Attempting to evaluate an identifier for which no binding exists is an error.

A list expression with one of the following operators is called a special form:

QUOTE
    The result of evaluating (QUOTE EXPR) is EXPR, which is returned without evaluating.

DEFINE
    Evaluating (DEFINE SYMBOL EXPR) creates a binding for SYMBOL (or modifies an existing binding) in the evaluation environment. SYMBOL is bound to the value obtained by evaluating EXPR. The final result is SYMBOL.

Anything else, including list expressions with any other operator, is invalid.

Implementation
We will need to check whether an expression is a proper list.

    int listp(Atom expr)
    {
        while (!nilp(expr)) {
            if (expr.type != AtomType_Pair)
                return 0;
            expr = cdr(expr);
        }
        return 1;
    }

The Error enumeration needs a few more entries:

    Error_Unbound    Attempted to evaluate a symbol for which no binding exists
    Error_Args       A list expression was shorter or longer than expected
    Error_Type       An object in an expression was of a different type than expected

The function to perform evaluation is now a straightforward translation of the rules into C.

    int eval_expr(Atom expr, Atom env, Atom *result)
    {
        Atom op, args;
        Error err;

        if (expr.type == AtomType_Symbol) {
            return env_get(env, expr, result);
        } else if (expr.type != AtomType_Pair) {
            *result = expr;
            return Error_OK;
        }

        if (!listp(expr))
            return Error_Syntax;

        op = car(expr);
        args = cdr(expr);

        if (op.type == AtomType_Symbol) {
            if (strcmp(op.value.symbol, "QUOTE") == 0) {
                if (nilp(args) || !nilp(cdr(args)))
                    return Error_Args;

                *result = car(args);
                return Error_OK;
            } else if (strcmp(op.value.symbol, "DEFINE") == 0) {
                Atom sym, val;

                if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
                    return Error_Args;

                sym = car(args);
                if (sym.type != AtomType_Symbol)
                    return Error_Type;

                err = eval_expr(car(cdr(args)), env, &val);
                if (err)
                    return err;

                *result = sym;
                return env_set(env, sym, val);
            }
        }

        return Error_Syntax;
    }

Testing
Extending the read-print loop from the previous chapter, we now have a read-eval-print loop (REPL). This is the core of our LISP interpreter.

    int main(int argc, char **argv)
    {
        Atom env;
        char *input;

        env = env_create(nil);

        while ((input = readline("> ")) != NULL) {
            const char *p = input;
            Error err;
            Atom expr, result;

            err = read_expr(p, &p, &expr);

            if (!err)
                err = eval_expr(expr, env, &result);

            switch (err) {
            case Error_OK:
                print_expr(result);
                putchar('\n');
                break;
            case Error_Syntax:
                puts("Syntax error");
                break;
            case Error_Unbound:
                puts("Symbol not bound");
                break;
            case Error_Args:
                puts("Wrong number of arguments");
                break;
            case Error_Type:
                puts("Wrong type");
                break;
            }

            free(input);
        }

        return 0;
    }

Let's see what it can do.

    > foo
    Symbol not bound
    > (quote foo)
    FOO
    > (define foo 42)
    FOO
    > foo
    42
    > (define foo (quote bar))
    FOO
    > foo
    BAR

We can now interactively assign names to objects.

Built-in functions

So far in our implementation, we have made use of the functions car, cdr and cons to construct and access LISP data. Now, we will make the same functionality available within the interpreted environment.

We shall extend the list expression syntax to add some new operators:

(CAR EXPR)
    Evaluates EXPR and returns the car of the result. It is an error if EXPR does not evaluate to a pair or NIL.

(CDR EXPR)
    Evaluates EXPR and returns the cdr of the result. It is an error if EXPR does not evaluate to a pair or NIL.

(CONS A B)
    Evaluates both arguments A and B, and returns a newly constructed pair containing the results.

In the definitions above we allow taking the car and cdr of NIL, unlike our C versions. Some algorithms are simpler to express if the car and cdr of NIL are defined to be NIL.

We could choose to implement these by adding more special cases to eval_expr, just like we did with QUOTE and DEFINE. However, we will want to add more operators in the future, and adding each one to eval_expr would cause the function to get very long. The alternative is to introduce the concept of functions.

Functions
A function is a recipe for converting arguments into a value. If eval_expr encounters a list expression with a function as the operator, all it has to do is follow the recipe to come up with a value to use as the result of the expression.

One way to implement these recipes is to create C functions which can be called from eval_expr. We will call these built-in or primitive functions. Let's see how to extend our LISP interpreter to accommodate these.

A new type of atom


eval_expr will call built-in functions through a C function pointer, so they must all have the same prototype:

    typedef int (*Builtin)(struct Atom args, struct Atom *result);

In order to appear in expressions, we need a new kind of atom to represent them.

    struct Atom {
        enum {
            .
            .
            .
            AtomType_Builtin
        } type;

        union {
            .
            .
            .
            Builtin builtin;
        } value;
    };

Sections of code which we wrote previously are abbreviated as ". . .".

For completeness, print_expr needs to know how to display the new atom:

    void print_expr(Atom atom)
    {
        switch (atom.type) {
        .
        .
        .
        case AtomType_Builtin:
            printf("#<BUILTIN:%p>", atom.value.builtin);
            break;
        }
    }

And finally a helper function to create atoms of the new type:

    Atom make_builtin(Builtin fn)
    {
        Atom a;
        a.type = AtomType_Builtin;
        a.value.builtin = fn;
        return a;
    }

Extending the evaluator


We will need to create a shallow copy of the argument list.

    Atom copy_list(Atom list)
    {
        Atom a, p;

        if (nilp(list))
            return nil;

        a = cons(car(list), nil);
        p = a;
        list = cdr(list);

        while (!nilp(list)) {
            cdr(p) = cons(car(list), nil);
            p = cdr(p);
            list = cdr(list);
        }

        return a;
    }

apply simply calls the builtin function with a supplied list of arguments. We will extend this function later when we want to deal with other kinds of evaluation recipe.

    int apply(Atom fn, Atom args, Atom *result)
    {
        if (fn.type == AtomType_Builtin)
            return (*fn.value.builtin)(args, result);

        return Error_Type;
    }

If a list expression is not one of the special forms we defined previously, then we will assume that the operator is something which evaluates to a function. We will also evaluate each of the arguments, and use apply to call that function with the list of results.

    int eval_expr(Atom expr, Atom env, Atom *result)
    {
        Atom op, args, p;
        Error err;

        .
        .
        .

        if (op.type == AtomType_Symbol) {
            .
            .
            .
        }

        /* Evaluate operator */
        err = eval_expr(op, env, &op);
        if (err)
            return err;

        /* Evaluate arguments */
        args = copy_list(args);
        p = args;
        while (!nilp(p)) {
            err = eval_expr(car(p), env, &car(p));
            if (err)
                return err;

            p = cdr(p);
        }

        return apply(op, args, result);
    }

The argument list is copied before being overwritten with the results of evaluating the arguments. We don't want to overwrite the original argument list in case we need to use the form again in the future.
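Here is a small self-contained sketch of why the copy matters. It repeats the definitions it needs and then overwrites the cars of a copied list in place, much as eval_expr does with the evaluated arguments. Nothing here is new apart from the arrangement. Note that the copy is shallow: the pairs are fresh, but the elements themselves are shared.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Atom Atom;
struct Atom {
    enum { AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer } type;
    union { struct Pair *pair; const char *symbol; long integer; } value;
};
struct Pair { struct Atom atom[2]; };

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static const Atom nil = { AtomType_Nil };

Atom cons(Atom car_val, Atom cdr_val)
{
    Atom p;
    p.type = AtomType_Pair;
    p.value.pair = malloc(sizeof(struct Pair));
    assert(p.value.pair != NULL);
    car(p) = car_val;
    cdr(p) = cdr_val;
    return p;
}

Atom make_int(long x)
{
    Atom a;
    a.type = AtomType_Integer;
    a.value.integer = x;
    return a;
}

/* Builds a new chain of pairs holding the same elements as list. */
Atom copy_list(Atom list)
{
    Atom a, p;

    if (nilp(list))
        return nil;

    a = cons(car(list), nil);
    p = a;
    list = cdr(list);

    while (!nilp(list)) {
        cdr(p) = cons(car(list), nil);
        p = cdr(p);
        list = cdr(list);
    }

    return a;
}
```

After copying (1 2) and overwriting both elements of the copy, the original should still read (1 2). Without the copy, evaluating the body of a function a second time would find the previous call's results sitting where the argument expressions used to be.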


Initial environment
Previously we created an empty environment for the read-eval-print loop. The user has no way of creating atoms which represent builtin functions, so we populate the initial environment with bindings for our builtins.

The functions themselves:

    int builtin_car(Atom args, Atom *result)
    {
        if (nilp(args) || !nilp(cdr(args)))
            return Error_Args;

        if (nilp(car(args)))
            *result = nil;
        else if (car(args).type != AtomType_Pair)
            return Error_Type;
        else
            *result = car(car(args));

        return Error_OK;
    }

Almost all of the function is code to deal with errors and type checking! Creating functions in this way is pretty tedious.

    int builtin_cdr(Atom args, Atom *result)
    {
        if (nilp(args) || !nilp(cdr(args)))
            return Error_Args;

        if (nilp(car(args)))
            *result = nil;
        else if (car(args).type != AtomType_Pair)
            return Error_Type;
        else
            *result = cdr(car(args));

        return Error_OK;
    }

builtin_cdr is almost identical to builtin_car.

    int builtin_cons(Atom args, Atom *result)
    {
        if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
            return Error_Args;

        *result = cons(car(args), car(cdr(args)));

        return Error_OK;
    }

With these defined, we can at last use env_set to create the bindings.

    int main(int argc, char **argv)
    {
        Atom env;
        char *input;

        env = env_create(nil);

        /* Set up the initial environment */
        env_set(env, make_sym("CAR"), make_builtin(builtin_car));
        env_set(env, make_sym("CDR"), make_builtin(builtin_cdr));
        env_set(env, make_sym("CONS"), make_builtin(builtin_cons));

        while ((input = readline("> ")) != NULL) {
            .
            .
            .
        }

        return 0;
    }

Testing
    > (define foo 1)
    FOO
    > (define bar 2)
    BAR
    > (cons foo bar)
    (1 . 2)
    > (define baz (quote (a b c)))
    BAZ
    > (car baz)
    A
    > (cdr baz)
    (B C)

Notice that (CONS FOO BAR) is not the same as (QUOTE (FOO . BAR)). In the former expression, the arguments are evaluated and a new pair is created.

Arithmetic

So far all we've been able to do is create and name objects. Some of those objects have been numbers, so naturally we would like to do calculations with those numbers.

In the last chapter we saw how to create built-in functions to tell eval_expr how to process arguments into a return value. We will now create four more builtins to perform the basic arithmetic operations.

    Expression    Result
    (+ X Y)       The sum of X and Y
    (- X Y)       The difference of X and Y
    (* X Y)       The product of X and Y
    (/ X Y)       The quotient of X and Y

In the definitions above, when we write "the sum of X and Y", what we really mean is "the sum of the values obtained by evaluating X and Y". Remember that eval_expr will evaluate all the arguments to a function by default; this is usually what we want to happen, so from now on we will not explicitly state this where the intent is obvious.

Implementation
Once again almost all of our function consists of checking that the correct arguments were supplied. Finally the result is constructed by the call to make_int.

    int builtin_add(Atom args, Atom *result)
    {
        Atom a, b;

        if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
            return Error_Args;

        a = car(args);
        b = car(cdr(args));

        if (a.type != AtomType_Integer || b.type != AtomType_Integer)
            return Error_Type;

        *result = make_int(a.value.integer + b.value.integer);

        return Error_OK;
    }

The other three functions differ by only one character, so I will omit them here.

Finally we need to create bindings for our new functions in the initial environment:

    env_set(env, make_sym("+"), make_builtin(builtin_add));
    env_set(env, make_sym("-"), make_builtin(builtin_subtract));
    env_set(env, make_sym("*"), make_builtin(builtin_multiply));
    env_set(env, make_sym("/"), make_builtin(builtin_divide));
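For anyone following along who would rather not copy builtin_add three times, here is one possible way to write the omitted functions. It is a sketch, not the code from my implementation: the shared checker two_ints is a hypothetical helper, and the zero-divisor test in builtin_divide (reported as Error_Args) is an assumption added for this sketch, since integer division by zero is undefined in C. The block repeats the supporting definitions so that it compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum {
    Error_OK = 0, Error_Syntax, Error_Unbound, Error_Args, Error_Type
} Error;

typedef struct Atom Atom;
struct Atom {
    enum { AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer } type;
    union { struct Pair *pair; const char *symbol; long integer; } value;
};
struct Pair { struct Atom atom[2]; };

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static const Atom nil = { AtomType_Nil };

Atom cons(Atom car_val, Atom cdr_val)
{
    Atom p;
    p.type = AtomType_Pair;
    p.value.pair = malloc(sizeof(struct Pair));
    assert(p.value.pair != NULL);
    car(p) = car_val;
    cdr(p) = cdr_val;
    return p;
}

Atom make_int(long x)
{
    Atom a;
    a.type = AtomType_Integer;
    a.value.integer = x;
    return a;
}

/* Hypothetical helper: checks that args is a list of exactly two
 * integers and extracts their values. */
int two_ints(Atom args, long *x, long *y)
{
    Atom a, b;

    if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
        return Error_Args;

    a = car(args);
    b = car(cdr(args));

    if (a.type != AtomType_Integer || b.type != AtomType_Integer)
        return Error_Type;

    *x = a.value.integer;
    *y = b.value.integer;
    return Error_OK;
}

int builtin_subtract(Atom args, Atom *result)
{
    long x, y;
    int err = two_ints(args, &x, &y);
    if (err)
        return err;
    *result = make_int(x - y);
    return Error_OK;
}

int builtin_multiply(Atom args, Atom *result)
{
    long x, y;
    int err = two_ints(args, &x, &y);
    if (err)
        return err;
    *result = make_int(x * y);
    return Error_OK;
}

int builtin_divide(Atom args, Atom *result)
{
    long x, y;
    int err = two_ints(args, &x, &y);
    if (err)
        return err;
    if (y == 0)              /* assumption: treat as a bad argument */
        return Error_Args;
    *result = make_int(x / y);
    return Error_OK;
}
```

Pulling the checks into one helper trades the "one character different" symmetry for less repetition. Either approach plugs into the env_set calls above unchanged, because all that matters is the Builtin prototype.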

Testing
We now have our very own LISP-style calculator.

    > (+ 1 1)
    2
    > (define x (* 6 9))
    X
    > x
    54
    > (- x 12)
    42

In the last expression above, note that X is a symbol, not an integer. We have to evaluate the arguments so that builtin_subtract can operate on the integer value bound to X and not the symbol X itself. Similarly the value bound to X is the integer result of evaluating the expression (* 6 9).

Lambda expressions and closures


This is where things start to get interesting. We will now implement support for lambda expressions, a way to build functions dynamically out of the LISP expressions we can already deal with.

A lambda expression is a list expression with a particular syntax:

    (LAMBDA (arg...) expr...)

The result of evaluating a LAMBDA expression is a new kind of object which we will call a closure. A closure can be used in list expressions in the same way as a built-in function. In this case the arguments will be bound to the symbols listed as arg... in the lambda expression. The body of the function consists of the expressions expr..., which will be evaluated in turn. The result of evaluating the final expression is the result of applying the arguments to the closure.

That's a pretty dense definition, so here is an example of how we would like to use lambda expressions:

    (DEFINE SQUARE (LAMBDA (X) (* X X)))

SQUARE should now be a function of one argument X, which returns the result of evaluating (* X X). Thus evaluating (SQUARE 3) should return 9.

Implementation
We will represent the closure using a list:

    (env (arg...) expr...)

env is the environment in which the closure was defined. This is needed to allow the lambda function to use bindings without having to pass them as arguments. For example, recall that CAR is bound in the initial environment to our primitive builtin_car function.

The first task is to add a new constant for the type field of our Atom structure:

    struct Atom {
        enum {
            .
            .
            .
            AtomType_Closure
        } type;

        union {
            .
            .
            .
        } value;
    };

Since the closure is just a regular list, there is no need to add anything to value.

Like our other atom types, we will create a utility function to initialize them. make_closure, unlike the others, performs some validation of the arguments and so needs to return an error code.

    int make_closure(Atom env, Atom args, Atom body, Atom *result)
    {
        Atom p;

        if (!listp(args) || !listp(body))
            return Error_Syntax;

        /* Check argument names are all symbols */
        p = args;
        while (!nilp(p)) {
            if (car(p).type != AtomType_Symbol)
                return Error_Type;
            p = cdr(p);
        }

        *result = cons(env, cons(args, body));
        result->type = AtomType_Closure;

        return Error_OK;
    }

Next up is another special case in eval to create a closure whenever a lambda expression is encountered.

    int eval_expr(Atom expr, Atom env, Atom *result)
    {
        .
        .
        .
        if (op.type == AtomType_Symbol) {
            if (strcmp(op.value.symbol, "QUOTE") == 0) {
                .
                .
                .
            } else if (strcmp(op.value.symbol, "LAMBDA") == 0) {
                if (nilp(args) || nilp(cdr(args)))
                    return Error_Args;

                return make_closure(env, car(args), cdr(args), result);
            }
        }
        .
        .
        .
    }

The body of our SQUARE example above is expressed in terms of X. In order to evaluate the body, we need to create a new environment with X bound to the value of the argument:

    (closure-env (X . 3))

where the parent environment closure-env is the environment that was stored in the closure.

Finally we extend apply to create the new environment and call eval for each expression in the body.

    int apply(Atom fn, Atom args, Atom *result)
    {
        Atom env, arg_names, body;

        if (fn.type == AtomType_Builtin)
            return (*fn.value.builtin)(args, result);
        else if (fn.type != AtomType_Closure)
            return Error_Type;

        env = env_create(car(fn));
        arg_names = car(cdr(fn));
        body = cdr(cdr(fn));

        /* Bind the arguments */
        while (!nilp(arg_names)) {
            if (nilp(args))
                return Error_Args;
            env_set(env, car(arg_names), car(args));
            arg_names = cdr(arg_names);
            args = cdr(args);
        }
        if (!nilp(args))
            return Error_Args;

        /* Evaluate the body */
        while (!nilp(body)) {
            Error err = eval_expr(car(body), env, result);
            if (err)
                return err;
            body = cdr(body);
        }

        return Error_OK;
    }

Testing
Let's check that our SQUARE function works as intended.

    > (define square (lambda (x) (* x x)))
    SQUARE
    > (square 3)
    9
    > (square 4)
    16

Of course, lambda expressions do not have to be bound to a symbol: we can create anonymous functions.

    > ((lambda (x) (- x 2)) 7)
    5

Fans of functional programming will be pleased to see that we can now do this kind of thing:

    > (define make-adder (lambda (x) (lambda (y) (+ x y))))
    MAKE-ADDER
    > (define add-two (make-adder 2))
    ADD-TWO
    > (add-two 5)
    7

Do you know where the value "2" is stored?

Booleans and short-circuit evaluation


Booleans
!,pologies if you are a logician and I've got this all wrong..." , boolean value is one of two classes of values which are called true and false. If we wish to interpret a value as a boolean, we consider it to be true if it is in the class of true values, and false otherwise.

Short-circuit evalutaion
'o far every expression we pass to eval is evaluated. ith the exception of special forms such as DDFIND and LANBDA, which store away expressions to be evaluated later, eval must walk the whole tree before returning a result. In this chapter we will define yet another special form IF, which will cause eval to choose which of two possible expressions to evaluate, and discard the other. The syntax is as follows7
/IF test true-expr false-expr0 where test, true-e8pr and -alse-e8pr are arbitrary expressions. If the result of evaluating test is considered to be true, then the result of the IF-expression is the result of evaluating true-e8pr, otherwise it is the result of evaluating -alse-e8pr. Anly one of true-e8pr and -alse-e8pr is evaluated4 the other expression is ignored.

But what kind of value is true? In our environment we will define NIL to be false. Any other value is true. Here is the code to handle IF-expressions.

int eval_expr(Atom expr, Atom env, Atom *result)
{
    .
    .
    .
    if (op.type == AtomType_Symbol) {
        if (strcmp(op.value.symbol, "QUOTE") == 0) {
        .
        .
        .
        } else if (strcmp(op.value.symbol, "IF") == 0) {
            Atom cond, val;

            if (nilp(args) || nilp(cdr(args)) || nilp(cdr(cdr(args)))
                    || !nilp(cdr(cdr(cdr(args)))))
                return Error_Args;

            err = eval_expr(car(args), env, &cond);
            if (err)
                return err;

            val = nilp(cond) ? car(cdr(cdr(args))) : car(cdr(args));
            return eval_expr(val, env, result);
        }
    }
    .
    .
    .
}

The argument check is getting a little unwieldy. A couple of alternatives are to modify car and cdr to return NIL if the argument is not a pair and forego the syntax check, or to create a helper function to count the list length. It won't get any worse than this, though, so let's not waste time on it.

Traditionally LISP functions return the symbol T if they need to return a boolean value and there is no obvious object available. T is bound to itself, so evaluating it returns the symbol T again. A symbol is not NIL, and so is true. Add a binding for T to the initial environment:

env_set(env, make_sym("T"), make_sym("T"));

Remember that make_sym will return the same symbol object if it is called multiple times with identical strings.
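Returning to the unwieldy argument check: one of the alternatives mentioned above is a helper that counts the list length. The following is only a sketch under simplifying assumptions. It uses a cut-down cell type in which a NULL pointer plays the role of NIL, and the names Cell, list_length and if_args_ok are hypothetical, not part of the interpreter as written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter's pairs:
   a NULL pointer plays the role of NIL. */
struct Cell {
    int value;
    struct Cell *next;
};

/* Count the elements of a proper list. */
int list_length(const struct Cell *list)
{
    int n = 0;
    while (list != NULL) {
        n++;
        list = list->next;
    }
    return n;
}

/* The IF check "exactly three arguments" becomes a single comparison. */
int if_args_ok(const struct Cell *args)
{
    return list_length(args) == 3;
}
```

The trade-off is that the whole list is walked even when the first missing argument would already settle the question, which hardly matters for argument lists of this size.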

Testing
> (if t 3 4)
3
> (if nil 3 4)
4
> (if 0 t nil)
T

Unlike C, zero is true, not false.

Predicates
While we could stop here, it would be useful to make some tests other than "is it NIL". This is where predicates come in. A predicate is a function which returns a true/false value according to some condition. We will define two built-in predicates, "=" which tests for numerical equality, and "<" which tests if one number is less than another.

The functions are similar to our other numerical built-ins.


int builtin_numeq(Atom args, Atom *result)
{
    Atom a, b;

    if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
        return Error_Args;

    a = car(args);
    b = car(cdr(args));

    if (a.type != AtomType_Integer || b.type != AtomType_Integer)
        return Error_Type;

    *result = (a.value.integer == b.value.integer) ? make_sym("T") : nil;

    return Error_OK;
}

builtin_less follows the same pattern and is not shown here.
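For readers following along, here is one way builtin_less might look. It is a sketch, not the author's code. To keep it self-contained it declares cut-down versions of Atom, car, cdr, nilp and nil (assumptions that mirror the names used in this article), and it fills in the symbol T by hand where the real interpreter would call make_sym("T").

```c
#include <assert.h>

/* Cut-down stand-ins for the interpreter's types (assumptions for
   this sketch; the real Atom has more variants). */
typedef enum { Error_OK = 0, Error_Args, Error_Type } Error;
typedef enum {
    AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer
} AtomType;

struct Pair;
typedef struct Atom {
    AtomType type;
    union {
        struct Pair *pair;
        const char *symbol;
        long integer;
    } value;
} Atom;

struct Pair { Atom atom[2]; };

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static const Atom nil = { AtomType_Nil };

/* (< a b): T if integer a is less than integer b, otherwise NIL. */
int builtin_less(Atom args, Atom *result)
{
    Atom a, b;

    /* Exactly two arguments are required */
    if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
        return Error_Args;

    a = car(args);
    b = car(cdr(args));

    if (a.type != AtomType_Integer || b.type != AtomType_Integer)
        return Error_Type;

    if (a.value.integer < b.value.integer) {
        /* Stands in for make_sym("T") in the real interpreter */
        result->type = AtomType_Symbol;
        result->value.symbol = "T";
    } else {
        *result = nil;
    }

    return Error_OK;
}
```

Only the comparison differs from builtin_numeq; the argument and type checks are identical.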

Finally we must add them to the initial environment.

env_set(env, make_sym("="), make_builtin(builtin_numeq));
env_set(env, make_sym("<"), make_builtin(builtin_less));

Testing
> (= 3 3)
T
> (< 11 4)
NIL

Barring memory and stack limitations, our LISP environment is now Turing-complete! If you have been entering the code as we go along, you can confirm that we have implemented the core of a usable programming language in well under 1,000 lines of C code. A classic demonstration:

> (define fact (lambda (x) (if (= x 0) 1 (* x (fact (- x 1))))))
FACT
> (fact 10)
3628800

I have cheated a little here: the REPL does not allow the user to enter multi-line expressions, so you must enter the definition for fact all on one line. There is more to do yet, though. LISP has other features which make it possible to express some really interesting stuff, and there are a few loose ends to tidy up as well.

Syntactic sugar
We will define some additional syntax to facilitate entry of some common expressions. Recall that we already allow the user to enter

(A B C)

instead of

(A . (B . (C . NIL)))

Quoting
In order to include a literal symbol or list in an expression, we need to use the QUOTE operator. As a shortcut, we will define

'EXPR

to be equivalent to

(QUOTE EXPR)

So for example the following forms are equivalent:

Abbreviation   Canonical form      Evaluates to
'FOO           (QUOTE FOO)         FOO
'(+ 1 2)       (QUOTE (+ 1 2))     (+ 1 2)
'(A . B)       (QUOTE (A . B))     (A . B)

The lexer needs to know that the quote mark is a prefix (i.e., it can appear immediately before another token but is not necessarily a delimiter).
int lex(const char *str, const char **start, const char **end)
{
    const char *ws = " \t\n";
    const char *delim = "() \t\n";
    const char *prefix = "()\'";
    .
    .
    .
}

Also read_expr must convert it to the correct list expression.

int read_expr(const char *input, const char **end, Atom *result)
{
    const char *token;
    Error err;

    err = lex(input, &token, end);
    if (err)
        return err;

    if (token[0] == '(') {
        return read_list(*end, end, result);
    } else if (token[0] == ')') {
        return Error_Syntax;
    } else if (token[0] == '\'') {
        *result = cons(make_sym("QUOTE"), cons(nil, nil));
        return read_expr(*end, end, &car(cdr(*result)));
    } else {
        return parse_simple(token, *end, result);
    }
}

Testing
> (define x '(a b c))
X
> x
(A B C)
> 'x
X
> (define foo 'bar)
FOO
> foo
BAR
> ''()
(QUOTE NIL)

Function definitions
It is cumbersome to have to type a lambda expression every time we wish to define a function, so we will modify the DEFINE operator to avoid this.

(DEFINE (name args...) body...)

is equivalent to

(DEFINE name (LAMBDA (args...) body...))

Here's how:

int eval_expr(Atom expr, Atom env, Atom *result)
{
    .
    .
    .
    if (op.type == AtomType_Symbol) {
        if (strcmp(op.value.symbol, "QUOTE") == 0) {
        .
        .
        .
        } else if (strcmp(op.value.symbol, "DEFINE") == 0) {
            Atom sym, val;

            if (nilp(args) || nilp(cdr(args)))
                return Error_Args;

            sym = car(args);
            if (sym.type == AtomType_Pair) {
                err = make_closure(env, cdr(sym), cdr(args), &val);
                sym = car(sym);
                if (sym.type != AtomType_Symbol)
                    return Error_Type;
            } else if (sym.type == AtomType_Symbol) {
                if (!nilp(cdr(cdr(args))))
                    return Error_Args;
                err = eval_expr(car(cdr(args)), env, &val);
            } else {
                return Error_Type;
            }

            if (err)
                return err;

            *result = sym;
            return env_set(env, sym, val);
        } else if (strcmp(op.value.symbol, "LAMBDA") == 0) {
        .
        .
        .
        }
    }
    .
    .
    .
}

Testing
> (define (square x) (* x x))
SQUARE
> (square 3)
9

Sweet!

Variadic functions
Up till now all functions have had a specified number of named arguments. We will now introduce a syntax for defining variadic functions, which may take a fixed number of named arguments and a variable number of additional arguments which are collected into a named list. The argument declarations for variadic functions are improper lists:

           λ-syntax                              Combined DEFINE
3 args     (LAMBDA (arg1 arg2 arg3) body...)     (DEFINE (name arg1 arg2 arg3) body...)
≥2 args    (LAMBDA (arg1 arg2 . rest) body...)   (DEFINE (name arg1 arg2 . rest) body...)
≥1 args    (LAMBDA (arg1 . rest) body...)        (DEFINE (name arg1 . rest) body...)
≥0 args    (LAMBDA args body...)                 (DEFINE (name . args) body...)

In the definitions above, the parameters are bound as follows for the call (f 1 2 3):

Definition                      Value of a   Value of b   Value of c
(DEFINE (f a b c) body...)      1            2            3
(DEFINE (f a b . c) body...)    1            2            (3)
(DEFINE (f a . b) body...)      1            (2 3)
(DEFINE (f . a) body...)        (1 2 3)

Implementation
All that is required is a small modification to make_closure to accept the declaration:

int make_closure(Atom env, Atom args, Atom body, Atom *result)
{
    Atom p;

    if (!listp(body))
        return Error_Syntax;

    /* Check argument names are all symbols */
    p = args;
    while (!nilp(p)) {
        if (p.type == AtomType_Symbol)
            break;
        else if (p.type != AtomType_Pair
                || car(p).type != AtomType_Symbol)
            return Error_Type;
        p = cdr(p);
    }

    *result = cons(env, cons(args, body));
    result->type = AtomType_Closure;

    return Error_OK;
}

And another to apply to bind the additional arguments into a list:

int apply(Atom fn, Atom args, Atom *result)
{
    .
    .
    .
    /* Bind the arguments */
    while (!nilp(arg_names)) {
        if (arg_names.type == AtomType_Symbol) {
            env_set(env, arg_names, args);
            args = nil;
            break;
        }

        if (nilp(args))
            return Error_Args;
        env_set(env, car(arg_names), car(args));
        arg_names = cdr(arg_names);
        args = cdr(args);
    }
    if (!nilp(args))
        return Error_Args;
    .
    .
    .
}

Testing
A boring example:

> ((lambda (a . b) a) 1 2 3)
1
> ((lambda (a . b) b) 1 2 3)
(2 3)
> ((lambda args args) 1 2 3)
(1 2 3)

We can also create a variadic adder:

> (define (sum-list xs) (if xs (+ (car xs) (sum-list (cdr xs))) 0))
SUM-LIST
> (sum-list '(1 2 3))
6
> (define (add . xs) (sum-list xs))
ADD
> (add 1 2 3)
6
> (add 1 (- 4 2) (/ 9 3))
6

Since you can always pass a list to a regular function, this is really just another kind of syntactic sugar.

Macros
Macros allow you to create new special forms at runtime. Unlike a function, the arguments to a macro are not evaluated. The result of evaluating the body of the macro is then itself evaluated.

Note: these are (essentially) Common LISP macros. Scheme has a different macro system, which avoids problems with identifiers introduced by the macro, but is more complex.

We will define macros using the following syntax:

(DEFMACRO (name arg...) body...)

This matches our DEFINE syntax for functions, but is slightly different from the form used in Common LISP.

Example
Take the macro IGNORE defined by:

(DEFMACRO (IGNORE X) (CONS 'QUOTE (CONS X NIL)))

If we then evaluate the expression

(IGNORE FOO)

where FOO need not be bound, the body of IGNORE will first be evaluated with the argument X bound to the unevaluated symbol FOO. The result of evaluating the nested CONS expressions within this environment is:

(QUOTE . (FOO . NIL))

which is of course equivalent to:

(QUOTE FOO)

Finally, evaluating this value (which is the result of evaluating the macro body) gives us:

FOO

Implementation
We will define a new type of atom:

AtomType_Macro

the value of which is the same as AtomType_Closure. And now simply teach eval_expr about our new macro type.

int eval_expr(Atom expr, Atom env, Atom *result)
{
    .
    .
    .
    if (op.type == AtomType_Symbol) {
        if (strcmp(op.value.symbol, "QUOTE") == 0) {
        .
        .
        .
        } else if (strcmp(op.value.symbol, "DEFMACRO") == 0) {
            Atom name, macro;
            Error err;

            if (nilp(args) || nilp(cdr(args)))
                return Error_Args;

            if (car(args).type != AtomType_Pair)
                return Error_Syntax;

            name = car(car(args));
            if (name.type != AtomType_Symbol)
                return Error_Type;

            err = make_closure(env, cdr(car(args)),
                cdr(args), &macro);
            if (err)
                return err;

            macro.type = AtomType_Macro;
            *result = name;
            return env_set(env, name, macro);
        }
    }

    /* Evaluate operator */
    .
    .
    .

    /* Is it a macro? */
    if (op.type == AtomType_Macro) {
        Atom expansion;
        op.type = AtomType_Closure;
        err = apply(op, args, &expansion);
        if (err)
            return err;
        return eval_expr(expansion, env, result);
    }

    /* Evaluate arguments */
    .
    .
    .
}

Testing
> (defmacro (ignore x) (cons 'quote (cons x nil)))
IGNORE
> (ignore foo)
FOO
> foo
Symbol not bound

We will use macros in the future to define some new special forms.

Library
We will now create a small library of useful functions for our LISP system. Rather than creating new builtins for each one, let's take advantage of the fact that much of the LISP standard library can be implemented in LISP itself in terms of lower-level functions.

First we need a function to read the library definitions from disk.

char *slurp(const char *path)
{
    FILE *file;
    char *buf;
    long len;

    file = fopen(path, "r");
    if (!file)
        return NULL;
    fseek(file, 0, SEEK_END);
    len = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (len < 0) {
        fclose(file);
        return NULL;
    }

    /* One extra byte for the terminating NUL that read_expr relies on */
    buf = malloc(len + 1);
    if (!buf) {
        fclose(file);
        return NULL;
    }

    len = (long) fread(buf, 1, len, file);
    buf[len] = '\0';
    fclose(file);

    return buf;
}

And a routine, similar to our REPL in main, to process the definitions. Because we read the whole file in one go, there is no problem with splitting definitions over several lines.

void load_file(Atom env, const char *path)
{
    char *text;

    printf("Reading %s...\n", path);
    text = slurp(path);
    if (text) {
        const char *p = text;
        Atom expr;
        while (read_expr(p, &p, &expr) == Error_OK) {
            Atom result;
            Error err = eval_expr(expr, env, &result);
            if (err) {
                printf("Error in expression:\n\t");
                print_expr(expr);
                putchar('\n');
            } else {
                print_expr(result);
                putchar('\n');
            }
        }
        free(text);
    }
}

Finally read in the library after setting up the builtins.

int main(int argc, char **argv)
{
    .
    .
    .

    /* Set up the initial environment */
    .
    .
    .

    load_file(env, "library.lisp");

    /* Main loop */
    .
    .
    .
}

Testing
Create library.lisp with the following definition:

(define (abs x) (if (< x 0) (- 0 x) x))

And run the interpreter:

Reading library.lisp...
ABS
> (abs -2)
2

The ABS function will now be available in every session without having to define it each time.

fold
foldl and foldr allow us to easily construct functions which combine elements of a list.

(define (foldl proc init list)
  (if list
      (foldl proc
             (proc init (car list))
             (cdr list))
      init))

(define (foldr proc init list)
  (if list
      (proc (car list)
            (foldr proc init (cdr list)))
      init))

See the internet for more details.


(define (list . items)
  (foldr cons nil items))

(define (reverse list)
  (foldl (lambda (a x) (cons x a)) nil list))

list constructs a new list containing its arguments. reverse creates a copy of a list with the items in reverse order.

The recursive definition of LIST requires O(n) stack space - a serious implementation would most likely use a more efficient version.
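To make the stack-space remark concrete, here is a sketch in C (not part of the interpreter) of the two folds over a hypothetical linked list of integers. The Node and BinOp types and the subtract helper are invented for illustration. The left fold is naturally a loop, while the right fold must finish the rest of the list before it can combine the first item, so its recursion depth grows with the length of the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical integer list; NULL marks the end. */
struct Node {
    int value;
    struct Node *next;
};

typedef int (*BinOp)(int, int);

/* Left fold: runs as a loop in constant stack space. */
int foldl(BinOp proc, int init, const struct Node *list)
{
    while (list != NULL) {
        init = proc(init, list->value);
        list = list->next;
    }
    return init;
}

/* Right fold: one C stack frame per list element. */
int foldr(BinOp proc, int init, const struct Node *list)
{
    if (list == NULL)
        return init;
    return proc(list->value, foldr(proc, init, list->next));
}

/* A non-commutative operation makes the two orders distinguishable. */
int subtract(int a, int b)
{
    return a - b;
}
```

Over the list (1 2 3) with subtraction, foldl computes ((0 - 1) - 2) - 3 while foldr computes 1 - (2 - (3 - 0)).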

Testing
> (list (+ 3 5) 'foo)
(8 FOO)
> (reverse '(1 2 3))
(3 2 1)

See how much easier this was than implementing the functions as builtins.

More builtins
Some primitive functions require access to the internals of the system.

apply
The apply function:

(APPLY fn arg-list)

calls fn with the arguments bound to the values in the list arg-list.

int builtin_apply(Atom args, Atom *result)
{
    Atom fn;

    if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
        return Error_Args;

    fn = car(args);
    args = car(cdr(args));

    if (!listp(args))
        return Error_Syntax;

    return apply(fn, args, result);
}

eq?
eq? tests whether two atoms refer to the same object.

int builtin_eq(Atom args, Atom *result)
{
    Atom a, b;
    int eq;

    if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
        return Error_Args;

    a = car(args);
    b = car(cdr(args));

    if (a.type == b.type) {
        switch (a.type) {
        case AtomType_Nil:
            eq = 1;
            break;
        case AtomType_Pair:
        case AtomType_Closure:
        case AtomType_Macro:
            eq = (a.value.pair == b.value.pair);
            break;
        case AtomType_Symbol:
            eq = (a.value.symbol == b.value.symbol);
            break;
        case AtomType_Integer:
            eq = (a.value.integer == b.value.integer);
            break;
        case AtomType_Builtin:
            eq = (a.value.builtin == b.value.builtin);
            break;
        }
    } else {
        eq = 0;
    }

    *result = eq ? make_sym("T") : nil;
    return Error_OK;
}

pair?
Tests whether an atom is a pair.

int builtin_pairp(Atom args, Atom *result)
{
    if (nilp(args) || !nilp(cdr(args)))
        return Error_Args;

    *result = (car(args).type == AtomType_Pair) ? make_sym("T") : nil;
    return Error_OK;
}

Don't forget to add bindings for these to the initial environment.

env_set(env, make_sym("APPLY"), make_builtin(builtin_apply));
env_set(env, make_sym("EQ?"), make_builtin(builtin_eq));
env_set(env, make_sym("PAIR?"), make_builtin(builtin_pairp));

map

We can use foldr and apply to implement another important function map, which constructs a list containing the results of calling an n-ary function with the values contained in n lists in turn.

(define (unary-map proc list)
  (foldr (lambda (x rest) (cons (proc x) rest))
         nil
         list))

(define (map proc . arg-lists)
  (if (car arg-lists)
      (cons (apply proc (unary-map car arg-lists))
            (apply map (cons proc
                             (unary-map cdr arg-lists))))
      nil))

Once again please note that there are alternative implementations.

It works like this:

> (map + '(1 2 3) '(4 5 6))
(5 7 9)

The result is a list containing the results of evaluating (+ 1 4), (+ 2 5), and (+ 3 6).
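For comparison, the same element-by-element behaviour can be sketched in C for the special case of two integer arrays. This is only an illustration of what map computes: the names map2, BinaryFn and add are hypothetical, and the LISP version above handles any number of lists.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*BinaryFn)(int, int);

/* Apply fn to corresponding elements of a and b,
   storing the n results in out. */
void map2(BinaryFn fn, const int *a, const int *b, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = fn(a[i], b[i]);
}

int add(int x, int y)
{
    return x + y;
}
```

Calling map2(add, a, b, out, 3) with a holding 1 2 3 and b holding 4 5 6 performs the three additions listed above.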


Quasiquotation
QUASIQUOTE is an extension of the QUOTE special form which is convenient for writing macros.

For symbols and other simple data, QUASIQUOTE behaves like QUOTE, returning the datum unevaluated. Lists are also returned without being evaluated, with two exceptions. If an element of the list (or a sub-list) is of the form (UNQUOTE expr), then expr is evaluated and the result inserted into the list in place. (UNQUOTE-SPLICING expr) is similar, but the result of evaluating expr must be a list, the items of which are spliced into the parent list.

Example
(QUASIQUOTE (+ 1 (UNQUOTE (+ 2 3))))

evaluates to

(+ 1 5)

If we define L to be the list (3 4 5) then

(QUASIQUOTE (1 2 (UNQUOTE-SPLICING L)))

evaluates to

(1 2 3 4 5)

Shorthand syntax
Just like QUOTE, we will define the following abbreviations.

Abbreviation   Equivalent to
`expr          (QUASIQUOTE expr)
,expr          (UNQUOTE expr)
,@expr         (UNQUOTE-SPLICING expr)

Rewriting the examples above with this syntax gives

`(+ 1 ,(+ 2 3))

and

`(1 2 ,@L)

Implementation
We extend the lexer to understand the additional special tokens.

int lex(const char *str, const char **start, const char **end)
{
    const char *ws = " \t\n";
    const char *delim = "() \t\n";
    const char *prefix = "()\'`";

    str += strspn(str, ws);

    if (str[0] == '\0') {
        *start = *end = NULL;
        return Error_Syntax;
    }

    *start = str;

    if (strchr(prefix, str[0]) != NULL)
        *end = str + 1;
    else if (str[0] == ',')
        *end = str + (str[1] == '@' ? 2 : 1);
    else
        *end = str + strcspn(str, delim);

    return Error_OK;
}

read_expr must expand the abbreviations in the same way as QUOTE.

int read_expr(const char *input, const char **end, Atom *result)
{
    .
    .
    .
    if (token[0] == '(') {
    .
    .
    .
    } else if (token[0] == '`') {
        *result = cons(make_sym("QUASIQUOTE"), cons(nil, nil));
        return read_expr(*end, end, &car(cdr(*result)));
    } else if (token[0] == ',') {
        *result = cons(make_sym(
                token[1] == '@' ? "UNQUOTE-SPLICING" : "UNQUOTE"),
                cons(nil, nil));
        return read_expr(*end, end, &car(cdr(*result)));
    } else {
    .
    .
    .
    }
}

The QUASIQUOTE operator itself may be defined as a macro. First we need a few helper functions.

(define (append a b) (foldr cons b a))

(define (caar x) (car (car x)))

(define (cadr x) (car (cdr x)))

(append a b) concatenates the lists a and b.

And now the macro itself:

(defmacro (quasiquote x)
  (if (pair? x)
      (if (eq? (car x) 'unquote)
          (cadr x)
          (if (eq? (caar x) 'unquote-splicing)
              (list 'append
                    (cadr (car x))
                    (list 'quasiquote (cdr x)))
              (list 'cons
                    (list 'quasiquote (car x))
                    (list 'quasiquote (cdr x)))))
      (list 'quote x)))

The definition above is a little hard to follow, since the resulting expression must be built up using LIST and may include additional calls to QUASIQUOTE.

Quasiquotation allows us to make the body of a macro look like the expression it returns; for example the IGNORE macro in chapter 11

(DEFMACRO (IGNORE X) (CONS 'QUOTE (CONS X NIL)))

can now be written

(DEFMACRO (IGNORE X) `(QUOTE ,X))

and the operation is made clear.

Testing
> `(+ 1 ,(+ 2 3))
(+ 1 5)
> (define l '(3 4 5))
L
> `(1 2 ,@l)
(1 2 3 4 5)

let
We will now use QUASIQUOTE to define a new special form:

(LET ((sym1 expr1) (sym2 expr2) ...) body...)

LET causes the body expressions to be evaluated with the symbols sym1, sym2... bound to the result of evaluating expr1, expr2 and so on. The result of the last body expression to be evaluated is returned.

The definition is simple.

(defmacro (let defs . body)
  `((lambda ,(map car defs) ,@body)
    ,@(map cadr defs)))

Example
When we evaluate the form

(LET ((A 3) (B 5)) (+ A B))

it is transformed by the LET macro into

((LAMBDA (A B) (+ A B)) 3 5)

which behaves as desired.

Testing
> (let ((x 3) (y 5)) (+ x y))
8
> x
Symbol not bound

The LET expression clarifies the programmer's intent to make temporary definitions.

A trick
We can use LET to extend the built-in binary operator + to accept any number of arguments.

(define +
  (let ((old+ +))
    (lambda xs (foldl old+ 0 xs))))

Compare this with the definition of ADD at the end of chapter 10.

Testing
> (+ 1 2 3 4)
10

We didn't have to touch builtin_add or even recompile the interpreter.

Continuations and tail recursion

NOTE: The implementation of eval_expr and the design of the stack in this chapter are rather ad-hoc and I'm not particularly proud of them. Please skip to the next chapter if they offend you.

Our eval_expr function has been implemented recursively; that is to say, when in the course of evaluating an expression it is necessary to evaluate a sub-expression, eval_expr calls itself to obtain the result. This works fairly well, and is easy to follow, but the depth of recursion in our LISP environment is limited by the stack size of the interpreter. LISP code traditionally makes heavy use of recursion, and we would like to support this up to the limit of available memory.

Take the following pathological example:
(define (count n)
  (if (= n 0)
      0
      (+ 1 (count (- n 1)))))

The COUNT function will recurse to depth n and return the sum of n ones. Expressions such as (COUNT 10) should compute OK with our current interpreter, but even (COUNT 10000) is enough to cause a stack overflow on my machine.

To remove this limit we will rewrite eval_expr as a loop, with helper functions to keep track of evaluations in progress and return the next expression to be evaluated. When there are no more expressions left, eval_expr can return the final result to the caller.

As eval_expr works through the tree of expressions, we will keep track of arguments evaluated and pending evaluation in a series of frames, linked together to form a stack. This is broadly the same way that the compiled version of the recursive eval_expr works; in this case we are replacing the machine code stack with a LISP data structure and manipulating it explicitly.

The stack can also be thought of as representing the future of the computation once the present expression has been evaluated. In this sense it is referred to as the current continuation.

Since any function which is called by eval_expr may not call eval_expr (to avoid recursion), we must integrate apply and builtin_apply into the body of eval_expr.
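The idea of trading the machine stack for an explicit data structure can be sketched on COUNT itself. The C below is an illustration only, not the interpreter's code. The Frame struct and count_explicit are hypothetical names, and each frame records nothing more than "one addition is still pending", whereas the interpreter's frames carry an environment, operator and arguments.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frame: its existence means a "+ 1" is still pending. */
struct Frame {
    struct Frame *parent;
};

/* Computes the same value as (count n), keeping pending work in a
   heap-allocated linked list instead of on the C stack.
   Returns -1 if memory runs out. */
long count_explicit(long n)
{
    struct Frame *stack = NULL;
    long result;

    assert(n >= 0);

    /* Descend: push one frame per pending addition. */
    while (n != 0) {
        struct Frame *f = malloc(sizeof *f);
        if (f == NULL) {
            /* Out of memory: release what we have and give up. */
            while (stack != NULL) {
                struct Frame *p = stack->parent;
                free(stack);
                stack = p;
            }
            return -1;
        }
        f->parent = stack;
        stack = f;
        n--;
    }

    /* Base case reached; unwind, applying each pending "+ 1". */
    result = 0;
    while (stack != NULL) {
        struct Frame *p = stack->parent;
        free(stack);
        stack = p;
        result += 1;
    }
    return result;
}
```

The depth is now bounded by available heap memory rather than by the size of the C stack, which is the property we want for eval_expr.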

Implementation
A stack frame has the following form.

(parent env evaluated-op (pending-arg...) (evaluated-arg...) (body...))

parent is the stack frame corresponding to the parent expression (that is, the one which is waiting for the result of the current expression). env is the current environment, evaluated-op is the evaluated operator, and pending-arg... and evaluated-arg... are the arguments pending and following evaluation respectively. body... are the expressions in the function body which are pending execution.

Rather than writing out long lists of car() and cdr(), we will define some helper functions to manipulate members of a list.

Atom list_get(Atom list, int k)
{
    while (k--)
        list = cdr(list);
    return car(list);
}

void list_set(Atom list, int k, Atom value)
{
    while (k--)
        list = cdr(list);
    car(list) = value;
}

void list_reverse(Atom *list)
{
    Atom tail = nil;
    while (!nilp(*list)) {
        Atom p = cdr(*list);
        cdr(*list) = tail;
        tail = *list;
        *list = p;
    }
    *list = tail;
}

Another function creates a new stack frame ready to start evaluating a new function call, with the specified parent, environment and list of arguments pending evaluation (the tail).

Atom make_frame(Atom parent, Atom env, Atom tail)
{
    return cons(parent,
        cons(env,
        cons(nil, /* op */
        cons(tail,
        cons(nil, /* args */
        cons(nil, /* body */
        nil))))));
}

Here is the innermost part of our new eval_expr, which sets expr to the next part of the function body, and pops the stack when we have reached the end of the body.

int eval_do_exec(Atom *stack, Atom *expr, Atom *env)
{
    Atom body;

    *env = list_get(*stack, 1);
    body = list_get(*stack, 5);
    *expr = car(body);
    body = cdr(body);
    if (nilp(body)) {
        /* Finished function; pop the stack */
        *stack = car(*stack);
    } else {
        list_set(*stack, 5, body);
    }

    return Error_OK;
}

This helper binds the function arguments into a new environment if they have not already been bound, then calls eval_do_exec to get the next expression in the body.

int eval_do_bind(Atom *stack, Atom *expr, Atom *env)
{
    Atom op, args, arg_names, body;

    body = list_get(*stack, 5);
    if (!nilp(body))
        return eval_do_exec(stack, expr, env);

    op = list_get(*stack, 2);
    args = list_get(*stack, 4);

    *env = env_create(car(op));
    arg_names = car(cdr(op));
    body = cdr(cdr(op));
    list_set(*stack, 1, *env);
    list_set(*stack, 5, body);

    /* Bind the arguments */
    while (!nilp(arg_names)) {
        if (arg_names.type == AtomType_Symbol) {
            env_set(*env, arg_names, args);
            args = nil;
            break;
        }

        if (nilp(args))
            return Error_Args;
        env_set(*env, car(arg_names), car(args));
        arg_names = cdr(arg_names);
        args = cdr(args);
    }
    if (!nilp(args))
        return Error_Args;

    list_set(*stack, 4, nil);

    return eval_do_exec(stack, expr, env);
}

The next function is called once all arguments have been evaluated, and is responsible for either generating an expression to call a builtin, or delegating to eval_do_bind.

int eval_do_apply(Atom *stack, Atom *expr, Atom *env, Atom *result)
{
    Atom op, args;

    op = list_get(*stack, 2);
    args = list_get(*stack, 4);

    if (!nilp(args)) {
        list_reverse(&args);
        list_set(*stack, 4, args);
    }

    if (op.type == AtomType_Symbol) {
        if (strcmp(op.value.symbol, "APPLY") == 0) {
            /* Replace the current frame */
            *stack = car(*stack);
            *stack = make_frame(*stack, *env, nil);
            op = car(args);
            args = car(cdr(args));
            if (!listp(args))
                return Error_Syntax;

            list_set(*stack, 2, op);
            list_set(*stack, 4, args);
        }
    }

    if (op.type == AtomType_Builtin) {
        *stack = car(*stack);
        *expr = cons(op, args);
        return Error_OK;
    } else if (op.type != AtomType_Closure) {
        return Error_Type;
    }

    return eval_do_bind(stack, expr, env);
}

This part is called once an expression has been evaluated, and is responsible for storing the result, which is either an operator, an argument, or an intermediate body expression, and fetching the next expression to evaluate.

int eval_do_return(Atom *stack, Atom *expr, Atom *env, Atom *result)
{
    Atom op, args, body;

    *env = list_get(*stack, 1);
    op = list_get(*stack, 2);
    body = list_get(*stack, 5);

    if (!nilp(body)) {
        /* Still running a procedure; ignore the result */
        return eval_do_apply(stack, expr, env, result);
    }

    if (nilp(op)) {
        /* Finished evaluating operator */
        op = *result;
        list_set(*stack, 2, op);

        if (op.type == AtomType_Macro) {
            /* Don't evaluate macro arguments */
            args = list_get(*stack, 3);
            *stack = make_frame(*stack, *env, nil);
            op.type = AtomType_Closure;
            list_set(*stack, 2, op);
            list_set(*stack, 4, args);
            return eval_do_bind(stack, expr, env);
        }
    } else if (op.type == AtomType_Symbol) {
        /* Finished working on special form */
        if (strcmp(op.value.symbol, "DEFINE") == 0) {
            Atom sym = list_get(*stack, 4);
            (void) env_set(*env, sym, *result);
            *stack = car(*stack);
            *expr = cons(make_sym("QUOTE"), cons(sym, nil));
            return Error_OK;
        } else if (strcmp(op.value.symbol, "IF") == 0) {
            args = list_get(*stack, 3);
            *expr = nilp(*result) ? car(cdr(args)) : car(args);
            *stack = car(*stack);
            return Error_OK;
        } else {
            goto store_arg;
        }
    } else if (op.type == AtomType_Macro) {
        /* Finished evaluating macro */
        *expr = *result;
        *stack = car(*stack);
        return Error_OK;
    } else {
    store_arg:
        /* Store evaluated argument */
        args = list_get(*stack, 4);
        list_set(*stack, 4, cons(*result, args));
    }

    args = list_get(*stack, 3);
    if (nilp(args)) {
        /* No more arguments left to evaluate */
        return eval_do_apply(stack, expr, env, result);
    }

    /* Evaluate next argument */
    *expr = car(args);
    list_set(*stack, 3, cdr(args));
    return Error_OK;
}

And here we are at last with the new eval_expr. There is a lot of code for setting up special forms, but the rest is simply a loop waiting for the stack to clear.

int eval_expr(Atom expr, Atom env, Atom *result)
{
    Error err = Error_OK;
    Atom stack = nil;

    do {
        if (expr.type == AtomType_Symbol) {
            err = env_get(env, expr, result);
        } else if (expr.type != AtomType_Pair) {
            *result = expr;
        } else if (!listp(expr)) {
            return Error_Syntax;
        } else {
            Atom op = car(expr);
            Atom args = cdr(expr);

            if (op.type == AtomType_Symbol) {
                /* Handle special forms */

                if (strcmp(op.value.symbol, "QUOTE") == 0) {
                    if (nilp(args) || !nilp(cdr(args)))
                        return Error_Args;

                    *result = car(args);
                } else if (strcmp(op.value.symbol, "DEFINE") == 0) {
                    Atom sym;

                    if (nilp(args) || nilp(cdr(args)))
                        return Error_Args;

                    sym = car(args);
                    if (sym.type == AtomType_Pair) {
                        err = make_closure(env, cdr(sym), cdr(args), result);
                        sym = car(sym);
                        if (sym.type != AtomType_Symbol)
                            return Error_Type;
                        (void) env_set(env, sym, *result);
                        *result = sym;
                    } else if (sym.type == AtomType_Symbol) {
                        if (!nilp(cdr(cdr(args))))
                            return Error_Args;
                        stack = make_frame(stack, env, nil);
                        list_set(stack, 2, op);
                        list_set(stack, 4, sym);
                        expr = car(cdr(args));
                        continue;
                    } else {
                        return Error_Type;
                    }
                } else if (strcmp(op.value.symbol, "LAMBDA") == 0) {
                    if (nilp(args) || nilp(cdr(args)))
                        return Error_Args;

                    err = make_closure(env, car(args), cdr(args), result);
                } else if (strcmp(op.value.symbol, "IF") == 0) {
                    if (nilp(args) || nilp(cdr(args)) || nilp(cdr(cdr(args)))
                            || !nilp(cdr(cdr(cdr(args)))))
                        return Error_Args;

                    stack = make_frame(stack, env, cdr(args));
                    list_set(stack, 2, op);
                    expr = car(args);
                    continue;
                } else if (strcmp(op.value.symbol, "DEFMACRO") == 0) {
                    Atom name, macro;

                    if (nilp(args) || nilp(cdr(args)))
                        return Error_Args;

                    if (car(args).type != AtomType_Pair)
                        return Error_Syntax;

                    name = car(car(args));
                    if (name.type != AtomType_Symbol)
                        return Error_Type;

                    err = make_closure(env, cdr(car(args)),
                        cdr(args), &macro);
                    if (!err) {
                        macro.type = AtomType_Macro;
                        *result = name;
                        (void) env_set(env, name, macro);
                    }
                } else if (strcmp(op.value.symbol, "APPLY") == 0) {
                    if (nilp(args) || nilp(cdr(args))
                            || !nilp(cdr(cdr(args))))
                        return Error_Args;

                    stack = make_frame(stack, env, cdr(args));
                    list_set(stack, 2, op);
                    expr = car(args);
                    continue;
                } else {
                    goto push;
                }
            } else if (op.type == AtomType_Builtin) {
                err = (*op.value.builtin)(args, result);
            } else {
            push:
                /* Handle function application */
                stack = make_frame(stack, env, args);
                expr = op;
                continue;
            }
        }

        if (nilp(stack))
            break;

        if (!err)
            err = eval_do_return(&stack, &expr, &env, result);
    } while (!err);

    return err;
}

Testing
Let's try our COUNT function again.

> (count 100000)
100000

Hooray! We can now recurse as much as we like without causing a stack overflow. If you have a lot of RAM, you should even be able to do a million levels deep.

Tail recursion
If the last expression in a function is a call to another function, then the result can be returned directly to the first function's caller. This is known as a tail call. If the called function, through a series of tail calls, causes the first function to be called, we have tail recursion.

Tail calls do not require the caller's stack frame to be retained, so a tail-recursive function can recurse as many levels as necessary without increasing the stack depth.

The count function could be formulated as a tail-recursive procedure as follows:

(define (count n a)
  (if (= n 0)
      a
      (count (- n 1) (+ a 1))))

(count 100000 0)

If you watch eval_expr with a debugger you can confirm that the stack never grows above a few levels deep.

All that is left to do is clean up all the temporary objects created by our new evaluator.
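The reason a tail call needs no retained frame is easiest to see by writing the tail-recursive COUNT in C next to the loop it is equivalent to. This is a sketch for illustration. The names count_tail and count_loop are hypothetical, and a C compiler is not obliged to optimise the recursive form, so only the loop is guaranteed to run in constant stack space.

```c
#include <assert.h>

/* The tail-recursive COUNT: the recursive call is the last thing
   done, so nothing from the current call is needed afterwards. */
long count_tail(long n, long a)
{
    if (n == 0)
        return a;
    return count_tail(n - 1, a + 1);
}

/* The same computation with the tail call replaced by updating the
   parameters and jumping back to the top. */
long count_loop(long n, long a)
{
    while (n != 0) {
        a = a + 1;
        n = n - 1;
    }
    return a;
}
```

Our evaluator achieves the same effect by popping the current frame before it starts evaluating the last expression of a function body.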

Garbage collection
We will implement a very simple mark-and-sweep garbage collector. This is not something you would want to use in a real application, but it will serve for our purposes.

Remember that all our LISP data is allocated through the cons function. First we modify it to keep track of every allocation in a linked list.

struct Allocation {
    struct Pair pair;
    int mark : 1;
    struct Allocation *next;
};

struct Allocation *global_allocations = NULL;

Atom cons(Atom car_val, Atom cdr_val)
{
    struct Allocation *a;
    Atom p;

    a = malloc(sizeof(struct Allocation));
    a->mark = 0;
    a->next = global_allocations;
    global_allocations = a;

    p.type = AtomType_Pair;
    p.value.pair = &a->pair;

    car(p) = car_val;
    cdr(p) = cdr_val;

    return p;
}

Now a function to mark a whole tree of pairs as "in use".

void gc_mark(Atom root)
{
    struct Allocation *a;

    if (!(root.type == AtomType_Pair
        || root.type == AtomType_Closure
        || root.type == AtomType_Macro))
        return;

    a = (struct Allocation *)
        ((char *) root.value.pair
            - offsetof(struct Allocation, pair));

    if (a->mark)
        return;

    a->mark = 1;

    gc_mark(car(root));
    gc_mark(cdr(root));
}

The garbage collector frees everything which is not marked, and then clears the marks ready for the next run. We also mark the symbol table since these are referenced by a static variable.

void gc()
{
    struct Allocation *a, **p;

    gc_mark(sym_table);

    /* Free unmarked allocations */
    p = &global_allocations;
    while (*p != NULL) {
        a = *p;
        if (!a->mark) {
            *p = a->next;
            free(a);
        } else {
            p = &a->next;
        }
    }

    /* Clear marks */
    a = global_allocations;
    while (a != NULL) {
        a->mark = 0;
        a = a->next;
    }
}
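The two phases can be tried out in isolation on a toy heap. The following is a self-contained sketch rather than the interpreter's collector. Obj, obj_new, mark and sweep are hypothetical names, objects hold plain pointers instead of Atoms, and for brevity the sweep clears the marks of surviving objects in the same pass instead of in a second loop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap object with up to two outgoing references. */
struct Obj {
    struct Obj *left, *right;
    int mark;
    struct Obj *next;   /* links every allocation, like global_allocations */
};

static struct Obj *all_objects = NULL;

/* Allocate an object and record it in the list of all allocations. */
struct Obj *obj_new(struct Obj *left, struct Obj *right)
{
    struct Obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->left = left;
    o->right = right;
    o->mark = 0;
    o->next = all_objects;
    all_objects = o;
    return o;
}

/* Mark phase: flag everything reachable from root. */
void mark(struct Obj *root)
{
    if (root == NULL || root->mark)
        return;
    root->mark = 1;
    mark(root->left);
    mark(root->right);
}

/* Sweep phase: free unmarked objects and clear the marks on the
   survivors. Returns the number of objects freed. */
int sweep(void)
{
    struct Obj **p = &all_objects;
    int freed = 0;

    while (*p != NULL) {
        struct Obj *o = *p;
        if (!o->mark) {
            *p = o->next;   /* unlink before freeing */
            free(o);
            freed++;
        } else {
            o->mark = 0;
            p = &o->next;
        }
    }
    return freed;
}
```

Note that an unreachable object is freed even if it points at live data; only reachability from the roots counts. As in gc_mark above, the mark phase here is recursive, so a very long chain of objects could still exhaust the C stack.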

So that we don't run out of memory under deep recursion, we need to call the garbage collector every few iterations of eval_expr. The interval will roughly determine how many allocations are made between garbage collections.

int eval_expr(Atom expr, Atom env, Atom *result)
{
    static int count = 0;
    Error err = Error_OK;
    Atom stack = nil;

    do {
        if (++count == 100000) {
            gc_mark(expr);
            gc_mark(env);
            gc_mark(stack);
            gc();
            count = 0;
        }

    .
    .
    .
}

Testing
Adapting the COUNT example from previous chapters:

> (define (count n) (if (= n 0) t (count (- n 1))))
COUNT
> (count 1000000)
T

And lo! the operation completes without eating up all of our RAM.

Where do we go from here?

The goal of this project was to demonstrate an easy implementation of LISP. There is not much point in optimizing or mindlessly implementing library functions; this work has already been done in other projects.

Here are some possible extensions which might prove interesting:

  Expose continuations with call/cc
  Other numeric types (float, rational, bignum, complex)
  String, vector and boolean types
  I/O support
  Calls to system libraries
  JIT (integrate with LLVM, for example)
  Slab allocation
  Alternative GC

Now it's time to stop messing about in C and build something in LISP instead!

That's all, folks.

http://www.lwh.jp/lisp/index.html

How to win the lottery

1. Buy a ticket with the correct numbers
2. Wait for draw
3. Collect winnings
