Roll A Lisp In C - Reading

Lisp is often known as one of the oldest programming languages. Indeed, it's conception can be traced back to research done by John McCarthy in 1960. In these series of articles I present an implementation of a Lisp written in the C language. These articles assume some familiarity in a Lisp like Scheme or Common Lisp.


Roll A Lisp In C

A Lisp interpreter can be thought of as having 3 distinct parts. The Lisp reader, which takes a character string representation of a program and loads it into data for evaluation. The Lisp evaluator, which can compute an expression. The Lisp printer, which can take data and create a character string representation for displaying to a console. So, the first step is to create the Lisp reader so that we can have expressions to evaluate.

Reader

The Lisp reader is essentially just a parser. So, we should have a way to lexically analyze the input string. Which is to say we want to turn a string representation of our expression into a series of tokens. Thankfully, Lisp is quite simple to lexically analyze. The only tokens we care about are parenthesis and symbols.

// We'll have 128 tokens. Each token can be up to 32 characters long.
char token[128][32];

int lexer(char* input) {
  int ii = 0; // input index
  int ti = 0; // token index

  // Loop thru the whole string
  while(input[ii] != '\0')
    switch(input[ii]) {
    // Ignore whitespace and newlines
    case ' ':
    case '\n':
      ++ii;
      break;

    // Turn a left parenthesis into a token.
    case '(':
      token[ti][0] = '(';
      token[ti][1] = '\0';
      ++ii;
      ++ti;
      break;

    // Turn a right parenthesis into a token.
    case ')':
      token[ti][0] = ')';
      token[ti][1] = '\0';
      ++ii;
      ++ti;
      break;

    // Anything else is a symbol
    default:
      for(int i = 0;; ++i) {
	if(input[ii] != ' '  &&
	   input[ii] != ')'  &&
	   input[ii] != '('  &&
	   input[ii] != '\n' &&
	   input[ii] != '\0') {
	  token[ti][i] = input[ii++];
	}
    else {
	  token[ti][i] = '\0';
	  break;
	}
      }
      ++ti;
      break;
    }
  return ti;
}

This code will create 3 types of tokens. A left and right parenthesis token and a symbol token. It would be nice to have some way of representing iteration through the tokens. An interface or a way of talking about the array.

int curtok;

char* nexttok() {
  return token[curtok++];
}

char* peektok() {
  return token[curtok];
}

This will be our way of talking about the token array. We can take the next token in the array or, look at the current token in the stream.

Our expressions are held in list structure so, we should use list structured memory.

typedef struct {
  void* car;
  void* cdr;
} Pair;

Pair text[256];
Pair* textptr;

Pair* cons(void* x, void* y) {
  textptr->car = x;
  textptr->cdr = y;
  return textptr++;
}

int ispair(void* x) {
  return x >= (void*)&text &&
        x <  (void*)&text[256];
}

Here we’re using pairs to represent list memory. Our interface to this memory is cons and ispair. cons does exactly what we would want cons to do. It makes an new pair from unused memory. ispair is just way to check if the thing we’re referring to is in list memory or not.

We now have enough infrastructure laid out to begin implementing the reader.

void* read(char* ln) {
  // Initialize the lexer and list memory.
  curtok = 0;
  textptr = text;

  lexer(ln);
  return read_exp();
}

void* read_exp() {
  char* tok = nexttok();
  if(tok[0] == '(')
    return read_list();
  else
    return tok;
}

void* read_list() {
  char* tok = peektok();
  if(tok[0] == ')') {
    nexttok();
    return NULL;
  }
  else {
    void* fst = read_exp();
    void* snd = read_list();
    return cons(fst, snd);
  }
}

This is the Lisp reader. read will take in a character string representation of our program and, return a pointer to it’s Lisp representation. Consider an expresion:

(display (read))

First this string is lexically analyzed. It’s turned into something of the following when passed to lexer:

char token[128][32];
token[0] = "(";
token[1] = "display";
token[2] = "(";
token[3] = "read";
token[4] = ")";
token[5] = ")";

Now, let’s trace the reader routine to try to get some intuition.

read_exp:
tok <- token[0] ; "("
call read_list

read_list:
tok <- token[1]   ; "display"
fst <- read_exp   ; "display"
snd <- read_list  ; ("read")
cons (fst snd)

read_exp:
tok <- token[1]  ; "display"
"display"

read_list:
tok <- token[2]   ; "("
fst <- read_exp   ; ("read")
snd <- read_list  ; NULL
cons (fst snd)

read_exp:
tok <- token[2]  ; "("
call read_list

read_list:
tok <- token[3]   ; "read"
fst <- read_exp   ; "read"
snd <- read_list  ; NULL
cons (fst snd)

read_exp:
tok <- token[3]   ; "read"
"read"

read_list:
tok <- token[4]  ; ")"
NULL

read_list:
tok <- token[5]  ; ")"
NULL

In this syntax string the first token read_exp will see is a “(” so, it will call read_list. Consider what’s in the next token. It’s “display” and, read_list will only process this by calling read_exp. At this point the next token is “(” so, read_exp will call read_list to start this process again.

You can think of a call to read_list as a sequence of calling read_exp several times. So that, cons(read_exp, cons(read_exp, ... | NULL)) becomes the operation.

Writter

Now that we have a representation of lists and symbols we can print them out.

void print(void* exp) {
  print_exp(exp);
  printf("\n");
}

void print_exp(void* exp) {
  if(ispair(exp)) {
    printf("(");
    print_list(exp);
  }
  else
    printf("%s", exp);
}

void print_list(Pair* list) {
  if(list->cdr == NULL) {
    print_exp(list->car);
    printf(")");
  }
  else {
    print_exp(list->car);
    printf(" ");
    print_list(list->cdr);
  }
}

Read, eval, print loop

Putting everything together we can make a basic REPL interface.

void* eval(void* exp) {
  return exp;
}

int main(int argc, char** argv) {
  printf("Lisp REPL\n\n");
  printf(">> ");

  char buffer[256];
  while(fgets(buffer, 256, stdin)) {
    print(eval(read(buffer)));
    printf(">> ");
  }
  return 0;
}

In this article we’ve gone over a Lisp reader and writter. We’ve seen how it represents and stores it’s data and, how to access that data for printing to a user. We’ve seen how to take a C string of letters and, turn this into tokens of that string. We’ve seen how to take tokens of a Lisp and, parse those tokens into list structured memory. Now we have enough to start evaluating different syntactic expressions.

Further Reading

Reader Comments