7 lines of code, 3 minutes: Implement a programming language

A small (yet Turing-equivalent) language

The easiest programming language to implement is a minimal, higher-order functional programming language known as lambda calculus.

Lambda calculus actually lives at the core of all the major functional languages ​​– Haskell, Scheme, and ML – but it also lives inside JavaScript, Python, and Ruby. It’s also hidden inside Java, if you know where to find it.

a brief History

Alonzo Church developed lambda calculus in 1929.

At that time, it was not called a programming language because there were no computers; There was nothing to “program”.

It was really just a mathematical notation for reasoning about functions.

Fortunately, Alonzo Church has a Ph.D. Was. student named Alan Turing.

Alan Turing defined the Turing machine, which became the first accepted definition of a general purpose computer.

It soon turned out that lambda calculus and Turing machines were equivalent: any function you can describe with lambda calculus can be applied to a Turing machine, and any function you can apply to a Turing machine can be described in lambda calculus.

What makes it remarkable is that there are only three types of expressions in lambda calculus: variable references, anonymous functions, and function calls.

anonymous work

Anonymous functions are written with “lambda-dot” notation, so that:

v . e)

is a function that accepts an argument v
Returns the value of and e.

If you programmed in JavaScript, this form is equivalent to:

 function (v) { return e ; } 

function call

A function call is written by concatenating two expressions:

 (f e)

In JavaScript (or almost any other language), you would write it like this:

 f(e)

Example

The identity function, which returns only its own argument, is easy to write:

 (λ x . x)

We can apply the identity function to the identity function:

 ((λ x . x) (λ a . a))

(Which of course only returns the identity function.)

Here’s a (slightly) more interesting program:

 (((λ f . (λ x . (f x))) (λ a . a)) (λ b . b))

Can you understand what it does?

wait! How is it a “programming” language after all?

At first glance, this simple language appears to lack both recursion and recursion, not to mention numbers, booleans, conditionals, data structures, and everything else. How can this language possibly be general-purpose?

The way lambda calculus achieves Turing-equivalence is through two of the coolest programming hacks out there: Church encoding and Y combinator.

I’ve written a whole article on Y Combinator and another article on Church encoding. But, if you don’t want to read them, I can assure you that there is more to lambda calculus than you might expect from just a program:

 ((λ f . (f f)) (λ f . (f f)))

This outwardly benign program is called Omega, and if you try to execute it, it does not terminate! (See if you can figure out why.)

Applying Lambda Calculus

Below is a 7-line, 3-minute interpreter for lambda calculus in the R5RS scheme. In technical terms (explained below), it is an environment-based sign interpreter.

; eval takes an expression and an environment to a value
(define (eval e env) (cond
  ((symbol? e)       (cadr (assq e env)))
  ((eq? (car e) 'λ)  (cons e env))
  (else              (apply (eval (car e) env) (eval (cadr e) env)))))

; apply takes a function and an argument to a value
(define (apply f x)
  (eval (cddr (car f)) (cons (list (cadr (car f)) x) (cdr f))))

; read and parse stdin, then evaluate:
(display (eval (read) '())) (newline)

This code will read a program from stdin, parse it, evaluate it and print the result. (This is 7 lines without comments and blank lines.)

of the scheme read The function simplifies lexing and parsing – as long as you’re willing to stay in the “balanced parenthesis” (ie S-expression) world of syntax. (If not, you’ll need to read about lexing in parsing; you can start with one of my articles on lexing.) In planning, read Grabs properly bracketed input from stdin and parses it into a tree.

The interpreter has two functions at its core:
eval And apply. Regardless of where we are in Scheme, we can give conceptual “signatures” to these functions:

 eval  : Expression * Environment -> Value
 apply : Value * Value -> Value

 Environment = Variable -> Value
 Value       = Closure
 Closure     = Lambda * Environment

eval The function takes an expression and an environment, a value. The expression can be either a variable, a lambda term, or an application.

An environment is a map from variables to values, used to define the free variables of an open word. (An open term is one that has an infinite number of occurrences of a variable.) For example, consider the expression (λ x . z). The word is open, because we don’t know what z Is.

Since we are in the R5RS scheme, we can use associative lists to define the environment.

A closure is an encoding of a function that combines a (possibly open) lambda expression with an environment to define its free variables. In other words, a close closes an open period.

A clean implementation in Racket

The racket is a battery-laden, goods-getting bid of the scheme. Racket provides a match construct that clears the interpreter. Check it out:

#lang racket

; bring in the match library:
(require racket/match)

; eval matches on the type of expression:
(define (eval exp env) (match exp
  [`(,f ,e)        (apply (eval f env) (eval e env))]
  [`(λ ,v . ,e)   `(closure ,exp ,env)]
  [(? symbol?)     (cadr (assq exp env))]))

; apply destructures the function with a match too:
(define (apply f x) (match f
  [`(closure (λ ,v . ,body) ,env)
    (eval body (cons `(,v ,x) env))]))

; read in, parse and evaluate:
(display (eval (read) '()))    (newline)

It’s big, but it’s clean and easy to understand.

a big language

Lambda calculus is a small language. Nevertheless, developing/implementing its interpreter design is applicable to very large languages. For example, in about 100 lines, we can implement an interpreter for a large subset of the plan.

Consider a language with diverse forms of expression:

  1. variable reference; East: x, foo, save-file.

  2. Numeric and Boolean constants; East: 300, 3.14, #f.

  3. primitive operations; East: +, -, <=.

  4. Conditional: (if condition if-true if-false).

  5. Variable Binding: (let ((var value) ...) body-expr).

  6. Recursive Binding: (letrec ((var value) ...) body-expr).

  7. Variable Mutation: (set! var value).

  8. Indexing: (begin do-this then-this).

Now add three top-level forms to this language:

  1. Function Definition: (define (proc-name var ...) expr).

  2. Global definition: (define var expr).

  3. Top-level expression: expr.

Here is the complete interpreter, including test harness and test cases:

#lang racket

(require racket/match)

;; Evaluation toggles between eval and apply.

; eval dispatches on the type of expression:
(define (eval exp env)
  (match exp
    [(? symbol?)          (env-lookup env exp)]
    [(? number?)          exp]
    [(? boolean?)         exp]
    [`(if ,ec ,et ,ef)    (if (eval ec env)
                              (eval et env)
                              (eval ef env))]
    [`(letrec ,binds ,eb) (eval-letrec binds eb env)]
    [`(let    ,binds ,eb) (eval-let binds eb env)]
    [`(lambda ,vs ,e)    `(closure ,exp ,env)]
    [`(set! ,v ,e)        (env-set! env v e)]
    [`(begin ,e1 ,e2)     (begin (eval e1 env)
                                 (eval e2 env))]
    [`(,f . ,args)        (apply-proc
                           (eval f env) 
                           (map (eval-with env) args))]))

; a handy wrapper for Currying eval:
(define (eval-with env) 
  (lambda (exp) (eval exp env)))

; eval for letrec:
(define (eval-letrec bindings body env)
  (let* ((vars (map car bindings))
         (exps (map cadr bindings))
         (fs   (map (lambda _ #f) bindings))
         (env* (env-extend* env vars fs))
         (vals (map (eval-with env*) exps)))
    (env-set!* env* vars vals)
    (eval body env*)))

; eval for let:
(define (eval-let bindings body env)
  (let* ((vars (map car bindings))
         (exps (map cadr bindings))
         (vals (map (eval-with env) exps))
         (env* (env-extend* env vars vals)))
    (eval body env*)))
    
; applies a procedure to arguments:
(define (apply-proc f values) 
  (match f
    [`(closure (lambda ,vs ,body) ,env) 
     ; =>
     (eval body (env-extend* env vs values))]
    
    [`(primitive ,p)
     ; =>
     (apply p values)]))

;; Environments map variables to mutable cells 
;; containing values.

(define-struct cell ([value #:mutable]))

; empty environment:
(define (env-empty)  (hash))

; initial environment, with bindings for primitives:
(define (env-initial)
  (env-extend* 
   (env-empty)
   '(+  -  /  *  <=  void  display  newline)
   (map (lambda (s) (list 'primitive s))
   `(,+ ,- ,/ ,* ,<= ,void ,display ,newline))))

; looks up a value:
(define (env-lookup env var)
  (cell-value (hash-ref env var)))

; sets a value in an environment:
(define (env-set! env var value)
  (set-cell-value! (hash-ref env var) value))

; extends an environment with several bindings:
(define (env-extend* env vars values)
  (match `(,vars ,values)
    [`((,v . ,vars) (,val . ,values))
     ; =>
     (env-extend* (hash-set env v (make-cell val)) vars values)]
    
    [`(() ())
     ; =>
     env]))

; mutates an environment with several assignments:
(define (env-set!* env vars values)
  (match `(,vars ,values)
    [`((,v . ,vars) (,val . ,values))
     ; =>
     (begin
       (env-set! env v val)
       (env-set!* env vars values))]
    
    [`(() ())
     ; =>
     (void)]))

;; Evaluation tests.

; define new syntax to make tests look prettier:
(define-syntax 
  test-eval 
  (syntax-rules (====)
    [(_ program ==== value)
     (let ((result (eval (quote program) (env-initial))))
       (when (not (equal? program value))
         (error "test failed!")))]))

(test-eval
  ((lambda (x) (+ 3 4)) 20)
  ====
  7)

(test-eval
  (letrec ((f (lambda (n) 
                 (if (<= n 1)
                     1
                     (* n (f (- n 1)))))))
    (f 5))
  ====
  120)

(test-eval
  (let ((x 100))
    (begin
      (set! x 20)
      x))
  ====
  20)

(test-eval
  (let ((x 1000))
    (begin (let ((x 10))
             20)
           x))
  ====
  1000)

;; Programs are translated into a single letrec expression.

(define (define->binding define)
  (match define
    [`(define (,f . ,formals) ,body)
     ; =>
     `(,f (lambda ,formals ,body))]
    
    [`(define ,v ,value)
     ; =>
     `(,v ,value)]
    
    [else 
     ; =>
     `(,(gensym) ,define)]))

(define (transform-top-level defines)
  `(letrec ,(map define->binding defines)
     (void)))

(define (eval-program program)
  (eval (transform-top-level program) (env-initial)))

(define (read-all)
  (let ((next (read)))
    (if (eof-object? next)
        '()
        (cons next (read-all)))))

; read in a program, and evaluate:
(eval-program (read-all))

You can download the source: Minilang.rkt.

from here

You should be able to quickly test new ideas for programming languages ​​by modifying the final interpreter.

If you want a language with different syntax, you can create a parser that removes s-expressions. Taking this approach clearly separates syntactic design from semantic design.

more resources

related posts




<a href

Leave a Comment