At the end of the last posting, I mentioned that we could probably simplify
ends? function. Currently, it returns a vector with the result and a
boolean indicating whether it had been changed.
Data as Code
Compared to most programming languages, Clojure (and lisp) programs are odd:
Functions and expressions look like lists; parameter lists for functions or
let expressions look like vectors; variables look like symbols.
In fact, those programming constructs start their lives as those data structures. Remember that our interactive environment is called a REPL—a read, evaluate, and print loop. In the read stage, a function or expression is read in as a list, vector, symbol, string, or number. It’s not actually compiled into computer code until the evaluate stage. Before then, it’s all data.
This characteristic of Clojure—that programs are data—is called homoiconicity. Part of what makes lisp so powerful and special is that it introduces a new step in the evaluation process. In most computer languages, the computer reads in an expression and immediately turns it into an action or byte code or whatever. In lisp, the computer reads in the expression as a native lisp data structure, and before it processes the data, it asks the programmer again how it should be handled.
This makes more sense with an example. Consider the Clojure built-in
You’ll recall that
when is just like
if, except that the true expression
can be a sequence of expressions and the false expression doesn’t exist.
user=> (when (= (nth "abc" 1) \b) (println "b") \b) b \b
println prints its arguments, separated by spaces, and followed by a new
The first “b” is what is printed. The “b” is the value returned by the
expression, from the third line of the expression.
Of course, that
when expression is the same as this
user=> (if (= (nth "abc" 1) \b) (do (println "b") \b)) b \b
do just executes a list of expressions and returns the value of the last
one. It’s a way of taking a sequence of expressions and using them in places
that only want one expression.)
In fact, the
when expression is exactly equal to the
Clojure reads the
when expression and transforms it into the
before evaluating it.
How is this done and how can we create our own code transformations?
A macro is a code transformation. If you know XSLT, a macro transforms a Clojure expression in almost the same way as XSLT transforms an XML document. (If you don’t know XSLT, forget I mentioned it.)
A macro definition looks a lot like a function definition. It has a macro name
and a vector of parameters. But macros start with the symbol
the body of the macro has a template that builds the output expression.
Here are the tools you use to create the template.
A quote tells Clojure to return the next expression as data, exactly the way it
is written, not as an expression or as a macro template. For example, the
x before returning, but the quote in the second
let means that Clojure returns the symbol
x, not the value that the
x evaluates to.
user=> (let [x 1] x) 1 user=> (let [x 1] 'x) x
A quasi-quote is a back-quote character. It indicates that, when the macro is called, the expression it quotes should be scanned for the constructs below, and the list or expression generated should be returned. Except for the scanning bit, it’s a lot like a regular quote.
user=> `(let [x 1] x) (clojure/let [user/x 1] user/x)
Inside a macro template, a tilde forces Clojure to evaluate the expression that follows it and insert its value into that place in the result.
user=> (let [x 2] `(let [y ~x] y)) (clojure/let [user/y 2] user/y)
In this, the outer
x to 2. Inside that
let, it defines a macro
template, which returns a list that looks like another
Inside the inner
let it inserts the value of
x from the outer let.
In many ways, macro templates are shortcuts for longer expressions that build lists and vectors. For example, the last expression is the same as this expression, using functions to build the output expression:
user=> (let [x 2] (list 'let (vector 'y x) 'y)) (let [y 2] y)
The tilde-at sequence is a variation of tilde that evaluates its argument, which should return a sequence, and inserts the elements from that sequence into a list in the macro template.
user=> (let [x '(1 2 3)] `(println ~@(map (fn [y] (* 2 y)) x) 'done)) (clojure/println 2 4 6 (quote user/done))
In this expression, the elements from
x are run through
map, which doubles
their values. The output of
map is inserted into the list that starts with
println and ends with
'done. Notice that
'done in the output is still
quoted. It was read in as data—
'done—and output as the same data.
gensym and #
Sometimes when you’re creating a macro template, you want to use create a variable. One danger is that it could shadow a variable in the surrounding code.
For example, in this code,
user=> (let [x 42] `(let [x 13] [x x])) (clojure/let [user/x 13] [user/x user/x])
what do the two
x variables refer to on the third line? The
x from the
let (42); or the
x from the inner
let (13)? In this case, they
both refer to the inner
x. Occasionally, you may do this on purpose, but
usually it’s a mistake.
To avoid this mistake, in variables you declare in a macro template, you
should append an
# onto the end of the variable name. This makes sure that
that variable name is never used elsewhere. It cannot clobber any other
user=> (let [x 42] `(let [x# 13] [x x#])) (clojure/let [x__1888 13] [user/x x__1888])
In this case, the two
x variables in the final line now refer to two
different variables. The first refers to the variable in the outer
the second refers to the variable in the inner
let, in the macro template.
You can see that it’s different because Clojure has removed the
and replaced it with
__1888. (That number always changes. If you run this
code, it will be different for you.) Moreover, Clojure guarantees that that
variable name will never be used again during that run.
A Quick Macro
So let’s write a quick macro, just to see what one looks like.
I usually develop with just a REPL and the text editor Vim, and if I need to debug anything, often I just put in print statements. In languages like Python, many of these statements look like this:
print "x+4 =", x+4 print "counts[word] =", counts[word]
That’s a lot of duplication and typing, and aren’t computers supposed to get rid of that?
The problem is that we want to bypass the way the language normally evaluates expressions. We want the computer to print the first expression out, just like it would data; we want it to evaluate the second expression and print out the result.
Because we want to change Clojure’s normal evaluation rules, this is a perfect place for a macro. We will pass in an expression and have Clojure print both the expression and its value. As an added bonus, we can have it also return the value. Here’s what we want it to look like:
user=> (let [x (+ 42 13)] (println '(+ 42 13) "=>" x) (flush) x) (+ 42 13) => 55 55
Let’s look at what this would look like as a macro named
(defmacro debug [expr] `(let [value# ~expr] (println '~expr "=>" value#) (flush) value#))
Let’s pick this apart.
(defmacro debug: Create a macro named
[expr]: The macro takes one parameter, which will be called
expr in the
body of this macro.
(let [value# ~expr]: Start the macro template with a
let expression. The
let will define one variable, named
value#, which will be assigned the
value of evaluating
(println '~expr "=>" value#): Print the expression
expr as data (because
it is quoted, an arrow, and the value of the variable
(flush): Flush the output buffer.
value#)): Return the value of the variable
Before we actually try it, we can see what calling it would return using the
macroexpand-1. This takes a list expression and returns the list
expression created by calling the first macro, if any, that needs to be
executed on it.
user=> (macroexpand-1 '(debug (+ 42 13))) (clojure/let [value__1892 (+ 42 13)] (clojure/println (quote (+ 42 13)) "=>" value__1892) (clojure/flush) value__1892)
(I reformatted this to make it readable.) With a little mangling, we can see
debug macro will generate the expression
(let [value__1892 (+ 42 13)] (println '(+ 42 13) "=>" value__1892) (flush) value__1892)
Which is exactly what we want. If we then call that macro, we get this.
user=> (debug (+ 42 13)) (+ 42 13) => 55 55
(The first line is the printed output of the expression. The final 55 is the result of the expression.)
In the next posting, we’re going to look at the process of how to write macros and how to avoid the pitfalls and dangers of macro writing, and we’re going to write a number of macros for the Porter Stemmer.