At the end of the last posting, I mentioned that we could probably simplify
the ends?
function. Currently, it returns a vector with the result and a
boolean indicating whether it had been changed.
Data as Code
Compared to most programming languages, Clojure (and lisp) programs are odd:
Functions and expressions look like lists; parameter lists for functions or
let
expressions look like vectors; variables look like symbols.
In fact, those programming constructs start their lives as those data structures. Remember that our interactive environment is called a REPL—a read, evaluate, and print loop. In the read stage, a function or expression is read in as a list, vector, symbol, string, or number. It’s not actually compiled into computer code until the evaluate stage. Before then, it’s all data.
This characteristic of Clojure—that programs are data—is called homoiconicity. Part of what makes lisp so powerful and special is that it introduces a new step in the evaluation process. In most computer languages, the computer reads in an expression and immediately turns it into an action or byte code or whatever. In lisp, the computer reads in the expression as a native lisp data structure, and before it processes the data, it asks the programmer again how it should be handled.
This makes more sense with an example. Consider the Clojure built-in when
.
You’ll recall that when
is just like if
, except that the true expression
can be a sequence of expressions and the false expression doesn’t exist.
Consider this when
expression:
user=> (when (= (nth "abc" 1) \b) (println "b") \b) b \b
(println
prints its arguments, separated by spaces, and followed by a new
line.)
The first “b” is what is printed. The “b” is the value returned by the when
expression, from the third line of the expression.
Of course, that when
expression is the same as this if
expression:
user=> (if (= (nth "abc" 1) \b) (do (println "b") \b)) b \b
(do
just executes a list of expressions and returns the value of the last
one. It’s a way of taking a sequence of expressions and using them in places
that only want one expression.)
In fact, the when
expression is exactly equal to the if
expression.
Clojure reads the when
expression and transforms it into the if
expression
before evaluating it.
How is this done and how can we create our own code transformations?
Macros.
Code Transformations
A macro is a code transformation. If you know XSLT, a macro transforms a Clojure expression in almost the same way as XSLT transforms an XML document. (If you don’t know XSLT, forget I mentioned it.)
A macro definition looks a lot like a function definition. It has a macro name
and a vector of parameters. But macros start with the symbol defmacro
, and
the body of the macro has a template that builds the output expression.
Here are the tools you use to create the template.
Quote
A quote tells Clojure to return the next expression as data, exactly the way it
is written, not as an expression or as a macro template. For example, the
first let
evaluates x
before returning, but the quote in the second
let
means that Clojure returns the symbol x
, not the value that the
variable x
evaluates to.
user=> (let [x 1] x) 1 user=> (let [x 1] 'x) x
Quasi-Quote
A quasi-quote is a back-quote character. It indicates that, when the macro is called, the expression it quotes should be scanned for the constructs below, and the list or expression generated should be returned. Except for the scanning bit, it’s a lot like a regular quote.
user=> `(let [x 1] x) (clojure/let [user/x 1] user/x)
Tilde
Inside a macro template, a tilde forces Clojure to evaluate the expression that follows it and insert its value into that place in the result.
user=> (let [x 2] `(let [y ~x] y)) (clojure/let [user/y 2] user/y)
In this, the outer let
sets x
to 2. Inside that let
, it defines a macro
template, which returns a list that looks like another let
expression.
Inside the inner let
it inserts the value of x
from the outer let.
In many ways, macro templates are shortcuts for longer expressions that build lists and vectors. For example, the last expression is the same as this expression, using functions to build the output expression:
user=> (let [x 2] (list 'let (vector 'y x) 'y)) (let [y 2] y)
Tilde-At
The tilde-at sequence is a variation of tilde that evaluates its argument, which should return a sequence, and inserts the elements from that sequence into a list in the macro template.
user=> (let [x '(1 2 3)] `(println ~@(map (fn [y] (* 2 y)) x) 'done)) (clojure/println 2 4 6 (quote user/done))
In this expression, the elements from x
are run through map
, which doubles
their values. The output of map
is inserted into the list that starts with
println
and ends with 'done
. Notice that 'done
in the output is still
quoted. It was read in as data—'done
—and output as the same data.
gensym and #
Sometimes when you’re creating a macro template, you want to use create a variable. One danger is that it could shadow a variable in the surrounding code.
For example, in this code,
user=> (let [x 42] `(let [x 13] [x x])) (clojure/let [user/x 13] [user/x user/x])
what do the two x
variables refer to on the third line? The x
from the
outer let
(42); or the x
from the inner let
(13)? In this case, they
both refer to the inner x
. Occasionally, you may do this on purpose, but
usually it’s a mistake.
To avoid this mistake, in variables you declare in a macro template, you
should append an #
onto the end of the variable name. This makes sure that
that variable name is never used elsewhere. It cannot clobber any other
variable name.
user=> (let [x 42] `(let [x# 13] [x x#])) (clojure/let [x__1888 13] [user/x x__1888])
In this case, the two x
variables in the final line now refer to two
different variables. The first refers to the variable in the outer let
, and
the second refers to the variable in the inner let
, in the macro template.
You can see that it’s different because Clojure has removed the #
character
and replaced it with __1888
. (That number always changes. If you run this
code, it will be different for you.) Moreover, Clojure guarantees that that
variable name will never be used again during that run.
A Quick Macro
So let’s write a quick macro, just to see what one looks like.
I usually develop with just a REPL and the text editor Vim, and if I need to debug anything, often I just put in print statements. In languages like Python, many of these statements look like this:
print "x+4 =", x+4 print "counts[word] =", counts[word]
That’s a lot of duplication and typing, and aren’t computers supposed to get rid of that?
The problem is that we want to bypass the way the language normally evaluates expressions. We want the computer to print the first expression out, just like it would data; we want it to evaluate the second expression and print out the result.
Because we want to change Clojure’s normal evaluation rules, this is a perfect place for a macro. We will pass in an expression and have Clojure print both the expression and its value. As an added bonus, we can have it also return the value. Here’s what we want it to look like:
user=> (let [x (+ 42 13)] (println '(+ 42 13) "=>" x) (flush) x) (+ 42 13) => 55 55
Let’s look at what this would look like as a macro named debug
:
(defmacro debug [expr] `(let [value# ~expr] (println '~expr "=>" value#) (flush) value#))
Let’s pick this apart.
(defmacro debug
: Create a macro named debug
.
[expr]
: The macro takes one parameter, which will be called expr
in the
body of this macro.
(let [value# ~expr]
: Start the macro template with a let
expression. The
let
will define one variable, named value#
, which will be assigned the
value of evaluating expr
.
(println '~expr "=>" value#)
: Print the expression expr
as data (because
it is quoted, an arrow, and the value of the variable value#
.
(flush)
: Flush the output buffer.
value#))
: Return the value of the variable value#
.
Before we actually try it, we can see what calling it would return using the
function macroexpand-1
. This takes a list expression and returns the list
expression created by calling the first macro, if any, that needs to be
executed on it.
user=> (macroexpand-1 '(debug (+ 42 13))) (clojure/let [value__1892 (+ 42 13)] (clojure/println (quote (+ 42 13)) "=>" value__1892) (clojure/flush) value__1892)
(I reformatted this to make it readable.) With a little mangling, we can see
that the debug
macro will generate the expression
(let [value__1892 (+ 42 13)] (println '(+ 42 13) "=>" value__1892) (flush) value__1892)
Which is exactly what we want. If we then call that macro, we get this.
user=> (debug (+ 42 13)) (+ 42 13) => 55 55
(The first line is the printed output of the expression. The final 55 is the result of the expression.)
In the next posting, we’re going to look at the process of how to write macros and how to avoid the pitfalls and dangers of macro writing, and we’re going to write a number of macros for the Porter Stemmer.
16 comments:
Nice post. However, '~' is called 'tilde'. See http://en.wikipedia.org/wiki/Tilde.
Umm. Wasn't that what I called it?
Confusedly,
Eric
I believe the problem is that your tilde looks like the backquote.
You're probably right.
Looking at it, I should maybe make the font size larger in the code snippets.
Excellent post! Keep 'em coming!
Agree with Paul, excellent post and thanks again for the whole series!
Thanks guys. I appreciate it.
Eric
I dont get the line (println '~expr "=>" value#). Above you state that tilde evaluates expr. Why do we want to evaluate here? We want to print the unevaluated expression don't we?
Hi Anon,
This is where macros get a little complicated. Remember that there are two levels of execution: first when the macro is executed, and second when the code that the macro generates is executed.
I'll step through what's happening, paying attention to the line you mentioned.
At macro expansion time, in (debug (+ 1 2)), expr gets bound to (+ 1 2). So (println '~expr "=>" value#) gets translated to (println '(+ 1 2) "=>" value#).
Without the tilde, it gets changed to (println 'expr "=>" value#).
At execution time, the quote in (println '(+ 1 2) "=>" value#) keeps the expression from being re-executed and has the form get passed to println untouched. value# is bound to 3, so the result is that it prints "(+ 1 2) => 3".
In the other, (println 'expr "=>" value#), because of the quote, expr gets passed through untouched, and the output is "expr => 3".
The difference here is also the quote, which comes into play at execution time and prevents the form (+ 1 2) from being executed a second time.
HTH,
Eric
Thank you Eric. It perfectly makes sense now ...
you say that ` and ' are basically the same, but it doesn't seem that way to me. if i do 'let vs `let at the repl...i get very different results
It looks like I said, "Except for the scanning bit, it’s a lot like a regular quote."
"The scanning bit" is an awfully big bit, though.
But you're right, I really did not stress how big of a difference that is.
Eric
Best explanation I've seen so far. Thanks!
Excellent post! You broke it in peaces so we can easily digest. Divider and conquer!!!
Hey Eric, love your clojure articles. Awesome work. Thx, S
Glad you like them!
Post a Comment