In the last posting, we created the macro if-ends?
, which was a combination
of let
, if
and the function ends?
. Today, we’ll look at another similar
macro, one that combines let
, cond
, and ends?
.
The Purpose of cond-ends?
The problem with if-ends?
is that sometimes we’ll want to look for twenty or
more different endings on a single stemmer and handle each case differently.
We can do that with if-ends?
, but it would be a lot cleaner if we had a
construct like cond
.
Input Patterns
For example, we might want to do something like this:
(cond-ends? st (make-stemmer "names") "ing" (do (println "ends with -ing") st) "ed" (do (println "ends with -ed") st) "s" (do (println "ends with -s") st) :else (do (println "no ending") st))
First, notice that this starts out like if-ends?
. It has a variable (st
)
that the new stemmer will be assigned to, followed by a stemmer instance to
test.
These are followed by a series of ending/expression pairs. The endings are
tested one by one until a match is found. Once it is, the expression following
that ending is executed with the variable st
available to it and assigned to
the new stemmer, with the index changed to reflect the ending. The return
value of that expression is the value of the entire cond-ends?
expression.
Finally, if no ending matches, the cond-ends?
should return the original
stemmer unchanged. Or, optionally, the last ending can be the keyword :else
.
If no ending before that point matched, the expression following :else
is
executed and its value returned.
Before we start on the expected output for this macro, let’s look at a short expression that uses the default value if no ending matches:
(cond-ends? st (make-stemmer "names") "ing" (do (println "ends with -ing") st))
Of course, that would be better expressed with if-ends?
, but using it as an
input pattern will help us build the default ending.
Output Patterns
Here’s what the macro should construct from the input above.
;(cond-ends? st (make-stemmer "names") ; "ing" (do (println "ends with -ing") st) ; "ed" (do (println "ends with -ed") st) ; "s" (do (println "ends with -s") st) ; :else (do (println "no ending") st)) ;; Set up the environment (let [stemmer# (make-stemmer "names"), word# (subword stemmer#)] ;; -ing (let [j# (- (count word#) 3)] (if (and (pos? j#) (= (subvec word# j#) [\i \n \g])) (let [st (assoc stemmer# :index (dec j#))] (do (println "ends with -ing") st)) ;; -ed (let [j# (- (count word#) 2)] (if (and (pos? j#) (= (subvec word# j#) [\e \d])) (let [st (assoc stemmer# :index (dec j#))] (do (println "ends with -ed") st)) ;; -s (let [j# (- (count word#) 1)] (if (and (pos? j#) (= (subvec word# j#) [\s])) (let [st (assoc stemmer# :index (dec j#))] (do (println "ends with -s") st)) ;; :else (let [st stemmer#] (do (println "no ending") st)))))))))
That’s scary.
Actually, it’s not that bad. Notice that after the first let, it is a series
of (let ... (if ...))
constructions. (I’ve added comments to separate each
of these sections.) Each of those three let
/if
constructions are almost
identical, until the :else
expression. Also, notice that I’ve not only
pre-compiled the vector itself, but also the length of the vector, and I’ve
inserted them into the result example as their literal values. (Pre-computing
the length of the vector was suggested by Holger Durer in the comments to the
last posting.)
Before we start writing the macro, let’s look at the expected output for the second example above.
;(cond-ends? st (make-stemmer "names") ; "ing" (do (println "ends with -ing") st)) ;; Set up the environment (let [stemmer# (make-stemmer "names"), word# (subword stemmer#)] ;; -ing (let [j# (- (count word#) 3)] (if (and (pos? j#) (= (subvec word# j#) [\i \n \g])) (let [st (assoc stemmer# :index (dec j#))] (do (println "ends with -ing") st)) ;; default :else stemmer#)))
One of the things to notice about these output expressions is that a lot of
the state for it can be computed once, and used throughout all the
conditionals in the cond-ends?
body. That observation will be useful later.
The Macro
Looking at the output above, we really have three tasks: set up the
environment for the cond-ends?
in the outer let
; for each test generate a
let
/if
construction; and finally generate an :else
value, maybe using
the default.
One way to tackle this problem would be to separate it into two macros:
One macro sets up the environment and calls another macro to build the test expressions;
The second macro builds one
let
/if
pair, similar to whatif-ends?
did, for one test expression, then it calls itself on the rest of the test expressions. It does this until it reaches the last test expression and that test is:else
or until it is out of test expressions.
Let’s start with the second macro. That will allow us to see what the first macro needs to provide to the test macro, and it has to be defined first anyway, because that’s the way Clojure likes things.
Generating the Tests
The test macro will be overloaded. Either it will take one ending/expression
pair, or it will take many. If there is only one, the ending could be an
:else
, or it could be the last ending to test for, and the :else
will be a
default. So the first, shorter version of this will return one of these two
constructions (taken from above):
;;; With :else ;; :else (let [st stemmer#] (do (println "no ending") st)) ;;; With a test + the default :else ;; -ing (let [j# (- (count word#) 3)] (if (and (pos? j#) (= (subvec word# j#) [\i \n \g])) (let [st (assoc stemmer# :index (dec j#))] (do (println "ends with -ing") st)) ;; default :else stemmer#))
Another point to make about the test expressions is that—as an astute commenter pointed out (thanks, Holger)—there is a lot of duplication between the two versions of the macro that builds the expressions. We can capture that duplication in a function and just call that function with the difference between the two structures. The function will then create the macro’s output structure.
All right. Let’s roll up our sleeves. The helper function will be called
make-cond-ends-test
. It take the variable symbol, the stemmer variable, an
ending string, a true expression, and a false expression.
(defn make-cond-ends-test [var stemmer word end true-expr false-expr] (let [vend (vec end)] `(let [j# (- (count ~word) ~(count vend))] (if (and (pos? j#) (= (subvec ~word j#) ~vend)) (let [~var (assoc ~stemmer :index (dec j#))] ~true-expr) ~false-expr))))
Notice that this function requires several parameters beyond the ending
(end
) and the result expressions (true-expr
and false-expr
)? It also has
var
, stemmer
, and word
. But the stemmer and the subword should have
been stored in variables in the first, outer let
by this point. That’s
right. In this function, these are the variables that those values were
stored in, represented here as symbols. Throughout this helper function,
~stemmer
inserts the variable that the stemmer will have been assigned to in
the outer let
.
Now we can define the secondary macro. We’ll call it cond-ends-helper
,
because I’m not feeling creative. It’s main purpose is to examine the input
and call make-cond-ends-test
with the appropriate false expression. The
first overload for it will handle the last expression by testing for :else
and deciding whether to use the default :else
:
(defmacro cond-ends-helper "This helps cond-ends? by processing the " test-exprs pairs in cond-ends?'s environment." ([var stemmer word end true-expr] (if (= end :else) `(let [~var ~stemmer] ~true-expr) (make-cond-ends-test var stemmer word end true-expr stemmer))) ;; The overload will go here ... )
What happens here? First, at compile time (before the quasi-quote or call to
make-cond-ends-test
), it tests whether the ending is :else
. If it is, it
just outputs the shorter version of the ending. If it isn’t :else
, it calls
the helper function, with all the same parameters plus the false expression as
a data structure. In this case, the false expression is the stemmer variable.
If your head is starting to hurt, take a moment. Unless you have programmed in
lisp before, this isn’t like any programming you’ve ever done. We’re writing a
program to write programs. Like any meta-activity, it can make your brain
explode. Unfortunately, this is a particularly convoluted macro, because it
has these two levels of macro, plus a function. Don’t worry if you don’t quite
get it at first. There’s nothing wrong with copying and pasting the macro’s
code into your source file and moving on. Start building macros like debug
and if-ends?
, and when you’re comfortable there, come back and read this
through again. It will make a lot more sense.
Now, if you’re still with me, here’s the overloaded version of the helper
macro. This takes an ending/expression pair and a list of the rest of the
pairs in the expression. It then calls make-cond-ends-test
with a new
version of
(defmacro cond-ends-helper "This helps cond-ends? by processing the test-exprs pairs in cond-ends?'s environment." ;; The earlier override goes here.... ([var stemmer word end true-expr & more] (make-cond-ends-test var stemmer word end true-expr `(cond-ends-helper ~var ~stemmer ~word ~@more))))
In some ways, this part of cond-ends-helper
is much simpler. There are just
a few things to comment on.
In the parameter list,
& more
collects the values of the rest of the parameters and stores them in a variable I’ve namedmore
. This works with functions too.The false expression that this passes to
make-cond-ends-test
begins with a quasi-quote (back quote). This just generates an expression to pass as the false expression. This expression gets inserted into the expression thatmake-conds-ends-test
creates.This quasi-quoted expression, when finally evaluated, is a macro call to the macro that created it. It starts the process all over again, passing the new macro expansion the values of
var
,stemmer
, andword
unchanged, and splicing all of the rest of the parameters that haven’t been handled yet into the expression using~@more
.
Creating the Environment
So we know that the cond-ends?
sets up the environment by creating an
expression like this:
;; Set up the environment (let [stemmer# (make-stemmer "names"), word# (subword stemmer#)] ; ... )
And it replaces the ellipses by calling the macro cond-ends-helper
with the
variable and the variables stemmer#
and word#
. Put like that, it almost
writes itself.
(defmacro cond-ends? "This is the same as a stacked series of if-ends?. This just sets up the environment for cond-ends-helper." [var stemmer & test-exprs] `(let [stemmer# ~stemmer, word# (subword stemmer#)] (cond-ends-helper ~var stemmer# word# ~@test-exprs)))
Notice that we use & test-exprs
to collect all of the test expressions, and
we use ~@test-exprs
to splice them into the cond-ends-helper
call.
Summary
For posterity’s sake, here is all the code for the cond-ends?
macro, in one
place:
(defn make-cond-ends-test [var stemmer word end true-expr false-expr] (let [vend (vec end)] `(let [j# (- (count ~word) ~(count vend))] (if (and (pos? j#) (= (subvec ~word j#) ~vend)) (let [~var (assoc ~stemmer :index (dec j#))] ~true-expr) ~false-expr)))) (defmacro cond-ends-helper "This helps cond-ends? by processing the test-exprs pairs in cond-ends?'s environment." ([var stemmer word end true-expr] (if (= end :else) `(let [~var ~stemmer] ~true-expr) (make-cond-ends-test var stemmer word end true-expr stemmer))) ([var stemmer word end true-expr & more] (make-cond-ends-test var stemmer word end true-expr `(cond-ends-helper ~var ~stemmer ~word ~@more)))) (defmacro cond-ends? "This is the same as a stacked series of if-ends?. This just sets up the environment for cond-ends-helper." [var stemmer & test-exprs] `(let [stemmer# ~stemmer, word# (subword stemmer#)] (cond-ends-helper ~var stemmer# word# ~@test-exprs)))
In the next posting, I’ll to wrap up macros by pointing out how they’re used and why.
4 comments:
Hi Eric,
First of all, the obligatory, but extremely well-deserved, thanks for this series. It's *really* interesting! I've just discovered it and read all the posts up to here in the last couple of days.
Not being steeped in macros, perhaps I oughtn't comment, but it seems to me that:
1. if-ends? should make use of the ends? function rather than include the logic directly in the macro
2. cond-ends? should use cond (and ends?) rather than tackling the harder task of building nested lets and ifs.
The reason I suggest this is to make it easier for beginners to understand. If there are performance reasons to do as much work as you've done, that's probably not enough justification in a pedagogical example. If there are "correctness" reasons for it, then maybe they should be discussed.
Thanks again. Looking forward to reading the rest, and any other series you feel like producing :)
Gavin Sinclair
Hi Gavin,
Glad you're getting something out of it. Unfortunately, it's far from finished at this point, although it does have a lot of useful information. (I also need to update the code to reflect the latest changes in Clojure, but that's another story.)
Now for your questions. (I've answered them together, since they're really the same question.)
I'm not sure that using ends? would make cond-ends? that much simpler, since ends? isn't really a straight predicate function. You'd still have to test at every stage. You might be able to wrap ends? in another function and use "or" to short-cut the tests. That would work, although it might also be a little too clever. You'd also still need to test for the ":else" condition.
You're right, though, the primary reason why I approached things this way was for performance. It does also illustrate one of the advantages of macros, though: they can "insert" expressions into the code at compile time, so that whenever it's executed, it's just a constant. For example, in "cond-ends?", the length of each of the endings is pre-computed ("~(count vend)") and inserted into the resulting expression as a constant integer. Similarly, the ending vector itself is a constant, although I'd have to think about whether that gains anything or not.
Pedagogically, I really should have stuck with a simpler stemming algorithm, as someone suggested (and you commented on) on another posting. That's one of the joys of blogs or any other serial publishing format: by the time realize what a thicket you're in, you may be too far along to turn back.
But you're probably right, using a simpler macro here would have made the explanation much simpler, too.
Glad you're enjoying it, though.
Eric
Thanks for your detailed response. One unfortunate thing I find about Clojure is that it's often hard for me to understand unless I'm writing it (and thereby experimenting at the REPL). Accordingly, I don't fully follow what you've written because it's been some time (a week or so) since I read and grokked the code involved. The solution to this is for me to type in all the code and try to implement if-ends? and cond-ends? differently myself. I'll let you know how I go (won't be very soon though).
Cheers,
Gavin
LOL. I know exactly what you mean. It's been months since I last looked at this code, and it took me a while to remember what i was thinking at the time. I'm still not sure that I did.
And I started to implement the suggestions I made in my last post, but I didn't really have time right then.
Life does sometimes get in the way.
Eric
Post a Comment