So far in this Clojure tutorial/NLP tutorial, we’ve mainly been looking at Clojure’s functions, but the last posting actually included a good chunk of code for the Porter Stemmer. Today, we’ll review functions. Some of this we’ve seen before, but some of it will be new.
Functions
As with any functional language, functions are the building blocks of Clojure. I’m going to briefly summarize what we’ve learned about functions so far, and then we’ll explore one important topic—recursion—in more detail.
Defining Functions
Generally, functions are defined using defn
, followed by the name of the
function, a vector listing the parameters it takes, and the expressions in the
body of the function. You can also include a documentation string between the
function name and the parameter vector:
user=> (defn greetings "Say hello." [name] (str "Hello, " name)) #'user/greetings user=> (greetings 'Eric) "Hello, Eric"
(The str
function here just creates strings out of all its arguments and
concatenates them together.)
Higher-Order Functions
Higher-order functions are functions that take other functions as values.
These may use the other function in a calculation, or they may create a new
function from the existing one. For example, map
calls a function on each
element in a sequence:
user=> (map greetings '(Eric Elsa)) ("Hello, Eric" "Hello, Elsa")
complement
, however, creates a new function that is equivalent to (not
(original-function *args...*))
:
user=> (def not-zero? (complement zero?)) #'user/not-zero? user=> (not-zero? 0) false user=> (not-zero? 1) true
Function Literals
Sometimes, particularly when you’re using a higher-order function, you may
want to create a short function just for that one place, and giving it a name
would clutter up your program. Or you may want to define a function inside
another function, in a let
expression, to using only within that function.
For either of these, use a function literal. A function literal looks like a
regular function definition, except instead of defn
, use fn
; the name is
optional; and generally it cannot have a documentation string. For example,
suppose that greeting
, which we defined above, doesn’t exist, and we want to
greet a list of people. We could do this by creating a function literal and
passing it to map
.
user=> (map (fn [n] (str "Hello, " n)) '(Eric Elsa)) ("Hello, Eric" "Hello, Elsa")
Or, we could temporarily define greet
using let
, and pass that to map
.
user=> (let [greet (fn [n] (str "Hello, " n))] (map greet '(Eric Elsa))) ("Hello, Eric" "Hello, Elsa")
Overriding Functions
I’ve already mentioned that Clojure allows you to provide different versions of a function for different argument lists. Do this by grouping each set of parameter vector and expressions in its own list. The documentation string, if there is one, comes before all of the groups:
user=> (defn count-parameters "This returns the number of parameters passed to the function." ([] 0) ([a] 1) ([a b] 2) ([a b c] 3) ([a b c d] 4) ([a b c d e] 5)) #'user/count-parameters user=> (count-parameters 'p0 'p1 'p2) 3 user=> (count-parameters) 0
This also works in function literals. This creates a shortened version of
count-parameters
called cp
and calls it twice within a vector. The two
calls are collected in a vector because let
only returns one value, and we
want to see the value of both calls to cp
.
user=> (let [cp (fn ([] 0) ([a] 1) ([a b] 2) ([a b c] 3))] [(cp 'p0 'p1) (cp)]) [2 0]
Recursion
Many problems can be broken into smaller versions of the same problem.
For example, you can test whether a list contains a particular item by looking at the first element of the list. At each place in the list, ask yourself: is the list empty? If it is, the item is not in the list. However, if the first element is what you’re looking for, good; if it’s not, strip the first element off the list and start over again.
This way of attacking problems—by calling the solution within itself—is called recursion. Recursive problems are fundamental to functional programming, so let’s look at it in more detail.
General Recursion
The main pitfall in creating a recursive problem is to make sure that it will end eventually. To do this, you need to make sure that you have an end condition. In the list-membership problem I described above, the end conditions are the empty list and the first element being the item you are looking for.
Let’s look at what this would look like in Clojure.
user=> (defn member? [sequence item] (if (not (seq sequence)) nil (if (= (peek sequence) item) sequence (member? (pop sequence) item)))) #'user/member?
In this, the first if
tests whether the sequence is empty (that is, if seq
cannot create a sequence out of it; if it is, it returns nil
, which
evaluates as false
. The second if
tests whether the first element in the
sequence is what we’re looking for; if it is, this returns the sequence at
that point. This is a common idiom in Clojure. Since the sequence has at least
one item, it will evaluate as true, fulfilling its contract as a test, but it
returns more information than that. This function is useful for more than
just determining whether an item is in a sequence: it also finds the item and
returns it. Finally, if the first element is not the item being sought, this
*calls member?
again with everything except the first item of the list*.
This is the recursive part of the function. Let’s test it.
user=> (member? '(1 2 3 4) 2) (2 3 4) user=> (member? '(1 2 3 4) 5) nil
Sometimes it’s helpful to map this out. In the diagram below, a call within another call is indented under it. A function call returning is indicated by a “=>” and a value at the same level of indentation as the original call. Here’s the sequence of calls in the first example above.
(member? '(1 2 3 4) 2) (member? '(2 3 4) 2) => '(2 3 4) => '(2 3 4)
The second example call to member?
would graph out like this:
(member? '(1 2 3 4) 5) (member? '(2 3 4) 5) (member? '(3 4) 5) (member? '(4) 5) (member? '() 5) => nil => nil => nil => nil => nil
In this case, we’re just returning the results of the membership test back unchanged, but we could modify the results as they’re being returned. For example, if we wanted to count how many times as item occurs in a sequence, we could do this:
user=> (defn count-item [sequence item] (if (not (seq sequence)) 0 (if (= (peek sequence) item) (inc (count-item (pop sequence) item)) (count-item (pop sequence) item)))) #'user/count-item
In many ways, this is very similar to member?
. It first tests whether the
sequence is empty, and if it is, it returns zero. If the first element is the
item, this calls itself (it recurses) and increments the result by one, to
count the current item. Otherwise, it recurses, but it does not increment the
result. Let’s see this in action.
user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 2) 4 user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 3) 3 user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 5) 0
Here, the first example (in shortened form) graphs out like so:
(count-item '(1 2 3 2 1) 2) (count-item '(2 3 2 1) 2) (count-item '(3 2 1) 2) (count-item '(2 1) 2) (count-item '(1) 2) (count-item '() 2) => 0 => 0 => 1 => 1 => 2 => 2
You see that the results gets incremented as the computer leaves each function call where 2 is the first element in the input list.
Tail-Recursive Functions
While these two examples of recursion are superficially similar, on a deeper
level they are very different. In member?
, once you’ve computed the result,
you can return it immediately back to the function that originally called
member?
. If Clojure had some way to jump completely out of a function from
any level, you could do this. On the other hand, when count-item
recurses,
it is not finished with its calculation yet. It has to wait for itself to
return and possibly add one to the result. (Sentences like that remind me why
recursion can be confusing. Don’t worry. Eventually your brain will get used
to being twisted into a pretzel.)
There’s a term to describe functions like member?
that are finished with
their calculations when they recurse. It’s called tail-call recursion or
tail-recursive. This is a very good quality. The computer
can optimize these calls to make them very fast and very efficient.
But the Java Virtual Machine doesn’t recognize tail recursion on its own, so
Clojure needs a little help to make these optimizations. You signal a
tail-recursive function call by using the recur
built-in instead of the
function name when you recurse. Thus, we could re-write member?
to be
tail-recursive like this.
user=> (defn member? [sequence item] (if (not (seq sequence)) nil (if (= (peek sequence) item) sequence (recur (pop sequence) item)))) #'user/member?
See the recur
in the last line? That’s all that has changed.
This should work exactly the same (and it does—try it), but for long lists, it should be much more efficient.
In fact, it’s often worth putting in a little extra work to make
non-tail-recursive functions tail-recursive. There’s a straightforward
transformation you can use to make almost any function tail recursive. Just
add an extra parameter and use it to accumulate the results before you make
the recursive function call. The first time you call the function, you need to
pass the base value into the function for that parameter. For example,
count-item
with tail recursion would look like this:
user=> (defn count-item [sequence item accum] (if (not (seq sequence)) accum (if (= (peek sequence) item) (recur (pop sequence) item (inc accum)) (recur (pop sequence) item accum)))) #'user/count-item
The difference here is the accum
parameter. When item
equals the first
element of the sequence, accum
is incremented before the recursive
functional call is made. When the function starts again on the shorter list,
accum
is incremented. Finally, when the end of the list is reached, accum
is returned. It contains the counts accumulated as count-item
walked down
the sequence.
Of course, now we have to call count-item
differently also:
user=> (count-item '(1 2 3 2 3 4 3 2 1) 2 0) 3 user=> (count-item '(1 2 3 2 3 4 3 2 1) 1 0) 2 user=> (count-item '(1 2 3 2 3 4 3 2 1) 5 0) 0
Whenever we call count-item
, we have to include a superfluous zero that we
really don’t care about. That seems messy and error-prone. How can we get rid
of it?
Essentially, we want to hide this version of count-item
and replace it
with a new version that handles the extra zero for us. Then, we call the new
version and forget about this one. There are several ways to actually do this:
-
Have the public function named
count-item
and create a private version named something likecount-item-
; -
Use
let
to define the private function insidecount-item
; or -
Use
loop
.
loop
? Yes, this is new. loop
is a cross between a function call and let
.
It looks a lot like let
because it allows you to define variables. But it
also acts as a target for recur
. How would count-item
look with loop
?
(defn count-item [sequence item] (loop [sq sequence, accum 0] (if (not (seq sq)) accum (if (= (peek sq) item) (recur (pop sq) (inc accum)) (recur (pop sq) accum))))) #'user/count-item
Notice that the first line of loop
looks a lot like the first line of let
.
Both declare and initialize a series of variables. Just to keep things clear,
I’ve renamed sequence
to sq
within the loop
. Also, item
isn’t included
in the list of variables that loop
declares, since it doesn’t change.
Finally, at the end of the loop
are two recur
statements. When they are
evaluated, they cause the program to jump back to the loop
statement, but
this time, instead of the original values used to initialize the loop
variables, the values in the recur
call are used.
Now we can again call count-item
without the extra parameter:
user=> (count-item '(1 2 3 2 3 4 3 2 1) 2) 3 user=> (count-item '(1 2 3 2 3 4 3 2 1) 1) 2 user=> (count-item '(1 2 3 2 3 4 3 2 1) 5) 0
Stemmer Utility
With recursion, we can define another utility to use on the stemmer
structures. This function will take a predicate and a stemmer. For each step,
it will test the stemmer with the predicate, and if true, it will pop one
character from the word and recurse. If there are no letters left in the word
or if the predicate evaluates to false, it will return the stemmer the way it
is:
(defn pop-stemmer-on "This is an amalgam of a number of different functions: pop (it walks through the :word sequence using pop); drop-while (it drops items off while testing the sequence against drop-while); and maplist from Common Lisp (the predicate is tested against the entire current stemmer, not just the first element)." [predicate stemmer] (if (and (seq (:word stemmer)) (predicate stemmer)) (recur predicate (pop-word stemmer)) stemmer))
I’m ready for a break. Next time, We’ll look at some more of the functions that Clojure provides, and we’ll define a few predicates of our own.
3 comments:
Wonderful tutorial Eric,
I'm having a great time reading it.
Hi Eric,
I'm just wondering, in the pop-stemmer-on example, why did u usesd seq in the if condition instead of (not (empty? ...)) ...??
Hi rdsr.
I used seq for a few reasons.
First, it's the idiom used in the Clojure community. It probably is the idiom for a couple of reasons....
Second, it's shorter to type.
Third, if you look in clojure/core.clj, "empty?" is defined as "(not (seq ...))". So "(not (empty? ...))" would become "(not (not (seq ...)))". I'm just being more direct.
Glad you're liking this.
Eric
Post a Comment