Friday, September 12, 2008

Stemming, Part 19: Debugging

Before I leave the Porter Stemmer behind, I want to show you some of the tools I used to debug the code as I went along.

There are more modern options for debugging Clojure than what I'm presenting here. (Search the mailing list for details.) Personally, I generally use print statements for debugging. It's primitive, but effective. In some languages, it can also be painful. Fortunately, Lisp languages take much of the pain out of print-debugging.

Tracing

One common way to debug programs is to follow when a function is called and when it returns. This is called tracing, and the function and macro below handle that.

(defn trace-call
  [f tag]
  (fn [& input]
    (print tag ":" input "-> ") (flush)
    (let [result (apply f input)]
      (println result) (flush)
      result)))

trace-call returns a new function that prints the input arguments to a function, calls the function, prints the result, and returns it. It takes the function and a tag to identify what is being traced.
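As a quick illustration, trace-call can also be used on its own, without replacing the original function. This is only a sketch: traced-plus and the plus tag are names I made up for the example, and trace-call is repeated so the snippet stands alone.

```clojure
;; trace-call as defined above, repeated so this snippet is self-contained.
(defn trace-call
  [f tag]
  (fn [& input]
    (print tag ":" input "-> ") (flush)
    (let [result (apply f input)]
      (println result) (flush)
      result)))

;; traced-plus is a hypothetical name; + itself is left untouched.
(def traced-plus (trace-call + 'plus))

(traced-plus 1 2)
;; prints:  plus : (1 2) -> 3
;; returns: 3
```

Because the wrapper is an ordinary function, it can be passed around or thrown away without touching the definition it wraps.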

(defmacro trace
  [fn-name]
  `(def ~fn-name (trace-call ~fn-name '~fn-name)))

The trace macro is syntactic sugar for trace-call. It redefines the function as a traced version of itself, using the function's own name as the tag. For example, this creates and traces a function that upper-cases strings:

user=> (defn upper-case [string] (.toUpperCase string))
#'user/upper-case
user=> (upper-case "name")
"NAME"
user=> (trace upper-case)
#'user/upper-case
user=> (upper-case "name")
upper-case : (name) -> NAME
"NAME"

The debug Macro

Another common trick in print-debugging is to print the value of an expression. The macro below evaluates an expression, prints both the expression and the result, and returns the result.

(defmacro debug
  [expr]
  `(let [value# ~expr]
     (println '~expr "=>" value#)
     (flush)
     value#))

For example:

user=> (debug (+ 1 2))
(+ 1 2) => 3
3

Lisp macros are especially helpful here, because they allow you to treat the expression both as data to print and as code to evaluate.
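Because debug returns the value it prints, it can be wrapped around a sub-expression without changing what the surrounding code computes. A minimal sketch, with the macro repeated so the snippet stands alone:

```clojure
;; debug as defined above, repeated so this snippet is self-contained.
(defmacro debug
  [expr]
  `(let [value# ~expr]
     (println '~expr "=>" value#)
     (flush)
     value#))

;; Only the inner sum is wrapped; the multiplication still receives its value.
(* 2 (debug (+ 1 2)))
;; prints:  (+ 1 2) => 3
;; returns: 6
```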

The debug-stem Function

This function is a debugging version of stem. It uses binding to temporarily replace all the major functions of the stemmer with traced versions of them.

(We'll talk more about binding later, when we deal with concurrency. Right now, just understand that binding temporarily replaces the value of a top-level variable, like a function name, with a new value. The variable only has that value for the duration of the binding. Afterward, it returns to its former value.)

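(defn debug-stem
  [word]
  ;; trace-call is used directly here. The trace macro expands to a def,
  ;; which would redefine each function globally and return the var
  ;; rather than a traced function to bind.
  (binding [stem (trace-call stem 'stem),
            make-stemmer (trace-call make-stemmer 'make-stemmer),
            step-1ab (trace-call step-1ab 'step-1ab),
            step-1c (trace-call step-1c 'step-1c),
            step-2 (trace-call step-2 'step-2),
            step-3 (trace-call step-3 'step-3),
            step-4 (trace-call step-4 'step-4),
            step-5 (trace-call step-5 'step-5)]
    (stem word)))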
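To see the same pattern in isolation, here is a small sketch that uses two hypothetical stand-in steps (strip-s and lower) instead of the real stemmer functions. It calls trace-call directly inside binding, and it marks the vars ^:dynamic, which current Clojure releases require before binding will rebind a var. Treat it as an illustration of the idea, not as the stemmer's actual code.

```clojure
;; trace-call as defined earlier, repeated so this snippet is self-contained.
(defn trace-call
  [f tag]
  (fn [& input]
    (print tag ":" input "-> ") (flush)
    (let [result (apply f input)]
      (println result) (flush)
      result)))

;; Hypothetical stand-ins for the stemmer's steps.
;; ^:dynamic lets binding rebind them in current Clojure releases.
(defn ^:dynamic strip-s
  [^String word]
  (if (.endsWith word "s")
    (subs word 0 (dec (count word)))
    word))

(defn ^:dynamic lower
  [^String word]
  (.toLowerCase word))

(defn toy-stem
  [word]
  (strip-s (lower word)))

(defn debug-toy-stem
  [word]
  ;; The right-hand sides are evaluated before the new bindings take effect,
  ;; so each traced function wraps the original definition.
  (binding [strip-s (trace-call strip-s 'strip-s)
            lower (trace-call lower 'lower)]
    (toy-stem word)))

(debug-toy-stem "Cats")
;; prints:
;;   lower : (Cats) -> cats
;;   strip-s : (cats) -> cat
;; returns: "cat"

(toy-stem "Cats")
;; prints nothing; returns "cat"
```

Outside the binding form the original functions are back in place, so toy-stem runs silently again.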

That's it. These were the main functions I used in debugging the stemmer as I ported it from C and made it more Clojure-native.

Next up, we'll create a concordance and look at other ways of presenting the texts that we're analyzing.

By the way, I've also finally updated the repository for sample code.

Thursday, September 11, 2008

I'm Ba-a-ack

Boy, but doesn't life just get in the way sometimes?

On the other hand, I have also let it keep me from posting. I was getting so bored with the Clojure tutorial series. I hate to think how tired of it you must have been.

But I'm back, and I'm going to make some changes.

  1. I've put a link to the table of contents for the Clojure tutorial in the sidebar. No matter how tired I am of it, it's still the main content on here.

  2. I'm going to continue the Clojure tutorial, but the pace won't be quite as relentless as it was. Hopefully, I'll be able to inject a little more energy into it, and it won't be quite as boring.

  3. I'm going to intersperse the tutorial with some other postings. I'll catch you up on what I've been doing, as well as talk about some other things that have caught my interest.

That's it. The moral: When you're getting tired, take a break and retool.