Tokenization, Part 3: Functions
In the last post, we saved the regular expression that we used to tokenize a string to a variable. But it would be more convenient to be able to save the entire tokenization procedure to a variable. Pretty much all programming languages let us save a series of statements or expressions—a function—to evaluate later. How does Clojure do this?
In fact, creating a function looks a lot like creating a variable. First, start Clojure and make sure that token-regex is still defined:
user=> (def token-regex #"\w+")
#'user/token-regex
Next, define the function, only instead of using def, use defn:
user=> (defn tokenize-str [input-string]
         (re-seq token-regex input-string))
#'user/tokenize-str
Let’s break that apart:
- defn indicates that we’re defining a function, not a variable.
- tokenize-str is the name of the function. Functions and variables share the same set of names, so naming a variable tokenize-str will replace the function named tokenize-str, and vice versa.
- [input-string] is a square-bracket-delimited list of the parameters that this function accepts. tokenize-str takes one argument, named input-string. Expressions inside the function can refer to the value passed in by that name.
- After you type in that first line and hit enter, nothing will happen. The parenthesis before defn is still open, so the Clojure REPL knows you’re not finished yet. You’ll need to enter the second line to continue.
- The second line is just the re-seq function with both arguments given as variables, as in the last post. One is the regular expression from the previous def, and the other is input-string from the function definition.
- Functions return the value of their last expression. In this case, that is the call to re-seq.
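The points in the list above can be checked with a small sketch. The names tokenize-str-alt and count-tokens are hypothetical, introduced only for illustration, and token-regex is redefined so the block runs on its own. It assumes nothing beyond core Clojure.

```clojure
;; Redefined here so the sketch is self-contained.
(def token-regex #"\w+")

;; The defn form from the post.
(defn tokenize-str [input-string]
  (re-seq token-regex input-string))

;; Hypothetical name: the same function written with def and an
;; anonymous fn, which is roughly what defn does for us. This is why
;; functions and variables share one set of names.
(def tokenize-str-alt
  (fn [input-string]
    (re-seq token-regex input-string)))

;; Hypothetical function: with two expressions in the body, only the
;; value of the last one is returned.
(defn count-tokens [input-string]
  (tokenize-str input-string)           ; value is discarded
  (count (tokenize-str input-string)))  ; value is returned

(count-tokens "a b c")
;; => 3
```

Both tokenize-str and tokenize-str-alt should give the same tokens for the same input.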
Now let’s give it a try:
user=> (tokenize-str "This is a new input string with different tokens.")
("This" "is" "a" "new" "input" "string" "with" "different" "tokens")
Sure enough. Now calling (tokenize-str ...) is the same as calling (re-seq token-regex ...).
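To confirm that the two calls are interchangeable, here is a minimal sketch. The name sample is illustrative, and the definitions are repeated so the block stands alone. It also shows one payoff of naming the procedure: the function can be handed to map and applied to several strings.

```clojure
(def token-regex #"\w+")

(defn tokenize-str [input-string]
  (re-seq token-regex input-string))

;; Illustrative input, not part of the original post's code.
(def sample "This is a new input string with different tokens.")

;; The named function and the direct re-seq call give equal results.
(= (tokenize-str sample)
   (re-seq token-regex sample))
;; => true

;; Because the procedure now has a name, it can be reused as a value.
(map tokenize-str ["First string." "And a second one."])
;; => (("First" "string") ("And" "a" "second" "one"))
```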
Saving Your Work
We’re starting to get enough code that typing it in every time we want to use it would be painful, inefficient, and worst of all, boring. Fortunately, like most other programming languages, Clojure lets us save expressions to a file and execute them all at once.
To do this, open your text editor and create a new file. Let’s call it word.clj and save it in whatever directory you’re currently working in. Next, enter all the code we’ve typed so far:
(def token-regex #"\w+")

(defn tokenize-str [input-string]
  (re-seq token-regex input-string))
Now switch back to the Clojure REPL and load this file using the load-file function:
user=> (load-file "word.clj")
#'user/tokenize-str
After loading the file, Clojure prints the result of the last expression in the file. In this case, that is the expression defining the tokenize-str function.
We can use the variables and functions defined in that file, just as if we had typed them into the REPL:
user=> (tokenize-str "Another input string.")
("Another" "input" "string")
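If you would rather try the whole save-and-load cycle without leaving the REPL, the following is one possible sketch. It is an assumption-laden stand-in for the text editor step: it writes the same two definitions to a temporary file (tmp-file and loaded are hypothetical names) using spit, then loads that file with load-file.

```clojure
;; Hypothetical temp file standing in for word.clj, so the sketch does
;; not depend on the current working directory.
(def tmp-file (java.io.File/createTempFile "word" ".clj"))

;; Inside a Clojure string the backslash and the quotes are escaped,
;; so the file itself ends up containing #"\w+".
(spit tmp-file
      (str "(def token-regex #\"\\w+\")\n"
           "(defn tokenize-str [input-string]\n"
           "  (re-seq token-regex input-string))\n"))

;; load-file evaluates every form in the file and returns the value of
;; the last one, here the var created by defn.
(def loaded (load-file (.getPath tmp-file)))

(tokenize-str "Another input string.")
;; => ("Another" "input" "string")
```

Note the difference in escaping: a regex literal typed directly into a file or the REPL needs only a single backslash, while the same pattern embedded in a string needs two.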
Also, if you find any bugs, you can let me know using the issues tracker there.
Next time we’ll improve the tokenization and talk about how to organize our code better.