The Hyperpessimist

The grandest failure.

Clojure and Hygienic Macros

A short intro to macros

Over the last couple of days I was playing with macros in Clojure and figured out that nobody seems to have written a proper introduction to how they work. Well, I’m not gonna do it properly either, but maybe it’ll help you anyway.

If you’ve never used macros, that’s completely fine: even after years of using them, I haven’t seen that many situations where I’d want one. Rule of thumb: if you can do it in a different way (say, by supplying a function as an argument), do it that way.

Macros can be used where functions fail. Think of them as functions that don’t evaluate their arguments but receive them as unevaluated code, that is, as data. In the case of Clojure, that mostly means lists, of course.
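To make the "arguments as data" idea concrete, here is a minimal sketch. The macros show-form and form-size are hypothetical names made up for this illustration; they are not part of Clojure.

```clojure
;; Hypothetical helper macros, for illustration only.

;; The macro receives the unevaluated form and returns it wrapped in quote,
;; so the call evaluates to the code itself rather than to its result.
(defmacro show-form [form]
  (list 'quote form))

(show-form (+ 1 2))
;;=> (+ 1 2)

;; Inside the macro the argument is an ordinary list that can be inspected.
;; The macro returns the number 3, which then evaluates to itself.
(defmacro form-size [form]
  (count form))

(form-size (+ 1 2))
;;=> 3
```

A function called with (+ 1 2) would only ever see 3; the macro sees the three-element list.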

Let’s start with a complete nonsense, yet illustrative macro:

(defmacro m [x] `(list ~x))
(m 42)
;;=> (42)

So we defined a macro m that takes one argument, named x. And then what? Well, a macro is like a function: it returns the result of its body. Here the body builds a list consisting of a call to list and the argument x. So (m 42) expands to (list 42), and it could of course be expressed as a function (I mentioned that this example is nonsensical, didn’t I?). The call site (m 42) is essentially replaced by (list 42), and with that the macro is done. After that, (list 42) evaluates to (42). Here’s your result.

Now, let’s try to write a macro that makes more sense. How about writing our own simple or operator?

(defn trivial-or [a b]
  (if a a b))

(trivial-or false true)
;;=> true
(trivial-or false false)
;;=> false

This looks correct enough, but what happens when we have side effects? The way or is usually implemented, if the first argument is truthy, the second is never evaluated. How about checking that?

(defn my-test []
  (println "Hello")
  false)

(trivial-or true (my-test))
;; prints: Hello
;;=> true

Well, this didn’t work: it called my-test even though it didn’t need to. Function arguments are always evaluated before the call, and we don’t want that, so what we need is a macro. Let’s first think about what kind of code we plan to generate:

(let [evaluated-first-arg first-arg]
  (if evaluated-first-arg evaluated-first-arg second-arg))

Why the let? Because we want to evaluate first-arg only once. If we used first-arg more than once, an expression like (my-test) would be called multiple times. So we just cache the result after the first evaluation. Then we generate an if and insert these values.
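The double evaluation is easy to observe. The following is a sketch only: naive-or, calls and counted-true are hypothetical names, and naive-or deliberately leaves out the let to show what goes wrong.

```clojure
;; naive-or is a hypothetical macro that skips the let on purpose.
(defmacro naive-or [first-arg second-arg]
  `(if ~first-arg ~first-arg ~second-arg))

;; Counts how often counted-true has been called.
(def calls (atom 0))

(defn counted-true []
  (swap! calls inc)
  true)

;; Expands to (if (counted-true) (counted-true) false),
;; so the first argument is evaluated twice.
(naive-or (counted-true) false)
;;=> true
@calls
;;=> 2
```

With the let in place, the counter would stop at 1.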

Step by step, if we do

(trivial-or true (my-test))
; should turn into
(let [evaluated-first-arg true]
  (if evaluated-first-arg evaluated-first-arg (my-test)))

Let’s write the macro for this:

(defmacro my-or [first-arg second-arg]
  ;; ~ inserts the argument forms; # generates a unique binding name
  `(let [evaluated-first-arg# ~first-arg]
    (if evaluated-first-arg# evaluated-first-arg# ~second-arg)))

(my-or true (my-test))
;;=> true
(macroexpand-1 '(my-or true (my-test)))
;;=> (clojure.core/let [evaluated-first-arg__413__auto__ true]
  (if evaluated-first-arg__413__auto__ evaluated-first-arg__413__auto__ (my-test)))

So, it seems to be working: nothing was printed, so (my-test) was never called. If we call macroexpand-1 (which expands only the outermost macro call, here our my-or) we see the code that was generated and then evaluated.

Now you might be wondering about a few things, most notably the backticks, the tilde and the hash sign, spread through the code, all random-like. Also, the super weird underscores in the variable names.

As you saw, macros basically replace the macro call with something that was generated by the macro. The data generated by the macro is a list. Now, a macro that always returns the same data is boring, so we’d like to insert data from the arguments. We could build the list by hand, but we can also use a template mechanism that is built into Clojure. The backtick is called “syntax-quote” (other Lisps like Scheme call this “quasiquote”), which is like a normal quote but allows you to insert values:

(let [a 42] `(+ 1 ~a))
;;=> (clojure.core/+ 1 42)
(eval (let [a 42] `(+ 1 ~a)))
;;=> 43

What we saw is that the + operator was fully qualified and the a was replaced by its value because of the ~, which is called “unquote”.
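For comparison, here is a small sketch that puts plain quote next to syntax-quote with unquote, using the same local a as above.

```clojure
;; Plain quote: nothing is resolved or substituted, a stays a symbol.
(let [a 42] '(+ 1 a))
;;=> (+ 1 a)

;; Syntax-quote with unquote: + is namespace-qualified
;; and ~a is replaced by the value of a.
(let [a 42] `(+ 1 ~a))
;;=> (clojure.core/+ 1 42)
```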

As for the rest of the questions, this is where it gets interesting.

Clojure and macro hygiene

Go back to our stupid macro from the beginning:

(defmacro m [x] `(list ~x))
(m 42)
;;=> (42)

We defined a macro that uses list. But as macros are expansions, what will happen when we use the macro within a region of code that rebinds list, for whatever reason?

(let [list '(1 2 3)]
  (m 42))

; would it expand to this?
(let [list '(1 2 3)]
  (list 42))
;;=> ClassCastException clojure.lang.PersistentList cannot be cast to clojure.lang.IFn

Now that would suck, right? This is the problem that Lisp macros usually have: they capture the scope of the point where they were called (think dynamic scoping), not of the point where they were defined (think lexical scoping). Various Lisps have solved this problem in different ways; you can find a lot about this on the internet. Clojure has a somewhat unique solution: syntax-quote not only quotes things, it also qualifies symbols with their namespace:

`(list 1 2 3)
;;=> (clojure.core/list 1 2 3)

So, the call above expands to this:

(let [list '(1 2 3)]
  (m 42))

(let [list '(1 2 3)]
  (clojure.core/list 42))
;;=> (42)

And this works. So, every time you have a symbol in your syntax-quote, it gets its namespace added. This also happens for vars you define yourself, so

(def m2-result 42)
(defmacro m2 [] `(list m2-result))

(macroexpand-1 '(m2))
;;=> (clojure.core/list user/m2-result)

Notice how the symbol was resolved to the var user/m2-result.
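The claim about shadowing can be checked directly. This is a small sketch; wrap is a hypothetical macro with the same body as m, redefined here only so the snippet stands on its own.

```clojure
;; wrap is a hypothetical stand-in for the macro m from above.
(defmacro wrap [x] `(list ~x))

;; The local named list shadows clojure.core/list inside the let,
;; but the expansion refers to clojure.core/list, so the call still works.
(let [list '(1 2 3)]
  (wrap 42))
;;=> (42)

(macroexpand-1 '(wrap 42))
;;=> (clojure.core/list 42)
```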

Ok, now on to that # sign. What’s the deal with it? Again, consider that a macro expands in the place where it is called, thus it inherits all bindings of its parent forms.

(let [evaluated-first-arg 42]
  (my-or false evaluated-first-arg))
;; would expect my-or to return 42, since first-arg is false

;; now if it expanded without the '#'
(let [evaluated-first-arg 42]
  (let [evaluated-first-arg false]
    (if evaluated-first-arg evaluated-first-arg evaluated-first-arg)))
;;=> false
;; oh, not at all what we wanted.

The result would be wrong, since the inner binding of evaluated-first-arg shadows the outer evaluated-first-arg, so from 42 we go to false. This is called accidental variable capture. Again, other Lisps encountered this exact problem a long time ago, and there are different solutions. One easy way might be to use a binding name in let that is difficult to guess, something like pnsndltn or just 40 fs. But first, these are awful variable names, and second, they are still not guaranteed to be unique. For this, Lisps have facilities to generate unique variable names, usually called gensym, which is exactly what the # does. Thus, if you recall the earlier macro expansion:

(macroexpand-1 '(my-or true (my-test)))
;;=> (clojure.core/let [evaluated-first-arg__413__auto__ true]
  (if evaluated-first-arg__413__auto__ evaluated-first-arg__413__auto__ (my-test)))

We see that it generated a unique name that is guaranteed not to clash. Win.
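The # suffix is shorthand; the same effect can be had by calling gensym yourself. The sketch below assumes a hypothetical variant named my-or-explicit and is not meant as the canonical way to write it.

```clojure
;; my-or-explicit is a hypothetical variant of my-or that calls gensym directly.
(defmacro my-or-explicit [first-arg second-arg]
  ;; tmp holds a fresh, unique symbol created at expansion time
  (let [tmp (gensym "evaluated-first-arg")]
    `(let [~tmp ~first-arg]
       (if ~tmp ~tmp ~second-arg))))

;; The caller's binding is no longer shadowed by the macro's internal one.
(let [evaluated-first-arg 42]
  (my-or-explicit false evaluated-first-arg))
;;=> 42
```

Calling gensym explicitly is mostly useful when the same generated name has to be shared across several syntax-quoted templates; within a single template the # form is shorter.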

Conclusion

And this concludes our short excursion into Clojure macros. In short:

  • Macros in Clojure just expand to some list structure that is evaluated
  • Clojure macros are hygienic (IMHO)
  • They do so by qualifying the symbols in a syntax-quote with their namespaces
  • New bindings can be generated via a gensym-like mechanism