A short intro into macros
The last couple of days I was playing with macros in Clojure and realized that nobody seems to have written a proper introduction to how they work. Well, I’m not gonna do it properly either, but maybe it’ll help you anyway.
If you’ve never used macros, that’s completely fine: even after years of using them, I haven’t seen that many situations where I’d want one. Rule of thumb: if you can do it a different way (say, by passing a function as an argument), do it that way.
Macros can be used where functions fail. Think of them as functions that don’t evaluate their arguments but instead receive them as data. In the case of Clojure, that data is typically a list.
Let’s start with a completely nonsensical, yet illustrative macro:
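A minimal sketch of such a macro, assuming it builds its result with plain list and a quoted symbol rather than syntax-quote (the hygiene discussion later in this post suggests as much):

```clojure
;; The macro receives x as unevaluated data and returns the form (list x).
(defmacro m [x]
  (list 'list x))

(macroexpand-1 '(m 42)) ; => (list 42)
(m 42)                  ; => (42)
```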
So we defined a macro m which takes an argument named x, and then what? Well, a macro is like a function: it returns what its body evaluates to. That value is a list which consists of a call to list and its argument x. So this macro expands to (list x) and could of course be expressed as a function (I mentioned that this example is nonsensical, didn’t I?). The call (m 42) is therefore essentially replaced by (list 42). What happens then? The macro is done. After that, (list 42) evaluates to (42). Here’s your result.
Now, let’s try to write a macro that makes more sense. How about writing our own simple or operator?
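A first attempt as an ordinary function might look like the sketch below. The parameter names first-arg and second-arg are taken from the discussion that follows; the example calls are assumptions.

```clojure
;; Naive or: return the first argument if it is truthy, otherwise the second.
(defn my-or [first-arg second-arg]
  (if first-arg
    first-arg
    second-arg))

(my-or true false) ; => true
(my-or nil 42)     ; => 42
```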
This looks correct enough, but what happens when we have side effects? The way or is usually implemented, if the first argument is truthy, the second is never evaluated. How about checking that?
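One way to run that check, assuming my-test is a helper that announces itself with println before returning a value (its body here is an illustration):

```clojure
(defn my-or [first-arg second-arg]
  (if first-arg first-arg second-arg))

;; Side-effecting helper: prints so we can see whether it was called.
(defn my-test []
  (println "my-test called")
  true)

(my-or true (my-test))
;; prints: my-test called
;; => true
```

Because my-or is a function here, both arguments are evaluated before its body runs, so the message is printed even though the first argument is already truthy.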
Well, this didn’t work: it called my-test without needing to. We don’t want to evaluate our arguments up front, so what we need is a macro. Let’s first think about what kind of code we plan to generate:
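Roughly this shape, shown as quoted data so it can be read without being evaluated. The name target-shape is made up for this sketch, and first-arg and second-arg are placeholders for whatever forms the macro is called with.

```clojure
;; The code we want the macro to produce, as plain (quoted) data.
(def target-shape
  '(let [evaluated-first-arg first-arg]
     (if evaluated-first-arg
       evaluated-first-arg
       second-arg)))
```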
Why the let? Because we only want to evaluate first-arg once; if we used first-arg more than once, (my-test) would be called multiple times. So we just cache the result after the first evaluation. Then we generate an if and insert these values.
Step by step, if we do
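Concretely, for a call such as (my-or true (my-test)), the code we want to end up with would look like the hand-written sketch below (my-test is the assumed printing helper):

```clojure
(defn my-test []
  (println "my-test called")
  true)

;; What (my-or true (my-test)) should turn into:
(let [evaluated-first-arg true]
  (if evaluated-first-arg
    evaluated-first-arg
    (my-test)))
;; => true, and nothing is printed
```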
Let’s write the macro for this:
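A sketch of such a macro, using syntax-quote, unquote and an auto-gensym name, which are the pieces discussed below. The helper my-test and the example calls are assumptions.

```clojure
(defn my-test []
  (println "my-test called")
  true)

(defmacro my-or [first-arg second-arg]
  ;; evaluated-first-arg# becomes a unique symbol inside this syntax-quote
  `(let [evaluated-first-arg# ~first-arg]
     (if evaluated-first-arg#
       evaluated-first-arg#
       ~second-arg)))

(my-or true (my-test))   ; => true, nothing printed
(my-or false (my-test))  ; prints "my-test called", => true

(macroexpand-1 '(my-or true (my-test)))
;; => (clojure.core/let [evaluated-first-arg__NNN__auto__ true]
;;      (if evaluated-first-arg__NNN__auto__
;;        evaluated-first-arg__NNN__auto__
;;        (my-test)))
;; NNN stands for a generated number that differs from run to run.
```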
So, it seems to be working. If we call macroexpand-1 (which expands only the outermost macro, here our my-or macro), we see the code that was generated and then evaluated.
Now you might be wondering about a few things, most notably the backticks, the tilde and the hash sign, spread through the code, all random-like. Also, the super weird underscores in the variable names.
As you saw, macros basically replace the macro call with something that was generated by the macro. The data generated by the macro is a list. Now, a macro that always returns the same data is boring, so we’d like to insert data from the arguments. We could build the data by hand, but we can also use a template mechanism that is built into Clojure. The backtick is called “syntax-quote” (other Lisps like Scheme call this “quasiquote”), which is like a normal quote but allows you to insert values:
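A small sketch at the REPL, assuming we are in the default user namespace and have a var a bound to 42:

```clojure
(def a 42)

'(+ a 1)   ; => (+ a 1)                     plain quote: nothing is touched
`(+ a 1)   ; => (clojure.core/+ user/a 1)   syntax-quote: symbols get qualified
`(+ ~a 1)  ; => (clojure.core/+ 42 1)       unquote: a is replaced by its value
```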
What we saw is that the + operator was fully qualified and the a was replaced by its value, because of the ~, which is called “unquote”.
As for the rest of the questions, this is where it gets interesting.
Clojure and macro hygiene
Go back to our stupid macro m from the beginning. We defined a macro that uses list. But as macros are expansions, what will happen when we use a macro within a code region that is redefining list, for whatever reason?
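A sketch of the situation, assuming m is written with a plain quoted symbol and that the surrounding code rebinds list locally (the replacement function is made up for illustration):

```clojure
(defmacro m [x]
  (list 'list x))   ; emits the unqualified symbol list

(let [list (fn [& _] :not-a-list)]  ; local binding shadows clojure.core/list
  (m 42))
;; => :not-a-list, not (42)
```

The expansion (list 42) is resolved at the call site, so it picks up the local list instead of the one the macro author had in mind.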
Now that would suck, right? This is the problem that Lisp macros usually have: they capture the scope of the point where they were called (think dynamic scoping), not where they were defined (think lexical scoping). Various Lisps have solved this problem in different ways; you can find a lot about this on the internet. Clojure has a somewhat unique solution: syntax-quote not only quotes things, it also namespace-qualifies the symbols inside it:
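For instance, here is the same macro m rewritten with syntax-quote (a sketch under that assumption):

```clojure
`(list 42) ; => (clojure.core/list 42)

(defmacro m [x]
  `(list ~x))

(macroexpand-1 '(m 42)) ; => (clojure.core/list 42)
```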
So, the call above expands to this:
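Written out by hand, and assuming the same made-up local rebinding of list as before, the expanded code would be:

```clojure
(let [list (fn [& _] :not-a-list)]
  (clojure.core/list 42))  ; the qualified symbol ignores the local binding
;; => (42)
```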
And this works. So, every time you have a symbol in your syntax-quote, it gets its namespace added. This also happens for the names of local bindings:
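A sketch of what such a macro might look like, assuming m2 simply binds its argument to m2-result inside a syntax-quoted let and that we are in the user namespace:

```clojure
(defmacro m2 [x]
  `(let [m2-result ~x]
     m2-result))

(macroexpand-1 '(m2 42))
;; => (clojure.core/let [user/m2-result 42] user/m2-result)
```

Only the expansion is shown. Actually evaluating (m2 42) fails at compile time, because let refuses to bind a namespace-qualified name.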
Notice how m2-result was qualified and shows up as user/m2-result.
OK, now on to that # sign, what’s the deal with it? Again, consider that a macro expands in the place where it is called, so it inherits all bindings of its parent forms.
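To see the problem, here is a sketch of an or macro that builds its let by hand with a fixed binding name. The name my-or-unhygienic is made up for this illustration.

```clojure
(defmacro my-or-unhygienic [first-arg second-arg]
  ;; Always uses the literal symbol evaluated-first-arg for its binding.
  (list 'let ['evaluated-first-arg first-arg]
        (list 'if 'evaluated-first-arg 'evaluated-first-arg second-arg)))

(let [evaluated-first-arg 42]
  (my-or-unhygienic false evaluated-first-arg))
;; => false, although we would expect 42
```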
The result would be wrong, since the inner use of evaluated-first-arg shadows the outer evaluated-first-arg, so from 42 we go to false. This is called accidental variable capture. Again, other Lisps encountered this exact problem a long time ago, and there are different solutions. One easy way might be to use a binding name in let that is difficult to guess, something like pnsndltn or just 40 fs.
But these are, first, awful variable names and, second, still not guaranteed to be unique. For this, Lisps have facilities to generate unique variable names, often called gensym, which is exactly what the # does. Thus, if you recall the earlier macro expansion:
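The same mechanism can be seen in isolation. In this sketch, expansion and v# are illustrative names, and the generated numbers will differ from run to run:

```clojure
;; Explicit gensym: returns a fresh symbol on every call.
(gensym "evaluated-first-arg") ; => e.g. evaluated-first-arg1234

;; Auto-gensym: v# is replaced by one generated symbol per syntax-quote.
(def expansion `(let [v# 1] v#))
expansion ; => (clojure.core/let [v__NNN__auto__ 1] v__NNN__auto__)

(= (first (second expansion)) (last expansion)) ; => true, same name within one template
(not= `v# `v#)                                  ; => true, separate templates get separate names
```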
We see that it generated a unique name that is guaranteed not to clash. Win.
Conclusion
And this concludes our short excursion into Clojure macros. In short:
- Macros in Clojure just expand to some list structure that is evaluated
- Clojure macros are hygienic (IMHO)
- They do so by prefixing symbols in syntax-quote with their namespaces
- New bindings can be generated via a gensym-like mechanism