Emacs Lisp Profiling
Emacs lisp is slow. Like, really slow. Python can sum 1,000,000 numbers in less than a tenth of a second, while Emacs Lisp takes an order of magnitude longer.
="reduce(add, [5] * 100)",
timeit(stmt="from operator import add ; from functools import reduce",
setup=10000) # 0.079246 number
10000 (cl-reduce '+ (make-list 100 5))) ; 0.710435 (timeit
This means it is of utmost importance to write performant elisp code, as poorly performing elisp code is 10x more noticeable than poorly performing python code. Furthermore, elisp code is typically user-facing. Somebody is actively waiting for your code to finish running, and they probably can't do anything with their active buffer until it finishes, thanks to emacs' synchronous nature.
Below, I will profile some elisp code, disassemble some of emacs' C source, and hopefully clear up some common myths about writing performant elisp code.
Methodology
As with any performance benchmarks, absolute numbers are significantly less important than the relation between two benchmarks. However, if you are interested in how long emacs will take to do certain operations on your box, here are the specifications of the machine I'm performing these tests on:
Part | SKU |
---|---|
Processor | i3-5005U |
Memory | 4GB LPDDR3-1333 |
Emacs | 25.1 (Arch; -O2) |
Kernel | 4.9.6 |
We will be using this macro to test our code; It's a modified version of a macro from the emacs mailing list ca. 2006.
defmacro timeit (run-times &rest body)
("Measure the time it takes to evaluate BODY."
let ((time (current-time))
`(0))
(i < i ,run-times)
(while (
,@bodysetq i (1+ i)))
("%.06f" (float-time (time-since time))))) (message
Now that our specs and testing methodology have been covered, let's get to it. First, let's put the debate about "right" way to write a map in elisp to rest.
Profiling Maps
There are plenty of ways to write a generic map in emacs lisp - the most popular ways being the association list, or alist, property list, or plist, and hash tables. Some people swear by alists, while others stand by hash tables because of their C implementation and significantly better big-O runtimes. Others, like me, think plists look cleaner than both types, especially when the data is structurally complex.
setq hash #s(hash-table size 65 data (:top kek :kek bur :foo bar :baz quux :xyzzy ham)))
(setq plist '(:top kek :kek bur :foo bar :baz quux :xyzzy ham))
(setq alist '((:top . kek)
(
(:kek . bur)
(:foo . bar)
(:baz . quux)
(:xyzzy . ham)))
1000000 (plist-get plist :kek)) ; 0.254863
(timeit 1000000 (gethash :kek hash)) ; 0.287944
(timeit 1000000 (alist-get :kek alist)) ; 0.335358 (timeit
For those of you that don't know, #s
is a reader
macro which allows significantly easier definition of hash
tables, giving definitions a plist-like syntax. All hash tables
have a size of 65 by default, and a size of 200000 did not
produce different results.
It looks like plists win for a small n, while alists
surprisingly perform worse than hash tables. It appears
alist-get
has some overhead which could be worked
out, because the function would logically be very similar to
plist-get
.
Hash tables, surprisingly, have very light overhead compared to plists, so using them for small datasets is also a viable option. How about huge datasets? Let's populate a list with 300 elements, and then the element we're looking for, using dash:
setq plist (-snoc (-interleave (make-list 300 :foo) (make-list 300 'bar)) :kek 'bur))
(setq alist (-snoc (-zip (make-list 300 :foo) (make-list 300 'bar)) '(:kek . bur)))
(defun alist-to-ht (list)
(let ((ht (make-hash-table :size 400)))
(dolist (item list)
(car item) (cdr item) ht))
(puthash (
ht))setq hash (alist-to-ht alist))
(
1000000 (plist-get plist :kek)) ; 1.480839
(timeit 1000000 (gethash :kek hash)) ; 0.285781
(timeit 1000000 (alist-get :kek alist)) ; 1.290769 (timeit
As anybody who has taken a first-year CS course would guess,
the hash table wins when n becomes large. It appears
alist-get
performs slightly better for large n as
well. I wonder why? Let's look at plist-get, first.
Disassembling
plist-get
Checking out the emacs git repository, registering the source
directory with emacs, and pressing the location link after
typing C-h f plist-get
will take us right where we
need to go: the definition of plist-get
in emacs' C
source
("plist-get", Fplist_get, Splist_get, 2, 2, 0,
DEFUN : /* long docstring */ )
doc(Lisp_Object plist, Lisp_Object prop)
{
(plist)
FOR_EACH_TAIL_SAFE {
if (! CONSP (XCDR (li.tail)))
break;
if (EQ (prop, XCAR (li.tail)))
return XCAR (XCDR (li.tail));
.tail = XCDR (li.tail);
liif (EQ (li.tail, li.tortoise))
break;
}
return Qnil;
}
Oh, so that's why they call it editor macros. It
looks like FOR_EACH_TAIL_SAFE
is doing a lot of
either unnecessary or expensive operations in its incrementor.
alist-get appears to suffer from some overhead from being
written in elisp, but perhaps iterating through each list on the
elisp side of things is less heavy.
Thankfully, I don't have to read through that heap of macros
myself; gcc can do it for me. gcc -E
to the
rescue!
(8) Splist_get = { { PVEC_SUBR << PSEUDOVECTOR_AREA_BITS }, { .a2 = Fplist_get }, 2, 2, "plist-get", 0, 0}; Lisp_Object Fplist_get
(Lisp_Object plist, Lisp_Object prop)
{ // Hope you like horizontal scrolling
for (struct for_each_tail_internal li = { plist, plist, 2, 0, 2 };
(((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L)) ? - (1 << 3) : (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons) || ((void) 0, 0);
(li.tail = (((void) (0 && ((((enum Lisp_Type)
((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L))
? - (1 << 3) : (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))),
(struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) ((li.tail) - (Lisp_Cons)), 8))->u.cdr,
((--li.q != 0 || (( 0 ) ? maybe_quit () : (void) 0, 0 < --li.n)
|| (li.q = li.n = li.max <<= 1, li.n >>= (((0) < 0) + ((((((!!((((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) >> 4) & 8)
+ !!((((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) >> 4) & 4)
+ !!((((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) >> 4) & 2)
+ !!((((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) >> 4) & 1))
+ (!!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) & 8)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) & 4)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) & 2)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 8) & 1)))
+ ((!!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 4) & 8)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 4) & 4)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 4) & 2)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) >> 4) & 1))
+ (!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 16) & 1))))
+ (((!!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) >> 4) & 8)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) >> 4) & 4)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) >> 4) & 2)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) >> 4) & 1))
+ (!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 8) & 1)))
+ ((!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) >> 4) & 1))
+ (!!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 31 >> 1) & 1)))))
+ ((((!!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) >> 4) & 8)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) >> 4) & 4)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) >> 4) & 2)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) >> 4) & 1))
+ (!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 8) & 1)))
+ ((!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) >> 4) & 1)) + (!!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 16) & 1)))) + (((!!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) >> 4) & 1))
+ (!!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 8) & 1)))
+ ((!!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 4) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 4) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 4) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) >> 4) & 1))
+ (!!((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) & 8)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) & 4)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) & 2)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 31 >> 2) & 1))))))
+ (((((!!(((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) >> 4) & 8)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) >> 4) & 4)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) >> 4) & 2)
+ !!(((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) >> 4) & 1)) + (!!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 8) & 1)))
+ ((!!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) >> 4) & 1)) + (!!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 16) & 1))))
+ (((!!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) >> 4) & 1)) + (!!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 8) & 1)))
+ ((!!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 4) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 4) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 4) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 31 >> 1) >> 4) & 1)) + (!!((((0x7fff * 2 + 1)) >> 31 >> 1) & 8)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 1) & 4)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 1) & 2)
+ !!((((0x7fff * 2 + 1)) >> 31 >> 1) & 1)))))
+ ((((!!((((((0x7fff * 2 + 1)) >> 16) >> 8) >> 4) & 8)
+ !!((((((0x7fff * 2 + 1)) >> 16) >> 8) >> 4) & 4)
+ !!((((((0x7fff * 2 + 1)) >> 16) >> 8) >> 4) & 2)
+ !!((((((0x7fff * 2 + 1)) >> 16) >> 8) >> 4) & 1)) + (!!(((((0x7fff * 2 + 1)) >> 16) >> 8) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 8) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 8) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 8) & 1)))
+ ((!!(((((0x7fff * 2 + 1)) >> 16) >> 4) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 4) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 4) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 16) >> 4) & 1))
+ (!!((((0x7fff * 2 + 1)) >> 16) & 8)
+ !!((((0x7fff * 2 + 1)) >> 16) & 4)
+ !!((((0x7fff * 2 + 1)) >> 16) & 2)
+ !!((((0x7fff * 2 + 1)) >> 16) & 1))))
+ (((!!(((((0x7fff * 2 + 1)) >> 8) >> 4) & 8)
+ !!(((((0x7fff * 2 + 1)) >> 8) >> 4) & 4)
+ !!(((((0x7fff * 2 + 1)) >> 8) >> 4) & 2)
+ !!(((((0x7fff * 2 + 1)) >> 8) >> 4) & 1)) + (!!((((0x7fff * 2 + 1)) >> 8) & 8)
+ !!((((0x7fff * 2 + 1)) >> 8) & 4)
+ !!((((0x7fff * 2 + 1)) >> 8) & 2)
+ !!((((0x7fff * 2 + 1)) >> 8) & 1)))
+ ((!!((((0x7fff * 2 + 1)) >> 4) & 8)
+ !!((((0x7fff * 2 + 1)) >> 4) & 4)
+ !!((((0x7fff * 2 + 1)) >> 4) & 2)
+ !!((((0x7fff * 2 + 1)) >> 4) & 1)) + (!!(((0x7fff * 2 + 1)) & 8)
+ !!(((0x7fff * 2 + 1)) & 4)
+ !!(((0x7fff * 2 + 1)) & 2)
+ !!(((0x7fff * 2 + 1)) & 1)))))))),
.tortoise = li.tail, 0 )) && ((li.tail) == (li.tortoise))) ? ((void) (li.tail = builtin_lisp_symbol (0))) : (void) 0)) {
liif (!(((enum Lisp_Type) (((((void) ( 0 && ((((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L) ) ? - (1 << 3)
: (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))), (struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) ((li.tail) - (Lisp_Cons)), 8))->u.cdr)
& ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L)) ? - (1 << 3) : (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons))
break;
if (((prop) == ((((void) ( 0 && ((((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L) ) ? - (1 << 3)
: (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))), (struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) ((li.tail) - (Lisp_Cons)), 8))->car)))
return (((void) ( 0 && ((((enum Lisp_Type) (((((void) ( 0 && ((((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L) ) ? - (1 << 3)
: (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))), (struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) ((li.tail) - (Lisp_Cons)), 8))->u.cdr)
& ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L) ) ? - (1 << 3) : (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))),
(struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) (((((void) ( 0 && ((((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2
< (9223372036854775807L)) ? - (1 << 3) : (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))), (struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t)
((li.tail) - (Lisp_Cons)), 8))->u.cdr) - (Lisp_Cons)), 8))->car;
.tail = (((void) ( 0 && ((((enum Lisp_Type) ((li.tail) & ~(((0x7fffffffffffffffL >> (3 - 1)) / 2 < (9223372036854775807L)) ? - (1 << 3)
li: (0x7fffffffffffffffL >> (3 - 1))))) == Lisp_Cons)))), (struct Lisp_Cons *) __builtin_assume_aligned ((void *) (intptr_t) ((li.tail) - (Lisp_Cons)), 8))->u.cdr;
if (((li.tail) == (li.tortoise)))
break;
}
return builtin_lisp_symbol (0);
}
Oh dear, that's very unhelpful, and there's no way that's
getting formatted correctly on mobile. We need to go deeper.
Disassembling the file reveals that plist_get
takes
around 80 instructions per loop, and has 11 possible jumps per
iteration. Five of those jumps are in the for loop's
incrementation step. I'm sure we don't need five jumps to move
through an iterator, but that's for another day.
5acd: 55 push rbp
5ace: 48 89 e5 mov rbp,rsp
5ad1: 48 83 ec 40 sub rsp,0x40
5ad5: 48 89 7d c8 mov QWORD PTR [rbp-0x38],rdi
5ad9: 48 89 75 c0 mov QWORD PTR [rbp-0x40],rsi
FOR_EACH_TAIL_SAFE (plist)
5add: 48 8b 45 c8 mov rax,QWORD PTR [rbp-0x38]
5ae1: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
5ae5: 48 8b 45 c8 mov rax,QWORD PTR [rbp-0x38]
5ae9: 48 89 45 d8 mov QWORD PTR [rbp-0x28],rax
5aed: 48 c7 45 e0 02 00 00 mov QWORD PTR [rbp-0x20],0x2
5af4: 00
5af5: 48 c7 45 e8 00 00 00 mov QWORD PTR [rbp-0x18],0x0
5afc: 00
5afd: 66 c7 45 f0 02 00 mov WORD PTR [rbp-0x10],0x2
5b03: e9 e7 00 00 00 jmp 5bef <Fplist_get+0x122>
{
if (! CONSP (XCDR (li.tail)))
5b08: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5b0c: 48 83 e8 03 sub rax,0x3
5b10: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8]
5b14: 83 e0 07 and eax,0x7
5b17: 83 f8 03 cmp eax,0x3
5b1a: 0f 85 e1 00 00 00 jne 5c01 <Fplist_get+0x134>
break;
if (EQ (prop, XCAR (li.tail)))
5b20: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5b24: 48 83 e8 03 sub rax,0x3
5b28: 48 8b 00 mov rax,QWORD PTR [rax]
5b2b: 48 3b 45 c0 cmp rax,QWORD PTR [rbp-0x40]
5b2f: 75 18 jne 5b49 <Fplist_get+0x7c>
return XCAR (XCDR (li.tail));
5b31: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5b35: 48 83 e8 03 sub rax,0x3
5b39: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8]
5b3d: 48 83 e8 03 sub rax,0x3
5b41: 48 8b 00 mov rax,QWORD PTR [rax]
5b44: e9 c6 00 00 00 jmp 5c0f <Fplist_get+0x142>
li.tail = XCDR (li.tail);
5b49: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5b4d: 48 83 e8 03 sub rax,0x3
5b51: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8]
5b55: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
if (EQ (li.tail, li.tortoise))
5b59: 48 8b 55 d0 mov rdx,QWORD PTR [rbp-0x30]
5b5d: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
5b61: 48 39 c2 cmp rdx,rax
5b64: 0f 84 9a 00 00 00 je 5c04 <Fplist_get+0x137>
FOR_EACH_TAIL_SAFE (plist)
5b6a: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5b6e: 48 83 e8 03 sub rax,0x3
5b72: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8]
5b76: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
5b7a: 0f b7 45 f0 movzx eax,WORD PTR [rbp-0x10]
5b7e: 83 e8 01 sub eax,0x1
5b81: 66 89 45 f0 mov WORD PTR [rbp-0x10],ax
5b85: 0f b7 45 f0 movzx eax,WORD PTR [rbp-0x10]
5b89: 66 85 c0 test ax,ax
5b8c: 75 46 jne 5bd4 <Fplist_get+0x107>
5b8e: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
5b92: 48 83 e8 01 sub rax,0x1
5b96: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
5b9a: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
5b9e: 48 85 c0 test rax,rax
5ba1: 7f 31 jg 5bd4 <Fplist_get+0x107>
5ba3: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
5ba7: 48 01 c0 add rax,rax
5baa: 48 89 45 e0 mov QWORD PTR [rbp-0x20],rax
5bae: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
5bb2: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
5bb6: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
5bba: 66 89 45 f0 mov WORD PTR [rbp-0x10],ax
5bbe: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
5bc2: 48 c1 f8 10 sar rax,0x10
5bc6: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
5bca: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5bce: 48 89 45 d8 mov QWORD PTR [rbp-0x28],rax
5bd2: eb 1b jmp 5bef <Fplist_get+0x122>
5bd4: 48 8b 55 d0 mov rdx,QWORD PTR [rbp-0x30]
5bd8: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
5bdc: 48 39 c2 cmp rdx,rax
5bdf: 75 0e jne 5bef <Fplist_get+0x122>
5be1: bf 00 00 00 00 mov edi,0x0
5be6: e8 00 00 00 00 call 5beb <Fplist_get+0x11e>
5beb: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
5bef: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
5bf3: 83 e0 07 and eax,0x7
5bf6: 83 f8 03 cmp eax,0x3
5bf9: 0f 84 09 ff ff ff je 5b08 <Fplist_get+0x3b>
5bff: eb 04 jmp 5c05 <Fplist_get+0x138>
break;
5c01: 90 nop
5c02: eb 01 jmp 5c05 <Fplist_get+0x138>
break;
5c04: 90 nop
}
return Qnil;
5c05: bf 00 00 00 00 mov edi,0x0
5c0a: e8 00 00 00 00 call 5c0f <Fplist_get+0x142>
Disassembling
alist-get
alist-get
is an O(1) elisp wrapper around
assq
, another C function. Inspecting the source and
disassembly reveals a slight efficiency gain over plists.
("assq", Fassq, Sassq, 2, 2, 0,
DEFUN : /* long docstring */
doc(Lisp_Object key, Lisp_Object list)
{
(list) {
FOR_EACH_TAIL if (CONSP (XCAR (li.tail)) && EQ (XCAR (XCAR (li.tail)), key))
return XCAR (li.tail);
}
return Qnil;
}
39d6: 55 push rbp
39d7: 48 89 e5 mov rbp,rsp
39da: 48 83 ec 40 sub rsp,0x40
39de: 48 89 7d c8 mov QWORD PTR [rbp-0x38],rdi
39e2: 48 89 75 c0 mov QWORD PTR [rbp-0x40],rsi
FOR_EACH_TAIL (list)
39e6: 48 8b 45 c0 mov rax,QWORD PTR [rbp-0x40]
39ea: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
39ee: 48 8b 45 c0 mov rax,QWORD PTR [rbp-0x40]
39f2: 48 89 45 d8 mov QWORD PTR [rbp-0x28],rax
39f6: 48 c7 45 e0 02 00 00 mov QWORD PTR [rbp-0x20],0x2
39fd: 00
39fe: 48 c7 45 e8 00 00 00 mov QWORD PTR [rbp-0x18],0x0
3a05: 00
3a06: 66 c7 45 f0 02 00 mov WORD PTR [rbp-0x10],0x2
3a0c: e9 c3 00 00 00 jmp 3ad4 <Fassq+0xfe>
if (CONSP (XCAR (li.tail)) && EQ (XCAR (XCAR (li.tail)), key))
3a11: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3a15: 48 83 e8 03 sub rax,0x3
3a19: 48 8b 00 mov rax,QWORD PTR [rax]
3a1c: 83 e0 07 and eax,0x7
3a1f: 83 f8 03 cmp eax,0x3
3a22: 75 28 jne 3a4c <Fassq+0x76>
3a24: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3a28: 48 83 e8 03 sub rax,0x3
3a2c: 48 8b 00 mov rax,QWORD PTR [rax]
3a2f: 48 83 e8 03 sub rax,0x3
3a33: 48 8b 00 mov rax,QWORD PTR [rax]
3a36: 48 3b 45 c8 cmp rax,QWORD PTR [rbp-0x38]
3a3a: 75 10 jne 3a4c <Fassq+0x76>
return XCAR (li.tail);
3a3c: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3a40: 48 83 e8 03 sub rax,0x3
3a44: 48 8b 00 mov rax,QWORD PTR [rax]
3a47: e9 b5 00 00 00 jmp 3b01 <Fassq+0x12b>
FOR_EACH_TAIL (list)
3a4c: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3a50: 48 83 e8 03 sub rax,0x3
3a54: 48 8b 40 08 mov rax,QWORD PTR [rax+0x8]
3a58: 48 89 45 d0 mov QWORD PTR [rbp-0x30],rax
3a5c: 0f b7 45 f0 movzx eax,WORD PTR [rbp-0x10]
3a60: 83 e8 01 sub eax,0x1
3a63: 66 89 45 f0 mov WORD PTR [rbp-0x10],ax
3a67: 0f b7 45 f0 movzx eax,WORD PTR [rbp-0x10]
3a6b: 66 85 c0 test ax,ax
3a6e: 75 4b jne 3abb <Fassq+0xe5>
3a70: e8 00 00 00 00 call 3a75 <Fassq+0x9f>
3a75: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
3a79: 48 83 e8 01 sub rax,0x1
3a7d: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
3a81: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
3a85: 48 85 c0 test rax,rax
3a88: 7f 31 jg 3abb <Fassq+0xe5>
3a8a: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
3a8e: 48 01 c0 add rax,rax
3a91: 48 89 45 e0 mov QWORD PTR [rbp-0x20],rax
3a95: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
3a99: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
3a9d: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
3aa1: 66 89 45 f0 mov WORD PTR [rbp-0x10],ax
3aa5: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
3aa9: 48 c1 f8 10 sar rax,0x10
3aad: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
3ab1: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3ab5: 48 89 45 d8 mov QWORD PTR [rbp-0x28],rax
3ab9: eb 19 jmp 3ad4 <Fassq+0xfe>
3abb: 48 8b 55 d0 mov rdx,QWORD PTR [rbp-0x30]
3abf: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
3ac3: 48 39 c2 cmp rdx,rax
3ac6: 75 0c jne 3ad4 <Fassq+0xfe>
3ac8: 48 8b 45 c0 mov rax,QWORD PTR [rbp-0x40]
3acc: 48 89 c7 mov rdi,rax
3acf: e8 00 00 00 00 call 3ad4 <Fassq+0xfe>
3ad4: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3ad8: 83 e0 07 and eax,0x7
3adb: 83 f8 03 cmp eax,0x3
3ade: 0f 84 2d ff ff ff je 3a11 <Fassq+0x3b>
3ae4: 48 8b 45 d0 mov rax,QWORD PTR [rbp-0x30]
3ae8: 48 8b 55 c0 mov rdx,QWORD PTR [rbp-0x40]
3aec: 48 89 d6 mov rsi,rdx
3aef: 48 89 c7 mov rdi,rax
3af2: e8 00 00 00 00 call 3af7 <Fassq+0x121>
return Qnil;
3af7: bf 00 00 00 00 mov edi,0x0
3afc: e8 00 00 00 00 call 3b01 <Fassq+0x12b>
}
3b01: c9 leave
3b02: c3 ret
For those keeping count, that's 70 instructions per loop, and 9 jumps. That'd be the source of our constant-factor speedup. The slowness for small values of n are probably because of the elisp layer, which runs only once per call.
Of course, big n is better served with hash tables, as the prettier literal syntax of plists and alists becomes unwieldy after n grows larger than 5.
I have plenty of other things to cover about emacs' performance, like the myth of recursive vs. iterative code, lists and vectors, and the common lisp libraries. I'll cover those in a future blog.