Understanding Clojure's Persistent Vectors, Pt. 3

ScottBurson · on Oct 24, 2014

At the beginning of part 1, the author says that this data structure provides "practically O(1) runtime" (emphasis added) for the important operations of lookups, insertion, updating, etc. At the end it is explained how it's "O(log32 n)", which of course is actually O(log n).

And then here in part 3 where he keeps saying "effectively constant time* (emphasis in original this time), what he means is O(log n) time.

The Clojure people really need to stop doing this. Big-O notation is intended to describe how the time required for an operation grows as the number of items being handled gets large. It is precisely not intended to communicate the absolute time the operation takes in small cases. They are confusing people when they try to communicate both pieces of information at the same time. What they should say is something like "lookup takes O(log n) time, and for seqs of size up to N items, it takes M times as long as ArrayList.get".

And then, of course, to describe O(log n) time as "effectively constant" is just lying.

(Second and fourth paragraphs added in edit.)

TheLoneWolfling · on Oct 24, 2014

I have to disagree with you here.

Let me put it this way: There are theoretical limits to the upper bound of problem size. The number of bits storable in the (observable) universe is only, what, about 10^120? [1]

log32(10^120) is only around 80. So if you have a constant-time algorithm that runs around 80x slower than a logrithmic algorithm for small cases, the constant-time algorithm will never be faster. Period. And I've seem constant-time algorithms that take >80x the speed of theoretically asymptotically slower algorithms. I mean: a single cache miss can cause that level of slowdown.

Or, to put it another way, log32(n) <= 80 for all achievable n. So, realistically, you might as well consider it effectively O(80), i.e. O(1). You have something similar, albeit more extreme, for inverse Ackermann time.

[1] http://physics.aps.org/story/v9/st27

malisper · on Oct 24, 2014

> There are theoretical limits to the upper bound of problem size.

Big O is defined as what happens as the size of the input approaches infinity. You cannot take a O(log n) algorithm and say it is effectively O(1), that is not what big O means. If anything, you are arguing for why big O notation is a bad measure for the efficiency of an algorithm, not why you can call a O(log n) algorithm O(1).

wtetzner · on Oct 24, 2014

From a quick look at the Clojure code, it looks like elements are indexed by ints.

log_32(2 ^ 31) = 6.2.

millstone · on Oct 24, 2014

With a mutable vector, the fast path for appending an object is to increment the count and store a value at the end. That's two stores.

With a persistent vector, the fast path for appending an object is to allocate memory, copy the old structure into the new one (incurring several write barriers), increment the count, store the new object at the end, and then (eventually) deallocate the old structure.

Is it that hard to imagine that the persistent vector fast path will be 80x slower than the mutable vector fast path? As you say, a single cache miss can be responsible for that slowdown, and a cache miss is likely with the freshly allocated memory.

So then, by your argument, we ought to characterize the "constant time" persistent vector operation as logarithmic, since it's 80x slower.

tolmasky · on Oct 24, 2014

Doesn't it matter in the way functions compose however? Definitely not my area of expertise, but similar to the classic N^2 iteration trap (for(i=0;i<strlen(str);++i) ...) you could see something like that happening by multiplying log operations together.

For example, if for each item in a this vector, I have 2 nested inner loops where I act as if the operation is O(1), I start feeling like it should perform O(N), in reality though, its performing O(N * log32^2 N). If N is 1,000,000 (not an unreasonable amount by any means), we're now talking 1.5 * 10^7 operations vs 1 * 10^6 that I expect. Perhaps arguably not too unreasonable a slowdown, but its certainly affecting the way my application scales in a way that wasn't immediately intuitive when I initially wrote the code. Again, not sure if this is an practical worry, but curious if these seemingly small differences can magnify as they bubble up through composition.

ajuc · on Oct 24, 2014

If for all realistically possible n your O(log n) function is faster than O(1) function, then calling your function K times will still be faster than calling that O(1) function K times.

ScottBurson · on Oct 24, 2014

That argument applies to any O(log n) algorithm. The base of the logarithm doesn't matter.

TheLoneWolfling · on Oct 24, 2014

Well, yes and no.

Yes, in that the argument applies.

No, in that the size is larger. It's more reasonable to say one algorithm is 80x faster than another of the same complexity than to say one algorithm is 530x faster than another of the same complexity.

lomnakkus · on Oct 25, 2014

Constants matter. To "overcome" O(log_a n) you need a constant of n^a for any given n. That's quite a lot in practice... although it's matematically trivial.

lomnakkus · on Oct 24, 2014

log_32(2^48) is <= 10. Your computer cannot access anywhere close to 2^48 items.

I appreciate a little pedantry as much as the next guy, but if you can come up with an equally concise way of saying it than "effectively constant" then please do so.

rawnlq · on Oct 24, 2014

Here you go: Õ (read soft-O)

See: http://en.wikipedia.org/wiki/Big_O_notation#Extensions_to_th... (basically means ignore multiplicative log factors)

Also, to play devil's advocate, why not say all log factors are effectively constant? Note that 5 * log_32(x) = log_2(x) or to use your example, log_2(2^48) <= 50.

lomnakkus · on Oct 25, 2014

Interesting. I was not aware of that notation, thank you.

Re: "all log factors": Agreed, as long as the base of the logarithm is high enough, obviously :). I'm well aware that the base doesn't matter mathematically, but in practice it actually does. Especially since 32 is the number of bits in a CPU word, and thus neatly fits into cache lines, is easily manipulable by CPU instructions, etc. etc. (See Bagwell's papers for details.)

ajuc · on Oct 24, 2014

I agree when we speak about algorithms.

When we speak about specific implementations it's useles most often, because most of these implementations use datatypes that simply don't work for numbers bigger than 2^64, collections that can hold at most 2^64 items etc.

So your program can sort collections of at most 2^64 items in n log n? Nice, I can do that in constant time.

JeanPierre · on Oct 24, 2014

I understand your concern, but I don't really think the problem is within Clojure people: When I started out explaining the structure to other people and that most operations have O(log n) runtime, people dismissed Clojure as being too slow for practical purposes. Part of this is because programmers, not Clojure devs in particular, connotate O(log n) with "much slower than O(1)". However, this data structure is incredibly much faster than the "O(log n)" people are generally used to.

And this is the problem. As you yourself say, most programmers usually associate/assume "Big-O notation is intended to describe how the time required for an operation grows as the number of items being handled gets large".

But formally, this is not the case at all. O(g(x)) tells you the that the worst case input to the algorithm will require less operations (usually operations, sometimes memory or other things) than the function g multiplied by some positive constant. g is usually determined by the size of the input. (This definition is also simplified somewhat, but it illustrates the problem)

We can say that ArrayLists.add have O(n) runtime and that quicksort have O(n²) runtime, but they both also have O(n³) and O(n!) runtime.

In addition, Big-O does not describe how the algorithm runs in the average case, or how frequent the worst case is. In and of itself, the Big-O notation tells us almost nothing about an algorithm – you barely conclude anything from it. I guess that is part of why it has become this very simplified term with a very wibbly-wobbly definition from time to time.

So when you're saying that calling it "practically O(1)" is lying, I agree if this was in an academic context. But for the average programmer (regardless of language!), O(g(x)) is usually thought of as the "worst case runtime for an algorithm". Since the actual runtime is – well – effectively constant, I considered it okay.

---

That being said, I have formally explained the persistent vector if you're interested in the academic version. The background chapter in http://hypirion.com/thesis.pdf should cover it.

ScottBurson · on Oct 24, 2014

I understand that log-time algorithms have a marketing problem. I think the correct way to address that is the one I already suggested: give benchmark numbers for lookup, relative to ArrayList, for a typical collection size (or maybe for a few sample sizes -- 32, 1024, and 32768, perhaps). These will clearly show both that lookup is fast on small seqs and that it slows down logarithmically.

I would be more sympathetic to your argument that O(log n) is "practically" O(1) if you would put it in context. It's certainly true that how much the difference matters depends on the application. For someone doing real-time work (HFT, for example) it could be critical. For the average web app, it matters extremely little; the programmer productivity gains offered by functional collections far outweigh the small cost in CPU cycles. I totally agree with that argument, and have made it myself many times, but I still wouldn't say that log time is "practically" constant time for all applications.

JeanPierre · on Oct 25, 2014

Agreed on the benchmarks – that's the next (and last) part on the list I'm going to do with this series.

And yes, of course, at some point, constant factors break into applications. Would it be better if add in a short comment on "practically O(1) for non-HPC applications", or would you phrase it differently to give it better context?

ScottBurson · on Oct 25, 2014

Ah, good -- numbers will clarify the situation considerably.

Personally my preference would be to argue that for applications that aren't terribly performance-sensitive, log time in general just isn't that bad. Even binary trees are quite usable for many purposes. (I believe the Linux kernel uses red-black trees for various things.) For a general-purpose sequence data structure that will be used in a variety of ways, a very good case can be made that it's worth accepting log-time lookup in order to get log-time insertion, subsequence, and concatenation -- operations that all take linear time on an ArrayList (and linear time will kill you far sooner than log time!).

On top of that, Hickey, following Bagwell, has done a great job at keeping the constant factors down, as (I presume) the benchmark numbers will show.

Another point I would make is that true random access into a seq is not the only way elements are retrieved. Often this is done via an iterator. I presume the Clojure seq iterator takes amortized constant time per element (it should); if so, I think it would be worth pointing that out.

All that said, I guess I could live with "practically O(1) for non-HPC applications".

Sorry I was so combative. I had not looked closely at Clojure seqs, so I do appreciate this series of posts you have written.

plinkplonk · on Oct 25, 2014

you should probably try "practically O(1) for the average web app", if you want to keep that phrasing.

Better to skip the "practically O(1)" etc bits and just say O(log n).

rawnlq · on Oct 24, 2014

Cool you seem to be the author.

Instead of telling people that it's O(log n) you should just show benchmarks instead.

Hitting L1 takes 1ns but main memory takes 100ns. So although you can say accessing an element in an array is always O(1) = O(100), it is hiding a factor of a 100 depending on how it is accessed. Having a log in the big Oh usually means it is jumping around a lot and is no longer a cache friendly algorithm, so that worries me more than whether it is log_2(x) or log_2(x)/5.

millstone · on Oct 24, 2014

> However, this data structure is incredibly much faster than the "O(log n)" people are generally used to.

Is it? I'll bet it's actually much slower than familiar O(log n) operations like binary search, in particular due to the memory allocations.

weavejester · on Oct 25, 2014

If you're comparing a binary search to a lookup on a Clojure hashmap, there should be no memory allocations necessary.

NoodleIncident · on Oct 24, 2014

When does this distinction matter? At some point even updating an O(1) vector is going to actually depend on your pager, cache, swap, bignum libraries and other non-O(1) factors, but no one calls out C developers for "lying".

mamcx · on Oct 24, 2014

How performant is this kind of structures? For what is best?

ScottBurson · on Oct 24, 2014

Okay, "just lying" is too strong. But see my reply to JeanPierre.

michaelochurch · on Oct 24, 2014

You are, on the matter of what these notations mean, absolutely right.

This "effectively constant time" claim didn't originate from Clojure, by the way. Scala, which has similar 32-tree structures, used it before then.

In functional programming, binary-tree structures were common for maps and sets (and still are, in Haskell and Ocaml) because they can use comparisons without hashing and can be faster. However, those trees tend to involve ~log2 n bounces, which can become a performance problem if they're very big. People end up preferring (less functional, and less performant due to hashing if your collection is small) hash tables.

The 32-tree structures are great, on the whole. Not only do they offer fast lookups, but the amount of copying that has to be done for inserts and deletions is generally very small.

The "effectively constant" language emerged as a way of saying, "you're not necessarily going to want to replace this with a hashtable at 1,000,000 elements".

Rich Hickey gave a talk early on about this in which he acknowledged the fact that it was still O(log n) time. As he said, constants matter.

The reason people equate O(log n) lookup with "slow" is that they (mistakenly) assume that n ~ 1,000,000 will prima facie the lookup 20x slower than it "needs to be". And that's, of course, false.

ScottBurson · on Oct 24, 2014

Just to be clear, I would never suggest that the constant factor doesn't matter. I might, however, repeat the familiar quote about premature optimization :-)

calibraxis · on Oct 24, 2014

As I vaguely recall, even Plain Old Array accesses take logarithmic time on real machines. However, if we don't use bignums, they're effectively constant time.

Furthermore, it's common to "lie" in computer science. Knuth wrote: "Another noteworthy characteristic of this manual is that it doesn't always tell the truth ... The author feels that this technique of deliberate lying will actually make it easier for you to learn the ideas. Once you understand the simple but false rule, it will not be hard to supplement that rule with its exceptions."

Particularly since O(..) notation is generally used in reference to MODELS, not real machines.

But in any case, Clojure's "effectively constant" is obviously correct.

billsix · on Oct 25, 2014

> As I vaguely recall, even Plain Old Array accesses take logarithmic time on real machines.

How does adding an offset to a pointer and referencing that data require logarithmic time?

>. However, if we don't use bignums, they're effectively constant time.

What are these bignums that are used for array access on "real machines"?

plinkplonk · on Oct 24, 2014

I've seen it in some Clojure books too. "effectively constant","practically O(1)" etc. The funny thing is that this kind of scamming is not necessary at all.

tolmasky · on Oct 24, 2014

I'm curious if anyone who has read (or is familiar with the data structures described in) Purely Functional Data Structures[0] can weigh in on how this persistent vector compares to say, the real time deque which is described to have worst case runtime of O(1) for cons/head/tail/snoc/last/init (along with plenty of other structures which have similar amortized time guarantees, etc.) Do these structures not have practical performance characteristics, or is there another reason a O(log) solution was chosen? Like the other poster mentioned here, its a bit upsetting having read something that rigorously analyzes worst time behavior to have an article hand wave away log32 as "effectively constant time". Especially because amortization is even trickier to consider with persistent data structures since you have to deal with the fact that you may have many operations taking place on your worst case node (several pushes on your edge case base vector, etc). Perhaps these tries actually account for that very elegantly, but I certainly don't know enough to intuit it from what I've seen here.

0. http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

Edit: PDF version: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

Kutta · on Oct 24, 2014

The key point here is that Clojure-style vectors have logarithmic random reads/writes, while Okasaki's aforementioned deques do the same in linear time. Clojure uses the persistent vector by default, so it's essential that people can port their usual imperative vector algorithms and expect to get acceptable performance. Hence the focus on random access.

As to amortization, operations on Clojure-style vectors have pretty much zero variance (modulo the size of the vector). There is no resizing, no reordering, no rotation or whatever, it's just the exact same algorithm every time.

lomnakkus · on Oct 24, 2014

AFAIR the deques in PFDS don't permit anything better than O(log n) random access -- and the ones that do that are hideously complicated (again AFAIR, it's been years since I read it). Also, the thing is constants matter, and this is what the "log_32(n)" vs. "log(n)" discussion is driving at.

(You're still not going to get anywhere close to a linear scan of an array with directly embedded values, but there's just no way to achieve that kind of raw performance for pure functional data structures.)

lisper · on Oct 24, 2014

I always thought "persistent" meant "stored in non-volatile storage" but this author seems to be using it as a synonym for "immutable" or "functional" which seems really weird to me. Is this a Clojure-ism?

augustl · on Oct 24, 2014

In the context of data structures, persistent has a very specific meaning. So it's not really a Clojure-ism.

https://en.wikipedia.org/wiki/Persistent_data_structure

It's not a synonym for immutable or functional. A persistent data structure is one that shares structure with other data structures.

ajanuary · on Oct 24, 2014

> A persistent data structure is one that shares structure with other data structures.

No, a persistent data structure is one where, when you apply an operation that would ordinarily modify it, you instead get back a new data structure with the operation applied, and the old data structure appears the same.

I could make an array persistent by copying the entire array each time. No structural sharing at all.

Of course, you want structural sharing because it makes things a heck of a lot more efficient, but it's not a part of the definition of a persistent data structure.

lisper · on Oct 24, 2014

Then I'm still confused. Is "persistent data structure" a synonym for "purely functional data structure" in the sense of Oakasaki96?

http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

jbapple · on Oct 25, 2014

Good question. The answer is no. Here's something I wrote elsewhere and have copied here:

Immutability [in purely functional data structures] is an implementation technique. Among other things, it provides persistence, which is an interface. The persistence API is something like:

`version update(operation o, version v)` performs operation `o` on version `v`, returning a new version. If the data structure is immutable, the new version is a new structure (that may share immutable parts of the old structure). If the data structure isn't immutable, the returned version might just be a version number. The version `v` remains a valid version, and it shouldn't change in any `observe`-able way because of this update - the update is only visible in the returned version, not in `v`.

`data observe(query q, version v)` observes a data structure at version `v` without changing it or creating a new version.

For more about these differences, see:

* The chapter "Persistent data structures" by Haim Kaplan in Handbook on Data Structures and Applications: http://www.math.tau.ac.il/~haimk/papers/persistent-survey.ps

* "Making Data Structures Persistent" by Driscoll et al.: http://www.cs.cmu.edu/~sleator/papers/Persistence.htm

* The first lecture of the Spring 2012 incarnation of MIT's 6.851: Advanced Data Structures: http://courses.csail.mit.edu/6.851/spring12/lectures/L01.htm...

ScottBurson · on Oct 24, 2014

In practice the two terms do seem to be almost synonymous, as functional data structures have to be persistent, and so efficient functional data structures have to be efficiently persistent.

But it's possible for a persistent data structure not to be purely functional. Here's a realistic example. A trie can be updated functionally, as this article describes. However, when the trie is initially being constructed, so that there is known to be only one variable holding a pointer to the root, it is unnecessary to use functional update to add each node (which would generate a lot of garbage). The nodes can be added by side-effect, and then once the trie's initial state is complete and its root pointer is exposed for use by the rest of the program, subsequent updates can then be done functionally.

ajanuary · on Oct 25, 2014

Persistence is defined in terms of the interface to an abstract data type. When I do a write operation on a data structure, I get a new version back, and all read operations on the old version still return the same values.

    version1 = make_record(x = 10)
    v1_x = get_x(version1)                 ; 10
    version2 = update(version1, x = 20)
    v2_x = get_x(version2)                 ; 20
    get_x(version1) == v1_x                ; true

The update operation is free to mutate the original data structure however it likes, so long as from the outside the read operations look the same.

This freedom to mutate is often used to make efficient persistent data structures by re-using the same memory but tagging different parts of it with a version.

    version1 = make_record(x = 10)

               +--------+--------+      +-----+   +--------+---------+--------+
    version1 ->| tag: 1 | data: -+----->| x: -+-->| tag: 1 | val: 10 | next: -+---|
               +--------+--------+      +-----+   +--------+---------+--------+


    version2 = update(version1, x = 20)
        
               +--------+--------+      +-----+   +--------+---------+--------+   +--------+---------+--------+
    version1 ->| tag: 1 | data: -+--+-->| x: -+-->| tag: 1 | val: 10 | next: -+-->| tag: 2 | val: 20 | next: -+---|
               +--------+--------+  |   +-----+   +--------+---------+--------+   +--------+---------+--------+
               +--------+--------+  |
    version2 ->| tag: 2 | data: -+--+
               +--------+--------+

When version2 is created, it really just appends to the same data structure as version1, but tags the cell as a new version. When you do get_x(version1) it follows down the list, finds the cell with tag 1, and returns it's value. When you do get_x(version2) it follows down the list until the cell with tag 2 and returns it's value.

From the outside, get_x(version1) still returns the same value before and after the update, but the data structure it refers to has been mutated underneath it.

There are also ways to use structural sharing in persistent data structures where the you don't do any mutation of the old versions. For examples of this, see the linked article.

Functional data structures forbid mutation. This means they must be persistent, because update operations have to return new data structures, and the old data structures aren't allowed to be mutated so by definition all read operations on them still return the same values.

Functional data structures end up forming a subset of fully persistent data structures. They're a useful category for a lot of the same reasons people are touting pure functional languages are useful. For instance, you never have to worry about locks when you have concurrent access. With the mutating example above, if one thread does update(version1, x = 20) and another does update(version1, x = 30) I need to coordinate which one gets tag 2 and which one gets tag 3.

Persistent data structures are the more general concept, functional data structures are a specialized subset.

dopamean · on Oct 24, 2014

If this is totally accurate then thanks for clearing that up. I was of the impression that part of what made persistent data structures persistent was the structural sharing.