Then I'm still confused. Is "persistent data structure" a synonym for "purely fu...

jbapple · on Oct 25, 2014

Good question. The answer is no. Here's something I wrote elsewhere and have copied here:

Immutability [in purely functional data structures] is an implementation technique. Among other things, it provides persistence, which is an interface. The persistence API is something like:

`version update(operation o, version v)` performs operation `o` on version `v`, returning a new version. If the data structure is immutable, the new version is a new structure (that may share immutable parts of the old structure). If the data structure isn't immutable, the returned version might just be a version number. The version `v` remains a valid version, and it shouldn't change in any `observe`-able way because of this update - the update is only visible in the returned version, not in `v`.

`data observe(query q, version v)` observes a data structure at version `v` without changing it or creating a new version.

For more about these differences, see:

* The chapter "Persistent data structures" by Haim Kaplan in Handbook on Data Structures and Applications: http://www.math.tau.ac.il/~haimk/papers/persistent-survey.ps

* "Making Data Structures Persistent" by Driscoll et al.: http://www.cs.cmu.edu/~sleator/papers/Persistence.htm

* The first lecture of the Spring 2012 incarnation of MIT's 6.851: Advanced Data Structures: http://courses.csail.mit.edu/6.851/spring12/lectures/L01.htm...

ScottBurson · on Oct 24, 2014

In practice the two terms do seem to be almost synonymous, as functional data structures have to be persistent, and so efficient functional data structures have to be efficiently persistent.

But it's possible for a persistent data structure not to be purely functional. Here's a realistic example. A trie can be updated functionally, as this article describes. However, when the trie is initially being constructed, so that there is known to be only one variable holding a pointer to the root, it is unnecessary to use functional update to add each node (which would generate a lot of garbage). The nodes can be added by side-effect, and then once the trie's initial state is complete and its root pointer is exposed for use by the rest of the program, subsequent updates can then be done functionally.

ajanuary · on Oct 25, 2014

Persistence is defined in terms of the interface to an abstract data type. When I do a write operation on a data structure, I get a new version back, and all read operations on the old version still return the same values.

    version1 = make_record(x = 10)
    v1_x = get_x(version1)                 ; 10
    version2 = update(version1, x = 20)
    v2_x = get_x(version2)                 ; 20
    get_x(version1) == v1_x                ; true

The update operation is free to mutate the original data structure however it likes, so long as from the outside the read operations look the same.

This freedom to mutate is often used to make efficient persistent data structures by re-using the same memory but tagging different parts of it with a version.

    version1 = make_record(x = 10)

               +--------+--------+      +-----+   +--------+---------+--------+
    version1 ->| tag: 1 | data: -+----->| x: -+-->| tag: 1 | val: 10 | next: -+---|
               +--------+--------+      +-----+   +--------+---------+--------+


    version2 = update(version1, x = 20)
        
               +--------+--------+      +-----+   +--------+---------+--------+   +--------+---------+--------+
    version1 ->| tag: 1 | data: -+--+-->| x: -+-->| tag: 1 | val: 10 | next: -+-->| tag: 2 | val: 20 | next: -+---|
               +--------+--------+  |   +-----+   +--------+---------+--------+   +--------+---------+--------+
               +--------+--------+  |
    version2 ->| tag: 2 | data: -+--+
               +--------+--------+

When version2 is created, it really just appends to the same data structure as version1, but tags the cell as a new version. When you do get_x(version1) it follows down the list, finds the cell with tag 1, and returns it's value. When you do get_x(version2) it follows down the list until the cell with tag 2 and returns it's value.

From the outside, get_x(version1) still returns the same value before and after the update, but the data structure it refers to has been mutated underneath it.

There are also ways to use structural sharing in persistent data structures where the you don't do any mutation of the old versions. For examples of this, see the linked article.

Functional data structures forbid mutation. This means they must be persistent, because update operations have to return new data structures, and the old data structures aren't allowed to be mutated so by definition all read operations on them still return the same values.

Functional data structures end up forming a subset of fully persistent data structures. They're a useful category for a lot of the same reasons people are touting pure functional languages are useful. For instance, you never have to worry about locks when you have concurrent access. With the mutating example above, if one thread does update(version1, x = 20) and another does update(version1, x = 30) I need to coordinate which one gets tag 2 and which one gets tag 3.

Persistent data structures are the more general concept, functional data structures are a specialized subset.