> 0-based indexing with closed intervals is better for slicing. This shouldn't b...

celrod · on Jan 20, 2021

It's related to 0-based indexing in that if you want to take/iterate over the first `N` elements, `0:N` works with 0-based indexing + closed-open, but if you had 1-based and closed-open, you'd need the awkward `1:N+1`.

This is why 1-based index languages normally use closed-closed intervals, so that they can use `1:N`.

I'm a die-hard Julian (Julia is 1-based), but I do a lot of pointer arithmetic in my packages internally. I've come to prefer 0-based indexing, as it really is more natural there. 0-based plus closed-open intervals are also nicer for partitioning an iteration space/tiling loops, thanks to the fact the parent comment pointed out about the end of one iteration being the start of the next. This is a nice pattern for partitioning `N` into roughly block_size-sized blocks:

  iters, rem = divrem(N, block_size)
  start = 0
  for i in 0:iters-1  # i.e. i in [0,iters)
    # the first `rem` blocks get one extra element
    stop = start + block_size + (i < rem)
    # operate on [start, stop)
    start = stop
  end

But that's only slightly nicer. To translate this into 1-based indexing and closed-closed intervals, you'd just substitute the `# operate` line with

    # operate on [start+1, stop]

the `0:iters-1` with `1:iters`, and `i < rem` with `i <= rem`.
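For comparison, here is a minimal Python sketch of both variants of the pattern above. The function names `blocks_half_open` and `blocks_closed` are made up for illustration. Like the original pattern, it only covers all of `n` when the remainder is no larger than the number of blocks, since at most one extra element is handed to each block.

```python
def blocks_half_open(n, block_size):
    """0-based, half-open [start, stop) blocks."""
    iters, rem = divmod(n, block_size)
    blocks = []
    start = 0
    for i in range(iters):  # i in [0, iters)
        # The first `rem` blocks get one extra element (True counts as 1).
        stop = start + block_size + (i < rem)
        blocks.append((start, stop))
        start = stop  # end of one block is the start of the next
    return blocks


def blocks_closed(n, block_size):
    """1-based, closed [first, last] blocks: same loop, three small changes."""
    iters, rem = divmod(n, block_size)
    blocks = []
    start = 0
    for i in range(1, iters + 1):  # i in [1, iters]
        stop = start + block_size + (i <= rem)
        blocks.append((start + 1, stop))
        start = stop
    return blocks
```

With `n = 10` and `block_size = 3`, both functions describe the same three blocks of sizes 4, 3 and 3; only the way the bounds are written differs.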

1- vs 0-based indexing is bike-shedding: a simple question we can all have opinions on and that's easy to argue about, when it really doesn't matter much.

Julia uses 1-based indexing, but its pointer arithmetic is (obviously) 0-based, because pointer arithmetic != indexing. Adding 0 still adds 0, and adding 1 and `unsafe_load`ing will give me a different value than if I didn't add anything at all. (This is just reemphasizing the final point made by the blog post.)
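A rough Python analogue of that distinction, using a packed byte buffer in place of raw memory. `load_at_offset` and `getindex_1based` are hypothetical helpers, not Julia's `unsafe_load`: the first works in offsets (adding 0 reads the first element), the second layers 1-based indexing on top of it.

```python
import struct

values = [10, 20, 30]
buf = struct.pack("<3i", *values)   # three little-endian 32-bit ints
ITEM_SIZE = struct.calcsize("<i")   # bytes per element


def load_at_offset(buf, offset_elems):
    """Offset semantics: 0 means 'the element the base points at'."""
    return struct.unpack_from("<i", buf, offset_elems * ITEM_SIZE)[0]


def getindex_1based(buf, i):
    """1-based indexing expressed in terms of offsets."""
    return load_at_offset(buf, i - 1)
```

The indexing convention lives entirely in `getindex_1based`; the offset arithmetic underneath is the same either way.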

cbolton · on Jan 20, 2021

Do you use 0-based arrays in your Julia code (since Julia supports arbitrary first indices, e.g. with OffsetArrays)? If not, why not?

celrod · on Jan 20, 2021

I have on occasion, and when working on custom array types I've added support for being offset with optionally compile-time known offsets. As StrideArrays.jl matures and I write more libraries making use of it, I may use 0-based indices more often. The idea of dynamic offsets when they aren't needed bothers me, even though I've benchmarked that as being pretty meaningless. The bigger problem is just that OffsetArrays are a wrapper, and there's a Julia bug where TBAA information on wrappers often gets lost, so that the compiler reloads the pointer you're loading from on every iteration of a loop. Aside from that being slow itself, it also causes the autovectorizer to fail. This causes a severe regression when it occurs. Performance should be fine in other cases. LoopVectorization or `@simd ivdep` should also avoid the problem.

For the most part, my preference on 1 vs 0 is weak, and for code other people are likely to look at I do want to make it easy to understand / not full of surprises.

jomar · on Jan 20, 2021

The natural representation of intervals for doing arithmetic on is 0-based half-open.

Half-open because of the slicing properties, as noted in your posting and the grandparent posting.

0-based because of the simplification for converting between relative coordinate systems. Suppose you have one interval A represented as offsets within a larger interval B, and you'd like to know what A's coordinates are in the global coordinate system that B uses. This is much easier to compute when everything uses 0-based coordinates.
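A small Python sketch of that conversion, with made-up function names, shows where the extra bookkeeping comes from. With 0-based coordinates, A's global position is a plain sum; with 1-based coordinates, every conversion needs a `- 1` correction.

```python
def to_global_0based(b_start, a):
    """A given as 0-based half-open offsets within B; b_start is 0-based."""
    a_start, a_end = a
    return (b_start + a_start, b_start + a_end)


def to_global_1based(b_start, a):
    """A given as 1-based closed positions within B; b_start is 1-based."""
    a_first, a_last = a
    return (b_start + a_first - 1, b_start + a_last - 1)
```

For example, if B starts at global 0-based position 100 and A is `(5, 8)` within B, the result is `(105, 108)`. The same region in 1-based closed terms is B starting at 101 with A as `(6, 8)`, giving `(106, 108)`.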

Here is a slightly longer discussion of that in a genomics context: https://github.com/ga4gh/ga4gh-schemas/issues/121#issuecomme... and a draft of a document I wrote up (again, in a genomics context) so as never to have to have this discussion ever again: https://github.com/jmarshall/ga4gh-schemablocks.github.io/bl...

sdbrady · on Jan 20, 2021

I think this is a good write-up, but I think the notation is still carrying some mental baggage. It's not necessary to have these open and closed brackets/parentheses. They don't add anything, and if anything, they confuse the matter. An interval is just (1, 2) (or [1, 2] if preferred aesthetically). Since a base cannot be "on" either 1 or 2, it's not meaningful to have this open/closed interval notion. In other words, (1, 2) == [1, 2) == (1, 2] == [1, 2].

Open/closed intervals only come into play in continuous dimensions. DNA sequences, arrays in memory, et al are discrete.

samatman · on Jan 20, 2021

On the first point, yep, completely misspoke, what you don't want is open intervals.

To the second point, as I said: Dijkstra's argument for using 0 instead of 1 with half-open intervals is, to my taste, perfect. As I have nothing to add to it, I will simply defer.

bvrmn · on Jan 20, 2021

For </<= intervals and 1-based indexing you have to write `for(i=1; i <= N; i++)`. So you lose the nice property of having `upper - lower` iterations.

dragonwriter · on Jan 20, 2021

For half-open intervals, you have upper-lower iterations regardless of base. In C for loop terms, it's:

  for (i=lower;i<upper;i++) {}

If you are using 0-based indexing, lower is zero and upper is the number of elements for a full iteration over all indexes; with 1-based indexing lower is 1 and upper is one greater than the number of elements.

Now, despite the utility of half-open intervals, most people’s intuition is around ordinal index ranges, so 0-based indexing is counterintuitive with half-open intervals because slices start at 0, and 1-based indexing is counterintuitive with them because they end at N+1.

This is because, useful or not, half-open intervals are counterintuitive, but they are worth developing the intuition for because they combine nicely.
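Both properties can be checked with Python's `range`, which is half-open. This is a minimal sketch; `half_open` and `closed_count` are illustrative helpers, not anything from the thread.

```python
def half_open(lower, upper):
    """Indices in [lower, upper)."""
    return list(range(lower, upper))


def closed_count(lower, upper):
    """Number of indices in the closed interval [lower, upper]."""
    return upper - lower + 1


n = 5
zero_based = half_open(0, n)      # full iteration with 0-based indexing
one_based = half_open(1, n + 1)   # full iteration with 1-based indexing

# Adjacent half-open ranges concatenate with no overlap and no gap.
combined = half_open(2, 4) + half_open(4, 7)
```

In both bases the number of iterations is `upper - lower`, and `combined` is the same as `half_open(2, 7)`. A closed interval needs the `+ 1` in the count instead.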

bvrmn · on Jan 20, 2021

> with 1-based indexing lower is 1 and upper is one greater than the number of elements.

Do you see the +1 tweak? Also consider that a "for loop" can't be expressed as a </* interval and is always expressed as a <=/* interval. So 1-based indexing has to be either 1 <= i <= N (closed interval, bad, N - 1 != number of iterations) or 1 <= i < N + 1 (half-open interval, good, but with the +1 tweak).