
Re: assumptions about distribution

But if the assumption is that they are all using only a couple of slots of that root table, then the root table is not a performance advantage. It would be faster to just start the trie immediately and branch on the first bit.

The point of having a structure like that at the top is that it is a cheap-o form of level compression, and increases the fan-out of the root node, allowing you to cut the search space down much faster given a single very cheap lookup.
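A minimal sketch of what that level compression buys you: a root table that consumes the top K bits of a 32-bit key in a single cheap lookup gives the root a fan-out of 2**K instead of 2 (one bit per trie level). The names and constants here are illustrative, not from any particular implementation.

```python
K = 5                      # bits consumed by the root table
ROOT_SIZE = 1 << K         # 32 slots at the root

def root_index(key: int) -> int:
    """Top K bits of a 32-bit key select the root slot,
    replacing K single-bit branches with one array index."""
    return (key >> (32 - K)) & (ROOT_SIZE - 1)

# One lookup cuts the search space by a factor of 32:
assert root_index(0x80000000) == 16   # leading 1 bit -> upper half
assert root_index(0x00000001) == 0    # small key -> slot 0
```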

As for segregated fit, the bucket array itself probably suffices for that. What /could/ make it interesting is if you /also/ stated that "small" (many-leading-zero) keys happened to be much more prevalent, and in fact twice as prevalent as those in the next bucket up.

Otherwise, it is just further unbalancing his trie while getting roughly the same tradeoff as if he were using a PATRICIA trie (which will use path compression to eat long runs of 0s, and will otherwise be roughly as unbalanced, without the cost of the bucket array, which amounts to many internal nodes' worth of memory).
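To make the path-compression point concrete, here is a toy sketch (an assumed structure, not any specific library) of how a PATRICIA/radix node eats leading zeros: each node records the bit index it tests, so bit positions that never distinguish any pair of stored keys cost no internal nodes at all. A real PATRICIA lookup also does a final full-key comparison at the leaf, which this sketch omits.

```python
class PatriciaNode:
    def __init__(self, bit=None, left=None, right=None, key=None):
        self.bit = bit      # bit index tested here (may skip ahead many bits)
        self.left = left
        self.right = right
        self.key = key      # leaf payload (None for internal nodes)

# Keys 0b0001 and 0b0011 first differ at bit index 1 (from the LSB),
# so a single node testing bit 1 separates them -- all the leading
# zero bits are skipped rather than walked one trie level at a time.
leaf_a = PatriciaNode(key=0b0001)
leaf_b = PatriciaNode(key=0b0011)
root = PatriciaNode(bit=1, left=leaf_a, right=leaf_b)

def lookup(node: PatriciaNode, key: int) -> int:
    """Descend by testing only the recorded bit at each node."""
    while node.key is None:
        node = node.right if (key >> node.bit) & 1 else node.left
    return node.key

assert lookup(root, 0b0001) == 0b0001
assert lookup(root, 0b0011) == 0b0011
```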



In fact, it is even worse than that: for random keys (which he admits in the documentation are important for trie balance, without helping the user obtain them), he'd be much, much better off just chopping the first five bits off the key and using them to index into his top-level array.

Doing that would let the top-level array segment the search space perfectly into 32 sub-buckets, avoiding the problem that the higher-order array elements are increasingly meaningless (as the key ranges they cover are exponentially smaller).
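A quick simulation makes the contrast visible. This sketch assumes the bucket array being discussed groups keys by leading-zero count (my reading of the scheme, not something stated outright): for random 32-bit keys, half of them land in the single bucket for "top bit set", while indexing by the top five bits spreads keys nearly evenly across 32 sub-buckets.

```python
import random

def clz_bucket(key: int) -> int:
    """Bucket by count of leading zeros in a 32-bit key
    (assumed to model the bucket-array scheme)."""
    return 32 - key.bit_length() if key else 32

def top5_bucket(key: int) -> int:
    """Bucket by the key's top five bits: 32 equal-size sub-buckets."""
    return key >> 27

random.seed(0)
keys = [random.getrandbits(32) for _ in range(100_000)]

clz_counts = [0] * 33
eq_counts = [0] * 32
for k in keys:
    clz_counts[clz_bucket(k)] += 1
    eq_counts[top5_bucket(k)] += 1

# Roughly half of all random keys pile into clz bucket 0, while the
# top-five-bit buckets each hold about 1/32 of the keys.
assert clz_counts[0] > 45_000
assert max(eq_counts) < 2 * min(eq_counts)
```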

At that point, however, the data structure is now a level-compressed trie ;P. (LPC tries seem to be the state of the art for this kind of structure.)



