Hacker News

Should be pretty trivial for him to shard collections across multiple databases with the same level of intelligence as MongoDB's automated sharding simply using the primary key, since MongoDB doesn't join.

Not sure why sharding is considered the point of MongoDB, though - in most of the universe, sharding is a database architecture/schema-level thing, not a database-server level thing, and for good reason - it's pretty darn hard for a database server to shard effectively without some knowledge of the app layer (and which keys are likely to become hot).

Personally, if I really wanted a flexible-schema "document based" database, I'd have implemented this using the FriendFeed K/V + Index model ( http://backchannel.org/blog/friendfeed-schemaless-mysql ) plus Postgres's HStore functionality, storing K/V per document in an HStore rather than in one giant K/V table like FriendFeed. That way I wouldn't need to use V8 and JSON parsing to run queries, and the mythic MongoDB-style "sharding" would be just as easy (just distribute the document -> hstore table across shards keyed on ID again).
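To make the "just distribute the document -> hstore table across shards keyed on ID" idea concrete, here's a minimal Python sketch of primary-key routing. The shard names, shard count, and hash choice are all illustrative assumptions, not anything FriendFeed or Postgres prescribes:

```python
# Hypothetical shard routing: hash the document's primary key and map it
# to one of a fixed set of shards. Shard names/count are made up.
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(doc_id: str) -> str:
    """Pick a shard deterministically from the document's primary key."""
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Every read and write for a given ID lands on the same shard, which is
# all the routing you need when the database never joins across documents.
placement = {doc_id: shard_for(doc_id) for doc_id in ["user:1", "user:2"]}
```

Since there are no cross-document joins, the router never needs to consult more than one shard per query.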



I disagree that sharding can ever be a trivial problem if you're going to try to tackle moving data between shards while staying online. I'm not saying it's impossible, just that it's not trivial.


MongoDB's relatively simple approach involves continuing to use the old shard as an authoritative source (and committing updates to it) while shipping data in the background, then pushing the additional changes across and marking the new shard as "master."

Such an approach wouldn't be horribly difficult to implement in SQL using a copy table and write triggers - almost identically to how SoundCloud's Large Hadron Migrator allows writes to occur over a MySQL InnoDB table that's locked for migration (but even simpler, because the table schema can't conflict afterward).
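The copy-table-plus-trigger mechanism can be simulated in a few lines. This is a toy in-memory sketch, not LHM's actual implementation: the `journal` list plays the role of the trigger-maintained changes table, capturing writes that land during the backfill so they can be replayed before cutover:

```python
# Toy simulation of an online copy: bulk-backfill the table, journal
# concurrent writes (the trigger's job), then replay the journal.
source = {1: "a", 2: "b", 3: "c"}   # table being migrated
copy = {}                            # new copy table
journal = []                         # changes captured by the "trigger"

def write(key, value):
    """Application write: hits the source and, via the trigger, the journal."""
    source[key] = value
    journal.append((key, value))

# Phase 1: bulk backfill while writes keep flowing in.
for key, value in list(source.items()):
    copy[key] = value
write(2, "b2")   # concurrent update during the backfill
write(4, "d")    # concurrent insert during the backfill

# Phase 2: replay the journaled writes, then cut over to the copy.
for key, value in journal:
    copy[key] = value
```

After the replay, `copy` matches `source` exactly, so the switchover is just a pointer flip.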

The entire problem is admittedly nontrivial, since if the application happens to be writing data to the shard under migration too quickly (or the shard being migrated to dies), the server can end up in a situation where the new shard is never able to catch up and become a master. However, the easy solution (give up and retry later) is Good Enough for most situations (and is pretty much how MongoDB works).
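The "give up and retry later" fallback amounts to bounding the catch-up phase. A hedged sketch, with arbitrary made-up rates standing in for real write traffic: if the backlog of journaled writes never drains within the allowed rounds, abort and leave the old shard authoritative.

```python
# Sketch of bounded catch-up: each round we ship up to drain_rate pending
# writes while incoming_rate new writes arrive. If the backlog never
# shrinks to a single round's worth of writes, give up (and retry later).
def catch_up(pending_writes: int, drain_rate: int, incoming_rate: int,
             max_rounds: int = 5) -> bool:
    """Return True if the new shard catches up within max_rounds."""
    for _ in range(max_rounds):
        pending_writes = max(0, pending_writes - drain_rate) + incoming_rate
        if pending_writes <= incoming_rate:  # backlog effectively drained
            return True
    return False  # writes arrive faster than we can ship them

# A quiet shard catches up; a hot one never does.
catch_up(100, drain_rate=50, incoming_rate=5)    # → True
catch_up(100, drain_rate=10, incoming_rate=40)   # → False
```

When `incoming_rate` exceeds `drain_rate`, the backlog grows monotonically, which is exactly the situation where the sensible move is to abort the migration rather than chase a master that can never be caught.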




