
Larger disks do make it easy to scale storage in single-node systems, but I agree with you that shared-nothing (and/or shared-something) is a necessary step for extracting maximum performance from a larger machine. Shared-nothing matters in distributed architectures as well: decoupling storage from compute, and stateless components from stateful ones, helps keep resource allocation to a minimum for billion-scale vector storage, indexing, and search. Milvus 2.0 implements this type of architecture - here's a link to our VLDB 2022 paper, if you're interested: https://arxiv.org/abs/2206.13843
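To make the decoupling idea concrete, here's a minimal sketch, not Milvus's actual implementation: the SegmentStore and QueryNode names are hypothetical stand-ins for a shared durable layer (think object storage) and stateless workers that fetch immutable segments on demand, so compute can be added or removed independently of the data.

    # Toy illustration of decoupling stateless compute from shared storage.
    # SegmentStore stands in for a shared durable layer; QueryNode holds no
    # durable state and any number of them can serve the same segment.
    import numpy as np

    class SegmentStore:
        """Shared, durable layer: immutable vector segments keyed by id."""
        def __init__(self):
            self._segments = {}

        def put(self, segment_id, vectors):
            self._segments[segment_id] = np.asarray(vectors, dtype=np.float32)

        def get(self, segment_id):
            return self._segments[segment_id]

    class QueryNode:
        """Stateless worker: fetches segments on demand, keeps only a cache."""
        def __init__(self, store):
            self.store = store
            self.cache = {}

        def search(self, segment_id, query, k=5):
            if segment_id not in self.cache:                  # fetch on demand
                self.cache[segment_id] = self.store.get(segment_id)
            vectors = self.cache[segment_id]
            dists = np.linalg.norm(vectors - query, axis=1)   # brute-force L2
            return np.argsort(dists)[:k]                      # top-k row ids

    store = SegmentStore()
    store.put("seg-0", np.random.rand(10_000, 128))
    node = QueryNode(store)          # any number of these can serve "seg-0"
    print(node.search("seg-0", np.random.rand(128)))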


Just having "moar disk!" ≠ "scalability." Because unlike running single-thread on a shard-per-core [or hyperthread] basis, aligning NUMA memory to those cores, etc., there's no way to make your storage "shared-nothing."

At ScyllaDB we've put years of non-trivial effort into IO scheduling to optimize for large amounts of storage. You also need to consider the type of workload, because reads, writes, and mixed workloads are all different beasties to optimize for.

More here:

https://www.scylladb.com/2022/08/03/implementing-a-new-io-sc...
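As a rough way to see the read/write/mixed difference on your own disk, here's a hypothetical micro-benchmark sketch (the scratch.bin filename and all the sizes are made up). It uses buffered I/O, so the page cache will flatter the reads; a serious test would use O_DIRECT or a tool like fio.

    # Crude micro-benchmark: random 4 KiB reads, writes, and a 50/50 mix
    # against a scratch file. Buffered I/O only; results are indicative,
    # not rigorous, since the page cache and filesystem will skew them.
    import os, random, time

    PATH, SIZE, BLOCK, OPS = "scratch.bin", 256 * 1024 * 1024, 4096, 2000

    def run(mode, fd):
        random.seed(42)
        start = time.perf_counter()
        for i in range(OPS):
            offset = random.randrange(0, SIZE - BLOCK)
            do_read = (mode == "read") or (mode == "mixed" and i % 2 == 0)
            if do_read:
                os.pread(fd, BLOCK, offset)
            else:
                os.pwrite(fd, b"\0" * BLOCK, offset)
        os.fsync(fd)
        return time.perf_counter() - start

    with open(PATH, "wb") as f:               # preallocate the scratch file
        f.truncate(SIZE)
    fd = os.open(PATH, os.O_RDWR)
    for mode in ("read", "write", "mixed"):
        print(f"{mode:>5}: {run(mode, fd):.3f}s for {OPS} x {BLOCK} B ops")
    os.close(fd)
    os.remove(PATH)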




