> IIRC we modified our database to run off ramdisk
If a ramdisk is sufficient for your use case, why would you use a Raft-based distributed networked consensus database like etcd in the first place?
Its whole point is to protect you not only from power failure (like fsync does) but even from entire machine failure.
And the network is usually higher latency than a local SSD anyway.
> 1000 fsyncs/second is a tiny fraction of the real world production load we required
Kubernetes uses etcd for storing the cluster state. Do you update the cluster state more than 1000 times per second? Curious what operation needs that.
kubelet constantly checks in to report "I am alive, and here is the state of affairs" so that kube-apiserver can make informed decisions about whether a Pod needs attention to align with its desired state.
So, I don't know off the top of my head what the formal reconciliation loop is called for that Node->apiserver handshake, but it is a polling operation in that the connection isn't left open all the time, and it is a status-reporting operation. That's how I ended up calling it "status polling." It is a write operation because whichever etcd member is assigned to track the current state of the Events needs to be mutated to record them.
It actually could be that even a smaller cluster could get into this same situation if there were a bazillion tiny Pods (i.e. it's not directly correlated with the physical size of the cluster), but since the limits say one cannot have more than 110 Pods per Node, I'm guessing the Event magnification is easier to see with physically wider clusters.
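As a rough sketch of why steady-state Event churn can dwarf explicit cluster changes: every interval and per-Pod rate below is an illustrative assumption, not a measured value, but the shape of the arithmetic is the point.

```python
# Back-of-envelope estimate of steady-state etcd write load.
# All rates here are illustrative assumptions, not measured values.
nodes = 200
pods_per_node = 110          # the per-Node Pod limit mentioned above
heartbeat_interval_s = 10    # assumed node status/lease report interval
events_per_pod_per_min = 1   # assumed average Event churn per Pod

heartbeat_writes = nodes / heartbeat_interval_s
event_writes = nodes * pods_per_node * events_per_pod_per_min / 60
total = heartbeat_writes + event_writes

print(f"heartbeats: {heartbeat_writes:.0f} writes/s")
print(f"events:     {event_writes:.0f} writes/s")
print(f"total:      {total:.0f} writes/s")
```

Even with these modest assumed rates, Pod-driven Events (nodes × pods) scale an order of magnitude faster than node heartbeats, which is why a "wide" cluster hits the write wall first.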
> And the network is usually higher latency than a local SSD anyway.
Until your writes outrun the disk performance/page cache and your disk I/O latency spikes. Linux used to be really bad at this when memory cgroups were involved, until a couple of years ago.
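The gap is easy to see locally: buffered writes land in the page cache at memory speed, while fsync forces each record to stable storage. A minimal sketch (absolute numbers depend entirely on your disk; the record size and count are arbitrary):

```python
import os
import tempfile
import time

def time_writes(n: int, sync: bool) -> float:
    """Write n small records, optionally fsyncing after each one;
    return elapsed wall-clock seconds."""
    fd, path = tempfile.mkstemp()
    start = time.perf_counter()
    for _ in range(n):
        os.write(fd, b"x" * 512)
        if sync:
            os.fsync(fd)  # force the write to stable storage
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.unlink(path)
    return elapsed

n = 500
buffered = time_writes(n, sync=False)
synced = time_writes(n, sync=True)
print(f"buffered: {n / buffered:,.0f} writes/s")
print(f"fsynced:  {n / synced:,.0f} writes/s")
```

On a typical SSD the fsynced rate is orders of magnitude lower, which is exactly the budget an etcd-style write-ahead log has to live within.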
> If a ramdisk is sufficient for your use case, why would you use a Raft-based distributed networked consensus database like etcd in the first place?
Because at the time Kubernetes required it. If the adapters to other databases existed at the time I would have tested them out.
> Kubernetes uses etcd for storing the cluster state. Do you update the cluster state more than 1000 times per second? Curious what operation needs that.
Steady state in a medium to large cluster exceeds that. At the time I was looking at these etcd issues I was running fleets of 200+ node clusters and hitting a scaling wall around 200-300. These days I use a major Kubernetes service that does not use etcd behind the scenes and my fleets can scale up to 15000 nodes at the extreme end.