
I work for a retailer where the service I’m responsible for is used by every cash register around the world for certain operations. When I came in, the RDS DB for this service had 60GB allocated to it and had literally just run out of space, causing an outage. The previous team just gave it an additional 20GB. A month later, I was put in charge of it and it was already 5GB away from running out of space again. I put an end to that and gave it 250GB. The cost is minimal compared to a store not being able to open due to an outage.

The instances for the service itself had 20GB of EBS allocated to them. Luckily they don’t need much local storage. But that’s typical here. There’s a Jenkins instance that is even more of a pain. I’m not responsible for it but every week or two one of the worker nodes runs out of space because they’re given 8GB of storage space. I’m just watching the disaster unfold over the course of a year and a half as I’m constantly telling that team to just up the storage space on the worker nodes instead of constantly having to fiddle with cron jobs.

It’s not even an expense thing. They just… don’t want to increase the storage space. It drives me insane.



I'd guess the worry is that once you increase the storage, you never decrease it again. Ever. It's a one-way street. So, once everything is 5x over-provisioned, then the services tend to fill that space anyway (cause why not be wasteful if it doesn't cost anything) and a year later you are in the same seat again.

I'm not saying this is real, but the worry certainly is.


That's certainly real and something to consider when provisioning systems. I'm fully on board with that. The problem is when the cost of the cost-savings solution vastly outweighs the cost of over-provisioning infrastructure. Like this Jenkins issue bubbling up ~2-4 times a month vs just giving the worker nodes more storage space. There have been times when it happened during the night and people got paged.

Or comparing the cost of one store not being able to open on time because the RDS database's space ran out. VPs and directors start yelling and there's suddenly like 20+ people involved in figuring out why this one store didn't open on time. What's the cost of that compared to just giving the DB 250GB of space so this never comes up again?

But you are also 100% correct and I've seen that happen here, too. There are some instances I'm responsible for that were using EFS for their local storage, costing thousands of dollars every month for absolutely no reason. I switched those to reasonably-sized EBS volumes and that alone was half of my annual savings goal.

I was completely flabbergasted seeing these instances using EFS while others were stuck on 8GB EBS volumes. Backups on the EFS drives had ballooned to many TBs. And the backups were worthless! The instances themselves are ephemeral. They use S3 for long-term storage, and the metadata lives in a database. Those are the things that should be backed up, and their cost compared to EFS is minuscule.


Yeah. I suppose the tricky thing is:

> compared to just giving the DB 250GB of space so this never comes up again?

As long as there is reasonable confidence that this is actually the case, then just provision the space and be done with it. That requires a certain understanding of future space requirements/expectations, and anything that is even slightly running away or leaking space will hit any limit given enough time. So, due diligence requires looking at whether it's actually needed.


Yup, I implemented a bunch of graphs and alerts. Right now it's at 100GB of usage so it's still growing but at a fairly predictable rate. Another nice thing to know is if it's possible to reduce that usage. I haven't been able to look into that but I know one of the causes of the usage increase. The service uses the DB to store some indexing data. There's a team forcing it to re-index and I can tell when they deploy because the storage spikes a little bit every time they do a deployment. Nothing I can do about that, sadly.
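
For anyone wondering what "growing at a fairly predictable rate" buys you, here's a rough sketch of turning usage samples into a "days until full" number. It's just a least-squares line through (day, used GB) points. The function name and the sample values are made up for illustration; only the 250GB allocation and the ~100GB current usage come from above. It assumes growth stays roughly linear, which the re-index spikes may not respect.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(
    samples: Sequence[Tuple[float, float]], allocated_gb: float
) -> Optional[float]:
    """Fit a least-squares line through (day, used_gb) samples.

    Returns the number of days after the last sample at which the fitted
    line reaches allocated_gb, or None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)

    slope = sxy / sxx  # GB per day
    if slope <= 0:
        return None  # not growing, so no projected fill date

    intercept = mean_y - slope * mean_x
    full_day = (allocated_gb - intercept) / slope
    last_day = samples[-1][0]
    return max(0.0, full_day - last_day)


# Hypothetical samples: (day number, GB used), ending at the ~100GB mentioned above.
usage = [(0, 90.0), (10, 95.0), (20, 100.0)]
print(days_until_full(usage, allocated_gb=250.0))
```

If that number ever drops faster than a day per day, something is leaking and it's time to look before the alert fires.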


In my experience it's just control for the sake of control


Does nobody have space alerts?
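
Even a bare-bones check would do for local disks like those worker nodes. A minimal sketch using only the Python standard library; the function name and the 80% threshold are arbitrary choices for illustration, and a managed database would need the provider's own storage metrics instead.

```python
import shutil
from typing import Optional


def disk_alert(path: str, threshold: float = 0.8) -> Optional[str]:
    """Return a warning string if the filesystem holding `path` is at or
    above `threshold` (fraction of total space used), otherwise None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total
    if used_fraction >= threshold:
        return (
            f"{path}: {used_fraction:.0%} used, "
            f"{usage.free / 1024**3:.1f} GiB free"
        )
    return None


if __name__ == "__main__":
    message = disk_alert("/")
    if message:
        # Hook this up to whatever pages or notifies people.
        print("DISK ALERT:", message)
```

Run it from cron or a monitoring agent and send the message wherever your alerts go.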



