Consul issues have bitten me at two companies, and I heard word of it being the ...

m0xte · on Nov 17, 2019

Totally agree. It can really shit the bed hard, it can. I had an 0.8 cluster that crashed into a never-ending leadership election. The only option was to burn the whole cluster to the ground and reinitialise it from scratch. That issue seems to have gone away since 1.0, but even after running it for a couple of years I'm not sure I can sleep easy. Every time I bounce a cluster node for patching, I clench.

I could write a book on the problems I've had with Vagrant as well. I've lost at least a day a week to it over the last month.

closeparen · on Nov 17, 2019

The worst outages will always be those involving core infrastructure.

yclept · on Nov 17, 2019

Consul seems to be more prone to issues than one would hope, though. IMO the feature set is not worth the increased complexity and operational burden. There are simpler ways of handling service discovery and configuration without running your own consensus-based cluster.

kilburn · on Nov 17, 2019

Genuine question: can you explain a couple of these simpler ways please?

closeparen · on Nov 18, 2019

Central authorities are typically simpler than gossip and consensus systems. They have failure modes too, of course, but those failure modes are better understood and potentially easier to manage.

Sometimes you can't avoid the need for distributed consensus, but you can box it inside a well-defined abstraction like leader election, and then do everything else in a traditional client-server way.
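One way to picture that split is the sketch below, assuming Python and entirely hypothetical names (`LeaderElector`, `LeaseElector`, `Registry`). The only consensus-dependent piece sits behind a small leader-election interface. Service registration and lookup are plain calls against a single authority. `LeaseElector` is an in-process toy standing in for whatever would really provide election (it is not Consul's or any other product's API), and the registry's state is neither replicated nor persisted.

```python
import threading
import time
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class LeaderElector(ABC):
    """Hypothetical boundary: the only place distributed consensus would live."""

    @abstractmethod
    def is_leader(self, node_id: str) -> bool:
        ...


class LeaseElector(LeaderElector):
    """Toy stand-in: one in-process lease with a TTL (a lock plus a timestamp).

    A real deployment would back this with a consensus store; that part is
    deliberately out of scope here.
    """

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._holder: Optional[str] = None
        self._expires_at = 0.0

    def try_acquire(self, node_id: str) -> bool:
        now = time.monotonic()
        with self._lock:
            # Grant or renew the lease if it is free, already ours, or expired.
            if self._holder in (None, node_id) or now >= self._expires_at:
                self._holder = node_id
                self._expires_at = now + self._ttl
                return True
            return False

    def is_leader(self, node_id: str) -> bool:
        with self._lock:
            return self._holder == node_id and time.monotonic() < self._expires_at


class Registry:
    """Central authority for service discovery: plain client-server, no gossip."""

    def __init__(self, node_id: str, elector: LeaderElector) -> None:
        self.node_id = node_id
        self._elector = elector
        self._services: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        # Only the elected leader accepts writes; other nodes refuse them.
        if not self._elector.is_leader(self.node_id):
            raise PermissionError(f"{self.node_id} is not the leader")
        addrs = self._services.setdefault(service, [])
        if address not in addrs:
            addrs.append(address)

    def lookup(self, service: str) -> List[str]:
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._services.get(service, []))


if __name__ == "__main__":
    elector = LeaseElector(ttl_seconds=10.0)
    a = Registry("node-a", elector)
    b = Registry("node-b", elector)
    print(elector.try_acquire("node-a"))  # True
    print(elector.try_acquire("node-b"))  # False while node-a's lease is live
    a.register("billing", "10.0.0.5:8080")
    print(a.lookup("billing"))  # ['10.0.0.5:8080']
```

The point of the shape is that swapping the toy elector for a real one changes nothing in `Registry`: its failure modes stay those of an ordinary server ("the leader is down, writes fail"), rather than those of a gossip mesh.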