
It sounds like they are using NFS for distributing their code, which should be almost entirely read-only - the only place that writes new files to the file system is the single Jenkins machine that manages the deploys. It seems likely to me that this would avoid most of the risks inherent in running NFS at scale.


Let's say you distribute a binary over NFS -- compile some C or C++ or whatever you like. Then various hosts run /path/to/binary. At some point, the NFS server changes that file out from underneath those hosts, because, well, it can. The usual "text file busy" you'd get when trying to do that on a local filesystem never happens.

At some point after that, the hosts running the binary will try to page in part of it, get a SIGBUS, and die.
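The local-filesystem protection alluded to here is easy to demonstrate: on Linux, opening a running executable for writing fails with ETXTBSY, whereas NFS has no such guard, so the file can be swapped out underneath the mapped pages. A minimal sketch, using a copy of the running Python interpreter as a stand-in binary (paths are temporary and hypothetical):

```python
import errno
import os
import shutil
import subprocess
import sys
import tempfile
import time

# Copy an executable we know exists (the running interpreter)
# so we never touch a system file.
tmp = tempfile.mkdtemp()
binary = os.path.join(tmp, "interp_copy")
shutil.copy(sys.executable, binary)
os.chmod(binary, 0o755)

proc = subprocess.Popen([binary, "-c", "import time; time.sleep(30)"])
time.sleep(0.5)  # give the child a moment to exec() and map the file

errno_seen = None
try:
    try:
        # On a local Linux filesystem the kernel refuses to let us write
        # to a running executable: open() fails with ETXTBSY.
        with open(binary, "wb"):
            pass
    except OSError as e:
        errno_seen = e.errno
finally:
    proc.terminate()
    proc.wait()
    shutil.rmtree(tmp)

print(errno.errorcode.get(errno_seen))  # → ETXTBSY on a local filesystem
```

Over NFS the open() above would simply succeed, which is exactly why the running hosts can later SIGBUS when they fault in a page of the now-changed file.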

That's just one of many failure modes.


They say they put out a new package; I take that to imply they don't rewrite files once deployed.
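If releases really are write-once, the usual way to get those semantics is to publish each package into a fresh directory and flip a "current" symlink with an atomic rename, so running hosts only ever see a complete old tree or a complete new one. A sketch under assumed paths and release names (nothing here is from the thread):

```python
import os
import tempfile

# Hypothetical layout: immutable release directories plus a "current" symlink.
root = tempfile.mkdtemp()
for ver, content in (("v1", "old"), ("v2", "new")):
    os.makedirs(os.path.join(root, "releases", ver))
    with open(os.path.join(root, "releases", ver, "app"), "w") as f:
        f.write(content)

current = os.path.join(root, "current")
os.symlink(os.path.join(root, "releases", "v1"), current)

# Deploy v2: create a new symlink, then rename() it over "current".
# rename() is atomic on POSIX, so there is no window where readers
# see a half-updated tree.
staged = current + ".tmp"
os.symlink(os.path.join(root, "releases", "v2"), staged)
os.rename(staged, current)

with open(os.path.join(current, "app")) as f:
    deployed = f.read()
print(deployed)  # → new
```

Note this only guards the cutover; a process that already mapped a binary from the old release keeps running against the old (still-present) files, which sidesteps the SIGBUS failure described above.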

But what you're describing has interesting failure modes indeed, ranging from successfully patching the running process, to a SIGSEGV, to the target process ending up in an infinite loop (BTDT).


That's not the only issue with running NFS behind a horizontally scaled web cluster; stuff like this starts to hurt: http://www.serverphorums.com/read.php?7,655118



