Should you be scared of Unix signals?

pm215 · on June 13, 2016

If you've ever read the Lions' Book commentary on 6th Edition Unix, you'll notice that many parts of the API as implemented back there are pretty solid -- quality, well designed interfaces that have stood the test of time.

Signals are not one of those parts. The 6th Ed signal handling code reads to me as somewhat of an afterthought whose use cases were mostly "kill the process for a fatal signal or terminal ^C", "ptrace for a debugger" and maybe SIGALRM. The data structures don't allow a process to have more than one pending signal -- if a new one comes along the fact an old one was pending is simply dropped. Running a signal handler automatically deregistered it, leaving a race condition if two signals arrived in close succession (this is a well known bug fixed by BSD later). And EINTR is an irrelevance if signals are generally fatal but its effects spread like poison through every other kernel API if you need your program to be reliable even with signals being delivered.

The worst bugs and races were fixed up by the BSD folks and others, but the underlying concept is an unfortunate combination of "basically irredeemable", "indispensable" (you have to have some kind of "kernel tells you something has happened" API, and signals are what we got) and "insidious" (thanks to EINTR). I think they're a strong candidate for "worst design decision in unix".

(PS: one of the reasons they stand out in 6th Ed is that so much of the rest of that code is so good!)

Koromix · on June 14, 2016

Neil Brown published an interesting series about Unix design mistakes a few years ago: https://lwn.net/Articles/414618/

The first part of the "Unfixable designs" article is about Unix signals.

Qwertious · on June 14, 2016

>The worst bugs and races were fixed up by the BSD folks and others, but the underlying concept is an unfortunate combination of "basically irredeemable", "indispensable" (you have to have some kind of "kernel tells you something has happened" API, and signals are what we got) and "insidious" (thanks to EINTR). I think they're a strong candidate for "worst design decision in unix".

Suppose you were to throw the whole thing out and write a good replacement (and backwards-compatibility be damned), what would it be like?

pjc50 · on June 14, 2016

Suppose you were to throw the whole thing out and write a good replacement (and backwards-compatibility be damned), what would it be like?

Steal the best bits from Windows NT, and improve the existing mechanisms.

Kill signals in their current form. Build a general-purpose notification mechanism consisting of a mutex and a message. Possibly allow a process to have more than one message queue (Windows makes this really, really easy).

All IO, networking and informational signals (SIGWINCH, SIGCHLD etc) then come as messages (these may have to be fixed size, but anything from a few words to a 4k page would do). select, poll etc are replaced by waiting on a mutex. You can put all your worker threads waiting on that mutex. A message arrives. The kernel wakes one waiting thread and gives it the message (via an atomic deque-or-block syscall). You don't have to do any O(n) processing to work out which socket it relates to as the kernel has helpfully put it in the message.

In the deluxe 4k page version, a 1500-byte ethernet frame arrives, is DMA'd into the top half of a page, the kernel inspects it and sets message headers to say where the data is, and hands it directly into the userspace of a waiting process.

The one downside of this is that UNIX pipe programs become slightly more complicated. Rather than just doing "while(read()) write()" you'd have to switch on the type of message recieved and implement your own abnormal exit functionality. This could probably be tidied away for you inside the standard library.

External process control mechanisms would have to be built for killing processes and suspend/resume.

gpderetta · on June 14, 2016

Unix already has a general notification mechanism in the form of poll and select, no need to add a new one. The problem is not all interesting events are not portably delivered via a file descriptor, but that can be more easily extended (as done by lot of unices, including linux) than coming up with a completely new primitive.

But some messages really must be delivered synchronously and can't normally be queued: SIGSEGV, SIGFPE, SIGBUS, etc. There is really no way around interrupts.

BTW mutexes are not for signaling. What you want for signaling in a queue are semaphores, events or condition variables (or even file descriptors, like eventfd).

pjc50 · on June 14, 2016

Well, the brief was to "throw the whole thing out", including select (which is bad) and poll (which is merely adequate).

The machine traps are interesting in that they should only be generated locally - there's no sensible case for injecting SIGSEGV into other processes. Arguably we should learn from Windows "structured exception handling" here. There are two sensible things to do with traps (other than sudden death): hand over to a callback of some kind (which should be told about the state of the stack), or turn into a language-native exception and throw that.

gpderetta · on June 14, 2016

Poll is perfectly fine for the very large majority of unix applications which do not need to scale to tens of thousands of sockets.

The handling over to callback is exactly what is done by unix signals. Converting to exceptions can be implemented on top of signal handlers, but note that even MS stopped mapping by default structured exceptions to language exception a while ago, at least in C++, as unwinding the stack, destroying state and potentially calling destructors is the last thing you want on a segmentation fault or other unexpected events.

pjc50 · on June 14, 2016

The handling over to callback is exactly what is done by unix signals

Not quite, there are quite a lot of restrictions on what you can do in a signal handler. It ought to be possible to design a callback mechanism without those restrictions. And a signal tells you nothing about its origin or what file descriptor / child process etc. it might relate to.

gpderetta · on June 14, 2016

I assume that by restrictions you are talking about the async signal safety; this is inherent on the 'interrupt' nature of signals as they can happen at any point in a program execution, there is really no way around that. It would be of course nice if more functions where async signal safe (especially malloc).

Regarding the lack metadata, I agreed else thread that messages that carry such data ought to be transported via an explicit message queue, not via signals.

plorkyeran · on June 14, 2016

The equivalent of SIGSEGV/SIGBUS on Mach is handled in basically the way he describes. On an access violation, it suspends the thread and delivers a message to the registered port. The thing listening on the port (in a different thread or even different process) receives the message, does whatever it needs to do, and then sends a response, after which the original thread is woken back up. From the perspective of the violating thread it was handled synchronously, but the actual implementation was an async message queue.

gpderetta · on June 14, 2016

It is not much better though. If it is another thread it has to work under the same async signal safety rules of a signal handler (the blocked thread might be holding an arbitrary mutex). If it is another process, there isn't a lot it can do.

JdeBP · on June 14, 2016

Now go and read https://news.ycombinator.com/item?id=11864211 . There are those who argue (at some length, q.v.) that "the best bits from Windows NT" are not to have readiness-oriented application designs in the first place.

pjc50 · on June 14, 2016

Any particular bit of that huge thread which I've already read? :P The bit under https://news.ycombinator.com/item?id=11866697 ?

I was thinking more along the lines of Windows "registered I/O" and Linux "netmap". 'trentnelson' argues for a distinction between readiness and asynchronous. I don't think the distinction needs to exist, and that in an ideal world the disk subsystem would be more like a kind of networking subsystem. If the request can't be satisfied from a RAM buffer, add it to the "outbound requests" list and return to the calling thread. When the corresponding reply eventually turns up from the disk, add it to the input queue of the process.

JoshTriplett · on June 14, 2016

> Suppose you were to throw the whole thing out and write a good replacement (and backwards-compatibility be damned), what would it be like?

Keep the "you can always kill or stop a process" provided by SIGKILL and SIGSTOP/SIGCONT, using dedicated system calls. Handle every other kind of message (from the kernel or otherwise) via file descriptors, similar to signalfd.

JdeBP · on June 14, 2016

That's two people so far who have mentioned signalfd here, without reference to what the headlined article has to say on the matter (which, ironically, amongst other things points to https://news.ycombinator.com/item?id=9564975 on Hacker News and what it, in its turn, points to).

gmazza · on June 14, 2016

(k)dbus is the most likely replacement here and systemd is a major vehicle for its adoption beyond DBus' current niche use in Gnome.

ktRolster · on June 14, 2016

systemd?

edit: Actually there are some fairly decent systems for IPC now, with the Mach kernel from CMU probably being the most popular example. Most systems with micro-kernels probably have something similar, and of course dbus is there, and Android intents.

I think Applescript deserves a mention here, because although it's a rather weird language, the IPC stuff is really really easy.

rtpg · on June 14, 2016

Considering that you can basically establish no invariants in the signals model, to me it sounds like a cooperative model is the way to go.

Have a way to poll for "outstanding" signals and deal with them accordingly, in some sort of queue (and the OS drops older messages or something, if they're important you'd deal with them).

The "freeze-the-world-and-do-this-thing-instead" model seems like the best way to introduce "my entire program is now broken"-style bugs. The OS deals with interrupts but the OS is also running by itself.

jbarham · on June 14, 2016

FWIW Plan 9 uses what it calls a "notification" mechanism: http://plan9.bell-labs.com/magic/man2html/2/notify

adrianratnapala · on June 14, 2016

While the "notification" system seems is cleaner than the signals system (and presumably avoids those races fixed by BSD) -- it does not seem fundamentally different.

If I am understanding that man page correctly, it does nothing at all to fix EINTR: "If the note interrupted an incomplete system call, that call returns an error (with error string interrupted) after the process resumes."

nitwit005 · on June 14, 2016

EINTR is only "needed" because of the situation where a call is made to a blocking system call like read, and a signal is received before it completes. It needs to cancel the blocking call so that it can run the signal handler.

If you get rid of signals, you wouldn't need EINTR. Even with signals, what people do 99% of the time is just retry the system call immediately. It would probably have been much better to have it do so automatically by default, avoiding exposing EINTR to most users.

noselasd · on June 14, 2016

EINTR is retried automatically if you use sigaction() and set the SA_RESTART flag.

If you use signal() instead of sigaction() it depends on your OS whether EINTR is automatically retried - never use signal().

(There's a few exceptions still - partial completed syscalls, e.g. a signal can cause read() to return partial data or sleep() to return before it should - neither case gives you EINTR though)

adrianratnapala · on June 14, 2016

Yes, but how does the Plan9 "notify" mechanism help with any of that? It seems to just be a tidied up version of signals, and therefore "needs" to do EINTR in much the same way as Unix.

to3m · on June 14, 2016

More on what EINTR is for: http://250bpm.com/blog:12

reacweb · on June 14, 2016

Maybe, the signal coming from another process should be handled in an independent thread (maybe created the first time a signal is raised). Something would still be needed to interrupt pending system calls.

silvestrov · on June 13, 2016

I think signals are a "poor mans" implementation of threads and queues. They didn't have threads and queues back in the 70'ies, so they resorted to an ugly hack. Ditto with interruptable system calls.

It's somewhat like the interrupt-level callbacks classic MacOS (version 1-9) had for network and file i/o. They too didn't have threads back then, so interrupt-callbacks was the only way to avoid blocking calls which would make the UI hang while waiting for network requests.

mturmon · on June 14, 2016

This is an interesting perspective. I think of signals the other way around -- that they came out of people realizing how useful hardware interrupts could be in allowing devices to talk to the kernel, so they engineered a similar affordance (signals) to allow the kernel to talk to processes.

In the case of hardware interrupts (like for old keyboards), if you didn't grab that character off the hardware right away, it would vanish, or another one would take its place. As long as you were fast, it worked. Same with signals. As long as you don't try to do too much in the handler, it works. In other words, it was designed to be "good enough" if used properly (just like all hardware).

pvg · on June 14, 2016

I'm pretty sure they had threads and queues in the 70s. Or they wouldn't have bothered inventing primitives like semaphores in the 60s.

silvestrov · on June 14, 2016

No, threads is a rather late addition to unix. It came in the 90'ies. E.g. Sun Solaris got thread support in 1993. HP/UX got kerner threads in 1997.

adrianratnapala · on June 14, 2016

...so interrupt-callbacks was the only way to avoid blocking calls

I don't know how early MacOS works, but on Unix its not the _only_ way: that's what poll(), select() and the like are for.

And while threads their uses, they don't really change this aspect. Eventually the thread must report a result, and the best way to do that is to report into to some queue that gets polled somewhere.

cturner · on June 14, 2016

"on Unix its not the _only_ way"

Options were more limited in early unix. Use of signals as an adhoc IPC mechanism predates BSD sockets api (1983). Poll has something to do with SysV, so would have been later. There may have been other IPC mechanisms that were dropped before modern unix (but presumably dropped for good reasons). Art of Unix Programming refers to one called mx.

pjmlp · on June 14, 2016

No time to read it back again, but if I remember correctly not even POSIX defines portable semantics for signals across implementations, it always leaves some room for implementation specific behavior.

euske · on June 14, 2016

I found a paragraph in this article http://www.linusakesson.net/programming/tty/ very apt at describing what Unix signals are like:

  In *The Hitchhiker's Guide to the Galaxy*, Douglas Adams 
  mentions an extremely dull planet, inhabited by a bunch of 
  depressed humans and a certain breed of animals with sharp
  teeth which communicate with the humans by biting them very
  hard in the thighs. This is strikingly similar to UNIX, in
  which the kernel communicates with processes by sending
  paralyzing or deadly signals to them.

Animats · on June 13, 2016

Signals are like interrupts, and like interrupts, they're handled in an unusual environment. That's the main problem. You can be inside some nonreentrant library when a signal handler is called.

Most programs that do something complicated with signals generate an event in the signal handler and put it on a queue to be handled later. The queue should be lock-free, or there's a risk of deadlock.

ambrop7 · on June 14, 2016

Interrupts typically do not observably interrupt currently running code. In a simple system (e.g. embedded system with no threads, all event-driven), the interrupt handler will run, then the processor will go back to running whatever it was running before. This is not so for UNIX signals, in case you are in the middle of a system call, because the mere occurrence of the signal will change the behavior of the interrupted code.

Yes I know it's not the same because in my example single-threaded system there's no such thing as a blocking call.

Actually I don't see a good reason why signals in unix would have to cause EINTR errors in system calls. Perhaps a better solution would be to let the system call go on normally. Since the signal doesn't observably interrupt code not in a system call, why would it observably interrupt code in a system call?

In case anyone thinks, "so you can detect the signal in the main code", that is a bad answer because whatever you do you will have race conditions if the signal happens just before you enter the system call. Your only chance is to use things like ppoll() which are designed for proper signal handling, and these things could work just as well in a hypothetical unix design with no EINTR.

scottlamb · on June 14, 2016

> Actually I don't see a good reason why signals in unix would have to cause EINTR errors in system calls. Perhaps a better solution would be to let the system call go on normally. Since the signal doesn't observably interrupt code not in a system call, why would it observably interrupt code in a system call?

You can request this behavior via the SA_RESTART flag. I'm not sure if it applies to all syscalls (the Linux manpage suggests not). I'm also not sure if things that take a relative timeout internally subtract the elapsed time when interrupted; one could imagine that each signal snoozes the timer on nanosleep or select/poll/epoll_wait so that you oversleep.

gpderetta · on June 14, 2016

You can siglongjump out of a signal handler [1]. If you sigsetjump right before doing a blocking call, you can reliably detect signals.

Another way to avoid the race condition in poll/select, before p{poll/select} were standardized, was to store the timeout parameter in a global variable and have the signal handler set it to zero. Finally there is the self pipe trick, which admittedly doesn't require EINTR at all.

[1] This is historical unix behaviour. At one time it was specified by the SUS, but it seems that it was dropped from more recent SUS/Posix standards.

scottlamb · on June 14, 2016

> You can siglongjump out of a signal handler [1]. If you sigsetjump right before doing a blocking call, you can reliably detect signals.

The problem with that approach is that if the system call has already returned by the time the signal handler runs and jumps, the system call's return gets clobbered. So if for example you're doing blocking reads/writes, you don't know how many bytes you read or wrote.

If your only blocking syscall is level-driven polling this approach is fine but the self-pipe trick is easier.

I wrote (10 years ago) a library to do something similar reliably. It required custom wrappers for every system call of interest so I could know by the instruction pointer in the ucontext_t whether the system call had actually run yet or not. http://www.slamb.org/projects/sigsafe/ The library's a bit stale now; it doesn't do the vsyscall thing for example.

gpderetta · on June 14, 2016

Duh! You are right, losing the results of partial read/writes is not acceptable. I guess on x86, completely unportably, you could check whether the current ip is pointing to a syscall/int instruction.

scottlamb · on June 14, 2016

You probably want to jump if you're "just before" the syscall, too, though. So you end up with basically this:

syscall wrappers: https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-... (probably could do something better with that thread local; and as I mentioned this isn't using vsyscall)

signal handler: https://github.com/scottlamb/sigsafe/blob/master/src/x86_64-... (although it bugs me now that I iterate the whole array if the instruction pointer's not in any of the syscalls)

and just to be sure, a race checker: https://github.com/scottlamb/sigsafe/blob/master/tests/race_...

zAy0LfpBZLC8mAC · on June 14, 2016

> Another way to avoid the race condition in poll/select, before p{poll/select} were standardized, was to store the timeout parameter in a global variable and have the signal handler set it to zero.

That doesn't work. At some point, the userspace code has to copy the value from the global variable to the location where the system call calling convention expects it, some time later followed by execution of the system call trap (and usually there is even the libc system call wrapper in between the two that takes care of adapting the userspace calling convention to the system call calling convention). If your signal handler gets to run in between the two, the system call timeout will remain unchanged.

Or in other words: Yes, be afraid of signals, whatever clever scheme you come up with to handle them probably is wrong.

gpderetta · on June 14, 2016

It works with select because it takes a pointer to the timespec object, which is read kernel side [1]. I misremembered poll doing the same thing but it takes a plain integer parameter.

[1] not guaranteed of course, but it is historical Unix behaviour

Edit: btw, i do agree with your final sentence!

JdeBP · on June 14, 2016

... and, thence, SA_RESTART.

eric_the_read · on June 13, 2016

This is FoaF-level stuff, but:

I used to work with a guy who in a past life was an HP-UX dev. He told me that the guys who worked on the signals support in the OS had a 10-foot pole between their cubicles that had a flag on top reading: "You must be THIS tall to use signals."

vikiomega9 · on June 14, 2016

What's FoaF? (RDF?)

ludamad · on June 14, 2016

Friend of a friend

marios · on June 13, 2016

I've linked to this before, but AFAIK it's still relevant as the gotchas regarding signals haven't changed. Slides from a talk titled "Signal Handlers" from OpenBSD developer Henning Brauer : http://www.openbsd.org/papers/opencon04/index.html To answer the articles question: should you be scared of Unix signals ? No. But you shouldn't do anything complicated in the signal handlers.

wscott · on June 13, 2016

BitKeeper uses signals to implement a paging data structure from a compressed backing store. I allocate the memory for my data structure that is backed by a file on the disk and then use mprotect() to mark that memory as read-only. Later when trying to access that memory a signal handler traps the access and loads and decompresses the data from disk into memory.

This is only done for unix systems that implement the sigaction() POSIX signals. It is tricky to get right, but it does work.

BTW I did find I could never get OSX 10.4 to work correctly, but by 10.7 Apple had finally fixed the bugs in their signal code.

monk_the_dog · on June 14, 2016

Oodbs used a similar technique to translate addresses from a large "global" address space into smaller "local" one. Here is a paper if you're interested (ftp://ftp.cs.utexas.edu/pub/garbage/swizz.ps). Did you discover this technique on your own? At any rate, very cool.

blucoat · on June 14, 2016

>SIGSEGV is a very important signal. It happens when your program tries to access memory that it does not have. An appropriate reaction might be to

    allocate more memory
    read some data from disk into that memory
    do something with garbage collcetion (but what? I'm confused about this still.)

What? Are there any Real World Programs which do anything other than print a stacktrace and exit? I don't think this person gets what a segfault is.

kaushiks · on June 14, 2016

While I agree that it'd be unusual for regular applications to have to resort to using SEGVs to implement features, low level systems code, especially VMs often do, for performance reasons. The Hotspot JVM for instance uses SEGVs to force a thread into a safepoint. The JIT inserts a read instruction, among other places, at backward branches, which tries to read from a page in memory called the polling page. Said page is mapped during normal operation of the application. When the VM needs to bring threads to a "safe point", say to perform a GC, it does so by unmapping the polling page. This causes each of the active threads to fault on the read and enter the SEGV handler, which notices that the faulting address falls within the polling page and executes appropriate "safe point" actions. Libc implementations use a similar technique to commit pages for a thread's stack lazily.

barrkel · on June 14, 2016

Windows uses page faults in the stack guard page to lazily commit stack pages. Compilers allocating large structures on the stack need to generate loops touching each allocated page in turn to guarantee the allocation. On Windows the lazy allocation can be done entirely in user code - it doesn't need to be an OS feature. I believe pthreads uses the same technique on Linux; very far from sure though.

Generational GC can use segfaults to detect writes to older generations and mark pages that need scanning for references to younger allocations. They can also act as a way of triggering a safe point without polluting the branch prediction cache: unmap a page when you want an interrupt, and periodically touch the page in code that needs interrupting (loops etc.). Virtual machines for languages like Java can and do use these techniques.

hornetblack · on June 14, 2016

If you had a Green threaded program and one of the threads segfaulted. You would probably want to catch Segv and kill that thread. (Not killing the OS thread running it).

I've also seen is used to implement a distributed malloc. When a segfault occurs, the handler messages the programs peers asking if they have the data for that address. If so the peers sends the page and the handler maps in a new page for that address with the correct data in it. This is essentially implementing a page fault handler in user space. (For some network backed memory).

gpderetta · on June 14, 2016

Why would you only want to kill that green thread? On any thread implementation I'm aware of, an unhandled segfault kills the whole process. Anything else is disaster waiting to happen.

bediger4000 · on June 14, 2016

I've read that one of the original Unix shells (Thompson's or Bourne's) used a combination of sbrk()/brk() system calls and SIGSEGV to do dynamic memory allocation for itself. I can't find a reference to this via Google, as any information about old shells and SIGSEGV is swamped by modern people talking about bash and bad programs, or trapping SIGSEGV in scripts or some such. The "heirloom sh" code doesn't have anything like that, but it's clearly been tinkered with, as it uses sigaction(), a BSD innovation.

So, feel free to ignore this vague memory.

wyldfire · on June 13, 2016

As others have said, there's peril to be had there for sure, so tread carefully. Minimize the scope of your handler is the best advice, certainly also refer to "Async-signal-safe functions" in signal(7) if you must use libc funcs.

One challenge in a distributed system when there's (ab)use of signals is finding out which process issued a signal. There might be a better facility to do it now but I've used systemtap [1] to find out who the sender was with satisfactory results.

[1] https://sourceware.org/systemtap/examples/process/sig_by_pid...

js2 · on June 13, 2016

Except for SIGKILL. When you get sent SIGKILL nobody communicates with you, you just die immediately. But the rest of the signals you're allowed to install signal handlers for.

SIGSTOP also cannot be caught nor ignored.

wtf_is_frp · on June 13, 2016

or blocked.

ambrop7 · on June 14, 2016

The worst thing about signals is that they interrupt whatever system call the thread is currently inside (EINTR). This is not typically observed but can have dire consequences randomly. For example, last time I checked, in Python 2.7, a signal that invoked a signal handler will cause a running print() to throw an exception. Here you should consider signals like SIGCHLD which you want to handle and not kill the process.

A particular case this happens is when doing event-driven programming. The only way to be sure that you don't have such bugs lurking is to setup signal handling such that a signal cannot possibly interrupt unsuspecting code. Currently, I'm aware of two solutions, both involve blocking signals:

1. Block relevant signals in the main loop (or generally all threads) and use signalfd to detect and consume signals (or similar mechanisms on other platforms, e.g. kqueue).

2. Start a dummy thread whose only purpose is to handle signals, leave relevant signals unblocked in this thread and block them in all other threads. Write the signal handler to communicate the signal to your main loop via the self-pipe mechanism or similar.

Note that solution (2) can usually be implemented for an existing framework without changing that framework - you only need to add code to main which starts that thread then blocks signals, before any other threads are started.

I consider "fixing" code to be robust to signals a non-solution, because you would have to verify every single piece of code running in your program, including third-party libraries.

ptx · on June 14, 2016

> For example, last time I checked, in Python 2.7, a signal that invoked a signal handler will cause a running print() to throw an exception.

Python 3.5 fixes this – system calls are now automatically retried:

https://docs.python.org/3.5/whatsnew/3.5.html#pep-475-retry-...

0x0 · on June 14, 2016

I've never seen a safe signal handler beyond a SIGTERM that sets a "volatile int time_to_quit = 1" for a main loop to pick up on later...

JdeBP · on June 14, 2016

volatile int isn't necessarily safe. volatile sig_atomic_t would be, however.

bitwize · on June 13, 2016

YES. Especially when there are much better ways of handling OS interrupts to your program -- like Structured Exception Handling under Windows.

gpderetta · on June 14, 2016

The big problem with unix signals is that they have been abused to deliver some messages that should really be delivered via a message pipe (e.g. SIGCHD, all the terminal/tty specific signals). The other is that the set of signals is limited and signal handlers are a process (or thread) wide resource, so it is hard to make use of them in a library.

Other than that, the general ability of interrupting and delivering a message to a thread no matter what is doing is necessary and signals are a way to implement that. Exceptions are another way, but that can be implemented on top of signals.

edit: but there is really no excuse for EINTR. The "Worse is Better" essay has something to say about this.

known · on June 14, 2016

Consider this code highly experimental and yourself highly mental if you try and use it in a production environment

http://www.kegel.com/c10k.html#examples.nb.sigio

jasonzemos · on June 13, 2016

On linux, the behavior for locking a mutex (I tested a GNU C++11 std::mutex) from a signal handler is to consider it (at least for the interrupted thread) to have been unlocked. This allows intuitive synchronization from handlers and avoids deadlocks, which I'm assuming is facilitated in the kernel-futex design. If any kernel hackers want to chime in on why this works (and is safe (is safe?)) in the face of most unix specifications, generic docs and articles like this it may be enlightening.

gpderetta · on June 14, 2016

You are well into UB land. The behaviour you describe is very dangerous as the signal handler will be accessing the mutex protected data structure while it is in a potentially inconsistent state.

The right, portable way to signal from a signal handler are POSIX semaphores that on glibc are a thin wrapper over futexes. Any data structure access must be non blocking.

Edit: autocorrect

jasonzemos · on June 14, 2016

I should mention I was using x86. My initial assumption was that the kernel simply references the robust list (even if it was initially entirely resolved in userspace), and yields back to the interrupted thread -- I should emphasize my test showed the kernel breaks into the lock and presents a semi-coherent as-is structure entering and exiting the handler's lock. Of course this is all way way UB for portability or complex structures indeed...

0x0 · on June 14, 2016

Even on x86, you could have a pthread_mutex protecting a struct with two integers that need to be updated "atomically", and have a signal delivered in the middle?

jasonzemos · on June 14, 2016

Sure, and then the handler only sees one integer as updated and the other integer will be updated after the handler. The lock gets silently broken unfortunately, but there's probably a useful reason for why this is. It could just deadlock instead.

gpderetta · on June 14, 2016

Deadlocking would be the ideal outcome. Deadlocks are easy to debug. Silent concurrent memory corruptions not so much.

0x0 · on June 14, 2016

Well that's just plain broken.

scottlamb · on June 14, 2016

If the signal handler breaks the lock, I suspect it's more by accident than design. This is not a reasonable thing to depend on.

> presents a semi-coherent as-is structure entering and exiting the handler's lock

I don't understand the word "semi-coherent" in this sentence. Effectively, a signal handler interrupts some thread and doesn't allow it to proceed until the signal handler is finished.

In general, you could say there are three aspects to why locking is needed:

* hardware memory barriers. Here I think you're fine; really the signal handler runs on the thread in question so anything the interrupted thread did the signal handler sees on entry and likewise anything the signal handler did the interrupted thread sees on exit.

* simplifying your code (or the library code you're calling). Even ignoring CPU/compiler re-ordering, it's much saner to just write your program such that you guarantee certain invariants are held when the locks are not held and make no such guarantees while the locks are held. If you have to look at all possible interactions line-by-line or instruction-by-instruction, there are so many more combinations to test. It's not as hopeless for thread-vs-signal as with thread-vs-thread (O(instructions in critical section) vs O(2^instructions) orderings, as the signal handler always runs to completion instead of interleaving with its thread arbitrarily) but it's still plenty bad enough.

* compiler memory barriers. You're basically screwed here IIUC. Compilers can and do bizarre things such as reusing your variable to store something completely different. https://software.intel.com/en-us/blogs/2013/01/06/benign-dat...

Except as a last-ditch attempt at gathering debugging information on crash (where not perfectly reliable is likely acceptable), I'd say it's totally unreasonable to ever access mutex-protected state from a signal handler that can run while the thread it interrupts might hold the lock. Note this includes malloc and free. Don't call those from a signal handler. (I suppose you might get away with it if the interrupted thread never calls them or only does so with the signal handler blocked, but that'd be very unusual, and it's still not guaranteed to be safe according to the POSIX standard. I'm not sure off-hand what the kernel/libc might do that would mess this up but I wouldn't bet on it.)

As I mentioned in another comment, I think of signals as two separate things. Process-directed signals can be handled relatively simple without even requiring signal handlers. Thread-directed signals are tricky and you're doing them wrong.

0x0 · on June 13, 2016

Do you have a source for this? Are you talking about pthread_mutex_lock() ? Maybe I'm misreading you, but it sounds like you are saying it would be safe to treat a mutex as unconditionally unlocked inside a signal handler, which doesn't make much sense (what if a signal is delivered between _lock() and _unlock()? )

breadbox · on June 13, 2016

Also, signals become much easier to deal with if your program is single-threaded. Once threads get involved, it becomes more complex to know which thread(s) will receive a given signal.

signa11 · on June 14, 2016

> Once threads get involved, it becomes more complex to know which thread(s) will receive a given signal.

well, one approach that might be worth looking into would be to designate a special thread as a signal-handling-only thread. others just block every signal that can possibly be blocked. this signal-handling-thread then communicates the signals etc. to others as needed.

prima-facie, this boils down to signal handling for single threaded programs. what might be the downsides ?

scottlamb · on June 14, 2016

People say "signals" as if they're just one thing, but I find it more useful to break them into two categories:

* process-directed signals such as SIGHUP, SIGINT, SIGWINCH, SIGTERM, SIGQUIT, SIGCHLD. They come from outside the process, including the `kill` command, the init system, and the terminal. For these, a dedicated signal-handling thread is a common, practical approach. Even if your program is single-threaded before implementing signal handling, creating a new thread might be the best approach. Or you could integrate signal handling with an event loop via the self-pipe trick.

* thread-directed signals such as SIGSEGV, SIGFPE, SIGBUS (the preceding are all machine exceptions), SIGPIPE, SIGPROF, or anything sent by pthread_kill / pthread_sigqueue. If you need to handle these signals (usually for diagnostics), by definition you have to do it in the thread in question. And you almost certainly need a traditional signal(2) / sigaction(2) style signal handler.

jhallenworld · on June 14, 2016

When I first started programming UNIX I thought SIGIO made sense as a good mechanism for I/O multiplexing. I thought this because of much previous experience with interrupt handlers in the embedded world. However at the time, it just did not work (no sigsuspend) and even today it's a big mess. SIGIO should just be removed- it's the wrong way to handle I/O in UNIX.

ausjke · on June 14, 2016

I think these days we all should use sigaction instead of signal()? why is this totally missed in the post.

http://stackoverflow.com/questions/231912/what-is-the-differ...

zAy0LfpBZLC8mAC · on June 14, 2016

I would say: Oh, yes, be as afraid as you can be. But don't let that stop you from figuring out why it is perfectly rational to be afraid of signals ;-)

Now, it is not impossible to use signals, but there are many opportunities to screw it up, often in non-obvious ways (so, things seem to work, but it's not actually reliable). And at the same time, signals almost never give you any advantage over alternatives if you do it correctly. That profiler thingy might be one of the rare cases where it actually makes sense.

In particular, what tends to be so tempting about signals is that they are executed "immediately", so you get to react without any further delay, no matter what else your program is currently doing--who wouldn't want that? Except that doesn't actually work, because you need to somehow access the state of your program in order to actually do anything useful with the signal notification. But you cannot access that state unless you can be sure it's in a consistent state and that your accesses won't interfere with what your program is doing in some unpredictable way. Just like in multithreaded programming. You have to somehow coordinate with your program to make sure things happen in an orderly fashion. Which essentially means that you only can access the program's state at certain times when the program isn't currently using it. Like, when it is unlocked. Except the signal handler potentially preempts your program, so you can't use locks to perform the coordination, because that could deadlock. Except if you were to use locking primitives that also block signals, so that preemption during critical sections can't occur. But then, you effectively have a weird polling solution (the unlocking at the end of the critical section/before entering the event loop dispatcher effectively acts as if you were polling for signal events).

Also, you cannot even reliably queue signals without potentially dropping some. Now, the kernel does that anyhow, so you can't rely on all signals being delivered individually anyhow, but it still is important to understand why that is (which is also why the kernel behaves the way it does): If you consume events, you have to either have some mechanism to slow down the source to prevent it from generating events at a higher rate than you can handle (like, if you can't keep up reading from a pipe, the writer end of that pipe will block in order to stop it from producing more data), or you would need potentially infinite amounts of memory to be able to store all those events for later processing. Now, the latter isn't really possible, of course - but it's even worse in signal handlers because you cannot really allocate memory there because there is only one memory allocator in the libc and that most definitely is not reentrant (like, you cannot allocate memory right in the middle of your thread freeing memory).

What this boils down to is that you always have to somehow defer processing of signals to some point in time where you can actually safely access your program's state, which is something that you can achieve with pipes and sockets much more easily.

jfoutz · on June 14, 2016

This is one of the great counter examples to betteridge's law.

puppers · on June 13, 2016

Signals are not scary if you read the docs! Perhaps you could tone it down with the exclamation points!