Strace - The Sysadmin's Microscope

jrappleye · on Jan 6, 2012

It's also worth looking at SystemTap or DTrace, depending on what OS you're running. While strace lets you look at an individual process and its children, SystemTap/DTrace let you gather data on system call usage (and then some) system-wide. Some examples:

- monitor execve() calls system-wide

- monitor I/O to a specific file, from any process

- measure per-process network usage

(note that newer Linux kernels may have other ways of accomplishing some of these tasks that I'm not aware of).

I've had a lot of success using SystemTap to look at low-level filesystem performance issues in the kernel. We've run SystemTap scripts on our production filesystem servers for over a year with no problems whatsoever.

Edit: formatting

JoshTriplett · on Jan 6, 2012

On a current Linux system, you can monitor several of the items you mention using "perf".

yan · on Jan 5, 2012

Also very useful: ltrace.

ltrace is like strace, but does library calls instead of system calls.

minimax · on Jan 5, 2012

strace tip #31415927: If your program is I/O bound, sometimes you can improve performance by increasing the size of the read buffer. A bigger buffer means fewer system calls and potentially increased performance. How do you know how big the read buffer is? Sometimes it's hard to tell even if you have the source (e.g. when you're using stdio). With strace you can see the number of bytes you're trying to slurp in with each read system call. If it looks like a small number, you can then go figure out how to make it a bigger number, perhaps using setvbuf or rolling your own buffered I/O.
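A rough Python illustration of the effect, not stdio or setvbuf itself: here io.BufferedReader plays the role of the stdio buffer, and CountingRaw and raw_reads are made-up names, with CountingRaw standing in for the read system call so the calls can be counted without strace.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts readinto() calls.

    Hypothetical stand-in for a file descriptor: each readinto() call
    corresponds to one read(2) you would see in strace output.
    """

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def raw_reads(total_bytes, buffer_size, request_size=64):
    """Read total_bytes in small pieces; return how many raw reads happened."""
    raw = CountingRaw(bytes(total_bytes))
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    while reader.read(request_size):
        pass
    return raw.calls
```

Comparing raw_reads(1 << 20, 4096) with raw_reads(1 << 20, 65536) should show far fewer underlying reads for the larger buffer, even though the program still asks for 64 bytes at a time.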

dap · on Jan 6, 2012

Yes, buffer size can have a significant effect on performance. You can quickly see buffer sizes used by "read" across your system with "dtrace -n 'syscall::read:entry{ @ = quantize(arg2); }'", which summarizes the output (in case you're doing more of these than you can reasonably see in the console) and has significantly less impact on the program you're tracing than strace does. Output for my system:

           value  ------------- Distribution ------------- count    
               0 |                                         0        
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1278     
               2 |                                         0        
               4 |                                         0        
               8 |                                         0        
              16 |                                         0        
              32 |                                         0        
              64 |                                         0        
             128 |                                         0        
             256 |                                         0        
             512 |                                         1        
            1024 |                                         0        
            2048 |@                                        20       
            4096 |                                         2        
            8192 |                                         0        
           16384 |                                         0        
           32768 |                                         0        
           65536 |                                         3        
          131072 |                                         2        
          262144 |                                         1        
          524288 |                                         0

and if you want to know where, say, the 512-byte reads are coming from:

# dtrace -n 'syscall::read:entry/arg2 == 512/{ @[ ustack() ] = count(); }'

  mdworker                                          
              libSystem.B.dylib`read+0xa
              Foundation`-[NSConcreteFileHandle readDataOfLength:]+0x1d6
              RichText`GetMetadataForURL+0x338
              mdworker`0x100006a66
              mdworker`0x100009ec1
              libSystem.B.dylib`_pthread_start+0x14b
              libSystem.B.dylib`thread_start+0xd
                1
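If you only have strace output to work with, a similar power-of-two summary can be approximated after the fact. A minimal Python sketch, assuming lines in the usual strace form read(fd, buf, size) = ret. The names bucket and read_size_histogram are made up for illustration, and unlike the DTrace one-liner this does nothing to reduce tracing overhead:

```python
import re
from collections import Counter

# Matches e.g.:  read(3, "\177ELF"..., 832) = 832
# Assumption: the requested size is the last argument before ") = ret".
READ = re.compile(r"\bread\(\d+[^,]*, .*, (?P<size>\d+)\)\s+= -?\d+")


def bucket(n):
    """Largest power of two <= n (0 stays 0), like a quantize() row label."""
    return 0 if n == 0 else 1 << (n.bit_length() - 1)


def read_size_histogram(lines):
    """Count requested read sizes per power-of-two bucket."""
    hist = Counter()
    for line in lines:
        m = READ.search(line)
        if m:
            hist[bucket(int(m["size"]))] += 1
    return dict(sorted(hist.items()))
```

Feeding it the lines of a file written with strace -o gives a dict mapping each bucket to a count, which you can print however you like.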

spullara · on Jan 6, 2012

For those of you on a Mac, which doesn't have strace but does have dtrace, here are some preinstalled dtrace scripts you have at your fingertips:

- dtruss - similar to strace

- opensnoop - all files

- nettop - all network access

- iosnoop/iotop - all io

- execsnoop - all new processes

- errinfo - all system calls resulting in errors

Not exactly the same info, but I think much more powerful, as it is system-wide and you can always filter out what you don't need to know, or write your own scripts!

dicroce · on Jan 5, 2012

I use strace all the time (and ltrace too)... I honestly don't know how people get by without it...

BTW, the Windows equivalent is called "Process Monitor" from the SysInternals guy...

upwardbound · on Jan 5, 2012

What are the args to make Mac OS's "dtrace" behave like "strace" does when given no options?

bodyfour · on Jan 6, 2012

Look at dtruss.

Unfortunately you need to "sudo" to use it, since dtrace needs root permissions.

dekz · on Jan 6, 2012

Helped me troubleshoot why sudo was taking 25+ seconds yesterday. Apparently it was timing out attempting to perform some NIS operations on a misconfigured setup.

Maxious · on Jan 6, 2012

I've had this issue too. Apparently it used to fail completely if the hostname lookup failed, even if your sudoers didn't mention hostnames at all: https://bugs.launchpad.net/ubuntu/+source/sudo/+bug/32906

tocomment · on Jan 6, 2012

Does anyone know of anything that will parse the output of strace or dtrace? What I'd like to do is generate a graph of my pipeline showing which program calls which other program, how long it takes to get back, which files it uses, etc.

I think it would be a great visualization to go along with my documentation.

jrappleye · on Jan 6, 2012

A little Googling turned up this (definitely not the original source, though):

https://github.com/CyanogenMod/android_external_strace/blob/...

  $ strace  -tt -q -o graph.strace -s 100 -f bash -c 'ls |wc -l'
  $ ./strace-graph graph.strace 
  bash -c ls |wc -l
    +-- ls
    `-- wc -l

The comments in the code claim it will show elapsed time for each process, but that's not working for me.

I discovered the Python ptrace module while I was searching for this. I have a project for which modifying the Python module might be a nice alternative to parsing strace output.
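For the parsing route, here is a rough Python sketch of pulling a process tree out of strace -f -o output. It assumes each line starts with the pid (optionally followed by a -tt timestamp), that children show up as the return value of clone/fork/vfork, and that programs show up as successful execve calls. process_tree and the regexes are illustrative names, not part of strace or strace-graph, and the sketch ignores timing and file usage entirely:

```python
import re
from collections import defaultdict

# "<pid> [HH:MM:SS.usec] <rest of line>"; the timestamp appears with -tt.
LINE = re.compile(r"^(?P<pid>\d+)\s+(?:\d\d:\d\d:\d\d(?:\.\d+)?\s+)?(?P<rest>.*)$")
# Successful fork-like call (possibly a "<... clone resumed>" line) returning a pid.
FORK = re.compile(r"^(?:<\.\.\. )?(?:clone3?|v?fork)\b.*=\s+(?P<child>\d+)$")
# Successful execve; the first quoted argument is the program path.
EXEC = re.compile(r'^execve\("(?P<path>[^"]*)".*=\s+0$')


def process_tree(lines):
    """Return (children, programs): pid -> [child pids], pid -> exec'd path."""
    children = defaultdict(list)
    programs = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        pid, rest = int(m["pid"]), m["rest"]
        fork = FORK.match(rest)
        if fork:
            children[pid].append(int(fork["child"]))
            continue
        ex = EXEC.match(rest)
        if ex:
            # A later execve in the same pid replaces the earlier program.
            programs[pid] = ex["path"]
    return dict(children), programs
```

From the two dicts it is a short step to emitting Graphviz edges or an indented tree like the one strace-graph prints.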

JonnieCache · on Jan 6, 2012

OS X has a fancy GUI for this called Instruments.

(Obviously the cmdline tools are there as well.)

bingbing · on Jan 6, 2012

Instruments is a frontend to dtrace. I think the closest equivalent to strace on OS X is dtruss.

Additionally, OS X has several useful stock dtrace scripts; check 'apropos dtrace' for a listing.

onedognight · on Jan 5, 2012

Does anyone know of a usable (no sudo needed) strace for Darwin / OS X?

spullara · on Jan 6, 2012

With dtrace you don't run your program as root. Dtrace runs as root. It isn't a launcher like strace.

onedognight · on Jan 7, 2012

I'm referring to dtruss, which is a launcher like strace and is built on dtrace. The dtruss "interface" predates dtrace, but historically didn't need to be run as root.

1amzave · on Jan 6, 2012

This has bugged me for quite a while, a few days ago enough so that I actually grabbed the strace source code with the intent of compiling it on OSX, at which point I discovered the real reason: ptrace on OSX doesn't support the PTRACE_SYSCALL flag. (I gave up at that point.)

brown9-2 · on Jan 6, 2012

This is a tangent but why is sudo needed != usable?

onedognight · on Jan 6, 2012

I cannot think of a single time when I'm debugging an errant program that I would like to have it run as root. I can, however, think of many things, like debugging permission issues, resource limit issues, reading and writing files, etc., where it makes a big difference to run as root. I know you can su and then su back to yourself, but that's a pain and things aren't exactly the same anymore, i.e. there is a usability problem.

seanp2k2 · on Jan 6, 2012

You can send your complaints to Apple and hope that they fix it, or use free software that doesn't do stupid things like this. IMO, you kind of forfeited your right to complain when you started using non-free software.

jrappleye · on Jan 6, 2012

Unfortunately, not all organizations give their users root access :-(

dedward · on Jan 6, 2012

Usually you debug code with gdb and the like. dtrace is more for profiling the whole setup and figuring out where to optimize. Plus you can, I believe, instrument your code with dtrace probes, which is even better. If a company wants their product optimized, they'll have to provide the tools for it.