There are times when your software is failing but it doesn't output any error messages. Maybe you are getting a 500 error when you load a cgi script but the log is not telling you why. Strace can often help in situations like this.
Strace does not trace the internals of your application, it only knows about syscalls, or calls into the kernel. This can be enough in many cases to show you when your application might be trying to look for a file that is not there, or when it cannot allocate memory or some other error.
Generally I call strace with option -f so it will follow child processes, and then the name of the command to trace. To save the output to a file, use -o.
Code:
[josephw@myhost ~]$ strace -f -o strace.out ls /
bin dev home lib media mnt proc sbin space sys usr
boot etc initrd lost+found misc opt root selinux srv tmp var
[josephw@myhost ~]$ less strace.out
6368 execve("/bin/ls", ["ls", "/"], [/* 17 vars */]) = 0
6368 uname({sys="Linux", node="dss03.spry.com", ...}) = 0
6368 brk(0) = 0x8cc5000
6368 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
6368 open("/etc/ld.so.cache", O_RDONLY) = 3
6368 fstat64(3, {st_mode=S_IFREG|0644, st_size=36486, ...}) = 0
6368 old_mmap(NULL, 36486, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f8f000
6368 close(3) = 0
6368 open("/lib/tls/librt.so.1", O_RDONLY) = 3
....output trimmed....
The first column shows the process id, and then you see the syscall and its result. Here it opens /etc/ld.so.cache in readonly mode, and the result is file descriptor 3. After this it calls fstat64 and old_mmap on file descriptor 3. Then it closes file descriptor 3. Once it has been closed, it can be reused as you see in the next call where it opens /lib/tls/librt.so.1.
In cases where you want to trace a process that is already running, pass the -p option to strace. I will run strace on Apache.
Code:
[josephw@myhost ~]$ ps aux |grep http
apache 6457 0.0 0.5 18764 5456 ? S Oct31 0:00 /usr/sbin/httpd
apache 6498 0.0 0.5 18764 5464 ? S Oct31 0:00 /usr/sbin/httpd
apache 6500 0.0 0.5 18768 6000 ? S Oct31 0:00 /usr/sbin/httpd
apache 6505 0.0 0.5 18768 6080 ? S Oct31 0:00 /usr/sbin/httpd
apache 6507 0.0 0.5 18764 5464 ? S Oct31 0:00 /usr/sbin/httpd
Here I have the process IDs for apache running on my box. Which one to trace? Pick one, then keep reloading your web site until the process you are tracing is the one that serves up your page. If the process does not belong to you, you will probably need to be root in order to trace it.
Code:
[root@myhost ~]# strace -f -o strace.out -p 6457 &
[root@myhost ~]# tail -f strace.out
....output trimmed....
21361 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT
(No such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2", 0xbfeca780) = -1 ENOENT (No su
ch file or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No
such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686", 0xbfeca780) = -1 ENOENT (No such fi
le or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No
such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2", 0xbfeca780) = -1 ENOENT (No such fi
le or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls", 0xbfeca780) = -1 ENOENT (No such file or
directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No
such file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2", 0xbfeca780) = -1 ENOENT (No such f
ile or directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686", 0xbfeca780) = -1 ENOENT (No such file o
r directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
21361 stat64("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2", 0xbfeca780) = -1 ENOENT (No such file o
r directory)
21361 open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so", O_RDONLY) = 3
....output trimmed....
I ran strace in the background by using & so I could view the output of strace.out with tail -f. This way I could know when my process was the one that was doing the work. Then I reloaded my site about 5 times until it cycled around to process 6457.
In the beginning of the output shown, it may look like there is an error, because it keeps showing "No such file or directory" while trying to load libperl.so. But you can see that it is looking for the library in multiple paths and eventually it does find it in the directory /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/. In this case, the page was loaded without any errors. I will do it again, but this time I will try to load a page that does not exist, in this case
http://10.2.4.15/test.
Code:
....output trimmed....
6457 lstat64("/usr/local/www/test", 0xbffb3104) = -1 ENOENT (No such file or directory)
....output trimmed....
6457 read(18, 0xbffb3270, 512) = -1 ECONNRESET (Connection reset by peer)
6457 close(18) = 0
6457 read(5, 0xbffb347f, 1) = -1 EAGAIN (Resource temporarily unavailable)
6457 semop(7307267, 0x1c9740, 1 <unfinished ...>
It tried to find /usr/local/www/test but could not so it failed in the end.