
STRACE

Section: User Commands (1)
Updated: 202-0-05

NAME

strace - trace system calls and signals

SYNOPSIS

strace [ -ACdffhiqqrtttTvVwxxyyYzZ ] [ -a column ] [ -b execve ] [-e expr]... [ -I n ] [ -o file ] [ -O overhead ] [-p pid]... [-P path]... [ -s strsize ] [ -S sortby ] [ -U columns ] [ -X format ] [ --seccomp-bpf ] [ --syscall-limit=limit ] [ --tips[=format] ] { -p pid | [ -DDD ] [-E var[=val]]... [ -u username ] command [args] }

strace -c [ -dfwzZ ] [ -b execve ] [-e expr]... [ -I n ] [ -O overhead ] [-p pid]... [-P path]... [ -S sortby ] [ -U columns ] [ --seccomp-bpf ] [ --syscall-limit=limit ] [ --tips[=format] ] { -p pid | [ -DDD ] [-E var[=val]]... [ -u username ] command [args] }

strace --tips[=format]

DESCRIPTION

In its simplest use case, strace runs the specified command until it exits. It intercepts and records the system calls made by a process and the signals a process receives. The name of each system call, its arguments, and its return value are printed to standard error or to the file specified with the -o option.

strace is a useful diagnostic, instructional, and debugging tool. System administrators, diagnosticians, and troubleshooters will find it invaluable for solving problems with programs for which source code is not readily available, as recompilation is not required for tracing. Students, hackers, and the overly-curious will discover that a great deal can be learned about a system and its system calls by tracing even ordinary programs. Programmers will find that since system calls and signals occur at the user/kernel interface, a close examination of this boundary is very useful for bug isolation, sanity checking, and attempting to capture race conditions.

Each line in the trace contains the system call name, followed by its arguments in parentheses and its return value. An example from tracing the command "cat /dev/null" is:

open("/dev/null", O_RDONLY) = 3

Errors, typically indicated by a return value of -1, have the errno symbol and error string appended.

open("/foo/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
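For post-processing saved traces, the line shape just described can be split with a regular expression. That shape is: name, parenthesised arguments, "=", return value, and an optional errno symbol and message. The following Python sketch is not part of strace. parse_line and Call are hypothetical names. It assumes plain single-call lines without "[pid N]" prefixes, timestamps, or <unfinished ...> markers.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical parser (not part of strace) for: name(args) = ret [ERRNO (message)]
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\))?"
)


class Call(NamedTuple):
    name: str
    args: str            # raw argument text, not decoded any further
    ret: str             # kept as text: may be "?", decimal, or hex
    errno: Optional[str]
    msg: Optional[str]


def parse_line(line: str) -> Optional[Call]:
    """Split one complete strace line; return None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Call(m["name"], m["args"], m["ret"], m["errno"], m["msg"])
```

Lines such as "+++ killed by SIGINT +++" do not match, so parse_line returns None for them and callers can simply skip them.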

Signals are printed as a signal symbol and a decoded siginfo structure. An excerpt from tracing and interrupting the command "sleep 666" is:

sigsuspend([] <unfinished ...>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=...} ---
+++ killed by SIGINT +++

If a system call is being executed while another is called from a different thread or process, strace will attempt to preserve the order of these events and mark the ongoing call as unfinished. When the call returns, it will be marked as resumed.

[pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 28779] clock_gettime(CLOCK_REALTIME, {tv_sec=1130322148, tv_nsec=3977000}) = 0
[pid 28772] <... select resumed> )      = 1 (in [3])
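When post-processing -f output, the two halves of an interrupted call can be joined again by PID. Below is a hypothetical Python sketch (stitch is not part of strace). It assumes every line carries the "[pid N]" prefix shown above. The rejoined call is emitted at its resume point, so it appears after any calls that ran in between.

```python
import re

# Hypothetical helper (not part of strace); assumes "[pid N]" prefixes as with -f.
UNFINISHED = re.compile(r"^\[pid\s+(\d+)\] (.*) <unfinished \.\.\.>$")
RESUMED = re.compile(r"^\[pid\s+(\d+)\] <\.\.\. \w+ resumed>(.*)$")


def stitch(lines):
    """Rejoin "<unfinished ...>" / "<... resumed>" halves, keyed by PID."""
    pending = {}   # pid -> text of the call before it was interrupted
    out = []
    for line in lines:
        m = UNFINISHED.match(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = RESUMED.match(line)
        if m and m.group(1) in pending:
            head = pending.pop(m.group(1))
            out.append(f"[pid {m.group(1)}] {head}{m.group(2).lstrip()}")
            continue
        out.append(line)  # ordinary line, or a resume with no known start
    return out
```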

The interruption of a (restartable) system call by a signal delivery is handled differently, as the kernel terminates the system call and arranges for its immediate re-execution after the signal handler completes.

read(0, 0x7ffff72cf5cf, 1)              = ? ERESTARTSYS (To be restarted)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 0
read(0, "", 1)                          = 0

Arguments are printed in symbolic form with passion. This example shows the shell performing ">>xyzzy" output redirection:

open("xyzzy", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3

Here, the second and third arguments of open(2) are decoded by breaking down the flag argument into its three bitwise-OR constituents and printing the mode value in octal, following tradition. Where traditional or native usage differs from ANSI or POSIX, the latter forms are preferred. In some cases, strace output has proven to be more readable than the source code itself.
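The flag decoding just described amounts to splitting an integer into two parts. The access mode occupies the low two bits and is a value rather than a flag; the remaining bits are OR-ed flags. Below is a rough Python sketch using the host's os.O_* constants. decode_open is a hypothetical helper. Its flag table is a small illustrative subset and its print order is an assumption. It also assumes 3 as the Linux value of O_ACCMODE.

```python
import os

O_ACCMODE = 0o3  # assumed Linux value: the access mode lives in the low two bits

ACCESS_MODES = {os.O_RDONLY: "O_RDONLY", os.O_WRONLY: "O_WRONLY", os.O_RDWR: "O_RDWR"}

# Small illustrative subset; strace's real flag table is much longer.
FLAG_BITS = [
    (os.O_APPEND, "O_APPEND"),
    (os.O_CREAT, "O_CREAT"),
    (os.O_EXCL, "O_EXCL"),
    (os.O_TRUNC, "O_TRUNC"),
    (os.O_NONBLOCK, "O_NONBLOCK"),
]


def decode_open(flags, mode=None):
    """Hypothetical helper: render open(2) flags (and a mode) symbolically."""
    access = flags & O_ACCMODE
    parts = [ACCESS_MODES.get(access, hex(access))]
    rest = flags & ~O_ACCMODE
    for bit, name in FLAG_BITS:
        if rest & bit:
            parts.append(name)
            rest &= ~bit
    if rest:
        parts.append(hex(rest))  # bits this table does not know stay numeric
    text = "|".join(parts)
    if mode is not None:
        text += f", 0{mode:o}"   # modes are shown in octal, e.g. 0666
    return text
```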

Structure pointers are dereferenced, and their members are displayed as appropriate. In most cases, arguments are formatted in the most C-like fashion possible. For example, the essence of the command "ls -l /dev/null" is captured as:

lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}) = 0

Notice how the struct stat argument is dereferenced and how each member is displayed symbolically. In particular, observe how the st_mode member is carefully decoded into a bitwise-OR of symbolic and numeric values. Also note that in this example, the first argument to lstat(2) is an input to the system call and the second argument is an output. Since output arguments are not modified if the system call fails, arguments may not always be dereferenced. For example, retrying the "ls -l" example with a non-existent file produces the following line:

lstat("/foo/bar", 0xb004) = -1 ENOENT (No such file or directory)

In this case, the porch light is on but nobody is home. The pointer's value is displayed because the structure it points to was not populated due to the error.
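The st_mode and st_rdev renderings in the lstat example can be approximated with Python's stat and os modules. This sketch is illustrative only. decode_st_mode and decode_rdev are hypothetical helpers, and they cover just the two members shown.

```python
import os
import stat

# Hypothetical helpers (not part of strace) mimicking the st_mode/st_rdev output.
FILE_TYPES = {
    stat.S_IFREG: "S_IFREG", stat.S_IFDIR: "S_IFDIR", stat.S_IFCHR: "S_IFCHR",
    stat.S_IFBLK: "S_IFBLK", stat.S_IFIFO: "S_IFIFO", stat.S_IFLNK: "S_IFLNK",
    stat.S_IFSOCK: "S_IFSOCK",
}


def decode_st_mode(mode):
    """File type symbolically, permission bits in octal: e.g. S_IFCHR|0666."""
    ftype = stat.S_IFMT(mode)
    name = FILE_TYPES.get(ftype, hex(ftype))
    return f"{name}|0{stat.S_IMODE(mode):o}"


def decode_rdev(rdev):
    """Major and minor device numbers in the makedev(0x.., 0x..) form."""
    return f"makedev({os.major(rdev):#x}, {os.minor(rdev):#x})"
```

On Linux, decode_st_mode(os.lstat("/dev/null").st_mode) will typically produce the same S_IFCHR|0666 form, although permissions on /dev/null can differ between systems.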

System calls unknown to strace are printed in a raw format, with the hexadecimal system call number prefixed with "syscall_":

syscall_0xbad(0x1, 0x2, 0x3, 0x4, 0x5, 0x6) = -1 ENOSYS (Function not implemented)

Character pointers are dereferenced and printed as C strings. Non-printing characters in strings are normally represented by standard C escape codes. Only the first strsize (32 by default) bytes of strings are printed; longer strings have an ellipsis appended following the closing quote. Here is a line from "ls -l" where the getpwuid(3) library routine is reading the password file:

read(3, "root::0:0:System Administrator:/"..., 1024) = 422
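String truncation and escaping can be sketched as follows. format_cstring is a hypothetical helper. It always writes three octal digits, which keeps the output unambiguous when a digit follows the escape. strace's own choice of digit count may differ.

```python
# Hypothetical helper (not part of strace). Bytes with a symbolic C escape;
# the quote and backslash are escaped too so the output stays unambiguous.
SYMBOLIC = {
    ord("\t"): "\\t", ord("\n"): "\\n", ord("\v"): "\\v",
    ord("\f"): "\\f", ord("\r"): "\\r",
    ord('"'): '\\"', ord("\\"): "\\\\",
}


def format_cstring(data: bytes, strsize: int = 32) -> str:
    """Quote the first strsize bytes; append "..." if anything was cut off."""
    out = []
    for b in data[:strsize]:
        if b in SYMBOLIC:
            out.append(SYMBOLIC[b])
        elif 32 <= b < 127:
            out.append(chr(b))
        else:
            out.append(f"\\{b:03o}")  # always three octal digits in this sketch
    text = '"' + "".join(out) + '"'
    if len(data) > strsize:
        text += "..."
    return text
```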

While structures are annotated using curly braces, pointers to basic types and arrays are printed using square brackets with commas separating the elements. Here is an example from the command id(1) on a system with supplementary group IDs:

getgroups(32, [100, 0]) = 2

On the other hand, bit-sets are also shown using square brackets, but set elements are separated only by a space. Here is the shell preparing to execute an external command:

sigprocmask(SIG_BLOCK, [CHLD TTOU], []) = 0

Here, the second argument is a bit-set of two signals, SIGCHLD and SIGTTOU. In some cases, the bit-set is so full that printing the unset elements is more useful. In that case, the bit-set is prefixed by a tilde, like this:

sigprocmask(SIG_UNBLOCK, ~[], NULL) = 0

Here, the second argument represents the full set of all signals.
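The two bit-set notations can be reproduced with Python's signal module. format_sigset is a hypothetical helper. Its rule of inverting when more than half of all signals are set is an assumption chosen for illustration, not strace's documented threshold.

```python
import signal


def sig_name(n):
    """Signal name without the SIG prefix, as used inside strace's sets."""
    try:
        return signal.Signals(n).name[3:]
    except ValueError:
        return str(n)  # numbers Python has no name for (some real-time signals)


def format_sigset(sigs):
    """Hypothetical helper: print a signal set as [A B] or as ~[complement]."""
    universe = set(range(1, signal.NSIG))
    members = set(sigs) & universe
    # Assumed rule: when more than half of all signals are set, print the
    # complement with a "~" prefix instead.
    if len(members) > len(universe) // 2:
        return "~[" + " ".join(sig_name(n) for n in sorted(universe - members)) + "]"
    return "[" + " ".join(sig_name(n) for n in sorted(members)) + "]"
```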

OPTIONS

General

-e expr

Modifies which events to trace or how to trace them by specifying a qualifying expression. The format of the expression is:

[qualifier=][!]value[,value]...

where qualifier is one of trace (or t), trace-fds (or trace-fd or fd or fds), abbrev (or a), verbose (or v), raw (or x), signal (or signals or s), read (or reads or r), write (or writes or w), fault, inject, status, quiet (or silent or silence or q), decode-fds (or decode-fd), decode-pids (or decode-pid), or kvm, and value is a qualifier-dependent symbol or number. The default qualifier is trace. Using an exclamation mark negates the set of values. For example, -e open is equivalent to -e trace=open, which in turn means trace only the open system call. By contrast, -e trace=!open means to trace every system call except open. In addition, the special values all and none may be used to trace every event or no events, respectively.

Note that some shells use the exclamation mark for history expansion even inside quoted arguments. In that case, the exclamation mark must be escaped with a backslash.
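Below is a minimal Python sketch of how such an expression splits into qualifier, negation, and values. parse_expr is hypothetical and includes only a subset of the qualifier aliases listed above. Qualifiers whose values have their own syntax (inject, fault, and so on) are deliberately rejected.

```python
# Hypothetical helper (not part of strace): split an -e expression into
# (qualifier, negated, values). Only a subset of the aliases is included.
QUALIFIER_ALIASES = {
    "trace": "trace", "t": "trace",
    "trace-fds": "trace-fds", "trace-fd": "trace-fds", "fd": "trace-fds", "fds": "trace-fds",
    "abbrev": "abbrev", "a": "abbrev",
    "verbose": "verbose", "v": "verbose",
    "raw": "raw", "x": "raw",
    "signal": "signal", "signals": "signal", "s": "signal",
    "read": "read", "reads": "read", "r": "read",
    "write": "write", "writes": "write", "w": "write",
    "quiet": "quiet", "silent": "quiet", "silence": "quiet", "q": "quiet",
    "status": "status",
}


def parse_expr(expr: str):
    qualifier, sep, rest = expr.partition("=")
    if not sep:                      # no "=": the default qualifier is trace
        qualifier, rest = "trace", expr
    canonical = QUALIFIER_ALIASES.get(qualifier)
    if canonical is None:
        raise ValueError(f"unsupported qualifier in this sketch: {qualifier}")
    negated = rest.startswith("!")   # "!" negates the whole value set
    values = rest[1:] if negated else rest
    return canonical, negated, values.split(",")
```

When such an expression is passed to strace through a Python argument list (for example with subprocess.run), no shell is involved. The history-expansion caveat above therefore does not arise there.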

Startup

-E var=val, --env=var=val: Runs the command with the environment variable var=val set for execution.
-E var, --env=var: Removes var from the inherited environment variables before executing the command.
-p pid, --attach=pid: Attaches to the process with the process ID pid and begins tracing. The trace may be terminated at any time by a keyboard interrupt signal (CTRL-C). strace will respond by detaching itself from the traced processes, leaving them to continue running.
  Multiple -p options can be used to attach to several processes in addition to the command, which is optional if at least one -p option is given.
  A single -p option can accept multiple process IDs separated by a comma (','), space (' '), tab, or newline. Consequently, syntaxes like -p "$(pidof PROG)" and -p "$(pgrep PROG)" are supported.
-u username, --user=username: Runs the command with the user ID, group ID, and supplementary groups of username. This option is only useful when running as root, as it enables the correct execution of setuid and/or setgid binaries. Unless this option is used, setuid and setgid programs are executed without their effective privileges.
-u UID:GID, --user=UID:GID: Alternative syntax where the program is started with exactly the given user and group IDs and an empty list of supplementary groups. In this case, user and group name lookups are not performed.
--argv0=name: Sets the executed command's argv[0] to name. This is useful for tracing multi-call executables that interpret argv[0], such as busybox or kmod.
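The multi-PID form of -p can be mimicked when preparing an argument list from a script. parse_pid_list is a hypothetical helper that splits on the four separators named for -p. Treating runs of separators as one is an assumption.

```python
import re


def parse_pid_list(arg: str) -> list:
    """Hypothetical helper: split a -p argument on commas, spaces, tabs, newlines."""
    # Assumption: runs of separators count as one; empty fields are skipped.
    return [int(tok) for tok in re.split(r"[, \t\n]+", arg) if tok]
```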

Tracing

-b syscall

--detach-on=syscall Detaches from the traced process if the specified system call is reached. Currently, only the execve keyword is supported, which covers the execve(2) and execveat(2) system calls. This option is useful for tracing a multi-threaded process with -f without also tracing its (potentially very complex) child processes.

-D

--daemonize --daemonize=grandchild Runs the tracer process as a grandchild of the tracee, not as its parent. This reduces the visible effect of strace by keeping the tracee a direct child of the calling process.

-DD

--daemonize=pgroup --daemonize=pgrp Runs the tracer process as the tracee's grandchild in a separate process group. In addition to reducing the visible effect of strace, this also prevents strace from being terminated by a kill(2) signal sent to the entire process group.

-DDD

--daemonize=session Runs the tracer process as the tracee's grandchild in a separate session (known as "true daemonisation"). In addition to reducing the visible effect of strace, this also prevents strace from being terminated upon session termination.

-f

--follow-forks Traces child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls. Note that if process PID is multi-threaded, using -f -p PID attaches to all of its threads, not just the one with thread_id = PID.

--output-separately

If the --output=filename option is in effect, the trace for each process is written to a separate filename.pid file, where pid is the process ID.

-ff

--follow-forks --output-separately Combines the effects of --follow-forks and --output-separately options. This is incompatible with -c, since no per-process counts are kept.

Use strace-log-merge(1) to get a combined view of the log files.
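If the per-process files were written with -ttt timestamps, they can be merged chronologically in the spirit of strace-log-merge(1). The sketch below is not that tool's implementation. merge_logs is a hypothetical helper that takes the files' contents as a dict keyed by PID and prefixes each output line with its PID.

```python
import heapq


def merge_logs(logs):
    """Hypothetical helper: merge {pid: [lines]} from -ff -ttt output by time.

    Each per-process list is assumed to be in time order already, with the
    -ttt timestamp as the first space-separated field of every line.
    """
    streams = [
        [(float(line.split(" ", 1)[0]), pid, line) for line in lines]
        for pid, lines in sorted(logs.items())
    ]
    return [f"{pid} {line}" for _, pid, line in heapq.merge(*streams)]
```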

-I interruptible

--interruptible=interruptible Controls when strace can be interrupted by signals (such as pressing CTRL-C).

1, anywhere: No signals are blocked.
2, waiting: Fatal signals are blocked while decoding a system call (default).
3, never: Fatal signals are always blocked (default if -o FILE PROG).
4, never_tstp: Fatal signals and SIGTSTP (CTRL-Z) are always blocked (useful to make strace -o FILE PROG not stop on CTRL-Z; default if -D).

--syscall-limit=limit

Detaches all tracees after limit system calls have been captured. System calls filtered out via --trace, --trace-path or --status options are not considered when keeping track of the number of system calls that are captured.

--kill-on-exit

Applies the PTRACE_O_EXITKILL ptrace option to all tracees, which sends a SIGKILL signal to a tracee if the tracer exits. In addition, tracees are not detached on cleanup, so they are not left running after the tracer exits. --kill-on-exit is not compatible with the -p/--attach options.

Filtering

-e trace=syscall_set

-e t=syscall_set --trace=syscall_set Traces only the specified set of system calls. syscall_set is defined as [!]value[,value], and value can be one of the following:

syscall: Traces specific system call, specified by its name (see syscalls(2) for a reference, but also see NOTES).
?value: A question mark preceding the qualification suppresses errors if no matching system calls are found.
value@64: Limits the system call specification described by value to the 64-bit personality.
value@32: Limits the system call specification described by value to the 32-bit personality.
value@x32: Limits the system call specification described by value to the x32 personality.
all: Traces all system calls.
/regex: Traces only those system calls that match the regex. You can use POSIX Extended Regular Expression syntax (see regex(7)).
%file, file: Traces all system calls that take a file name as an argument. You can think of this as an abbreviation for --trace=open,stat,chmod,unlink,..., which is useful for seeing what files the process is referencing. Furthermore, using the abbreviation ensures that you don't accidentally forget to include a call like newfstatat(2) in the list. The syntax without a preceding percent sign ("--trace=file") is deprecated.
%process, process: Traces system calls associated with process lifecycle (creation, exec, termination). The syntax without a preceding percent sign ("--trace=process") is deprecated.
%net, %network, network: Traces all network-related system calls. The syntax without a preceding percent sign ("--trace=network") is deprecated.
%signal, signal: Traces all signal-related system calls. The syntax without a preceding percent sign ("--trace=signal") is deprecated.
%ipc, ipc: Traces all IPC-related system calls. The syntax without a preceding percent sign ("--trace=ipc") is deprecated.
%desc, desc: Traces all file-descriptor-related system calls. The syntax without a preceding percent sign ("--trace=desc") is deprecated.
%memory, memory: Traces all memory-mapping-related system calls. The syntax without a preceding percent sign ("--trace=memory") is deprecated.
%creds: Traces system calls that read or modify user and group identifiers or capability sets.
%stat: Traces stat system call variants.
%lstat: Traces lstat system call variants.
%fstat: Traces fstat, fstatat, and statx system call variants.
%%stat: Traces system calls used for requesting file status (stat, lstat, fstat, fstatat, statx, and their variants).
%statfs: Traces statfs, statfs64, statvfs, osf_statfs, and osf_statfs64 system calls. The same effect can be achieved with --trace=/^(.*_)?statv?fs regular expression.
%fstatfs: Traces fstatfs, fstatfs64, fstatvfs, osf_fstatfs, and osf_fstatfs64 system calls. The same effect can be achieved with --trace=/fstatv?fs regular expression.
%%statfs: Traces system calls related to file system statistics (statfs-like, fstatfs-like, and ustat). The same effect can be achieved with the --trace=/statv?fs|fsstat|ustat regular expression.
%clock: Traces system calls that read or modify system clocks.
%pure: Traces system calls that always succeed and have no arguments. Currently, this list includes arc_gettls(2), getdtablesize(2), getegid(2), getegid32(2), geteuid(2), geteuid32(2), getgid(2), getgid32(2), getpagesize(2), getpgrp(2), getpid(2), getppid(2), get_thread_area(2) (on architectures other than x86), gettid(2), get_tls(2), getuid(2), getuid32(2), getxgid(2), getxpid(2), getxuid(2), kern_features(2), and metag_get_tls(2) system calls.

The -c option is useful for determining which system calls might be useful to trace. For example, --trace=open,close,read,write means to only trace those four system calls. Be careful when making inferences about the user/kernel boundary if only a subset of system calls are being monitored. The default is --trace=all.
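The regular-expression equivalences quoted for %statfs and %fstatfs can be checked with Python's re module. Python regexes are not POSIX EREs, but these particular patterns behave the same in both. The sketch uses re.search because it assumes a /regex filter matches anywhere in the system call name rather than requiring a full match.

```python
import re

# Patterns quoted in the %statfs and %fstatfs descriptions. re.search is used
# because the /regex filter is assumed to match anywhere in the name.
STATFS = re.compile(r"^(.*_)?statv?fs")
FSTATFS = re.compile(r"fstatv?fs")

NAMES = ["statfs", "statfs64", "statvfs", "osf_statfs", "osf_statfs64",
         "fstatfs", "fstatfs64", "fstatvfs", "osf_fstatfs", "osf_fstatfs64"]

statfs_like = [n for n in NAMES if STATFS.search(n)]
fstatfs_like = [n for n in NAMES if FSTATFS.search(n)]
print(statfs_like)   # the five names listed for %statfs
print(fstatfs_like)  # the five names listed for %fstatfs
```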

-e trace-fd=set

-e trace-fds=set -e fd=set -e fds=set --trace-fds=set Traces only the system calls that operate on the specified subset of (non-negative) file descriptors. Note that this option also filters out all system calls that do not operate on file descriptors at all.

This filter is combined with the --trace-path filter; a system call is traced if it matches either of them.

-e signal=set

-e signals=set -e s=set --signal=set Traces only the specified subset of signals. The default is --signal=all. For example, --signal=!SIGIO (or --signal=!io) causes SIGIO signals not to be traced.

-e status=set

--status=set Prints only system calls with the specified return status. The default is --status=all. When using the status qualifier, the chronological order of events may not be preserved. This is because strace must wait for a system call to complete before deciding whether to print it. If two system calls are executed by concurrent threads, strace will first print both the entry and exit of the first system call to exit, regardless of their respective entry time. The entry and exit of the second system call to exit will be printed afterwards. Here is an example when select(2) is called, but a different thread calls clock_gettime(2) before select(2) finishes:

[pid 28779] 1130322148.939977 clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0
[pid 28772] 1130322148.438139 select(4, [3], NULL, NULL, NULL) = 1 (in [3])

set can include the following elements:

successful: Traces system calls that returned without an error code. The -z option has the effect of --status=successful.
failed: Traces system calls that returned with an error code. The -Z option has the effect of --status=failed.
unfinished: Traces system calls that did not return. This might happen, for example, due to an execve call in a different thread from the same thread group.
unavailable: Traces system calls that returned but whose error status strace failed to fetch.
detached: Traces system calls for which strace detached before the return.

-P path

--trace-path=path Traces only system calls accessing path. Multiple -P options can be used to specify several paths. This filter is combined with the --trace-fds filter; a system call is traced if it matches either option.

-z

--successful-only Prints only system calls that returned without an error code.

-Z

--failed-only Prints only system calls that returned with an error code.

Output format

-a column

--columns=column Aligns return values in a specific column (default column 40).

-e abbrev=syscall_set

-e a=syscall_set --abbrev=syscall_set Abbreviates the output from printing each member of large structures. The syntax of the syscall_set specification is the same as in the --trace option. The default is --abbrev=all. The -v option has the effect of --abbrev=none.

-e verbose=syscall_set

-e v=syscall_set --verbose=syscall_set Dereferences structures for the specified set of system calls. The syntax of the syscall_set specification is the same as in the --trace option. The default is --verbose=all.

-e raw=syscall_set

-e x=syscall_set --raw=syscall_set Prints raw, undecoded arguments for the specified set of system calls. The syntax of the syscall_set specification is the same as in the --trace option. This option has the effect of causing all arguments to be printed in hexadecimal. This option is useful if the decoding is not trusted, or if the actual numeric value of an argument is needed. See also -X raw option.

-e read=set

-e reads=set -e r=set --read=set Performs a full hexadecimal and ASCII dump of all the data read from file descriptors listed in the specified set. For example, to see all input activity on file descriptors 3 and 5, use --read=3,5. Note that this is independent of the normal tracing of the read(2) system call, which is controlled by the option --trace=read.

-e write=set

-e writes=set -e w=set --write=set Performs a full hexadecimal and ASCII dump of all the data written to file descriptors listed in the specified set. For example, to see all output activity on file descriptors 3 and 5, use --write=3,5. Note that this is independent of the normal tracing of the write(2) system call, which is controlled by the option --trace=write.
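The dumps produced by --read and --write pair each row of hexadecimal bytes with its printable-ASCII rendering. A generic Python sketch of that idea follows. hexdump is a hypothetical helper, and its column layout is illustrative rather than a copy of strace's exact format.

```python
def hexdump(data: bytes, width: int = 16) -> list:
    """Hypothetical helper: rows of offset, hex bytes, and printable ASCII."""
    rows = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{off:05x}  {hex_part:<{width * 3 - 1}}  {text}")
    return rows
```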

-e quiet=set

-e silent=set -e silence=set -e q=set --quiet=set --silent=set --silence=set Suppresses various information messages. The default is --quiet=none. set can include the following elements:

attach: Suppresses messages about attaching and detaching ("[ Process NNNN attached ]", "[ Process NNNN detached ]").
exit: Suppresses messages about process exits ("+++ exited with SSS +++").
path-resolution: Suppresses messages about resolution of paths provided via the -P option ("Requested path "..." resolved into "..."").
personality: Suppresses messages about process personality changes ("[ Process PID=NNNN runs in PPP mode. ]").
thread-execve, superseded: Suppresses messages about a process being superseded by execve(2) in another thread ("+++ superseded by execve in pid NNNN +++").

-e decode-fds=set

--decode-fds=set Decodes various information associated with file descriptors. The default is --decode-fds=none. set can include the following elements:

path: Prints file paths. Also enables printing of the tracee's current working directory when the AT_FDCWD constant is used.
socket: Prints protocol-specific socket information.
dev: Prints character/block device numbers.
eventfd: Prints eventfd object details associated with eventfd file descriptors.
pidfd: Prints PIDs associated with pidfd file descriptors.
signalfd: Prints signal masks associated with signalfd file descriptors.
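On Linux, the kind of text shown for a descriptor path can be inspected directly. Each descriptor appears under /proc/PID/fd/N as a symbolic link to its target. The sketch assumes a Linux /proc. Reading another process's entries generally requires permission to trace that process. fd_path is a hypothetical helper and is not how strace is claimed to be implemented.

```python
import os


def fd_path(fd: int, pid="self") -> str:
    """Hypothetical helper: return the target of /proc/PID/fd/FD (Linux only).

    Regular files resolve to a path; pipes and sockets resolve to strings
    such as "pipe:[12345]" or "socket:[12345]".
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```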

-e decode-pids=set

--decode-pids=set Decodes various information associated with process IDs (and also thread IDs, process group IDs, and session IDs). The default is --decode-pids=none. set can include the following elements:

comm: Prints command names associated with thread or process IDs.
pidns: Prints thread, process, process group, and session IDs in strace's PID namespace if the tracee is in a different PID namespace.

-e kvm=vcpu

--kvm=vcpu Prints the exit reason of kvm vcpu. Requires Linux kernel version 4.16.0 or higher.

-e namespace=new

--namespace=new Prints the new namespaces entered by the tracee. The following system calls are supported: clone(2), clone3(2), setns(2), and unshare(2).

-i

--instruction-pointer Prints the instruction pointer at the time of the system call.

-n

--syscall-number Prints the system call number.

-N

--arg-names Prints the system call argument names.

-o filename

--output=filename Writes the trace output to the file filename rather than to stderr. The filename.pid form is used if the -ff option is supplied. If the argument begins with '|' or '!', the rest of the argument is treated as a command and all output is piped to it. This is convenient for piping the debugging output to a program without affecting the redirections of executed programs. Piping output to a command is not currently compatible with the -ff option.

-A

--output-append-mode Opens the file provided in the -o option in append mode.

-q

--quiet --quiet=attach,personality Suppresses messages about attaching, detaching, and personality changes. This happens automatically when output is redirected to a file and the command is run directly instead of attaching.

-qq

--quiet=attach,personality,exit Suppresses messages about attaching, detaching, personality changes, and process exit status.

-qqq

--quiet=all Suppresses all suppressible messages (please refer to the --quiet option description for the full list of suppressible messages).

-r

--relative-timestamps[=precision] Prints a relative timestamp upon entry to each system call. This records the time difference between the beginning of successive system calls. precision can be one of s (for seconds), ms (milliseconds), us (microseconds), or ns (nanoseconds), and allows setting the precision of time value being printed. Default is us (microseconds). Note that because the -r option uses the monotonic clock, its measurements may differ from the time differences reported by the -t option, which uses the wall clock.

-s strsize

--string-limit=strsize Specifies the maximum string size to print (the default is 32). Note that filenames are not considered strings and are always printed in full.

--absolute-timestamps[=[[format:]format],[[precision:]precision]]

--timestamps[=[[format:]format],[[precision:]precision]] Prefixes each line of the trace with the wall clock time in the specified format with the specified precision. format can be one of the following:

none: No time stamp is printed. Can be used to override the previous setting.
time: Wall clock time (strftime(3) format string is %T).
unix: Number of seconds since the epoch (strftime(3) format string is %s).

precision can be one of s (for seconds), ms (milliseconds), us (microseconds), or ns (nanoseconds). Default arguments for the option are format:time,precision:s.

-t

--absolute-timestamps Prefixes each line of the trace with the wall clock time.

-tt

--absolute-timestamps=precision:us Prints the wall clock time with microsecond precision.

-ttt

--absolute-timestamps=format:unix,precision:us Prints the wall clock time as seconds since the epoch, with microsecond precision.
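Since -tt and -ttt differ only in presentation, a -ttt stamp can be rendered in the -tt style. ttt_to_tt is a hypothetical helper. It formats in UTC so the result does not depend on the host's time zone, whereas strace's wall-clock output is expected to follow the local time zone.

```python
import time


def ttt_to_tt(stamp: str) -> str:
    """Hypothetical helper: "SECONDS.MICROS" (-ttt) -> "HH:MM:SS.MICROS" (-tt style).

    Uses UTC; strace itself is expected to print local time.
    """
    seconds, _, micros = stamp.partition(".")
    return time.strftime("%H:%M:%S", time.gmtime(int(seconds))) + "." + micros
```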

-T

--syscall-times[=precision] Shows the time spent in system calls. This records the time difference between the beginning and the end of each system call. precision can be one of s (for seconds), ms (milliseconds), us (microseconds), or ns (nanoseconds), and allows setting the precision of time value being printed. Default is us (microseconds).

-v

--no-abbrev Prints unabbreviated versions of environment, stat, termios, etc. calls. These structures are very common, so the default behavior is to display a reasonable subset of their members. Use this option to see all members in full detail.

--strings-in-hex[=option]

Controls the use of hexadecimal escape sequences when printing strings. This option alters the default escaping behavior.

Normally (when neither this option nor -x is used), strace introduces escape sequences in two situations. The first is to represent non-printable and non-ASCII characters (i.e., those with character codes less than 32 or greater than 127). The second is to disambiguate output, for example by escaping the quotation marks that enclose a string or the angle brackets used in file descriptor paths. When a character must be escaped, strace prefers a symbolic C-standard sequence if one exists: '\t' (tab), '\n' (newline), '\v' (vertical tab), '\f' (form feed), and '\r' (carriage return). For all other characters that require escaping, strace falls back to an octal representation of the character's byte value. This option overrides that default and uses hexadecimal escapes instead of octal ones.

option can be one of the following:

none: Hexadecimal numbers are not used in the output at all. When an escape sequence is needed, octal numbers are used.
non-ascii-chars: Hexadecimal numbers are used instead of octal in the escape sequences.
non-ascii: Strings that contain non-ASCII characters are printed using escape sequences with hexadecimal numbers.
all: All strings are printed using escape sequences with hexadecimal numbers.

When the option is supplied without an argument, all is assumed.

-x

--strings-in-hex=non-ascii Prints all non-ASCII strings in hexadecimal string format.

-xx

--strings-in-hex[=all] Prints all strings in hexadecimal string format.
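The effect of -x and -xx on a byte string can be sketched in Python. hex_all and hex_non_ascii are hypothetical helpers. Treating bytes outside 32..126 as "non-ASCII" follows the character-code range given for --strings-in-hex above.

```python
def hex_all(data: bytes) -> str:
    """Hypothetical helper: every byte as a \\xNN escape, in the spirit of -xx."""
    return '"' + "".join(f"\\x{b:02x}" for b in data) + '"'


def hex_non_ascii(data: bytes) -> str:
    """Hypothetical helper: hex-escape the whole string only if some byte is
    outside 32..126, in the spirit of -x."""
    if all(32 <= b < 127 for b in data):
        # Plain string: only the backslash and quote need escaping.
        plain = data.decode("ascii").replace("\\", "\\\\").replace('"', '\\"')
        return '"' + plain + '"'
    return hex_all(data)
```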

-X format

--const-print-style=format Sets the format for printing of named constants and flags. Supported format values are:

raw: Raw number output, without decoding.
abbrev: Outputs a named constant or a set of flags instead of the raw number if they are found. This is the default strace behaviour.
verbose: Outputs both the raw value and the decoded string (as a comment).

-y

--decode-fds --decode-fds=path Prints paths associated with file descriptor arguments and with the AT_FDCWD constant.

-yy

--decode-fds=all Prints all available information associated with file descriptors: protocol-specific information associated with socket file descriptors, block/character device numbers associated with device file descriptors, and PIDs associated with pidfd file descriptors.

--pidns-translation

--decode-pids=pidns If strace and the tracee are in different PID namespaces, prints PIDs in strace's namespace, too.

-Y

--decode-pids=comm Prints command names for PIDs.

--always-show-pid

Shows the PID prefix also for the process started by strace. Implied when -f and -o are both specified.

Statistics

-c

--summary-only Counts time, calls, and errors for each system call and reports a summary on program exit, suppressing the regular output. This shows system time (CPU time spent in the kernel), which is independent of wall clock time. If -c is used with -f, only aggregate totals for all traced processes are kept.

-C

--summary Like -c, but also prints the regular output while processes are running.

-O overhead

--summary-syscall-overhead=overhead Sets the overhead for tracing system calls to overhead. This is useful for overriding the default heuristic, which estimates the time spent in the measurement process itself when timing system calls with the -c option. The accuracy of the heuristic can be gauged by timing a given program run without tracing (using time(1)) and comparing the accumulated system call time to the total produced using -c.

The format of overhead specification is described in section Time specification format description.

-S sortby

--summary-sort-by=sortby Sorts the output of the histogram printed by the -c option by the specified criterion. Valid values are time (or time-percent or time-total or total-time), min-time (or shortest or time-min), max-time (or longest or time-max), avg-time (or time-avg), calls (or count), errors (or error), name (or syscall or syscall-name), and nothing (or none); default is time.

-U columns

--summary-columns=columns Configures the set and order of columns shown in the call summary. The columns argument is a comma-separated list containing one or more of the following values:

time-percent (or time): Percentage of cumulative time consumed by a specific system call.
total-time (or time-total): Total system (or wall clock, if the -w option is provided) time consumed by a specific system call.
min-time (or shortest or time-min): Minimum observed call duration.
max-time (or longest or time-max): Maximum observed call duration.
avg-time (or time-avg): Average call duration.
calls (or count): Call count.
errors (or error): Error count.
name (or syscall or syscall-name): System call name.

The default value is time-percent,total-time,avg-time,calls,errors,name. If the name field is not supplied explicitly, it is added as the last column.

-w

--summary-wall-clock Summarizes the wall clock time for each system call, measured from its beginning to its end. The default is to summarize the system time.
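A rough per-syscall table can also be rebuilt from a saved -T trace. summarize is a hypothetical helper, and lines without a trailing <seconds> field are ignored. -T records each call's duration from entry to exit. The totals are therefore closer to -c -w (wall clock) than to the default -c system-time summary.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of strace).
# name(args) = ret [ERRNO (message)] <seconds>; the <seconds> field comes from -T.
T_LINE = re.compile(
    r"^(\w+)\(.*\)\s+=\s+(?:0x[0-9a-f]+|-?\d+|\?)(\s+E[A-Z0-9]+)?.*<(\d+\.\d+)>$"
)


def summarize(lines):
    """Per-syscall calls, errors, and summed -T time, largest total first."""
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})
    for line in lines:
        m = T_LINE.match(line.strip())
        if m is None:
            continue  # signals, exit notices, unfinished calls, ...
        entry = stats[m.group(1)]
        entry["calls"] += 1
        if m.group(2):            # an errno symbol follows the return value
            entry["errors"] += 1
        entry["seconds"] += float(m.group(3))
    return sorted(stats.items(), key=lambda kv: kv[1]["seconds"], reverse=True)
```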

Tampering

--inject=syscall_set[:error=errno|:retval=value][:signal=sig][:syscall=syscall][:delay_enter=delay][:delay_exit=delay][:poke_enter=@argN=DATAN,@argM=DATAM...][:poke_exit=@argN=DATAN,@argM=DATAM...][:when=expr] Performs system call tampering for the specified set of system calls.

The syntax of the syscall_set specification is the same as in the --trace option.

At least one of error, retval, signal, delay_enter, delay_exit, poke_enter, or poke_exit action options must be specified. error and retval are mutually exclusive.

If the error=errno option is specified, a fault is injected into the system call. This is achieved by replacing the system call number with -1 (representing an invalid system call) and setting the error code to the specified errno. This replacement of the syscall number with -1 can be overridden using the syscall= option. The errno can be a symbolic name like ENOSYS or a numeric value in the range 1..4095.

If the retval=value option is specified, a success value is injected. The system call number is replaced as with the error= option, but instead of an error, the specified success value is returned to the caller process.

If the signal=sig option is specified with either a symbolic value like SIGSEGV or a numeric value within the 1..SIGRTMAX range, that signal is delivered on entering every system call specified by the syscall_set.

If the delay_enter=delay or delay_exit=delay options are specified, delay injection is performed: the tracee is delayed by the time period specified by delay on entering or exiting the system call, respectively. The format of delay specification is described in section Time specification format description.

If the poke_enter=@argN=DATAN,@argM=DATAM... or poke_exit=@argN=DATAN,@argM=DATAM... options are specified, the tracee's memory at the locations pointed to by system call arguments argN and argM (from arg1 to arg7) is overwritten with data DATAN and DATAM (specified in hexadecimal format; for example, poke_enter=@arg1=0000DEAD0000BEEF). The poke_enter option modifies memory on system call entry, while poke_exit does so on system call exit.
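The poke data is plain hexadecimal, one byte per two digits. The Python sketch below parses a poke_enter= or poke_exit= value into per-argument byte strings; parse_poke is a hypothetical name, and rejecting odd-length data is an assumption of the sketch, not documented strace behavior.

```python
import re

# Hypothetical parser for the value of poke_enter= or poke_exit=,
# e.g. "@arg1=0000DEAD0000BEEF,@arg2=41"; not strace's own code.
POKE_ITEM = re.compile(r"@arg([1-7])=([0-9A-Fa-f]+)")


def parse_poke(spec: str) -> dict:
    """Map argument numbers (1..7) to the bytes that would be written."""
    result = {}
    for item in spec.split(","):
        match = POKE_ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed poke item: {item!r}")
        # bytes.fromhex() needs two hex digits per byte, so odd-length data
        # is rejected with ValueError (a choice made by this sketch).
        result[int(match.group(1))] = bytes.fromhex(match.group(2))
    return result
```

With this sketch, the example value @arg1=0000DEAD0000BEEF becomes the eight bytes 00 00 DE AD 00 00 BE EF destined for the memory pointed to by the first argument.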

The injection actions are independent. For example, specifying only signal= delivers a signal without altering the system call's outcome or delaying it. Similarly, specifying only error= injects a system call fault without adding a signal or delay.

If the signal=sig option is specified together with error=errno or retval=value, then both the fault or success injection and the signal delivery are performed.

If the syscall=syscall option is specified, the given syscall is injected instead of the default -1. The specified syscall must have no side effects; currently, only system calls from the %pure set are supported.

Unless the when=expr subexpression is specified, an injection is made into every invocation of each system call from the syscall_set.

The format of the subexpression is:

first[..last][+[step]]

Number first stands for the first invocation number in the range, number last stands for the last invocation number in the range, and step stands for the step between two consecutive invocations. The following combinations are useful:

first: Injects into invocation number first only, for each system call in the syscall_set.
first..last: Injects into invocations first through last (inclusive), for each system call in the syscall_set.
first+: Injects into every invocation, starting with number first, for each system call in the syscall_set.
first+step: Injects into invocations number first, first+step, first+step+step, and so on, for each system call in the syscall_set.
first..last+step: Same as the previous, but considers only invocations with numbers up to last (inclusive).

For example, to fail the third and all subsequent chdir system calls with ENOENT, use --inject=chdir:error=ENOENT:when=3+.

The valid range for numbers first and step is 1..65535, and for number last is 1..65534.
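The when= rules reduce to a small predicate. The following Python sketch models the documented semantics under stated assumptions (the hypothetical functions parse_when and should_inject, invocations numbered from 1); it is not strace's implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical model of the when= subexpression first[..last][+[step]];
# it illustrates the documented semantics and is not strace's parser.
WHEN_RE = re.compile(r"(\d+)(?:\.\.(\d+))?(?:(\+)(\d+)?)?")


def parse_when(expr: str) -> Tuple[int, Optional[int], int]:
    """Return (first, last, step); last is None when the range is open."""
    match = WHEN_RE.fullmatch(expr)
    if match is None:
        raise ValueError(f"malformed when= expression: {expr!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else None
    plus, step_text = match.group(3), match.group(4)
    step = int(step_text) if step_text else 1
    if not (1 <= first <= 65535 and 1 <= step <= 65535):
        raise ValueError("first and step must be in 1..65535")
    if last is not None and not 1 <= last <= 65534:
        raise ValueError("last must be in 1..65534")
    if plus is None and last is None:
        last = first  # a bare "first" selects that single invocation
    return first, last, step


def should_inject(n: int, first: int, last: Optional[int], step: int) -> bool:
    """Decide whether invocation number n (counting from 1) is selected."""
    if n < first or (last is not None and n > last):
        return False
    return (n - first) % step == 0
```

For the chdir example, parse_when("3+") yields (3, None, 1), so invocations 3, 4, 5, and so on are selected. The invocation number n is counted per system call and per tracee.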

An injection expression can contain at most one fault or return value specification (i.e., either error= or retval=) and at most one signal= specification. If an injection expression contains multiple when= specifications, the last one takes precedence.
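These constraints can be checked mechanically. A hypothetical Python validator, parse_inject, is sketched below; it assumes the syscall_set contains no ':' characters and, apart from the documented when= rule, lets a repeated option overwrite an earlier one, which is a guess rather than documented behavior.

```python
# Hypothetical validator for an --inject= value such as
# "chdir:error=ENOENT:when=3+"; it models the documented rules only.
ACTIONS = {"error", "retval", "signal", "delay_enter", "delay_exit",
           "poke_enter", "poke_exit"}
KEYS = ACTIONS | {"syscall", "when"}
SINGLE_USE = {"error", "retval", "signal"}


def parse_inject(spec: str):
    """Split an injection expression into (syscall_set, options)."""
    syscall_set, *items = spec.split(":")  # assumes no ':' in syscall_set
    if not syscall_set:
        raise ValueError("missing syscall_set")
    options = {}
    for item in items:
        key, sep, value = item.partition("=")
        if not sep or key not in KEYS:
            raise ValueError(f"unknown injection option: {item!r}")
        if key in SINGLE_USE and key in options:
            raise ValueError(f"more than one {key}= specification")
        # Later values win: documented for when=, assumed here for the rest.
        options[key] = value
    if not ACTIONS.intersection(options):
        raise ValueError("at least one action option is required")
    if "error" in options and "retval" in options:
        raise ValueError("error= and retval= are mutually exclusive")
    errno_text = options.get("error", "")
    if errno_text.isdigit() and not 1 <= int(errno_text) <= 4095:
        raise ValueError("numeric errno must be in 1..4095")
    return syscall_set, options
```

Under this sketch, "open:when=2" is rejected because it names no action, and "open:error=EIO:retval=0" is rejected because error= and retval= are combined.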

Accounting of system calls that are subject to injection is done per system call and per tracee.

Specification of system call injection can be combined with other system call filtering options, for example, -P /dev/urandom --inject=file:error=ENOENT.

-e inject=args

This is equivalent to --inject=args.

--fault=syscall_set[:error=errno][:when=expr]

Performs system call fault injection for the specified set of system calls.

This is a shortcut for the more general --inject= option, using a default errno of ENOSYS.
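In other words, a --fault= value behaves like an --inject= value with error=ENOSYS filled in when no error= is given. The hypothetical helper below sketches that rewrite in Python; it only illustrates the equivalence and does not reflect how strace parses the option internally.

```python
def fault_to_inject(spec: str) -> str:
    """Rewrite a --fault= value as an --inject= value (illustrative sketch)."""
    syscall_set, *items = spec.split(":")
    # Supply the documented default errno when error= is absent.
    if not any(item.startswith("error=") for item in items):
        items.insert(0, "error=ENOSYS")
    return ":".join([syscall_set, *items])
```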

-e fault=args

This is equivalent to --fault=args.

Miscellaneous

-d

--debug Shows some debugging output of strace itself on the standard error.

-F

This option is deprecated. It is retained for backward compatibility only and may be removed in future releases. Using multiple -F options is equivalent to a single -f. This option is ignored entirely if used in conjunction with one or more -f options.

-h

--help Prints the help summary.

--seccomp-bpf

Attempts to use seccomp-bpf (see seccomp(2)) to make the kernel stop the tracee only for the system calls that are being traced.

This option has no effect unless -f/--follow-forks is also specified. --seccomp-bpf is not compatible with --syscall-limit and -b/--detach-on options. It is also not applicable to processes attached using -p/--attach option.

An attempt to enable system call filtering using seccomp-bpf may fail for various reasons, e.g., there are too many system calls to filter, the seccomp API is not available, or strace itself is being traced. If the seccomp-bpf filter setup fails, strace proceeds as usual, stopping traced processes on every system call.

When --seccomp-bpf is activated and -p/--attach option is not used, --kill-on-exit option is activated as well.

Note that in cases when the tracee has another seccomp filter that returns an action value with a precedence greater than SECCOMP_RET_TRACE, strace --seccomp-bpf will not be notified. That is, if another seccomp filter, for example, disables the system call or kills the tracee, then strace --seccomp-bpf will not be aware of that system call invocation at all.
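The precedence rule comes from seccomp(2): when several filters return different actions, the kernel applies the one with the highest precedence. The Python sketch below models only that ordering (the action names are the kernel's, the helper functions are hypothetical, and the data bits carried alongside each action are ignored) to show why strace misses calls that another filter denies or kills.

```python
# Seccomp return actions from highest to lowest precedence, per seccomp(2).
PRECEDENCE = [
    "SECCOMP_RET_KILL_PROCESS",
    "SECCOMP_RET_KILL_THREAD",
    "SECCOMP_RET_TRAP",
    "SECCOMP_RET_ERRNO",
    "SECCOMP_RET_USER_NOTIF",
    "SECCOMP_RET_TRACE",
    "SECCOMP_RET_LOG",
    "SECCOMP_RET_ALLOW",
]


def effective_action(results: list) -> str:
    """Pick the action applied when several filters return values."""
    return min(results, key=PRECEDENCE.index)


def strace_notified(results: list) -> bool:
    """strace's filter returns TRACE; it is seen only if TRACE wins."""
    return effective_action(results) == "SECCOMP_RET_TRACE"
```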

--tips[=[[id:]id],[[format:]format]]

Shows strace tips, tricks, and tweaks before exit. The id can be a non-negative integer to print a specific tip (note: these IDs are not guaranteed to be stable). It can also be random (the default), in which case a random tip is printed. format can be one of the following:

none: No tip is printed. Can be used to override a previous setting.
compact: Prints the tip just big enough to contain all the text.
full: Prints the tip in its full glory.

Default is id:random,format:compact.

-V

--version Prints the version number of strace and the list of enabled optional features. Multiple instances of this option beyond a specific threshold tend to increase der Strauss awareness.

Time specification format description

Time values are specified as a decimal floating point number (in a format accepted by strtod(3)), optionally followed by a suffix to indicate the unit of time: s (seconds), ms (milliseconds), us (microseconds), or ns (nanoseconds). If no suffix is specified, the value defaults to microseconds.

The described format is used for -O, --inject=delay_enter, and --inject=delay_exit options.
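A Python sketch of this format, converting a specification to nanoseconds. parse_time_ns is hypothetical, and Python's float() is only an approximation of strtod(3) (for example, float() accepts underscores and rejects hexadecimal floats).

```python
# Nanoseconds per unit for each documented suffix.
UNITS_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def parse_time_ns(text: str) -> int:
    """Convert a time specification such as "10ms" or "1.5" to nanoseconds."""
    # Two-letter suffixes are checked first because they all end in "s".
    for suffix in ("ms", "us", "ns", "s"):
        if text.endswith(suffix):
            number, scale = text[:-len(suffix)], UNITS_NS[suffix]
            break
    else:
        number, scale = text, UNITS_NS["us"]  # no suffix: microseconds
    # float() stands in for strtod(3); the two accept slightly different inputs.
    return round(float(number) * scale)
```

With this helper, -O 1.5us corresponds to 1,500 ns, and delay_enter=10ms to 10,000,000 ns.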

DIAGNOSTICS

When command exits, strace exits with the same exit status. If command is terminated by a signal, strace terminates itself with the same signal, so that strace can be used as a wrapper process transparent to the invoking parent process. Note that the parent-child relationship (signal stop notifications, the getppid(2) value, etc.) between the traced process and its parent is not preserved unless -D is used.
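The wrapper behavior can be imitated in a few lines of Python. This is a hypothetical sketch built on subprocess, not how strace works internally; it shows only the exit-status and signal mirroring, and the 128 + signal fallback exit code is this sketch's own convention.

```python
import os
import signal
import subprocess
import sys


def classify_status(returncode: int) -> tuple:
    """Map a subprocess return code to ("exit", status) or ("signal", signum)."""
    # subprocess reports death by signal N as the negative number -N.
    if returncode < 0:
        return ("signal", -returncode)
    return ("exit", returncode)


def run_transparently(argv: list) -> None:
    """Run argv, then exit or die the same way the child did."""
    kind, value = classify_status(subprocess.call(argv))
    if kind == "signal":
        try:
            # Reset the disposition so the signal terminates this process too.
            signal.signal(value, signal.SIG_DFL)
        except (OSError, ValueError):
            pass  # e.g. SIGKILL, whose disposition cannot be changed
        os.kill(os.getpid(), value)
        value = 128 + value  # fallback if the signal did not terminate us
    sys.exit(value)
```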

When using -p without a command, the exit status of strace is zero unless no processes have been attached or an unexpected error occurred during tracing.

SETUID INSTALLATION

If strace is installed setuid to root, then the invoking user will be able to attach to and trace processes owned by any user. In addition, setuid and setgid programs will be executed and traced with the correct effective privileges. Since these capabilities should only be granted to users with full root privileges, installing strace as setuid to root is only appropriate when its use is restricted to such trusted users. For example, a special version of strace could be installed with mode 'rwsr-x---', user root, and group trace. In this configuration, only trusted users who are members of the trace group could execute it. If you use this feature, remember to also install a regular, non-setuid version of strace for ordinary users.
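For the example configuration, the mode 'rwsr-x---' corresponds to octal 4750. The Python sketch below, with the hypothetical function install_restricted and a default group named trace, applies that ownership and mode; it must run as root, and the path passed to it is whatever special copy of strace you choose.

```python
import grp
import os
import stat

# Mode for 'rwsr-x---': setuid, owner rwx, group r-x, no access for others.
RESTRICTED_MODE = stat.S_ISUID | stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP


def install_restricted(path: str, group: str = "trace") -> None:
    """Make path owned by root:group with mode 4750 (must run as root)."""
    gid = grp.getgrnam(group).gr_gid
    # chown first: on Linux, changing the owner clears the setuid bit.
    os.chown(path, 0, gid)
    os.chmod(path, RESTRICTED_MODE)
```

The order of the two calls matters: applying chmod first and chown second would silently drop the setuid bit.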

MULTIPLE PERSONALITIES SUPPORT

On some architectures, strace can decode system calls for processes that use a different Application Binary Interface (ABI) from the one strace uses. Specifically, in addition to decoding native ABI, strace can decode the following ABIs on the following architectures:

Architecture	ABIs supported
x86_64	i386, x32 [1]; i386 [2]
AArch64	ARM 32-bit EABI
PowerPC 64-bit [3]	PowerPC 32-bit
s390x	s390
SPARC 64-bit	SPARC 32-bit
TILE 64-bit	TILE 32-bit

[1] When strace is built as an x86_64 application.
[2] When strace is built as an x32 application.
[3] Big-endian only.

This support is optional and depends on the ability to generate and parse structure definitions at build time. Refer to the output of the strace -V command to determine which ABIs are supported by your strace build. In this context, "non-native" refers to an ABI that differs from the one strace is using:

m32-mpers: strace can trace and properly decode non-native 32-bit binaries.
no-m32-mpers: strace can trace, but cannot properly decode, non-native 32-bit binaries.
mx32-mpers: strace can trace and properly decode non-native 32-on-64-bit binaries.
no-mx32-mpers: strace can trace, but cannot properly decode, non-native 32-on-64-bit binaries.

If the output contains neither m32-mpers nor no-m32-mpers, support for decoding non-native 32-bit binaries is not applicable to the architecture.

Likewise, if the output contains neither mx32-mpers nor no-mx32-mpers, support for decoding non-native 32-on-64-bit binaries is not applicable to the architecture.
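The rules above can be applied to the feature list mechanically. The hypothetical Python helper below assumes the features are whitespace-separated tokens taken from the strace -V output; the exact layout of that output is an assumption here.

```python
def mpers_support(features: str) -> dict:
    """Classify non-native ABI decoding support from feature tokens."""
    tokens = set(features.split())
    result = {}
    for kind, label in (("m32", "32-bit"), ("mx32", "32-on-64-bit")):
        if f"{kind}-mpers" in tokens:
            result[label] = "traced and decoded"
        elif f"no-{kind}-mpers" in tokens:
            result[label] = "traced but not decoded"
        else:
            result[label] = "not applicable"  # neither token is present
    return result
```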

NOTES

Systems that use shared libraries often produce a large amount of tracing output when loading them.

It is instructive to think about system call inputs and outputs as data-flow across the user/kernel boundary. Because user-space and kernel-space are separate and address-protected, it is sometimes possible to make deductive inferences about process behavior using inputs and outputs as propositions.

In some cases, a system call will differ from the documented behavior or have a different name. For example, the underlying faccessat(2) system call does not have a flags argument, and the setrlimit(2) library function is implemented using prlimit64(2) system call on modern (2.6.38+) kernels. These discrepancies are normal characteristics of the system call interface and are handled by C library wrapper functions.

Some system calls have different names in different architectures and personalities. In these cases, system call filtering and printing uses the names that match corresponding __NR_* kernel macros of the tracee's architecture and personality. There are two exceptions from this general rule: arm_fadvise64_64(2) ARM system call and xtensa_fadvise64_64(2) Xtensa system call are filtered and printed as fadvise64_64(2).

On the x32 ABI, some system calls are intended for 64-bit processes but can be invoked from x32 by setting the __X32_SYSCALL_BIT flag. When this occurs, strace designates these calls with a #64 suffix. An example is readv(2), which is syscall number 19 on x86_64, whereas its distinct x32 counterpart is syscall number 515.
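The numbering can be illustrated with a Python sketch. It assumes __X32_SYSCALL_BIT is 0x40000000, as defined in the x86 kernel headers, and uses tiny hypothetical tables containing only read and readv; decode_x32 is not a strace function.

```python
X32_SYSCALL_BIT = 0x40000000  # __X32_SYSCALL_BIT from the x86 kernel headers

# Toy tables for illustration only; real syscall tables are much larger.
X86_64_NAMES = {0: "read", 19: "readv"}
X32_ONLY_NAMES = {515: "readv"}


def decode_x32(nr: int) -> str:
    """Name an x32 syscall number, marking 64-bit-only entries with #64."""
    if not nr & X32_SYSCALL_BIT:
        raise ValueError("x32 syscall numbers carry __X32_SYSCALL_BIT")
    base = nr & ~X32_SYSCALL_BIT
    if base in X32_ONLY_NAMES:
        return X32_ONLY_NAMES[base]
    name = X86_64_NAMES[base]  # KeyError for numbers outside the toy table
    # A call with a distinct x32 variant was reached via its 64-bit number.
    if name in X32_ONLY_NAMES.values():
        return name + "#64"
    return name
```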

On some platforms, a process attached with the -p option may receive a spurious EINTR error from a non-restartable system call. This can have an unpredictable effect on the process if it does not attempt to restart the call. Ideally, all system calls should be restarted on strace attach, making the attach invisible to the traced process, but a few system calls aren't. Arguably, every instance of such behavior is a kernel bug.

Since strace executes the specified command directly without a shell, scripts that lack a shebang line (e.g., #!/bin/sh) will fail with an ENOEXEC error, even if a shell could run them correctly. It is advisable to manually supply a shell as a command with the script as its argument.
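A hypothetical Python helper that follows this advice: it inspects the first two bytes of a script and prepends a shell only when the #! line is missing. The default /bin/sh is an assumption; substitute the interpreter the script expects.

```python
def build_argv(script: str, head: bytes, shell: str = "/bin/sh") -> list:
    """Return an strace command line for script, given its first two bytes."""
    if head == b"#!":
        return ["strace", script]
    # No shebang: executing the script directly would fail with ENOEXEC.
    return ["strace", shell, script]


def strace_argv(script: str, shell: str = "/bin/sh") -> list:
    """Read the start of script and build a suitable strace command line."""
    with open(script, "rb") as f:
        return build_argv(script, f.read(2), shell)
```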

BUGS

Programs that use the setuid bit do not have effective user ID privileges while being traced.

A traced process runs more slowly than a non-traced one. The performance impact can be mitigated by using the --seccomp-bpf option.

When tracing a command, its descendant processes may be left running after strace is terminated by an interrupt signal (such as CTRL-C). This can be prevented by using the --kill-on-exit option, or by using the --seccomp-bpf option in a way that implies --kill-on-exit.

A traced process can use the CLONE_UNTRACED flag with the clone system call to create a child process that is not traced by strace. This breaks a guarantee of the --seccomp-bpf option, as this untraced child may be left with an active seccomp filter after strace terminates.

HISTORY

The original strace was written by Paul Kranenburg for SunOS and was inspired by its trace utility. The SunOS version of strace was ported to Linux and enhanced by Branko Lankester, who also wrote the Linux kernel support. Even though Paul released strace 2.5 in 1992, Branko's work was based on Paul's strace 1.5 release from 1991.

In 1993, Rick Sladkey took on the project. He merged strace 2.5 for SunOS with the second release of strace for Linux, added many features from SVR4's truss(1), and produced a version of strace that worked on both platforms. In 1994 Rick ported strace to SVR4 and Solaris and wrote the automatic configuration support. In 1995 he ported strace to Irix (and became tired of writing about himself in the third person).

Beginning in 1996, strace was maintained by Wichert Akkerman. During his tenure, strace development migrated to CVS, and ports to FreeBSD and to many Linux architectures (including ARM, IA-64, MIPS, PA-RISC, PowerPC, s390, and SPARC) were introduced.

In 2002, responsibility for strace maintenance was transferred to Roland McGrath. Since then, strace gained support for several new Linux architectures (AMD64, s390x, SuperH), bi-architecture support for some of them, and received numerous additions and improvements in system call decoders on Linux; strace development migrated to Git during that period.

Since 2009, strace has been actively maintained by Dmitry Levin. During this period, strace has gained support for the AArch64, ARC, AVR32, Blackfin, C-SKY, LoongArch, Meta, Nios II, OpenRISC 1000, RISC-V, Tile/TileGx, and Xtensa architectures. In 2012, unmaintained and apparently broken support for non-Linux operating systems was removed. Also in 2012, strace gained support for path tracing and file descriptor path decoding. In 2014, support for stack trace printing was added. In 2016, system call tampering was implemented.

For additional information, please refer to the NEWS file and the strace repository commit log.