profile-manual: Second half of perf raw text entered.

Finished entering the rest of the "perf" section.

(From yocto-docs rev: 3df34a299ef9656362bf5b851b575528c646c02b)

Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
This commit is contained in:
Scott Rifenbark 2013-01-16 10:58:22 -08:00 committed by Richard Purdie
parent 89dbdec992
commit 7f64ad2423
1 changed files with 353 additions and 0 deletions

View File

@ -981,10 +981,363 @@
</para>
</section>
<section id='system-wide-tracing-and-profiling'>
<title>System-Wide Tracing and Profiling</title>
<para>
The examples so far have focused on tracing a particular program or
workload - in other words, every profiling run has specified the
program to profile in the command-line e.g. 'perf record wget ...'.
</para>
<para>
It's also possible, and more interesting in many cases, to run a
system-wide profile or trace while running the workload in a
separate shell.
</para>
<para>
To do system-wide profiling or tracing, you typically use
the -a flag to 'perf record'.
</para>
<para>
To demonstrate this, open up one window and start the profile
using the -a flag (press Ctrl-C to stop tracing):
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]
</literallayout>
In another window, run the wget test:
<literallayout class='monospaced'>
root@crownbay:~# wget <ulink url='http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2'>http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2</ulink>
Connecting to downloads.yoctoproject.org (140.211.169.59:80)
linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
</literallayout>
Here we see entries not only for our wget load, but for other
processes running on the system as well:
</para>
<para>
<imagedata fileref="figures/perf-systemwide.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>
<para>
In the snapshot above, we can see callchains that originate in
libc, and a callchain from Xorg that demonstrates that we're
using a proprietary X driver in userspace (notice the presence
of 'PVR' and some other unresolvable symbols in the expanded
Xorg callchain).
</para>
<para>
Note also that we have both kernel and userspace entries in the
above snapshot. We can also tell perf to focus on userspace but
providing a modifier, in this case 'u', to the 'cycles' hardware
counter when we record a profile:
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a -e cycles:u
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]
</literallayout>
</para>
<para>
<imagedata fileref="figures/perf-report-cycles-u.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>
<para>
Notice in the screenshot above, we see only userspace entries ([.])
</para>
<para>
Finally, we can press 'enter' on a leaf node and select the 'Zoom
into DSO' menu item to show only entries associated with a
specific DSO. In the screenshot below, we've zoomed into the
'libc' DSO which shows all the entries associated with the
libc-xxx.so DSO.
</para>
<para>
<imagedata fileref="figures/perf-systemwide-libc.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>
<para>
We can also use the system-wide -a switch to do system-wide
tracing. Here we'll trace a couple of scheduler events:
<literallayout class='monospaced'>
root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]
</literallayout>
We can look at the raw output using 'perf script' with no
arguments:
<literallayout class='monospaced'>
root@crownbay:~# perf script
perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
</literallayout>
</para>
<section id='perf-filtering'>
<title>Filtering</title>
<para>
Notice that there are a lot of events that don't really have
anything to do with what we're interested in, namely events
that schedule 'perf' itself in and out or that wake perf up.
We can get rid of those by using the '--filter' option -
for each event we specify using -e, we can add a --filter
after that to filter out trace events that contain fields
with specific values:
<literallayout class='monospaced'>
root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf &amp;&amp; prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
^C[ perf record: Woken up 38 times to write data ]
[ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]
root@crownbay:~# perf script
swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
</literallayout>
In this case, we've filtered out all events that have 'perf'
in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice
that there are still events recorded for perf, but notice
that those events don't have values of 'perf' for the filtered
fields. To completely filter out anything from perf will
require a bit more work, but for the purpose of demonstrating
how to use filters, it's close enough.
</para>
<note>
Tying It Together: These are exactly the same set of event
filters defined by the trace event subsystem. See the
ftrace/tracecmd/kernelshark section for more discussion about
these event filters.
</note>
<note>
Tying It Together: These event filters are implemented by a
special-purpose pseudo-interpreter in the kernel and are an
integral and indispensable part of the perf design as it
relates to tracing. kernel-based event filters provide a
mechanism to precisely throttle the event stream that appears
in user space, where it makes sense to provide bindings to real
programming languages for postprocessing the event stream.
This architecture allows for the intelligent and flexible
partitioning of processing between the kernel and user space.
Contrast this with other tools such as SystemTap, which does
all of its processing in the kernel and as such requires a
special project-defined language in order to accommodate that
design, or LTTng, where everything is sent to userspace and
as such requires a super-efficient kernel-to-userspace
transport mechanism in order to function properly. While
perf certainly can benefit from for instance advances in
the design of the transport, it doesn't fundamentally depend
on them. Basically, if you find that your perf tracing
application is causing buffer I/O overruns, it probably
means that you aren't taking enough advantage of the
kernel filtering engine.
</note>
</section>
</section>
<section id='using-dynamic-tracepoints'>
<title>Using Dynamic Tracepoints</title>
<para>
perf isn't restricted to the fixed set of static tracepoints
listed by 'perf list'. Users can also add their own 'dynamic'
tracepoints anywhere in the kernel. For instance, suppose we
want to define our own tracepoint on do_fork(). We can do that
using the 'perf probe' perf subcommand:
<literallayout class='monospaced'>
root@crownbay:~# perf probe do_fork
Added new event:
probe:do_fork (on do_fork)
You can now use it in all perf tools, such as:
perf record -e probe:do_fork -aR sleep 1
</literallayout>
Adding a new tracepoint via 'perf probe' results in an event
with all the expected files and format in
/sys/kernel/debug/tracing/events, just the same as for static
tracepoints (as discussed in more detail in the trace events
subsystem section:
<literallayout class='monospaced'>
root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
drwxr-xr-x 2 root root 0 Oct 28 11:42 .
drwxr-xr-x 3 root root 0 Oct 28 11:42 ..
-rw-r--r-- 1 root root 0 Oct 28 11:42 enable
-rw-r--r-- 1 root root 0 Oct 28 11:42 filter
-r--r--r-- 1 root root 0 Oct 28 11:42 format
-r--r--r-- 1 root root 0 Oct 28 11:42 id
root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
name: do_fork
ID: 944
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int common_padding; offset:8; size:4; signed:1;
field:unsigned long __probe_ip; offset:12; size:4; signed:0;
print fmt: "(%lx)", REC->__probe_ip
</literallayout>
We can list all dynamic tracepoints currently in existence:
<literallayout class='monospaced'>
root@crownbay:~# perf probe -l
probe:do_fork (on do_fork)
probe:schedule (on schedule)
</literallayout>
Let's record system-wide ('sleep 30' is a trick for recording
system-wide but basically do nothing and then wake up after
30 seconds):
<literallayout class='monospaced'>
root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]
</literallayout>
Using 'perf script' we can see each do_fork event that fired:
<literallayout class='monospaced'>
root@crownbay:~# perf script
# ========
# captured on: Sun Oct 28 11:55:18 2012
# hostname : crownbay
# os release : 3.4.11-yocto-standard
# perf version : 3.4.11
# arch : i686
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz
# cpuid : GenuineIntel,6,38,1
# total memory : 1017184 kB
# cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30
# event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern
= 0, id = { 5, 6 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# ========
#
matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460)
matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460)
pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460)
matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460)
gthumb 1300 [001] 34217.697451: do_fork: (c1028460)
gthumb 1300 [001] 34219.085734: do_fork: (c1028460)
gthumb 1300 [000] 34219.121351: do_fork: (c1028460)
gthumb 1300 [001] 34219.264551: do_fork: (c1028460)
pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460)
matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460)
matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460)
matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460)
matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460)
matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460)
gaku 1312 [000] 34237.202388: do_fork: (c1028460)
</literallayout>
And using 'perf report' on the same file, we can see the
callgraphs from starting a few programs during those 30 seconds:
</para>
<para>
<imagedata fileref="figures/perf-probe-do_fork-profile.png" width="6in" depth="7in" align="center" scalefit="1" />
</para>
<note>
Tying It Together: The trace events subsystem accomodate static
and dynamic tracepoints in exactly the same way - there's no
difference as far as the infrastructure is concerned. See the
ftrace section for more details on the trace event subsystem.
</note>
<note>
Tying It Together: Dynamic tracepoints are implemented under the
covers by kprobes and uprobes. kprobes and uprobes are also used
by and in fact are the main focus of SystemTap.
</note>
</section>
<section id='perf-documentation'>
<title>Documentation</title>
<para>
Online versions of the man pages for the commands discussed in this
section can be found here:
<itemizedlist>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-stat'>'perf stat' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-record'>'perf record' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-report'>'perf report' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-probe'>'perf probe' manpage</ulink>.
</para></listitem>
<listitem><para>The <ulink url='http://linux.die.net/man/1/perf-script'>'perf script' manpage</ulink>.
</para></listitem>
<listitem><para>Documentation on using the
<ulink url='http://linux.die.net/man/1/perf-script-python'>'perf script' python binding</ulink>.
</para></listitem>
<listitem><para>The top-level
<ulink url='http://linux.die.net/man/1/perf'>perf(1) manpage</ulink>.
</para></listitem>
</itemizedlist>
</para>
<para>
Normally, you should be able to invoke the man pages via perf
itself e.g. 'perf help' or 'perf help record'.
</para>
<para>
However, by default Yocto doesn't install man pages, but perf
invokes the man pages for most help functionality. This is a bug
and is being addressed by a Yocto bug:
<ulink url='https://bugzilla.yoctoproject.org/show_bug.cgi?id=3388'>Bug 3388 - perf: enable man pages for basic 'help' functionality</ulink>.
</para>
<para>
The man pages in text form, along with some other files, such as
a set of examples, can be found in the 'perf' directory of the
kernel tree:
<literallayout class='monospaced'>
tools/perf/Documentation
</literallayout>
There's also a nice perf tutorial on the perf wiki that goes
into more detail than we do here in certain areas:
<ulink url='https://perf.wiki.kernel.org/index.php/Tutorial'>Perf Tutorial</ulink>
</para>
</section>
</section>
</chapter>
<!--