Michael Petlan
2017-06-16 11:22:02 UTC
Hello,
Many days ago you had helped me a lot in using Oprofile in Virtual Machine. That time, I also had a query regarding
getting information from 'opreport --symbols --debuginfo' command.
Hello,Many days ago you had helped me a lot in using Oprofile in Virtual Machine. That time, I also had a query regarding
getting information from 'opreport --symbols --debuginfo' command.
yeah, I remember...
Actually, I am profiling a file1.c file with the following commands...
1. gcc -g file1.c -o file1
2. operf ./file1 --events=CPU_CLK_UNHALTED
3. opreport --symbols --debuginfo
samples %       linenr info                image name              symbol name
2      12.5000 rsaimp.c:16                rsaimp                  MM
2       12.5000 rsaimp.c:34                rsaimp                  main
When I execute commands 2 and 3 multiple times, each time I gwt the output for symbole name MM (which is UDF in the
program) but I don't get output for main every time.
This is kind of random, see below.1. gcc -g file1.c -o file1
2. operf ./file1 --events=CPU_CLK_UNHALTED
3. opreport --symbols --debuginfo
samples %       linenr info                image name              symbol name
2      12.5000 rsaimp.c:16                rsaimp                  MM
2       12.5000 rsaimp.c:34                rsaimp                  main
When I execute commands 2 and 3 multiple times, each time I gwt the output for symbole name MM (which is UDF in the
program) but I don't get output for main every time.
Additionally,'' the sample % also vary every time I file operf command. I do not understand the working of operf that
how it gives different outputs for the same program.
How _counting_ works:how it gives different outputs for the same program.
Each time an event happens in CPU (in our case -- CPU_CLK_UNHALTED -- one
CPU tick), its counter is incremented by one. This way, you can get the
whole number of how many times the event happened:
$ ocount ls
Events were actively counted for 1863267 nanoseconds.
Event counts (actual) for /bin/ls:
Event Count % time counted
CPU_CLK_UNHALTED 2,672,309 100.00
This means that there were 2.672 million CPU ticks necessary to execute
`ls`.
The tool for counting is *ocount*.
How _profiling_ works:
For profiling, events are counted the same way as when counting, but only
once per N event occurrences a "sample" is taken. A sample is a piece of
information about where in the code it happened, what was the context and
potentially call-stack, etc. This sample is saved into the oprofile_data
dir structure. When you finish profiling, opreport generates some overview
from the samples recorded.
Of course, taking sample causes some overhead. Thus, you cannot take it
each time the event counter increments. For this purpose, there is "sample
rate" value which divides the event counter and takes a sample only once
per "sample rate" occurrences. Thus, it is pretty random, which event
occurrence is picked for a sample.
Of course, your main() consumes CPU ticks and the counter is incremented,
but sometimes, one of these incrementations produces sample, sometimes not,
thus the function is missing.
You have 2 samples on each of the line in the report. That is really a low
number, so it is very probable that sometimes the samples hit main() and
sometimes not.
$ operf ls
Profiling done.
$ opreport
Using /root/oprofile_data/samples/ for samples directory.
CPU: Core 2, speed 2267 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
samples| %|
------------------
--> 37 100.000 ls
CPU_CLK_UNHALT...|
samples| %|
------------------
27 72.9730 kallsyms
5 13.5135 libc-2.19.so
2 5.4054 ld-2.19.so
1 2.7027 ls
1 2.7027 ext4
1 2.7027 libattr.so.1.1.0
You see that out of (estimated) 2.5-3.5 mil of event occurrences (which is
the scale `ls` usually needs on my PC), I got only 37 samples.
The sample rate for each event are in `ophelp` as "min count".
Why did I get 37 samples? My default setting of default event is:
$ ophelp -d
CPU_CLK_UNHALTED:100000:0:1:1
-------------------^^
This means that there is 1 sample per 100000 event occurrences.
You can tweak this value a bit, but note that oprofile won't let you set
the value to be too low, due to the overhead reasons.
Let's try 10000 (sampling 10 times more often):
$ operf -e CPU_CLK_UNHALTED:10000 ls
WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log for details.
Lowering the sampling rate may reduce or eliminate lost samples.
See the '--events' option description in the operf man page for help.
Profiling done.
$ opreport
Using /root/oprofile_data/samples/ for samples directory.
WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log for details.
CPU: Core 2, speed 2267 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
CPU_CLK_UNHALT...|
samples| %|
------------------
429 100.000 ls
CPU_CLK_UNHALT...|
samples| %|
------------------
313 72.9604 kallsyms
55 12.8205 ld-2.19.so
34 7.9254 libc-2.19.so
17 3.9627 ls
8 1.8648 ext4
2 0.4662 binfmt_misc
You see that both operf and opreport warned about lost samples which means
that the overhead in kernel was too high and some samples were lost. But
probably not that many, since you see that I got 429 samples instead of
37 which is very roughly 10 times more.
Profiling is not for exact measurement of how many CPU ticks happened in
a function, it is rather designed for relative comparison across your code.
If you are getting too few samples, try to decrease the sample rate as I
did in the example. Have a look at `ophelp` what is the lowest value for
your event.
Also please note that the min sample rates are rough and depend on the load,
so under low load, lower rates can be used without issues. When you profile
systemwide on all CPUs, it's very high chance that you'll run into an overhead
and lose many samples...
I hope I explained it a bit.
Cheers,
Michael
Thanks,
Dhara Buch
Dhara Buch