Devel::NYTProf v5 at YAPC::NA 201406

Devel::NYTProf
Perl Source Code Profiler

Tim Bunce - YAPC::NA - 2014

Devel::DProf Is Broken

$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -r
Total Elapsed Time = 0.108 Seconds
         Real Time = 0.108 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 9.26   0.010  0.010      1   0.0100 0.0100  main::s76
 9.26   0.010  0.010      1   0.0100 0.0100  main::s323
 9.26   0.010  0.010      1   0.0100 0.0100  main::s626
 9.26   0.010  0.010      1   0.0100 0.0100  main::s936
 0.00       - -0.000      1        -      -  main::s77
 0.00       - -0.000      1        -      -  main::s82

Profiling 101
The Basics

What To Measure?

[Slide diagram: a grid with columns "CPU Time" and "Real Time" and rows "Subroutines" and "Statements"; each cell is marked "?".]

Subroutine vs Statement

• Subroutine Profiling
- Measures time between subroutine entry and exit
- That’s the Inclusive time. Exclusive by subtraction.
- Reasonably fast, reasonably small data files

• Problems
- Can be confused by funky control flow (goto &sub)
- No insight into where time spent within large subs
- Doesn’t measure code outside of a sub

Subroutine vs Statement

• Line/Statement profiling
- Measure time from start of one statement to the start of the next statement, wherever that might be
- Fine grained detail

• Problems
- Very expensive in CPU & I/O
- Assigns too much time in some cases
- Too much detail for large subs
- Hard to get overall subroutine times

CPU Time vs Real Time

• CPU Time
- Measures time the CPU spent executing your code
- Not (much) affected by other load on system
- Doesn’t include time spent waiting for i/o etc.

• Real Time
- Measures the elapsed time-of-day
- Your time is affected by other load on system
- Includes time spent waiting for i/o etc.
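The difference is easy to see by timing something that waits versus something that computes. A minimal sketch, assuming only core Perl (`times` for CPU time, Time::HiRes for real time); the `measure` helper is a hypothetical name, not part of any profiler:

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Return (real_seconds, cpu_seconds) consumed by running $code once.
sub measure {
    my ($code) = @_;
    my ($user0, $sys0) = times;     # CPU time used so far by this process
    my $real0 = time();             # wall-clock time, fractional seconds
    $code->();
    my $real1 = time();
    my ($user1, $sys1) = times;
    return ($real1 - $real0, ($user1 + $sys1) - ($user0 + $sys0));
}

# Waiting: plenty of real time, almost no CPU time.
my ($wait_real, $wait_cpu) = measure(sub { sleep 0.3 });

# Computing: CPU time tracks real time when the machine is otherwise idle.
my ($calc_real, $calc_cpu) = measure(sub {
    my $x = 0;
    $x += sqrt($_) for 1 .. 500_000;
});

printf "sleep: real %.3fs  cpu %.3fs\n", $wait_real, $wait_cpu;
printf "calc:  real %.3fs  cpu %.3fs\n", $calc_real, $calc_cpu;
```

On a loaded machine the "calc" real time grows while its CPU time stays about the same.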

Devel::NYTProf

Public Service Announcement!

The NYTProf name is an accident of history
I do not work for the New York Times
I have never worked for the New York Times
I have no affiliation with the New York Times
The New York Times last contributed in 2008

Running NYTProf

perl -d:NYTProf ...

perl -MDevel::NYTProf ...

Configure profiler via the NYTPROF env var
perldoc Devel::NYTProf for the details

To profile code that’s invoked elsewhere:

PERL5OPT=-d:NYTProf

NYTPROF=file=/tmp/nytprof.out:addpid=1:...
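Profiling can also be limited to one region of a program. A sketch, assuming the `start=no` option and the `DB::enable_profile()` / `DB::disable_profile()` calls described in perldoc Devel::NYTProf; `process_request` is a hypothetical stand-in for your own code:

```perl
use strict;
use warnings;

# Hypothetical workload standing in for the code you care about.
sub process_request {
    my ($n) = @_;
    my $total = 0;
    $total += sqrt($_) for 1 .. $n;
    return $total;
}

# The profiler hooks only exist when running under -d:NYTProf,
# so guard the calls to keep the script usable without the profiler.
my $profiling = defined &DB::enable_profile;

my $warmup = process_request(1_000);      # not profiled when start=no

DB::enable_profile() if $profiling;
my $result = process_request(100_000);    # profiled
DB::disable_profile() if $profiling;

print "result: $result\n";
```

Run it as `NYTPROF=start=no perl -d:NYTProf script.pl` so that nothing is recorded until `DB::enable_profile()` is called.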

Reporting: KCachegrind

• KCachegrind call graph - new and cool
- contributed by C. L. Kao.
- requires KCachegrind

$ nytprofcg # generates nytprof.callgraph

$ kcachegrind # load the file via the gui

KCachegrind

Reporting: HTML

• HTML report
- page per source file, annotated with times and links
- subroutine index table with sortable columns
- interactive Treemap of subroutine times
- generates Graphviz dot file of call graph
- -m (--minimal) faster generation but less detailed

$ nytprofhtml # writes HTML report in ./nytprof/...

$ nytprofhtml --file=/tmp/nytprof.out.793 --open

Summary

Links to annotated source code

Timings for perl builtins

Link to sortable table of all subs

Inclusive vs Exclusive Time

[Slide diagram: sub foo calls bar() twice; time spans are labelled Inclusive and Exclusive for sub foo, and Inclusive for sub bar.]

Inclusive vs. Exclusive

• Inclusive Time is best for Top Down
- Overview of time spent “in and below this sub”
- Useful to prioritize structural optimizations

• Exclusive Time is best for Bottom Up
- Detail of time spent “in the code of this sub”
- Where the time actually gets spent
- Useful for localized (peephole) optimization
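The relationship between the two can be shown with a hand-rolled timer. This is only an illustrative sketch, not how NYTProf is implemented; `timed` and `busy` are hypothetical helpers:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my %inclusive;   # sub name => total time between entry and exit

# Run $code, adding its elapsed real time to the inclusive total for $name.
sub timed {
    my ($name, $code) = @_;
    my $start = time();
    $code->();
    $inclusive{$name} += time() - $start;
}

# Burn some CPU so there is something to measure.
sub busy {
    my ($n) = @_;
    my $x = 0;
    $x += sqrt($_) for 1 .. $n;
    return $x;
}

sub bar { timed(bar => sub { busy(200_000) }) }

sub foo {
    timed(foo => sub {
        busy(50_000);   # foo's own work
        bar();
        bar();
    });
}

foo();

# Exclusive time of foo = its inclusive time minus time spent in its callees.
my $foo_exclusive = $inclusive{foo} - $inclusive{bar};

printf "foo: inclusive %.4fs, exclusive %.4fs\n", $inclusive{foo}, $foo_exclusive;
printf "bar: inclusive %.4fs (two calls)\n", $inclusive{bar};
```

Sorting by inclusive time puts foo on top; sorting by exclusive time shows that most of the work actually happens in bar.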

Annotated Source View

Overall time spent in and below this sub

(in + below)

Color coding based on Median Absolute Deviation relative to rest of this file

Timings for each location that calls this subroutine

Time between starting this perl statement and starting the next. So includes overhead of calls to perl subs.

Timings for each subroutine called by each line

Boxes represent subroutines

Colors only used to show packages (and aren’t pretty yet)

Hover over box to see details

Click to drill-down one level in package hierarchy

Treemap showing relative proportions of exclusive time

Calls between packages

Calls to/from/within package

Let’s take a look...

Optimizing
Hints & Tips

Do your own testing
With your own perl binary
On your own hardware

Beware My Examples!

Take care comparing code fragments!
Edge-effects at loop and scope boundaries.
Statement time includes time getting to the next perl statement, wherever that may be.

Beware 2!

Consider effect of CPU-level data and code caching
Tends to make second case look faster!
Swap the order to double-check alternatives

Beware Your Examples!

Phase 0 Before you start

DON’T DO IT!

“The First Rule of Program Optimization: Don't do it.

The Second Rule of Program Optimization (for experts only!): Don't do it yet.”

- Michael A. Jackson

Why not?

“More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.”

- W.A. Wulf

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

- Donald Knuth

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

- Donald Knuth

“Throw hardware at it!”

Hardware == Cheap
Programmers == Expensive (& error prone)

Hardware upgrades are usually much less risky than software optimizations.

“Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.”

- Rob Pike

“Measure twice, cut once.”

- Old Carpenter’s Maxim

Phase 1
Low Hanging Fruit

Low Hanging Fruit

1. Profile code running representative workload.
2. Look at Exclusive Time of subroutines.
3. Do they look reasonable?
4. Examine worst offenders.
5. Fix only simple local problems.
6. Profile again.
7. Fast enough? Then STOP!
8. Rinse and repeat once or twice, then move on.

“Simple Local Fixes”

Changes unlikely to introduce bugs

Move invariant expressions out of loops
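A minimal before/after sketch of hoisting an invariant; the `%config` and `@prices` data are made up for illustration:

```perl
use strict;
use warnings;

my %config = (tax_rate => 0.2);
my @prices = (10, 25, 40);

# Before: the multiplier is recomputed on every iteration.
my @slow;
for my $price (@prices) {
    my $multiplier = 1 + $config{tax_rate};   # same value each time
    push @slow, $price * $multiplier;
}

# After: compute the invariant once, outside the loop.
my $multiplier = 1 + $config{tax_rate};
my @fast = map { $_ * $multiplier } @prices;

print "@fast\n";
```

The saving per iteration is tiny here; it matters when the invariant is a method call, a regex compilation or a data-structure lookup inside a hot loop.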

Avoid->repeated->chains->of->accessors(...);

Use a temporary variable
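A sketch of the temporary-variable fix, using hypothetical Order, Customer and Address classes; the Customer accessor counts its calls so the difference is visible:

```perl
use strict;
use warnings;

# Hypothetical classes, for illustration only.
{
    package Address;
    sub new  { bless { city => 'Orlando', zip => '32801' }, shift }
    sub city { $_[0]{city} }
    sub zip  { $_[0]{zip} }
}
{
    package Customer;
    our $calls = 0;
    sub new     { bless { address => Address->new }, shift }
    sub address { $calls++; $_[0]{address} }
}
{
    package Order;
    sub new      { bless { customer => Customer->new }, shift }
    sub customer { $_[0]{customer} }
}

my $order = Order->new;

# Repeated chain: every use walks order -> customer -> address again.
my $label_slow = $order->customer->address->city . ' '
               . $order->customer->address->zip;
my $calls_slow = $Customer::calls;

# Temporary variable: walk the chain once, then reuse the result.
$Customer::calls = 0;
my $address    = $order->customer->address;
my $label_fast = $address->city . ' ' . $address->zip;
my $calls_fast = $Customer::calls;

print "$label_fast (address() calls: $calls_slow before, $calls_fast after)\n";
```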

Use faster accessors

Class::Accessor
-> Class::Accessor::Fast
--> Class::Accessor::Faster
---> Class::Accessor::Fast::XS
----> Class::XSAccessor

These aren’t all compatible so consider your actual usage.

(The list above is out of date.)

Avoid calling subs that don’t do anything!

my $unused_variable = $self->get_foo;

my $is_logging = $log->info(...);
while (...) {
    $log->info(...) if $is_logging;
    ...
}
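A runnable sketch of the logging guard. `My::Logger` and its `is_info` method are hypothetical; real logging modules differ in how they expose "is this level enabled?", so check the one you use:

```perl
use strict;
use warnings;

# Hypothetical logger that counts how often info() is called.
{
    package My::Logger;
    sub new {
        my ($class, %args) = @_;
        return bless { enabled => $args{enabled}, calls => 0 }, $class;
    }
    sub is_info { $_[0]{enabled} }
    sub info {
        my $self = shift;
        $self->{calls}++;
        print STDERR "@_\n" if $self->{enabled};
        return;
    }
}

my $log   = My::Logger->new(enabled => 0);   # logging switched off
my @items = (1 .. 1000);

# Unguarded: builds the message and calls info() every time, for nothing.
$log->info("processing $_") for @items;
my $unguarded_calls = $log->{calls};

# Guarded: ask once, then skip both the call and the string building.
$log->{calls} = 0;
my $is_logging = $log->is_info;
for my $item (@items) {
    $log->info("processing $item") if $is_logging;
}
my $guarded_calls = $log->{calls};

print "info() calls: $unguarded_calls unguarded, $guarded_calls guarded\n";
```

The guard is only safe if the logging level cannot change while the loop runs.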

Exit subs and loops early
Delay initializations

return if not ...a cheap test...;
return if not ...a more expensive test...;
my $foo = ...initializations...;
...body of subroutine...
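The same shape as a concrete sketch; `build_lookup_table` is a hypothetical stand-in for costly initialization, and it counts how often it runs:

```perl
use strict;
use warnings;

my $expensive_setups = 0;

# Hypothetical costly initialization.
sub build_lookup_table {
    $expensive_setups++;
    my %squares = map { ($_ => $_ ** 2) } 1 .. 1000;
    return \%squares;
}

sub square_of {
    my ($n) = @_;
    return undef unless defined $n;             # a cheap test first
    return undef unless $n =~ /\A[0-9]+\z/;     # a more expensive test next
    my $table = build_lookup_table();           # initialize only when needed
    return $table->{$n};
}

my $none  = square_of(undef);   # rejected by the cheap test
my $bad   = square_of('abc');   # rejected by the regex
my $valid = square_of(12);      # the only call that pays for setup

print "setups run: $expensive_setups\n";
```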

Fix silly code

- return exists $nav_type{$country}{$key}
-     ? $nav_type{$country}{$key}
-     : undef;

+ return $nav_type{$country}{$key};

Beware pathological regular expressions

Devel::NYTProf shows regular expression opcodes.
Consider using no feature 'unicode_strings';

Avoid unpacking args in very hot subs

sub foo { shift->delegate(@_) }

sub bar {
    return shift->{bar} unless @_ > 1;   # getter: only the invocant was passed
    return $_[0]->{bar} = $_[1];         # setter
}

Avoid unnecessary (capturing parens) in regex
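A short sketch of the alternative: `(?:...)` groups without capturing. The log line is made up for illustration:

```perl
use strict;
use warnings;

my $line = '2014-06-23 ERROR disk full';

# Capturing group: perl saves the matched text into $1 although nothing uses it.
my $hit_capturing = ($line =~ /\b(ERROR|FATAL)\b/) ? 1 : 0;

# Non-capturing group: same grouping for the alternation, nothing saved.
my $hit_grouping = ($line =~ /\b(?:ERROR|FATAL)\b/) ? 1 : 0;

# Keep capturing parens only where the captured text is actually used.
my ($date) = $line =~ /\A(\d{4}-\d{2}-\d{2}) (?:ERROR|FATAL)\b/;

print "matched on $date\n" if $hit_grouping;
```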

Retest.

Fast enough?

STOP!
Put the profiler down and walk away

Phase 2
Deeper Changes

Profile with a known workload

E.g., 1000 identical requests

Check subroutine call counts

Reasonable for the workload?

Check Inclusive Times (especially top-level subs)

Reasonable percentage for the workload?

Add caching if appropriate to reduce calls

Remember cache invalidation!

Walk up call chain to find good spots for caching

Remember cache invalidation!
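A minimal sketch of a cache with invalidation. The `%db` hash stands in for a slow data store, and all sub names are hypothetical:

```perl
use strict;
use warnings;

my %db      = (42 => 'alice');   # stand-in for a slow data store
my $lookups = 0;

# The expensive call we want to make less often.
sub fetch_user {
    my ($id) = @_;
    $lookups++;
    return $db{$id};
}

my %cache;

sub cached_user {
    my ($id) = @_;
    $cache{$id} = fetch_user($id) unless exists $cache{$id};
    return $cache{$id};
}

sub update_user {
    my ($id, $name) = @_;
    $db{$id} = $name;
    delete $cache{$id};   # invalidate, or readers keep seeing the old value
    return;
}

my $first = cached_user(42);
cached_user(42) for 1 .. 99;           # all served from the cache
my $lookups_before_update = $lookups;

update_user(42, 'bob');
my $after = cached_user(42);           # cache miss, refetched

print "lookups: $lookups, user is now $after\n";
```

Without the `delete` in `update_user`, the last lookup would still return the stale name.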

Creating many objects that don’t get used?

Try a lightweight proxy
e.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit

Reconfigure your Perl
can yield useful gains with little effort

thread support costs ~2..30%
debugging support costs ~15%

Also consider: usemymalloc, use64bitint, use64bitall, uselongdouble, optimize, disable taint mode.

Consider using a different compiler.

Upgrade your Perl

Newer versions often faster at some things (though occasionally slower at others)

Sometimes have specific micro-optimizations

Many memory usage and performance improvements from 5.8 thru 5.20

Retest.

Fast enough?

STOP!
Put the profiler down and walk away.

Phase 3
Structural Changes

Push loops down

- $object->walk($_) for @dogs;

+ $object->walk_these(\@dogs);
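Why this helps, as a sketch: per-call overhead and per-call setup are paid once instead of once per item. The `Walker` class and its `_prepare` step are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
{
    package Walker;
    sub new { bless { walked => [], setups => 0 }, shift }

    # Per-call setup that does not depend on the individual dog.
    sub _prepare { $_[0]{setups}++; return 'leash' }

    sub walk {
        my ($self, $dog) = @_;
        my $leash = $self->_prepare;
        push @{ $self->{walked} }, "$dog on $leash";
        return;
    }

    # The loop lives inside the method, so setup happens once.
    sub walk_these {
        my ($self, $dogs) = @_;
        my $leash = $self->_prepare;
        push @{ $self->{walked} }, map { "$_ on $leash" } @$dogs;
        return;
    }
}

my @dogs = qw(Rex Fido Spot);

my $one_by_one = Walker->new;
$one_by_one->walk($_) for @dogs;

my $batched = Walker->new;
$batched->walk_these(\@dogs);

print "setups: $one_by_one->{setups} one by one, $batched->{setups} batched\n";
```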

Use faster modules

sort → Sort::Key

Storable → Sereal

LWP → HTTP::Tiny → HTTP::Lite → *::Curl → Hijk

These aren’t all compatible or full-featured or ‘better’
Consider your actual needs
See http://neilb.org/reviews/

Change the data structure

hashes <–> arrays

Change the algorithm

What’s the “Big O”?
O(n²) or O(log n) or ...
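One illustration of changing the algorithm: finding a duplicate by comparing every pair is O(n²), while remembering seen values in a hash is O(n). Both subs are hypothetical examples:

```perl
use strict;
use warnings;

# O(n^2): compare every element with every later element.
sub has_duplicate_quadratic {
    my @items = @_;
    for my $i (0 .. $#items) {
        for my $j ($i + 1 .. $#items) {
            return 1 if $items[$i] eq $items[$j];
        }
    }
    return 0;
}

# O(n): one pass, remembering what has been seen in a hash.
sub has_duplicate_linear {
    my %seen;
    for my $item (@_) {
        return 1 if $seen{$item}++;
    }
    return 0;
}

my @unique = (1 .. 500);
my @dupes  = (1 .. 500, 250);

printf "unique: %d %d\n", has_duplicate_quadratic(@unique), has_duplicate_linear(@unique);
printf "dupes:  %d %d\n", has_duplicate_quadratic(@dupes),  has_duplicate_linear(@dupes);
```

For 500 unique items the quadratic version makes about 125,000 comparisons and the linear one makes 500 hash lookups.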

Rewrite hot-spots in XS / C

Consider Inline::C
but beware of deployment issues

Small changes add up!

“I achieved my fast times by multitudes of 1% reductions”

- Bill Raymond

See also “Top 10 Perl Performance Tips”

• A presentation by Perrin Harkins
• Covers higher-level issues, including
- Good DBI usage
- Fastest modules for serialization, caching, templating, HTTP requests etc.
• http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g

Questions?

Tim.Bunce@pobox.com
http://blog.timbunce.org

@timbunce on twitter
