shithub: aacenc

ref: 06e2bdb10725e93bfdfae7c8c5744692d5f5b49b
dir: /profiling.txt/

View raw version
PROFILING RESULTS
-----------------

This is an attempt to cronolog all the profiles done to the code, and the speed at which teh code runs after steps have been taken to optimize it. Think of it  as a diary.

NOTE: After ever optmiztion and code change, md5sum MUST be run on the output   files to determine that they are exact matches of teh pre-optimized code.       PLEASE remember to do this, it'll save alot of heartache in the future.
Also please dont include any Flat data that is less than 1% of the total time.

Also NOTE: All tests are run at the DEFAULT encoder settings, ie, faac track.wav

Format:
*Person profiling
*Version/cvs date
*gcc flags
*runtime
*data track used
*notes
*FLAT gprof profile. We may include the Call graph at a later date if we need   to.

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 11:04:16 CST 2000
-O6 -march=i686 -funroll-loops -finline-functions -fomit-frame-pointer
10 min, 47.88 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
This is the default fully optimized non profiled compile on my box.
NO PROFILE DATA

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 11:04:16 CST 2000
-O6 -march=i686 -funroll-loops -finline-functions -g -pg -a
11 min, 12.98 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
This is the default fully optimized, profiled compile on my box.
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 58.80    385.25   385.25    41048  9385.35  9385.35  psy_step6
  3.92    410.93    25.68    41048   625.61   625.61  psy_step14
  3.91    436.53    25.60    41048   623.66   623.66  psy_step11andahalf
  3.72    460.87    24.34    20524  1185.93  1787.90  psy_step2
  3.01    480.59    19.72    20524   960.83  2928.46  tf_encode_spectrum_aac
  2.53    497.14    16.55    41048   403.19 13119.15  EncTf_psycho_acoustic
  2.17    511.39    14.25    19506   730.54   730.54  estimate_delay
  2.13    525.32    13.93    20524   678.72   678.72  psy_step4
  2.02    538.53    13.21 22839582     0.58     0.61  output_bits
  1.40    547.73     9.20    41104   223.82   223.98  calc_noise
  1.17    555.37     7.64    41048   186.12   186.12  psy_step9
  1.09    562.53     7.16 42029056     0.17     0.17  double_to_int

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 13:00:03 CST 2000
-O6 -march=i686 -funroll-loops -finline-functions -fomit-frame-pointer
10 min, 44.19 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
New fully Optimized non profile with a macroized double_to_int.

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 13:00:03 CST 2000
-O6 -march=i686 -funroll-loops -finline-functions -g -pg -a
11 min, 5.36 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Fully optimized, Profiled with macroized double_to_int. Notice no double_to_int in the top 1% of the flat profile.
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 58.99    385.02   385.02    41048  9379.75  9379.75  psy_step6
  3.94    410.71    25.69    41048   625.85   625.85  psy_step11andahalf
  3.91    436.24    25.53    20524  1243.91  1822.18  psy_step2
  3.83    461.23    24.99    41048   608.80   608.80  psy_step14
  3.06    481.23    20.00    20524   974.47  2941.86  tf_encode_spectrum_aac
  2.47    497.37    16.14    41048   393.20 13116.31  EncTf_psycho_acoustic
  2.19    511.64    14.27    20524   695.28   695.28  psy_step4
  2.10    525.36    13.72    19506   703.37   703.37  estimate_delay
  2.06    538.82    13.46 22839582     0.59     0.62  output_bits
  1.35    547.63     8.81    41104   214.33   215.15  calc_noise
  1.19    555.37     7.74    41048   188.56   188.56  psy_step9
  1.04    562.15     6.78    20524   330.34   597.84  nok_ltp_reconstruct

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-finline-functions 
5 min, 38.96 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Ok, this is wierd. It jsut dawned on me that since the x86 isnt a risc box with a ton of registers (like im used to) it might be register starved. That seems to be the case. With no optimization except inlining, we are running with a 200% speed increase. Time to pin this down.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-O -finline-function
11 min, 6.09 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Looks like -O is still starving registers.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
No Optimization
5 min, 39.84 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
a bit slower than the non inlined version.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-march=i686 -finline-functions
5 min, 38.25 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Still Not getting anywhere.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-march=i686 -finline-functions -funroll-loops
5 min, 37.04 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
More Nothing.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-march=i686 -finline-functions -funroll-loops -fomit-frame-pointer
5 min, 39.46 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Still Nothing.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -fthread-jumps
5 min, 41.00 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Still Nothing.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -fthread-jumps -fdefer-pop
5 min, 42.24 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Still Nothing.
NO PROFILE

------------------------------------------------------------------------------
thebard
CVS Tue Feb 22 19:18:25 CST 2000
-O6 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -ffast-math
3 min, 12.13 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
Bingo! Got it. -O core dumps, and -O2 to -O6 are slower than thuer non optimized counterparts, unless you add -ffast-math. then things are speedy. The assembly it produces still has room for speed. Still more work to do.

------------------------------------------------------------------------------
thebard
CVS Wed Feb 23 20:30:01 CST 2000
-O6 -march=i686 -finline-functions -funroll-loops -ffast-math -g -pg -a
3 min, 44.97 sec.
The Outfield Bangin': track 2: Bangin' On My Heart
This is the profiled fully optimized version. As you can see, there isnt a whole lot letf to optimize...
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
  9.76     20.63    20.63 32619291     0.63     0.66  output_bits
  9.18     40.05    19.42    20524   946.21  3941.14  tf_encode_spectrum_aac
  7.83     56.60    16.55    20524   806.37  1361.39  psy_step2
  7.72     72.93    16.33    41048   397.83  2266.16  EncTf_psycho_acoustic
  6.20     86.05    13.12   102792   127.64   128.13  calc_noise
  5.14     96.91    10.86    20524   529.14   529.14  psy_step4
  4.51    106.44     9.53    41048   232.17   232.17  psy_step6
  3.33    113.48     7.04    41048   171.51   171.51  psy_step9
  3.22    120.30     6.82   223693    30.49    96.05  noiseless_bit_count
  3.22    127.11     6.81   188315    36.16    36.16  PulseCoder
  3.16    133.79     6.68    41048   162.74   162.74  psy_step14
  3.03    140.20     6.41   203169    31.55    31.55  quantize
  2.62    145.75     5.55     7931   699.79   699.79  estimate_delay
  2.55    151.14     5.39    20524   262.62   463.36  nok_ltp_reconstruct
  2.50    156.42     5.28   751457     7.03     7.03  pfftw_twiddle_4
  2.38    161.46     5.04    41048   122.78   122.78  psy_step5
  2.03    165.76     4.30   223693    19.22   115.27  bit_search
  1.89    169.75     3.99    61089    65.31   102.13  MDCT
  1.62    173.18     3.43    10263   334.21 20456.01  EncTfFrame
  1.44    176.23     3.05  9744296     0.31     0.31  BsPutBit
  1.32    179.03     2.80    20524   136.43   339.21  realft2048
  1.28    181.74     2.71  1037456     2.61     2.61  pfftw_32
  1.05    183.96     2.22    28455    78.02   297.27  buffer2freq