ref: e4b3ab8d26e91a37ed9757b9a6e2e54dc5423309
dir: /profiling.txt/
PROFILING RESULTS ----------------- This is an attempt to cronolog all the profiles done to the code, and the speed at which teh code runs after steps have been taken to optimize it. Think of it as a diary. NOTE: After ever optmiztion and code change, md5sum MUST be run on the output files to determine that they are exact matches of teh pre-optimized code. PLEASE remember to do this, it'll save alot of heartache in the future. Also please dont include any Flat data that is less than 1% of the total time. Also NOTE: All tests are run at the DEFAULT encoder settings, ie, faac track.wav Format: *Person profiling *Version/cvs date *gcc flags *runtime *data track used *notes *FLAT gprof profile. We may include the Call graph at a later date if we need to. ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 11:04:16 CST 2000 -O6 -march=i686 -funroll-loops -finline-functions -fomit-frame-pointer 10 min, 47.88 sec. The Outfield Bangin': track 2: Bangin' On My Heart This is the default fully optimized non profiled compile on my box. NO PROFILE DATA ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 11:04:16 CST 2000 -O6 -march=i686 -funroll-loops -finline-functions -g -pg -a 11 min, 12.98 sec. The Outfield Bangin': track 2: Bangin' On My Heart This is the default fully optimized, profiled compile on my box. Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 58.80 385.25 385.25 41048 9385.35 9385.35 psy_step6 3.92 410.93 25.68 41048 625.61 625.61 psy_step14 3.91 436.53 25.60 41048 623.66 623.66 psy_step11andahalf 3.72 460.87 24.34 20524 1185.93 1787.90 psy_step2 3.01 480.59 19.72 20524 960.83 2928.46 tf_encode_spectrum_aac 2.53 497.14 16.55 41048 403.19 13119.15 EncTf_psycho_acoustic 2.17 511.39 14.25 19506 730.54 730.54 estimate_delay 2.13 525.32 13.93 20524 678.72 678.72 psy_step4 2.02 538.53 13.21 22839582 0.58 0.61 output_bits 1.40 547.73 9.20 41104 223.82 223.98 calc_noise 1.17 555.37 7.64 41048 186.12 186.12 psy_step9 1.09 562.53 7.16 42029056 0.17 0.17 double_to_int ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 13:00:03 CST 2000 -O6 -march=i686 -funroll-loops -finline-functions -fomit-frame-pointer 10 min, 44.19 sec. The Outfield Bangin': track 2: Bangin' On My Heart New fully Optimized non profile with a macroized double_to_int. ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 13:00:03 CST 2000 -O6 -march=i686 -funroll-loops -finline-functions -g -pg -a 11 min, 5.36 sec. The Outfield Bangin': track 2: Bangin' On My Heart Fully optimized, Profiled with macroized double_to_int. Notice no double_to_int in the top 1% of the flat profile. Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 58.99 385.02 385.02 41048 9379.75 9379.75 psy_step6 3.94 410.71 25.69 41048 625.85 625.85 psy_step11andahalf 3.91 436.24 25.53 20524 1243.91 1822.18 psy_step2 3.83 461.23 24.99 41048 608.80 608.80 psy_step14 3.06 481.23 20.00 20524 974.47 2941.86 tf_encode_spectrum_aac 2.47 497.37 16.14 41048 393.20 13116.31 EncTf_psycho_acoustic 2.19 511.64 14.27 20524 695.28 695.28 psy_step4 2.10 525.36 13.72 19506 703.37 703.37 estimate_delay 2.06 538.82 13.46 22839582 0.59 0.62 output_bits 1.35 547.63 8.81 41104 214.33 215.15 calc_noise 1.19 555.37 7.74 41048 188.56 188.56 psy_step9 1.04 562.15 6.78 20524 330.34 597.84 nok_ltp_reconstruct ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -finline-functions 5 min, 38.96 sec. The Outfield Bangin': track 2: Bangin' On My Heart Ok, this is wierd. It jsut dawned on me that since the x86 isnt a risc box with a ton of registers (like im used to) it might be register starved. That seems to be the case. With no optimization except inlining, we are running with a 200% speed increase. Time to pin this down. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -O -finline-function 11 min, 6.09 sec. The Outfield Bangin': track 2: Bangin' On My Heart Looks like -O is still starving registers. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 No Optimization 5 min, 39.84 sec. The Outfield Bangin': track 2: Bangin' On My Heart a bit slower than the non inlined version. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -march=i686 -finline-functions 5 min, 38.25 sec. The Outfield Bangin': track 2: Bangin' On My Heart Still Not getting anywhere. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -march=i686 -finline-functions -funroll-loops 5 min, 37.04 sec. The Outfield Bangin': track 2: Bangin' On My Heart More Nothing. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer 5 min, 39.46 sec. The Outfield Bangin': track 2: Bangin' On My Heart Still Nothing. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -fthread-jumps 5 min, 41.00 sec. The Outfield Bangin': track 2: Bangin' On My Heart Still Nothing. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -fthread-jumps -fdefer-pop 5 min, 42.24 sec. The Outfield Bangin': track 2: Bangin' On My Heart Still Nothing. NO PROFILE ------------------------------------------------------------------------------ thebard CVS Tue Feb 22 19:18:25 CST 2000 -O6 -march=i686 -finline-functions -funroll-loops -fomit-frame-pointer -ffast-math 3 min, 12.13 sec. The Outfield Bangin': track 2: Bangin' On My Heart Bingo! Got it. -O core dumps, and -O2 to -O6 are slower than thuer non optimized counterparts, unless you add -ffast-math. then things are speedy. The assembly it produces still has room for speed. Still more work to do. ------------------------------------------------------------------------------ thebard CVS Wed Feb 23 20:30:01 CST 2000 -O6 -march=i686 -finline-functions -funroll-loops -ffast-math -g -pg -a 3 min, 44.97 sec. The Outfield Bangin': track 2: Bangin' On My Heart This is the profiled fully optimized version. As you can see, there isnt a whole lot letf to optimize... Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 9.76 20.63 20.63 32619291 0.63 0.66 output_bits 9.18 40.05 19.42 20524 946.21 3941.14 tf_encode_spectrum_aac 7.83 56.60 16.55 20524 806.37 1361.39 psy_step2 7.72 72.93 16.33 41048 397.83 2266.16 EncTf_psycho_acoustic 6.20 86.05 13.12 102792 127.64 128.13 calc_noise 5.14 96.91 10.86 20524 529.14 529.14 psy_step4 4.51 106.44 9.53 41048 232.17 232.17 psy_step6 3.33 113.48 7.04 41048 171.51 171.51 psy_step9 3.22 120.30 6.82 223693 30.49 96.05 noiseless_bit_count 3.22 127.11 6.81 188315 36.16 36.16 PulseCoder 3.16 133.79 6.68 41048 162.74 162.74 psy_step14 3.03 140.20 6.41 203169 31.55 31.55 quantize 2.62 145.75 5.55 7931 699.79 699.79 estimate_delay 2.55 151.14 5.39 20524 262.62 463.36 nok_ltp_reconstruct 2.50 156.42 5.28 751457 7.03 7.03 pfftw_twiddle_4 2.38 161.46 5.04 41048 122.78 122.78 psy_step5 2.03 165.76 4.30 223693 19.22 115.27 bit_search 1.89 169.75 3.99 61089 65.31 102.13 MDCT 1.62 173.18 3.43 10263 334.21 20456.01 EncTfFrame 1.44 176.23 3.05 9744296 0.31 0.31 BsPutBit 1.32 179.03 2.80 20524 136.43 339.21 realft2048 1.28 181.74 2.71 1037456 2.61 2.61 pfftw_32 1.05 183.96 2.22 28455 78.02 297.27 buffer2freq