ref: ae35425ae64a3d9573f85a4a92c5638a58044057
parent: b3a36f7946f930caa0e96448648db60d7330c98d
author: Kyle Siefring <kylesiefring@gmail.com>
date: Sun Oct 22 15:34:19 EDT 2017
Optimize convolve8 SSSE3 and AVX2 intrinsics

Changed the intrinsics to perform summation similarly to the way the
assembly does.

The new code diverges from the assembly by preferring unsaturated
additions.

Results for Haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
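
For illustration only, a scalar C sketch of the summation strategy the message describes: wrapping 16-bit adds for the intermediate sums, with saturation kept only on the final add. This is not the code from this change. The helper names, the pairing of partial sums, the early rounding offset, and the example filter taps are all assumptions chosen for the sketch.

```c
#include <stdint.h>

/* Clamp a 32-bit value into the signed 16-bit range. */
static int16_t clamp16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Saturating 16-bit add: models one lane of _mm_adds_epi16. */
static int16_t adds16(int16_t a, int16_t b) {
  return clamp16((int32_t)a + (int32_t)b);
}

/* Wrapping 16-bit add: models one lane of _mm_add_epi16. */
static int16_t add16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Unsigned pixel pair times signed tap pair, summed with saturation:
 * models one lane of _mm_maddubs_epi16. */
static int16_t pair16(uint8_t p0, uint8_t p1, int8_t f0, int8_t f1) {
  return clamp16((int32_t)p0 * f0 + (int32_t)p1 * f1);
}

/* Hypothetical 8-tap filter for one output pixel. Taps are assumed to
 * sum to 128, so the result is rounded and shifted right by 7. */
static uint8_t convolve8_px(const uint8_t src[8], const int8_t f[8]) {
  const int16_t x0 = pair16(src[0], src[1], f[0], f[1]);
  const int16_t x1 = pair16(src[2], src[3], f[2], f[3]);
  const int16_t x2 = pair16(src[4], src[5], f[4], f[5]);
  const int16_t x3 = pair16(src[6], src[7], f[6], f[7]);

  /* Assumed pairing: each sum combines a small outer-tap term with a
   * large centre-tap term, so the wrapping adds stay in range for
   * this filter shape. */
  int16_t sum1 = add16(x0, x2);
  const int16_t sum2 = add16(x1, x3);

  /* Add the rounding offset with a wrapping add as well. */
  sum1 = add16(sum1, 64);

  /* Only the final add saturates. */
  sum1 = adds16(sum1, sum2);

  /* A negative sum shifts to a negative value, which clamps to 0. */
  if (sum1 < 0) return 0;
  const int32_t out = sum1 >> 7;
  return (uint8_t)(out > 255 ? 255 : out);
}

/* Example taps (an assumption): symmetric and summing to 128. */
static const int8_t kExampleFilter[8] = { -1, 6, -19, 78, 78, -19, 6, -1 };
```

With a flat input the filter reproduces the input value. With an input that drives both halves high, the final saturating add pins the sum at INT16_MAX, so the pixel clamps to 255 instead of wrapping negative.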