ref: 64f728caef5d9f019222c6989a9c6df17464dd69
parent: 60d1a5299576649f6db38714319b5845683ff0ab
author: Yunqing Wang <yunqingwang@google.com>
date: Tue Nov 12 11:51:15 EST 2013
Do horizontal loopfiltering in parallel

This patch follows the "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit and adds x86 SSE2 optimizations that filter 16
pixels in parallel. It also corrects the declaration of the aligned
arrays.

For the 8-pixels-in-parallel case, the calculation of the masks and
filters was improved. The threshold loading was updated, since the
thresholds are already duplicated. The NEON C wrapper functions were
updated to call the NEON loop filters twice.

Tests using the tulip clip showed a ~1.5% decoder speed gain.

Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
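The NEON change, where a wider filter is obtained by running an existing loop filter twice, can be sketched in C as follows. This is a minimal, hypothetical illustration. `filter_horiz_8` and `filter_horiz_16_dual` are invented names, and the per-pixel rule (average the two pixels straddling the edge when they differ by at most `limit`) is a toy stand-in, not the VP9 loop filter or the actual libvpx NEON code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8-pixel-wide horizontal-edge filter (toy rule, NOT VP9's).
 * `s` points at the first pixel below the edge; `pitch` is the row stride.
 * If the pixels on either side of the edge differ by at most `limit`,
 * both are replaced by their rounded average. */
static void filter_horiz_8(uint8_t *s, int pitch, uint8_t limit) {
  for (int i = 0; i < 8; ++i) {
    uint8_t *p0 = s - pitch + i; /* pixel just above the edge */
    uint8_t *q0 = s + i;         /* pixel just below the edge */
    if (abs(*p0 - *q0) <= limit) {
      uint8_t avg = (uint8_t)((*p0 + *q0 + 1) >> 1);
      *p0 = avg;
      *q0 = avg;
    }
  }
}

/* Hypothetical 16-pixel ("dual") variant built from two 8-pixel calls.
 * Each 8-pixel half gets its own threshold, so two neighbouring blocks
 * with different filter levels can be processed by one call. */
static void filter_horiz_16_dual(uint8_t *s, int pitch,
                                 uint8_t limit0, uint8_t limit1) {
  filter_horiz_8(s, pitch, limit0);     /* left 8 pixels  */
  filter_horiz_8(s + 8, pitch, limit1); /* right 8 pixels */
}
```

SSE2 registers are 128 bits wide and therefore hold sixteen 8-bit pixels, which is why a 16-pixel-wide filter maps naturally onto them, while the NEON path in this commit covers the same width with two calls.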