shithub: libvpx

ref: f3c97ed32ef6a9419df3b68de895af75b70d6166
parent: d204c4bf017dc8313fc315f5c4da4492acd7641f
author: Johann <johannkoenig@google.com>
date: Wed May 24 07:52:42 EDT 2017

subpel variance neon: reduce stack usage

Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values, the first pass uses vld1_u32(),
which typically does require 4-byte alignment. However, because the
first pass operates on user-supplied values, we must handle unaligned
values anyway (and already do, see mem_neon.h).
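
One common way to read 4 bytes from a pointer that may be unaligned is
to go through memcpy(). The sketch below is portable C rather than NEON,
and the helper names are hypothetical; it is not the code in mem_neon.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the libvpx API: read 4 bytes from a pointer
 * that may not be 4-byte aligned. memcpy() lets the compiler emit a
 * load that is safe for any alignment, instead of dereferencing a
 * possibly misaligned uint32_t pointer. */
static uint32_t load_u32_unaligned(const uint8_t *buf) {
  uint32_t v;
  memcpy(&v, buf, sizeof(v));
  return v;
}

/* Gather two 4-pixel rows that are `stride` bytes apart, as a first
 * pass over caller-supplied pixels would have to. Neither row start is
 * assumed to be aligned. */
static void load_two_rows(const uint8_t *src, int stride, uint32_t out[2]) {
  out[0] = load_u32_unaligned(src);
  out[1] = load_u32_unaligned(src + stride);
}
```

With an odd start offset and an odd stride, both reads are unaligned
and still well defined.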

The local temporary values, however, have no stride, so the load uses
vld1_u8(), which does not require 4-byte alignment.

There are 3 temporary structures. In the C code, one is uint16_t. The
arm code saturates between passes but still passes the tests. If this
becomes an issue, new functions will be needed.
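
A rough check of when an 8-bit intermediate loses nothing. It assumes a
2-tap bilinear filter whose taps are non-negative and sum to 128, with a
rounding shift by 7. That filter shape is an assumption for this sketch,
and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed filter shape: f0 + f1 == 128, round by adding 64, shift by 7.
 * The result is kept in 16 bits, like the uint16_t C temporary. */
static uint16_t bilinear_tap(uint8_t a, uint8_t b, int f0, int f1) {
  return (uint16_t)((a * f0 + b * f1 + 64) >> 7);
}

/* Largest first-pass output over every 8-bit input pair and every tap
 * split f0 = 128 - f1 with f1 in [0, 128]. */
static int max_first_pass_output(void) {
  int max = 0;
  for (int f1 = 0; f1 <= 128; ++f1) {
    for (int a = 0; a < 256; ++a) {
      for (int b = 0; b < 256; ++b) {
        const int v = bilinear_tap((uint8_t)a, (uint8_t)b, 128 - f1, f1);
        if (v > max) max = v;
      }
    }
  }
  return max;
}
```

Under these assumptions the weighted sum is at most 255 * 128, so the
rounded result never exceeds 255 and narrowing it to 8 bits would not
clip. Taps with a different sum or with negative values would not have
this property.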

Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1