shithub: openh264


ref: b6c4a5447c6ec67daf2a394b7927ea15bedbc5f7
parent: 98042f1600790bf80937ca9bbc3458539d3e6ed8
author: Sindre Aamås <saamas@cisco.com>
date: Wed Mar 16 15:33:33 EDT 2016

[Decoder/x86] IDCT one block at a time with SSE2

At lower bitrates, it is faster overall to conditionally IDCT one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time for AVX2, which appears to be
consistently faster on Haswell.