[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Michael Niedermayer michaelni
Wed Feb 1 21:30:15 CET 2006


Hi

On Wed, Feb 01, 2006 at 01:56:21AM +0100, Paweł Sikora wrote:
> On Wednesday, 1 February 2006 01:39, Aurelien Jacobs wrote:
> > Paweł Sikora wrote:
> > > hmmm, the 4.1/4.0 fixedtranspose4x4 are equal but the benchmarks differ.
> > > maybe origtranspose4x4 has a different prologue?
> >
> > seems so.
> >
> > > [ 4.1 / -O2 ]
> > > origtranspose4x4:
> > >         leal    (%rdx,%rdx), %r9d
> > >         leal    (%rcx,%rcx), %eax
> > >         movslq  %edx,%r11
> > >         movslq  %ecx,%r8
> > >         movslq  %r9d,%r10
> > >         addl    %edx, %r9d
> > >         movslq  %eax,%rdx
> > >         addl    %ecx, %eax
> > >         movslq  %r9d,%r9
> > >         cltq
> >
> > [ 4.0 / -O2 ]
> > origtranspose4x4:
> >         leal    (%rdx,%rdx), %r8d
> >         movslq  %edx,%r10
> >         leaq    (%rcx,%rcx,2), %rax
> >         movslq  %r8d,%r9
> >         addl    %edx, %r8d
> >         movslq  %r8d,%r8
>
> yeah, the 4.1 gives worse code and my first benchmark can be sent
> to /dev/null. moreover the second fix (s/int/long/) simplifies the
> x86-64 prologue and gives a measurable gain.

maybe we should typedef int int64_t; on x86-64? arrays where space matters should be of the intXX_t type or similar anyway

opinions?

benchmarks?
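
To make the s/int/long/ point concrete, here is a minimal sketch, not code from FFmpeg. The function names sum_rows_int and sum_rows_long are hypothetical, and the sketch assumes an LP64 target such as x86-64 Linux, where long is 64 bits wide. With an int stride, the compiler typically has to sign-extend the 32-bit value to 64 bits (the movslq instructions in the prologues quoted above) before it can use it in an address. With a long stride the value is already pointer-sized.

```c
#include <stdint.h>

/* Hypothetical helper: sums the first byte of four rows.
 * The stride is a 32-bit int, so on x86-64 it usually needs
 * sign extension before being added to the 64-bit pointer. */
int sum_rows_int(const uint8_t *src, int stride)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += src[i * stride];
    return sum;
}

/* Same computation with a long stride. On LP64 targets long
 * matches the pointer width, so no widening is needed. */
int sum_rows_long(const uint8_t *src, long stride)
{
    int sum = 0;
    for (long i = 0; i < 4; i++)
        sum += src[i * stride];
    return sum;
}
```

Both variants return the same result for the same input. Any difference shows up only in the generated code, which can be inspected with gcc -O2 -S and will vary with compiler version, as the 4.0 and 4.1 listings above suggest.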

[...]

Michael


