Uros Bizjak - Re: generic and i386 bswap improvements

This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Hello Richard!

> (3) Implement bswapsi for 80386, which doesn't have the bswap instruction.
> For this we generate
>
>     xchgb  %ch, %cl
>     roll   $16, %ecx
>     xchgb  %ch, %cl

According to the Pentium optimization guide, this is a win only for the Pentium 4 (1.5 clk vs. 4 clk); other targets should use rolw $8, %cx (or rorw $8, %cx) instead of xchgb.
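As a quick sanity check of the quoted 80386 sequence, here is a small Python model. It is a sketch only: the function names are hypothetical, and %ecx is modeled as a plain 32-bit integer with %cl in bits 0-7 and %ch in bits 8-15. It applies the three instructions in order and compares the result with an ordinary 32-bit byte swap.

```python
# Model of the 80386 bswapsi sequence on a 32-bit value held in %ecx.
# %cl is bits 0-7 and %ch is bits 8-15 of %ecx.
MASK32 = 0xFFFFFFFF


def xchgb_ch_cl(ecx: int) -> int:
    """xchgb %ch, %cl: swap the two low bytes, keep the upper 16 bits."""
    cl = ecx & 0xFF
    ch = (ecx >> 8) & 0xFF
    return (ecx & 0xFFFF0000) | (cl << 8) | ch


def roll16(ecx: int) -> int:
    """roll $16, %ecx: rotate left by 16, i.e. swap the 16-bit halves."""
    return ((ecx << 16) | (ecx >> 16)) & MASK32


def bswapsi_i386(ecx: int) -> int:
    """xchgb %ch, %cl ; roll $16, %ecx ; xchgb %ch, %cl"""
    ecx = xchgb_ch_cl(ecx)
    ecx = roll16(ecx)
    return xchgb_ch_cl(ecx)


def bswap32_ref(x: int) -> int:
    """Reference 32-bit byte swap via int.to_bytes / int.from_bytes."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(hex(bswapsi_i386(0x11223344)))  # 0x44332211
```

Exchanging the low bytes, swapping the 16-bit halves, and exchanging the new low bytes again moves each of the four bytes to its mirrored position, which is exactly what bswap does.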

Perhaps we should generate rolw by default (it also works on registers outside the Q class, i.e. other than %eax, %ebx, %ecx and %edx, whose high bytes are not addressable) and split it after reload into xchgb when appropriate?
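Splitting a rotate into xchgb is valid because, on a 16-bit value, a rotate by 8 in either direction and an exchange of the high and low bytes are the same operation. A minimal Python sketch, assuming the hypothetical helpers below model rolw/rorw for counts 0 to 15:

```python
MASK16 = 0xFFFF


def rolw(x: int, n: int) -> int:
    """Model of rolw $n, %reg for a 16-bit value x and 0 <= n < 16."""
    return ((x << n) | (x >> (16 - n))) & MASK16


def rorw(x: int, n: int) -> int:
    """Model of rorw $n, %reg for a 16-bit value x and 0 <= n < 16."""
    return ((x >> n) | (x << (16 - n))) & MASK16


def bswaphi(x: int) -> int:
    """Swap the high and low bytes of a 16-bit value (xchgb %ch, %cl on %cx)."""
    return ((x & 0xFF) << 8) | (x >> 8)


print(hex(rolw(0x1234, 8)))  # 0x3412
```

Since the three forms agree on all 65536 inputs, the choice between them comes down to encoding size, register constraints and per-CPU latency.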

Attached to this message, please find a patch (diffed against a mainline from a couple of days ago!) that implements the second part of the above suggestion. Due to the granularity of rdtsc, I was not able to measure any runtime difference on the Pentium 4, but it is clearly a code size win (xchgb %ch, %cl encodes in 2 bytes, rolw $8, %cx in 4).

2007-02-14  Uros Bizjak  <ubizjak@gmail.com>

   * config/i386/i386.h (x86_use_xchgb): New.
   (TARGET_USE_XCHGB): New macro.
   * config/i386/i386.c (x86_use_xchgb): Set for PENT4.
   * config/i386/i386.md (*rotlhi3_1, *rotrhi3_1): For TARGET_USE_XCHGB
   or when optimizing for size, split into bswaphi after reload for
   shifts of 8.
   (*bswaphi): New insn pattern.
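The split condition in the ChangeLog can be summarized as a small decision function. This is a hypothetical Python model, not GCC code: `emit_hi_rotate`, `Q_REGS_16` and the `use_xchgb`/`optimize_size` flags are illustrative stand-ins for the machine-description split, TARGET_USE_XCHGB and optimizing for size. The register-class check reflects that only %ax, %bx, %cx and %dx have addressable high and low bytes.

```python
# 16-bit registers whose high and low bytes are addressable (the Q class).
Q_REGS_16 = {
    "ax": ("ah", "al"),
    "bx": ("bh", "bl"),
    "cx": ("ch", "cl"),
    "dx": ("dh", "dl"),
}


def emit_hi_rotate(reg: str, count: int, use_xchgb: bool, optimize_size: bool) -> str:
    """Pick the instruction for a 16-bit rotate-left of `reg` by `count`.

    Hypothetical model of the post-reload split: a rotate by 8 becomes
    xchgb when tuning asks for it (use_xchgb) or when optimizing for size,
    provided the register has addressable high/low bytes.
    """
    if count == 8 and (use_xchgb or optimize_size) and reg in Q_REGS_16:
        hi, lo = Q_REGS_16[reg]
        return f"xchgb %{hi}, %{lo}"
    return f"rolw ${count}, %{reg}"


print(emit_hi_rotate("cx", 8, use_xchgb=True, optimize_size=False))  # xchgb %ch, %cl
```

Keeping rolw as the default leaves the register allocator free to use any 16-bit register; the xchgb form is only picked once the register is known and happens to be in the Q class.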

Uros.

Attachment: i386-xchgb.diff
Description: Binary data

