Paolo Carlini - Re: Call for compiler help/advice: atomic builtins for v3

This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

From: Paolo Carlini
To: Mark Mitchell
Cc: gcc at gcc dot gnu dot org, libstdc++ at gcc dot gnu dot org, rth at redhat dot com, Ian Lance Taylor
Date: Sun, 06 Nov 2005 19:56:04 +0100
Subject: Re: Call for compiler help/advice: atomic builtins for v3
References: <436DDC36.8070308@suse.de> <436E4DF0.3070004@codesourcery.com>

Hi Mark,

> I think this is a somewhat difficult problem because of the tension between performance and functionality. In particular, as you say, the code sequence you want to use varies by CPU.
>
> I don't think I have good answers; this email is just me musing out loud.
>
> You probably don't want to inline the assembly code equivalent of:
>
> if (cpu == i386) ... else if (cpu == i486) ... else if (cpu == i586) ... ...
>
> On the other hand, if you inline, say, the i486 variant, and then run on an i686, you may not get very good performance.
>
> So, the important thing is to weigh the cost of a function call plus run-time conditionals (when using a libgcc routine that would contain support for all the CPUs) against the benefit of getting the fastest code sequences on the current processors.

Actually, the situation is not that bad, as far as I can see: the worst case is i386 vs i486+, and Old-Sparc vs New-Sparc. More generally, a target either cannot implement the builtin at all (leaving a trivial fallback using locks, or no MT support at all) or can implement it in no more than one non-trivial way. Then libgcc would contain at most two versions: the trivial one, and another piece of assembly, identical in principle to what the builtin is expanded to when the inline version is actually desired.
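To make the "at most two versions" idea concrete, here is a rough sketch of what such an out-of-line routine could look like. Everything in it is an assumption for illustration: the names lib_fetch_add, fetch_add_locked, fetch_add_native and cpu_has_native_atomics are hypothetical, the detection function is a stub, and __sync_fetch_and_add merely stands in for the target-specific assembly.

```cpp
#include <cassert>
#include <mutex>

// All names here are hypothetical; none is an actual libgcc entry point.
using fetch_add_fn = int (*)(volatile int*, int);

// Version 1: the trivial fallback, serialised by a single global lock.
static int fetch_add_locked(volatile int* p, int v) {
    static std::mutex m;
    std::lock_guard<std::mutex> guard(m);
    int old = *p;
    *p = old + v;
    return old;
}

// Version 2: the one non-trivial sequence the target offers. The GCC
// builtin __sync_fetch_and_add stands in for that piece of assembly.
static int fetch_add_native(volatile int* p, int v) {
    return __sync_fetch_and_add(p, v);
}

// Stub for real CPU detection (assumption: the answer is fixed per process).
static bool cpu_has_native_atomics() { return true; }

static fetch_add_fn select_fetch_add() {
    return cpu_has_native_atomics() ? fetch_add_native : fetch_add_locked;
}

// The out-of-line entry point: the choice is made once, then every call
// pays one indirect call instead of a chain of per-CPU conditionals.
int lib_fetch_add(volatile int* p, int v) {
    static const fetch_add_fn impl = select_fetch_add();
    return impl(p, v);
}

// Small usage example.
void example_use() {
    int refcount = 1;
    int before = lib_fetch_add(&refcount, 1);
    assert(before == 1 && refcount == 2);
}
```

Note that the two versions must never be mixed on the same variable within one process, since the locked version gives no protection against a concurrent native update.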

> And in a workstation distribution you may be concerned about supporting multiple CPUs; if you're building for a specific hardware board, then you only care about the CPU actually on that board.
>
> What do you propose that the libgcc routine do for a CPU that cannot support the builtin at all? Just do a trivial implementation that is safe only for a single-CPU, single-threaded system?

Either that or a very low performance one, using locks. The issue is still open, but we can resolve it rather easily, I think.
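For what it is worth, a lock-based fallback could be as simple as the following sketch. This is only one possible shape, under my own assumptions: the namespace, the function names and the small table of locks indexed by address are all hypothetical, and a single global lock would do just as well at even lower performance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

namespace fallback {  // hypothetical namespace, for illustration only

constexpr std::size_t kLocks = 16;

// Pick a lock from a small fixed table based on the object's address, so
// that unrelated variables do not always contend on the same mutex.
inline std::mutex& lock_for(const volatile void* addr) {
    static std::mutex table[kLocks];
    auto bits = reinterpret_cast<std::uintptr_t>(addr);
    return table[(bits >> 4) % kLocks];
}

// Compare-and-swap emulated under the lock; returns true if it stored.
inline bool compare_and_swap(volatile int* p, int expected, int desired) {
    std::lock_guard<std::mutex> guard(lock_for(p));
    if (*p != expected) return false;
    *p = desired;
    return true;
}

// Fetch-and-add emulated under the same lock; returns the old value.
inline int fetch_add(volatile int* p, int v) {
    std::lock_guard<std::mutex> guard(lock_for(p));
    int old = *p;
    *p = old + v;
    return old;
}

}  // namespace fallback

// Small usage example.
void fallback_example() {
    int value = 1;
    bool stored = fallback::compare_and_swap(&value, 1, 7);
    assert(stored && value == 7);
}
```

This is safe only as long as every access to the variable goes through these routines; a plain store elsewhere bypasses the lock.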

> I think that to satisfy everyone, you may need a configure option to decide between inlining support for a particular processor (for maximum performance when you know the target performance) and making a library call (when you don't).

Yes, let's consider for simplicity the obnoxious i686: if the user doesn't pass any -march, then libgcc is used, which picks either the fallback using locks or, if the specific target (i486+) supports it, the non-trivial implementation; if the user passes -march=i486 or later, then the builtin is expanded inline by the compiler, with no use of libgcc at all. Similarly for Sparc.
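From the library side, the -march distinction could be observed roughly as in the sketch below. It relies on the predefined macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, which GCC releases later than this message define when the selected target supports a native 4-byte compare-and-swap; the names fetch_add and kExpandedInline are hypothetical, and the lock-based branch is only a stand-in for the out-of-line libgcc call.

```cpp
#include <cassert>
#include <mutex>

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
// The selected -march supports the operation: the builtin is expanded
// inline and no library routine is involved.
constexpr bool kExpandedInline = true;
inline int fetch_add(volatile int* p, int v) {
    return __sync_fetch_and_add(p, v);
}
#else
// No native support for the selected target. In the scheme discussed
// here this would be a call into libgcc; a local lock stands in for it.
constexpr bool kExpandedInline = false;
inline int fetch_add(volatile int* p, int v) {
    static std::mutex m;
    std::lock_guard<std::mutex> guard(m);
    int old = *p;
    *p = old + v;
    return old;
}
#endif

// Small usage example: callers are written the same way in both cases.
void march_example() {
    int refs = 1;
    int before = fetch_add(&refs, 1);
    assert(before == 1 && refs == 2);
}
```

Callers do not change between the two configurations; only the cost of the operation does.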

Paolo.

Follow-Ups:
- Re: Call for compiler help/advice: atomic builtins for v3
  * From: Mark Mitchell
- Re: Call for compiler help/advice: atomic builtins for v3
  * From: Florian Weimer
References:
- Call for compiler help/advice: atomic builtins for v3
  * From: Paolo Carlini
- Re: Call for compiler help/advice: atomic builtins for v3
  * From: Mark Mitchell
