Jan Hubicka - Re: [PATCH, x86_64]: Fix PR target/30778 (original) (raw)
This is the mail archive of the gcc-patches@gcc.gnu.orgmailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
- From: Jan Hubicka
- To: Uros Bizjak
- Cc: GCC Patches , Jan Hubicka
- Date: Sun, 25 Feb 2007 18:44:48 +0100
- Subject: Re: [PATCH, x86_64]: Fix PR target/30778
- References: <45E06E47.7000503@gmail.com>
Hello!
This patch fixes a failure, described in PR target/30778. The problem was, that new stringop code expanded copy loop even when expected size was less than minimum size that loop can handle. Copy loop was still generated, but memory was again set in stringop prologue expansion.
The patch fixes the case of zero iterations, where we exit from the function early. Also, it implements a small optimization for one iteration. Compare and jump is not needed in this case.
Unfortunatelly, this fix is not enough to fix pr target/30770, where BOOT_CFLAGS="-O2 -mtune=nocona". Nocona and k8 are heavy users of expand_set_or_movmem_via_loop() for small structures, so it is still something wrong there.
The patch was bootstrapped (defalt BOOT_CFLAGS) on x86_64-linux-gnu and regression tested for all default languages. It still fails for BOOT_CFLAGS="-O2 -mtune=nocona", but it is IMO a step in right direction. OK for mainline?
2007-02-24 Uros Bizjak ubizjak@gmail.com
PR target/30778 * config/i386/i386.md (expand_set_or_movmem_via_loop): Return if GET_MODE_SIZE (mode) * unroll is less than expected_size. Do not emit compare and jump if number of iterations is less than 2.
Hi, as I've commented already on the PR audit trail, the problem with this patch is that expected_size is really just the expected size and not actual size when profile feedback is available. Disabling the code to emit loop is wrong then, since the actual size might be bigger even if the average is small.
The code is supposed to emit unwound version of memcpy, that for the testcase in question with static size of 31 bytes is pretty meaningless. Fortunately RTL constant propagation cleans it up easilly and I believe that instead of adding special cases to this code (that is complex already) we should teach move_by_pieces and set_by_pieces to deal with it.
The reason why wrong code is generated is that for constant operand I emit from the basic form:
if (size > minimum) do alignment prologue do main copying loop do epilogue copying up to minimum
I emit the first conditional. This was because the code got confused by VOIDmode CONST_INT and I conviced myself that it won't be hit since move_by_pieces will do the job, that has turned out to be wrong.
The attached patch cleans up the code to figure out what mode the copying loop should operate in and makes expander to always do the conditional. THis should avoid the wrong code cases and I will try to look into if the move_by_pieces heuristic can be improved.
I've bootstrapped/regtested it nocona-linux and k8-linux with earlier incarnation of the patch and comitted. I apologize for very slow reaction to this problem, I will try to adress mine problems in bugzilla next week with a priority as there is quite a bit cummulated.
I would also like to thank Uros and Andrew for analyzing this boostrap miscompare
Honza
Index: ChangeLog
--- ChangeLog (revision 122312) +++ ChangeLog (working copy) @@ -1,3 +1,13 @@ +2007-02-24 Uros Bizjak ubizjak@gmail.com + Jan Hubicka jh@suse.cz + + PR target/30778 + * i386.c (counter_mode): New function. + (expand_set_or_movmem_via_loop): Use it. + (expand_movmem_epilogue): Likewise; fix pasto. + (ix86_expand_movmem): Do emit guard even for constant counts. + (ix86_expand_setmem): Likewise. + 2007-02-25 Nick Clifton nickc@redhat.com * config/frv/frv.h (ASM_OUTPUT_CASE_LABEL): Delete. Index: testsuite/gcc.c-torture/execute/pr30778.c
--- testsuite/gcc.c-torture/execute/pr30778.c (revision 0) +++ testsuite/gcc.c-torture/execute/pr30778.c (revision 0) @@ -0,0 +1,34 @@ +extern void *memset (void *, int, unsigned long); +extern void abort (void); + +struct reg_stat { + void *last_death; + void *last_set; + void *last_set_value; + int last_set_label; + char last_set_sign_bit_copies; + int last_set_mode : 8; + char last_set_invalid; + char sign_bit_copies; + long nonzero_bits; +}; + +static struct reg_stat *reg_stat; + +void attribute((noinline)) +init_reg_last (void) +{ + memset (reg_stat, 0, __builtin_offsetof (struct reg_stat, sign_bit_copies)); +} + +int main (void) +{ + struct reg_stat r; + + reg_stat = &r; + r.nonzero_bits = -1; + init_reg_last (); + if (r.nonzero_bits != -1) + abort (); + return 0; +} Index: testsuite/ChangeLog
--- testsuite/ChangeLog (revision 122312) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2007-02-24 Jan Hubicka jh@suse.cz + + * gcc.c-torture/execute/pr30778.c: New testcase. + 2007-02-24 Jerry DeLisle jvdelisle@gcc.gnu.org PR libgfortran/30918 Index: config/i386/i386.c
--- config/i386/i386.c (revision 122312) +++ config/i386/i386.c (working copy) @@ -13314,6 +13314,21 @@ scale_counter (rtx countreg, int scale) return sc; } +/* Return mode for the memcpy/memset loop counter. Preffer SImode over DImode + for constant loop counts. / + +static enum machine_mode +counter_mode (rtx count_exp) +{ + if (GET_MODE (count_exp) != VOIDmode) + return GET_MODE (count_exp); + if (GET_CODE (count_exp) != CONST_INT) + return Pmode; + if (TARGET_64BIT && (INTVAL (count_exp) & ~0xffffffff)) + return DImode; + return SImode; +} + / When SRCPTR is non-NULL, output simple loop to move memory pointer to SRCPTR to DESTPTR via chunks of MODE unrolled UNROLL times, overall size is COUNT specified in bytes. When SRCPTR is NULL, output the @@ -13330,7 +13345,7 @@ expand_set_or_movmem_via_loop (rtx destm int expected_size) { rtx out_label, top_label, iter, tmp; - enum machine_mode iter_mode; + enum machine_mode iter_mode = counter_mode (count); rtx piece_size = GEN_INT (GET_MODE_SIZE (mode) * unroll); rtx piece_size_mask = GEN_INT (~((GET_MODE_SIZE (mode) * unroll) - 1)); rtx size; @@ -13338,10 +13353,6 @@ expand_set_or_movmem_via_loop (rtx destm rtx y_addr; int i; - iter_mode = GET_MODE (count); - if (iter_mode == VOIDmode) - iter_mode = word_mode;
top_label = gen_label_rtx (); out_label = gen_label_rtx (); iter = gen_reg_rtx (iter_mode); @@ -13555,8 +13566,8 @@ expand_movmem_epilogue (rtx destmem, rtx emit_strmov (destmem, srcmem, destptr, srcptr, DImode, offset); else { - emit_strmov (destmem, srcmem, destptr, srcptr, DImode, offset); - emit_strmov (destmem, srcmem, destptr, srcptr, DImode, offset + 4); + emit_strmov (destmem, srcmem, destptr, srcptr, SImode, offset); + emit_strmov (destmem, srcmem, destptr, srcptr, SImode, offset + 4); } offset += 8; } @@ -13673,8 +13684,8 @@ expand_setmem_epilogue_via_loop (rtx des rtx count, int max_size) { count = - expand_simple_binop (GET_MODE (count), AND, count, GEN_INT (max_size - 1), - count, 1, OPTAB_DIRECT); + expand_simple_binop (counter_mode (count), AND, count, + GEN_INT (max_size - 1), count, 1, OPTAB_DIRECT); expand_set_or_movmem_via_loop (destmem, NULL, destptr, NULL, gen_lowpart (QImode, value), count, QImode, 1, max_size / 2); @@ -14134,11 +14145,9 @@ ix86_expand_movmem (rtx dst, rtx src, rt gcc_assert (desired_align >= 1 && align >= 1); /* Ensure that alignment prologue won't copy past end of block. */ - if ((size_needed > 1 || (desired_align > 1 && desired_align > align)) - && !count) + if (size_needed > 1 || (desired_align > 1 && desired_align > align)) { epilogue_size_needed = MAX (size_needed - 1, desired_align - align);
/* Epilogue always copies COUNT_EXP & EPILOGUE_SIZE_NEEDED bytes.
Make sure it is power of 2. */
epilogue_size_needed = smallest_pow2_greater_than (epilogue_size_needed);
@@ -14146,8 +14155,10 @@ ix86_expand_movmem (rtx dst, rtx src, rt label = gen_label_rtx (); emit_cmp_and_jump_insns (count_exp, GEN_INT (epilogue_size_needed), - LTU, 0, GET_MODE (count_exp), 1, label); - if (expected_size == -1 || expected_size < epilogue_size_needed) + LTU, 0, counter_mode (count_exp), 1, label); + if (GET_CODE (count_exp) == CONST_INT) + ; + else if (expected_size == -1 || expected_size < epilogue_size_needed) predict_jump (REG_BR_PROB_BASE * 60 / 100); else predict_jump (REG_BR_PROB_BASE * 20 / 100); @@ -14247,7 +14258,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt if (size_needed < epilogue_size_needed) { tmp = - expand_simple_binop (GET_MODE (count_exp), AND, count_exp, + expand_simple_binop (counter_mode (count_exp), AND, count_exp, GEN_INT (size_needed - 1), count_exp, 1, OPTAB_DIRECT); if (tmp != count_exp) @@ -14403,7 +14414,7 @@ ix86_expand_setmem (rtx dst, rtx count_e return 0; gcc_assert (alg != no_stringop); if (!count) - count_exp = copy_to_mode_reg (GET_MODE (count_exp), count_exp); + count_exp = copy_to_mode_reg (counter_mode (count_exp), count_exp); destreg = copy_to_mode_reg (Pmode, XEXP (dst, 0)); switch (alg) { @@ -14446,11 +14457,9 @@ ix86_expand_setmem (rtx dst, rtx count_e promoted_val = promote_duplicated_reg_to_size (val_exp, size_needed, desired_align, align); /* Ensure that alignment prologue won't copy past end of block. */ - if ((size_needed > 1 || (desired_align > 1 && desired_align > align)) - && !count) + if (size_needed > 1 || (desired_align > 1 && desired_align > align)) { epilogue_size_needed = MAX (size_needed - 1, desired_align - align);
/* Epilogue always copies COUNT_EXP & EPILOGUE_SIZE_NEEDED bytes.
Make sure it is power of 2. */
epilogue_size_needed = smallest_pow2_greater_than (epilogue_size_needed);
@@ -14464,8 +14473,10 @@ ix86_expand_setmem (rtx dst, rtx count_e label = gen_label_rtx (); emit_cmp_and_jump_insns (count_exp, GEN_INT (epilogue_size_needed), - LTU, 0, GET_MODE (count_exp), 1, label);
if (expected_size == -1 || expected_size <= epilogue_size_needed)
LTU, 0, counter_mode (count_exp), 1, label);
if (GET_CODE (count_exp) == CONST_INT)
- ;
predict_jump (REG_BR_PROB_BASE * 60 / 100); else predict_jump (REG_BR_PROB_BASE * 20 / 100); @@ -14475,7 +14486,7 @@ ix86_expand_setmem (rtx dst, rtx count_e rtx hot_label = gen_label_rtx (); jump_around_label = gen_label_rtx (); emit_cmp_and_jump_insns (count_exp, GEN_INT (dynamic_check - 1),else if (expected_size == -1 || expected_size <= epilogue_size_needed)
LEU, 0, GET_MODE (count_exp), 1, hot_label);
LEU, 0, counter_mode (count_exp), 1, hot_label); predict_jump (REG_BR_PROB_BASE * 90 / 100); set_storage_via_libcall (dst, count_exp, val_exp, false); emit_jump (jump_around_label);
@@ -14558,7 +14569,7 @@ ix86_expand_setmem (rtx dst, rtx count_e if (size_needed < desired_align - align) { tmp = - expand_simple_binop (GET_MODE (count_exp), AND, count_exp, + expand_simple_binop (counter_mode (count_exp), AND, count_exp, GEN_INT (size_needed - 1), count_exp, 1, OPTAB_DIRECT); size_needed = desired_align - align + 1;
- References:
- [PATCH, x86_64]: Fix PR target/30778
* From: Uros Bizjak
- [PATCH, x86_64]: Fix PR target/30778
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |