bad double for loop optimization · Issue #8261 · esp8266/Arduino (original) (raw)

Basic Infos

Platform

Settings in IDE

Problem Description

With the newer "xtensa-lx106-elf-gcc (GCC) 10.3.0" I have had problems from time to time, that I have dismissed as being an error in my use of Extended ASM. It now appears that may not have been completely true. The MCVE below builds and executes correctly with -O0 or the old "xtensa-lx106-elf-gcc (GCC) 4.8.2" compiler with -O0 or -Os.
However, for the case of "xtensa-lx106-elf-gcc (GCC) 10.3.0" and -Os it generates bad code that does not properly update the array as it executes a double for loop.

This MCVE is an extreme minimization of the original bearssl experiment that was failing for me. There is always a chance I minimized out the failure case and added my own bug. However, I think I got everything cleaned up.

MCVE Sketch

uint16_t x[] = {1, 2, 3};

extern "C" void test(uint16_t *x);

void setup() { Serial.begin(115200); delay(200); Serial.println("\r\nTesting 1, 2, 3");

test(x);

for (size_t i = 0; i < sizeof(x)/sizeof(uint16_t); i++) ets_uart_printf("addr(%p), x%u\n", &x[i], i, x[i]); }

void loop() { }

Avoid the compiler defeating our test and optimizing to an answer by place this code with sketch In a separate .c file:

#include <Arduino.h> #include <pgmspace.h> // make sure dependencies catch debug edits to file

#if 1 // Used this one to get a simplified ASM for inspection #undef pgm_read_word #define pgm_read_word get_uint16

static inline attribute((always_inline)) uint16_t get_uint16(const uint16_t *p16) { uint32_t val; asm volatile ("l16ui %[val], %[base], 0;" :[val]"=r"(val) :[base]"r"(p16) :); return val; }

#elif 0 // Extended ASM is not an issue, this will also fail. #undef pgm_read_word #define pgm_read_word get_uint16

static inline attribute((always_inline)) uint16_t get_uint16(const uint16_t p16) { uint32_t val = ((uint32_t *)((uintptr_t)p16 & ~0x3u)); uint32_t pos = ((uintptr_t)p16 & 0x3u) * 8u; val >>= pos; return (uint16_t)val; }

#else // Use PROGMEM, the original context the failure was observed with. #endif

// #pragma GCC optimize("O0") // Works #pragma GCC optimize("Os") // Fails void test(uint16_t *x) { size_t len = 3; for (size_t u = 0; u < len; u++) { uint16_t x1 = pgm_read_word(&x[0]); for (size_t v = 0; v < len; v++) { x[v] = pgm_read_word(&x[v]) + x1; } } }

Debug Messages

Sketch output when working will look something like this:

  Testing 1, 2, 3
  addr(0x3ffe85c8), x[0](0x0008)
  addr(0x3ffe85ca), x[1](0x0009)
  addr(0x3ffe85cc), x[2](0x000A)

Sketch failing output will look something like this (-Os):

  Testing 1, 2, 3
  addr(0x3ffe85c8), x[0](0x0002)
  addr(0x3ffe85ca), x[1](0x0003)
  addr(0x3ffe85cc), x[2](0x0004)

Working code generated under Arduino ESP8266 Core 2.7.4
xtensa-lx106-elf-gcc (GCC) 4.8.2 with -Os option

Disassembly of section .text.test:

00000000 <test>:
   0:	340c                	movi.n	a4, 3
   2:	001272               	l16ui	a7, a2, 0
   5:	030c                	movi.n	a3, 0
   7:	f47070               	extui	a7, a7, 0, 16
   a:	523a                	add.n	a5, a2, a3
   c:	001562               	l16ui	a6, a5, 0
   f:	676a                	add.n	a6, a7, a6
  11:	005562               	s16i	a6, a5, 0
  14:	332b                	addi.n	a3, a3, 2
  16:	f06366               	bnei	a3, 6, a <test+0xa>
  19:	440b                	addi.n	a4, a4, -1
  1b:	fe3456               	bnez	a4, 2 <test+0x2>
  1e:	f00d                	ret.n

Failing code generated under Arduino ESP8266 Core current master
xtensa-lx106-elf-gcc (GCC) 10.3.0 with -Os option

Disassembly of section .text.test:

00000000 <test>:
   0:	822b                	addi.n	a8, a2, 2
   2:	04c272               	addi	a7, a2, 4
   5:	03a062               	movi	a6, 3
   8:	001232               	l16ui	a3, a2, 0
   b:	f43030               	extui	a3, a3, 0, 16
   e:	001252               	l16ui	a5, a2, 0
  11:	535a                	add.n	a5, a3, a5
  13:	f45050               	extui	a5, a5, 0, 16
  16:	001842               	l16ui	a4, a8, 0
  19:	434a                	add.n	a4, a3, a4
  1b:	f44040               	extui	a4, a4, 0, 16
  1e:	001792               	l16ui	a9, a7, 0
  21:	339a                	add.n	a3, a3, a9
  23:	660b                	addi.n	a6, a6, -1
  25:	f43030               	extui	a3, a3, 0, 16
  28:	fdc656               	bnez	a6, 8 <test+0x8>
  2b:	005252               	s16i	a5, a2, 0
  2e:	015242               	s16i	a4, a2, 2
  31:	025232               	s16i	a3, a2, 4
  34:	f00d                	ret.n