bad double for loop optimization · Issue #8261 · esp8266/Arduino (original) (raw)
Basic Infos
- This issue complies with the issue POLICY doc.
- I have read the documentation at readthedocs and the issue is not addressed there.
- I have tested that the issue is present in current master branch (aka latest git).
- I have searched the issue tracker for a similar issue.
- If there is a stack dump, I have decoded it.
- I have filled out all fields below.
Platform
- Hardware: [ESP-12]
- Core Version: [5f04fbb]
- Development Env: [Arduino IDE]
- Operating System: [Ubuntu]
Settings in IDE
- Module: [Nodemcu]
- Flash Mode: [dio]
- Flash Size: [4MB]
- lwip Variant: [v2 Higher Bandwidth]
- Reset Method: [nodemcu]
- Flash Frequency: [40Mhz]
- CPU Frequency: [80Mhz]
- Upload Using: [SERIAL]
- Upload Speed: [460800] (serial upload only)
Problem Description
With the newer "xtensa-lx106-elf-gcc (GCC) 10.3.0" I have had problems from time to time, that I have dismissed as being an error in my use of Extended ASM. It now appears that may not have been completely true. The MCVE below builds and executes correctly with -O0
or the old "xtensa-lx106-elf-gcc (GCC) 4.8.2" compiler with -O0
or -Os
.
However, for the case of "xtensa-lx106-elf-gcc (GCC) 10.3.0" and -Os
it generates bad code that does not properly update the array as it executes a double for loop.
This MCVE is an extreme minimization of the original bearssl
experiment that was failing for me. There is always a chance I minimized out the failure case and added my own bug. However, I think I got everything cleaned up.
MCVE Sketch
uint16_t x[] = {1, 2, 3};
extern "C" void test(uint16_t *x);
void setup() { Serial.begin(115200); delay(200); Serial.println("\r\nTesting 1, 2, 3");
test(x);
for (size_t i = 0; i < sizeof(x)/sizeof(uint16_t); i++) ets_uart_printf("addr(%p), x%u\n", &x[i], i, x[i]); }
void loop() { }
Avoid the compiler defeating our test and optimizing to an answer by place this code with sketch In a separate .c
file:
#include <Arduino.h> #include <pgmspace.h> // make sure dependencies catch debug edits to file
#if 1 // Used this one to get a simplified ASM for inspection #undef pgm_read_word #define pgm_read_word get_uint16
static inline attribute((always_inline)) uint16_t get_uint16(const uint16_t *p16) { uint32_t val; asm volatile ("l16ui %[val], %[base], 0;" :[val]"=r"(val) :[base]"r"(p16) :); return val; }
#elif 0 // Extended ASM is not an issue, this will also fail. #undef pgm_read_word #define pgm_read_word get_uint16
static inline attribute((always_inline)) uint16_t get_uint16(const uint16_t p16) { uint32_t val = ((uint32_t *)((uintptr_t)p16 & ~0x3u)); uint32_t pos = ((uintptr_t)p16 & 0x3u) * 8u; val >>= pos; return (uint16_t)val; }
#else // Use PROGMEM, the original context the failure was observed with. #endif
// #pragma GCC optimize("O0") // Works #pragma GCC optimize("Os") // Fails void test(uint16_t *x) { size_t len = 3; for (size_t u = 0; u < len; u++) { uint16_t x1 = pgm_read_word(&x[0]); for (size_t v = 0; v < len; v++) { x[v] = pgm_read_word(&x[v]) + x1; } } }
Debug Messages
Sketch output when working will look something like this:
Testing 1, 2, 3
addr(0x3ffe85c8), x[0](0x0008)
addr(0x3ffe85ca), x[1](0x0009)
addr(0x3ffe85cc), x[2](0x000A)
Sketch failing output will look something like this (-Os
):
Testing 1, 2, 3
addr(0x3ffe85c8), x[0](0x0002)
addr(0x3ffe85ca), x[1](0x0003)
addr(0x3ffe85cc), x[2](0x0004)
Working code generated under Arduino ESP8266 Core 2.7.4
xtensa-lx106-elf-gcc (GCC) 4.8.2 with -Os
option
Disassembly of section .text.test:
00000000 <test>:
0: 340c movi.n a4, 3
2: 001272 l16ui a7, a2, 0
5: 030c movi.n a3, 0
7: f47070 extui a7, a7, 0, 16
a: 523a add.n a5, a2, a3
c: 001562 l16ui a6, a5, 0
f: 676a add.n a6, a7, a6
11: 005562 s16i a6, a5, 0
14: 332b addi.n a3, a3, 2
16: f06366 bnei a3, 6, a <test+0xa>
19: 440b addi.n a4, a4, -1
1b: fe3456 bnez a4, 2 <test+0x2>
1e: f00d ret.n
Failing code generated under Arduino ESP8266 Core current master
xtensa-lx106-elf-gcc (GCC) 10.3.0 with -Os
option
Disassembly of section .text.test:
00000000 <test>:
0: 822b addi.n a8, a2, 2
2: 04c272 addi a7, a2, 4
5: 03a062 movi a6, 3
8: 001232 l16ui a3, a2, 0
b: f43030 extui a3, a3, 0, 16
e: 001252 l16ui a5, a2, 0
11: 535a add.n a5, a3, a5
13: f45050 extui a5, a5, 0, 16
16: 001842 l16ui a4, a8, 0
19: 434a add.n a4, a3, a4
1b: f44040 extui a4, a4, 0, 16
1e: 001792 l16ui a9, a7, 0
21: 339a add.n a3, a3, a9
23: 660b addi.n a6, a6, -1
25: f43030 extui a3, a3, 0, 16
28: fdc656 bnez a6, 8 <test+0x8>
2b: 005252 s16i a5, a2, 0
2e: 015242 s16i a4, a2, 2
31: 025232 s16i a3, a2, 4
34: f00d ret.n