Optimizing object access in C

November 28, 2025, 6:22pm

Hello everyone!
I’m using the latest version of the Clang compiler for the ARM architecture.
The microcontroller I’m compiling for can handle unaligned memory access.
Stripped of unnecessary details, my code does the following:

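#include <string.h>  /* memcpy() */

/* DMA_BUFFER_READ (a DMA register address) and watch_value_in_debugger()
   are defined elsewhere; those details are omitted here. */

static unsigned char buffer[32];

static void ReadDataWithDMA(unsigned char *ptr) {
  /* Hand the buffer's address to the DMA controller. */
  *(volatile unsigned int *)DMA_BUFFER_READ = (unsigned int)ptr;
}

static void Test(void) {
  ReadDataWithDMA(buffer);

  int x;
  memcpy(&x, &buffer[1], sizeof(x));  /* unaligned 4-byte read */

  watch_value_in_debugger(x);
}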

The contents of all functions are visible to the compiler.

I intentionally use memcpy() to avoid the undefined behavior that a direct cast such as *(int *)&buffer[1] would cause under C’s aliasing rules.

It also leaves the compiler free to choose how it copies the data from the buffer.
In particular, it can use a single unaligned load instruction instead of a byte-by-byte copy.
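For illustration, the memcpy() idiom can be wrapped in a small helper. This is a minimal sketch; the name load_u32 is hypothetical and not part of the original code. Whether it actually becomes a single load depends on the target and optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value starting at any byte address.
   Unlike *(const uint32_t *)p, it has no alignment requirement and does not
   access the bytes through an incompatible lvalue type, so it avoids the
   undefined behavior of the pointer cast. On a target that permits
   unaligned access, an optimizing compiler will typically lower the
   memcpy() to a single load instruction. */
static inline uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Usage mirrors the original Test(): load_u32(&buffer[1]) returns the four bytes starting at offset 1, in the target's native byte order.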

With -O3, and also with the IDE-specific settings (-Os, -Obalanced, -Ofast, -Osize), I see that memcpy() does indeed compile to a load instruction from the buffer.

If I change the line in the ReadDataWithDMA() function to:

*(volatile unsigned int *)DMA_BUFFER_READ = ptr[0]; // or simply (void)*ptr;

then the memcpy() is compiled into a plain assignment of 0 to x.
That is, the compiler concludes that ReadDataWithDMA() doesn’t change the buffer’s contents, so the buffer still holds its initial value (all zeros, since it has static storage duration).

I’d really like to know the underlying cause of such a radically different result.
How does writing the value of ptr to a specific memory location in ReadDataWithDMA() make the compiler assume that the buffer’s contents might change?
Is this behavior defined by the C standard, or by the Clang implementation? Or is it just a coincidence?

If this behavior is guaranteed by the standard or by Clang, that’s very good news for me: there’s no need to declare buffer as volatile or to access it through a volatile-qualified type at the read site. A plain memcpy() can then use any memory access instruction, and I expect it will most likely compile to a single 4-byte load rather than a series of byte accesses.
If this behavior is purely coincidental, that’s unfortunate, because declaring the buffer volatile, or accessing it through a volatile-qualified type (volatile unsigned char *), makes the compiler generate a suboptimal series of byte loads.
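A sketch of why the volatile route degrades to byte loads, assuming the read is done through a volatile unsigned char pointer as described above (the name load_u32_volatile is hypothetical): memcpy() cannot take a pointer to volatile as its source, so the bytes have to be read one at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads 4 bytes through a volatile-qualified pointer.
   memcpy() would discard the volatile qualifier, so each byte is read
   individually. Every p[i] is a separate volatile access that must be
   performed as written; in practice compilers emit these as individual
   byte loads instead of merging them into one word load. */
static uint32_t load_u32_volatile(const volatile unsigned char *p)
{
    unsigned char tmp[sizeof(uint32_t)];
    for (size_t i = 0; i < sizeof tmp; i++)
        tmp[i] = p[i];              /* one volatile byte access per iteration */

    uint32_t v;
    memcpy(&v, tmp, sizeof v);      /* tmp is not volatile, so memcpy is fine */
    return v;
}
```

The result is the same value a plain memcpy() from the buffer would produce; only the generated code differs.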

A related question: how can I answer questions like this on my own? Maybe this is implementation-defined behavior (in C standard terms), and I need to look it up in the compiler documentation?

zsrkmyn December 1, 2025, 3:34am

Note that (unsigned int)ptr and ptr[0] are different.

The former ‘captures’ the pointer to a global variable (the address escapes), so the variable may be accessed from outside this TU, and the compiler cannot assume that the buffer’s contents remain unchanged.

The latter only dereferences the pointer; the pointer itself isn’t captured, and because buffer is static, the compiler knows its contents will never be modified.
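The difference between storing the address and storing a byte value can be sketched on a host machine. This is a hypothetical simulation, not the original embedded code: dma_reg stands in for the register at DMA_BUFFER_READ, and start_dma, touch_only, simulate_dma and read_x are illustrative names. It cannot show the optimization itself; it only shows why an escaped address makes changes to buffer observable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host stand-in for the register at DMA_BUFFER_READ. */
static volatile uintptr_t dma_reg;

static unsigned char buffer[32];

/* Like the original ReadDataWithDMA(): the buffer's ADDRESS is stored, so
   it escapes. Anything that can read dma_reg (here simulate_dma(), on real
   hardware the DMA engine) can write into buffer, so the compiler may no
   longer assume buffer still holds its initial zeros. */
static void start_dma(unsigned char *ptr)
{
    dma_reg = (uintptr_t)(void *)ptr;
}

/* Like the modified version: only a byte VALUE is stored. If this were the
   only use of buffer's address in the TU, the compiler could prove that
   nothing ever writes to buffer and fold reads of it to 0. */
static void touch_only(unsigned char *ptr)
{
    dma_reg = ptr[0];
}

/* Simulates the DMA engine: recovers the pointer from the register and
   fills the buffer with 1, 2, 3, ... */
static void simulate_dma(void)
{
    unsigned char *dst = (unsigned char *)(void *)dma_reg;
    for (size_t i = 0; i < sizeof buffer; i++)
        dst[i] = (unsigned char)(i + 1);
}

/* Same read as in Test(): sizeof(int) bytes starting at buffer[1]. */
static int read_x(void)
{
    int x;
    memcpy(&x, &buffer[1], sizeof x);
    return x;
}
```

After start_dma(buffer) and simulate_dma(), read_x() returns the bytes 2, 3, 4, 5 (in native byte order), not 0, because code that obtained the address through dma_reg wrote to the buffer.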

arlleex December 1, 2025, 6:12pm

If the buffer is considered modifiable through the captured pointer, at which points must the compiler assume it may have changed when accessing it? Between sequence points? Experiments show that this is not the case.

zsrkmyn December 2, 2025, 6:25am

Between two calls to Test().

arlleex December 2, 2025, 6:47am

Can you tell me where this behavior is documented? I understand that code outside this scope that holds the pointer can change the data, but it can do so completely asynchronously. In that case the compiler would seemingly have to apply volatile semantics, which seems at odds with both common sense and what I actually observe. The “between Test() calls” answer seems very plausible, but I’d like to see this claim documented.

arlleex December 2, 2025, 7:09am

I’d also like to point out that when I read the buffer contents and write them to the debug address of a volatile object (an MMIO register, so the values can’t be optimized away), instructions that access the buffer memory are generated for every line: Compiler Explorer

The code at the link differs slightly from the test code above, but the essence is the same.

This seems odd to me. You can’t optimize writes to volatile memory, so there’s no question about that. But you can optimize reads from a buffer (after all, its contents can’t change!). Isn’t that right?