[RFC] Introduce sentinel pointer value to DataLayout

Regarding the interaction with nonnull, dereferenceable_or_null, and null_pointer_is_valid, it depends on how much we want to preserve the current semantics of null, which currently represents zero in LLVM.

I think @arichardson made a great point. Rather than introducing a new concept called “sentinel pointer”, using nullptr and deprecating null would make things clearer and less confusing.

Proposed Changes

Interaction with metadata

nonnull and dereferenceable_or_null will be replaced with nonnullptr and dereferenceable_or_nullptr, respectively.

My experience and knowledge in this area are fairly limited, but my understanding is that the key concern behind these attributes is probably not whether the pointer literally holds a zero value but whether it is the actual nullptr. If nullptr in address space N is not zero, does it really matter whether a pointer is literally zero? We probably care more about whether it is actually a nullptr in that address space.
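To make the distinction concrete, here is a minimal standalone sketch of the two readings. Nothing in it is LLVM API: AddrSpaceInfo, isNonZero, and isNonNullPtr are hypothetical names, and the all-ones null value used in the usage note is only an illustrative assumption.

```cpp
#include <cstdint>

// Hypothetical description of one address space; not an LLVM type.
struct AddrSpaceInfo {
  std::uint64_t NullPtrBits; // bit pattern that represents nullptr here
};

// Literal-zero reading: the pointer is "non-null" if its bits are not 0.
constexpr bool isNonZero(std::uint64_t PtrBits) { return PtrBits != 0; }

// nullptr-based reading: the pointer is "non-null" if its bits differ from
// the nullptr bit pattern of its address space.
constexpr bool isNonNullPtr(const AddrSpaceInfo &AS, std::uint64_t PtrBits) {
  return PtrBits != AS.NullPtrBits;
}
```

In an address space whose nullptr is assumed to be 0xFFFFFFFF, a pointer with bits 0 passes isNonNullPtr but fails isNonZero, while 0xFFFFFFFF does the opposite. When nullptr is 0, the two checks agree.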

Interaction with attributes

null_pointer_is_valid is a bit trickier. Based on my understanding, it only applies to address space 0 and is specifically used for the null address. I think we should keep the name but adjust its semantics to refer to nullptr instead of null.

I expect this change will not have any actual effect on existing code because:

  1. We always initialize the pointer specification for address space 0, and its default sentinel value is 0 unless it is overridden by the data layout string.
  2. All existing upstream LLVM targets currently use 0 for nullptr.
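The two points above can be sketched with a toy model. This is not LLVM's DataLayout: ToyDataLayout, setNullPtrBits, and getNullPtrBits are hypothetical names, and the fallback to address space 0 for unspecified address spaces is an assumption of this sketch.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy stand-in for a per-address-space nullptr table; not LLVM's DataLayout.
class ToyDataLayout {
  std::unordered_map<unsigned, std::uint64_t> NullPtrBits;

public:
  // Address space 0 is always initialized, with a default nullptr value of 0.
  ToyDataLayout() { NullPtrBits[0] = 0; }

  // Models an override coming from the data layout string.
  void setNullPtrBits(unsigned AddrSpace, std::uint64_t Bits) {
    NullPtrBits[AddrSpace] = Bits;
  }

  // Returns the nullptr bit pattern for AddrSpace. Unspecified address
  // spaces fall back to the entry for address space 0 (sketch assumption).
  std::uint64_t getNullPtrBits(unsigned AddrSpace) const {
    auto It = NullPtrBits.find(AddrSpace);
    if (It == NullPtrBits.end())
      It = NullPtrBits.find(0); // always present, see the constructor
    return It->second;
  }
};
```

With no overrides, every query returns 0, so code that treats null as zero behaves exactly as before. Only an explicit override changes the answer for an address space.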

Handling Legacy Attributes

If I remember correctly, we recently removed nocapture and replaced it with captures(...). How do we currently handle cases where we encounter the old nocapture attribute? @nikic

An “Easier” Alternative

The proposal above aims to avoid confusion by replacing “deceiving” terminology with clearer alternatives. However, as @nikic and @arichardson pointed out, we could also modify the semantics of existing terms instead of introducing new ones.

A more straightforward approach would be to redefine the meaning of null pointer across LLVM to represent the actual nullptr in its corresponding address space, while still keeping the null spelling.

We will still introduce the new nullptr representation in DataLayout, ensuring each address space has a well-defined nullptr value, but we modify the semantics of null to match the nullptr value defined for each address space.

This is an enhancement to the existing approach and doesn’t make things worse, even if we redefine null to mean nullptr. In most places, LLVM already avoids assuming null is always 0, though there are exceptions, such as this bug.

For handling ConstantPointerNull, we first replace all existing uses of ConstantPointerNull with Constant::getNull(PtrTy), and then put back ConstantPointerNull only in contexts where the pointer is not intended to represent a literal zero.

After making this change, we can safely assume that null represents nullptr in all contexts, and use it for further development.

What do you think? @arsenm @arichardson @nikic