What is the correct way to pass multidimensional arrays into a function and declare them?

October 13, 2024, 12:27pm 1

Hey, everybody. I am writing my own simple programming language and have run into the following situation.

@test.1 = private unnamed_addr constant [4 x i8] c"%i\0A\00", align 1

declare i32 @printf(ptr, ...)

define i64 @main() {
entry:
  %array = alloca [3 x [3 x i64]], align 8
  store [3 x [3 x i64]] [[3 x i64] [i64 0, i64 1, i64 3], [3 x i64] [i64 4, i64 5, i64 6], [3 x i64] [i64 7, i64 8, i64 9]], ptr %array, align 4
  %calltmp = call i64 @test(ptr %array)
  %calltmp1 = call i32 (ptr, ...) @printf(ptr @test.1, i64 %calltmp)
  ret i64 0
}

define i64 @test(ptr %0) {
entry:
  %gepus = getelementptr i64, ptr %0, i64 8
  %1 = load i64, ptr %gepus, align 4
  ret i64 %1
}

In main I declare a two-dimensional array and pass it as an argument to my test function. In test I tried taking both a pointer and a regular LLVMArrayType as the parameter, but either way the array seems to become one-dimensional, even though it remains two-dimensional when I index it directly without passing it to a function.

Could you please tell me what I did wrong and how to correctly create and pass multidimensional arrays to a function in LLVM IR (LLVM C++ API)?

kparzysz October 14, 2024, 1:29pm 2

What the main function passes to test is actually the address of the array. In LLVM IR, pointers do not carry pointee type information (they used to, but not anymore). When the test function wants to access an element of that array, all it needs is the element type (loads and stores do have types) and the element's offset from the beginning of the array. The block of memory that the initial pointer points to is still the storage for the multidimensional array; how the offset is calculated does not matter as long as it is correct. If your frontend generates a gep with the original array type (in test), it can still be optimized into something like the above.
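
To make the offset argument concrete, here is a minimal sketch in plain C++ (not LLVM API code) of what the IR in the question does. The helper names flat_index, load_i64 and read_last_element are hypothetical and chosen only for illustration, and the sketch assumes the row-major layout that nested array types have. The base address is untyped, only the load names the element type, and the constant 8 in the original gep is simply 2 * 3 + 2, i.e. element [2][2].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Row-major element index of [row][col] in an array with `cols` columns.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);  // the column must lie inside a row
    return row * cols + col;
}

// Analogue of @test: the base address is untyped; only the load names a type.
std::int64_t load_i64(const void *base, std::size_t element_index) {
    std::int64_t value;
    const auto *bytes = static_cast<const unsigned char *>(base);
    std::memcpy(&value, bytes + element_index * sizeof(std::int64_t), sizeof value);
    return value;
}

// Analogue of @main, using the same initializer as the question.
std::int64_t read_last_element() {
    std::int64_t array[3][3] = {{0, 1, 3}, {4, 5, 6}, {7, 8, 9}};
    return load_i64(array, flat_index(2, 2, 3));  // flat index 8
}
```

With the initializer from the question, read_last_element() returns 9, which is the value that @test loads.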

REDGAR October 14, 2024, 5:11pm 3

Is there any way to pass this type? Or can you please tell me how to implement the offset calculation in this case?

kparzysz October 14, 2024, 5:34pm 4

You can just create the gep with the original array type, e.g.

%gepus = getelementptr [3 x [3 x i64]], ptr %0, i64 0, i64 0, i64 1
%1 = load i64, ptr %gepus, align 4
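
As a rough illustration of what the three indices mean, the address arithmetic of this gep can be modelled in plain C++. This is only a sketch: gep_byte_offset is a hypothetical helper, hard-coded to the [3 x [3 x i64]] type, and it assumes an i64 occupies 8 bytes with no padding between elements. The first index steps over whole arrays, the second over rows, and the third over elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset produced by
//   getelementptr [3 x [3 x i64]], ptr %p, i64 i0, i64 i1, i64 i2
std::size_t gep_byte_offset(std::size_t i0, std::size_t i1, std::size_t i2) {
    const std::size_t elem = sizeof(std::int64_t);  // i64
    const std::size_t row = 3 * elem;               // [3 x i64]
    const std::size_t whole = 3 * row;              // [3 x [3 x i64]]
    assert(i1 < 3 && i2 < 3);  // stay inside the array bounds
    return i0 * whole + i1 * row + i2 * elem;
}
```

Indices 0, 0, 1 give a byte offset of 8 (element [0][1]). Indices 0, 2, 2 give 64 bytes, which is the same address that the flat `getelementptr i64, ptr %0, i64 8` from the question reaches.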

REDGAR October 14, 2024, 6:15pm 5

Thanks for the answers! I understand how to do it when the array has a fixed, statically known size. But what if I want to pass a two-dimensional array of variable size to the function: how do I get its size? I could add extra arguments carrying the array size, but that feels like one big kludge. How is this usually implemented with the LLVM C++ API?

kparzysz October 14, 2024, 7:53pm 6

LLVM types are static: there is no single type that represents an n-dimensional array with variable extents.

If you want to allow dynamic array extents, you’d need to represent the type as a run-time object. For example {i32, ptr}, where the i32 represents the number of dimensions, and the ptr is the address of an array with the extents for each dimension.

Once you have that, you need to pair the address of your array with the type object; together, the two let you access any element. You’d then generate code that walks over the type object and calculates the element offset for a given set of coordinates.
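
The scheme described above can be sketched in plain C++ before translating it to IR. Everything named here (ArrayShape, ArrayRef, element_offset, load_element) is hypothetical and not part of any LLVM API, and the sketch assumes row-major layout and i64 elements. ArrayShape plays the role of the {i32, ptr} type object, and element_offset is the walk over the extents that a frontend would have to emit as IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Run-time "type object", mirroring the {i32, ptr} suggestion:
// rank is the number of dimensions, extents points to `rank` extents.
struct ArrayShape {
    std::int32_t rank;
    const std::int64_t *extents;
};

// The array's base address paired with its shape (hypothetical layout).
struct ArrayRef {
    std::int64_t *data;
    ArrayShape shape;
};

// Walks the extents and folds the coordinates into one row-major
// element offset: ((c0 * e1) + c1) * e2 + c2 ...
std::int64_t element_offset(const ArrayShape &shape, const std::int64_t *coords) {
    std::int64_t offset = 0;
    for (std::int32_t d = 0; d < shape.rank; ++d) {
        assert(coords[d] >= 0 && coords[d] < shape.extents[d]);
        offset = offset * shape.extents[d] + coords[d];
    }
    return offset;
}

// Loads the element at the given coordinates.
std::int64_t load_element(const ArrayRef &array, const std::int64_t *coords) {
    return array.data[element_offset(array.shape, coords)];
}
```

For a 3 x 3 array and coordinates (2, 2), the loop yields 2 * 3 + 2 = 8, matching the constant in the original gep. In generated IR, the same value could feed a single `getelementptr i64, ptr %base, i64 %offset`, with %base and %offset as placeholder names.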

REDGAR October 14, 2024, 9:24pm 7

I roughly understand the idea, but it would be great if you could show an example in LLVM IR or (preferably) the LLVM C++ API, so that I can solve my problem sooner. By the way, for a multidimensional array, does the pointer in the {i32, ptr} structure point to another structure of the same kind? Is my assumption correct?
Thanks for understanding!