[RFC] Should We Restrict the Usage of 0-D Vectors in the Vector Dialect? (original) (raw)

December 9, 2024, 11:26am 1

Tl;Dr

This RFC proposes an experiment to restrict some vector dialect operations (vector.insert and vector.extract) to require non-0D vectors or scalars. This does not propose changes to VectorType.

Motivation

0-D vectors introduce ambiguity in the representation of scalar values:

f32 vs vector<f32> vs vector<1xf32>.

While vector<1xf32> aligns with the semantics of 1D vectors as containers with at least one element, vector<f32> and f32 both (effectively) represent a scalar element, raising the question:

Which representation should be used whenever creating an Op that accepts both?

Restricting 0-D vectors in certain contexts would help reduce ambiguity and enforce clearer semantics in the Vector dialect.

Proposed Experiment

I propose restricting the use of 0-D vectors in vector.extract and vector.insert. Specifically:

Allow (the non-indexed type is f32 (*)):

%0 = vector.extract %src[0, 0] : f32 from vector<2x2xf32>
%1 = vector.insert %v, %dst[0, 0] : f32 into vector<2x2xf32>

Disallow (the non-indexed type is vector<f32>):

%0 = vector.extract %arg0[0, 0] : vector<f32> from vector<2x2xf32>
%1 = vector.insert %v, %dst[0, 0] : vector<f32> into vector<2x2xf32>

This restriction would simplify semantics and clarify the role of 0-D vectors in the dialect without broader changes to VectorType.

(*) What’s non-indexed type? See the taxonomy proposed by @dcaballe in [mlir][vector] Add verification for incorrect vector.extract by dcaballe · Pull Request #115824 · llvm/llvm-project · GitHub.

Context

Vector Dialect

The Vector dialect has not fully embraced 0-D vectors:

AnyVectorOfAnyRank is used 24 times in VectorOps.td, while
AnyVectorOfNonZeroRank is used 26 times.

Also, vector.mask explicitely avoids 0-D vectors:

At this point, 0-D vectors are not supported by vector.mask.

This suggests that the usage of 0-D vectors is already limited and might remain so without significant impact.

SPIR-V

SPIR-V is an important consumer of the Vector dialect. Note that (from SPIR-V docs):

SPIR-V only supports vectors of 2/3/4 elements;

(see also spirv::CompositeType::isValid). This restriction aligns well with the proposed changes and should not impact SPIR-V usage.

LLVM

Unsurprisngly, lowering to LLVM requires converting 0-D vectors to something else. From LLVMTypeConverter::convertVectorType:

/// * 0-D vector<T> are converted to vector<1xT>

Let’s look at a specific example:

func.func @shuffle(%arg0: vector<f32>) -> vector<3xf32> {
  %1 = vector.shuffle %arg0, %arg0 [0, 1, 0] : vector<f32>, vector<f32>
  return %1 : vector<3xf32>
}

Compiling that to AArch64 gives:

shuffle:                                // @shuffle
        dup     v0.4s, v0.s[0]
        ret

Basically, a 0-D vector was converted to an “element” within a 4-element vector (v0.s[0]). Here’s a Compiler Explorer link for you:

Compiler Explorer (and the dup instruction).

Similar example with vector.extract and vector.insert fails to lower to LLVM, see Compiler Explorer. I guess that’s a manifestation of how poorly 0-D vectors are supported.

In any case, from the point of view of the LLVM dialect, we could just replace 0-D vectors with 1-D vectors. Importantly, restricting vector.extract/vector.insert shouldn’t make much difference here.

Why Do We Have 0-D Vectors?

The Vector Dialect docs list 2 benefits of 0-D vectors:

The benefit of 0D vectors, tensors, and memrefs is that they make it easier to lower code from various frontends such as TensorFlow

Given that the main source of Vector Ops is the Linalg vectorizer, which we control, the restriction proposed here is unlikely to be a problem for frameworks like Tensorflow.

and make it easier to handle corner cases such as unrolling a loop from 1D to 0D.

Interestingly, ConvertVectorToSCF is the only transformation that I was able to find that “cares” about 0-D vectors. It’s unclear to me whether restricing vector.extract/vector.insert would be a problem here? Wouldn’t replacing vector<f32> with f32 be equally good?

For more context, here’s the original RFC that introduced 0-D vectors:

https://discourse.llvm.org/t/should-we-have-0-d-vectors

Previous discussions

Recent discussions related to 0-D vectors:

Additionally, see this related RFC on replacing vector.extractelement and vector.insertelement with vector.extract and vector.insert:

https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops

Note: This RFC will be coordinated with the one above to ensure progress on one does not block the other.

Next Steps

Restrict 0-D vectors in vector.extract and vector.insert for scalar extraction.
Observe the impact over several months to determine whether we can further limit 0-D vectors or if they remain essential.
Use this experiment to clarify and enforce stricter semantics around 0-D vectors in the Vector dialect.

Acknowledgments

Special thanks to @dcaballe and @Groverkss for their efforts to improve these areas

Thank you,
-Andrzej

I think we need to have two separate conversations.

Support for 0-D vectors in specific operations.
General support for 0D vectors.

I am strongly in favor of supporting 0D vectors ubiquitously. This makes the vector dialect consistent without unnecessary type changes. Some reasons for this

Tensor dialect supports 0D vectors. Since tensor → vector is one important path of targeting vector dialect, suddenly having vector dialect not support 0D vector seems like a gap.
Not having 0D vectors seems like a gap in “completeness” of vector dialect. If you have a rank-reducing vector operation like vector.extract is, then not support 0D vectors seems like ill-defined spec to me. It adds complication everywhere vector dialect is used. Any transformation involving vector.extract will have to check for scalar type or vector type and handle it appropriately. Wtih vector of zero rank, all types would be vector types, and zero-rank is just the degenerate case.
It is true that LLVM for example does not handle 0-rank vectors. This can be easily handled during lowering to LLVM (which I think it does already). You can lower a vector of zero-rank to scalar during the conversion.

So in this sense I am -1 on having vector.extract/vector.insert disallowing 0D vectors. It would also then be redundant with vector.extract_element/vector.insert_element while I think they serve different purposes. The former is slicing/inserting a vector from/to another vector, while the latter is extracting/inserting an element from/into a vector.

Thank you for reviewing and providing detailed feedback, @MaheshRavishankar!

Agreed. This RFC is intentionally limited to the first point.

My overarching goal is to improve the Vector dialect. Whether we ultimately decide to support or restrict 0-D vectors, there’s significant work needed to clarify semantics and reduce complexity. There’s already been great progress, so thanks to @Groverkss for these contributions:

And @dcaballe has been tackling the unfortunate vector.extract/vector.insert verifier (thank you!):

[mlir][vector] Add verification for incorrect vector.extract by dcaballe · Pull Request #115824 · llvm/llvm-project · GitHub

Despite this progress, the sheer number of edge cases remains a maintenance burden. To my knowledge, 0-D vector support does not address any compelling practical needs to justify this overhead.

To clarify, my proposal only restricts the non-indexed argument case. For instance, this remains valid:

%0 = vector.extract %src[] : f32 from vector<f32>
%1 = vector.insert %v, %dst[] : f32 into vector<f32>

Does the Tensor dialect’s support for 0-D tensors mean we must support 0-D vectors universally in Vector?
This RFC does not propose removing 0-D vector support entirely. The scope is far more limited.

I disagree My experience suggests that supporting 0-D vectors introduces more complexity than not supporting them.

That said, I recognize this is a nuanced topic. 0-D vectors address some challenges but create others, such as the ambiguity described in my original post. The goal here is to start defining and enforcing semantics around 0-D vectors to better understand their role.

True, but conversely, there’s also substantial code checking:

if (vectorType.getRank() == 0)

This wouldn’t be there without 0-D vectors

Now, I do agree with you and your concerns. But I just have a feeling that we will discover that in practice this isn’t such an issue. The whole point of labelling this as an experiment is check and understand the impact. Right now I feel that this is my gut feeling versus yours

Indeed, and I referenced the relevant LLVM code in my post (and provided an e2e example lowering to assembly). However, the point stands: Introducing 0-D vectors in one place necessitates their elimination elsewhere.

Note that vector.extractelement and vector.insertelement are being removed:

https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/

As mentioned in my RFC:

Final thoughts:

This RFC is not about removing 0-D vectors.
- It’s about understanding where they solve problems and assessing whether their complexity is justified.
- The patchy current support makes it hard to evaluate their role properly.
Ubiquitous support for 0-D vectors is neither good nor necessary, IMO.
- However, to explore the problem space, we need to experiment.

Thank you
-Andrzej

If I had a dimension for every conclusive outcome of a debate about generalizing to 0-D, I’d have… Zero dimensions. Ba-dum-bum

In all seriousness, I have no horse in this particular race, but I’ve been around long enough to know that there is no right answer here – only what can be lived with. And I’ve seen enough arcs on this topic to know that the net complexity and special casing is about the same either way (although sometimes you get to ignore that if you export the special casing to a layer above or below, depending on your perspective).

Thanks to the folks cleaning this up. Success here is everyone feeling slightly put out but the IR being locally consistent and well defined without having exported a special casing issue to a neighbor which cannot be well handled there.

sjarus December 10, 2024, 4:47pm 5

We recently had this conversation around TOSA for the intended v1.0 specification and dialect release that’ll happen in the very near future.

tl;dr - the decision was to keep support for rank 0.

There were a significant number of framework carried constructs that a dim-sensitive, e.g. numpy.dot, requiring tensor ranks to be faithfully expressed downstream.

Type promotion in this case would error out or silently emit numerical mismatches. While they could be mitigated with type promotion at legalization time, the tosa.custom trapdoor would leave us unable to fully address that. We’ll have more details in the rationale document update as part of the RFC process for the 1.0 release.

As a result I’m inclined to agree with @MaheshRavishankar here, but I recognize that this topic took a lot of time to bottom out and has plenty of pitfalls.

dcaballe December 10, 2024, 7:28pm 6

You beat me to it! Thanks for the RFC and for nudging me to speak up .

A few quick points I wanted to make:

The motivation behind this change is not based on personal preference but on years (?) of dealing with instability and pain. Currently, there’s a mix of operations that accept and do not accept 0-D vectors. I want to emphasize accept because that doesn’t mean they properly support them. Most fixes over the years have been about bailing out for 0-D vectors rather than offering proper support (and with this I really don’t want to undermine the great work that @Groverkss has been doing lately!).
This change aims to remove redundancy in the IR, which has a negative impact all over the Vector dialect code. While from a theoretical standpoint having a representation for a scalar in the vector domain might be interesting, supporting that has proven far too costly. We need to make a practical decision here. Again, this is not creating a gap in the representation since we would still have scalars and single-element vectors (e.g., vector<1xf32>) available in the Vector dialect.
My understanding is that the concept of 0-D vectors is inherited from the tensor world, where some frameworks may want to work exclusively with tensors and not scalars. I may also understand that some frameworks we want to do the same for vectors. This proposal shouldn’t prevent that as it’s not removing 0-D vectors from the vector type and the lowering of a 0-D vector to a scalar or single-element vectors would still be possible when getting into the Vector dialect. This proposal only tightens the constraints within the internal Vector dialect layer.
Vectors have always gone hand-in-hand with scalars and operations to cross the scalar/vector boundary (e.g., vector insert, extract, broadcast, bitcast, horizontal reduction, …) have been there from day one. This approach has proven successful for many years.

Strong +1 from me. We tried, it didn’t work, we should learn from it and move forward to a better state. IMO, this pragmatic call has been long overdue.

Thanks!

Correction: all frameworks. Every present day lowering path to vector has to do something sane with 0-D tensors when vectorizing. They are an intrinsic part of the programming model. I do just want to underline that this aspect cannot just be ignored or left to chance as that just shifts the debt. I think if you all work incrementally and keep that part in view, it likely works out. But I’m extending trust here that in the zeal to reduce the trauma of the 0-D vector years (I feel like there could be a support group for this), this part be kept in view and not just theoretically solved for. As you say, there are some plausible ways to do it… Just make sure they are sound and actually work at each step.

Agreed.

That said, I believe we can still achieve this while making the Vector dialect more opinionated about 0-D vectors. Given that the Linalg Vectorizer remains the primary path towards Vector, this adjustment affects a relatively small and self-contained area, making it feasible to update, test, and refine as needed.

Absolutely. I agree that incremental progress is essential here. Reaching a state where every operation has well-defined and consistently enforced semantics will take a series of smaller steps, and this RFC represents one of those steps.

Fly-by comment as I catch up on unread stuff: the RFC explicitly mentions.

There are a bunch of tradeoffs involved that I don’t think are
worth unpacking just yet if there is a simple consensus that can
emerge.

Consensus was immediate so we did not go into tradeoff discussion.

My gut feeling based on old memories is you will find many fun places where you’ll have to check for vector<1xf32> and treat it as is it were vector<f32> for the purpose of getting your multi-dimensional indexings right. You may even get into ambiguities IIRC. This will occur on every vector operation.

In contrast, the current points of weirdness are:

NYI support for vector<t>, there are quite a few, they are expected to be simple starter tasks
explicitly have to think about vector<t> vs vector<1xt> (i.e. in conversion to LLVM)

IMO the key aspect to consider is that the cognitive overhead of point 2. will propagate everywhere in MLIR and to everyone who uses it vs at the boundary of conversion to LLVM atm.

TL;DR my recollection is this will simply displace problems in other places that will much more scattered.

Regarding this pont:

I appreciate the intent but I see that in [mlir][vector] Add verification for incorrect vector.extract by dcaballe · Pull Request #115824 · llvm/llvm-project · GitHub we now allow:

vector.insert %arg1, %arg0[0, 0, 0] : vector<1xf32> into vector<4x8x3xf32>

and the additional code that goes with it:

    bool isSingleElem1DNonIndexedVec =
        (nonIndexedRank == 1 && nonIndexedVecType.getDimSize(0) == 1);
    bool isSingleElem1DIndexedVec =
        (indexedRank == 1 && indexedType.getDimSize(0) == 1);
    // Verify 0-D -> single-element 1-D supported cases.
    if ((indexedRank == 0 && isSingleElem1DNonIndexedVec) ||
        (nonIndexedRank == 0 && isSingleElem1DIndexedVec)) {
      return op->emitOpError("expected source and destination vectors with "
                             "different number of elements");
    }
    // Verify indices for all the cases.
    int64_t indexedRankMinusIndices = indexedRank - numIndices;
    if (indexedRankMinusIndices != nonIndexedRank &&
        (!isSingleElem1DNonIndexedVec || indexedRankMinusIndices != 0)) {
      return op->emitOpError()
             << "expected " << indexedStr
             << " vector rank minus number of indices to match the rank of the "
             << nonIndexedStr << " vector";
    }

This is the problem displacement I am talking about, that I suspect will propagate everywhere.

Possibly - if we were to completely ban 0-D vectors. However, IMO, we shouldn’t go that far, not today (perhaps never). This proposal is much smaller in scope.

But I appreciate that you will know the context much better than me. I’m hoping that through this RFC we could identify specific pain points to address.

That’s effectively the situation today, and the idea is to reduce these ambiguities.

Note that the referenced PR hasn’t been agreed upon or merged yet - there’s been a long discussion leading up to this RFC. The goal is to gather feedback to inform the design and implementation.

By the way, I agree that the burden of enforcing constraints should be limited to Vector and, for example, the Linalg vectorizer, which serves as the main entry point to the Vector dialect. Are there any other areas we should consider?

Now, let me share one specific “vectorization” example with 0-D tensor.

Example: Vectorization with 0-D tensors

Before vectorization

func.func @generic_0d(%arg0: tensor<f32>, %arg1: tensor<f32>,
                      %arg2: tensor<f32>) -> tensor<f32> {
vector<f32>, tensor<f32>
  %res = linalg.generic {
    indexing_maps = [#map, #map, #map],
    iterator_types = []
  } ins(%arg0, %arg1 : tensor<f32>, tensor<f32>)
    outs(%arg2 : tensor<f32>) {
  ^bb(%a: f32, %b: f32, %c: f32) :
    %d = arith.mulf %a, %b: f32
    %e = arith.addf %c, %d: f32
    linalg.yield %e : f32
  } -> tensor<f32>

  return %res : tensor<f32>
}

After vectorization

  func.func @generic_0d(%arg0: tensor<f32>, %arg1: tensor<f32>, %arg2: tensor<f32>) -> tensor<f32> {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.transfer_read %arg0[], %cst : tensor<f32>, vector<f32>
    %1 = vector.extract %0[] : f32 from vector<f32>
    %2 = vector.transfer_read %arg1[], %cst : tensor<f32>, vector<f32>
    %3 = vector.extract %2[] : f32 from vector<f32>
    %4 = vector.transfer_read %arg2[], %cst : tensor<f32>, vector<f32>
    %5 = vector.extract %4[] : f32 from vector<f32>
    %6 = arith.mulf %1, %3 : f32
    %7 = arith.addf %5, %6 : f32
    %8 = vector.broadcast %7 : f32 to vector<f32>
    %9 = vector.transfer_write %8, %arg2[] : vector<f32>, tensor<f32>
    return %9 : tensor<f32>
  }

In this example, 0-D tensors are supported while keeping vector.extract constrained. This demonstrates that it should be safe to restrict vector.extract without sacrificing support for 0-D tensors…

Thank you all for your comments so far. I appreciate that this is a busy season for MLIR RFCs.

-Andrzej

rengolin December 11, 2024, 12:47pm 11

I fully support this proposal.

I also fully support a (potential) future proposal to remove support for 0D shaped types. This can begin with changing the semantics of AnyVectorOfAnyRank to mean non-zero and see what breaks. My guess is that nothing that isn’t already broken will crash.

Agreed.

Thank you all for picking this up. I meant to look into it last year and completely forgot about it. My findings back then were exactly the same as yours.

[Edited to remove unnecessary harsh wording, expanded my points in the comment below.]

rengolin December 11, 2024, 1:01pm 12

Honest question: isn’t this the job of frameworks to lower into a common (lower) representation for transformation?

I see one-time costs to:

changing all lowering code to add [oneAttr] to the builder of every 0D shaped type lowering to 1D (or just lower a scalar). (ex. torch-mlir et al)
making sure this propagates through transforms correctly (what @banach-space seems to be proposing)

But I don’t have visibility (lack of experience in that area) to know if lowering those 0D shapes from all frameworks to scalar/1D would break the frameworks’ expectations. This is slightly different than break the actual transform (which can still be valid, but not implement the original code’s expectation).

My point just now was that, without further knowledge on the use cases, shaped<type> is equivalent to either type or shaped<1xtype> for all purposes, and the main cost is code complexity in converting them to the right ones at the right time. The main cost savings, though, as Andrzej points out, is removing the code that checks for both in either cases in other places.

My claim that this is a reasonable trade-off is because the code that lowers stuff is simple and mechanical, and where the code semantics really is (vector doesn’t know about XLA), and it’s ok to be a bit bloated (as long as it’s correct). But transform code is hard to understand and it needs to apply to all patterns, and not create special cases for ambiguous types (it doesn’t know if that shaped<type> is a scalar or a 1D at that point anymore).

rengolin December 11, 2024, 2:39pm 13

Had a long discussion with @nicolasvasilache and wanted to follow up on our agreement here.

First, I only “fully support” this work because I’m biased to want to get rid of shaped<type> types in the first place. That’s why I went on that tangent above, and only after talking to Nico that I realized that.

As Nico put it, “if we just remove the ability of tensor.insert and tensor.extract to handle 0D shapes, it will break all of the other things that still support them”, as he exposed above with the code that checks for their equivalence. I fully agree here.

So, if we want to start removing support for 0D shapes in some operations, we need to look at the impact in all the other operations. Even if we mean to carve a section out, we need to make sure that the section we remove is fully independent and will not add complexity to the other ops.

Second, we need to balance what is the cost of adding those checks versus removing the current checks. It may be worth increasing some code, but only if this is an interim state and if the proposers are fully aware and committed to reaching the final goal, where there is less bloat than there is today.

Also, the proposers need to work with the folks raising issues (Mahesh, Stella, others) to make sure the use actually does not impact external projects in a way that they are not able to work on.

Finally, I still think 0D tensors are the wrong abstraction for what we use it for and would like to fix it, but not in this thread.

Sorry for the noise, I hope it’s clear now what I meant.

Groverkss December 11, 2024, 2:49pm 14

I think this RFC is overall in the right direction, but I’m -1 on the change proposed. I’m going to list a mental model of the current state of vector dialect I have and based on that talk about the proposed change.

General vector dialect support for 0-d vectors

The problem with 0-d vector support in vector dialect isn’t 0-d vectors themselves, it’s how operations handle them as special cases. Generally, you can split vector dialect operations into 3 categories:

Category 1: Operations defined on N-D vector space
Category 2: Operations defined on 1-D vectors, extended to N-D vectors by
treating them as a “stack of 1-D vectors”.
Category 3: Operations defined on 1-D vectors (usually mapping to llvm/spirv instrinsics)

Note that Category 2 operations are a restriction over Category 1, and Category 3 operations are a restriction on Category 2 operations. An operation defined as Category 1 can be used as a Category 2 or Category operation, but the other way around is not true.

Operations in Category 1 require 0-D vectors to be defined properly, since they work on a N-D space. Treating 0-D vectors as scalars for these operations is special casing and causes multiple bugs ([mlir][Vector] Add support for 0-d shapes in extract-shape_cast folder by Groverkss · Pull Request #116650 · llvm/llvm-project · GitHub, [mlir][Vector] Fix vector.insert folder for scalar to 0-d inserts by Groverkss · Pull Request #113828 · llvm/llvm-project · GitHub, [mlir][Vector] Support 0-d vectors natively in TransferOpReduceRank by Groverkss · Pull Request #112907 · llvm/llvm-project · GitHub).

Operations in Category 2 simply do not support 0-D vectors by definition, and should not have 0-D vector support.

Operations in Category 3 should decide 0-D vector support or not based on what intrinsic they are targetting.

Treating a Category 2 operation as Category 1 generally leads to abstraction mismatch and bugs.

I’m going to give some examples of some operations and show how every operation can be grouped into these 3 categories, and that operations falling in Category 1 need 0-d vectors to be defined properly and operations in Category 2, when trying to behave like Category 1 operations cause problems.

vector.contract

vector.contract is a classic example of Category 1 operation, which is defined on a N-D vector space.

Computes the sum of products of vector elements along contracting dimension pairs from 2 vectors of rank M and N respectively, adds this intermediate result to the accumulator argument of rank K, and returns a vector result of rank K (where K = num_lhs_free_dims + num_rhs_free_dims + num_batch_dims (see dimension type descriptions below)). For K = 0 (no free or batch dimensions), the accumulator and output are a scalar.

The operation needs to special case itself to scalars, because it’s a Category 1 operation, which needs 0-D vectors to be defined properly. This operation should support 0-D vectors and it will reduce special casing and bugs.

Example special casing in vector.contract:

The same logic applies for vector.multi_reduction, vector.transfer_read, vector.transfer_write, … these operations behave on a N-D vector space and require 0-D vectors to be defined properly.

vector.shuffle

vector.shuffle is a classic example of Category 2 operation (and the one shown in your original post having problems). From the docs:

The legality rules are:
- the two operands must have the same element type as the result
- Either, the two operands and the result must have the same rank and trailing
  dimension sizes, viz. given two k-D operands v1 : <s_1 x s_2 x .. x s_k x
  type> and v2 : <t_1 x t_2 x .. x t_k x type> we have s_i = t_i for all 1 < i
  <= k
 - Or, the two operands must be 0-D vectors and the result is a 1-D vector.

...

Examples:

%2 = vector.shuffle %a, %b[3, 2, 1, 0]
           : vector<2xf32>, vector<2xf32>       ; yields vector<4xf32>
%3 = vector.shuffle %a, %b[0, 1]
           : vector<f32>, vector<f32>           ; yields vector<2xf32>

The op needs to be special cased for 0-D vectors, because it falls in Category 2 and is being forced to work with 0-D vectors. It should disallow 0-D vectors, which will reduce bugs and special casing for it.

vector.reduction

From the docs:

Note that these operations are restricted to 1-D vectors to remain close to the corresponding LLVM intrinsics:
LLVM Language Reference Manual — LLVM 20.0.0git documentation

vector.reduction is a classic Category 3 operation. It is meant to target a LLVM intrinsic and should not support 0-D vectors to match the corressponding LLVM intrinsic.

Another good example for such an operation is vector.matrix_multiply and vector.outer_product.

vector.extract / vector.insert

These operations are special. They were originally designed to work as Category 2 operations, but with the addition of 0-D vectors, were extended to be a mix of Category 1 and Category 2 operations. When they act as Category 2 operations (returning scalar instead of a 0-D vector by default, for example) Category 1 operations have to special case and this causes multiple bugs. There is an ambiguity in which Category these operations fall. My understanding is that this RFC is trying to remove this ambiguity, and make it fall into one of these categories, which is a good thing.

However, unlike vector.shuffle, vector.extract/vector.insert are very core to the vector dialect and act as glue for all operations in the dialect. The current RFC is trying to make these operations strictly Category 2, which is why I’m -1 on this RFC. It will mean more bugs and special casing for us on Category 1 operations. We will just have a different set of bugs and the problem will just get displaced elsewhere (as @nicolasvasilache mentions).

For example, when lowering vector.multi_reduction, the lowering has to special case if it sees a scalar accumulator because it extracted a lower dimensional vector:

The proper solution to have would be to convert vector.extract/vector.insert to be Category 1 operations, so it works well with every vector dialect operation. I’m going to propose how to do this below.

A better charter for vector dialect

The above definitions make it clear when a operation should support 0-D vectors
(Category 1), when it shouldn’t (Category 2) and when it depends on what it’s targetting (Category 3).

We should start by splitting the operations into which category they belong and it would make it much clearer how they need to be defined, and would eliminate most of the bugs that we face today. This would also bring out operations that are poorly defined, and would give us a chance to define them better.

Problems with vector.extract / vector.insert

I will take vector.extract as an example. The same argument applies to vector.insert. From the docs for vector.extract:

Takes an n-D vector and a k-D position and extracts the (n-k)-D vector at the proper position. Degenerates to an element type if n-k is zero.

The problem with this operation is that it is defined for Category 2 operations (stack of 1-d vectors, decompose to scalar if we go below 1-d vectors), which means that if used for Category 1 operations, it will cause special casing and usual bugs for missing that (as shown in examples above).

The problem is fixed if we split the operation into two:

vector.extract: Extract a trailing (n-k)-D vector
vector.extract_scalar: Extracting a scalar from a (N)-D vector

This leads to a consistent definition of the semantics of the operation over a
N-D vector space, and makes it a Category 1 operation (which means it can be used with Category 2 and Category 3 operations as well). This also makes it clear that when working with Category 2 operations and using vector.extract, you must explicitly use vector.extract_scalar, because 0-D vectors do not make sense for Category 2 operations.

Proposed action points

I’m proposing two things here, which are in spirit of this RFC, but a different solution to the problem:

Put each operation in one of these two categories, and only allow 0-d vectors
for Category 1.
Split vector.extract into two operations and make both of them Category 1.

(I wrote this based on discussions with @qed @kuhar @hanchung @MaheshRavishankar @manupak to understand vector dialect operations better and why we face 0-D vector bugs)

In particular, folks, if I am misinterpreting/misunderstanding your concerns, please correct me!

You have just proposed a much needed update to this section of the Vector dialect on “Hardware Vector Ops” vs “Virtual Vector Ops” (that document has served us very well, but is a bit out-dated):

'vector' Dialect - MLIR

As in, we should use the taxonomy that you proposed to update that document.

This is precisely what I had in mind, but takes things much further. Thank you for sharing!

TBH, I’m a bit concerned about introducing a new Vector Op - this feels like vector.extractelement/vector.insertelement, and we should be mindful of:

https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops

Perhaps we can avoid that? See below.

No Let me clarify (by special-casing my example):

Allow:

// 1. %src + %dst are the _indexed_ inputs and can be 0-D
// 2. if (numIndices == srcRank) --> result is _always_ f32
// 3. The restriction only applies to the non-indexed argument.
%0 = vector.extract %src[] : f32 from vector<f32>
%1 = vector.insert %v, %dst[] : f32 into vector<f32>

Disallow:

// 1. %src + %dst are the _indexed_ inputs that can be 0-D
// 2. if (numIndices == srcRank) --> result is f32 _or_ vector<f32>
// 3. No restriction on the non-index arguments.
%0 = vector.extract %src[] : vector<f32> from vector<f32>
%1 = vector.insert %v, %dst[] : vector<f32> into vector<f32>

The ambiguity is caused by the fact that, when numIndices == srcRank, the non-indexed argument could either be f32 or vector<f32>. We need to restrict that. I proposed f32, you are suggesting vector<f32>. Whichever one we pick, we should stick to it consistently.

Put differently, I am still hoping that we can re-use vector.insert/vector.extract. If we discover otherwise, then we can just introduce vector.extract_scalar as you proposed.

Btw, I find the split into the “indexed” and “non-indexed” arguments very helpful. To me, we have always been missing that when reasoning about “read”/“write” operations. We should incorporate that into your taxonomy.

Hm, if Category 1 allows 0-D vectors and Category 2 does not allow 0-D vectors, shouldn’t this statement be reversed?

An operation defined as Category 3 can be used as a Category 2 or Category 1 operation, but the other way around is not true.

As in, Category 3 is the strictest and hence these Ops are also valid Category 2 and 1 Ops? Or:

Bucket 1 = Category 1 + Category 2 + Category 3
Bucket 2 = Category 2 + Category 3
Bucket 3 = Category 3

Also, @Groverkss, thanks again for all your effort going into improving this

-Andrzej

Honest answer: I once thought so perhaps, but this is the kind of problem where experience matters, and at some point you just decide “everyone who went down that hole died – there must be some hazard that defies abstract analysis”. The closest analogy I could make is if someone came into the C type system and started making equivalences between int and long that bypass tradition. All sorts of things are leaned up against that which pop up in surprising ways. It’s not a perfect analogy but it’s the best I’ve got.

However, I do intuitively agree that especially as you proceed downward, it should be possible to not propagate this ambiguity. Right now, linalg is “the framework” in this context and it has the representation and tools to manage this. That’s why I said that I’m supportive of proceeding incrementally so long as the linalg vectorizer is in scope and sound at each step. That adds the design pressure to ensure that the complexity which results from the ambiguity aren’t merely moved but reduced.

There are lots of concrete thoughts on this thread I won’t weigh. I just don’t want to lose anyone else down this quest or have an outcome that just shifts the debt. So that’s why my request is to proceed from the bottom, incrementally, and with the design scope including where debt would shift to if getting it wrong. And extend a little trust that 0-D normalization at the tensor programming level is a sticky problem and has so far been immune to hand waving (and is just one type of redundancy that needs to be considered).

I also want to say that there are other voices like @sjarus who speak up privately with a similar experience but are not prepared to enter an abstract debate on this topic: there is an appeal to eliminating the redundancy, but the end result of tool chains that embraced/tamed it are much easier to use and reason about. This is because the programming model itself is based around this redundancy and redundantly shaped tensors arise as easy as breathing… They are not an anomaly, and it is the job of the compiler stack to deal with them, not just hope they can go away at the top somewhere.

sjarus December 11, 2024, 7:10pm 17

Well, since @stellaraccident pulled me into this… IMHO, the technical discussions here mirror what were discussed around TOSA.

There’s no question that as a purely technical rationale, eliminating redundancy around this ambiguity makes great sense. For a dialect with a specification, particularly so; the functional definition would be simpler. The test suites would be saner. It was an easy win from that perspective.

We even went the way of considering multiple categories of ops - some of which (e.g. reshape) support 0-d for the purposes of recharacterization, but others do not. In that regard, a lot of the debate here is eerily similar.

But… we can’t control the ingress effectively. That was the gating problem. Frameworks and model authors cannot really be expected to do this. We could not compel this, and TOSA unlike Vector sits framework-adjacent. As much as making this go away had great technical value, it was difficult to achieve in engineering deliverable terms - there is immediate software and hardware impact to us.

It’s potentially easier for Vector sitting so much further down in abstraction, to pursue this path. But my own suggestion would be to address the ingress semantics as crucial, if not breaking. The technical case in favor of lesser ambiguity within is the easier one to make. Managing ambiguity into the dialect - including from use cases that may be very unlike your own - is another matter.

kuhar December 11, 2024, 7:57pm 18

No, the original text seems to me consistent with how you typically see program semantics discussed. Category 1 supports more inputs than Category 2 and subsumes/implements/refines it. For example, we can say two’s complement representation subsumes/implements signed integers in C, even though C does not allow wraparound.
We are probably thinking about the same thing but coming at it from the opposite directions.

rengolin December 11, 2024, 8:35pm 19

Ack. I think that’s the consensus I see forming.

I agree, and think we can control this on the ingress dialects (hlo, torch, onnx) and their lowering to linalg, rather than on the frameworks themselves. It’s a matter of propagating the expected semantics in a way that we can represent in linalg.

We don’t need to be ambiguous just because the framework is ambiguous. If frameworks are ambiguous is “different ways”, then any assumption in the transforms (tiling, vectorization) become impossible to choose from.

We shall see. I agree there is an intellectual appeal to such normalization. But I think Suraj’s point was that in this exercise, at the vector level, the design constraint is that it must not lose the property that it handles ingress. And I would add that it must do so without just propagating complexity to the edges where it cannot be controlled (which, iiuc, was also one of Nicholas’ caution). I trust everyone here to do the right thing on that if the scope is set properly and the work is done incrementally.

And if, in a future world, you/someone also manages to tame the frontend(s), then that opens up more possibilities for simplification. But that is a separate discussion and one that would need to pull in a number of other folks who have the experience at that level but are not tuned in here.