My experience with mlir-reduce
I wrote an mlir-reduce tool for a downstream dialect. Using the reduction-tree pass, almost no reduction happens out of the box, so I wrote some RewritePatterns and registered them with my dialect:
- Replace operands with earlier-defined values of the same type
- Remove operations with no uses
- Remove optional attributes
On small programs, these do an okay job of reducing. But I was really surprised that this doesn’t happen out of the box: these patterns all seem generic enough to be shared across all dialects, and I didn’t see any “shared patterns” checked into the repo. Did I miss something?
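For concreteness, the “remove operations with no uses” reduction can be written against MLIR’s generic `RewritePattern` API roughly as follows (a sketch, not compiled; `EraseUnusedOp` is my own name, and you would register it wherever your dialect exposes its reduction patterns):

```cpp
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Erase any non-terminator op whose results are all unused. Matching on
// any op type (MatchAnyOpTypeTag) keeps the pattern dialect-independent.
struct EraseUnusedOp : public RewritePattern {
  EraseUnusedOp(MLIRContext *ctx)
      : RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, ctx) {}

  LogicalResult matchAndRewrite(Operation *op,
                                PatternRewriter &rewriter) const override {
    if (!op->use_empty() || op->hasTrait<OpTrait::IsTerminator>())
      return failure();
    rewriter.eraseOp(op);
    return success();
  }
};
```

Note this intentionally runs on possibly-invalid IR, unlike the canonicalizer, which assumes verified input.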
Next, I encountered some real-world-sized programs and noticed that reduction is unbearably slow, to the point that I could reduce the program faster by hand.
I thought it would be a good idea to write two new patterns:
- Keep only the first N/2 operations of the module (plus some extra work to add a new terminator)
- Keep only the second N/2 operations (plus some extra work to make sure all values used by the second half were live-in)
I noticed that I kept running into the `Reduced module is not interesting` message in `ReductionTreePass`’s `findOptimal` function. I’m not sure why this is happening, since the three patterns I mentioned at the start are still registered, and those are capable of reducing the module at least once.
I’m wondering if anyone has found an approach to “bulk reduce”, similar to the binary search approach I describe here? Alternatively, is there something that I’m missing in order to get the binary search approach working?
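The halving idea above can be sketched dialect-independently as a classic delta-debugging loop. This is plain C++ over a list of abstract “operations” with an interestingness oracle, not MLIR code; a real MLIR version would additionally have to repair terminators and live-in values as described above, and `reduceOps` is a name I made up:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Generic delta-debugging reduction over a list of "operations".
// `interesting` is the oracle: does the candidate still reproduce the
// bug? Tries each half first, so large uninteresting chunks are
// discarded in few oracle calls; falls back to dropping one element
// at a time when neither half alone is interesting.
std::vector<int>
reduceOps(std::vector<int> ops,
          const std::function<bool(const std::vector<int> &)> &interesting) {
  bool changed = true;
  while (changed && ops.size() > 1) {
    changed = false;
    size_t half = ops.size() / 2;
    std::vector<int> first(ops.begin(), ops.begin() + half);
    std::vector<int> second(ops.begin() + half, ops.end());
    if (interesting(first)) { ops = first; changed = true; continue; }
    if (interesting(second)) { ops = second; changed = true; continue; }
    // Neither half alone reproduces the bug: try dropping one element.
    for (size_t i = 0; i < ops.size(); ++i) {
      std::vector<int> candidate = ops;
      candidate.erase(candidate.begin() + i);
      if (interesting(candidate)) { ops = candidate; changed = true; break; }
    }
  }
  return ops;
}
```

The key property is that the oracle is rerun on every candidate, so the loop never commits to an uninteresting module, which is exactly the invariant `findOptimal` enforces.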
I think it is an easy addition (well, for the no-uses case you could register the canonicalizer as one of the patterns and it would do that for you, although you probably want this to work on invalid IR too). Unfortunately it just hasn’t been added: some local patterns were written downstream for one domain, and since they did a good enough job, these simple generic ones went unnoticed. None of them were upstreamed, as they were too specific and also a little half-baked.
I think another simple one was: just mark some of the functions extern, so you can drop their entire bodies (no need for constants, etc.) while often still reproducing the error.
I actually thought one of the iteration settings did some of these … But I may be conflating it with the other delta-debugging one (which could actually be executed as one of the tactics here, as it is effectively a pass).
I wonder: should we perhaps just have a repro somewhere with a test instance that is interesting and then try a few patterns there?