Add matrix-style preprocessing to lit to reuse tests across backends

Hey everyone,

I’ve been updating a handful of codegen tests across backends and am wondering if the cross-platform testing story could be more robust. The biggest problem is that many tests for basic functionality exist only for x86 and/or aarch64, and getting them running on other backends means a whole lot of copying and pasting.

My thought is that there could be a “source of truth” test file that lit preprocesses into a number of derived test files based on directives. Something like the following:

; located at llvm/test/CodeGen/Generic/float-artihmetic.ll

; MATRIX-RUN: llc %s -o- -mtriple=aarch64-linux  | FileCheck --check-prefixes=ALL,LINUX
; MATRIX-RUN: llc %s -o- -mtriple=aarch64-darwin | FileCheck --check-prefixes=ALL,DARWIN
;            ↓ an identifier can be specified for special casing
; MATRIX-RUN-X86: llc %s -o- -mtriple=x86_64-linux    | FileCheck --check-prefixes=ALL,LINUX
; MATRIX-RUN-X86: llc %s -o- -mtriple=x86_64-windows  | FileCheck --check-prefixes=ALL,WIN
; MATRIX-RUN-PPC: llc %s -o- -mtriple=powerpc64-linux | FileCheck --check-prefixes=ALL,LINUX
;
; MATRIX-GEN-BF16:     sed 's/fTy/bfloat/g' %s
; MATRIX-GEN-F16:      sed 's/fTy/half/g' %s
; MATRIX-GEN-F32:      sed 's/fTy/float/g' %s
; MATRIX-GEN-F64:      sed 's/fTy/double/g' %s
; MATRIX-GEN-F128:     sed 's/fTy/fp128/g' %s
; MATRIX-GEN-PPC_F128: sed 's/fTy/ppc_fp128/g' %s
; MATRIX-GEN-X86_F80:  sed 's/fTy/x86_fp80/g' %s
; ↑ MATRIX-GEN is optional if only a single test is needed
; 
; By default, don't test ppc or x87 floats
; MATRIX-EXCLUDE: GEN-PPC_F128
; MATRIX-EXCLUDE: GEN-X86_F80
; ... but do on the relevant platforms
; MATRIX-INCLUDE: RUN-X86 + GEN-X86_F80
; MATRIX-INCLUDE: RUN-PPC + GEN-PPC_F128

define fTy @fadd(fTy %a, fTy %b) {
  %res = fadd fTy %a, %b
  ret fTy %res
}

; fsub, fmul, fdiv ...

This is loosely based on CI testing matrices. The directives are:

  1. MATRIX-GEN-IDENT indicates a preprocessing step to do. Within a backend-specific directory, each IDENT turns into a separate .ident file.
  2. MATRIX-RUN or MATRIX-RUN-IDENT turns into a RUN command (IDENT is only used for include/exclude)
  3. MATRIX-EXCLUDE and MATRIX-INCLUDE allow removing or adding specific combinations to the list

A new command, lit generate Generic/float-artihmetic.ll, would generate the following files for the above example:

# Assuming lit can match triple/target args to select
# the correct directory
llvm/test/CodeGen/AArch64/xgen/float-artihmetic.f16.ll
llvm/test/CodeGen/AArch64/xgen/float-artihmetic.f32.ll
llvm/test/CodeGen/AArch64/xgen/float-artihmetic.f64.ll
llvm/test/CodeGen/AArch64/xgen/float-artihmetic.f128.ll
llvm/test/CodeGen/PowerPC/xgen/float-artihmetic.f16.ll
# ...
llvm/test/CodeGen/X86/xgen/float-artihmetic.f16.ll
llvm/test/CodeGen/X86/xgen/float-artihmetic.f32.ll
llvm/test/CodeGen/X86/xgen/float-artihmetic.f64.ll
llvm/test/CodeGen/X86/xgen/float-artihmetic.f128.ll
llvm/test/CodeGen/X86/xgen/float-artihmetic.x86_f80.ll

Each similar to:

; NOTE: Test file autogenerated by llvm-lit; do not edit non-comment lines
; Source: llvm/test/CodeGen/Generic/float-artihmetic.ll.

; RUN: llc %s -o- -mtriple=x86_64-linux   | FileCheck --check-prefixes=ALL,LINUX
; RUN: llc %s -o- -mtriple=x86_64-windows | FileCheck --check-prefixes=ALL,WIN

define float @fadd(float %a, float %b) {
  %res = fadd float %a, %b
  ret float %res
}

; fsub, fmul, fdiv ...

At this point, the file can get either handwritten or autogenerated FileCheck annotations.

Keeping the files in sync is enforced in two places:

  1. When lit is run on the source at llvm/test/CodeGen/Generic/float-artihmetic.ll, it runs the preprocessing steps. The output is checked against the xgen/ files (ignoring non-RUN comments) to make sure they match.
  2. When lit is run on an xgen/ file, it asserts that there is a matching file in Generic/.

A few advantages:

  1. Enabling a test for a backend is trivial: add MATRIX-RUN and update the generated file
  2. Similar classes of behavior are sure to get the same test cases, with nothing missed through copy+paste errors
  3. The matrix file without FileCheck comments gives a nice overview of the test cases. Some vector tests have thousands of lines to scroll through and it’s easy to lose track of the actual input IR.
  4. The exact test input after preprocessing is checked in. This is a concern @arsenm brought up about using sed in tests currently.

I have to disclose that, unfortunately, I would not be able to work on this; it just seemed like an idea worth bringing up.

Anyway, does anybody have any thoughts?