@ggerganov It's because you're using fp16 with a (B, S, C) data format on the ANE. You want to map the sequence dimension to the last axis, because the last axis of an ANE buffer is not packed. With fp16 and the embedding dim last, the compiler pads that last axis to 64 bytes, i.e. 32 fp16 values, which can mean up to a 32x memory cost in fp16 (the worst case, when the last axis has length 1).
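To make the arithmetic concrete, here is a rough sketch of the padding cost. It assumes only what is stated above: the last axis is padded up to a multiple of 64 bytes and all other axes are packed. The helper names and the example shapes are hypothetical, and this is not a model of what the compiler actually does internally.

```python
import math

ALIGN_BYTES = 64  # last-axis alignment, as described above
FP16_BYTES = 2    # size of one fp16 value


def padded_bytes(shape, elem_bytes=FP16_BYTES, align=ALIGN_BYTES):
    """Buffer size if only the last axis is padded to a multiple of `align` bytes."""
    *lead, last = shape
    row_bytes = math.ceil(last * elem_bytes / align) * align
    return math.prod(lead) * row_bytes


def overhead(shape, elem_bytes=FP16_BYTES, align=ALIGN_BYTES):
    """Ratio of the padded size to the unpadded size."""
    raw_bytes = math.prod(shape) * elem_bytes
    return padded_bytes(shape, elem_bytes, align) / raw_bytes


# Hypothetical shapes, for illustration only.
print(overhead((1, 384, 1500, 1)))  # singleton last axis: 32.0
print(overhead((1, 384, 1, 1500)))  # sequence on the last axis: close to 1
```

Under these assumptions the overhead shrinks as the last axis gets longer, and it disappears whenever its length in bytes is a multiple of 64, which is the reason for putting the sequence dimension last.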