Some unicode APIs like PyUnicode_Contains get a short path comparing kinds. But this get a problem cannot apply to ascii and latin1. PyUnicode_MAX_CHAR_VALUE could be used instead to make the short path also apply to ascii and latin1. This skill is already used in PyUnicode_Replace.
PyUnicode_KIND() just extracts three bits from the state word. PyUnicode_MAX_CHAR_VALUE() extracts bits multiple times and does few conditional branching. I think it is much slower that PyUnicode_KIND(). In common case you search ASCII needle or the needle of the same kind as a string, therefore checking for fast path just adds the overhead. It is appropriate while the overhead is tiny. Optimize common cases, not rare and obscure cases.