Optimize Wtf8Buf::into_string for the case where it contains UTF-8. by sunfishcode · Pull Request #96869 · rust-lang/rust

Add an is_known_utf8 flag to Wtf8Buf, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a Wtf8Buf is constructed from a String
or &str, or with Wtf8Buf::from_wide which is already doing UTF-16
decoding and already checking for surrogates.
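To illustrate the idea, here is a minimal sketch of how such a flag could be maintained. It is not the code in this patch: MiniWtf8Buf and its methods are hypothetical stand-ins for the internal Wtf8Buf type, and the fallback path simply reuses String::from_utf8 as the surrogate scan, which works because a WTF-8 encoded surrogate is never valid UTF-8.

```rust
/// Hypothetical, simplified stand-in for std's internal Wtf8Buf.
pub struct MiniWtf8Buf {
    bytes: Vec<u8>,
    /// True only when `bytes` is known to be valid UTF-8 (no surrogates).
    /// False means "unknown", not "definitely invalid".
    is_known_utf8: bool,
}

impl MiniWtf8Buf {
    /// A String is always valid UTF-8, so the flag is set for free.
    pub fn from_string(s: String) -> Self {
        MiniWtf8Buf { bytes: s.into_bytes(), is_known_utf8: true }
    }

    /// Decoding UTF-16 already visits every code unit, so unpaired
    /// surrogates are detected here at no extra cost.
    pub fn from_wide(wide: &[u16]) -> Self {
        let mut buf = MiniWtf8Buf {
            bytes: Vec::with_capacity(wide.len()),
            is_known_utf8: true,
        };
        for item in char::decode_utf16(wide.iter().copied()) {
            match item {
                Ok(c) => {
                    let mut tmp = [0u8; 4];
                    buf.bytes.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
                }
                Err(e) => {
                    // Unpaired surrogate: store it as a three-byte WTF-8
                    // sequence and record that the buffer is not UTF-8.
                    let s = e.unpaired_surrogate();
                    buf.bytes.push(0xE0 | (s >> 12) as u8);
                    buf.bytes.push(0x80 | ((s >> 6) & 0x3F) as u8);
                    buf.bytes.push(0x80 | (s & 0x3F) as u8);
                    buf.is_known_utf8 = false;
                }
            }
        }
        buf
    }

    /// O(1) when the flag is set; otherwise falls back to an O(N) check.
    pub fn into_string(self) -> Result<String, MiniWtf8Buf> {
        if self.is_known_utf8 {
            // SAFETY: the flag is only set while `bytes` is valid UTF-8.
            return Ok(unsafe { String::from_utf8_unchecked(self.bytes) });
        }
        match String::from_utf8(self.bytes) {
            Ok(s) => Ok(s),
            Err(e) => Err(MiniWtf8Buf { bytes: e.into_bytes(), is_known_utf8: false }),
        }
    }
}
```

In this sketch the flag is conservative: any operation that cannot cheaply prove the contents are UTF-8 would clear it, and into_string would then fall back to the full scan.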

This makes OsString::into_string O(1) rather than O(N) on Windows in
common cases.

It also eliminates the need to scan through the string for surrogates in
Args::next and Vars::next, because the strings are already being
translated with Wtf8Buf::from_wide.

Many things on Windows construct OsStrings with Wtf8Buf::from_wide,
such as DirEntry::file_name and fs::read_link, so with this patch,
users of those functions can subsequently call .into_string() without
paying for an extra scan through the string for surrogates.
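As a rough illustration of the kind of caller that benefits, the following sketch uses only public std APIs. The helper name utf8_file_names is hypothetical and not part of this patch; it just shows the common pattern of calling .into_string() on names returned by DirEntry::file_name.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: collect the file names in `dir` that are valid Unicode.
pub fn utf8_file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        // On Windows, this OsString is built from UTF-16 data.
        let name: OsString = entry?.file_name();
        // into_string returns the original OsString on failure.
        match name.into_string() {
            Ok(s) => names.push(s),
            Err(original) => eprintln!("skipping non-Unicode name: {:?}", original),
        }
    }
    Ok(names)
}
```

The behavior of this code is unchanged by the patch; only the cost of the into_string call on Windows is expected to differ.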

r? @ghost