as_mut_vec_for_path_buf in windows breaks UTF-8 is_known_utf8 assumption · Issue #126291 · rust-lang/rust

pub struct Wtf8Buf {
    bytes: Vec<u8>,
    /// Do we know that `bytes` holds a valid UTF-8 encoding? We can easily
    /// know this if we're constructed from a `String` or `&str`.
    ///
    /// It is possible for `bytes` to have valid UTF-8 without this being
    /// set, such as when we're concatenating `&Wtf8`'s and surrogates become
    /// paired, as we don't bother to rescan the entire string.
    is_known_utf8: bool,
}

impl Wtf8Buf {
    // Hands out the raw bytes without touching `is_known_utf8`.
    pub(crate) fn as_mut_vec_for_path_buf(&mut self) -> &mut Vec<u8> {
        &mut self.bytes
    }
}
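
To make the problem easier to see off Windows, here is a minimal, portable sketch of the same pattern. It is a simplified model, not the std implementation: `ModelBuf`, `from_utf8_str`, `as_mut_vec`, `as_mut_vec_resetting`, `claims_utf8` and `stale_flag_demo` are all hypothetical names, and the sketch assumes the string conversion trusts the flag, which is what the output reported below suggests. Clearing the flag in the accessor is one possible mitigation, not necessarily what std should do.

```rust
/// Simplified stand-in for `Wtf8Buf` (hypothetical, not the real std type).
struct ModelBuf {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

impl ModelBuf {
    /// Built from a `&str`, so the flag starts out as `true`.
    fn from_utf8_str(s: &str) -> Self {
        ModelBuf { bytes: s.as_bytes().to_vec(), is_known_utf8: true }
    }

    /// Same shape as the accessor above: exposes the bytes, leaves the flag alone.
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        &mut self.bytes
    }

    /// Assumed mitigation: forget the UTF-8 knowledge before exposing the bytes.
    fn as_mut_vec_resetting(&mut self) -> &mut Vec<u8> {
        self.is_known_utf8 = false;
        &mut self.bytes
    }

    /// Trusts the flag first and only validates the bytes when it is unset.
    fn claims_utf8(&self) -> bool {
        self.is_known_utf8 || std::str::from_utf8(&self.bytes).is_ok()
    }
}

/// Appends a lone surrogate through both accessors and reports what each buffer claims.
fn stale_flag_demo() -> (bool, bool) {
    // WTF-8 encoding of the lone surrogate U+D800; not valid UTF-8.
    let surrogate = [0xED, 0xA0, 0x80];

    let mut buggy = ModelBuf::from_utf8_str("utf8.non");
    buggy.as_mut_vec().extend_from_slice(&surrogate);

    let mut fixed = ModelBuf::from_utf8_str("utf8.non");
    fixed.as_mut_vec_resetting().extend_from_slice(&surrogate);

    (buggy.claims_utf8(), fixed.claims_utf8())
}
```

In this model `stale_flag_demo()` returns `(true, false)`: the first buffer still claims to be UTF-8 because its flag was never cleared, while the second falls back to validation and rejects the bytes.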

I tried this code:

use std::{ffi::OsString, os::windows::ffi::OsStringExt, path::PathBuf};

fn f() -> Result<String, OsString> {
    let mut utf8 = PathBuf::from(OsString::from("utf8".to_owned()));
    let non_utf8: OsString = OsStringExt::from_wide(&[0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38]);
    utf8.set_extension(&non_utf8);
    utf8.into_os_string().into_string()
}

fn main() {
    dbg!(f());
}

I expected to see this happen:

[1.rs:11:5] f() = Err(
    "utf8.non\xED\xA0\x80utf8",
)

Instead, this happened:

[1.rs:11:5] f() = Ok(
    "utf8.non\u{d800}utf8",
)

(Obviously, Strings can't contain \u{d800}.)
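
For anyone without a Windows machine, the following portable sketch checks the same facts using only stable std APIs. The helper name `surrogate_checks` is made up for illustration. It uses the code units from the extension above and the bytes from the expected `Err` output.

```rust
/// Returns three flags: whether U+D800 is a `char`, whether the expected
/// byte string is valid UTF-8, and whether the wide string is valid UTF-16.
fn surrogate_checks() -> (bool, bool, bool) {
    // A lone surrogate is not a Unicode scalar value, so it is not a `char`.
    let is_char = char::from_u32(0xD800).is_some();

    // Bytes from the expected `Err` output: ED A0 80 is the WTF-8 form of U+D800.
    let is_utf8 = std::str::from_utf8(b"utf8.non\xED\xA0\x80utf8").is_ok();

    // The same code units that were passed to `from_wide` in the reproduction.
    let units: [u16; 8] = [0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38];
    let is_utf16 = String::from_utf16(&units).is_ok();

    (is_char, is_utf8, is_utf16)
}
```

All three flags come back `false`, so any `Ok(String)` holding this data breaks the `String` invariant.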

Meta

rustc --version --verbose:

rustc 1.81.0-nightly (d0227c6a1 2024-06-11)
binary: rustc
commit-hash: d0227c6a19c2d6e8dceb87c7a2776dc2b10d2a04
commit-date: 2024-06-11
host: x86_64-pc-windows-gnu
release: 1.81.0-nightly
LLVM version: 18.1.7