`as_mut_vec_for_path_buf` in windows breaks UTF-8 `is_known_utf8` assumption · Issue #126291 · rust-lang/rust
pub struct Wtf8Buf {
    bytes: Vec<u8>,

    /// Do we know that `bytes` holds a valid UTF-8 encoding? We can easily
    /// know this if we're constructed from a `String` or `&str`.
    ///
    /// It is possible for `bytes` to have valid UTF-8 without this being
    /// set, such as when we're concatenating `&Wtf8`'s and surrogates become
    /// paired, as we don't bother to rescan the entire string.
    is_known_utf8: bool,
}

pub(crate) fn as_mut_vec_for_path_buf(&mut self) -> &mut Vec<u8> {
    &mut self.bytes
}
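To make the concern concrete, here is a minimal toy model of the invariant. It is a sketch, not the standard library's code: `ToyWtf8Buf`, `from_utf8_str`, `as_mut_vec`, and `flag_is_stale` are hypothetical names invented for illustration. It only shows that handing out `&mut Vec<u8>` lets a caller append non-UTF-8 bytes while the flag still says `true`.

```rust
// Toy model of the invariant. All names here are hypothetical and chosen
// for illustration; this is not the standard library's implementation.
struct ToyWtf8Buf {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

impl ToyWtf8Buf {
    /// Built from a `&str`, so the flag can safely start out `true`.
    fn from_utf8_str(s: &str) -> Self {
        ToyWtf8Buf { bytes: s.as_bytes().to_vec(), is_known_utf8: true }
    }

    /// Raw access to the bytes; nothing here resets `is_known_utf8`.
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        &mut self.bytes
    }

    /// True when the flag claims UTF-8 but the bytes are not valid UTF-8.
    fn flag_is_stale(&self) -> bool {
        self.is_known_utf8 && std::str::from_utf8(&self.bytes).is_err()
    }
}

fn main() {
    let mut buf = ToyWtf8Buf::from_utf8_str("utf8");
    assert!(!buf.flag_is_stale());

    // Append ".non", the bytes ED A0 80 (a lone surrogate in WTF-8), then "utf8".
    buf.as_mut_vec().extend_from_slice(b".non\xED\xA0\x80utf8");
    assert!(buf.flag_is_stale());
    println!("flag stale after raw append: {}", buf.flag_is_stale());
}
```

Any code that later trusts the flag instead of re-validating the bytes would then treat the buffer as UTF-8 when it is not.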
I tried this code:
use std::{ffi::OsString, os::windows::ffi::OsStringExt, path::PathBuf};

fn f() -> Result<String, OsString> {
    let mut utf8 = PathBuf::from(OsString::from("utf8".to_owned()));
    let non_utf8: OsString = OsStringExt::from_wide(&[0x6e, 0x6f, 0x6e, 0xd800, 0x75, 0x74, 0x66, 0x38]);
    utf8.set_extension(&non_utf8);
    utf8.into_os_string().into_string()
}

fn main() {
    dbg!(f());
}
I expected to see this happen:
[1.rs:11:5] f() = Err(
"utf8.non\xED\xA0\x80utf8",
)
Instead, this happened:
[1.rs:11:5] f() = Ok(
"utf8.non\u{d800}utf8",
)
(Obviously, `String`s can't contain `\u{d800}`.)
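For readers wondering where the `\xED\xA0\x80` in the expected output comes from: it is the three-byte form of the lone surrogate `0xD800`. The sketch below is platform-independent and uses a hypothetical helper, `encode_3byte`, written only for this illustration. It checks that the surrogate is not a `char` and that strict UTF-8 validation rejects its byte sequence.

```rust
/// Three-byte generalized UTF-8 encoding, valid for code points 0x0800..=0xFFFF.
/// WTF-8 uses this form for lone surrogates.
/// `encode_3byte` is a hypothetical helper for this sketch, not a std API.
fn encode_3byte(cp: u16) -> [u8; 3] {
    assert!(cp >= 0x0800, "shorter encodings are not handled in this sketch");
    [
        0xE0 | (cp >> 12) as u8,
        0x80 | ((cp >> 6) & 0x3F) as u8,
        0x80 | (cp & 0x3F) as u8,
    ]
}

fn main() {
    let bytes = encode_3byte(0xD800);
    assert_eq!(bytes, [0xED, 0xA0, 0x80]);

    // A surrogate is not a Unicode scalar value, so it cannot be a `char`...
    assert!(char::from_u32(0xD800).is_none());

    // ...and its three-byte form is rejected by strict UTF-8 validation.
    assert!(std::str::from_utf8(&bytes).is_err());

    println!("{:02X?}", bytes);
}
```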
Meta
`rustc --version --verbose`:
rustc 1.81.0-nightly (d0227c6a1 2024-06-11)
binary: rustc
commit-hash: d0227c6a19c2d6e8dceb87c7a2776dc2b10d2a04
commit-date: 2024-06-11
host: x86_64-pc-windows-gnu
release: 1.81.0-nightly
LLVM version: 18.1.7