ACP: Pattern methods for OsStr without OsStr patterns · Issue #311 · rust-lang/libs-team

Proposal

Problem statement

With rust-lang/rust#115443, developers such as those writing CLI parsers can now perform (limited) operations on OsStr. However, getting an OsStr back requires unsafe, so the developer must understand and follow some very specific safety notes that cannot be checked by Miri.

RFC #2295 exists to improve this, but it has stalled. The assumption here is that part of the problem with that RFC is how wide its scope is, and that by shrinking the scope we can get some of the benefits now.

Motivating examples or use cases

Mostly copied from #306

Argument parsers need to extract substrings from command line arguments. For example, --option=somefilename needs to be split into option and somefilename, and the original filename must be preserved without sanitizing it.

clap currently implements strip_prefix and split_once using transmute (equivalent to the stable encoded_bytes APIs).
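As a rough illustration of what this looks like today, here is a minimal sketch of a split on `=` built on the stable `as_encoded_bytes` and `from_encoded_bytes_unchecked` APIs. The helper name `split_once_eq` is hypothetical, and this is not clap's actual code; it only shows the kind of unsafe block and safety argument each caller currently has to write by hand.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `arg` at the first `=` and returns the parts
/// before and after it, or `None` if there is no `=`.
fn split_once_eq(arg: &OsStr) -> Option<(&OsStr, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    // `=` is ASCII, so a byte-wise search cannot match inside a multi-byte sequence.
    let idx = bytes.iter().position(|&b| b == b'=')?;
    let (before, after) = (&bytes[..idx], &bytes[idx + 1..]);
    // SAFETY: both slices come from `as_encoded_bytes` on the same `OsStr`
    // and are cut immediately before and after `=`, which is a valid,
    // non-empty UTF-8 substring.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(before),
            OsStr::from_encoded_bytes_unchecked(after),
        ))
    }
}
```

Calling `split_once_eq(OsStr::new("--option=somefilename"))` yields `--option` and `somefilename` with the filename bytes untouched. With the methods proposed here, the same result would come from a safe `split_once('=')` call.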

The os_str_bytes and osstrtools crates provide high-level string operations for OS strings. In the wild, os_str_bytes is mainly used to convert between raw bytes and OS strings (e.g. 1, 2, 3). osstrtools enables reasonable uses of split() to parse $PATH and replace() to fill in command line templates.
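For comparison, a `split()` of this kind can be sketched on top of the stable encoded-bytes APIs. The helper `split_ascii` below is hypothetical and is not how osstrtools is implemented; it assumes the separator is a single ASCII byte. For real `$PATH` parsing, `std::env::split_paths` already exists and handles platform rules.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `s` on every occurrence of the ASCII byte `sep`.
/// Adjacent separators produce empty pieces, as with `str::split`.
fn split_ascii(s: &OsStr, sep: u8) -> Vec<&OsStr> {
    assert!(sep.is_ascii(), "separator must be an ASCII byte");
    s.as_encoded_bytes()
        .split(|&b| b == sep)
        // SAFETY: every piece comes from `as_encoded_bytes` and is bounded by
        // the start or end of the string or by an ASCII byte, which is a
        // valid, non-empty UTF-8 substring.
        .map(|piece| unsafe { OsStr::from_encoded_bytes_unchecked(piece) })
        .collect()
}
```

The proposed `OsStr::split(':')` would make the unsafe block and the ASCII restriction unnecessary.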

Solution sketch

Provide str's Pattern-accepting methods on &OsStr.

Defer using OsStr as a Pattern and OsStr indexing support, both of which are specified in RFC #2295.

Example of methods to be added:

impl OsStr {
pub fn contains<'a, P>(&'a self, pat: P) -> bool
where
    P: Pattern<&'a Self>;

pub fn starts_with<'a, P>(&'a self, pat: P) -> bool
where
    P: Pattern<&'a Self>;

pub fn ends_with<'a, P>(&'a self, pat: P) -> bool
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn find<'a, P>(&'a self, pat: P) -> Option<usize>
where
    P: Pattern<&'a Self>;

pub fn rfind<'a, P>(&'a self, pat: P) -> Option<usize>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

// (Note: these should return a concrete iterator type instead of `impl Trait`.
//  For ease of explanation the concrete type is not listed here.)
pub fn split<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>;

pub fn split_inclusive<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>;

pub fn rsplit<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn split_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>;

pub fn rsplit_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn splitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>;

pub fn rsplitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
where
    P: Pattern<&'a Self>;

pub fn rsplit_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn matches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>;

pub fn rmatches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn match_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
where
    P: Pattern<&'a Self>;

pub fn rmatch_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn trim_matches<'a, P>(&'a self, pat: P) -> &'a Self
where
    P: Pattern<&'a Self>,
    P::Searcher: DoubleEndedSearcher<&'a Self>;

pub fn trim_start_matches<'a, P>(&'a self, pat: P) -> &'a Self
where
    P: Pattern<&'a Self>;

pub fn strip_prefix<'a, P>(&'a self, prefix: P) -> Option<&'a Self>
where
    P: Pattern<&'a Self>;

pub fn strip_suffix<'a, P>(&'a self, suffix: P) -> Option<&'a Self>
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn trim_end_matches<'a, P>(&'a self, pat: P) -> &'a Self
where
    P: Pattern<&'a Self>,
    P::Searcher: ReverseSearcher<&'a Self>;

pub fn replace<'a, P>(&'a self, from: P, to: &'a Self) -> Self::Owned
where
    P: Pattern<&'a Self>;

pub fn replacen<'a, P>(&'a self, from: P, to: &'a Self, count: usize) -> Self::Owned
where
    P: Pattern<&'a Self>;

}

impl Pattern<&OsStr> for char {}
impl Pattern<&OsStr> for &str {}
impl Pattern<&OsStr> for &String {}
impl Pattern<&OsStr> for &[char] {}
impl Pattern<&OsStr> for &&str {}
impl<const N: usize> Pattern<&OsStr> for &[char; N] {}
impl<F: FnMut(char) -> bool> Pattern<&OsStr> for F {}
impl<const N: usize> Pattern<&OsStr> for [char; N] {}
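To give a feel for how a few of these methods would behave with a `&str` pattern, here is a minimal sketch that emulates them on stable Rust through an extension trait. The trait `OsStrPatternExt` and its `_str` method names are hypothetical and are not part of std or of this proposal. The real methods would be generic over `Pattern` and would need no unsafe at the call site or in user code.

```rust
use std::ffi::OsStr;

/// Hypothetical extension trait emulating a few of the proposed methods,
/// restricted to `&str` patterns.
trait OsStrPatternExt {
    fn starts_with_str(&self, prefix: &str) -> bool;
    fn strip_prefix_str(&self, prefix: &str) -> Option<&OsStr>;
    fn contains_str(&self, needle: &str) -> bool;
}

impl OsStrPatternExt for OsStr {
    fn starts_with_str(&self, prefix: &str) -> bool {
        self.as_encoded_bytes().starts_with(prefix.as_bytes())
    }

    fn strip_prefix_str(&self, prefix: &str) -> Option<&OsStr> {
        let rest = self.as_encoded_bytes().strip_prefix(prefix.as_bytes())?;
        // SAFETY: `rest` comes from `as_encoded_bytes` and begins right after
        // `prefix`, a valid UTF-8 substring (or at the start of the string
        // when `prefix` is empty).
        Some(unsafe { OsStr::from_encoded_bytes_unchecked(rest) })
    }

    fn contains_str(&self, needle: &str) -> bool {
        let (hay, needle) = (self.as_encoded_bytes(), needle.as_bytes());
        // `windows(0)` would panic, so treat the empty needle as always found.
        needle.is_empty() || hay.windows(needle.len()).any(|w| w == needle)
    }
}
```

With this trait in scope, `OsStr::new("--option=somefilename").strip_prefix_str("--option=")` returns `Some("somefilename")` as an `&OsStr`, which is the behavior the proposed `strip_prefix` would provide directly.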

This should work because

From an API design perspective, there is strong precedent for it

Alternatives

#306 proposes an OsStr::slice_encoded_bytes

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution: