ACP: Pattern methods for OsStr without OsStr patterns · Issue #311 · rust-lang/libs-team
Proposal
Problem statement
With rust-lang/rust#115443, developers (for example, those writing CLI parsers) can now perform limited operations on OsStr. However, getting an OsStr back requires unsafe, so the developer must understand and follow some very specific safety notes that Miri cannot check.
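The asymmetry described above can be sketched as follows, assuming Rust 1.74+ where OsStr::as_encoded_bytes and OsStr::from_encoded_bytes_unchecked are stable. The helper name strip_prefix_os is hypothetical. Inspecting the encoded bytes is entirely safe, but turning the remaining slice back into an OsStr needs an unsafe call whose justification is a prose argument, not something the compiler or Miri verifies.

```rust
use std::ffi::OsStr;

/// Hypothetical helper (not a std API): strips a UTF-8 `prefix` from `arg`
/// and returns the remainder as an `OsStr`.
fn strip_prefix_os<'a>(arg: &'a OsStr, prefix: &str) -> Option<&'a OsStr> {
    // Reading and comparing the encoded bytes is safe...
    let rest = arg.as_encoded_bytes().strip_prefix(prefix.as_bytes())?;
    // ...but converting a byte slice back into an `OsStr` is not.
    // SAFETY: `rest` starts immediately after `prefix`, a valid UTF-8
    // substring of `arg` (an empty prefix leaves `arg` unchanged), and the
    // end of `arg` is untouched.
    Some(unsafe { OsStr::from_encoded_bytes_unchecked(rest) })
}
```

Every such helper repeats the same SAFETY reasoning, which is the burden this proposal aims to move into the standard library.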
RFC #2295 exists to improve this, but it has stalled. The assumption here is that part of the problem with that RFC is how wide its scope is, and that by shrinking the scope we can get some of the benefits now.
Motivating examples or use cases
Mostly copied from #306
Argument parsers need to extract substrings from command line arguments. For example, --option=somefilename needs to be split into option and somefilename, and the original filename must be preserved without sanitizing it.
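A minimal sketch of that split with today's stable APIs might look like this. The function parse_long_opt is hypothetical, and it assumes option names must be UTF-8 while values are kept as raw OsStr so a non-UTF-8 filename survives untouched.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `--option=somefilename` into the option name
/// (required to be UTF-8) and the raw, unsanitized value.
fn parse_long_opt(arg: &OsStr) -> Option<(&str, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    let body = bytes.strip_prefix(b"--")?;
    let eq = body.iter().position(|&b| b == b'=')?;
    // SAFETY: every cut sits immediately after `--`, or immediately
    // before/after `=`; both are valid non-empty UTF-8 (ASCII) substrings.
    let (name, value) = unsafe {
        (
            OsStr::from_encoded_bytes_unchecked(&body[..eq]),
            OsStr::from_encoded_bytes_unchecked(&body[eq + 1..]),
        )
    };
    // The name must be UTF-8; the value is passed through as-is.
    Some((name.to_str()?, value))
}
```

Under this proposal the same logic could be written with strip_prefix("--") and split_once('=') on &OsStr, with no unsafe block.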
clap currently implements strip_prefix and split_once using transmute (equivalent to the stable encoded_bytes APIs).
The os_str_bytes and osstrtools crates provide high-level string operations for OS strings. In the wild, os_str_bytes is mainly used to convert between raw bytes and OS strings (e.g. 1, 2, 3). osstrtools enables reasonable uses of split() to parse $PATH and replace() to fill in command line templates.
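As an illustration of the template use case, here is a sketch of a replace() over OS strings built only on stable std APIs. The name replace_os is hypothetical and this is not how osstrtools implements it; it assumes the placeholder is a non-empty &str and replaces non-overlapping matches from left to right, as str::replace does.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: replaces every non-overlapping occurrence of the
/// UTF-8 placeholder `from` in `template` with `to`.
fn replace_os(template: &OsStr, from: &str, to: &OsStr) -> OsString {
    assert!(!from.is_empty(), "placeholder must be non-empty");
    let hay = template.as_encoded_bytes();
    let needle = from.as_bytes();
    let mut out = OsString::with_capacity(hay.len());
    let mut start = 0; // start of the not-yet-copied segment
    let mut i = 0;
    while i + needle.len() <= hay.len() {
        if &hay[i..i + needle.len()] == needle {
            // SAFETY: `start` is 0 or just after a match of `from`, and `i`
            // is just before one; `from` is valid non-empty UTF-8.
            out.push(unsafe { OsStr::from_encoded_bytes_unchecked(&hay[start..i]) });
            out.push(to);
            i += needle.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // SAFETY: `start` is 0 or just after a match; the slice ends at the end.
    out.push(unsafe { OsStr::from_encoded_bytes_unchecked(&hay[start..]) });
    out
}
```

For example, filling "cat {} > {}.bak" with notes.txt yields "cat notes.txt > notes.txt.bak" while keeping the substituted value as an OsStr end to end.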
Solution sketch
Provide str's Pattern-accepting methods on &OsStr.
Defer support for using OsStr as a Pattern, and for OsStr indexing, both of which are specified in RFC #2295.
Example of methods to be added:
```rust
impl OsStr {
    pub fn contains<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>;
    pub fn starts_with<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>;
    pub fn ends_with<'a, P>(&'a self, pat: P) -> bool
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn find<'a, P>(&'a self, pat: P) -> Option<usize>
    where
        P: Pattern<&'a Self>;
    pub fn rfind<'a, P>(&'a self, pat: P) -> Option<usize>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    // (Note: these should return a concrete iterator type instead of `impl Trait`.
    // For ease of explanation the concrete type is not listed here.)
    pub fn split<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn split_inclusive<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn rsplit<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn split_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn rsplit_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn splitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn rsplitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
    where
        P: Pattern<&'a Self>;
    pub fn rsplit_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn matches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn rmatches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn match_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
    where
        P: Pattern<&'a Self>;
    pub fn rmatch_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn trim_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>,
        P::Searcher: DoubleEndedSearcher<&'a Self>;
    pub fn trim_start_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>;
    pub fn strip_prefix<'a, P>(&'a self, prefix: P) -> Option<&'a Self>
    where
        P: Pattern<&'a Self>;
    pub fn strip_suffix<'a, P>(&'a self, suffix: P) -> Option<&'a Self>
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;
    pub fn trim_end_matches<'a, P>(&'a self, pat: P) -> &'a Self
    where
        P: Pattern<&'a Self>,
        P::Searcher: ReverseSearcher<&'a Self>;

    pub fn replace<'a, P>(&'a self, from: P, to: &'a Self) -> OsString
    where
        P: Pattern<&'a Self>;
    pub fn replacen<'a, P>(&'a self, from: P, to: &'a Self, count: usize) -> OsString
    where
        P: Pattern<&'a Self>;
}

impl Pattern<&OsStr> for char {}
impl Pattern<&OsStr> for &str {}
impl Pattern<&OsStr> for &String {}
impl Pattern<&OsStr> for &[char] {}
impl Pattern<&OsStr> for &&str {}
impl<const N: usize> Pattern<&OsStr> for &[char; N] {}
impl<F: FnMut(char) -> bool> Pattern<&OsStr> for F {}
impl<const N: usize> Pattern<&OsStr> for [char; N] {}
```
- This is meant to match str. If anything changes between the writing of this ACP and its implementation, the focus should be on what str has at the time of implementation (e.g. adding the current variant of a method rather than a deprecated one).
- We likely want to add trim, trim_start, and trim_end to be consistent with trim_start_matches/trim_end_matches.
- For more details, see Add pattern matching API to OsStr rust#109350.
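To make the trim_start_matches/trim_end_matches entries concrete, here is a sketch restricted to &str patterns on stable Rust. The names trim_start_matches_os and trim_end_matches_os are hypothetical. Only these two are sketched because str::trim_matches requires a DoubleEndedSearcher, which &str patterns do not provide.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: strips every leading occurrence of `pat`,
/// in the spirit of str::trim_start_matches with a &str pattern.
fn trim_start_matches_os<'a>(s: &'a OsStr, pat: &str) -> &'a OsStr {
    let needle = pat.as_bytes();
    if needle.is_empty() {
        return s; // an empty pattern would otherwise loop forever
    }
    let mut bytes = s.as_encoded_bytes();
    while let Some(rest) = bytes.strip_prefix(needle) {
        bytes = rest;
    }
    // SAFETY: the cut sits immediately after an occurrence of `pat`, a valid
    // non-empty UTF-8 substring (or at the start, if nothing was stripped).
    unsafe { OsStr::from_encoded_bytes_unchecked(bytes) }
}

/// Hypothetical helper: strips every trailing occurrence of `pat`.
fn trim_end_matches_os<'a>(s: &'a OsStr, pat: &str) -> &'a OsStr {
    let needle = pat.as_bytes();
    if needle.is_empty() {
        return s;
    }
    let mut bytes = s.as_encoded_bytes();
    while let Some(rest) = bytes.strip_suffix(needle) {
        bytes = rest;
    }
    // SAFETY: the cut sits immediately before an occurrence of `pat`
    // (or at the end, if nothing was stripped).
    unsafe { OsStr::from_encoded_bytes_unchecked(bytes) }
}
```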
This should work because:
- Allow limited access to OsStr bytes rust#109698 already established that operations on UTF-8 / 7-bit ASCII boundaries are safe
- It was decided to seal Pattern and, for now, Pattern is nightly-only, allowing a lot of flexibility in how we implement OsStr support in the future (e.g. we could go as far as creating an OsPattern trait and switching to it without breaking anyone)
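One way to see why char (and &str) patterns fit the "UTF-8 boundaries are safe" argument: a char pattern can be reduced to its UTF-8 encoding, which is always a valid non-empty UTF-8 substring, so cutting immediately before or after a match is a permitted split. The sketch below uses hypothetical names find_char_os and split_once_char_os, and assumes the returned index is a byte offset into as_encoded_bytes(); the ACP does not specify how std would implement these.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: byte offset (into `s.as_encoded_bytes()`) of the
/// first occurrence of `c`.
fn find_char_os(s: &OsStr, c: char) -> Option<usize> {
    let mut buf = [0u8; 4];
    // A char pattern becomes a 1-4 byte valid UTF-8 needle.
    let needle = c.encode_utf8(&mut buf).as_bytes();
    s.as_encoded_bytes()
        .windows(needle.len())
        .position(|w| w == needle)
}

/// Hypothetical helper: splits `s` around the first occurrence of `c`.
fn split_once_char_os(s: &OsStr, c: char) -> Option<(&OsStr, &OsStr)> {
    let i = find_char_os(s, c)?;
    let bytes = s.as_encoded_bytes();
    // SAFETY: the cuts are immediately before and after the UTF-8 encoding
    // of `c`, a valid non-empty UTF-8 substring.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(&bytes[..i]),
            OsStr::from_encoded_bytes_unchecked(&bytes[i + c.len_utf8()..]),
        ))
    }
}
```

Closure patterns (FnMut(char) -> bool) are not covered by this sketch, since they require decoding chars from a possibly ill-formed OsStr.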
From an API design perspective, there is strong precedent for it:
- It copies methods over from str
- The design is a subset of RFC #2295 (approved) and RFC #1309 (postponed)
- By deferring support for OsStr as a pattern, we bypass the main dividing point between proposals (split APIs, panicking on unpaired surrogates, switching away from WTF-8)
Alternatives
#306 proposes an OsStr::slice_encoded_bytes
- It still requires writing higher-level operations on top, but at least without unsafe
- It either takes a performance hit to be consistent across platforms, or has per-platform caveats that will be similarly hard to get right on platforms less familiar to most developers (e.g. Windows)
- As far as I can tell, there is no precedent for an API design like this, meaning more new ground has to be broken (naming, deciding the above preconditions, etc.)
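To illustrate the precondition problem, here is a hypothetical checked slicing function. It is not #306's actual design; slice_encoded_bytes_checked and is_conservative_boundary are names invented for this sketch. It accepts only cut points that the portable rule clearly allows (the ends of the string, or next to an ASCII byte, since an ASCII byte is a valid non-empty UTF-8 substring). That makes it cross-platform, but it also rejects cuts that are perfectly valid, such as any cut on Unix or a cut between two non-ASCII chars.

```rust
use std::ffi::OsStr;
use std::ops::Range;

/// Conservative check: true only for cut points that are clearly allowed on
/// every platform (string ends, or adjacent to an ASCII byte).
fn is_conservative_boundary(bytes: &[u8], i: usize) -> bool {
    i == 0 || i == bytes.len() || bytes[i].is_ascii() || bytes[i - 1].is_ascii()
}

/// Hypothetical checked slicing; returns None for out-of-range or
/// non-conservative cut points instead of panicking.
fn slice_encoded_bytes_checked(s: &OsStr, range: Range<usize>) -> Option<&OsStr> {
    let bytes = s.as_encoded_bytes();
    if range.start > range.end || range.end > bytes.len() {
        return None;
    }
    if !is_conservative_boundary(bytes, range.start)
        || !is_conservative_boundary(bytes, range.end)
    {
        return None;
    }
    // SAFETY: both cuts are at a string end or immediately before/after an
    // ASCII byte, i.e. next to a valid non-empty UTF-8 substring.
    Some(unsafe { OsStr::from_encoded_bytes_unchecked(&bytes[range]) })
}
```

Making the check more precise (e.g. accepting any char boundary) means encoding platform-specific knowledge, which is the trade-off described in the list above.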
Links and related work
- ACP: A substring API for OsStr #306
- ACP: Method to split OsStr into (str, OsStr) #114
- os_str_bytes
- osstrtools
- RFC #1309
- RFC #2295
- Decision to keep Pattern private
- Tracking Issue for os_str_bytes rust#111544
- Add pattern matching API to OsStr rust#109350
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.