slice contains subslice · Issue #499 · rust-lang/libs-team
Proposal
Problem statement
Determining whether a slice is contained within another slice, analogous to "foobar".contains("foo"), but for slices.
Motivating examples or use cases
I recently had cause to write
use std::ffi::OsStr;

fn contains_osstr(haystack: impl AsRef<OsStr>, needle: impl AsRef<OsStr>) -> bool {
    let needle = needle.as_ref().as_encoded_bytes();
    let haystack = haystack.as_ref().as_encoded_bytes();

    // Note: `windows` panics if `needle` is empty.
    haystack.windows(needle.len()).any(|h| h == needle)
}
Partially the problem here is the limited API on OsStr (and similarly Path and CStr), but I've wanted a "contains slice" operation on just standard slices too. It is especially odd that the contains operation is defined on &str, but not when you drop down to raw byte values.
I see two problems:
- I can't neatly express the intuitive "contains" operation that I want
- The implementation is less performant than it could be
Solution sketch
I think the nicest solution is to mirror the core::str::pattern design, so that we could have
impl<T> [T] {
    pub fn contains<P>(&self, pat: P) -> bool
    where
        T: PartialEq,
        P: core::slice::pattern::Pattern<T>;

    pub fn find<P>(&self, pat: P) -> Option<usize>
    where
        T: PartialEq,
        P: core::slice::pattern::Pattern<T>;
}
This appears to work out
trait Pattern<T: PartialEq>: Sized { /* ... */ }
impl<T: PartialEq> Pattern<T> for T { /* ... */ }
impl<'b, T: PartialEq> Pattern<T> for &'b [T] { /* ... */ }
// potentially arrays too, maybe even str, OsStr, CStr, and so on
And it looks backwards-compatible to me, but I'm not 100% sure that it is.
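To make the trait sketch above concrete, here is a minimal version that compiles on stable Rust. It is only an illustration under stated assumptions: the method name `find_in` and the free functions `find` and `contains` are hypothetical stand-ins (inherent methods cannot be added to `[T]` outside the standard library), the subslice search is the naive `windows` scan rather than anything optimized, and it does not address compatibility with the existing `[T]::contains(&T)`.

```rust
/// Hypothetical stand-in for the proposed `core::slice::pattern::Pattern`.
/// The method name `find_in` is an assumption made for this sketch.
trait Pattern<T: PartialEq>: Sized {
    /// Returns the index of the first match of `self` in `haystack`.
    fn find_in(self, haystack: &[T]) -> Option<usize>;
}

// A single element matches wherever an equal element occurs.
impl<T: PartialEq> Pattern<T> for T {
    fn find_in(self, haystack: &[T]) -> Option<usize> {
        haystack.iter().position(|x| *x == self)
    }
}

// A subslice matches wherever an equal window occurs (naive search).
impl<'b, T: PartialEq> Pattern<T> for &'b [T] {
    fn find_in(self, haystack: &[T]) -> Option<usize> {
        if self.is_empty() {
            // `windows(0)` would panic; this sketch treats the empty
            // needle as matching at index 0.
            return Some(0);
        }
        haystack.windows(self.len()).position(|w| w == self)
    }
}

/// Free-function version of the proposed `find` method (hypothetical).
fn find<T: PartialEq, P: Pattern<T>>(haystack: &[T], pat: P) -> Option<usize> {
    <P as Pattern<T>>::find_in(pat, haystack)
}

/// Free-function version of the proposed `contains` method (hypothetical).
fn contains<T: PartialEq, P: Pattern<T>>(haystack: &[T], pat: P) -> bool {
    find(haystack, pat).is_some()
}
```

With `let hay: &[u8] = &[1, 2, 3, 4];`, both `find(hay, 3u8)` and `find(hay, &[2u8, 3][..])` go through the same generic function, returning `Some(2)` and `Some(1)` respectively.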
Some final notes:
- an algorithm like KMP should be used.
- specialization could be used for better performance (e.g. for u8 and T: Copy)
- SIMD could be used for better performance
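As a rough illustration of the first note, a generic Knuth-Morris-Pratt search over slices might look like the sketch below. The names `failure_table` and `kmp_find` are hypothetical, and returning `Some(0)` for an empty needle is a choice made here, not something the proposal specifies. KMP also assumes that equality on `T` behaves like an equivalence relation, which `PartialEq` alone does not guarantee (for example, floats containing NaN), so results could differ from the naive `windows` scan for such types.

```rust
/// Builds the KMP failure table: `table[i]` is the length of the longest
/// proper prefix of `needle[..=i]` that is also a suffix of it.
fn failure_table<T: PartialEq>(needle: &[T]) -> Vec<usize> {
    let mut table = vec![0; needle.len()];
    let mut k = 0; // length of the currently matched prefix
    for i in 1..needle.len() {
        // On mismatch, fall back to the next shorter candidate prefix.
        while k > 0 && needle[i] != needle[k] {
            k = table[k - 1];
        }
        if needle[i] == needle[k] {
            k += 1;
        }
        table[i] = k;
    }
    table
}

/// Returns the index of the first occurrence of `needle` in `haystack`,
/// doing O(haystack.len() + needle.len()) element comparisons.
fn kmp_find<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    if needle.is_empty() {
        // Assumption for this sketch: the empty needle matches at index 0.
        return Some(0);
    }
    let table = failure_table(needle);
    let mut k = 0; // number of needle elements matched so far
    for (i, item) in haystack.iter().enumerate() {
        while k > 0 && *item != needle[k] {
            k = table[k - 1];
        }
        if *item == needle[k] {
            k += 1;
        }
        if k == needle.len() {
            // The match ends at `i`, so it starts `k - 1` elements earlier.
            return Some(i + 1 - k);
        }
    }
    None
}
```

Unlike the `windows`-based version, this never re-examines a haystack element after a partial match fails, at the cost of allocating one table per search.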
Alternatives
This rustc issue rust-lang/rust#54961 proposed instead
// Both functions panic if `needle` is empty, because `windows(0)` panics.
fn contains_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> bool {
    data
        .windows(needle.len())
        .any(|w| w == needle)
}

fn position_subslice<T: PartialEq>(data: &[T], needle: &[T]) -> Option<usize> {
    data
        .windows(needle.len())
        .enumerate()
        .find(|&(_, w)| w == needle)
        .map(|(i, _)| i)
}
This idea works fine too if the pattern idea above has backwards compatibility issues (though I'd suggest find_subslice for consistency).
Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.