Implement placement-in protocol for HashMap
by F001 · Pull Request #40390 · rust-lang/rust
While this technically works, the implementation is not correct. The point of the placement-in protocol is to put a value directly into some place, in this case into the HashMap, in such a way that copies are avoided. So Place::pointer should return a pointer to a location directly inside the HashMap's allocated storage, not to a field in EntryPlace.
To implement this you will likely need to make some internal changes to the Entry(-ies), so that a pointer can be obtained for both a Vacant and an Occupied entry.
mattico added a commit to mattico/rust that referenced this pull request
nrc self-assigned this
Thank you for the review comment!
cc @arthurprs Please correct me if anything is wrong.
I used a temporary field to store the value because of panic safety.
AFAIK, if Place::pointer needs to return a pointer to some place directly inside the HashMap's allocated storage, I have to do robin_hood first, in the make_place phase. This will affect existing elements in the HashMap. If a panic occurs later, I don't know of any rollback mechanism to restore a valid state. That is the main difference between implementing placement-in for VecDeque and for HashMap.
I'm looking forward to your suggestions.
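To illustrate the VecDeque versus HashMap difference described above, here is a minimal sketch of in-place insertion into a vector-like container. `push_in_place` is a hypothetical helper, not part of this PR or of the standard library, and it uses the stable `Vec::spare_capacity_mut` API rather than the unstable placement traits. The reserved slot lies past `len`, so existing elements are untouched and a panic in the value expression needs no rollback. Whether the value is really constructed in place is up to the optimizer.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: write a new element into the vector's spare capacity.
pub fn push_in_place<T>(v: &mut Vec<T>, make: impl FnOnce() -> T) {
    // Ensure there is room for at least one more element.
    v.reserve(1);
    // The first spare slot sits just past `len`; no existing element moves.
    let slot: &mut MaybeUninit<T> = &mut v.spare_capacity_mut()[0];
    // If `make` panics, nothing was written and `len` is unchanged,
    // so the vector is still valid without any cleanup.
    slot.write(make());
    // SAFETY: the slot at index `len` was initialized on the line above.
    unsafe { v.set_len(v.len() + 1) };
}
```

A hash map has no such free tail: making room for a key may displace existing buckets, which is why a rollback step becomes necessary there.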
Your suggestion sounds ok to me. It will avoid unnecessary V copies for Entry::Vacant. To avoid any unnecessary V copies for Entry::Occupied you probably need a variant of robin_hood that will make space without copying the uninitialized V into the bucket.
For rollback you can implement Drop for EntryPlace (drop still runs in case of panics) and use pop_internal to fix the table if it comes to that (forget what it returns). BinaryHeap uses a similar strategy to avoid corrupting the structure if a T comparison panics.
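The rollback idea can be sketched with a toy table. Everything here (`Table`, `SlotGuard`, `try_insert`) is hypothetical and much simpler than the real HashMap internals: the guard reserves a slot holding only the key, and its `Drop` removes that slot again unless the value was written. The sketch also assumes the guard is never leaked while uncommitted.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

/// Toy stand-in for the hash table: each slot holds a key and a value
/// that may not have been written yet.
struct Table {
    slots: Vec<(u32, MaybeUninit<String>)>,
}

impl Drop for Table {
    fn drop(&mut self) {
        // Assumed invariant: outside a live guard, every value is initialized.
        for (_, value) in self.slots.iter_mut() {
            unsafe { value.assume_init_drop() };
        }
    }
}

/// Hypothetical analogue of `EntryPlace`: owns a slot whose value is pending.
struct SlotGuard<'a> {
    table: &'a mut Table,
    committed: bool,
}

impl<'a> SlotGuard<'a> {
    fn reserve(table: &'a mut Table, key: u32) -> Self {
        // Only the key is stored; the value stays uninitialized for now.
        table.slots.push((key, MaybeUninit::uninit()));
        SlotGuard { table, committed: false }
    }

    fn fill(mut self, make: impl FnOnce() -> String) {
        let value = make(); // may panic; unwinding then runs `Drop` below
        let slot = self.table.slots.last_mut().expect("slot was reserved");
        slot.1.write(value);
        self.committed = true;
    }
}

impl Drop for SlotGuard<'_> {
    fn drop(&mut self) {
        if !self.committed {
            // Rollback: discard the reserved slot. `MaybeUninit` never runs
            // the value's destructor, so uninitialized memory is not read.
            self.table.slots.pop();
        }
    }
}

/// Returns true if the insertion completed, false if `make` panicked.
fn try_insert(table: &mut Table, key: u32, make: impl FnOnce() -> String) -> bool {
    panic::catch_unwind(AssertUnwindSafe(|| {
        SlotGuard::reserve(table, key).fill(make);
    }))
    .is_ok()
}
```

After a panicking `make`, the table has the same length as before the call, which is the only property the rollback has to guarantee.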
Thanks for your suggestion. I have updated the implementation. For now, it can avoid unnecessary V copy for Entry::Vacant. I'll continue to investigate more optimization.
issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,
    panicked: Cell<bool>,
I suggest using a finalized flag instead. Also, the flag should probably be the last field, as that may save 7 bytes of stack.
You are unlikely to have more than one of these per hashmap alive at a time, so this is not very concerning. Also we’re getting field reordering soon, which will do this for everybody automatically.
As suggested below, using forget can avoid the flag.
reason = "struct name and placement protocol is subject to change",
issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,
I'm probably missing something obvious but do we really need to wrap the bucket with Option?
I was just being lazy: I wanted to use the existing FullBucket::take to remove the entry. It takes self by value, but the drop method only has &mut self, so the bucket field can't be moved out.
This is fixed by adding another method, FullBucket::remove, which takes &mut self. The drop method can now call remove.
impl<'a, K, V> InPlace<V> for EntryPlace<'a, K, V> {
    type Owner = ();
    unsafe fn finalize(self) {
Thinking a bit more about this, you can forget(self) here, avoiding the flag altogether.
Good suggestion. Fixed.
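A minimal sketch of the forget approach, using a hypothetical `Place` guard whose destructor stands in for the rollback. Consuming the guard with `mem::forget` on the success path means `Drop` only ever runs on the failure path, so no flag is needed.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard; `rollbacks` counts how often the rollback ran.
struct Place<'a> {
    rollbacks: &'a Cell<u32>,
}

impl Place<'_> {
    /// Success path: consume the guard without running its destructor.
    fn finalize(self) {
        mem::forget(self);
    }
}

impl Drop for Place<'_> {
    fn drop(&mut self) {
        // Reached only when the guard was not finalized,
        // for example while unwinding from a panic.
        self.rollbacks.set(self.rollbacks.get() + 1);
    }
}
```

Note that `mem::forget` also skips the destructors of the guard's fields, so this pattern fits best when the guard holds only references or plain data.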
Nice improvement over the previous version! @arthurprs’ notes seem very relevant (and they are also much more familiar with the HashMap code), so these should be fixed.
I realised there’s one possible alternative in behaviour. Current implementation tries to recover the previous value if the placement expression fails, however it is not obvious to me whether this is a better approach compared to, say, simply making the key vacant in case of panic.
Here are some points in favour of leaving the entry vacant instead of restoring the value if panic happens:
- Saving the old value involves a copy, thus negating most/all of the point of placement-in (as few copies as possible);
- Panicking is a very exceptional situation that is not supposed to be recovered from. This means that all Drop should be responsible for is restoring the HashMap into a state that’s safe to Drop, that’s all. Making the entry vacant seems equivalent to restoring the old value in that sense.
Very good points, leaving a previous filled bucket empty on panic sounds reasonable.
cc @rust-lang/libs, anyone have feedback on @nagisa's last comment?
I agree that the precise state of the value being modified doesn't matter too much.
self.table.size -= 1;
unsafe {
    *self.raw.hash = EMPTY_BUCKET;
    ptr::read(self.raw.pair); // drop right now
This is possibly incorrect. I think you’ll notice why if you add a test that looks like this (you should probably add one similar to it):
struct Banana<'a>(&'a mut bool);
impl<'a> Drop for Banana<'a> {
    fn drop(&mut self) {
        if !*self.0 { panic!("double drop!"); }
        *self.0 = false;
    }
}
// `can_drop` is declared before the map so that it outlives the `Banana` borrowing it
let mut can_drop = true;
let mut hm = HashMap::new();
hm.insert(0, Banana(&mut can_drop));
hm.entry(0) <- panic!("boom");
// first drop happens in `make_place`, where the `Banana(true)` gets dropped and `can_drop` is set to false
// then a `*place.pointer() = panic!("boom")` is executed, which unwinds, thus dropping the place
// place destructor drops the `Banana(false)`, and thus double-panic occurs and the process aborts.
//
// In other words, current implementation of Drop reads uninitialized memory.
Note that the code might not reproduce exactly the way I described it, but it is still reading uninitialized memory.
Yeah! Good point. Fixed.
2 more, likely final, tweaks.
let b = match self {
    Occupied(mut o) => {
        let uninit = unsafe { mem::uninitialized() };
        o.insert(uninit);
You can avoid doing this mem::uninitialized dance by simply doing a std::ptr::drop_in_place(o.elem.bucket.read_mut().1)
Fixed.
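As a rough illustration of the drop_in_place approach, the sketch below replaces a value behind a pointer without first swapping in an uninitialized placeholder. `overwrite_in_place` and `Counted` are hypothetical names for this example only; the old value is dropped exactly once, and the new one is written without dropping the stale contents.

```rust
use std::cell::Cell;
use std::ptr;

/// Counts how many times any `Counted` value has been dropped.
struct Counted<'a>(&'a Cell<u32>);

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical helper: drop the old value where it lives, then write the new one.
///
/// # Safety
/// `slot` must point to a valid, initialized `T`, and the destructor of the
/// old value must not panic; otherwise the slot is left logically uninitialized.
unsafe fn overwrite_in_place<T>(slot: *mut T, new_value: T) {
    unsafe {
        // The old value is dropped exactly once, without being moved out.
        ptr::drop_in_place(slot);
        // `ptr::write` does not drop the (now stale) contents of the slot.
        ptr::write(slot, new_value);
    }
}
```

In the placement case the new value is not available yet at the point where the old one is dropped, so between the two steps the slot holds uninitialized data and any cleanup code must not drop it again.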
Fixed.
issue = "30172")]
impl<'a, K, V> Drop for EntryPlace<'a, K, V> {
    fn drop(&mut self) {
        self.bucket.remove();
I think this will drop an uninitialized V, as you only inserted the key?
Yes, nagisa has mentioned this. I'm fixing it.
I’ve only got nits left. Marking the internal functions as unsafe makes sense, as they leave around uninitialized data which the caller should handle appropriately.
r=me once nits are fixed
assert_eq!(map.len(), 9);
assert!(!map.contains_key(&100));
// correctly drop
This can probably be factored out into a separate test (i.e. a different #[test] function).
Fixed.
/// Remove this bucket's key and value from the hashtable.
/// Only used for inplacement insertion.
pub fn remove_key(&mut self) {
Similarly here, whole function unsafe.
Fixed.
/// Puts the given key, leaving the value uninitialized.
/// It is only used for inplacement insertion.
pub fn put_key(mut self, hash: SafeHash, key: K) -> FullBucket<K, V, M> {
I’d probably make this whole function unsafe (i.e. pub unsafe fn put_key).
Fixed.
}
// Only used for InPlacement insert. Avoid unnecessary value copy.
fn insert_key(self) -> FullBucketMut<'a, K, V> {
This should probably be unsafe fn too.
Fixed.
Oh, bors didn’t notice the delegation above :/
✌️ @nagisa can now approve this pull request
📌 Commit 584c798 has been approved by nagisa
frewsxcv added a commit to frewsxcv/rust that referenced this pull request
bors added a commit that referenced this pull request