Implement placement-in protocol for HashMap by F001 · Pull Request #40390 · rust-lang/rust (original) (raw)

F001

@nagisa

While this technically works, the implementation is not correct. The point of the placement-in protocol is to put a value directly into some place, in this case into the HashMap, in such a way that copies are avoided. So Place::pointer should return a pointer to some place directly inside the HashMap's allocated storage, not to a field in EntryPlace.

To implement this you will likely need to do some internal changes to the Entry(-ies), so it would be possible to obtain a pointer for both Vacant and Occupied entry.
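To illustrate the point about Place::pointer, here is a minimal sketch on stable Rust. It does not use the real (unstable) placement traits or the HashMap internals; `Storage`, `SlotPlace`, `make_place`, `pointer`, `finalize` and `place_directly` are all hypothetical stand-ins, and the "storage" is a single slot rather than a hash table.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the map's allocated storage: one slot.
pub struct Storage<V> {
    slot: MaybeUninit<V>,
}

/// Hypothetical place type. It hands out a pointer into `Storage` itself,
/// not into a temporary field owned by the place.
pub struct SlotPlace<'a, V> {
    storage: &'a mut Storage<V>,
}

impl<V> Storage<V> {
    pub fn new() -> Self {
        Storage { slot: MaybeUninit::uninit() }
    }

    pub fn make_place(&mut self) -> SlotPlace<'_, V> {
        SlotPlace { storage: self }
    }
}

impl<'a, V> SlotPlace<'a, V> {
    /// Analogue of `Place::pointer`: the address of the final location.
    pub fn pointer(&mut self) -> *mut V {
        self.storage.slot.as_mut_ptr()
    }

    /// Analogue of `InPlace::finalize`.
    ///
    /// # Safety
    /// The caller must have written a value through `pointer` first.
    pub unsafe fn finalize(self) -> &'a mut V {
        unsafe { &mut *self.storage.slot.as_mut_ptr() }
    }
}

/// Writes `value` straight into the storage slot. Returns the stored value
/// and whether the pointer handed out was the storage's own address.
pub fn place_directly(value: u64) -> (u64, bool) {
    let mut storage = Storage::<u64>::new();
    let storage_addr = storage.slot.as_ptr() as usize;
    let mut place = storage.make_place();
    let p = place.pointer();
    let same = p as usize == storage_addr;
    unsafe {
        p.write(value); // the only write; no intermediate copy of V
        let v = place.finalize();
        (*v, same)
    }
}
```

The sketch only covers the "pointer into final storage" idea; it says nothing about robin-hood displacement or panic safety.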

mattico added a commit to mattico/rust that referenced this pull request

Mar 9, 2017

@mattico

@nrc

@nrc nrc self-assigned this

Mar 9, 2017

@F001

Thank you for the review comment!

cc @arthurprs Please correct me if anything is wrong.

I used a temporary field to store the value because of panic safety.

AFAIK, if Place::pointer needs to return a pointer to some place directly inside the HashMap's allocated storage, I have to do robin_hood first, in the make_place phase. This will affect existing elements in the HashMap. If a panic occurs later, I don't know of any rollback mechanism to restore a valid state. This is the main difference between the placement-in implementations for VecDeque and HashMap.

I'm looking forward to your suggestions.

@arthurprs

Your suggestion sounds ok to me. It will avoid unnecessary V copies for Entry::Vacant. To avoid any unnecessary V copies for Entry::Occupied you probably need a variant of robin_hood that will make space without copying the uninitialized V into the bucket.

For rollback you can implement Drop for EntryPlace (drop still runs in case of panics) and use pop_internal to fix the table if it comes to that (forget what it returns). BinaryHeap uses a similar strategy to avoid corrupting the structure if a T comparison panics.
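The rollback idea can be sketched with a drop guard. This is an assumption-laden toy, not the HashMap code: `Reservation`, `commit` and `insert_with` are hypothetical names, a `Vec<Option<String>>` stands in for the table, and `Vec::pop` stands in for pop_internal.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard: reserves a slot up front and removes it again if
/// it is dropped before `commit` runs (for example during unwinding).
struct Reservation<'a> {
    slots: &'a mut Vec<Option<String>>,
    committed: bool,
}

impl<'a> Reservation<'a> {
    fn new(slots: &'a mut Vec<Option<String>>) -> Self {
        slots.push(None); // structural change made before the value exists
        Reservation { slots, committed: false }
    }

    fn commit(mut self, value: String) {
        if let Some(slot) = self.slots.last_mut() {
            *slot = Some(value);
        }
        self.committed = true;
        // `self` is dropped here; Drop sees the flag and keeps the slot.
    }
}

impl<'a> Drop for Reservation<'a> {
    fn drop(&mut self) {
        if !self.committed {
            self.slots.pop(); // roll back the reservation
        }
    }
}

/// Reserves a slot, evaluates `make_value`, and returns the final length.
/// The panic is caught only so that the effect can be observed.
pub fn insert_with<F: FnOnce() -> String>(
    slots: &mut Vec<Option<String>>,
    make_value: F,
) -> usize {
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        let guard = Reservation::new(slots);
        let value = make_value(); // may panic, unwinding through `guard`
        guard.commit(value);
    }));
    slots.len()
}
```

If `make_value` panics, the guard's destructor pops the reserved slot, so the container's length is unchanged.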

@F001

Thanks for your suggestion. I have updated the implementation. For now, it avoids the unnecessary V copy for Entry::Vacant. I'll continue to investigate further optimizations.

arthurprs

issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,
    panicked: Cell<bool>,

I suggest using a finalized flag instead. Also, the flag should probably be the last field as it may save 7 bytes of stack.

You are unlikely to have more than one of these per hashmap alive at a time, so this is not very concerning. Also we’re getting field reordering soon, which will do this for everybody automatically.

As suggested below, using forget can avoid the flag.

reason = "struct name and placement protocol is subject to change",
issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,

I'm probably missing something obvious but do we really need to wrap the bucket with Option?

I was just being lazy: I wanted to use the existing FullBucket::take to remove the entry. It takes a self parameter, but in the drop method there is only &mut self, so the bucket field can't be moved out.

This is fixed by adding another method, FullBucket::remove, which takes a &mut self parameter. The drop method can now call remove.
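A small sketch of the trade-off discussed here, using hypothetical types rather than the real FullBucket: a by-value `take(self)` can only be called from `drop(&mut self)` if the field is wrapped in an Option and moved out with `Option::take`, whereas a `remove(&mut self)` method can be called on the field directly.

```rust
use std::cell::RefCell;

/// Hypothetical bucket handle that records which method was called.
struct Bucket<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Bucket<'a> {
    /// Consuming removal, like a by-value `take`.
    fn take(self) {
        self.log.borrow_mut().push("take(self)");
    }

    /// Borrowing removal, like the `remove` added in this PR.
    fn remove(&mut self) {
        self.log.borrow_mut().push("remove(&mut self)");
    }
}

/// Variant 1: the field must be an Option so that drop can move it out.
struct WithOption<'a> {
    bucket: Option<Bucket<'a>>,
}

impl<'a> Drop for WithOption<'a> {
    fn drop(&mut self) {
        if let Some(bucket) = self.bucket.take() {
            bucket.take();
        }
    }
}

/// Variant 2: no Option needed, because `remove` only borrows the field.
struct WithoutOption<'a> {
    bucket: Bucket<'a>,
}

impl<'a> Drop for WithoutOption<'a> {
    fn drop(&mut self) {
        self.bucket.remove();
    }
}

/// Drops one value of each variant and returns the recorded calls.
pub fn drop_both() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    drop(WithOption { bucket: Some(Bucket { log: &log }) });
    drop(WithoutOption { bucket: Bucket { log: &log } });
    log.into_inner()
}
```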

arthurprs

impl<'a, K, V> InPlace<V> for EntryPlace<'a, K, V> {
    type Owner = ();
    unsafe fn finalize(self) {

Thinking a bit more about this you can forget(self) here, avoiding the flag altogether.

Good suggestion. Fixed.
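The forget-based approach can be sketched as follows. `Guard`, `finalize` and `forget_skips_drop` are hypothetical names for illustration; the only real API relied on is `std::mem::forget`, which consumes a value without running its destructor.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard whose destructor performs the rollback
/// unconditionally, with no "finalized" or "panicked" flag.
pub struct Guard<'a> {
    rolled_back: &'a Cell<bool>,
}

impl<'a> Guard<'a> {
    /// Success path: skip the destructor entirely.
    pub fn finalize(self) {
        mem::forget(self);
    }
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        // Reached only when the guard is dropped without `finalize`,
        // for example while unwinding from a panic.
        self.rolled_back.set(true);
    }
}

/// Returns (rollback ran after finalize, rollback ran after a plain drop).
pub fn forget_skips_drop() -> (bool, bool) {
    let a = Cell::new(false);
    Guard { rolled_back: &a }.finalize();

    let b = Cell::new(false);
    drop(Guard { rolled_back: &b });

    (a.get(), b.get())
}
```

Note that `mem::forget` also skips the destructors of the guard's fields, so this pattern fits guards that hold only references or plain data.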

nagisa

Nice improvement over the previous version! @arthurprs’ notes seem very relevant (and they are also much more familiar with the HashMap code), so these should be fixed.

@nagisa

I realised there’s one possible alternative behaviour. The current implementation tries to recover the previous value if the placement expression fails; however, it is not obvious to me that this is a better approach than, say, simply making the key vacant in case of a panic.

Here are some points in favour of leaving the entry vacant instead of restoring the value if panic happens:

  1. Saving the old value involves a copy, thus negating most/all of the point of placement-in (as few copies as possible);
  2. Panicking is a very exceptional situation that is not supposed to be recovered from. This means that all the Drop should be responsible for is restoring HashMap into a state that’s safe to Drop, that’s all. Making entry vacant seems equivalent to restoring the old value in that sense.
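The "leave it vacant" behaviour argued for above can be sketched with an `Option` slot standing in for a map entry. This is an illustration of the resulting state only, not of placement itself (the value here is still built and then moved); `replace_in_slot` and `slot_after_panic` are hypothetical names.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Replaces the contents of `slot` with the result of `make_value`.
/// The old value is dropped first and never saved, so if `make_value`
/// panics the slot is simply left vacant.
pub fn replace_in_slot<V, F: FnOnce() -> V>(slot: &mut Option<V>, make_value: F) {
    *slot = None; // drop the old value; no copy is kept for restoring
    *slot = Some(make_value()); // may panic; there is nothing to roll back
}

/// Shows the state of an occupied slot after a panicking replacement.
pub fn slot_after_panic() -> Option<i32> {
    let mut slot = Some(1);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        replace_in_slot(&mut slot, || panic!("boom"));
    }));
    slot
}
```

After the panic the slot is `None`: still safe to drop, but the previous value is gone.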

@arthurprs

Very good points; leaving a previously filled bucket empty on panic sounds reasonable.

@nagisa

@aturon

cc @rust-lang/libs, anyone have feedback on @nagisa's last comment?

@sfackler

I agree that the precise state of the value being modified doesn't matter too much.

nagisa

self.table.size -= 1;
unsafe {
    *self.raw.hash = EMPTY_BUCKET;
    ptr::read(self.raw.pair); // drop right now

This is possibly incorrect. I think you’ll notice why if you add a test that looks like this (you should probably add one similar to it):

struct Banana<'a>(&'a mut bool);
impl<'a> Drop for Banana<'a> {
    fn drop(&mut self) {
        if !*self.0 { panic!("double drop!"); }
        *self.0 = false;
    }
}

let mut hm = HashMap::new();
let mut can_drop = true;
hm.insert(0, Banana(&mut can_drop));
hm.entry(0) <- panic!("boom") ;
// first drop happens in `make_place`, where the `Banana(true)` gets dropped and `can_drop` is set to false
// then a `*place.pointer() = panic!("boom")` is executed, which unwinds, thus dropping the place
// place destructor drops the `Banana(false)`, and thus double-panic occurs and the process aborts.
//
// In other words, current implementation of Drop reads uninitialized memory.

Note that the code might not reproduce exactly the way I described it, but it is still reading uninitialized memory.
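The invariant behind this comment, that the old value is dropped exactly once, can be checked with a drop counter. The sketch below is an assumption-based toy on stable Rust: `Counted`, `VacatedSlot` and `drops_when_replacement_panics` are hypothetical, and a `MaybeUninit` slot stands in for the bucket's value storage.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};
use std::ptr;

/// Counts how many times it has been dropped.
struct Counted<'a>(&'a Cell<u32>);

impl<'a> Drop for Counted<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical place over a slot whose old value was already dropped.
struct VacatedSlot<'s, 'a> {
    _slot: &'s mut MaybeUninit<Counted<'a>>,
}

impl<'s, 'a> Drop for VacatedSlot<'s, 'a> {
    fn drop(&mut self) {
        // Deliberately empty: the slot holds no live value at this point.
        // A `ptr::read` or `drop_in_place` here would be a second drop.
    }
}

/// Drops the old value, then panics before a new value is written.
/// Returns how many times the old value's destructor ran.
pub fn drops_when_replacement_panics() -> u32 {
    let drops = Cell::new(0);
    let mut slot = MaybeUninit::new(Counted(&drops));
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        // Analogue of make_place: drop the old value exactly once.
        unsafe { ptr::drop_in_place(slot.as_mut_ptr()) };
        let _place = VacatedSlot { _slot: &mut slot };
        // The value expression panics before anything is written.
        panic!("boom");
    }));
    drops.get()
}
```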

Yeah! Good point. Fixed.

nagisa

2 more, likely final, tweaks.

let b = match self {
    Occupied(mut o) => {
        let uninit = unsafe { mem::uninitialized() };
        o.insert(uninit);

You can avoid doing this mem::uninitialized dance by simply doing a

std::ptr::drop_in_place(o.elem.bucket.read_mut().1)
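A minimal sketch of the drop_in_place approach, assuming a `MaybeUninit<V>` slot in place of the bucket's value field. `overwrite_in_place` and `overwrite_demo` are hypothetical helpers; `ptr::drop_in_place` and the raw-pointer `write` method are the real standard-library calls.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Replaces the value in `slot` without creating a dummy uninitialized V.
///
/// # Safety
/// `slot` must currently hold an initialized `V`.
pub unsafe fn overwrite_in_place<V>(slot: &mut MaybeUninit<V>, new_value: V) {
    unsafe {
        // Run the old value's destructor where it lives.
        ptr::drop_in_place(slot.as_mut_ptr());
        // Write the new value; `write` does not drop the previous contents.
        slot.as_mut_ptr().write(new_value);
    }
}

/// Overwrites an initialized slot and returns what it holds afterwards.
pub fn overwrite_demo() -> String {
    let mut slot = MaybeUninit::new(String::from("old"));
    unsafe {
        overwrite_in_place(&mut slot, String::from("new"));
        slot.assume_init()
    }
}
```

In the placement case the `write` step would be performed later by the placement expression itself; it is included here only so the example leaves the slot initialized.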

Fixed.

arthurprs

issue = "30172")]
impl<'a, K, V> Drop for EntryPlace<'a, K, V> {
    fn drop(&mut self) {
        self.bucket.remove();

I think this will drop an uninitialized V, as you only inserted the key?

Yes, nagisa has mentioned this. I'm fixing it.

nagisa

I’ve only got nits left. Marking the internal functions as unsafe makes sense, as they leave around uninitialized data which the caller should handle appropriately.

r=me once nits are fixed

assert_eq!(map.len(), 9);
assert!(!map.contains_key(&100));
// correctly drop

This can probably be factored out into a separate test. (i.e. a different #[test] function)

Fixed.

/// Remove this bucket's key and value from the hashtable.
/// Only used for inplacement insertion.
pub fn remove_key(&mut self) {

Similarly here, whole function unsafe.

Fixed.

/// Puts the given key, leaving the value uninitialized.
/// It is only used for in-place insertion.
pub fn put_key(mut self, hash: SafeHash, key: K) -> FullBucket<K, V, M> {

I’d probably make this whole function unsafe (i.e. pub unsafe fn put_key).

Fixed.

}
// Only used for InPlacement insert. Avoid unnecessary value copy.
fn insert_key(self) -> FullBucketMut<'a, K, V> {

This should probably be unsafe fn too.

Fixed.

@F001

@nagisa

@nagisa

Oh, bors didn’t notice the delegation above :/

@eddyb

@bors

✌️ @nagisa can now approve this pull request

@nagisa

@bors

📌 Commit 584c798 has been approved by nagisa

frewsxcv added a commit to frewsxcv/rust that referenced this pull request

Mar 12, 2017

@frewsxcv

bors added a commit that referenced this pull request

Mar 12, 2017

@bors