Help clarifying repackage_hidden in word_language_model
January 27, 2017, 10:21pm 1
Hi,
In the example of word_language_model, we have
def repackage_hidden(h):
    """Wraps hidden states in new Variables, to detach them from their history."""
    if type(h) == Variable:
        return Variable(h.data)
    else:
        return tuple(repackage_hidden(v) for v in h)
I don't think I fully understand what the "history" includes. Can somebody help clarify this?
Thanks!
apaszke (Adam Paszke) January 27, 2017, 11:12pm 2
Every Variable has a .creator attribute that is an entry point into a graph encoding the operation history. This allows autograd to replay it and differentiate each op. So each hidden state will have a reference to some graph node that created it, but in that example you're doing truncated BPTT, so you never want to backprop through it after you finish the sequence. To get rid of the reference, you have to take out the tensor containing the hidden state, h.data, and wrap it in a fresh Variable that has no history (i.e. is a graph leaf). This allows the previous graph to go out of scope and free up the memory for the next iteration.
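To make this concrete, here is a minimal sketch of a truncated-BPTT loop in today's API (the model, shapes, and data are made up for illustration, not the example's actual code). Without cutting the graph at each chunk boundary, the carried hidden state would keep every previous chunk's graph alive:
import torch
import torch.nn as nn

model = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

hidden = None
# Pretend one long sequence arrives as 5 chunks of shape (batch=4, seq=7, features=10)
for chunk in torch.randn(5, 4, 7, 10).unbind(0):
    if hidden is not None:
        # Cut the graph at the chunk boundary, like repackage_hidden does
        hidden = tuple(h.detach() for h in hidden)
    output, hidden = model(chunk, hidden)
    loss = criterion(output, torch.zeros_like(output))
    optimizer.zero_grad()
    loss.backward()   # only backprops through the current chunk's graph
    optimizer.step()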
jekbradbury (James Bradbury) January 27, 2017, 11:23pm 3
I was going to add that .detach() does the same thing, but I checked the code and realized that I’m not at all sure about the semantics of var2 = var1.detach() vs var2 = Variable(var1.data)…
apaszke (Adam Paszke) January 27, 2017, 11:26pm 4
Right now the difference is that .detach() still retains the reference, but it should be fixed.
It will change once more when we add lazy execution. In eager mode, it will stay as is (always discard the .creator and mark as not requiring grad). In lazy mode var1.detach() won’t trigger the compute and will save the reference, while Variable(var1.data) will trigger it, because you’re accessing the data.
LaceyChen17 (Yihong Chen) July 15, 2017, 2:38am 5
So we do not need to repackage the hidden state when making predictions, since we don't do BPTT?
jdhao (jdhao) October 31, 2017, 7:17am 6
For any latecomers: the Variable object does not have a creator attribute any more; it has been renamed to grad_fn. You can see here for more information.
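A quick way to see this in a recent PyTorch version (small illustrative snippet):
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.grad_fn)           # e.g. <MulBackward0 object at 0x...> -- the old .creator
print(y.detach().grad_fn)  # None: detaching drops the history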
ratishsp (Ratish Puduppully) December 30, 2018, 5:48pm 7
Shouldn’t the code set requires_grad=True to the hidden state as shown below?
As per my understanding, each bptt set should be able to have gradients computed for h.
def repackage_hidden(h):
    """Wraps hidden states in new Variables, to detach them from their history."""
    if type(h) == Variable:
        return Variable(h.data, requires_grad=True)
    else:
        return tuple(repackage_hidden(v) for v in h)
Thanks.
alphadl (Liam) July 30, 2019, 3:30am 8
It has already been updated to be compatible with the latest PyTorch version:
def repackage_hidden(h):
    """Wraps hidden states in new Tensors, to detach them from their history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)
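For reference, a small usage sketch (the shapes are made up): for an LSTM the hidden state is an (h, c) tuple, which the tuple branch handles recursively.
import torch

# Hypothetical shapes: (num_layers, batch, hidden_size) for a 2-layer LSTM
h = torch.randn(2, 20, 200, requires_grad=True) * 2   # non-leaf, has a grad_fn
c = torch.randn(2, 20, 200, requires_grad=True) * 2
hidden = repackage_hidden((h, c))                     # repackage_hidden as defined above
print(hidden[0].grad_fn, hidden[0].requires_grad)     # None False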