NodeTraversor visits nodes twice in certain cases after removal
In short: if NodeVisitor.head() removes the last child, NodeTraversor revisits the previous children at the same level. In some cases this doesn't matter, e.g. when all that is expected of the NodeVisitor is to modify the DOM tree. For example, the test canRemoveDuringHead() only asserts what the tree looks like afterwards. However, if the NodeVisitor acts as an accumulator, such duplicate visits can easily lead to incorrect results.
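To make the accumulator problem concrete, here is a minimal, self-contained sketch that does not use jsoup at all. The class `VisitOnceSketch` and its helpers `visitOnce` and `totalLength` are hypothetical names invented for this illustration, and the duplicated visit order is simulated by hand rather than produced by NodeTraversor. It shows how a single repeated visit skews a running total, and how an identity-based guard could mask the symptom on the caller's side.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class VisitOnceSketch {

    /**
     * Hypothetical helper: wraps a consumer so that each object
     * (compared by identity) is passed on to it at most once.
     */
    public static <T> Consumer<T> visitOnce(Consumer<T> delegate) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return item -> {
            if (seen.add(item)) {   // add() returns false for a repeat visit
                delegate.accept(item);
            }
        };
    }

    /** Feeds a visit sequence to an accumulator that sums text lengths. */
    public static int totalLength(List<StringBuilder> visits, boolean guarded) {
        int[] total = {0};
        Consumer<StringBuilder> accumulator = text -> total[0] += text.length();
        Consumer<StringBuilder> visitor = guarded ? visitOnce(accumulator) : accumulator;
        visits.forEach(visitor);
        return total[0];
    }

    public static void main(String[] args) {
        // StringBuilder stands in for a DOM node here (assumption for the sketch).
        StringBuilder first = new StringBuilder("first");
        StringBuilder last = new StringBuilder("last");

        // Simulated visit order in which "first" is delivered a second time.
        List<StringBuilder> visits = Arrays.asList(first, last, first);

        System.out.println(totalLength(visits, false)); // prints 14: "first" counted twice
        System.out.println(totalLength(visits, true));  // prints 9: each object counted once
    }
}
```

With jsoup, a guard like this could presumably be wired in from the visitor lambda, e.g. NodeTraversor.traverse((node, depth) -> guardedVisitor.accept(node), doc), where guardedVisitor is the wrapped consumer. That would only be a stopgap on the caller's side; it does not address the duplicate visit itself.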
Reproducer in the form of a test case (e.g. for TraversorTest):
@ParameterizedTest
@ValueSource(strings = {"em", "b"})
void doesntVisitAgainAfterRemoving(String removeTag) {
    Document doc = Jsoup.parse("<div><em>first</em><b>last</b></div>");
    // Tracks every node passed to the visitor so that a repeat visit can be detected.
    Set<Node> visited = new HashSet<>();
    NodeTraversor.traverse((node, depth) -> {
        // Set.add() returns false if the node has already been visited.
        if (!visited.add(node))
            fail(String.format("node '%s' is being visited for the second time", node));
        // Remove the element under test from within the visitor.
        if (removeTag.equals(node.nodeName()))
            node.remove();
    }, doc);
}