Migrating from 0.1 to 1.0
I don't like breaking backwards compatibility, but to be able to add new features I felt I had to. This means that updating from 0.1 to 1.0 might require code changes.
- Slimmer public API
- Returning TokenLists and TokenTrees instead of lists
- Parsing of IDs now includes ranges and decimals
Slimmer public API
Previously `__init__.py` included a lot of public functions. Some of these have NOT moved:
```python
from conllu import parse
from conllu import parse_tree
```
These two work just like they did before, except that each sentence they return is now a TokenList (from parse) or a TokenTree (from parse_tree) instead of a raw list. See the next section for how to handle this.
```diff
-from conllu.parser import parse
-from conllu.parser import parse_tree
+from conllu import parse
+from conllu import parse_tree
```
Importing parse and parse_tree from conllu.parser is no longer supported. Remove ".parser" and the imports will work again.
```diff
-from conllu.parser import parse_with_comments
```
parse_with_comments has been removed. When you use parse, comments are included automatically; you can access them through the new metadata property on the returned TokenList.
```diff
-from conllu.parser import serialize_tree
-from conllu.tree_helpers import print_tree
```
These two functions have moved to TokenTree, the class of the trees returned by parse_tree: serialize_tree is now tree.serialize(), and print_tree is now tree.print_tree().
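If an existing codebase calls serialize_tree and print_tree in many places, one option while migrating is a pair of thin wrappers that keep the old names and forward to the new methods. This is a sketch, not part of conllu. The wrappers assume that each call passes a single 1.0 TokenTree and no extra arguments. StubTree is a stand-in that lets the example run without the library.

```python
# Hypothetical migration shims, not part of conllu. They keep old
# 0.1-style call sites working by forwarding to the 1.0 TokenTree methods.
def serialize_tree(tree):
    return tree.serialize()


def print_tree(tree):
    return tree.print_tree()


class StubTree:
    """Stand-in with the same method names, so the sketch runs without conllu."""

    def serialize(self):
        return "serialized"

    def print_tree(self):
        print("printed")


print(serialize_tree(StubTree()))  # serialized
print_tree(StubTree())             # printed
```

Once every call site has been rewritten to tree.serialize() and tree.print_tree(), the shims can be deleted.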
Returning TokenLists and TokenTrees instead of lists
The return values from both parse and parse_tree have changed.
```python
from conllu import parse

sentences = parse(raw_conllu_str)
sentence = sentences[0]
for token in sentence:
    print(token)
```
This code keeps working because TokenList defines `__getitem__`, which lets it behave like a list for indexing and iteration. If you relied on any other aspect of the return value behaving like a list, you might have to change that code.
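The following sketch shows what `__getitem__` alone provides. MiniTokenList is a hypothetical class that wraps a plain list and defines only that method; it is not conllu's actual TokenList. Indexing and for loops work, because Python falls back to calling `__getitem__` with 0, 1, 2, ... until IndexError is raised. Other list behavior, such as len() or isinstance(x, list), does not come for free. Which of these work on the real TokenList depends on the other special methods it defines in your installed version.

```python
class MiniTokenList:
    """Hypothetical stand-in, not conllu's TokenList: defines only __getitem__."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def __getitem__(self, key):
        # Delegate indexing and slicing to the wrapped list.
        return self.tokens[key]


sentence = MiniTokenList(["The", "dog", "barks"])

print(sentence[0])    # The
print(sentence[-1])   # barks

# Iteration works: Python calls __getitem__(0), (1), ... until IndexError.
for token in sentence:
    print(token)

# Other list behavior is not inherited automatically.
print(isinstance(sentence, list))  # False
try:
    len(sentence)
except TypeError as err:
    print("len() fails:", err)
```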
```diff
 sentences = parse_tree(raw_conllu_str)
 root = sentences[0]
-print(root.data, root.children)
+print(root.token, root.children)
```
When switching from TreeNode to TokenTree, I also renamed the data attribute to token, so every place where you access .data has to access .token instead.
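If you need code that works against both versions during the switch, one option is a small accessor that prefers .token and falls back to .data. The sketch below pairs that accessor with a depth-first walk over .children. node_token and walk are hypothetical helper names. The nodes are SimpleNamespace stand-ins rather than real TreeNode or TokenTree objects, and each token is shown as a dict with a "form" key.

```python
from types import SimpleNamespace


def node_token(node):
    """Return a node's token: .token on a 1.0 TokenTree, .data on a 0.1 TreeNode."""
    if hasattr(node, "token"):
        return node.token
    return node.data


def walk(node, depth=0):
    """Depth-first walk yielding (depth, token) for each node under `node`."""
    yield depth, node_token(node)
    for child in node.children:
        yield from walk(child, depth + 1)


# Stand-in nodes (not conllu classes); only .token/.data and .children matter.
leaf = SimpleNamespace(token={"form": "The"}, children=[])
root = SimpleNamespace(token={"form": "dog"}, children=[leaf])
old_style = SimpleNamespace(data={"form": "dog"}, children=[])

for depth, token in walk(root):
    print("  " * depth + token["form"])  # "dog", then "  The"

print(node_token(old_style)["form"])  # "dog"
```

After the migration is complete, node_token(node) can be replaced by node.token directly.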
Parsing of IDs now includes ranges and decimals
Previously, only IDs in the form of positive integers were recognized. Now conllu supports ranges ("1-3") and decimals ("3.1") too. If your code relied on those values being returned as None, change that check to isinstance(value, int) instead.
```diff
 "1" -> 1
-"1-3" -> None
+"1-3" -> (1, "-", 3)
-"3.1" -> None
+"3.1" -> (3, ".", 1)
```
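The ID mapping above can be sketched as a small parser, followed by the isinstance check that replaces an old "is not None" test. This illustrates the mapping under stated assumptions and is not conllu's implementation. parse_id and its regular expressions are hypothetical, and any value that matches none of the three forms simply returns None here.

```python
import re

# Hypothetical patterns for the three ID forms; conllu's own parser may differ.
SINGLE_ID = re.compile(r"[0-9]+")
RANGE_ID = re.compile(r"([0-9]+)-([0-9]+)")
DECIMAL_ID = re.compile(r"([0-9]+)\.([0-9]+)")


def parse_id(value):
    """Map an ID string to an int or a (start, separator, end) tuple."""
    if SINGLE_ID.fullmatch(value):
        return int(value)
    match = RANGE_ID.fullmatch(value)
    if match:
        return (int(match.group(1)), "-", int(match.group(2)))
    match = DECIMAL_ID.fullmatch(value)
    if match:
        return (int(match.group(1)), ".", int(match.group(2)))
    return None  # unrecognized in this sketch


ids = [parse_id(value) for value in ["1", "1-3", "3.1", "2"]]
print(ids)  # [1, (1, '-', 3), (3, '.', 1), 2]

# Old check: ranges and decimals came back as None, so
#     regular = [i for i in ids if i is not None]
# New check: keep only plain integer IDs.
regular = [i for i in ids if isinstance(i, int)]
print(regular)  # [1, 2]
```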