namespace.py fix compute_qname missing namespaces by tgbugs · Pull Request #649 · RDFLib/rdflib
Conversation
The way compute_qname worked was to call split_uri
and then check for a match in self.store.prefix.
This produced incomplete behavior for namespaces that
end in LETTERS_ instead of / or #. This commit corrects
this behavior by iterating through name and testing namespace + name[:i]
to see if there is a matching prefix.
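The approach in that commit message can be sketched roughly as follows. This is a simplified stand-in, not rdflib's actual code: `prefixes` here is a plain dict standing in for the `self.store.prefix` lookup, and the naive split at the last `/` or `#` stands in for rdflib's `split_uri`.

```python
def compute_qname_sketch(uri, prefixes):
    # naive split at the last '#' or '/' (stand-in for rdflib's split_uri)
    for sep in ('#', '/'):
        if sep in uri:
            idx = uri.rindex(sep) + 1
            namespace, name = uri[:idx], uri[idx:]
            break
    else:
        return None  # no separator at all, cannot split
    # try progressively longer candidate namespaces, keeping the longest
    # one that matches a registered prefix (handles GENO_-style namespaces)
    best = None
    for i in range(len(name) + 1):
        candidate = namespace + name[:i]
        if candidate in prefixes:
            best = (prefixes[candidate], candidate, name[i:])
    return best

prefixes = {'http://purl.obolibrary.org/obo/GENO_': 'GENO'}
print(compute_qname_sketch(
    'http://purl.obolibrary.org/obo/GENO_0000385', prefixes))
# ('GENO', 'http://purl.obolibrary.org/obo/GENO_', '0000385')
```

Note the loop runs to `len(name) + 1` so that a URI that is exactly a bound namespace can still match, which is precisely the off-by-one discussed later in this thread.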
tests fail... what spec are you referring to when saying that namespaces should not end in / or # ?
Maybe I have incorrectly assumed that OWLAPI implements prefix handling according to the spec; this change was intended to match its behavior. To clarify about / and # endings: those are correct, and they remain correct with my changes. Essentially this adds the ability to use arbitrary prefixes.
I will check again to see if this is in 4.2.2 and figure out why the tests are failing.
@tgbugs - i decided to fix this together with #632 in the turtle serializer itself. see here: #661
and for the full discussion between @gromgull and myself see #660
@satra the solution in #661 looks good to me. Closing this. Thanks!
I have reverted the change to NAME_START_CATEGORIES that was causing the build failures. Tests should pass now.
@tgbugs - with this change does the following work for you:

```python
graph = Graph()
graph.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
graph.bind('RO_has_phenotype',
           'http://purl.obolibrary.org/obo/RO_0002200')
graph.add((URIRef('http://example.org'),
           URIRef('http://purl.obolibrary.org/obo/RO_0002200'),
           URIRef('http://purl.obolibrary.org/obo/GENO_0000385')))
output = [val for val in
          graph.serialize(format='turtle').decode().splitlines()
          if not val.startswith('@prefix')]
output = ' '.join(output)
assert 'RO_has_phenotype: ' in output
assert 'GENO:0000385' in output
```
@satra No, that example fails, producing `' <http://example.org> ns1:RO_0002200 ns1:GENO_0000385 . '`. From my experiments the issue arises because a call to `graph.namespace_manager.qname(URIRef('http://purl.obolibrary.org/obo/RO_0002200'))` (which is buried in serialize) forces the creation of `('ns1', rdflib.term.URIRef('http://purl.obolibrary.org/obo/'))`. A call to `graph.namespace_manager.qname(URIRef('http://purl.obolibrary.org/obo/GENO_0000385'))` does not cause this behavior, which leads me to believe it is because my code stops just short of the end (it assumes that users would not prefix an entire IRI). I think I can fix this and will look into it.
@satra Fixed. It was stopping at `len(name)` instead of `len(name) + 1`. Your tests now pass. If we care about efficiency, the slowest part of this seems to be that I have to convert the concatenated name to a URIRef every time in order to check the prefixes. I can pull your changes to `test/test_turtle_serialize.py` into this PR if you want.
@tgbugs - great. could you please add the test below to `test_turtle_serialize.py`:
@gromgull and @joernhees - let's focus on this PR and i'll close the other one. any questions or considerations we are overlooking?
```python
def test_turtle_namespace():
    graph = Graph()
    graph.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
    graph.bind('RO_has_phenotype',
               'http://purl.obolibrary.org/obo/RO_0002200')
    graph.add((URIRef('http://example.org'),
               URIRef('http://purl.obolibrary.org/obo/RO_0002200'),
               URIRef('http://purl.obolibrary.org/obo/GENO_0000385')))
    output = [val for val in
              graph.serialize(format='turtle').decode().splitlines()
              if not val.startswith('@prefix')]
    output = ' '.join(output)
    assert 'RO_has_phenotype: ' in output
    assert 'GENO:0000385' in output
```
I slightly modified the test case for this so that it forces us to match the longest prefix contained in an IRI instead of the shortest. I'm pretty sure we can improve significantly on the performance here if need be.
Actual issue: if there is an exact prefix match then it doesn't check to see if there are longer matches. Will fix.
@gromgull - just a ping here to see if this can be merged. also is there a timeline for a release?
I hope to get a 4.2.2 out "quite soon", i.e. maybe before end of february. I'll review this before then!
```python
output_prefix = prefix
output_namespace = namespace
output_name = name
for i in xrange(1, len(name) + 1):
```
we really cannot have this loop here. This function gets called for every single URI in the store on serialisation.
You can have a long URI, and this will call the store for each substring of that URI. For some stores, this means network activity (sqlalchemy, etc.).
If we really want to fix this, the namespace manager needs to index the available prefixes by length in `__init__` and then try the longest first. But even this means many "does the URI start with x" calls. Some sort of trie data structure may be useful here.
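The longest-first idea suggested here can be sketched as below. This is an illustrative sketch, not rdflib's API: `PrefixIndex`, `bind`, and `longest_match` are names I've made up, and the trie variant would replace the length scan with a character walk.

```python
class PrefixIndex:
    """Index namespaces by length so lookups try the longest first."""

    def __init__(self):
        self._by_len = {}  # length -> {namespace: prefix}

    def bind(self, prefix, namespace):
        self._by_len.setdefault(len(namespace), {})[namespace] = prefix

    def longest_match(self, uri):
        # walk candidate lengths from longest to shortest; each probe is
        # a single O(1) dict lookup on a fixed-length slice of the URI
        for n in sorted(self._by_len, reverse=True):
            if n <= len(uri):
                prefix = self._by_len[n].get(uri[:n])
                if prefix is not None:
                    return (prefix, uri[:n], uri[n:])
        return None

idx = PrefixIndex()
idx.bind('obo', 'http://purl.obolibrary.org/obo/')
idx.bind('GENO', 'http://purl.obolibrary.org/obo/GENO_')
print(idx.longest_match('http://purl.obolibrary.org/obo/GENO_0000385'))
# ('GENO', 'http://purl.obolibrary.org/obo/GENO_', '0000385')
```

The first hit is guaranteed to be the longest match, so no loop over substrings of the URI (and no per-substring store call) is needed.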
Ok. This is what I expected we would have to do because this implementation produces worst case behavior all the time. Will take a look and see how it can be improved. Do we have any nice pathological cases that I could use for performance testing?
So my first round of performance testing on the 3 different implementations produced some weird results which I think are not representative of the spectrum of workloads. I serialized https://github.com/SciCrunch/NIF-Ontology/blob/master/ttl/NIF-Molecule.ttl 10 times using 3 different implementations and averaged the time spent calling compute_qname in namespace.py as measured by cProfile under Python 3.5; there were 246680 calls to compute_qname for each run.
- current mainline rdflib implementation of compute_qname: 1.2658s
- loop-over-uri implementation (current for this pull request) of compute_qname: 1.1765s
- loop-on-namespaces implementation of compute_qname: 1.2025s
This was a quick and dirty test and I am going to set up a more robust suite, since I don't believe that the implementation in question here is actually the fastest for all workloads, as the numbers above would seem to suggest.
Turns out `string.split` is very, very slow. A new implementation using `len` and slicing gives 1.162s using the same method as above.
```python
if prefix is None:
    name = None
    for lennamespace, prefix, namespace in self.ordered_prefixnames:
        if namespace in uri:
```
this should be `if uri.startswith(namespace):`
My testing showed that `startswith` is slower; however, if there are cases where a namespace might somehow appear somewhere other than at the start of a URI, then we would need to use `startswith` to prevent false positives. Personally I haven't been able to come up with any cases where this might happen.
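One such case does exist: a bound namespace can appear in the middle of an unrelated URI, for instance inside a query string. A minimal illustration (the redirect URI is a made-up example):

```python
namespace = 'http://purl.obolibrary.org/obo/'
uri = ('http://example.org/redirect'
       '?to=http://purl.obolibrary.org/obo/GENO_0000385')

# substring containment wrongly claims a match
print(namespace in uri)           # True
# startswith correctly rejects it
print(uri.startswith(namespace))  # False
```

So `namespace in uri` would treat this redirect URI as belonging to the OBO namespace, which supports the "correctness > speed" point below.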
yupp... correctness > speed
i'd leave the old method in as is and only if that doesn't find anything try your approach...
anyhow, we should probably overhaul the whole thing and use some trie / KMP
Do you have any examples where the old method is faster than the new one?
The old method uses a dict for lookup (so O(1) per lookup), based on splitting the URI into parts (so O(m) for m parts per URI). Overall O(m).
Your method goes through all namespaces (on average n/2), so uses O(n) for n registered namespaces.
The old method will scale better in light of many registered namespaces. Your method is likely to be faster if only a few namespaces are registered.
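A toy illustration of the old dict-based method described here. The names and the split rule are mine, not rdflib's: the real `split_uri` uses NAME_START_CATEGORIES rather than just `#` and `/`, but the shape is the same, a backward scan over the local name followed by a single hash lookup.

```python
def old_style_qname(uri, prefixes):
    # scan backwards over the local name to find the split point;
    # cost is proportional to the local-name length, not the number
    # of registered namespaces
    for i in range(len(uri) - 1, -1, -1):
        if uri[i] in '#/':
            ns, name = uri[:i + 1], uri[i + 1:]
            prefix = prefixes.get(ns)  # single O(1) dict probe
            # old behaviour: split once, succeed or fail; no retries
            return (prefix, ns, name) if prefix else None
    return None

prefixes = {'http://example.org/ns#': 'ex'}
print(old_style_qname('http://example.org/ns#thing', prefixes))
# ('ex', 'http://example.org/ns#', 'thing')
```

The weakness this PR targets is visible in the sketch: a namespace like `...obo/GENO_` never ends at a `#`/`/` boundary, so the single split can never find it.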
I'm happy to try the old implementation first, but it can produce incorrect results in cases where a valid namespace is defined that does not terminate in members of NAME_START_CATEGORIES (e.g. https://github.com/tgbugs/rdflib/blob/master/test/test_turtle_serialize.py#L78 will fail), so it will take more changes to make sure we are actually getting the longest matching namespace. In addition, looking at the existing split_uri method, I wonder whether on average URIs are longer (m) or there are more prefixes defined (n). Clearly, without knowing the empirical distributions of m and n we can't know for certain, but I suspect that on average m is going to be larger than n. Thoughts?
edit: My analysis is not quite right, because split_uri actually starts from the back of the URI, so m is actually determined by the average fragment length, which means that m is indeed smaller than n on average. However, this means we still need to deal with the prefix length issue, so I will take a look.
OK, we're going through an rdflib management change and when that settles down and all roles are known - only a few days off - we'll be reviewing all open PRs and dealing with as many as we can. Sorry this has waited for so long but just give it one week more please! Nick
I'm working on rebasing this PR so that it is easier to review and merge.
test for longest substring check if split_uri produced an exact prefix
squashed previous bad implementation attempts here
merged with rdflib/master resolving conflicts in favor of tgbugs
Attempting to implement the 'faster' version of compute_qname revealed significant issues with the current split_uri implementation.
The bug caused only the first of multiple urls with a common prefix to be inserted into the subtrie of a new prefix
If Nd is not in namestart categories then we get nasty nondeterminism in serialization of hex strings.
used a tuple for speed as well
If we do not repopulate the trie with known namespaces then we will not be able to retrieve them with split_uri leading to major confusion. Added a test to catch this case.
Fixed compute_qname bad behavior where it would fail to split a URL and return nothing despite the fact that the URL itself was a valid namespace. compute_qname now behaves consistently so that http://foo.com/ -> foo: if foo is defined as the prefix. Added tests to make sure this doesn't happen again.
These tests make it possible to catch issues with xml serialization of qnames.
The rdfxml serializer relied on namespace.py compute_qname to enforce qname correctness. Changing the split characteristics to support n3 prefixes broke that assumption. This commit is a first pass at fixing the behavior. It may be worth renaming functions to make it clear that the current 'compute_qname' function is no longer actually computing a qname but an n3 prefix (or something like that).
Opening and closing to trigger Travis
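The exact-namespace behavior described in the commit messages above (http://foo.com/ -> foo:) can be sketched as follows. This is a hypothetical stand-in, not rdflib's compute_qname; `qname_for` and the return shape `(prefix, namespace, name)` mirror the description in the thread.

```python
def qname_for(uri, prefixes):
    # exact match: the URI *is* a bound namespace, so the local name is
    # empty rather than the lookup failing outright (the bug being fixed)
    if uri in prefixes:
        return (prefixes[uri], uri, '')
    # otherwise use the longest registered namespace that prefixes the URI
    matches = [ns for ns in prefixes if uri.startswith(ns)]
    if not matches:
        return None
    ns = max(matches, key=len)
    return (prefixes[ns], ns, uri[len(ns):])

prefixes = {'http://foo.com/': 'foo'}
print(qname_for('http://foo.com/', prefixes))     # ('foo', 'http://foo.com/', '')
print(qname_for('http://foo.com/bar', prefixes))  # ('foo', 'http://foo.com/', 'bar')
```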