Only use unicode-compatible values for lexical space by nateprewitt · Pull Request #674 · RDFLib/rdflib

@joernhees I'm not tied to the idea of requiring them to be manually decoded, but I do think the approach provides the least amount of discomfort for the most users.

The three primary issues I see with automatically decoding are the subtle modification of the Literal value, the issues with bytes objects between Python 2/3, and as you noted, the fact that Literal is defined to be unicode in the class hierarchy.

With automatic decoding, I can run a SPARQL query and get the Literal back as its Base64-encoded value. When I compare that with the value of my Literal node in Python, the two will differ, and it will likely be hard to work out why without digging into this issue tracker. We also have the problem that the decoded content may not be string compatible, which isn't true for the rest of RDFLib's literals. If I write a loop that treats my literals as strings, RDFLib may crash there, which can be painful, especially with large or changing datasets.
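To make the mismatch concrete, here's a rough sketch using only the standard library. It doesn't call RDFLib at all; the sample bytes and the `as_text` helper are made up for illustration, and I'm assuming Python 3 semantics.

```python
import base64

# Hypothetical binary payload; 0xff is not a valid UTF-8 start byte.
raw = b"\xff\xfe\x00binary"

# The Base64 text form, i.e. what a serialized literal would carry.
lexical = base64.b64encode(raw).decode("ascii")

# The text form round-trips to the payload, but the two are never equal,
# so comparing a query result against a decoded value fails.
round_trips = base64.b64decode(lexical) == raw
values_match = lexical == raw


def as_text(value):
    """Try to treat a decoded value as text; return None if it isn't."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Code that assumes every literal is text breaks on the decoded payload.
decoded_text = as_text(raw)
```

Without a guard like the `try`/`except` above, the `decode` call is where a loop over literals would blow up.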

Regarding returning bytes, I'm generally in favor of bytes over encoded strings, but this should preferably be done uniformly. Returning bytes here introduces the issue of the literal having a type of str in Python 2 and bytes (not str/unicode) in Python 3. That means code can diverge unexpectedly when the same program runs under different versions. Some of this is to be expected when switching versions, but I've found that things like isinstance(value, str) or str(value) are common stumbling blocks, even for those experienced with Python. It's easy for unchanged code that was working to suddenly break without an immediate explanation of why.
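A quick illustration of the two stumbling blocks I mean, in plain Python with no RDFLib involved. The `value` here is just a made-up stand-in for whatever a decoded literal would hand back; the comments describe Python 3 behavior.

```python
# Stand-in for a decoded binary literal value.
value = b"hello"

# True on Python 2 (where bytes is str), False on Python 3.
is_str = isinstance(value, str)

# On Python 3 this does not decode; it returns the repr-style "b'hello'".
text = str(value)

# What the caller almost certainly wanted instead.
decoded = value.decode("utf-8")
```

Neither line raises, which is what makes this hard to spot: the code keeps running with the wrong value.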

As for Literals subclassing unicode, returning bytes fundamentally undermines some core premises of the library. While we may be able to remove that single cast to unicode from XSDToPython, most of the Literal class's functionality relies on the value being castable to unicode. We lose the ability to use all of the comparisons (__eq__, __lt__, etc.) and other basic functionality.
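Roughly what I mean, as a sketch: `TextLiteral` below is a hypothetical stand-in for a text-backed class, not RDFLib's actual Literal, and the behavior shown is Python 3's.

```python
class TextLiteral(str):
    """Hypothetical text-backed literal; NOT RDFLib's Literal class."""


a = TextLiteral("abc")

# Comparisons come for free from the text base class.
same = a == "abc"
before = a < "abd"

# A bytes value gets none of that against text.
payload = b"abc"
equal_to_text = payload == "abc"  # bytes never compare equal to str

try:
    payload < "abd"
    ordering_works = True
except TypeError:
    # Ordering bytes against str is not supported on Python 3.
    ordering_works = False
```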

All that said, I can look into amending the patch for a more agreeable solution if this one is a non-starter.