fix: small InputSource related issues by aucampia · Pull Request #2255 · RDFLib/rdflib
Summary of changes
I have added a bunch of tests for InputSource handling, checking every kind of input source with most of the parsers. During this, I detected the following issues that I fixed:
- rdflib.util._iri2uri() should not URL quote the netloc parameter; the idna encoding already takes care of special characters. I removed the URL quoting of netloc.
- HexTuple parsing was handling the input source in a way that would only work for some input sources, and not raising errors for other input sources. I changed the input source handling to be more generic.
- rdflib.parser.create_input_source() incorrectly uses file.buffer instead of source.buffer when dealing with IO stream sources.
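
To illustrate the netloc point, here is a minimal sketch using only the standard library. It is not rdflib's actual _iri2uri() code: the helper name iri_to_uri_sketch, the example host, and the choice of safe characters are assumptions made for illustration. It shows that IDNA encoding alone yields an ASCII netloc, and that URL quoting the netloc on top of that mangles the port separator.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri_sketch(iri: str) -> str:
    """Hypothetical helper (not rdflib's _iri2uri()): convert an http(s) IRI to a URI.

    Assumes the IRI components are not already percent-encoded.
    """
    scheme, netloc, path, query, fragment = urlsplit(iri)
    # IDNA turns non-ASCII host labels into ASCII; no URL quoting afterwards.
    netloc = netloc.encode("idna").decode("ascii")
    # The remaining components are percent-encoded as usual.
    path = quote(path, safe="/")
    query = quote(query, safe="=&")
    fragment = quote(fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))


iri = "http://b\u00fccher.example:8080/a b"
uri = iri_to_uri_sketch(iri)

# URL quoting the netloc as well would percent-encode the ":" before the port.
over_quoted_netloc = quote(urlsplit(uri).netloc)
```

Here uri keeps host:port intact, while over_quoted_netloc contains %3A in place of the colon and is no longer a usable authority component.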
Other changes with no runtime impact include:
- Changed the HTTP mocking stuff in the tests slightly to accommodate serving arbitrary files, as I used this in the InputSource tests.
- Don't use google in tests as we keep getting urllib.error.HTTPError: HTTP Error 429: Too Many Requests from it.
Checklist
- Checked that there aren't other open pull requests for the same change.
- Added tests for any changes that have a runtime impact.
- Checked that all tests and type checking pass.
- Considered granting push permissions to the PR branch, so maintainers can fix minor issues and keep your PR up to date.
Coverage: 90.771%. Remained the same when pulling ec2e5c6 on aucampia:iwana-20230307T1924-fix_iri2uri into a146e0a on RDFLib:main.
aucampia marked this pull request as ready for review
I did this while looking at #1844 - but this does not quite address anything there. I plan to make a separate PR with some security documentation and warnings, and maybe an example of how to use Python auditing and urllib.request.install_opener to mitigate the issue. I'm going to consider that as closing the matter.
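
As a rough idea of what the auditing part of such a mitigation could look like (this is only a sketch, not the planned documentation): the standard library raises a urllib.Request audit event before a request is opened, so an audit hook can reject URLs for hosts outside an allow-list. The ALLOWED_HOSTS set and the hook name block_unlisted_hosts are hypothetical, and the install_opener side is not covered here.

```python
import sys
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would choose its own hosts.
ALLOWED_HOSTS = {"example.org"}


def block_unlisted_hosts(event: str, args: tuple) -> None:
    """Audit hook that rejects urllib requests to hosts not in ALLOWED_HOSTS."""
    if event != "urllib.Request":
        return  # the hook sees every audit event, so return quickly for others
    full_url = args[0]  # event arguments: fullurl, data, headers, method
    host = urlsplit(full_url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"blocked request to {host!r}")


# Audit hooks cannot be removed once added; they stay for the process lifetime.
sys.addaudithook(block_unlisted_hosts)
```

With the hook installed, a call such as urllib.request.urlopen("http://blocked.invalid/") raises PermissionError before any connection is attempted. URLs without a hostname, such as file: URLs, are rejected as well in this sketch.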
Reduced the size of changes a bit, will merge with no review as this is pretty well tested code and the non-test changes are very minor.
I have added a bunch of tests for InputSource handling, checking every kind of input source with every parser. During this, I detected the following issues that I fixed:
- rdflib.util._iri2uri() was URL quoting the netloc parameter, but this is wrong and the idna encoding already takes care of special characters. I removed the URL quoting of netloc.
- HexTuple parsing was handling the input source in a way that would only work for some input sources, and not raising errors for other input sources. I changed the input source handling to be more generic.
- rdflib.parser.create_input_source() incorrectly used file.buffer instead of source.buffer when dealing with IO stream sources.
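
The IO stream case can be pictured with a small standalone sketch. This is not the code in rdflib.parser.create_input_source(); the helper name as_binary_stream is hypothetical. The point is that when a text stream is passed in, the underlying binary stream has to be taken from the same object that was inspected (source here), not from another variable.

```python
import io
from typing import BinaryIO, Union


def as_binary_stream(source: Union[io.TextIOBase, BinaryIO]) -> BinaryIO:
    """Hypothetical helper: return a binary stream for a text or binary source."""
    if isinstance(source, io.TextIOBase):
        # Read .buffer from the object that was just type-checked (source).
        return source.buffer
    return source


raw = io.BytesIO(b"<urn:a> <urn:b> <urn:c> .\n")
text = io.TextIOWrapper(raw, encoding="utf-8")

binary_from_text = as_binary_stream(text)
binary_from_raw = as_binary_stream(raw)
```

Both calls end up with the same underlying BytesIO object. Note that not every text stream exposes a .buffer attribute, so real code would need to handle that case separately.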
Other changes with no runtime impact include:
- Changed the HTTP mocking stuff in the tests slightly to accommodate serving arbitrary files, as I used this in the InputSource tests.
- Don't use google in tests as we keep getting urllib.error.HTTPError: HTTP Error 429: Too Many Requests from it.
aucampia deleted the iwana-20230307T1924-fix_iri2uri branch