[Python-Dev] Bytes path related questions for Guido

Nick Coghlan ncoghlan at gmail.com
Mon Aug 25 01:19:19 CEST 2014


On 25 Aug 2014 03:55, "Guido van Rossum" <guido at python.org> wrote:

Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised).

Thanks!

The status of pathlib is a little unclear to me -- is there a plan to eventually support bytes or not?

It's text only and Antoine plans to keep it that way - the concatenation operations, etc, are really only safe if you decode first.
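
As a rough illustration of the "decode first" point (a sketch only, not something taken from pathlib's documentation or from the tracker issues): the low-level os functions accept and return bytes paths, and os.fsdecode can turn those into text before handing them to pathlib. The helper name to_path and the file name example.txt are made up for this example.

```python
import os
import tempfile
from pathlib import Path


def to_path(raw):
    """Hypothetical helper: decode a bytes path, then wrap it in Path."""
    # os.fsdecode applies the filesystem encoding; on POSIX it uses the
    # 'surrogateescape' error handler, so undecodable bytes are preserved
    # as lone surrogates rather than raising an error.
    return Path(os.fsdecode(raw))


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = os.fsencode(tmp)
    # Low-level os functions accept bytes paths and return bytes results.
    with open(os.path.join(raw_dir, b"example.txt"), "wb") as f:
        f.write(b"data")
    names = os.listdir(raw_dir)
    # Decode before using the text-only pathlib API.
    paths = [to_path(os.path.join(raw_dir, name)) for name in names]
```

os.fsencode(path) goes the other way when a bytes path is needed again.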

For #2 I think you should probably just work with the others you have mentioned.

Yes, that sounds like a good idea. There's been some good progress on the issue tracker, so I think we can thrash out some workable (and comprehensible!) utilities that will be useful in their own right while also serving as aids to understanding for the underlying mechanisms.

Cheers, Nick.

On Sat, Aug 23, 2014 at 9:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

At Guido's request, splitting out two specific questions from Serhiy's thread where I believe we could do with an explicit "yes or no" from him.

1. Should we accept patches adding support for the direct use of bytes paths in lower level filesystem manipulation APIs? (i.e. everything that isn't pathlib)

This was Serhiy's original question (due to some open issues [1,2]). I think the answer is yes, as we already do in some cases, and the "pathlib doesn't support binary paths" design decision is a high level platform independent API vs low level potentially platform dependent API one rather than being about disallowing the use of bytes paths in general.

[1] http://bugs.python.org/issue19997
[2] http://bugs.python.org/issue20797

2. Should we add some additional helpers to the string module for dealing with surrogate escaped bytes and other techniques for smuggling arbitrary binary data as text?

My proposal [3] is to add:

* string.escapedsurrogates (constant with the 128 escaped code points)
* string.clean(s): replaces surrogates with '\ufffd' or another specified code point
* string.redecode(s, encoding): encodes a string back to bytes and then decodes it again using the specified encoding (the old encoding defaults to 'latin-1' to match the assumptions in WSGI)

"s != string.clean(s)" would then serve as a check for "does this string contain any surrogate escaped bytes?"

[3] http://bugs.python.org/issue18814#msg225791

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
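
For readers trying to picture the helpers proposed in question 2, here is one possible sketch. It is purely illustrative: none of these names exist in the standard library string module, and the parameter names replacement and previous_encoding, as well as the use of 'surrogateescape' on both the encode and decode steps, are assumptions rather than part of the proposal text.

```python
# Hypothetical sketch of the proposed helpers; not a stdlib API.

# The 'surrogateescape' error handler maps undecodable bytes 0x80-0xFF
# to the 128 lone surrogates U+DC80-U+DCFF.
_FIRST, _LAST = 0xDC80, 0xDCFF
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(_FIRST, _LAST + 1))


def clean(s, replacement="\ufffd"):
    """Replace surrogate escaped bytes with a replacement string."""
    table = dict.fromkeys(range(_FIRST, _LAST + 1), replacement)
    return s.translate(table)


def redecode(s, encoding, previous_encoding="latin-1"):
    """Encode s back to bytes, then decode it again as `encoding`."""
    # Assumption: surrogateescape is used in both directions so that
    # escaped bytes round-trip instead of raising an error.
    raw = s.encode(previous_encoding, "surrogateescape")
    return raw.decode(encoding, "surrogateescape")


# Latin-1 bytes decoded as UTF-8 leave an escaped byte behind.
escaped = b"caf\xe9".decode("utf-8", "surrogateescape")
has_escapes = escaped != clean(escaped)

# UTF-8 data that was wrongly decoded as latin-1 (the WSGI situation).
mojibake = "caf\u00e9".encode("utf-8").decode("latin-1")
repaired = redecode(mojibake, "utf-8")
```

Under these assumptions, the "s != clean(s)" comparison is what answers "does this string contain any surrogate escaped bytes?".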



--
--Guido van Rossum (python.org/~guido)


