[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
Juraj Sukop juraj.sukop at gmail.com
Sat Jan 11 00:40:28 CET 2014
On Fri, Jan 10, 2014 at 10:52 PM, Chris Barker <chris.barker at noaa.gov> wrote:
On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop <juraj.sukop at gmail.com> wrote:
As you may know, PDF operates over bytes and an integer or floating-point number is written down as-is, for example "100" or "1.23".
Just to be clear here -- is PDF specifically bytes+ascii? Or could there be some-other-encoding unicode in there?
From the specs: "At the most fundamental level, a PDF file is a sequence of 8-bit bytes." But it is also possible to represent a PDF using printable ASCII + whitespace by using escapes and "filters". Then, there are also "text strings" which might be encoded in UTF-16.
What this all means is that PDF objects are expressed in ASCII, "stream" objects such as images and fonts may have a binary part, and I have never seen those UTF-16 strings.
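To make the layout concrete, here is a minimal sketch of one stream object: ASCII syntax wrapped around a binary payload. The function name `pdf_stream_object` and the sample payload are made up for illustration, and a real stream dictionary would usually carry more entries than `/Length`.

```python
def pdf_stream_object(obj_num: int, payload: bytes) -> bytes:
    """Assemble one indirect stream object (hypothetical helper)."""
    # Numbers are written as plain ASCII decimal text, then encoded to bytes.
    header = "{} 0 obj\n<< /Length {} >>\nstream\n".format(obj_num, len(payload))
    # The payload is copied through untouched; it may contain any byte value.
    return header.encode("ascii") + payload + b"\nendstream\nendobj\n"


# Illustrative payload mixing non-ASCII bytes with ASCII text.
obj = pdf_stream_object(4, b"\x00\xff\x89binary")
```

Without bytes formatting, the ASCII part has to be formatted as str and encoded separately before it can be joined to the payload.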
u"stream\n%s\nendstream\nendobj"%binary_data.decode('latin-1')
The argument for dropping "%f" et al. has been that if something is text, then it should be Unicode. Conversely, if it is not text, then it should not be Unicode.
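For reference, the latin-1 round trip quoted above can be sketched in Python 3 as follows. This is only an illustration, assuming `binary_data` is an arbitrary bytes object (here, every possible byte value); it shows that the detour through str is lossless, not that it is the right model for the data.

```python
binary_data = bytes(range(256))  # assumed sample: every possible byte value

# latin-1 maps byte N to code point N, so decoding never fails.
text = "stream\n%s\nendstream\nendobj" % binary_data.decode("latin-1")

# Encoding with the same codec restores the original bytes exactly.
raw = text.encode("latin-1")
```

The binary payload spends part of its life as a str here, which is exactly the mismatch described above.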