[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Juraj Sukop juraj.sukop at gmail.com
Fri Jan 10 18:17:02 CET 2014


(Sorry if this messes up the thread order; it is meant as a reply to the original RFC.)

Dear list,

newbie here. After much hesitation I decided to put forward a use case which bothers me about the current proposal. Disclaimer: I am writing a library which is directly affected by this.

As you may know, PDF operates over bytes and an integer or floating-point number is written down as-is, for example "100" or "1.23".

However, the proposal drops "%d", "%f" and "%x" formats and the suggested workaround for writing down a number is to use ".encode('ascii')", which I think has two problems:

One is that it needs to construct one additional object per formatting as opposed to Python 2; it is not uncommon for a PDF file to contain millions of numbers.

The second problem is that, in my eyes, it is very counter-intuitive to require the use of str only to get formatting on bytes. Consider the case where a large bytes object is created out of many smaller bytes objects. If I want to format one of the parts, I have to use str for that part instead. For example:

content = b''.join([
    b'header',
    b'some dictionary structure',
    b'part 1 abc',
    ('part 2 %.3f' % number).encode('ascii'),
    b'trailer'])

In the case of PDF, embedding an image looks like:

10 0 obj
  << /Type /XObject
     /Width 100
     /Height 100
     /Alternates 15 0 R
     /Length 2167
  >>
stream
...binary image data...
endstream
endobj

Because of the image it makes sense to store such a structure inside bytes. On the other hand, there may well be another "obj" which contains the coordinates of Bezier paths:

11 0 obj
...
stream
0.5 0.1 0.2 RG
300 300 m
300 400 400 400 400 300 c
b
endstream
endobj
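
For illustration, here is a minimal sketch of how such an object might be assembled with the suggested workaround. The helper name pdf_stream_object and the sample values are made up for this example and are not taken from any library:

```python
def pdf_stream_object(obj_num, payload, extra_entries=b''):
    """Wrap raw bytes in a PDF stream object (hypothetical helper)."""
    # Every number has to go through str formatting and then be encoded.
    header = ('%d 0 obj\n' % obj_num).encode('ascii')
    length = ('/Length %d' % len(payload)).encode('ascii')
    return b''.join([
        header,
        b'<< ', extra_entries, length, b' >>\n',
        b'stream\n',
        payload,
        b'\nendstream\nendobj\n',
    ])


# Path operators from the example above; the colour values are sample data.
r, g, b = 0.5, 0.1, 0.2
path = ('%.1f %.1f %.1f RG\n300 300 m\n300 400 400 400 400 300 c\nb'
        % (r, g, b)).encode('ascii')
obj = pdf_stream_object(11, path)
```

The payload itself stays bytes throughout; only the numeric parts take the detour through str.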

To summarize, there are cases which mix "binary" and "text" and, in my opinion, dropping the bytes-formatting of numbers makes it more complicated than it was. I would appreciate any explanation on how:

b'%.1f %.1f %.1f RG' % (r, g, b)

is more confusing than:

b'%s %s %s RG' % tuple(map(lambda x: (u'%.1f' % x).encode('ascii'), (r, g, b)))
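
The workaround can of course be hidden behind a small helper. This is only a sketch: the name ascii_format is hypothetical, and the helper still creates the intermediate str object mentioned above:

```python
def ascii_format(template, *args):
    """Format with a str template, then encode the result to bytes."""
    # template is a str such as '%.1f %.1f %.1f RG'; any non-ASCII
    # character in the result raises UnicodeEncodeError.
    return (template % args).encode('ascii')


r, g, b = 0.5, 0.1, 0.2  # sample colour values
stroke_colour = ascii_format('%.1f %.1f %.1f RG', r, g, b)
```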

A similar situation exists for HTTP ("Content-Length: 123") and ASCII STL ("vertex 1.0 0.0 0.0").
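
A short sketch of what these two cases look like with the workaround. The function names are hypothetical and the line endings are my assumption about how such lines would be terminated:

```python
def content_length_header(body):
    """Return an HTTP Content-Length header line for a bytes body."""
    return ('Content-Length: %d\r\n' % len(body)).encode('ascii')


def stl_vertex(x, y, z):
    """Return one ASCII STL vertex line as bytes."""
    return ('vertex %.1f %.1f %.1f\n' % (x, y, z)).encode('ascii')
```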

Thanks and have a nice day,

Juraj Sukop

PS: In case the proposal does not include number formatting, it would be nice to list in it a set of guidelines or examples on how to proceed with porting Python 2 formats to Python 3.


