On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Why not build "10 0 obj ... stream" and "endstream endobj" in
Unicode and then encode to ASCII? Example:
data = b''.join((
  ("%d %d obj ... stream" % (10, 0)).encode('ascii'),
  binary_image_data,
  ("endstream endobj").encode('ascii'),
))
The key is "encode to ASCII", which means that the result is bytes. Then there is this "11 0 obj", which should also be bytes. But it has no "binary_image_data", only lots of numbers waiting to be somehow converted to bytes. I already mentioned the problems with ".encode('ascii')", but it does not stop there. Numbers may appear not only inside "streams" but almost anywhere: the header contains the PDF version, an image has to have a "width" and a "height", and at the end of the PDF there is a structure containing the offsets of all the objects in the file. Basically, calling ".encode('ascii')" on every possible number is not exactly simple or pretty.
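
To make that concrete, here is a minimal sketch of what the "format as text, then encode" approach looks like once the numbers spread beyond a single object. It is not a real PDF writer: the helper names (pdf_header, image_dict, xref_table) are hypothetical, the object layout is simplified, and the image bytes are a placeholder.

```python
# Hypothetical helpers: every number has to go through str formatting
# and .encode('ascii') before it can be joined with the binary data.

def pdf_header(major, minor):
    # The PDF version in the header is a pair of numbers.
    return ("%%PDF-%d.%d\n" % (major, minor)).encode('ascii')

def image_dict(width, height, length):
    # An image dictionary carries its width, height and stream length.
    return ("<< /Type /XObject /Subtype /Image"
            " /Width %d /Height %d /Length %d >>\n"
            % (width, height, length)).encode('ascii')

def xref_table(offsets):
    # Cross-reference table: a zero-padded 10-digit byte offset and a
    # 5-digit generation number for each object in the file.
    lines = [("xref\n0 %d\n" % (len(offsets) + 1)).encode('ascii'),
             b"0000000000 65535 f \n"]
    for offset in offsets:
        lines.append(("%010d %05d n \n" % (offset, 0)).encode('ascii'))
    return b''.join(lines)

binary_image_data = b'\xff\xd8\xff\xe0'  # placeholder, not a real image

header = pdf_header(1, 4)
obj = b''.join((
    ("%d %d obj\n" % (10, 0)).encode('ascii'),
    image_dict(640, 480, len(binary_image_data)),
    b"stream\n",
    binary_image_data,
    b"\nendstream\nendobj\n",
))
# The object starts right after the header, so its offset is len(header).
data = b''.join((header, obj, xref_table([len(header)])))
```

Every helper repeats the same format-then-encode step, and only the raw image data goes in untouched.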