Wait a second, this is how I understood it but what Nick said made me think otherwise...

On Sun, Jan 12, 2014 at 6:22 PM, Steven D'Aprano <steve@pearwood.info> wrote:

On Sun, Jan 12, 2014 at 12:52:18PM +0100, Juraj Sukop wrote:
> On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano <steve@pearwood.info> wrote:
>
> Just to check I understood what you are saying. Instead of writing:
>
>     content = b'\n'.join([
>         b'header',
>         b'part 2 %.3f' % number,
>         binary_image_data,
>         utf16_string.encode('utf-16be'),
>         b'trailer'])

Which doesn't work, since bytes don't support %f in Python 3.

I know, and this was an example of the ideal (for me, anyway) way of formatting bytes.
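
For comparison, here is a minimal sketch of how the same bytes can be built without %-formatting on bytes: format the number as text, then encode that one piece as ASCII before joining. The sample values for number, binary_image_data and utf16_string are made up for illustration and do not come from the thread.

```python
# Hypothetical sample values, chosen only to make the sketch runnable.
number = 3.14159
binary_image_data = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])
utf16_string = '\u03a9'  # a Unicode string with a code point above U+00FF

content = b'\n'.join([
    b'header',
    # Format as str first, then encode; the formatted number is pure ASCII.
    ('part 2 %.3f' % number).encode('ascii'),
    binary_image_data,
    utf16_string.encode('utf-16be'),
    b'trailer'])
```

This keeps the image data as bytes the whole time, at the cost of an explicit encode() on every formatted piece.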

First, "utf16_string" confuses me. What is it? If it is a Unicode
string, i.e.:

It is a Unicode string which happens to contain code points above U+00FF (as with the TTF example above), so it triggers the (at least) 2-byte memory representation in CPython 3.3+. I agree, I chose the variable name poorly, my bad.

    content = '\n'.join([
        'header',
        'part 2 %.3f' % number,
        binary_image_data.decode('latin-1'),
        utf16_string,  # Misleading name, actually Unicode string
        'trailer'])

Which, because of that horribly-named variable, prevents the use of a simple memcpy and makes the image data occupy way more memory than it did as plain bytes.
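
The memory point can be checked with a rough sketch like the one below. It assumes CPython 3.3 or later, where a str is stored with 1, 2 or 4 bytes per character depending on its widest code point. The image data here is a made-up stand-in, and the sizes reported by sys.getsizeof vary between CPython versions, so none are quoted.

```python
import sys

# Hypothetical stand-in for the image: 1024 bytes covering every byte value.
binary_image_data = bytes(range(256)) * 4
as_text = binary_image_data.decode('latin-1')

# Joining with a string that contains a code point above U+00FF
# widens every character of the result, including the image part.
joined = '\n'.join([as_text, '\u03a9'])

print('bytes object: ', sys.getsizeof(binary_image_data))
print('latin-1 str:  ', sys.getsizeof(as_text))
print('joined str:   ', sys.getsizeof(joined))
```

On its own the latin-1 decoded string stays at one byte per character; it is the join with the wider string that roughly doubles the footprint of the image part.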

Both examples assume that you intend to do further processing of content
before sending it, and will encode just before sending:

Not really. I wanted to compare it to bytes formatting, which is why it included the "encode()" as well.