Wait a second, this is how I understood it but what Nick said made me think otherwise...

On Sun, Jan 12, 2014 at 6:22 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Jan 12, 2014 at 12:52:18PM +0100, Juraj Sukop wrote:
> On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano <steve@pearwood.info> wrote:
>
> Just to check I understood what you are saying. Instead of writing:
>
>     content = b'\n'.join([
>         b'header',
>         b'part 2 %.3f' % number,
>         binary_image_data,
>         utf16_string.encode('utf-16be'),
>         b'trailer'])

Which doesn't work, since bytes don't support %f in Python 3.

I know and this was an example of the ideal (for me, anyway) way of formatting bytes.

First, "utf16_string" confuses me. What is it? If it is a Unicode
string, i.e.:

It is a Unicode string which happens to contain code points above U+00FF (as with the TTF example above), so that it triggers the (at least) 2-bytes-per-character memory representation in CPython 3.3+. I agree, I chose the variable name poorly, my bad.
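To make the representation point concrete, here is a minimal sketch, assuming CPython 3.3 or later (the flexible string representation from PEP 393). The helper name `bytes_per_char` is made up for illustration; it estimates per-character storage by comparing the sizes of two strings built from the same character, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(sample: str, n: int = 1000) -> int:
    """Estimate storage per code point on CPython (hypothetical helper).

    The difference between two lengths of the same repeated character
    removes the fixed per-object overhead from the measurement.
    """
    return (sys.getsizeof(sample * (2 * n)) - sys.getsizeof(sample * n)) // n

print(bytes_per_char("a"))           # ASCII
print(bytes_per_char("\u00ff"))      # last code point that fits in one byte
print(bytes_per_char("\u0100"))      # first code point past U+00FF
print(bytes_per_char("\U0001F600"))  # outside the Basic Multilingual Plane
```

On CPython a single code point above U+00FF is enough to widen the whole string, which is what the variable in my example was meant to trigger. Other implementations may store strings differently.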


    content = '\n'.join([
        'header',
        'part 2 %.3f' % number,
        binary_image_data.decode('latin-1'),
        utf16_string,  # Misleading name, actually Unicode string
        'trailer'])

Which, because of that horribly named variable, prevents the use of a simple memcpy and makes the image data occupy far more memory than it did as plain bytes.
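A rough sketch of that memory cost, again assuming CPython 3.3+. The names `image` and `wide` are stand-ins invented for this example, not taken from any real code: `image` plays the role of the binary image data and `wide` is a short piece of text with code points above U+00FF.

```python
import sys

# Stand-in for binary image data: 1 MiB of arbitrary byte values.
image = bytes(range(256)) * 4096

# Short text containing code points above U+00FF.
wide = "\u0141\u00f3d\u017a"

# Joined as bytes: the image data is copied as-is, one byte per byte.
as_bytes = b"\n".join([b"header", image, wide.encode("utf-16be"), b"trailer"])

# Joined as text: the image is decoded via Latin-1, and the result must be
# wide enough for the largest code point anywhere in the joined string.
as_text = "\n".join(["header", image.decode("latin-1"), wide, "trailer"])

print(len(image), sys.getsizeof(as_bytes), sys.getsizeof(as_text))
```

With these inputs the text version comes out at roughly twice the size of the bytes version, because the few wide characters force every decoded image byte into a 2-byte slot.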
Both examples assume that you intend to do further processing of content
before sending it, and will encode just before sending:

Not really, I was interested in comparing it to bytes formatting, which is why it included the "encode()" as well.