(original) (raw)

On Fri, Jan 10, 2014 at 3:40 PM, Juraj Sukop <juraj.sukop@gmail.com> wrote:

What this all means is that the PDF objects are expressed in ASCII, "stream" objects like images and fonts may have a binary part and I never saw those UTF+16 strings.

hmm -- I wonder if they are out there in the wild, though....

u"stream\\n%s\\nendstream\\nendobj"%binary\_data.decode('latin-1')

The argument for dropping "%f" et al. has been that if something is a text, then it should be Unicode. Conversely, if it is not text, then it should not be Unicode.

????

What I'm trying to demostrate / test is that you can use unicode objects for mixed binary + ascii, if you make sure to encode/decode using latin-1 ( any others?). The idea is that ascii can be seen/used as text, and other bytes are preserved, and you can ignore whatever meaning latin-1 gives them.

using unicode objects means that you can use the existing string formatting (%s), and if you want to pass in binary blobs, you need to decode them as latin-1, creating a unicode object, which will get interpolated into your unicode object, but then that unicode gets encoded back to latin-1, the original bytes are preserved.

I think this it confusing, as we are calling it latin-1, but not really using it that way, but it seems it should work.

\-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division

NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov