Issue 1210680: Split email headers near a space

Hello,

I recently used Python to automatically send messages to my Gmail account. I was surprised to find that some of the words in the message subjects were split by a space character that came from nowhere.

It turns out that the international (Hebrew) subject was split into multiple lines by the email package, sometimes in the middle of words. Gmail treats these line breaks as spaces, so words get cut in two. I've checked, and some email clients ignore the line breaks, so the subject looks OK in them.
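To see the kind of folding described above, here is a small sketch using Python 3's `email.header` module. This is an assumption for illustration: the report concerns the older Python 2 `email.Header`, whose folding code differs in detail, and the Hebrew subject is made up. Break points are chosen to fit the line length rather than word boundaries, so an encoded word may end in the middle of a Hebrew word; a decoder that follows RFC 2047 still reassembles the original text.

```python
from email.header import Header, decode_header, make_header

# A made-up long Hebrew subject ("hello world", repeated) that cannot fit on one line.
subject = " ".join(["שלום עולם"] * 6)

# Encode as RFC 2047 encoded words, folded at the default line length.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Continuation lines start with whitespace; where each encoded word ends
# is decided by the length budget, not by word boundaries.
lines = encoded.split("\n")

# A conforming decoder drops the whitespace between adjacent encoded words,
# so the original subject comes back unchanged.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip == subject)
```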

I added four lines to the _binsplit function of email.Header, so that if there is a space character in the string, it will be split there. This fixes the problem, and subjects look fine again. These four lines (plus a comment which I wrote) are:

    # Try to find a place in splittable[:i] which is near a space,
    # and split there, so that clients which interpret the line break
    # as a separator won't insert a space in the middle of a word.
    if splittable[i:i+1] != ' ':
        spacepos = splittable.rfind(' ', 0, i)
        if spacepos != -1:
            i = spacepos + 1

These lines should be added before the last three lines of _binsplit. Sorry for not attaching a diff; I don't currently have diff at hand.
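The effect of the four-line change can be tried in isolation. The sketch below wraps the same logic in a hypothetical helper, `split_near_space`, which is not part of the email package: the real `_binsplit` also has to keep each piece within an encoded-length limit, which this helper ignores.

```python
from typing import Tuple


def split_near_space(splittable: str, i: int) -> Tuple[str, str]:
    """Split `splittable` at index `i`, preferring to break right after the
    last space before `i` so that a word is not cut in half.

    Hypothetical helper for illustration only; it applies the same adjustment
    as the proposed patch but is not email.Header._binsplit.
    """
    # If the character at the split point is already a space, keep i as is.
    if splittable[i:i + 1] != ' ':
        # Look for the last space strictly before position i.
        spacepos = splittable.rfind(' ', 0, i)
        if spacepos != -1:
            # Break just after that space so the second piece starts a word.
            i = spacepos + 1
    return splittable[:i], splittable[i:]


# Splitting at index 8 would cut "world"; the break moves back to the space.
print(split_near_space("hello world foo", 8))  # ('hello ', 'world foo')
# With no space available, the original index is used.
print(split_near_space("helloworld", 5))  # ('hello', 'world')
```

Because the fallback keeps the original index when no space is found, strings without spaces are split exactly as before; only strings that contain a space change behavior.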

Thank you, Noam Raphael

Yes, if there's a bug here and it can be fixed without a major behavior change, then it could be backported.

I'm not clear on what the bug is, though, since no example was given. If the Hebrew is encoded as encoded words, it can and will be split in the middle of words, but the RFC 2047 reassembly process removes those spaces (i.e., this may be, or may have been, a Gmail bug).
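The reassembly rule mentioned above can be checked with Python 3's `email.header` module (a sketch; this module postdates the report). The two encoded words below are hand-written for illustration and split the Hebrew word "שלום" in the middle. RFC 2047 says linear whitespace between adjacent encoded words is ignored on display, so a conforming decoder joins the halves. The `naive` line is only an assumed model of a client that decodes each word separately and keeps the break as a space; it is not a description of Gmail's actual code.

```python
from email.header import decode_header, make_header

# "שלום" as two adjacent Q-encoded words, split mid-word and folded onto a
# continuation line (CRLF followed by a space).
folded = "=?utf-8?q?=D7=A9=D7=9C?=\r\n =?utf-8?q?=D7=95=D7=9D?="

# Conforming decode: whitespace between adjacent encoded words is dropped.
decoded = str(make_header(decode_header(folded)))
print(decoded)  # שלום

# Assumed non-conforming behavior: decode each word on its own and treat
# the folding whitespace as a real space, which cuts the word in two.
naive = " ".join(str(make_header(decode_header(w))) for w in folded.split())
print(naive)
```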

Without a test case we can't be sure.