Issue 1440472: email.Generator is not idempotent
The documentation for the email.Generator module claims that the flatten() method is idempotent (i.e., the output is identical to the input if email.Parser.Parser was used to parse the input), but this does not hold in all cases. The most obvious example is that you need to pass mangle_from_=False to disable "From " mangling and maxheaderlen=0 to disable header wrapping. This could be considered common sense, but the documentation should mention it, as both are enabled by default. (unixfrom can also create differences between input and output, but it is disabled by default.) More importantly, whitespace is not preserved in headers: if there are extra spaces between the header name and the header contents, they are collapsed to a single space.
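As a rough illustration of the first point, the sketch below shows how the default mangle_from_ setting alone changes a message on the way through. It is written against the Python 3 module names (email.generator, email.message_from_string), which is an assumption, since this report uses the older email.Parser and email.Generator names. The helper flatten_with() and the sample message are made up for the example, and header wrapping is not covered here.

```python
import io
from email import message_from_string
from email.generator import Generator


def flatten_with(text, **generator_options):
    """Parse text into a message and flatten it back to a string.

    Hypothetical helper for this example; generator_options are passed
    straight through to the Generator constructor.
    """
    msg = message_from_string(text)
    out = io.StringIO()
    Generator(out, **generator_options).flatten(msg)
    return out.getvalue()


# A body line that starts with "From " looks like an mbox separator.
source = "Subject: test\n\nFrom the start, this line looks like a separator.\n"

default_output = flatten_with(source)                    # mangling enabled
plain_output = flatten_with(source, mangle_from_=False)  # mangling disabled

print(default_output == source)  # the default settings alter the body
print(plain_output == source)    # disabling mangling restores the round trip
```

With the defaults, the body line comes back prefixed with ">", so the output differs from the input unless mangle_from_=False is passed explicitly.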
This little snippet will demonstrate the problem:
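import sys
import email.Parser
import email.Generator
parser = email.Parser.Parser()
msg = parser.parse(sys.stdin)
print msg
# Disable "From " mangling and header wrapping before flattening.
gen = email.Generator.Generator(sys.stdout,
                                mangle_from_=False, maxheaderlen=0)
gen.flatten(msg, unixfrom=False)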
Feed it a single message with extra spaces between the field name and the field contents in one or more fields, and diff the input and the output.
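The same check can be done without stdin and diff. The following is a minimal self-contained sketch, again assuming the Python 3 module names rather than the email.Parser / email.Generator names used in this report. The function roundtrip() and the sample messages are hypothetical and exist only for the illustration.

```python
import io
from email import message_from_string
from email.generator import Generator


def roundtrip(text):
    """Parse text and flatten it again with mangling and wrapping disabled.

    Hypothetical helper for this example, mirroring the settings used in
    the snippet from the report.
    """
    msg = message_from_string(text)
    out = io.StringIO()
    gen = Generator(out, mangle_from_=False, maxheaderlen=0)
    gen.flatten(msg, unixfrom=False)
    return out.getvalue()


# Control case: exactly one space after the colon.
single_space = "Subject: hello\nTo: someone@example.com\n\nbody\n"

# Same message, but with extra spaces after "Subject:".
extra_spaces = "Subject:    hello\nTo: someone@example.com\n\nbody\n"

print(roundtrip(single_space) == single_space)  # survives the round trip
print(roundtrip(extra_spaces) == extra_spaces)  # whitespace is not preserved
print(repr(roundtrip(extra_spaces)))
```

The control case shows that the settings themselves are not the cause of the difference: only the message with extra spaces after the field name changes.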
It is probably not worth actually making these routines idempotent, as preserving whitespace is not important in most applications and would require extra bookkeeping. However, as long as the documentation claims the routines are idempotent, it is a bug for them not to be. In my particular application, it was important to be truly idempotent, so this was a problem. Had the documentation not made false claims, I would have known from the start that I needed to write my own versions of the routines in the email module.