Issue 36704: logging.FileHandler currently hardcodes errors='strict' (original) (raw)

Problem description

import os
import logging

logging.basicConfig(filename='example.log',level=logging.DEBUG)
for name in os.listdir():
    logging.error('Hypothetical syntax error at line 1 of file: {}'.format(name))

The above program is incorrect.[*] Because it is liable to raise UnicodeEncodeError. This can be reproduced on a standard UTF-8 locale by creating a filename which is not valid UTF-8, e.g. touch $(printf '\xff'). Plenty of arguments have been written on this topic, but this is not my problem.

The user can report the program error, and it should not be too hard to debug. And there is a fairly straightforward workaround, following Python's IOError: use repr() when outputing filenames.

But there is another issue with the above. The docs advise that most programs deployed to production, will want to set logging.raiseExceptions = false.

My motivating example is the Linux package manager dnf, which followed this advice. Specifically when they wanted to avoid UnicodeEncodeError. [][*]

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1303293#c17

I think UnicodeEncodeError is an important case to handle, but the logging module does not handle it in a good way. When we have logging.raiseExceptions = false, the problem messages will be lost. Those messages could be critical to troubleshooting the user's problem. It is even possible that all messages are lost - I think this would be very frustrating to troubleshoot.

Alternative solutions which have been considered

A possible solution

When you set raiseExceptions = false, logging.FileHandler and friends should use errors='backslashreplace'.

errors='backslashreplace' is already the default for sys.stderr. Matching this seems nice in case the program uses the default stderr logging in some configurations.

A log file full of encoding noise will be a specific sign, that can be used in troubleshooting. And in cases similar to my example program, parts of the message could be entirely readable. The end-user might be able to use the log message without noticing the incorrectness at all. This is entirely consistent with the rationale for logging.raiseExceptions = false.

Previously you could set logging.raiseExceptions after you create the logger. It will be a bit inconsistent if FileHandler relies on setting the errors parameter of open(). It seems fairly easy to document this restriction. But if that is not considered acceptable, we would either need some weird system that calls stream.reconfigure(), or muck around like dnf.i18n.UnicodeStream does:

try:
    stream.write(s)
except UnicodeEncodeError:
    if logging.raiseExceptions:
        raise
    else:
        stream.flush()  # make sure nothing gets out of order!
        s = s.encode(stream.encoding, 'backslashreplace')
        stream.buffer.write(s)

[*] C programs which segfault are said to be incorrect (or there is an error in system software or hardware). I am using similar phrasing for python programs which raise unhandled UnicodeError's. I am not sure if it is a good phrase to use for a Python program, but I hope my overall point is fairly clear.

[**] dnf developers did not appear to work on the correctness issue they found. It might be a bug in gettext.

[***] In the linked case, none of dnf's messages were readable. But I sympathize that dnf is so critical, it seems useful for dnf to try and hobble along. Even in cases like this one. As the user attempts to work towards some desired end state... The user might temporarily ignore the non-fatal problem in dnf, because their current problem seems more important, e.g. trying to install some software needed to recover or back up document files. At best, they might know some diagnostic software that knows how to diagnose they have a locale problem :-). Or more serendipitously, installing certain other software that also suffers a locale problem might help them realize what is happening.

Oops. I assumed logging.raiseExceptions == True (the default) actually raises exceptions, but it doesn't. It prints the exception to stderr and continues.

E.g. traditional UNIX daemons run with stderr connected to /dev/null. Consider during development / debugging, or if you overlooked setting raiseExceptions altogether, or maybe you made the same mistake I did (because the reference-style documentation doesn't explain this behaviour). My above proposal was to escape encoding errors only when raiseExceptions == False. So when raiseExceptions == True, we might still lose log messages, and the UnicodeEncodeError might be printed to /dev/null i.e. also silently discarded. This suggests my proposal was trying to be too clever.

I guess the simplest approach is for emit() to handle encode errors by using 'backslashreplace' to log the message AND calling handleError().

Instead of the original code sketch, change StreamHandler.emit() to

    try:
        # [issue 35046](issue35046 "[closed] logging.StreamHandler performs two syscalls when one would do"): merged two stream.writes into one.
        msg = self.format(record) + self.terminator
        stream = self.stream
        try:
            stream.write(msg)
            self.flush()
        except UnicodeEncodeError:
            # Try to log something, even pure mojibake might provide a clue
            encoding = getattr(stream, 'encoding', None)
            if encoding:
                bytes = msg.encode(encoding, errors='backslashreplace')
                msg = bytes.decode(encoding)
                stream.write(msg)
                self.flush()
            # Call handleError() as normal
            raise
    except RecursionError:  # See [issue 36272](issue36272 "[closed] Recursive logging crashes Interpreter in Python 3")
        raise
    except Exception:
        self.handleError(record)

(And I'd like a similar change for SyslogHandler at least).