Issue 9771: add an optional "default" argument to tokenize.detect_encoding

The function tokenize.detect_encoding() detects the encoding from either the coding cookie or the BOM. If no encoding is found, it returns 'utf-8'.
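A minimal sketch of that behaviour, using made-up in-memory sources (the byte strings and the helper name detect are illustrative, not taken from the report). Note that detect_encoding() takes a readline callable and returns a tuple of the encoding and the lines it consumed:

```python
import io
import tokenize


def detect(source: bytes):
    """Run tokenize.detect_encoding() on an in-memory byte string."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)


# A cookie naming another encoding is detected and normalized.
cookie_encoding, cookie_lines = detect(b"# -*- coding: latin-1 -*-\nx = 1\n")

# No cookie and no BOM: the function falls back to 'utf-8'.
plain_encoding, plain_lines = detect(b"x = 1\n")

# Empty input: same fallback, and no lines were read.
empty_encoding, empty_lines = detect(b"")

print(cookie_encoding, plain_encoding, empty_encoding)
```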

When the result is 'utf-8', there is no (easy) way to know whether the encoding was actually detected in the file or whether the function fell back to the default value.

Cases (with utf-8):
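The ambiguity can be illustrated with a short sketch. The sample sources below are invented for illustration; they cover a UTF-8 BOM, a cookie on the first line, a cookie on the second line, and no cookie at all:

```python
import io
import tokenize

# Hypothetical sample sources, keyed by a short description.
SAMPLES = {
    "UTF-8 BOM": b"\xef\xbb\xbfx = 1\n",
    "cookie on 1st line": b"# -*- coding: utf-8 -*-\nx = 1\n",
    "cookie on 2nd line": b"#!/usr/bin/env python\n# coding: utf-8\nx = 1\n",
    "no cookie": b"x = 1\n",
}

results = {}
for label, source in SAMPLES.items():
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    results[label] = encoding
    print(f"{label}: {encoding}")
```

Only the BOM case is distinguishable (it yields 'utf-8-sig'); the two cookie cases and the no-cookie case all yield the same 'utf-8'.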

The proposal is to allow calling the function with a different default value (None or ''), so that the caller can tell whether an encoding was actually detected.
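Until such an argument exists, the intended behaviour can be approximated with a wrapper. This is only a sketch, not the attached patch: the name detect_encoding_with_default and the COOKIE_RE pattern (modelled on the PEP 263 cookie syntax) are assumptions made for this example.

```python
import io
import re
import tokenize

# Assumed pattern for a PEP 263 coding cookie, applied to raw bytes lines.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_encoding_with_default(readline, default=None):
    """Hypothetical wrapper: return `default` when nothing was detected."""
    encoding, lines = tokenize.detect_encoding(readline)
    if encoding != "utf-8":
        # A BOM ('utf-8-sig') or a cookie naming another encoding was found.
        return encoding, lines
    if any(COOKIE_RE.match(line) for line in lines):
        # An explicit utf-8 cookie is present in the lines that were read.
        return encoding, lines
    # 'utf-8' was only the fallback value.
    return default, lines


def check(source: bytes, default=None):
    """Convenience helper for the examples below."""
    return detect_encoding_with_default(io.BytesIO(source).readline, default)[0]


print(check(b"x = 1\n"))             # fallback replaced by None
print(check(b"# coding: utf-8\n"))   # explicit cookie is kept
```

The wrapper re-scans only the lines that detect_encoding() already consumed, so it does not read further into the file than the original function does.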

For example, this function could be used by the Tools/scripts/findnocoding.py script.

Patch attached.