Issue 9548: locale can be imported at startup but relies on too many library modules (original) (raw)

Created on 2010-08-09 19:09 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (29)

msg113457 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 19:09

Consider the two following commands, from an SVN working copy:

$ ./python -c 'import heapq; print(heapq.heapify)'

$ cat | ./python -c 'import heapq; print(heapq.heapify)' <function heapify at 0x7f5d456025a0>

In the second case, the "from _heapq import *" in heapq fails silently. The reason was a bit difficult to find, but it goes as following:

The issue might seem rather exotic, but it is a more general symptom of the problem that importing Lib/locale.py when initializing std{in,out,err} brings too many import dependencies.

A possible solution is to define a subset of locale.py that is used for bootstrap and rely on as little modules as possible.

msg113459 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 19:41

Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends.

msg113462 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 19:56

Updated patch also replaces imports of locale in site.py (since it can be imported early on startup).

msg113466 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 20:27

Chances are that the patch will also fix the test_heapq failures on some buildbots (see e.g. http://www.python.org/dev/buildbot/3.x/builders/x86%20Tiger%203.x/builds/759/steps/test/logs/stdio ).

msg113476 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-08-09 21:10

Antoine Pitrou wrote:

Antoine Pitrou <pitrou@free.fr> added the comment:

Attached patch fixes the issue by creating a separate "_bootlocale" module, used at bootstrap, which doesn't import collections, functools and friends.

I don't think that's the right way forward.

It's much easier to either remove the need to import those extra modules or defer their import to actual use within a function.

Both collections and functools are used in two distinct places. The re module use could also be deferred to actual use.

It would be worthwhile adding a note to the top of the module mentioning the fact that the module is loaded early on by Python and to warn about use of non-builtin module imports.

msg113481 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 21:30

I don't think that's the right way forward.

It's much easier to either remove the need to import those extra modules or defer their import to actual use within a function.

The latter has various issues, such as being overly tedious and potentially risk (because of the import lock). Furthermore, functools.wraps is used as a decorator at the toplevel of locale.py, which makes it currently unremovable.

The former is easier said than done, and we may end up writing poor man's reimplementations of some stdlib functions.

It would be worthwhile adding a note to the top of the module mentioning the fact that the module is loaded early on by Python and to warn about use of non-builtin module imports.

Use of non-builtin stdlib modules is legitimate for many functions which aren't used in the startup process anyway. The rationale behind the patch is precisely to define a subset that is needed at startup and shouldn't rely on stdlib facilities. The rest of the locale module is allowed to use whatever stdlib facilities it desires, which is always much more practical.

(note, this is already done elsewhere, for example _abcoll.py or _weakrefset.py)

msg113488 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-08-09 22:04

Antoine Pitrou wrote:

Antoine Pitrou <pitrou@free.fr> added the comment:

I don't think that's the right way forward.

It's much easier to either remove the need to import those extra modules or defer their import to actual use within a function.

The latter has various issues, such as being overly tedious and potentially risk (because of the import lock).

Really ? collections is only used in format_string and can easily be imported there. Note that putting imports into functions is a technique to remove problems with import locks - it doesn't create any.

Furthermore, functools.wraps is used as a decorator at the toplevel of locale.py, which makes it currently unremovable.

If you look at the code, that decorator is not needed at all. Furthermore the whole complication was just introduced to make some tests easier to write.

The former is easier said than done, and we may end up writing poor man's reimplementations of some stdlib functions.

Please be more specific. The modules that have an impact on import time are collections, functools and re.

It would be worthwhile adding a note to the top of the module mentioning the fact that the module is loaded early on by Python and to warn about use of non-builtin module imports.

Use of non-builtin stdlib modules is legitimate for many functions which aren't used in the startup process anyway. The rationale behind the patch is precisely to define a subset that is needed at startup and shouldn't rely on stdlib facilities. The rest of the locale module is allowed to use whatever stdlib facilities it desires, which is always much more practical.

I said "warn about the use of non-builtin modules", not disallow their use. AFAIK, the purpose of the exercise is to reduce the number of non-builtin modules being loaded early on during Python startup.

(note, this is already done elsewhere, for example _abcoll.py or _weakrefset.py)

Those are special cases since they are needed for ABCs. locale is not special enough to warrant such a disruption in module organization and it's not needed either (see above).

msg113492 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-09 22:19

I said "warn about the use of non-builtin modules", not disallow their use. AFAIK, the purpose of the exercise is to reduce the number of non-builtin modules being loaded early on during Python startup.

The purpose of the exercise is to avoid, as much as possible, the numerous issues that have already been encountered because of the startup procedure relying on locale in various conditions for initialization of the standard streams.

You are proposing that we commit a one-time fix to fix the current situation. Recent history shows that similar problems will arise again, and waste useful developer time for committing urgency fixes in order to fix the build (not to mention that the symptoms are sometimes difficult to diagnose, such as in this very issue). I am proposing a broader change which prevents, as much as possible, similar problems from reappearing and will relieve us from a dispensable burden.

(note, this is already done elsewhere, for example _abcoll.py or _weakrefset.py)

Those are special cases since they are needed for ABCs. locale is not special enough to warrant such a disruption in module organization and it's not needed either (see above).

"special enough" is in the eye of the beholder.

msg113499 - (view)

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2010-08-09 23:44

Just to comment on the import lock issue when importing from within a function, it is a problem. If an import triggers the launching of a thread which calls these functions that have an import inside of them, deadlock will occur. This has been reported as an issue before so people do it (unfortunately).

Importing within functions is something to be avoided when possible.

msg113510 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-08-10 09:26

heapq tries to import _heapq but, at this point, the build dir (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path

The problem only exists for developers, not for an installation copy of Python?

Another approach is to call required site code earlier (eg. rewrite it in C and execute it before loading the locale module).

msg113511 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-10 09:32

heapq tries to import _heapq but, at this point, the build dir (such as "build/lib.linux-x86_64-3.2/") hasn't been added to sys.path

The problem only exists for developers, not for an installation copy of Python?

This particular problem indeed (for developers and for buildbots - see the weird test_heapq failures on some OS X buildbots).

Another approach is to call required site code earlier (eg. rewrite it in C and execute it before loading the locale module).

Indeed, but since it calls sysconfig.get_platform(), I'm not sure how much code would need to be rewritten in C.

msg113516 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-08-10 09:44

Indeed, but since it calls sysconfig.get_platform(), I'm not sure how much code would need to be rewritten in C.

Oh, the function is prefixed by the following comment:

XXX This should not be part of site.py, since it is needed even when

using the -S option for Python. See http://www.python.org/sf/586680

def addbuilddir():

Issue #586680 was closed as wontfix.

--

Oh yes, sysconfig.get_platform() is complex :-/

Brett wrote "If we think once can reliably add the directory based purely on whether it starts with "build/lib.", and ..." ().

msg113591 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-08-11 08:07

Antoine Pitrou wrote:

Antoine Pitrou <pitrou@free.fr> added the comment:

I said "warn about the use of non-builtin modules", not disallow their use. AFAIK, the purpose of the exercise is to reduce the number of non-builtin modules being loaded early on during Python startup.

The purpose of the exercise is to avoid, as much as possible, the numerous issues that have already been encountered because of the startup procedure relying on locale in various conditions for initialization of the standard streams.

You are proposing that we commit a one-time fix to fix the current situation. Recent history shows that similar problems will arise again, and waste useful developer time for committing urgency fixes in order to fix the build (not to mention that the symptoms are sometimes difficult to diagnose, such as in this very issue). I am proposing a broader change which prevents, as much as possible, similar problems from reappearing and will relieve us from a dispensable burden.

No, what I'm proposing is to make "import locale" safe during boot time. By separating out some functions into a separate module which is then supposed to be used by the boot process, you don't really solve the problem. Other code in the boot process may very well still import the main locale module and thus cause the same problems you were trying to solve by separating out the problematic code - even if this is just code that gets run via sitecustomize.py or other such hooks.

The changes I'm suggesting will be beneficial to all standard uses of the module without any such workarounds based on conventions.

msg113596 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-11 09:15

No, what I'm proposing is to make "import locale" safe during boot time. By separating out some functions into a separate module which is then supposed to be used by the boot process, you don't really solve the problem.

I do, and my experimentations show it.

Believe me, variations of this problem have been bothering us often enough in recent times that I know your solution won't work for long.

"Trying to be careful with imports in a large stdlib module" doesn't cut it, because we always need new imports when we make changes. The point of _bootlocale (you might not like the name, in which case you can suggest an alternative) is that it is restricted and static: we only need getpreferredencoding() and its dependencies, and this code isn't likely to change a lot.

(you might ask why this problem hasn't affected 2.x, and that's because in 2.x standard streams are much simpler, built-in objects; in particular, they don't need to choose an encoding for character decoding; their initialization doesn't require executing stdlib code)

Other code in the boot process may very well still import the main locale module

It doesn't. By "boot process" I really mean something very specific. It is all which runs until site.py gets executed (if it isn't skipped). There isn't a whole lot of code there. Mostly, it's initialization of standard streams, where two stdlib functions can be invoked: os.device_encoding() and locale.getpreferredencoding() (depending on the circumstances).

When sitecustomize.py gets run, everything is already set up and there's no problem importing whatever module you want.

The changes I'm suggesting will be beneficial to all standard uses of the module without any such workarounds based on conventions.

Standard uses of the module aren't problematic at all, and importing functools or collections in that context is harmless (they will probably be imported by other stdlib modules anyway).

msg113878 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2010-08-14 07:49

As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python import implementation).

I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies.

The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py.

msg113880 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-08-14 09:40

Antoine fixed #9589 by rewriting site.py code in C and calling it more much earlier: r83988.

This commit fixes the initial problem of this issue:

$ ./python -c 'import heapq; print(heapq.heapify)' $ cat | ./python -c 'import heapq; print(heapq.heapify)'

Can we close this issue, or do you consider that it is still very important to not load too much modules at startup?

msg113912 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-08-14 17:06

Can we close this issue, or do you consider that it is still very important to not load too much modules at startup?

The patch eases the bootstrap constraints by creating a bootstrap-friendly subset of locale.py. It is indeed much less pressing now that the initial bug has been fixed.

It also depends whether it is important to have a lightweight executable. If it isn't, I guess we could compile all extension modules statically.

msg113936 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2010-08-14 23:00

Nick Coghlan wrote:

Nick Coghlan <ncoghlan@gmail.com> added the comment:

As we move more and more infrastructure into Python code, we're going to see this pattern (i.e. a bootstrap module underlying the real module) more and more often (e.g. I seem to recall Brett has to do something similar when playing with the pure Python import implementation).

I don't have an issue with it - it's a solid, standard solution to a recurring problem with otherwise circular dependencies.

The only changes I would suggest to Antoine's patch are to explicitly scope "interpreter startup" in the _bootlocale docstring (to make it clear that sitecustomize.py should use the ordinary locale module) and to mention the nature of _bootlocale in a comment just before the "from _bootlocale import *" line in locale.py.

I don't object to such strategies in general, but in this case, a change of this size is neither required nor helpful, so remain a firm -1 on the patch.

The locale module already uses a C helper module. By now adding another indirection via a boot-time only Python extract, you complicate the setup - and what's worse: it doesn't solve the problem, since some other module you import at boot time may still import locale directly and for the same reason the collection module got used in locale: programmers being unaware of the implications this has.

msg160940 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2012-05-16 22:48

initializing the standard streams imports Lib/locale.py

It looks like only locale.getpreferredencoding() is needed to initialize standard streams. Another option is to add a locale encoding codec. I already proposed the idea in #13619 but it was rejected.

msg199346 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2013-10-09 21:29

Updated patch for 3.4.

msg199364 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2013-10-10 06:29

+1 This seems like a reasonable solution.

msg199367 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2013-10-10 07:58

The io module doesn't need to set temporarly the LC_CTYPE locale (which is a good thing because the change is process-wide!). If we ignore systems where CODESET is not available, the _bootlocale can be simplified to a few lines:

if sys.platform.startswith("win"): def getpreferredencoding(): import _locale return _locale._getdefaultlocale()[1] else: def getpreferredencoding(): result = nl_langinfo(CODESET) if not result and sys.platform == 'darwin': result = 'UTF-8' return result

This code can probably be implemented in C, directly in the _locale module. Would it be acceptable to modify the io module to replace locale.getpreferredencoding(False) with _locale.getpreferredencoding(False)?

Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET? Would it be acceptable to fallback to locale.py if CODESET is not available?

msg199507 - (view)

Author: Christian Heimes (christian.heimes) * (Python committer)

Date: 2013-10-11 21:30

The locale module uses only collections.abc.Mapping. The import of the entire collections module can be avoided if collections.abc is renamed to _abcoll again. functools.wrap() can be replaced with:

localeconv.doc = _localeconv.doc

The other attributes are either equal (e.g. name) or do not exist on builtin functions (dict).

msg199510 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2013-10-11 21:58

... the _bootlocale can be simplified to a few lines: ...

Here is the patch implementing my proposition: bootlocale3.patch.

msg199511 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2013-10-11 22:13

"Does anyone know if Python does still support systems where CODESET is not available? Which OS does not support CODESET?"

I checked my VMs with Python, nl_langinfo(CODESET) works on:

I also tested my patch on Windows 7: _bootlocale.getpreferredencoding() works as expected (it returns "cp1252").

msg199512 - (view)

Author: Christian Heimes (christian.heimes) * (Python committer)

Date: 2013-10-11 22:14

You could raise an error and wait until somebody files a complain. :)

msg199513 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2013-10-11 22:15

New changeset fbbf8b160e8d by Antoine Pitrou in branch 'default': Issue #9548: Add a minimal "_bootlocale" module that is imported by the _io module instead of the full locale module. http://hg.python.org/cpython/rev/fbbf8b160e8d

msg199575 - (view)

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2013-10-12 15:25

Just a quick favour to ask people: please post benchmark numbers of startup_nosite and normal_startup with your patches, otherwise we are taking stabs in the dark that the code complexity being suggested is worth it.

msg199581 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2013-10-12 16:05

Here normal_startup and startup_nosite wouldn't show a difference (under Linux anyway) because the locale module is only imported for non-interactive streams, AFAICT.

History

Date

User

Action

Args

2022-04-11 14:57:05

admin

set

github: 53757

2013-10-12 16:05:51

pitrou

set

status: open -> closed
resolution: fixed
messages: +

stage: patch review -> resolved

2013-10-12 15:25:07

brett.cannon

set

messages: +

2013-10-11 22:15:25

python-dev

set

nosy: + python-dev
messages: +

2013-10-11 22:14:58

christian.heimes

set

messages: +

2013-10-11 22:13:20

vstinner

set

messages: +

2013-10-11 21:58:45

vstinner

set

files: + bootlocale3.patch

messages: +

2013-10-11 21:30:24

christian.heimes

set

messages: +

2013-10-10 07:58:25

vstinner

set

messages: +

2013-10-10 06:29:42

rhettinger

set

messages: +

2013-10-10 00:00:59

eric.snow

set

nosy: + eric.snow

2013-10-09 21:29:38

pitrou

set

files: + bootlocale2.patch

2013-10-09 21:29:24

pitrou

set

nosy: + christian.heimes, benjamin.peterson

messages: +
versions: + Python 3.4, - Python 3.3

2013-10-09 17:17:50

barry

set

nosy: + barry

2012-10-02 18:39:42

pitrou

unlink

issue16101 dependencies

2012-10-02 13:08:14

brett.cannon

link

issue16101 dependencies

2012-07-22 10:12:16

flox

set

nosy: + flox

2012-05-17 17:03:39

Arfrever

set

nosy: + Arfrever

2012-05-16 22:48:20

vstinner

set

messages: +

2012-05-16 16:16:42

pitrou

set

stage: patch review
versions: + Python 3.3, - Python 3.2

2010-08-14 23:10:55

eric.araujo

set

nosy: + eric.araujo

2010-08-14 23:00:37

lemburg

set

messages: +

2010-08-14 17:06:03

pitrou

set

messages: +

2010-08-14 09:40:43

vstinner

set

messages: +

2010-08-14 07:49:32

ncoghlan

set

messages: +

2010-08-13 16:22:39

pitrou

set

nosy: + ncoghlan

2010-08-11 09:16:01

pitrou

set

messages: +

2010-08-11 08:07:39

lemburg

set

messages: +

2010-08-10 09:44:49

vstinner

set

messages: +

2010-08-10 09:32:46

pitrou

set

messages: +

2010-08-10 09:26:58

vstinner

set

messages: +

2010-08-09 23:44:39

brett.cannon

set

messages: +

2010-08-09 22:19:41

pitrou

set

messages: +

2010-08-09 22:04:44

lemburg

set

messages: +

2010-08-09 21:30:22

pitrou

set

messages: +

2010-08-09 21:10:46

lemburg

set

messages: +
title: locale can be imported at startup but relies on too many library modules -> locale can be imported at startup but relies on too many library modules

2010-08-09 21:03:11

brett.cannon

set

nosy: + brett.cannon

2010-08-09 20:27:28

pitrou

set

messages: +

2010-08-09 19:57:37

pitrou

set

files: - bootlocale.patch

2010-08-09 19:56:18

pitrou

set

files: + bootlocale.patch

messages: +

2010-08-09 19:42:21

mark.dickinson

set

nosy: + mark.dickinson

2010-08-09 19:41:29

pitrou

set

files: + bootlocale.patch
keywords: + patch
messages: +

2010-08-09 19:33:19

pitrou

set

nosy: + lemburg, rhettinger, orsenthil

2010-08-09 19:09:29

pitrou

create