
On Sun, Jun 24, 2012 at 3:51 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

On 23.06.2012 17:58, Antoine Pitrou wrote:
> On Sat, 23 Jun 2012 17:55:24 +0200
> martin@v.loewis.de wrote:
>>> That's true. I would have hoped for it to be recognized only when
>>> there's at least one module or package inside, but it doesn't sound
>>> easy to check for (especially in the recursive namespace packages case
>>> - is that possible?).
>>
>> Yes - a directory becomes a namespace package by not having an \_\_init\_\_.py,
>> so the "namespace package" case will likely become the default, and people
>> will start removing the empty \_\_init\_\_.pys when they don't need to support
>> 3.2- anymore.
>
> Have you tested the performance of namespace packages compared to
> normal packages?

No, I haven't.

It's probably not worthwhile: any extra cost of looking at more sys.path entries should be offset by faster subsequent imports from those later sys.path entries, since their directory listings will already have been cached.

Or, to put it another way, almost all the extra I/O cost of namespace packages is paid only once, for the \*first\* namespace package imported. In effect, this means that the amortized cost of using namespace packages actually \*decreases\* as namespace packages become more popular. Also, the total extra overhead equals the cost of a listdir() for each directory on sys.path that would otherwise not have been checked for an import. (So, for example, if even one import fails over the life of a program's execution, or it performs even one import from the last directory on sys.path, then there is no actual extra overhead.)
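As a rough illustration of the amortization argument, here is a toy model that counts listdir() calls when each sys.path directory's listing is cached after its first use. It is hypothetical: ToyPathFinder and its dict-per-directory layout are invented for this sketch and are not importlib's real finder. It shows the first namespace import paying for every remaining directory, a second namespace import costing nothing extra, and a single failed import reaching the same total without any namespace packages at all.

```python
class ToyPathFinder:
    """Hypothetical model of a sys.path scan with a per-directory listing cache."""

    def __init__(self, entries):
        # entries: one dict per sys.path directory, mapping a name to
        # "module", "regular" (has __init__.py) or "portion" (no __init__.py).
        self.entries = entries
        self.cache = {}
        self.listdir_calls = 0

    def _listing(self, index):
        # Each directory is listed at most once; later lookups reuse the cache.
        if index not in self.cache:
            self.listdir_calls += 1
            self.cache[index] = self.entries[index]
        return self.cache[index]

    def find(self, name):
        portions = []
        for index in range(len(self.entries)):
            kind = self._listing(index).get(name)
            if kind in ("module", "regular"):
                return index              # concrete hit: the scan stops here
            if kind == "portion":
                portions.append(index)    # keep scanning to collect every portion
        return portions or None           # namespace portions, or None if not found


path = [{"regpkg": "regular", "ns_b": "portion"}, {"ns_a": "portion"}, {"other": "module"}, {}]

# Only regular imports, then one failing import.
regular_only = ToyPathFinder(path)
regular_only.find("regpkg")
calls_before_failure = regular_only.listdir_calls   # 1
regular_only.find("missing")                        # a failed import lists every directory
calls_after_failure = regular_only.listdir_calls    # 4

# A regular import, then two namespace-package imports.
with_ns = ToyPathFinder(path)
with_ns.find("regpkg")
with_ns.find("ns_a")                                # first namespace import pays the I/O
calls_after_first_ns = with_ns.listdir_calls        # 4
with_ns.find("ns_b")                                # second one is served from the cache
calls_after_second_ns = with_ns.listdir_calls       # 4
```

The counts assume one cached listing per directory and no invalidation; real importlib also re-validates each cached listing with a stat(), which is a separate cost from the listdir() calls counted here.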

Of course, there are still cache validation stat() calls, and they mean that an initial import of a namespace package (vs. a self-contained package with \_\_init\_\_.py) costs an extra N stat() calls, where N is the number of sys.path entries that appear \*after\* the sys.path directory where the package is found. (This cost must, of course, still be compared against the cost of finding, opening, and running an empty \_\_init\_\_.py\[co\] file, so namespace packages may actually still be quite competitive in many cases.)
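The arithmetic for N can be written out directly. This sketch rests on a simplifying assumption (every directory the scan visits costs exactly one validation stat()), and validation\_stats and extra\_namespace\_stats are hypothetical helper names. It deliberately leaves out the cost of opening and running an empty \_\_init\_\_.py\[co\], which the regular package pays instead.

```python
def validation_stats(path_len, found_at, namespace):
    """Hypothetical count of cache-validation stat() calls for one top-level import.

    path_len: number of sys.path entries; found_at: 0-based index of the entry
    where the package directory lives.
    """
    # Simplifying assumption: each directory the scan visits costs one stat()
    # to check that its cached listing is still fresh.
    if namespace:
        return path_len        # portions may live anywhere, so every entry is visited
    return found_at + 1        # a regular package ends the scan at its own entry


def extra_namespace_stats(path_len, found_at):
    # N = number of sys.path entries *after* the one where the package is found
    return (validation_stats(path_len, found_at, namespace=True)
            - validation_stats(path_len, found_at, namespace=False))


print(extra_namespace_stats(path_len=6, found_at=2))   # 3
print(extra_namespace_stats(path_len=6, found_at=5))   # 0: found in the last entry
```

The second call matches the observation that a package found in the last sys.path directory incurs no extra validation cost.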

For imports \*within\* a namespace package, similar considerations apply, except that N is smaller; in the simple case of replacing a self-contained package with a namespace package (without adding any additional path locations), N will be zero, making imports from inside the namespace package run exactly as quickly as normal imports.
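The N = 0 case can be checked against a real interpreter. Under PEP 420 (Python 3.3+), submodule imports search only the package's \_\_path\_\_, which has one entry per portion. The sketch below builds throwaway namespace packages in temporary directories and reports the length of \_\_path\_\_; namespace\_path\_length and make\_portion are hypothetical helpers, and the demo package names are arbitrary.

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, package, module):
    # No __init__.py is written, so on Python 3.3+ this directory becomes a
    # namespace-package portion rather than a regular package.
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("VALUE = %r\n" % module)


def namespace_path_length(package, modules):
    """Build a throwaway namespace package (one portion per module) and return len(__path__)."""
    with tempfile.TemporaryDirectory() as base:
        roots = [os.path.join(base, "entry%d" % i) for i in range(len(modules))]
        for root, module in zip(roots, modules):
            make_portion(root, package, module)
        sys.path[:0] = roots              # put every portion's parent on sys.path
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(package)
            for module in modules:
                # Submodule lookups search only pkg.__path__, not all of sys.path.
                importlib.import_module(package + "." + module)
            return len(pkg.__path__)
        finally:
            # Undo every global change so the demo leaves no trace behind.
            del sys.path[:len(roots)]
            for root in roots:
                sys.path_importer_cache.pop(root, None)
            for name in list(sys.modules):
                if name == package or name.startswith(package + "."):
                    del sys.modules[name]


print(namespace_path_length("demo_single_ns", ["a"]))       # 1
print(namespace_path_length("demo_split_ns", ["a", "b"]))   # 2
```

A \_\_path\_\_ of length 1 means a submodule lookup visits exactly one directory, just as inside a regular package; each additional portion adds one more directory to scan.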

In short, it's not worth worrying about, and definitely nothing that should lead people to spread the idea that \_\_init\_\_.py somehow speeds things up. If there's a difference, it'll likely be lost in measurement noise, due to importlib's new directory caching mechanism.
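For readers unfamiliar with the directory caching mentioned above, here is a minimal sketch of the idea: one stat() per lookup to check the directory's modification time, and a listdir() only when that time changes. DirListingCache is a hypothetical class written for illustration; importlib's actual FileFinder differs in its details, and importlib also provides importlib.invalidate\_caches() for cases where a timestamp alone cannot reveal that a directory has changed.

```python
import os
import tempfile


class DirListingCache:
    """Hypothetical sketch of an mtime-validated directory cache (not importlib's code)."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.names = set()
        self.stat_calls = 0
        self.listdir_calls = 0

    def contains(self, name):
        # Cache validation: one stat() per lookup.
        self.stat_calls += 1
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            # First use, or the directory changed: refill the cached listing.
            self.listdir_calls += 1
            self.names = set(os.listdir(self.path))
            self.mtime = mtime
        return name in self.names


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "mod.py"), "w").close()
    cache = DirListingCache(d)
    hits = [cache.contains(n) for n in ("mod.py", "pkg", "other.py")]

print(hits, cache.stat_calls, cache.listdir_calls)   # [True, False, False] 3 1
```

Three lookups cost three cheap stat() calls but only one listdir(), which is why the per-directory overhead discussed above tends to vanish into measurement noise.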