How should package collection work? · Issue #7777 · pytest-dev/pytest
I am working on cleaning up our collection code, but the current behavior seems odd and incidental, therefore I'd first like to discuss how it should behave.
pytest and operating system versions: pytest 6.0.2/master on Linux.
Current behavior
Consider the following filesystem tree:
a
├── b
│   ├── c
│   │   ├── d
│   │   │   ├── __init__.py
│   │   │   └── test_d.py
│   │   └── test_c.py
│   ├── __init__.py
│   └── test_b.py
├── b2
│   ├── __init__.py
│   └── test_b2.py
├── __init__.py
└── test_a.py
This has several nested packages, but note that the c level doesn't have an __init__.py (just to make it more interesting).
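To reproduce this layout locally, a small script along the following lines can create it in a temporary directory. This is only a sketch: the test function names are taken from the collection output in this issue, the test bodies are placeholders, and make_tree is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Files from the tree above. Each test module holds one trivial test whose
# name matches the collection output (test_a1, test_b1, ...).
FILES = {
    "a/__init__.py": "",
    "a/test_a.py": "def test_a1(): pass\n",
    "a/b/__init__.py": "",
    "a/b/test_b.py": "def test_b1(): pass\n",
    "a/b/c/test_c.py": "def test_c1(): pass\n",  # note: no a/b/c/__init__.py
    "a/b/c/d/__init__.py": "",
    "a/b/c/d/test_d.py": "def test_d1(): pass\n",
    "a/b2/__init__.py": "",
    "a/b2/test_b2.py": "def test_b21(): pass\n",
}


def make_tree(root: Path) -> None:
    """Write every file in FILES below root, creating directories as needed."""
    for rel, content in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    make_tree(root)
    print(f"created under {root}; try: pytest --co {root / 'a'}")
```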
This results in the following collection tree:
$ pytest --co a/
collected 5 items
<Package a>
  <Module test_a.py>
    <Function test_a1>
<Package b>
  <Module test_b.py>
    <Function test_b1>
<Module a/b/c/test_c.py>
  <Function test_c1>
<Package d>
  <Module test_d.py>
    <Function test_d1>
<Package b2>
  <Module test_b2.py>
    <Function test_b21>
The Packages are all flat, not nested. Namespace packages are not considered Packages. Files not inside packages are collected as standalone Modules.
This however does not reveal the real story. See what happens when --keep-duplicates is used:
$ pytest --co --keep-duplicates a/
collected 11 items
<Package a>
  <Module test_a.py>
    <Function test_a1>
  <Package b>
    <Module test_b.py>
      <Function test_b1>
    <Module test_c.py>
      <Function test_c1>
    <Package d>
      <Module test_d.py>
        <Function test_d1>
  <Package b2>
    <Module test_b2.py>
      <Function test_b21>
<Package b>
  <Module test_b.py>
    <Function test_b1>
  <Module test_c.py>
    <Function test_c1>
  <Package d>
    <Module test_d.py>
      <Function test_d1>
<Module a/b/c/test_c.py>
  <Function test_c1>
<Package d>
  <Module test_d.py>
    <Function test_d1>
<Package b2>
  <Module test_b2.py>
    <Function test_b21>
This has all of the previous collectors, but also duplicates which are nested under each Package.
Technical details
For those interested, the code details are as follows:
pytest has two recursive filesystem collectors, Session and Package.
Session.collect() walks the entire tree of each given command-line argument recursively in BFS order and creates collectors for Packages and Modules. It has various obscure code to special-case Packages and exclude files belonging directly to the package. The Collectors it creates always have the Session as parent (i.e. flat).
Note that the Collectors themselves are only recursively expanded to Items after the above is finished. (This step is done by Session.genitems(), which calls each Collector's own collect() method.)
Package.collect() also walks the package's directory recursively. It has some code to try to avoid collecting Modules belonging to sub-packages, but otherwise creates Modules and Packages with itself as parent.
Since the stuff collected by Packages was already collected by the Session (with the exception of a package's own direct files), they are mostly discarded as duplicates unless --keep-duplicates is used.
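The interaction between the two walks can be illustrated with a toy model. This is not pytest's code: the in-memory TREE, the function names, and the shared seen set are all assumptions made for illustration, each test module is assumed to hold one test, and the traversal is a plain depth-first walk chosen for brevity (order does not affect the counts). It reproduces the 5 and 11 item counts shown above.

```python
# Toy stand-in for the example tree: directory -> (subdirectories, files).
TREE = {
    "a": (["b", "b2"], ["__init__.py", "test_a.py"]),
    "a/b": (["c"], ["__init__.py", "test_b.py"]),
    "a/b/c": (["d"], ["test_c.py"]),  # no __init__.py here
    "a/b/c/d": ([], ["__init__.py", "test_d.py"]),
    "a/b2": ([], ["__init__.py", "test_b2.py"]),
}


def is_package(directory):
    return "__init__.py" in TREE[directory][1]


def walk(directory):
    """Yield directory and everything below it, depth-first."""
    yield directory
    for sub in TREE[directory][0]:
        yield from walk(f"{directory}/{sub}")


def session_collect(root, seen):
    """Flat pass: one Package per package dir, one Module per loose test file."""
    collectors = []
    for directory in walk(root):
        if is_package(directory):
            seen.add(f"{directory}/__init__.py")
            collectors.append(("Package", directory))
        else:
            for name in TREE[directory][1]:
                if name.startswith("test_"):
                    seen.add(f"{directory}/{name}")
                    collectors.append(("Module", f"{directory}/{name}"))
    return collectors


def package_collect(package, seen, keep_duplicates):
    """Nested pass: walk the package, stopping at sub-package boundaries."""

    def visit(directory):
        for name in TREE[directory][1]:
            if name.startswith("test_"):
                yield ("Module", f"{directory}/{name}")
        for sub in TREE[directory][0]:
            path = f"{directory}/{sub}"
            if is_package(path):
                yield ("Package", path)  # its files belong to the sub-package
            else:
                yield from visit(path)

    children = []
    for kind, path in visit(package):
        key = f"{path}/__init__.py" if kind == "Package" else path
        if not keep_duplicates and key in seen:
            continue  # already collected elsewhere: dropped as a duplicate
        seen.add(key)
        children.append((kind, path))
    return children


def count_items(root, keep_duplicates):
    """Expand every collector to items, sharing one duplicate-path set."""
    seen = set()

    def expand(kind, path):
        if kind == "Module":
            return 1
        children = package_collect(path, seen, keep_duplicates)
        return sum(expand(k, p) for k, p in children)

    return sum(expand(k, p) for k, p in session_collect(root, seen))


if __name__ == "__main__":
    print(count_items("a", keep_duplicates=False))  # 5
    print(count_items("a", keep_duplicates=True))   # 11
```

In this model the flat result falls out of the order of events: the Session pass registers every package and loose module first, so by the time a Package expands, its sub-packages are already marked as seen and only its own direct files survive.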
Question
My question is, what do we want to happen?
Evidently the nesting is not super important, otherwise we would have heard loud complaints by now (though there are some issues about this). But the nesting does have an effect on how package-scope fixtures are applied - reuse from super-package or not?
The duplication seems definitely broken.
And there is a question on how PEP 420 namespace packages fit into this (if at all).
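For context, a directory without an __init__.py (like c above) is still importable by Python itself as a namespace package, even though pytest does not treat it as a Package. The sketch below shows this with the standard library only; the directory name nsdemo_c and the helper import_without_init are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_without_init(name: str = "nsdemo_c"):
    """Import a directory that has no __init__.py; return (__file__, __path__)."""
    root = Path(tempfile.mkdtemp())
    (root / name).mkdir()
    (root / name / "test_c.py").write_text("def test_c1(): pass\n")
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        # A namespace package has a __path__ but no backing file.
        return getattr(module, "__file__", None), list(module.__path__)
    finally:
        sys.path.remove(str(root))
        sys.modules.pop(name, None)


if __name__ == "__main__":
    file_attr, search_path = import_without_init()
    print(file_attr, search_path)
```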
Would be happy for any thoughts!