How should package collection work? · Issue #7777 · pytest-dev/pytest

I am working on cleaning up our collection code, but the current behavior seems odd and incidental, so I'd first like to discuss how it should behave.

pytest and operating system versions: pytest 6.0.2/master on Linux.

Current behavior

Consider the following filesystem tree:

a
├── b
│   ├── c
│   │   ├── d
│   │   │   ├── __init__.py
│   │   │   └── test_d.py
│   │   └── test_c.py
│   ├── __init__.py
│   └── test_b.py
├── b2
│   ├── __init__.py
│   └── test_b2.py
├── __init__.py
└── test_a.py

This has several nested packages, but note that the c level doesn't have an __init__.py (just to make it more interesting).
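For anyone who wants to reproduce this locally, here is a minimal sketch that recreates the tree in a temporary directory. The test function names are taken from the collection output in this issue. The function bodies and the helper name `make_tree` are placeholders of mine, not part of any real project.

```python
import tempfile
from pathlib import Path

# Relative path -> file contents. Test function names come from the
# collection output in this issue; the bodies are placeholders.
TREE = {
    "a/__init__.py": "",
    "a/test_a.py": "def test_a1(): pass\n",
    "a/b/__init__.py": "",
    "a/b/test_b.py": "def test_b1(): pass\n",
    "a/b/c/test_c.py": "def test_c1(): pass\n",  # note: no a/b/c/__init__.py
    "a/b/c/d/__init__.py": "",
    "a/b/c/d/test_d.py": "def test_d1(): pass\n",
    "a/b2/__init__.py": "",
    "a/b2/test_b2.py": "def test_b21(): pass\n",
}


def make_tree(root: Path) -> Path:
    """Write TREE under root and return the path of the top-level package a."""
    for rel, content in TREE.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return root / "a"


if __name__ == "__main__":
    # Print the created directory; then run: pytest --co <printed path>
    print(make_tree(Path(tempfile.mkdtemp())))
```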

Running pytest --co on this tree results in the following collection tree:

$ pytest --co a/
collected 5 items

<Package a>
  <Module test_a.py>
    <Function test_a1>
<Package b>
  <Module test_b.py>
    <Function test_b1>
<Module a/b/c/test_c.py>
  <Function test_c1>
<Package d>
  <Module test_d.py>
    <Function test_d1>
<Package b2>
  <Module test_b2.py>
    <Function test_b21>

The Packages are all flat, not nested. Namespace packages are not considered Packages. Files not inside packages are collected as standalone Modules.

This output, however, does not tell the whole story. See what happens when --keep-duplicates is used:

$ pytest --co --keep-duplicates a/
collected 11 items

<Package a>
  <Module test_a.py>
    <Function test_a1>
  <Package b>
    <Module test_b.py>
      <Function test_b1>
    <Module test_c.py>
      <Function test_c1>
    <Package d>
      <Module test_d.py>
        <Function test_d1>
  <Package b2>
    <Module test_b2.py>
      <Function test_b21>
<Package b>
  <Module test_b.py>
    <Function test_b1>
  <Module test_c.py>
    <Function test_c1>
  <Package d>
    <Module test_d.py>
      <Function test_d1>
<Module a/b/c/test_c.py>
  <Function test_c1>
<Package d>
  <Module test_d.py>
    <Function test_d1>
<Package b2>
  <Module test_b2.py>
    <Function test_b21>

This has all of the previous collectors, but also duplicates which are nested under each Package.

Technical details

For those interested, the code details are as follows:

pytest has two recursive filesystem collectors, Session and Package.

Session.collect() walks the entire tree of each given command-line argument recursively in BFS order and creates collectors for Packages and Modules. It contains various obscure code to special-case Packages and exclude files belonging directly to the package. The Collectors it creates always have the Session as parent (i.e. flat).

Note that the Collectors themselves are only recursively expanded to Items after the above is finished. (This step is done by Session.genitems(), which calls each Collector's own collect() method.)

Package.collect() also walks the package's directory recursively. It has some code to try to avoid collecting Modules belonging to sub-packages, but otherwise creates Modules and Packages with itself as parent.

Since most of what the Packages collect was already collected by the Session (with the exception of a package's own direct files), these collectors are mostly discarded as duplicates unless --keep-duplicates is used.
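To make the interaction between the two walks concrete, here is a toy model of the behavior described above. It is not pytest's code: the function names, the shared `seen` set used for de-duplication, and the assumption of exactly one test function per file are all simplifications of mine. It works on an in-memory list of paths from the example tree and only counts items.

```python
from pathlib import PurePosixPath as P

# The files of the example tree, relative to the invocation directory.
FILES = {P(p) for p in (
    "a/__init__.py", "a/test_a.py",
    "a/b/__init__.py", "a/b/test_b.py",
    "a/b/c/test_c.py",
    "a/b/c/d/__init__.py", "a/b/c/d/test_d.py",
    "a/b2/__init__.py", "a/b2/test_b2.py",
)}
PACKAGES = {f.parent for f in FILES if f.name == "__init__.py"}
TESTS = sorted(f for f in FILES if f.name.startswith("test_"))


def session_collect():
    """Flat pass: every package, plus test files not directly inside one."""
    loose = [t for t in TESTS if t.parent not in PACKAGES]
    return sorted(PACKAGES) + loose


def package_collect(pkg, seen, keep_duplicates):
    """Recursive pass: return how many test items pkg expands to."""
    count = 0
    subpkgs = sorted(p for p in PACKAGES if pkg in p.parents)
    # Keep only the nearest sub-packages (no other sub-package in between).
    nearest = [p for p in subpkgs if not any(q in p.parents for q in subpkgs)]
    for sub in nearest:
        if keep_duplicates or sub not in seen:
            count += package_collect(sub, seen, keep_duplicates)
    for t in TESTS:
        inside = pkg in t.parents
        in_sub = any(s in t.parents for s in nearest)
        if inside and not in_sub and (keep_duplicates or t not in seen):
            seen.add(t)
            count += 1  # assumption: one test function per file
    return count


def count_items(keep_duplicates):
    nodes = session_collect()
    seen = set(nodes)  # everything the flat pass created a collector for
    total = 0
    for node in nodes:
        if node in PACKAGES:
            total += package_collect(node, seen, keep_duplicates)
        else:
            total += 1  # a standalone Module with one test function
    return total


print(count_items(False), count_items(True))  # 5 11
```

Under these assumptions the model reproduces the 5 and 11 item counts shown earlier: with de-duplication each Package keeps only its own direct files, and without it every Package also expands everything beneath it.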

Question

My question is, what do we want to happen?

Evidently the nesting is not super important, otherwise we would have heard loud complaints by now (though there are some issues about this). But the nesting does affect how package-scope fixtures are applied: should a sub-package reuse the fixture from its super-package or not?

The duplication definitely seems broken.

And there is a question of how PEP 420 namespace packages fit into this (if at all).

Would be happy for any thoughts!