PEP-508: Any plans for conditional dependency versioning?
Hi folks!
I want to create a pip package and need to restrict some dependency versions:
pyarrow < 8 if using PySpark < 3
pyarrow < 16 if using PySpark >= 3
The current PEP 508 standard does not allow setting conditional dependencies like this.
Is this planned for the future?
groodt (Greg Roodt) May 5, 2025, 9:05am 2
Dependencies can be conditional. It’s just that they’re conditional based on environment markers and extras.
So one way to achieve what you’re looking for is to provide the same package with different extras. This lets the package author and installer agree on what to install.
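As an illustrative sketch (assuming a pyproject.toml-based build; the extra names and version bounds here are hypothetical, chosen to match the original question), the extras approach might look like:

```toml
[project]
name = "my_library"
version = "1.0"

[project.optional-dependencies]
# Hypothetical extras: the user explicitly picks a PySpark line,
# and the matching pyarrow bound comes along with it.
pyspark2 = ["pyspark<3", "pyarrow<8"]
pyspark3 = ["pyspark>=3", "pyarrow<16"]
```

A user would then install with `pip install 'my_library[pyspark3]'`, and the resolver enforces the corresponding pyarrow bound.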
JamesParrott (James Parrott) May 5, 2025, 9:16am 3
Are conditional dependencies required when, as in your example, the conditions in question are both mutually exclusive, and define an outcome for every possibility?
Doesn’t anything like this work:
pyarrow<8; PySpark<"3"
pyarrow<16; PySpark>="3"
I don’t claim to understand every nuance of marker ops and environment markers, but it’s a powerful mechanism. Dependency specifiers - Python Packaging User Guide
pitrou (Antoine Pitrou) May 5, 2025, 9:41am 4
This is a bit weird, why do you need to do that for third-party packages?
(note: I’m a PyArrow maintainer)
Felix-neko (Felix Neko) May 5, 2025, 12:30pm 5
Are conditional dependencies required when, as in your example, the conditions in question are both mutually exclusive, and define an outcome for every possibility?
Yes, I have to set the version of pyarrow for both of these mutually exclusive conditions.
Alas, such conditions don’t work yet.
pf_moore (Paul Moore) May 5, 2025, 12:36pm 6
To answer your question, no there’s no such feature at the moment, and no plans to add it. Basically, no-one has requested it, and unless it’s likely to be of use to a significant number of users, it’s not realistically going to be implemented.
As others have commented, I think you’d be better off thinking about why you want this, and whether you can achieve what you’re after a different way. My immediate thought is that if you can’t use pyarrow >=8, < 16 when you’re using PySpark < 3, that’s something that should be part of the PySpark metadata, not something you have to encode in your application’s dependencies.
Felix-neko (Felix Neko) May 5, 2025, 12:36pm 7
I want to get the following behavior:
When my user executes pip3 install 'my_library' 'pyspark==2.4.8', one version of pyarrow should be installed as a dependency of my_library.
When my user executes pip3 install 'my_library' 'pyspark==3.4.4', another version of pyarrow should be installed (also as a dependency of my_library).
I don’t want my users to manually set dependency specifiers such as pip3 install my_library[pyspark3] or pip3 install my_library[pyspark2]; I want to select dependency versions automatically, based on the versions of other dependencies (here, the pyspark version). If the pyspark version is not set manually, I want the highest available pyspark version and the highest available pyarrow version it supports.
barry-scott (Barry Scott) May 5, 2025, 2:19pm 8
Are the different conditions based on the version of Python the user is using?
e.g. does PySpark >= 3 imply a newer Python than PySpark 2 does?
If so, you could build a set of wheels for each Python version and upload them, I assume?
dimbleby (David Hotham) May 5, 2025, 3:26pm 9
The last release in the pyspark 2 series was made quite some time before pyarrow 8 existed (even longer, of course, before pyarrow 16).
Proposing that pyspark ought to have used upper bounds in its constraints to make guesses about future compatibility is edging into contentious territory…
Probably if OP is really having problems with older pyspark, they should simply drop support for it. 3.0.0 is almost five years old now.
notatallshaw (Damian Shaw) May 5, 2025, 4:04pm 10
Even if there was a big demand for this, let’s not overload marker syntax, which really was not designed for this.
What is being asked for is a specific case of general Boolean satisfiability. If requirements were to support this, I would rather it were done fully, e.g.:
(A > 3 and B < 5) or (A <= 3 and B >= 5)
But I’m not sure if any current Python package resolvers could handle the complexity of this as soon as requirements become non-trivial.
Any proposal would have to get consensus among all popular Python installers (pip, uv, poetry, etc ) and IMO ideally have a proof of concept for pip.
The benefits seem marginal and the effort large.
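To make the combinatorial nature of such constraints concrete, here is a naive brute-force check of the compound constraint above (illustrative only: the candidate version lists are made up, and real resolvers use far more sophisticated algorithms than exhaustive enumeration):

```python
from itertools import product

# Hypothetical candidate versions for packages A and B.
a_versions = [2, 3, 4, 5]
b_versions = [3, 4, 5, 6]

def satisfies(a, b):
    # The compound constraint: (A > 3 and B < 5) or (A <= 3 and B >= 5)
    return (a > 3 and b < 5) or (a <= 3 and b >= 5)

# Enumerate every (A, B) pair; this is exponential in the number of
# packages involved, which is why it doesn't scale to real dependency graphs.
solutions = [(a, b) for a, b in product(a_versions, b_versions) if satisfies(a, b)]
print(solutions)
```

Even this toy case yields multiple valid solutions with no obvious ranking between them, which is the crux of the difficulty discussed below.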
jamestwebber (James Webber) May 5, 2025, 4:11pm 11
Might as well expand it to triplets (A > 3 and B < 5 and C == 4) so people can embed a full-blown 3-SAT problem into their requirements file.
(2-SAT is at least solvable in polynomial time, but I’m guessing it’d still be a bear…)
JamesParrott (James Parrott) May 5, 2025, 4:48pm 12
Ah OK. I understand now - cheers. On the surface it seems simple to add dependency version comparisons as an environment marker, but I can imagine that makes the job of the dependency resolver a lot more complicated.
Felix-neko (Felix Neko) May 5, 2025, 8:49pm 13
Hmm. It looks like there is one more question: the highest available version of which package should our resolver prefer if the version of A is not set externally (when we just do pip3 install my_library, without pinning A or B to any versions)?
Should we prefer the highest version of A (with B < 5) or the highest version of B (with A <= 3)?
If we just add conditional dependencies to our specifiers as in the OP, we get a resolution order: package B’s version should be resolved after package A’s version has been resolved and set to the highest available.
groodt (Greg Roodt) May 5, 2025, 8:52pm 14
Personally, I prefer the extras syntax and the explicit control handed to the user and handled by the resolver. If the user needs to install a particular version of pyspark and so does your library, have your users provide an extra when installing your library and the resolver will ensure the right version of pyspark gets installed.
I wouldn’t want the installation of my_library’s transitive dependencies to conditionally depend on arbitrary dependencies in my environment. That would mean other libraries could influence my_library’s transitive deps.
I prefer:
pip install my_library[pyspark3]
Compared to:
pip install my_library pyspark>=3
I prefer:
pip install my_library[pyspark2]
Compared to:
pip install my_library pyspark<3
pf_moore (Paul Moore) May 5, 2025, 9:36pm 15
That’s a separate question. Boolean satisfiability algorithms (the mathematics behind the sort of constraints you’re asking for) are about identifying which solutions (if any) exist. They do not assign any sort of ranking of which is “best”. That’s something that would need to be decided (and implemented) as a separate step. It’s entirely possible that there’s no general way of determining which solution to prefer, in a way that would match user expectations.
There’s no “just” about this. The original post here is just as much about a satisfiability problem as the syntax @notatallshaw described, just a more limited form. With your example:
pyarrow < 8 if using PySpark < 3
pyarrow < 16 if using PySpark >= 3
you can’t know what constraint to use for pyarrow until you’ve worked out which version of PySpark you will be using. And you can’t do that until you have the full set of constraints, including the constraint on pyarrow.
Problems like this aren’t solved by starting with one package, finding a solution, then looking at the next package, etc. That’s what pip’s “legacy” resolver did, and to be blunt, it sucked. It had bugs, and created invalid environments. We switched for a reason.
If you want to understand the complexities involved here, you can read up on constraint solving algorithms. There’s a lot of research (much of it pretty recent). Most of it is broadly applicable to Python - although be careful, as a lot of approaches assume you know all the constraints before you start, which isn’t possible with Python.
Disclaimer: I only have a high-level understanding of the subject here (and I was one of the developers who worked on pip’s current resolver!) - it’s a very deep subject, and there’s active research so what I know may be out of date. What I’ve said above is broadly accurate, though (I hope - @notatallshaw correct me if I’ve said anything daft!)
notatallshaw (Damian Shaw) May 6, 2025, 12:15am 16
@pf_moore is correct, the problem of rank is just as present with the syntax described in the original post as the syntax I described.
Worse, the problem of rank is just as present with limited requirement options today, it’s just less obvious to most users.
Here is a minimal example:
- User requests A and B (in that order)
- There will be a transitive dependency C
- A, B, and C all have two versions, v1 and v2
- A v2 depends on C==1
- B v2 depends on C==2
- A v1 depends on C==2
- B v1 depends on C==1

Which of the two solutions is “best”?

- A v2, B v1, C v1
- A v1, B v2, C v2
The first one because the user requested A first? Or the second one because more versions are “newer”? What if the choice is on transitive dependencies not direct user requirements? What if there’s an equal amount of “newer” versions? Can you come up with a metric that is both user intuitive and holds up to complex examples?
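The two solutions in the example above can be verified mechanically. Here is an illustrative brute-force enumeration of the example’s metadata (this is only a sketch of the search space, not how any real resolver works):

```python
from itertools import product

# Dependency metadata from the example above:
# each (package, version) pair pins exactly one version of C.
depends_on_c = {
    ("A", 2): 1,  # A v2 depends on C==1
    ("B", 2): 2,  # B v2 depends on C==2
    ("A", 1): 2,  # A v1 depends on C==2
    ("B", 1): 1,  # B v1 depends on C==1
}

solutions = []
for a, b, c in product([1, 2], repeat=3):
    # A candidate (A, B, C) assignment is consistent when the chosen
    # version of C satisfies both A's pin and B's pin.
    if depends_on_c[("A", a)] == c and depends_on_c[("B", b)] == c:
        solutions.append((a, b, c))

print(solutions)  # exactly two consistent solutions, neither obviously "better"
```

Enumeration confirms exactly the two solutions listed in the post, and nothing in the constraints themselves says which one a resolver should prefer.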
Introducing more general boolean satisfiability operations to users doesn’t introduce the problem of rank, it just makes it more obvious to users.
This is an open problem for, I think all, current Python package resolvers, if they can define a “good” way of ranking solutions, and if they can embed that into their resolver to be more likely to find a “good” solution.