Moving off of egg_info · Issue #4713 · pypa/pip (original) (raw)

@pradyunsg

Since I think it's pretty clear that we're going to have to remove egg_info, I wanted to discuss the high-level approach to this here. First, we need to agree that this is needed.

I disagree here.

AFA PEP 517 and the resolver are concerned, the metadata hook would mean
that pip doesn't need to build a wheel to get dependency information. The
resolver would be unaffected by whether there's the metadata hook
available; it would just ask what the metadata is and the rest of the build
toolchain would be brought into play if the metadata hook is not available.

In this pre-PEP 517 world, egg_info serves as that metadata hook for sdists. I do not want to see it go. It's an important feature (not just an optimisation IMO) -- I don't want a numpy wheel being built just so that pip can realise that you wouldn't want to it installing because it conflicts with your requirements.


Everything else you've mentioned is really not directly related to the topic you've set for the issue. I'll respond now though since I don't think there's much discussion to be had on those fronts anyway. I suggest you make new issues for them if you think it makes sense to have a discussion on those.


should we install it so that WheelBuilder can run?

IIRC, pip does not need wheel to be installed for installing from a .whl file, it needs wheel to build one.

I think it would be a good idea to split InstallRequirement into two classes: LegacyInstallRequirement and NewInstallRequirement, representing egg_info/setup.py install and bdist_wheel. What do others think of this idea?

This is actually on my roadmap -- I'm thinking of calling them "Distributions" -- like LegacySourceDistribution, PEP517SourceDistribution, WheelDistribution. I'll get to these eventually, I think there's a lot of stuff that needs to move around before this can be done cleanly.

@pfmoore

I'm assuming this is in a PEP 517 world.

So I'd say yes, we should (initially) rename InstallRequirement to LegacyInstallRequirement on the understanding that it's present for the purposes of retaining the existing behaviour until we're in a position to drop it in favour of the new process. This should be nothing more than a renaming, though - we should not touch the logic at all.

Secondly, we introduce a new PEP517InstallRequirement class. This one uses only PEP 517 hooks to do the build and install. So for example, there's no question of needing wheel as there's no WheelBuilder in this part - just a build_wheel hook call that it's up to the backend to ensure works (and that may mean that the backend needs to implement get_requires_for_build_wheel, so that it can tell pip that wheel is needed - but that's the backend's choice, not pip's.

I'd suggest that InstallRequirement remain as a base class, that holds any common code and defines the public interface that the 2 concrete implementations are providing. Then all code in pip that works with install requirements can safely use only the interface defined by InstallRequirement with no need to care whether it's the new or legacy path.

I would imagine this as 2 separate PRs:

  1. Implement the base class and rename the existing class.
  2. Implement the PEP 517 code.

The first PR is a refactoring, and so should be covered by the existing test suite. The second PR is wholly new code, and will require a new set of unit and functional tests. It may be hard to write functional tests in the absence of a backend, but the PEP includes a sample implementation of a complete, if basic, backend, which we should be able to use for tests.

Also, I'd strongly recommend that the second PR use @takluyver's PEP 517 helper library, if he has it ready in time. The correct implementation of hook calling code is tricky, and I don't think we should implement or maintain our own code for this if there's a library we can use.

@pfmoore

This is actually on my roadmap -- I'm thinking of calling them "Distributions" -- like LegacySourceDistribution, PEP517SourceDistribution, WheelDistribution. I'll get to these eventually, I think there's a lot of stuff that needs to move around before this can be done cleanly.

That sounds exactly like what I was thinking of with my "first PR" above. I hadn't thought of the wheel-only case so that's a good point too. As you're currently very involved with this are of the code, I'm happy to wait for this to be done the way you're proposing - and then we can look at implementing PEP517SourceDistribution.

@ghost

If you look at the PR that's going into setuptools, I used a ProcessPoolExecutor to call the backend functions. It's currently 40 lines that I think can be shrunk down to ~30 and side-steps all of the issues that come with even looking at the command line.

@pfmoore

@ghost

In this pre-PEP 517 world, egg_info serves as that metadata hook for sdists. I do not want to see it go. It's an important feature (not just an optimisation IMO) -- I don't want a numpy wheel being built just so that pip can realise that you wouldn't want to it installing because it conflicts with your requirements.

@njsmith Do you have a response to this? I actually agree with @pradyunsg here which is why I was attempting to implement dist_info.

@pfmoore

Not trying to speak for @njsmith but I think you're confusing two things here.

Code for legacy sdists should not change - specifically it will continue to call egg_info. There's absolutely no point trying to make any sort of major change (such as removing egg_info calls) to that code, as it'll eventually be removed anyway. New code for PEP 517 has no need for egg_info. The new hooks replace all of that interface with a unified one that all backends will provide.

My understanding is that @pradyunsg and myself have been telling you we can't remove egg_info from the existing code, whereas @njsmith was trying to tell you that there's no need for egg_info in the new code. Both statements are true.

@pradyunsg

Great. The router is acting up. I'd typed a nice long message and now I've lost it. It was basically what @pfmoore is saying - I think you're confusing between how legacy packages and PEP 517 packages would be handled - just much longer and touching upon the vagueness of that comment by @xoviat.

@njsmith

The point I was making is that right now, egg_info is almost never actually providing any value, even for legacy sdist builds. It provides information that would be useful to a resolver if we had a resolver... but we don't. By the time pip calls egg_info on an sdist then it's already committed to installing that sdist, so it could just as well go ahead and build a wheel; egg_info isn't providing any value.

Historically, the reason pip's legacy build path calls egg_info is to hack around a setuptools misfeature: if you called setup.py install and setuptools noticed that some install_requires was missing, then it would go ahead and easy_install it. So this complicated dance with egg_info was invented as a way to preempt this and make sure that by the time we actually call setup.py install, the packages that it will try to install will already be installed. But... we don't actually call setup.py install anymore. Or, well, hardly ever.

So one possible way forward would be:

I'm not, like, committed to this in any way -- I'm not a pip dev and I'm not volunteering to do the work :-). And I can see why you might prefer to keep the legacy code the way it is. But that constraint does seem like it makes things much more complicated. You can't just hide the egg_info/build split behind an abstract interface, because it changes the whole flow of pip install. You can't actually keep the legacy code unchanged, because you want to support PEP 518 for legacy builds, and doing that with egg_info means you need a way to store the PEP 518 environments across multiple invocations of setup.py. You can't trivially make a PEP 517 sdist act like a legacy sdist, because you need to handle PEP 517 backends that don't provide that hook, plus you need a way to save the metadata tmpdir and pass it back to build_wheel later, which isn't something that comes up in the legacy build.

I know this looks like a stereotypical "let's blow up the world and make it better!" proposal. I'm mentioning it because it seems like despite the appearances, it might actually be easier, faster, simpler, and produce a better result in the end, so I think it's worth thinking through carefully instead of rejecting out of hand.

@ghost

Just to be clear, I would support dropping setup.py install, as it would make the logic so much simpler, but I don't make that decision. What are the thoughts of the decision-makers?

@pradyunsg

I'm not a decision maker here.

That said, I'm not in favour of this approach. I had considered this for the resolver (that I'd rewrite the stuff completely). I decided against it.

This codebase is mostly maintained on volunteer time and there's a lot of... interesting... relationships between the various parts.

It's very possible that midway through the rewrite you leave the codebase in a non working state - that's exactly what I won't want to see happening and I don't think anyone would want that with the PEP 517/518 work either.

I personally know that the incremental refactor and improve approach works here. That is what I used when implementing the config command. It's slower but almost equally effective.

if we had a resolver... but we don't.

I won't hold my breath on it but we will soon. :)

@pradyunsg

Closed by mistake. Sorry!

@ghost ghost mentioned this issue

Oct 20, 2017

@pradyunsg

@github-actions github-actions bot locked as resolved and limited conversation to collaborators

Oct 15, 2021