bpo-43976: add vendor config by FFY00 · Pull Request #25718 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation85 Commits9 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

FFY00

@encukou

The "Check if generated files are up to date" test is failing because of unfortunate timing when merging other PRs (GH-25315, GH-25687).
It's OK to ignore it. Sorry for the inconvenience.
To make it pass, you can merge the master branch.

@FFY00

No worries, I haven't committed the generated files anyway. I purposedly excluded generated files from the commit to make it easier to review. I still have to add documentation to the commit, and then I will push a commit updating the generated files, I can rebase when I do that. The only reason I opened this PR now anyway was that I am not gonna be home this afternoon and wanted to add the missing bits from my laptop.
Before I mark it ready for review I will be sure to rebase (I usually do) 😊

@tiran

Please use autoconf 2.69 to regenerate configure scripts and helpers. Most core devs are on platforms that do not have autoconf 2.71 yet.

@FFY00

The venv failure seems to be because we are hitting pypa/pip#9617.

@FFY00 FFY00 marked this pull request as ready for review

April 29, 2021 16:29

@FFY00 FFY00 changed the titlebpo-41282: add vendor config bpo-43976: add vendor config

Apr 29, 2021

@merwok

I think this needs a mailing-list or forum discussion.

I would have notified the people in pypa/pip#9617 but that is locked, maybe @uranusjr or @pradyunsg can ping them. (But as all design discussions, please use a proper python forum and not a PR)

@FFY00

I have opened https://discuss.python.org/t/mechanism-for-distributors-to-add-site-install-schemes-to-python-installations/8467.

A note about the PR. Currently, the site module does not import sysconfig, but it contains two copied functions from it. As far as I understand, this is just a micro optimization, the only two sysconfig functions it needs are pretty simple and do not depend on other sysconfig code or other modules.
This PR, however, needs a significant part of sysconfig, so it imports it.

I did some bechmarks to see if we would notice any performance difference, but the results are pretty negligible.

With the PR:

$ ./python -m timeit 'import site'
2000000 loops, best of 5: 167 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 174 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 166 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 176 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 184 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 176 nsec per loop

In master:

$ ./python -m timeit 'import site'
2000000 loops, best of 5: 176 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 179 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 171 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 176 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 177 nsec per loop
$ ./python -m timeit 'import site'
2000000 loops, best of 5: 176 nsec per loop

If this PR goes in, I will open a new PR removing the two sysconfig functions that the site module vendors.

@merwok

Can you build python from scratch if sysconfig imports site?
What about cross-compilation? (which has unclear support status)

@pradyunsg

For the stuff with the pip release, the PR that needs merging is pypa/pip#9912. Once that's merged and released, things should get fixed.

@FFY00

I have applied #25761 to fix the tests, github should be able to merge this without issue after that PR goes in.

pradyunsg

@FFY00

@uranusjr can you take a look at the last commit (the get_preferred_schemes one)?

jaraco

Comment on lines 287 to 319

sys.path.append(os.path.abspath(os.path.join(__file__, '..', 'vendor_config')))
# force re-load of vendor schemes with the patched sys.path
site._VENDOR_SCHEMES = None
sysconfig._load_vendor_schemes()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have a separate test rather than mutate an existing test?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we still need the test as-is, but we can also keep the original one.

jaraco

Comment on lines 281 to 293

sys.path.append(os.path.abspath(os.path.join(__file__, '..', 'vendor_config')))
# force re-load of vendor schemes with the patched sys.path
sysconfig._load_vendor_schemes()
wanted = ['nt', 'posix_home', 'posix_prefix', 'some_vendor']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have two tests, one which patches sys.path and another that doesn't?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

jaraco

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and a good start for providing hooks for vendor configurability.

I'd like to see an answer to @merwok's questions:

Can you build python from scratch if sysconfig imports site?
What about cross-compilation? (which has unclear support status)

I'm not familiar with building from scratch, but I agree, it would be important to address these cases. Another way to think about it - is this change backward-compatible? Can this change provide a graceful degradation such that if no vendor file is supplied, the behavior is all-but-guaranteed to be the same as on prior versions (if no --vendor-config is supplied)? If the change is "safe" in that respect (compatibility), the feature can be refined through the betas.

@merwok Do you know how one could test for building Python from scratch (to prove compatibility or demonstrate a failure)?

Also, the docs builds are failing - so those checks will need to be fixed before merging.

Those things addressed, I'm ready to say this change is suitable for merge, acknowledging that the short time frame prior to the feature freeze means this change may have a fair likelihood of being reverted (with all the consternation that brings and leaving no opportunity to restore prior to Python 3.11).

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@FFY00

Signed-off-by: Filipe Laíns lains@riseup.net

@merwok

Putting the config in a submodule was proposed by zooba IIRC, to make it more difficult for users to shadow it by mistake.

IMO there is little difference between _vendor.config and _vendor_config or _distributor_config.

@FFY00

I think the main difference is that we still install a _vendor package, regardless of a config existing or not, so the only way to shadow it is to give more precedence to a path containing a module with same name over the standard library. This is all internal stuff, not user facing, so I don't see the issue with naming. The current approach is purely funcional, not a matter of organization, like the other modules.

@FFY00

Signed-off-by: Filipe Laíns lains@riseup.net

@FFY00

I have frozen _sysconfig, I need to re-run benchmarks to see how that affects startup time.

@FFY00

I re-ran the benchmarks.

In my server, which is not very stable, and does not support tuning (I had to pass --no-tune). For some reason it is now reporting >40ms.

2021-10-19_13-44-master-09c04e7f0d26.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.11-hardened1-1-hardened-x86_64-with-glibc2.33
Number of logical CPUs: 80
Start date: 2021-10-23 01:12:10.158362
End date: 2021-10-23 01:13:11.489496

2021-10-22_22-50-master-303bd4e423f6.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.11-hardened1-1-hardened-x86_64-with-glibc2.33
Number of logical CPUs: 80
Start date: 2021-10-23 00:16:29.013840
End date: 2021-10-23 00:17:32.112072

### python_startup ###
Mean +- std dev: 42.3 ms +- 0.2 ms -> 44.4 ms +- 2.4 ms: 1.05x slower
Significant (t=-12.59)

And in my desktop, which still is not very stable, but more stable than the server.

2021-10-19_13-44-master-09c04e7f0d26.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.0-next-20210906-1-next-git-x86_64-with-glibc2.33
Number of logical CPUs: 8
Start date: 2021-10-23 01:49:41.915911
End date: 2021-10-23 01:50:34.306962

2021-10-23_00-41-master-66c29cc7b683.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.0-next-20210906-1-next-git-x86_64-with-glibc2.33
Number of logical CPUs: 8
Start date: 2021-10-23 01:46:53.628563
End date: 2021-10-23 01:47:48.478551

### python_startup ###
Mean +- std dev: 12.1 ms +- 0.3 ms -> 12.7 ms +- 0.4 ms: 1.05x slower
Significant (t=-19.23)

Which gives a consistent 1.05x result. @vstinner is this acceptable?

If anyone wants to run the benchmarks themselves, the way I am testing this is by committing a config file because pyperformance does not let us pass options to configure, so we need to place the config file there some other way.

commit 66c29cc7b68342a38457740d58274d2d32371ca0 Author: Filipe Laíns lains@riseup.net Date: Sat Oct 23 01:24:17 2021 +0100

checkout vendor config for pyperformance

Signed-off-by: Filipe Laíns <lains@riseup.net>

diff --git a/Lib/_vendor/config.py b/Lib/_vendor/config.py new file mode 100644 index 0000000000..9d979c7e8f --- /dev/null +++ b/Lib/_vendor/config.py @@ -0,0 +1,18 @@ +EXTRA_INSTALL_SCHEMES = {

+} + +EXTRA_SITE_INSTALL_SCHEMES = [

+]

[compile]
lto = True
pgo = True
install = True

[run_benchmark]
benchmarks = python_startup

@gvanrossum

Which gives a consistent 1.05x result. @vstinner is this acceptable?

I'm not Victor, and I don't understand this patch (I'm not a Linux user), but several us at the faster-cpython project have sunk a fair amount of time in reducing startup time (in particular the dip around Oct 16 here) and it would be kind of sad to see this all for naught.

@FFY00

Hi, thank you for the feedback. Unfortunately, I don't think there is a right answer. This patch essentially introduces a tradeoff, it provides a reasonable mechanism for vendors to customize the site initialization, which they can opt-in in exchange for a slower startup time. I don't think your optimizations to the startup time are wasted, they are still there and Python distributions without a vendor config still have the same startup time as before, but I understand the frustration 😕

IMO the current situation with downstream patching is very precarious, so this tradeoff makes sense to me. Ultimately, I think this is something that would be up to the users, as they are not necessarily married to one Python distribution and still hold the choice of using something else (eg. on Debian/Ubuntu use deadsnakes instead of the distro provided Python, use a Linux distribution that does not patch Python, hence does not need to make use of this mechanism, like Arch Linux, etc.).

That said, I acknowledge that it is indeed unfortunate that this option needs to come with a tradeoff, but I don't see another way around this 😟

There are still a couple optimizations that could be made to improve this (eg. the installed vendor config module could be frozen), and they are definitely something I will explore more in depth if this proposal gets accepted.

@gvanrossum

Thanks, that's a very reasonable answer. Just so I understand, this PR only add a build-time option that vendors can opt in to?

@encukou

No, this always adds a new stdlib package, _vendor, and _vendor.config is imported at startup.
I still don't understand why this is so much better than patching sysconfig, to be honest. The only way to prevent vendors from patching is still to tell them to not do that, and even with this PR you need to tell them that. So why not just document what kind of patching is allowed?

@FFY00

Yes, you are both right, it adds a --with-vendor-config option to configure, which will install a _vendor.config module.

In site, we try to import that module, and if it is present and EXTRA_SITE_INSTALL_SCHEMES is set and not empty, we will import a minimal version of sysconfig (_sysconfig), use it to load the paths for the specified schemes, and add them to the site initialization.

This didn't have any measurable difference in startup time when a vendor config was not installed, but I can benchmark again with if you want me to show actual data.

So why not just document what kind of patching is allowed?

I think the answer is very simple: because we cannot rely on downstreams to patch things reliably.
Police downstream patching is not a reasonable long-term solution, and patching that does not follow the guidelines will still have the very same impact. Opt-ing out of support because someone is making unsupported modifications to Python does not solve anything, and people will definitely still implement fragile workarounds if things don't work out of the box (eg. Meson has been overwriting purelib and platlib to hardcoded values if the user was on a Debian based distro, this was only recently fixed).

@markshannon

It seems to me that we cannot prevent vendors from modifying the layout of the Python distribution and patching files.

Assuming that vendors don't want to ship broken Python installs, shouldn't we provide tools to help them?
It seems to me that if we provided some sort of verification tool, we could expect vendors to use it, and fix the errors.

If the problem is that there are lots of implicit assumptions about layout, let's make those assumptions explicit.

@gvanrossum

Speaking of layout, maybe @zooba's rewrite of getpath.c in Python is helpful to look at? It makes those implicit decisions a little clearer. See #29041

@FFY00

Assuming that vendors don't want to ship broken Python installs, shouldn't we provide tools to help them?

This assumption is one of the issues, I think. Distributors want to ship Python installs that work, not that are not broken, these things may sound similar but aren't the same! For years distributors have been shipping what I would call broken Python installs, as they work, not because Python is not broken, but because people have implemented workarounds to get things to work.
IMO this is really bad for the ecosystem. I have written about this more in-depth in the gist linked above and here.

It seems to me that if we provided some sort of verification tool, we could expect vendors to use it, and fix the errors.

This is an interesting proposal, but I am not sure it is enough. I think distributors have been aware for a long time they were shipping broken installs but haven't really done anything because things worked.

Speaking of layout, maybe @zooba's rewrite of getpath.c in Python is helpful to look at? It makes those implicit decisions a little clearer. See #29041

Distributors don't usually modify things at the getpath level, I am personally not aware of any that does. What they generally patch is the site module.


I really appreciate the feedback, and I am open to go with a different solution if you think is better, but I still think the best option to solve this issue long-term is to standardize these modifications via an upstream supported mechanism 😕
I can do a bit more work to minimize the impact if anyone is interested and/or that would help with the proposal.

@pfmoore

Distributors want to ship Python installs that work, not that are not broken, these things may sound similar but aren't the same!

I think this is an important point. One of the significant pain points with Python packaging is dealing with distribution-patched Python implementations. Most of the constraints we'd like to impose are currently implicit, and distributions have their own constraints that often conflict with ours. So when we consider something as "broken", a distribution might well consider it as "conforming to policy" (for example).

I haven't watched this PR closely, but representatives from the various distros have been involved in the discussions, so I assume that (a) they are OK with its approach, and (b) it provides a solution that lets them satisfy their constraints while letting Python packaging tools avoid needing to special-case distro Pythons all over the place. Maybe a "config checker" utility would give a similar benefit, but we'd need to get feedback to confirm that, which means another cycle of discussion (which risks burning out interested parties, an ongoing issue with packaging). The one thing we know is not going to work is to continue to allow vendors to patch Python without any constraints - on the pip project, for example, we routinely have to tell users "you need to complain to your distro vendor, they've patched Python in ways we don't support", which sucks for the user and is an ongoing drain on pip maintainer time.

Slower startup is a real concern, but surely it's an issue for the vendors customising things? Standard Python remains fast, and vendors have to decide, is varying from the CPython default config important enough to us to take that performance hit? I'd be more concerned if, for example, Fedora were to say that they would continue patching Python rather than using this mechanism, because they didn't want Fedora-supplied Python to be slower than a self-built copy.

@tiran

I'd be more concerned if, for example, Fedora were to say that they would continue patching Python rather than using this mechanism, because they didn't want Fedora-supplied Python to be slower than a self-built copy.

That is my main concern, too. How are you going to convince vendors to use the new method? Their current approach works well enough for vendors and the new approach is slowing down startup. Startup performance is important for command line tools. On modern Linux, lots of tools are written in Python. grep -R /usr/bin/python /usr/bin/ /usr/sbin | wc -l returns 430 hits on my system. If we want vendors to adopt the approach, then it has to be all carrots and no sticks.

What is the cause of the slow down? Is it code execution or imports? In case it's import, could we do a more hacky approach and embed the vendor snippet into _sysconfig.py file directly to avoid the import cost?

@pradyunsg

The important bit here is to ensure that the startup cost is only paid if there's a vendor module.

Right now, as implemented, this PR attempts an unconditional import, which makes a blanket hit to startup.

@pfmoore

How are you going to convince vendors to use the new method?

Isn't that the point of the discussion linked above? That this approach has the support from vendors? I agree that if major distributions like Debian and Fedora are going to ignore this mechanism, then it's pointless adding it. Maybe it's worth someone collecting a list of distributions who have committed to implementing this mechanism in place of their custom patches, and posting it here?

@stefanor

Debian hat on (although I'm not the cpython maintainer, @doko42, I help out from time to time):

I'm interested in mechanisms like this that can mean we can reduce our patch-load. I have been aware of this PR and subscribed to it since inception, but haven't done the investigation work on whether it meets our needs, yet. I certainly don't intend to ignore it, but haven't looked at it, yet.

@FFY00

Right now, as implemented, this PR attempts an unconditional import, which makes a blanket hit to startup.

My testing showed that this did not made any significant difference. The costly bit is importing _sysconfig and doing the path expansion, which is required if for relocated installs (see PYTHONHOME).

I should re-do it with the last changes though, to make sure I am not mistaken.

What is the cause of the slow down? Is it code execution or imports? In case it's import, could we do a more hacky approach and embed the vendor snippet into _sysconfig.py file directly to avoid the import cost?

From my testing, it's importing _sysconfig and expanding the paths, which is only done if there is a vendor config and that vendor config specifies extra schemes to be added to the site initialization.

@FFY00

Re-ran the benchmarks without a vendor config:

2021-10-19_13-44-master-09c04e7f0d26.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.0-next-20210906-1-next-git-x86_64-with-glibc2.33
Number of logical CPUs: 8
Start date: 2021-10-26 21:14:04.686571
End date: 2021-10-26 21:14:59.851457

2021-10-22_22-47-master-3e25a85fc5e9.json.gz
============================================

Performance version: 1.0.2
Report on Linux-5.14.0-next-20210906-1-next-git-x86_64-with-glibc2.33
Number of logical CPUs: 8
Start date: 2021-10-26 21:10:15.800195
End date: 2021-10-26 21:11:11.439100

### python_startup ###
Mean +- std dev: 12.7 ms +- 0.6 ms -> 12.9 ms +- 0.5 ms: 1.01x slower
Not significant

@encukou

The important bit here is to ensure that the startup cost is only paid if there's a vendor module.

You mean it's OK to slow things down if there are modifications?
It's not. If --with-vendor-config makes things slow, then it's better for distros to patch instead.

@Bo98 Bo98 mentioned this pull request

Dec 14, 2021

@FFY00

I am closing this now. After 2 separate attempts at getting this merged, no progress was made. #29041 made a similar change to the interpreter initialization and did not have near this much push back, which, to be clear, I think was very good progress, but so is this IMO, it solves a very real problem. I am really unsure about what I could have done differently to have this progress further.

I'd welcome any feedback, and other alternative approaches to solve this problem, so far I don't think anyone made any other serious proposal.

@merwok

This comment #25718 (comment) expressed that a broad discussion was needed and its outcome encoded in a PEP, which makes sense to me. There’s a lot of history of CPython devs doing changes and OS maintainers adding their patches without much communication (also seen recently with the surprise and resistance to backend-independent packaging even though it’s been years in the making and communicated in multiple places); it would be good to see all these people agree on the needs, problems and solutions.

@FFY00

Thank you for the feedback, I realized that and ended up witting https://gist.github.com/FFY00/625f65681fbcd7fc039dd4d727bb2c2f, but was hoping it did not have to make it to a full-blown PEP. I have gotten feedback from a couple people, and for all of the ones that could benefit from this mechanism, this seemed like a reasonable approach. Unfortunately, I did not get any feedback from Matthias, who was the main blocker on this PR so I am not sure what else I can do besides submitting it as a PEP, but I am not sure we will get the feedback you were hopping from everyone/

@FFY00 FFY00 mentioned this pull request

May 10, 2022

Reviewers

@vstinner vstinner vstinner left review comments

@jaraco jaraco jaraco approved these changes

@uranusjr uranusjr uranusjr left review comments

@hroncok hroncok hroncok left review comments

@pradyunsg pradyunsg pradyunsg left review comments

@tiran tiran tiran requested changes

@willingc willingc Awaiting requested review from willingc

@isidentical isidentical Awaiting requested review from isidentical