gh-118761: Improve import time for pstats and zipfile by removing imports to typing by picnixz · Pull Request #128981 · python/cpython

Because it illustrates my point that this wasn't a "speed improvement" so much as a reactionary prevention of a minor performance regression, done without considering what could be done to actually optimize `import zipfile` overall.

Okay, so I actually went in and took some timings.

Here is my patch:

```diff
diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
 zlib = None
 crc32 = binascii.crc32
 
-try:
-    import bz2 # We may need its compression method
-except ImportError:
-    bz2 = None
-
-try:
-    import lzma # We may need its compression method
-except ImportError:
-    lzma = None
-
 __all__ = ["BadZipFile", "BadZipfile", "error",
            "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
            "is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
         self._comp = None
 
     def _init(self):
@@ -731,6 +722,7 @@ def __init__(self):
 
     def decompress(self, data):
         if self._decomp is None:
@@ -778,11 +770,15 @@ def _check_compression(compression):
         raise RuntimeError(
             "Compression requires the (missing) zlib module")
     elif compression == ZIP_BZIP2:
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
     elif compress_type == ZIP_DEFLATED:
         return zlib.decompressobj(-15)
     elif compress_type == ZIP_BZIP2:
```
Note that I don't delay the binascii import; it's needed to handle crc32 consistently, and it wasn't obviously something that would be significant to optimize.

I took three timings:

```
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      35.6 ms ±   5.7 ms    [User: 30.1 ms, System: 4.9 ms]
  Range (min … max):    30.8 ms …  50.3 ms    65 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      26.9 ms ±   5.1 ms    [User: 22.1 ms, System: 4.5 ms]
  Range (min … max):    24.0 ms …  53.6 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      25.4 ms ±   2.6 ms    [User: 20.7 ms, System: 4.5 ms]
  Range (min … max):    24.2 ms …  50.1 ms    120 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
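As a cross-check that doesn't need an external tool, one can time individual imports in-process (CPython also ships `python -X importtime` for a per-module breakdown). Here is a rough sketch, with the caveat that evicting a module from `sys.modules` and re-importing it only approximates a cold import, since dependencies already loaded by other modules stay cached:

```python
import importlib
import sys
import time


def import_cost_ms(name):
    """Approximate the import cost of `name` in ms by evicting it first.

    Not a true cold start: dependencies already imported elsewhere remain
    cached, so this is a lower bound on the first-import cost.
    """
    for mod in list(sys.modules):
        if mod == name or mod.startswith(name + "."):
            del sys.modules[mod]
    start = time.perf_counter()
    importlib.import_module(name)
    return (time.perf_counter() - start) * 1000


for name in ("typing", "bz2", "lzma"):
    try:
        print(f"{name:>6}: {import_cost_ms(name):.2f} ms")
    except ImportError:
        print(f"{name:>6}: not available on this build")
```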

I cannot remotely guarantee I know what I'm doing with a benchmarking tool; I've said so in other PRs too. ;) Still, my experimental results say that there are HUGE gains to be had from removing typing, and effectively no gains whatsoever from avoiding bz2 / lzma when they aren't used.

The reason I took these timings is that I thought it sounded like a great idea to address this as a follow-up:

> If the import time of zipfile is seriously a target, it also does other slow things, like always importing the compression modules when at most one of them is likely to be used.

But based on my timings I've changed my mind and don't intend to submit this patch: it feels like a waste of time to care about such an insignificant, not-at-all-slow pair of imports.