gh-118761: Improve import time for pstats and zipfile by removing imports to typing by picnixz · Pull Request #128981 · python/cpython

Because it illustrates my point that this wasn't a "speed improvement" so much as a reactionary prevention of a minor performance regression, done without considering what could be done to actually optimize `import zipfile` overall.

Okay, so I actually went in and took some timings.

Here is my patch:

```diff
diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
 zlib = None
 crc32 = binascii.crc32
 
-try:
-    import bz2 # We may need its compression method
-except ImportError:
-    bz2 = None
-
-try:
-    import lzma # We may need its compression method
-except ImportError:
-    lzma = None
-
 __all__ = ["BadZipFile", "BadZipfile", "error",
            "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
            "is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
         self._comp = None
 
     def _init(self):
@@ -731,6 +722,7 @@ def __init__(self):
 
     def decompress(self, data):
         if self._decomp is None:
@@ -778,11 +770,15 @@ def _check_compression(compression):
         raise RuntimeError(
             "Compression requires the (missing) zlib module")
     elif compression == ZIP_BZIP2:
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
     elif compress_type == ZIP_DEFLATED:
         return zlib.decompressobj(-15)
     elif compress_type == ZIP_BZIP2:
```
Note that I don't delay the binascii import; it's needed to handle crc32 consistently, and it wasn't obviously something that would be significant to optimize.

I took three timings:

```
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      35.6 ms ±   5.7 ms    [User: 30.1 ms, System: 4.9 ms]
  Range (min … max):    30.8 ms …  50.3 ms    65 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      26.9 ms ±   5.1 ms    [User: 22.1 ms, System: 4.5 ms]
  Range (min … max):    24.0 ms …  53.6 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      25.4 ms ±   2.6 ms    [User: 20.7 ms, System: 4.5 ms]
  Range (min … max):    24.2 ms …  50.1 ms    120 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
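As a cross-check that doesn't need an external tool, one can time individual imports in-process (CPython also ships `python -X importtime` for a per-module breakdown). Here is a rough sketch, with the caveat that evicting a module from `sys.modules` and re-importing it only approximates a cold import, since dependencies already loaded by other modules stay cached:

```python
import importlib
import sys
import time


def import_cost_ms(name):
    """Approximate the import cost of `name` in ms by evicting it first.

    Not a true cold start: dependencies already imported elsewhere remain
    cached, so this is a lower bound on the first-import cost.
    """
    for mod in list(sys.modules):
        if mod == name or mod.startswith(name + "."):
            del sys.modules[mod]
    start = time.perf_counter()
    importlib.import_module(name)
    return (time.perf_counter() - start) * 1000


for name in ("typing", "bz2", "lzma"):
    try:
        print(f"{name:>6}: {import_cost_ms(name):.2f} ms")
    except ImportError:
        print(f"{name:>6}: not available on this build")
```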

I cannot remotely guarantee I know what I'm doing with a benchmarking tool; I've said so in other PRs too. ;) Still, my experimental results say that there are HUGE gains to be had from removing typing, and effectively no gains whatsoever from avoiding bz2 / lzma when they aren't used.

The reason I took these timings is that I thought it sounded like a great idea to address this as a follow-up:

> If the import time of zipfile is seriously a target, it also does other slow things, like always importing the compression modules when at most one of them is likely to be used.

But based on my timings I've changed my mind and don't intend to submit this patch: it feels like a waste of time to care about such an insignificant, not-at-all-slow pair of imports.