gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing`
by picnixz · Pull Request #128981 · python/cpython
Because it illustrates my point that this wasn't a "speed improvement" but rather a reactive fix to prevent a minor performance regression, made without considering what could actually be done to optimize `import zipfile` overall.
Okay, so I actually went in and took some timings.
Here is my patch:
```diff
diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
     zlib = None
     crc32 = binascii.crc32
 
-try:
-    import bz2 # We may need its compression method
-except ImportError:
-    bz2 = None
-
-try:
-    import lzma # We may need its compression method
-except ImportError:
-    lzma = None
-
 __all__ = ["BadZipFile", "BadZipfile", "error",
            "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
            "is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
         self._comp = None
 
     def _init(self):
+        import lzma
         props = lzma._encode_filter_properties({'id': lzma.FILTER_LZMA1})
         self._comp = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=[
             lzma._decode_filter_properties(lzma.FILTER_LZMA1, props)
@@ -731,6 +722,7 @@ def __init__(self):
 
     def decompress(self, data):
         if self._decomp is None:
+            import lzma
             self._unconsumed += data
             if len(self._unconsumed) <= 4:
                 return b''
@@ -778,11 +770,15 @@ def _check_compression(compression):
             raise RuntimeError(
                 "Compression requires the (missing) zlib module")
     elif compression == ZIP_BZIP2:
-        if not bz2:
+        try:
+            import bz2
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) bz2 module")
     elif compression == ZIP_LZMA:
-        if not lzma:
+        try:
+            import lzma
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) lzma module")
     else:
@@ -795,6 +791,7 @@ def _get_compressor(compress_type, compresslevel=None):
             return zlib.compressobj(compresslevel, zlib.DEFLATED, -15)
         return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         if compresslevel is not None:
             return bz2.BZ2Compressor(compresslevel)
         return bz2.BZ2Compressor()
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
     elif compress_type == ZIP_DEFLATED:
         return zlib.decompressobj(-15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         return bz2.BZ2Decompressor()
     elif compress_type == ZIP_LZMA:
         return LZMADecompressor()
```
Note that I don't delay the `zlib` import: it's needed to set up `crc32` consistently, and it didn't look like something that would be significant to optimize.
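The technique in the patch is simply moving `import bz2` and `import lzma` from module level into the functions that need them. Below is a minimal, generic sketch of that pattern, assuming you want the same `RuntimeError` message the patched `_check_compression()` raises. `require_module` and `make_bz2_compressor` are hypothetical helpers, not part of `zipfile`; `importlib.import_module` is used instead of a plain `import` statement only so the module name can be a parameter.

```python
import importlib
from types import ModuleType


def require_module(name: str) -> ModuleType:
    """Import `name` on first use instead of at module import time.

    Hypothetical helper mirroring the patched _check_compression():
    raise RuntimeError with zipfile's message format if the module is missing.
    """
    try:
        # After the first successful import, this is served from sys.modules.
        return importlib.import_module(name)
    except ImportError:
        raise RuntimeError(
            f"Compression requires the (missing) {name} module") from None


def make_bz2_compressor(compresslevel=None):
    """Sketch of the patched ZIP_BZIP2 branch of _get_compressor()."""
    bz2 = require_module("bz2")  # deferred: only paid for when bzip2 is used
    if compresslevel is not None:
        return bz2.BZ2Compressor(compresslevel)
    return bz2.BZ2Compressor()
```

Repeating the import on every call costs little after the first one, because the module object is cached in `sys.modules`.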
I have 3 timings:

- The first is for `zipfile` without this `typing` change.
- The second is for `zipfile` with this `typing` change.
- The third is for `zipfile` with my patch applied.
```console
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      35.6 ms ±   5.7 ms    [User: 30.1 ms, System: 4.9 ms]
  Range (min … max):    30.8 ms …  50.3 ms    65 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```

```console
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      26.9 ms ±   5.1 ms    [User: 22.1 ms, System: 4.5 ms]
  Range (min … max):    24.0 ms …  53.6 ms    115 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```

```console
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      25.4 ms ±   2.6 ms    [User: 20.7 ms, System: 4.5 ms]
  Range (min … max):    24.2 ms …  50.1 ms    120 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
```
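For anyone without hyperfine, the same kind of measurement can be roughly approximated in Python: spawn a fresh interpreter repeatedly, discard a few warm-up runs, and report mean ± standard deviation of the wall-clock time. This is a sketch under simplifying assumptions, not how the timings above were taken. It performs none of hyperfine's outlier detection, and `time_command` is a hypothetical name.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list[str], runs: int = 20, warmup: int = 3) -> tuple[float, float]:
    """Return (mean, stdev) wall-clock time in milliseconds for running argv."""
    if runs < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    for _ in range(warmup):
        # Warm-up runs populate OS file caches and .pyc files; results discarded.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Point argv at each ./python build in turn to compare them.
    mean, sd = time_command([sys.executable, "-c", "import zipfile"])
    print(f"{mean:.1f} ms ± {sd:.1f} ms")
```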
I cannot remotely guarantee that I know what I'm doing with a benchmarking tool; I've said so in other PRs too. ;) Still, the results I got say that there are HUGE gains to be had from removing `typing` (a mean of 35.6 ms down to 26.9 ms), and effectively nothing to gain from avoiding `bz2`/`lzma` when they aren't used (a further ~1.5 ms, smaller than the standard deviation of either run).
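Whole-process timings can't say which module the time went to. CPython's `-X importtime` option (available since 3.7) prints a per-module breakdown to stderr, which would attribute the difference to `typing` versus `bz2`/`lzma` directly. Below is a minimal parsing sketch, assuming the default `import time: self [us] | cumulative | imported package` line format. `import_times` is a hypothetical helper.

```python
import subprocess
import sys


def import_times(module: str) -> dict[str, tuple[int, int]]:
    """Map each imported module name to (self_us, cumulative_us).

    Runs a fresh interpreter with `-X importtime`, which writes one line per
    import to stderr in the form:
        import time: <self us> | <cumulative us> | <module>
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    times: dict[str, tuple[int, int]] = {}
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = [f.strip() for f in line[len("import time:"):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the header line ("self [us] | cumulative | ...")
        # Nested imports are indented; strip() above removes that indentation.
        times[fields[2]] = (int(fields[0]), int(fields[1]))
    return times


if __name__ == "__main__":
    times = import_times("zipfile")
    for name in ("typing", "bz2", "lzma", "zipfile"):
        if name in times:
            self_us, cumulative_us = times[name]
            print(f"{name:8} self={self_us:>6} us  cumulative={cumulative_us:>6} us")
```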
The reason I did these timings was that I thought it sounded like a great idea to solve this as a followup:

> If the import time of `zipfile` is seriously a target, it also does other slow things like always importing compression modules, when at most one of them is likely to be used.
But based on my timings, I've changed my mind and don't intend to submit this patch: it feels pointless to spend time on the `bz2`/`lzma` imports, which are insignificant and not at all slow.