File: ziptools/ziptools/ziptools/ziptools.py

#!/usr/bin/env python3
# -*- coding: utf8 -*-
r"""jun25 (else py3.14? makes \? escapes errors)

ziptools.py - the main library module of the ziptools system. See ziptools' ../_README.html for license, attribution, and other logistics.

Tools to create and extract zipfiles containing a set of files, folders, and symbolic links. All functions here are callable, but the main top-level entry points are these two (see ahead for more on their arguments):

createzipfile(zipname, [addnames], storedirs=True, cruftpatts={}, atlinks=False, trace=print, zipat=None, nocompress=False)

extractzipfile(zipname, pathto='.', nofixlinks=False, trace=print, permissions=False, nomangle=False)

Pass "trace=lambda *p, **k: None" to silence most messages from these calls. See also scripts zip-create.py and zip-extract.py for command-line clients, and zipcruft.cruft_skip_keep for a default "cruftpatts" cruft-file definition. All of these have additional documentation omitted here.

This ziptools package mostly extends Python's zipfile module with top-level convenience tools that add some important missing features:

Docs which span creates and extracts (see these functions for more):

CRUFT HANDLING: This script sidesteps other tools' issues with ".*" cruft files (metadata that is normally hidden): by default, they are not silently/implicitly omitted in zips here for completeness, but can be omitted by passing a filename-patterns definition structure to the optional "cruftpatts" argument.

See zipcruft.py for pattern defaults to import and pass, and zip-create.py for more background. Most end-user zips should skip cruft files (see Mergeall: cruft can be a major issue on Mac OS in data to be transferred elsewhere).

WINDOWS LONG PATHS: This program, like Mergeall, supports pathnames that are not restricted to the usual 260/248-character length limit on all versions of Windows. To lift the limit, pathnames are automatically prefixed with '\\?\' as needed, both when adding to and extracting from archives. This allows archives created on platforms without such limits to be unzipped and rezipped on Windows machines.

The \\?\-prefix magic is internal, and hidden from the user as much as possible. It also makes paths absolute, but relative paths are still fully supported: absolute paths are not propagated to the archive when creating, and impact only output messages on Windows when extracting (where we strip the prefix and try to map back to relative as needed to keep the messages coherent).
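
As a rough illustration of the prefixing described above, the sketch below shows one way such a helper might look. It is not the code in ziplongpaths.py: the name add_long_path_prefix, its limit default, and its UNC handling are assumptions made for this example, and ntpath is used so that it behaves the same on any host.

```python
import ntpath

PREFIX = '\\\\?\\'                      # the four characters \\?\

def add_long_path_prefix(path, limit=248, force=False):
    # Hypothetical helper: make a Windows path absolute, and prefix it
    # with \\?\ if it is too long (or if force is True).
    if path.startswith(PREFIX):
        return path                     # already prefixed: leave as is
    path = ntpath.abspath(path)         # the prefix requires absolute paths
    if len(path) < limit and not force:
        return path                     # short enough: no prefix needed
    if path.startswith('\\\\'):         # UNC form: \\server\share\...
        return PREFIX + 'UNC' + path[1:]
    return PREFIX + path

short = add_long_path_prefix('C:\\temp\\file.txt')
long_ = add_long_path_prefix('C:\\' + 'x' * 300)
print(short)
print(long_[:12])
```

Short paths pass through unchanged here, so only paths that would otherwise fail pay the cost of the prefix.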

SYMLINKS SUPPORT: This package also supports adding symlinks (symbolic links) to and extracting them from zip archives, on both Unix and Windows with Python 3.X, but only on Unix with Python 2.X. Windows requires admin permissions and NTFS filesystem destinations to create symlinks from a zip file; Unix does not.

The underlying Python zipfile module doesn't support symlinks directly today, short of employing the very low-level magic used in ziptools_symlinks.py here, and there is an open bug report to improve this:

  https://bugs.python.org/issue18595
  https://mail.python.org/pipermail/python-list/2005-June/322179.html
  https://duckduckgo.com/?q=python+zipfile+symlink
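
For readers curious about the low-level approach mentioned above, here is a minimal sketch of a commonly used technique for recording a symlink in a zipfile with the standard zipfile module. It is an assumption that the local symlinks module works along these lines; the function names add_symlink_entry and is_symlink_entry are hypothetical and exist only for this example.

```python
import io, stat, zipfile

def add_symlink_entry(zf, arcname, linktarget):
    # Assumed technique: mark the entry as a Unix symlink in the high
    # 16 bits of external_attr, and store the link's target as its data.
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3                              # 3 = Unix
    info.external_attr = (stat.S_IFLNK | 0o777) << 16   # file type + perms
    zf.writestr(info, linktarget)

def is_symlink_entry(info):
    # True if the entry's Unix mode bits say it is a symlink.
    return stat.S_ISLNK(info.external_attr >> 16)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    add_symlink_entry(zf, 'test1/dirlink', 'folder')
    zf.writestr('test1/plain.txt', 'text')

with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        print(info.filename, is_symlink_entry(info))
```

An extractor can then read the entry's data as the link target and recreate the link, instead of writing a regular file.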

Symlinks customize messages with "~" characters in creation and "(Link)" prefixes in extraction, because they are a special-enough case to call out in logs, and may require special permission and handling to unzip and use on Windows. For example, link creation and extraction messages are as follows:

  Adding  link  ~folder test1/dirlink   # create message
  (Link) Extracted test1/dirlink        # extract message

By default, zipfile creation zips links themselves verbatim, not the items they refer to. Pass True to the "atlinks" function argument to instead follow links and zip the items they refer to. Unzipping restores whatever was zipped.

When links are copied verbatim, extracts adjust the text of a link's path to use the hosting platform's separators - '\' for Windows and '/' for Unix. This provides some degree of link portability between Unix and Windows, but is switchable with "nofixlinks" because it may not be desirable in all contexts (e.g., when unzipping to a drive to be used elsewhere). Symlinks will still be nonportable if they contain other platform-specific syntax, such as Windows drive letters or UNC paths, or use absolute references to extra-archive items.

When "atlinks" is used to follow links and copy items they refer to, recursive links are detected on platforms and Pythons that support stat objects' st_ino (a.k.a. inode) unique directory identifiers. This includes all Unix contexts, and Windows as of Python 3.2 (other contexts fail on path or memory errors). Recursive links are copied themselves, verbatim, to avoid loops and errors.

Besides symlinks, FIFOs and other exotic items are always skipped and ignored; items with system state can't be transferred in a zipfile (or Mergeall mirror). See also the top-level README for more on symlinks. Due to known limitations, symlinks on Windows function correctly but do not retain original modtimes.

PYTHON SYMLINKS SUPPORT: The following table summarizes Python's current symlink support, which wholly determines that of ziptools. In it, IO means basic os.{readlink, symlink} read/write calls, lchmod is the os module's permissions writer, and S_F_S stands for the os.supports_follow_symlinks registration table of tools that can work on symlinks instead of their referents. utime is required for modtimes, and chmod or lchmod for permissions, and there is no lutime:

  Platform   Python   IO    lchmod   S_F_S   S_F_S content
  Windows    2.X      no    no       no      n/a
  Windows    3.X      yes   no       yes     os.stat only
  Unix       2.X      yes   yes      no      n/a
  Unix       3.X      yes   yes      yes     os.{utime, chmod,...}

Per this table, Unix gets the best coverage, though its 2.X cannot propagate symlink modtimes because it has no symlink utime, and must use lchmod instead of chmod. Windows support is spottier: 2.X has none, and 3.X cannot update modtimes or permissions. Worse, os.path.islink() simply returns False for symlinks on Windows+Python 2.X, which means symlinks cannot be detected, and are followed on zips (creates); use 3.X if you care. Nits: Python 3.X has S_F_S only in 3.3 and later, and all of this may change (and hopefully will...).
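
To see which row of the table applies to a given host, a quick probe like the following sketch may help. The symlink_support name and the keys of its result are made up for this example; the os attributes it tests are standard.

```python
import os

def symlink_support():
    # Report which symlink-related tools this Python and platform provide.
    sfs = getattr(os, 'supports_follow_symlinks', set())   # absent before 3.3
    return {
        'IO':     hasattr(os, 'readlink') and hasattr(os, 'symlink'),
        'lchmod': hasattr(os, 'lchmod'),
        'S_F_S':  hasattr(os, 'supports_follow_symlinks'),
        'utime':  os.utime in sfs,      # can set a link's own modtime?
        'chmod':  os.chmod in sfs,      # can set a link's own permissions?
    }

for name, supported in symlink_support().items():
    print('%-6s %s' % (name, supported))
```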

PERMISSIONS SUPPORT: As of ziptools version [1.1], extracts (unzips) now propagate Unix permissions for files, folders, and symlinks. Due to interoperability concerns, however, this option only works if it is explicitly selected, and should generally be used only for Unix from+to. See the extract function below for more details.

FAILURES POLICY: Per the user guide, ziptools' general policy is to report but ignore failures of permissions, modtimes, and symlinks (because they are metadata), but end the run for failures of files and folders (because they are core data).

"""

from __future__ import print_function           # py 2.X compatibility

import os, sys, shutil
from zipfile import ZipFile                     # stdlib base support
from zipfile import ZIP_DEFLATED, ZIP_STORED    # compressed, or not

# nested package so same if used as main script or package in py3.X

# default cruft-file patterns, import here for importers ({}=don't skip)
from .zipcruft import cruft_skip_keep, isCruft  # [1.1] isCruft moved

# major workaround to support links: split this gnarly code off to a module...
from .zipsymlinks import addSymlink, isSymlink, extractSymlink

# also major: fix any too-long Windows paths on archive adds and extracts
from .ziplongpaths import FWP, UFWP

# UTC timestamp zip/unzip [1.2]
from .zipmodtimeutc import addModtimeUTC, getModtimeUTCorLocal

# interoperability nits [1.1]
RunningOnPython2 = sys.version.startswith('2')
RunningOnMacOS   = sys.platform.startswith('darwin')
RunningOnWindows = sys.platform.startswith('win')

#===============================================================================

_builtinprint = print

def print(*pargs, **kargs):
    r"""
-----------------------------------------------------------------------
[1.1] Avoid print() exceptions on Python 2.X and Windows.  This code
redefines print for the rest of this module only, but the custom print
becomes the default for "trace" arguments here; you can import this and
pass it back in, but shouldn't need to ("trace" is mostly for silencing).
This code addresses two potentials for aborts (print() exceptions):

1) On Python 2.X, prints of non-ASCII Unicode to a pipe on Unix can 
throw a to-ASCII encoding exception, even though the same print to the 
console works fine.  Work around by manually encoding unicode arguments 
to UTF-8 when printed.  This was escalated by forcing unicode filenames 
for zip interoperability, but the exceptions also happened if 2.X 
unzipped a 3.X zip and received decoded unicode for non-ASCII names.

2) On Windows, printing Unicode is perilous in general.  The user may 
have set PYTHONIOENCODING to UTF-8, but this is too much to require;
the Windows default CP437 cannot be assumed because it may not apply
on the host; and printing unsupported characters triggers an encoding
exception in any case.  To avoid exceptions, replace all non-ASCII 
characters for message display only, and specialize for 3.X / 2.X str
(e.g., 'Li[\u241][\u241]ux.png' / 'Li[\xC3][\xB1][\xC3][\xB1]ux.png').
This could use ascii() in 3.X (only), but opted to match 2.X displays.
mungestr2X doesn't work on 3.X bytes, but this doesn't have to care.

[1.2] This is now also imported and used by interactive-mode prints in 
main scripts, to avoid similar (though unlikely) non-ASCII print aborts.
This applies only to prints on Windows in this context: input() text 
is a str in 2.X, and so need not be down-converted from unicode.
Creating the abort required pasting a Unicode filename into scripts.
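
The 3.X replacement described in item 2 above can be tried in isolation with a sketch like this one. The name munge_non_ascii is used only for this example; the logic mirrors the mungestr3X lambda in the code below.

```python
def munge_non_ascii(text):
    # Replace each non-ASCII character with a bracketed decimal code point.
    return ''.join(c if ord(c) <= 127 else '[\\u%d]' % ord(c) for c in text)

print(munge_non_ascii('Li\xf1\xf1ux.png'))    # \xf1 is code point 241
```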
-----------------------------------------------------------------------
"""
    usersetting = os.environ.get('PYTHONIOENCODING')  # not currently used

    # 3.X str: decoded codepoints
    mungestr3X = \
        lambda s: u''.join(c if ord(c) <= 127 else ('[\\u%d]' % ord(c)) for c in s)

    # 2.X str: encoded bytes
    mungestr2X = \
        lambda s: b''.join(c if ord(c) <= 127 else ('[\\x%X]' % ord(c)) for c in s)

    if RunningOnPython2:
        pargs = [parg.encode('UTF-8') if type(parg) is unicode else parg
                      for parg in pargs]
        if RunningOnWindows:
            pargs = [mungestr2X(parg) if type(parg) is str else parg
                          for parg in pargs]

    elif RunningOnWindows:
        pargs = [mungestr3X(parg) if type(parg) is str else parg
                      for parg in pargs]

    try:
        _builtinprint(*pargs, **kargs)
    except UnicodeEncodeError:
        print('--Cannot print filename: message skipped')    # we tried; punt!

#===============================================================================

def tryrmtree(folder, trace=print):
    """
-----------------------------------------------------------------------
Utility: remove a folder by pathname if needed before unzipping to it.
Optionally run by zip-extract.py in interactive mode, but not by the
base extractzipfile() function here: manually clean targets as needed.

Python's shutil.rmtree() can sometimes fail on Windows with a "directory
not empty" error, even though the dir _is_ empty when inspected after
the error, and running again usually fixes the problem (deletes the
folder successfully).  Bizarre, yes?  See the rmtreeworkaround() onerror
handler in Mergeall's backup.py for explanations and fixes.  rmtree()
can also fail on read-only files, but this is likely intended by users.

Update: rmtree() can also fail if macOS auto-deletes an AppleDouble 
file first, and Windows fails because its deletes may not be atomic.
This matters only in interactive zip-extract.py and self-test.py here.
-----------------------------------------------------------------------
"""

    if os.path.exists(FWP(folder)):
        trace('Removing', folder)
        try:
            if os.path.islink(FWP(folder)):
                os.remove(FWP(folder))
            else:
                shutil.rmtree(FWP(folder, force=True))    # recurs: always \\?\
        except Exception as why:
            print('shutil.rmtree (or os.remove) failed:', why)
            input('Try running again, and press Enter to exit.')
            sys.exit(1)

#===============================================================================

def isRecursiveLink(dirpath):
    r"""jun25 (else py3.14? makes \? escapes errors)
-----------------------------------------------------------------------
Use inodes to identify each part of path leading to a link, on
platforms that support inodes.  All Unix/Posix do, though Windows
Python doesn't until 3.2 - if absent, allow other error to occur
(there are not many more options here; on all Windows,
os.path.realpath() is just os.path.abspath()).

This is linearly slow in the length of paths to dir links,
but links are exceedingly rare, "atlinks" use in ziptools
may be rarer, and recursive links are arguably-invalid data.
Recursion may be better than os.walk when path history is
required, though this incurs overheads only if needed as is.

dirpath does not have a \\?\ Windows long-path prefix here;
FWP adds one and also calls abspath() redundantly - but only
on Windows, and we need abspath() on other platforms too.
-----------------------------------------------------------------------
"""
    trace = lambda *args: None                  # or print to watch

    # called iff atlinks: following links
    if (not os.path.islink(FWP(dirpath)) or     # dir item not a link?
        os.stat(os.getcwd()).st_ino == 0):      # platform has no inodes?
        return False                            # moot, or hope for best
    else:
        # collect inode ids for each path extension except last
        inodes = []
        path = []
        parts = dirpath.split(os.sep)[:-1]      # all but link at end
        while parts:
            trace(path, parts)
            path    += [parts[0]]               # add next path part
            parts    = parts[1:]                # expand, fetch inode
            thisext  = os.sep.join(path)
            thispath = os.path.abspath(thisext)
            inodes.append(os.stat(FWP(thispath)).st_ino)

        # recursive if points to item with same inode as any item in path
        linkpath = os.path.abspath(dirpath)
        trace(inodes, os.stat(FWP(linkpath)).st_ino)
        return os.stat(FWP(linkpath)).st_ino in inodes

#===============================================================================

def isRecursiveLink0(dirpath, visited):
    """
-----------------------------------------------------------------------
ABANDONED, UNUSED: realpath() cannot be used portably, because
it is just abspath() on Windows Python (but why?).

Trap recursive links to own parent dir, but allow multiple
non-recursive link visits.  The logic here is as follows:
If we've reached a link that leads to a path we've already
reached from a link AND we formerly reached that path from
a link located at a path that is a prefix of the new link's
path, then the new link must be recursive.  No, really...
Catches link at visit #2, but avoids overhead for non-links.
-----------------------------------------------------------------------
"""
    # called iff atlinks: following links
    if not os.path.islink(dirpath):
        # skip non-links
        return False                                      # don't note path
    else:
        # check links history
        realpath = os.path.realpath(dirpath)              # dereference, abs
        #print('\t', dirpath, '\n\t', realpath, sep='')
        if (realpath in visited and
            any(dirpath.startswith(prior) for prior in visited[realpath])):
            return True
        else:
            # record this link's visit
            visited[realpath] = visited.get(realpath, []) # add first or next
            visited[realpath].append(dirpath)
            return False

#===============================================================================

class CreateStats:
    """
    -----------------------------------------------------------------------
    Helper for recursive create (zip) stats counters [1.1].
    May also pass same mutable instance instead of using +=.
    -----------------------------------------------------------------------
    """
    attrs = 'files', 'folders', 'symlinks', 'unknowns', 'crufts'

    def __init__(self):
        for attr in self.attrs:
            setattr(self, attr, 0)       # or exec() strs

    def __iadd__(self, other):           # += all attrs in place
        for attr in self.attrs:
            setattr(self, attr, getattr(self, attr) + getattr(other, attr))
        return self

    def __repr__(self, format='%s=%%d'):
        display = ', '.join(format % attr for attr in self.attrs)
        return display % tuple(getattr(self, attr) for attr in self.attrs)

class ExtractStats(CreateStats):
    """
    -----------------------------------------------------------------------
    Extract (unzip) stats: unknowns unlikely, no crufts or recursion.
    [1.3] Add mangled and skipped, but don't display unless nonzero.
    -----------------------------------------------------------------------
    """
    attrs = ['files', 'folders', 'symlinks', 'unknowns', 'mangled', 'skipped']

    def __repr__(self, format='%s=%%d'):
        """
        Don't show mangled or skipped if 0: rare and too much info
        """
        self.attrs = ExtractStats.attrs[:]      # .copy(), but work in py 2.X
        for attr in ['mangled', 'skipped']:
            if getattr(self, attr) == 0: self.attrs.remove(attr)
        return CreateStats.__repr__(self, format)

def _testCreateStats():
    x = CreateStats()
    print(x)  # files=0, folders=0, symlinks=0, unknowns=0, crufts=0

    x.files += 1; x.folders += 2;  x.symlinks += 3
    print(x)  # files=1, folders=2, symlinks=3, unknowns=0, crufts=0

    y = CreateStats()
    y.folders += 10; y.unknowns += 20
    x += y
    print(x)  # files=1, folders=12, symlinks=3, unknowns=20, crufts=0

#===============================================================================

def addEntireDir(thisdirpath,      # pathname of directory to add (rel or abs)
                 zipfile,          # open zipfile.ZipFile object to add to
                 stats,            # counters instance, same at all levels [1.1]
                 thiszipatpath,    # modified pathname if zipat/zip@ used [1.2]
                 storedirs=True,   # record dirs explicitly in zipfile?
                 cruftpatts={},    # cruft files skip/keep, or {}=do not skip
                 atlinks=False,    # zip items referenced instead of links?
                 trace=print):     # trace message router (or lambda *p, **k: None)
    r"""jun25 (else py3.14? makes \? escapes errors)
-----------------------------------------------------------------------
Add the full folder at thisdirpath to zipfile by adding all its parts.
Python's zipfile module has extractall(), but nothing like an addall()
(apart from simple command-line use).  The top-level createzipfile()
kicks off the recursion here, and docs more of this function's utility.

ADDING DIRS: 
   Dirs (a.k.a. folders) don't always need to be written to the 
   zipfile themselves, because extracts add all of a file's dirs if
   needed (with os.makedirs(), in Python's zipfile module and the local
   zipsymlinks module).  Really, zipfiles don't have folders per se -
   just individual items with pathnames and metadata.

   However, dirs MUST be added to the zipfile themselves to either:
   1) Retain folders that are empty in the original.
   2) Retain the original modtimes of folders (see extract below).

   When added directly, the zipfile records folders as zero-length
   items with a trailing "/", and recreates the folder on extracts
   as needed.  Disable folder writes with "storedirs" if this proves
   incompatible with other tools (but it works fine with WinZip).

   Note that the os.walk()'s files list is really all non-dirs (which
   may include non-file items that should likely be excluded on some
   platforms), and non-link subdirs are always reached by the walker.
   Dir links are returned in subdir list, but not followed by default.
   [Update: per ahead[*], os.walk() was later replaced here with an 
   explicit-recursion coding, which visits directories more directly.]
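
The folder-entry behavior described under ADDING DIRS can be observed directly with the standard zipfile module. This is only a small demo of that module's behavior on a temporary folder, not ziptools code; the folder and file names are arbitrary.

```python
import os, tempfile, zipfile

with tempfile.TemporaryDirectory() as root:
    emptydir = os.path.join(root, 'emptydir')
    os.mkdir(emptydir)
    zippath = os.path.join(root, 'demo.zip')

    with zipfile.ZipFile(zippath, 'w') as zf:
        zf.write(emptydir, arcname='emptydir')    # a folder, not a file

    with zipfile.ZipFile(zippath) as zf:
        entries = [(info.filename, info.file_size) for info in zf.infolist()]

print(entries)    # [('emptydir/', 0)]
```

Without this explicit write, an empty folder would leave no trace in the archive, because no file entry would carry its path.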

SYMLINKS: 
   If atlinks=True, this copies items links reference, not links themselves,
   and steps into subdirs referenced by links; else, it copies links and 
   doesn't follow them.  For links to dirs, os.walk() yields the name of 
   the link (not the dir it references), and this is the name under which
   the linked subdir is stored in the zip if atlinks (hence, dirs can be 
   present in multiple tree locations).  For example, if link 'python' 
   references dir 'python3', the latter is stored under the former name.
   [Update: the non-os.walk() recoding per ahead[*] behaves this same way.]

   This also traps recursive link paths to avoid running into memory errors
   or path limits, by using stat object st_ino unique identifiers to
   discern loops from valid dir repeats, where inode ids are supported.
   For more on recursive links detection, see isRecursiveLink() above.
   For more details on links in os.walk(), see docetc/symlinks/demo*.txt.

WINDOWS LONG PATHS: 
   On Windows, very long paths are supported by prefixing all file-tool
   call paths with '\\?\' and making them absolute, and passing these on to
   zipfile and zipsymlinks APIs for use in file-tool calls.  Names without
   \\?\ or absolute mapping are passed for use in the archive itself; this
   is required to support relative paths in the archive itself -- if not
   passed, archive names are created from filenames by running filenames
   through os.path.splitdrive() which drops the \\?\, but this does not
   translate from absolute back to relative (when users pass relative).

   [*]THIS ALSO required replacing the former os.walk() coding with explicit
   (manual) recursion.  os.walk() required the root to have a just-in-case 
   FWP() prefix to support arbitrary depth; which made os.walk() yield dirs
   that were always \\?\-prefixed and absolute; which in turn made all
   paths absolute in the zip archive.  Supporting relative zip paths
   AND long-paths requires either explicit recursion (used here) or an
   os.walk() coding with abs->rel mapping (which is possible, but may
   be preclusive: see the message display code in the extract ahead).

   Nit: the explicit-recursion recoding changes the order in which items
   are visited and added - it's now alphabetical per level on Mac OS HFS,
   instead of files-then-dirs (roughly).  This order is different but
   completely arbitrary: it impacts the order of messages output, but
   not the content or utility of the archive zipfile generated.  For
   the prior os.walk() variant, see ../docetc/longpaths/prior-code.

   Also nit: in the explicit-recursion recoding, links that are invalid 
   (do not point to an existing file or dir) are now an explicit case
   here.  Specifically, links to both nonexistent items and non-file/dir
   items are added to the zipfile, despite their rareness, and even if 
   "-atlinks" follow-links mode is used and the referent cannot be added. 
   This is done in part because Mergeall and cpall propagate such links
   too, but also because programs should never silently drop content for
   you: invalid links may have valid uses, and may refer to items present
   on another machine.  The former os.walk()-based version added such 
   links just because that call returns dirs and non-dirs, and invalid
   links appear in the latter. 

   Also also nit: more clearly a win, the new coding reports full paths 
   to cruft items; it's difficult to identify drops from basenames alone.
   See folder _algorithms here for alternative codings for this function.
-----------------------------------------------------------------------
"""

    #
    # handle this dir
    #
    if storedirs and thisdirpath != '.':
        # add folders too
        stats.folders += 1
        trace2('Adding folder', thisdirpath, thiszipatpath, trace)
        zipfile.write(filename=FWP(thisdirpath),             # fwp for file tools
                      arcname=thiszipatpath)                 # not \\?\ + abs, -zip@?
        addModtimeUTC(zipfile, FWP(thisdirpath))             # UTC modtimes [1.2]

    #
    # handle items here
    #
    for itemname in os.listdir(FWP(thisdirpath)):            # list (fixed windows) path
        itempath  = os.path.join(thisdirpath, itemname)      # extend real provided path
        zipatpath = os.path.join(thiszipatpath, itemname)    # possibly munged path [1.2]

        #
        # handle subdirs (and links to them)
        #
        if os.path.isdir(FWP(itempath)):
            if isCruft(itemname, cruftpatts):                # match name, not path
                # skip cruft dirs
                stats.crufts += 1
                trace('--Skipped cruft dir', itempath)

            elif atlinks:
                # following links: follow? + add
                if isRecursiveLink(itempath):
                    # links to a parent: copy dir link instead
                    stats.symlinks += 1
                    trace('Recursive link copied', itempath)
                    addSymlink(FWP(itempath), zipatpath, zipfile, trace)
                else:
                    # recur into dir or link
                    addEntireDir(itempath, zipfile,
                                 stats, zipatpath,
                                 storedirs, cruftpatts, atlinks, trace)

            else:
                # not following links
                if os.path.islink(FWP(itempath)):
                    # copy dir link
                    stats.symlinks += 1
                    trace2('Adding  link  ~folder', itempath, zipatpath, trace)
                    addSymlink(FWP(itempath), zipatpath, zipfile, trace)
                else:
                    # recur into dir
                    addEntireDir(itempath, zipfile,
                                 stats, zipatpath,
                                 storedirs, cruftpatts, atlinks, trace)

        #
        # handle files (and links to them)
        #
        elif os.path.isfile(FWP(itempath)):
            if isCruft(itemname, cruftpatts):
                # skip cruft files
                stats.crufts += 1
                trace('--Skipped cruft file', itempath)

            elif atlinks:
                # following links: follow? + add
                stats.files += 1
                trace2('Adding  file ', itempath, zipatpath, trace)
                zipfile.write(filename=FWP(itempath),         # fwp for file tools
                              arcname=zipatpath)              # not \\?\ + abs, -zip@?
                addModtimeUTC(zipfile, FWP(itempath))         # UTC modtimes [1.2]

            else:
                # not following links
                if os.path.islink(FWP(itempath)):
                    # copy file link
                    stats.symlinks += 1
                    trace2('Adding  link  ~file', itempath, zipatpath, trace)
                    addSymlink(FWP(itempath), zipatpath, zipfile, trace)
                else:
                    # add simple file
                    stats.files += 1
                    trace2('Adding  file ', itempath, zipatpath, trace)
                    zipfile.write(filename=FWP(itempath),     # fwp for file tools
                                  arcname=zipatpath)          # name in archive, -zip@?
                    addModtimeUTC(zipfile, FWP(itempath))     # UTC modtimes [1.2]

        #
        # handle non-file/dir links (to nonexistents or oddities)
        #
        elif os.path.islink(FWP(itempath)):
            if isCruft(itemname, cruftpatts):
                # skip cruft non-file/dir links
                stats.crufts += 1
                trace('--Skipped cruft link', itempath)

            else:
                # copy link to other: atlinks or not
                stats.symlinks += 1
                trace2('Adding  link  ~unknown', itempath, zipatpath, trace)
                addSymlink(FWP(itempath), zipatpath, zipfile, trace)

        #
        # handle oddities (not links to them)
        #
        else:
            # ignore cruft: not adding this
            stats.unknowns += 1
            trace('--Skipped unknown type:', itempath)       # skip fifos, etc.

        # goto next item in this folder

#===============================================================================

def zipatmunge(sourcepath, zipat):
    """
-----------------------------------------------------------------------
[1.2] If zipat is not None, replace the entire dir path in sourcepath
with the zipat path string.  This implements the "-zip@path" switch
and function argument added in 1.2 to allow zipped paths (and hence
later unzip paths) to be shortened, expanded, or dropped altogether.

Subtle things:
- This is called for sources at the top level of a create only; 
  the tree-walk recursion extends the munged path at each level
- sourceroot may be empty or '.' for source items in the CWD
- zipat may be '.' for no nesting, and may be empty (same as '.')
- The long-path prefix for Windows is not part of sourcepath here
- For border cases, this relies on the fact that os.path.split('z') 
  returns ('', 'z'), and os.path.join('', 'z') returns 'z'.
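
The border cases relied on above, along with the front-only replace used by the code below, can be checked with a few calls. This sketch uses posixpath (the Unix flavor of os.path) so results are the same on any host; the sample paths are arbitrary.

```python
import posixpath    # Unix-style paths, for the same results on any host

print(posixpath.split('z'))                       # ('', 'z')
print(posixpath.join('', 'z'))                    # z
print(posixpath.split('dir/sub/z'))               # ('dir/sub', 'z')

# replacing just the leading root path, as for zipat='new'
root, item = posixpath.split('dir/sub/z')
print('dir/sub/z'.replace(root, 'new', 1))        # new/z
```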
-----------------------------------------------------------------------
"""

    if zipat is None:
        return sourcepath    # zip@ not used, or zipat not passed

    assert isinstance(zipat, str)
    zipat = zipat.rstrip(os.sep)                               # drops trailing slash
    sourceroot, sourceitem = os.path.split(sourcepath)         # ditto, but implicit

    if sourceroot == '':                                       # source has no path:
        return os.path.join(zipat, sourcepath)                 #   concat zipat, if any
    elif zipat in ['.', '']:                                   # zipat is '.' or '':
        return sourceitem                                      #   rm root path, if any
    else:                                                      # else replace root path
        return sourcepath.replace(sourceroot, zipat, 1)        # but just at the front

#===============================================================================

def trace2(message, filepath, zipatpath, trace):
    """
-----------------------------------------------------------------------
[1.2] Used by creates.  Now that zipat allows zip paths to vary from
original file paths, show the zip path in output to clarify what was
truly zipped.  This generally makes a post-zip list unnecessary to
see the create's results in the zipfile.

The extra line is not printed if the before/after paths are the same
(which matches pre-1.2 output); and it apes the '=>' format used for 
extracts (which still always show a second line, because target path 
is more crucial to disclose, even if it == zip path [but see trace3()
ahead: extracts now collapse same-path output lines too, for space]).  

This also parrots most of the zipfile module's (and zipsymlinks.py's)
transforms to zipatpath, to avoid spurious diffs (e.g., './x' != 'x'),
and make the extra line's zip path match that of post-zip listings. 
Avoids pretest '\' => '/' on Windows to minimize mismatches/lines,
but the goal is a bit gray: true zip paths, or just flag -zip@ diffs?
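
To make the spurious-diff point concrete, the sketch below applies the same three transforms used in the code that follows to a few equivalent spellings of one name. The normalize_arcname name is used only for this example.

```python
import os

def normalize_arcname(path):
    # Approximates the cleanup Python's zipfile applies to archive names.
    arcname = os.path.splitdrive(path)[1]               # drop any drive
    arcname = os.path.normpath(arcname)                 # drop '.', most '..'
    return arcname.lstrip(os.sep + (os.altsep or ''))   # drop leading slashes

for path in ['./x', '/x', 'x']:
    print(repr(path), '=>', repr(normalize_arcname(path)))
```

All three inputs map to the same archive name, so comparing the raw strings would report differences that do not exist in the zipfile.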
-----------------------------------------------------------------------
"""

    # mimic what zipfile will do
    arcname = os.path.splitdrive(zipatpath)[1]
    arcname = os.path.normpath(arcname)
    arcname = arcname.lstrip(os.sep + (os.altsep or ''))

    # but not this: filepath still has '\' on Windows!
    # arcname = arcname.replace(os.sep, "/")

    trace(message, filepath)
    if arcname != filepath:
        trace('\t\t=> %s' % arcname)    # sans leading '/\', '.', most '..', '\', 'c:'

#===============================================================================

def createzipfile(zipname,            # pathname of new zipfile to create
                  addnames,           # sequence of pathnames of items to add
                  storedirs=True,     # record dirs explicitly in zipfile?
                  cruftpatts={},      # cruft files skip/keep, or {}=do not skip
                  atlinks=False,      # zip items referenced instead of links?
                  trace=print,        # trace message router (or lambda *p, **k: None)
                  zipat=None,         # alternate root zip path for all items [1.2]
                  nocompress=False):  # store uncompressed in zipfile for speed [1.3]
    """
-----------------------------------------------------------------------
Make a zipfile at path "zipname" and add to it all folders and files
in "addnames".  Its relative or absolute pathnames are propagated to
the zipfile, to be used as path suffix when extracting to a target dir.
See extractzipfile(), ../zip-create.py, and ../zip-extract.py for more
docs on the use of relative and absolute pathnames for zip sources.

Pass "trace=lambda *p, **k: None" for silent operation.  See function
addEntireDir() above for details on "storedirs" (its default is normally
desired), and ahead here for "cruftpatts" and "atlinks" (their defaults
include all cruft files and folders in the zip, and copy links instead
of the items they reference, respectively).

This always uses ZIP_DEFLATED, the "usual" zip compression scheme,
and the only one supported in Python 2.X (ZIP_STORED is uncompressed).
Python's base zipfile module used here supports Unicode filenames 
automatically (encoded per UTF8).  Python's base zipfile module also
ensures that path separators in the zipfile always use Unix '/'.

UPDATE: 1.3 now also allows compression to be turned off for zip speed,
using ZIP_STORED if nocompress=True here (or -nocompress in zip-create.py).
Most useful for very large archives: in a 208G use case, a zip takes 1.5 
hours with compression, and just 22 minutes without (208G vs 195G zipfile).

[1.1] This now returns a CreateStats with #files/folders/symlinks/unknowns
and a repr for display when used in shell scripts (see class above).

[1.1] ziptools run on Python 2.X forces filenames to unicode, so 2.X's
zipfile module stores non-ASCII filenames in zips more portably; this
avoids munged names in unzips run on 3.X and other tools.  For details, 
see docetc/1.1-upgrades/py-2.X-fixes.txt.

WILDCARDS
   [1.1] Unlike the "../zip-create.py" command-line script, this function
   does not auto-glob addnames with unexpanded "*" (and other) operators.  
   Use Python's glob.glob() to expand names as needed before calling here, 
   and see the script and top-level README for pointers (it's a one-liner).

CRUFT: 
   By default, all files and folders are added to the zip.  This is
   by design, because this code was written as a workaround for WinZip's
   silent file omissions.  As an option, though, this function will
   instead skip normally-hidden cruft files and folders (e.g., ".*")
   much like Mergeall, so they are not added to zips used to upload
   websites or otherwise distribute or transfer programs and data.  To
   enable cruft skipping, pass to cruftpatts a dictionary of this form:

      {'skip': ['pattern', ...],
       'keep': ['pattern', ...]}

   to define fnmatch filename patterns for both items to be skipped, and
   items to be kept despite matching a skip pattern (e.g., ".htaccess").
   If no dictionary is passed, all items are added to the zip; if either
   list is empty, it fails to match any file.  See zipcruft.py for more
   details, and customizable presets to import and pass to cruftpatts
   (the default is available as "cruft_skip_keep" from this module too).

SYMLINKS: 
   Also by default, if symbolic links are present, they are added to 
   the zip themselves - not the items they reference.  Pass atlinks=True
   to instead follow links and zip the items they reference.  This also 
   traps recursive links if atlinks=True, where inodes are supported; see
   isRecursiveLink() above for more details.  As of version [1.1], creates
   now also properly set per-link permission bits in zipfiles, for extracts.

LARGE FILES: 
   allowZip64=True supports files of size > 2G with ZIP64 extensions, 
   that are supported unevenly in other tools, but work fine with the 
   create and extract tools here.  It's True by default in Python 3.4+ 
   only; a False would prohibit large files altogether, which avoids 
   "unzip" issues but precludes use in supporting tools. 

   Per testing, some Unix "unzip"s fail with large files made here, but
   both the extract here and Mac's Finder-click unzips handle them well.
   Split zips into smaller parts iff large files fail in your tools, and
   you cannot find or install a recent Python 2.X or 3.X to run ziptools.
   Example publish-halves.py in learning-python.com/genhtml has pointers. 
-----------------------------------------------------------------------
"""

trace('Zipping', addnames, 'to', zipname)
if cruftpatts:
    trace('Cruft patterns:', cruftpatts)
stats = CreateStats()    # counts [1.1]

#
# handle top-level items
#
compress = ZIP_STORED if nocompress else ZIP_DEFLATED
zipfile = ZipFile(zipname, mode='w', compression=compress, allowZip64=True)
for addname in addnames:

    # force Unicode in Python 2.X so non-ASCII interoperable [1.1]
    if RunningOnPython2:
        try:
            addname = addname.decode(encoding='UTF-8')    # same as unicode()
        except:
            trace('**Cannot decode "%s": skipped' % addname)
            continue

    # change zipped paths for top-level sources if -zip@/zipat [1.2]
    zipatpath = zipatmunge(addname, zipat)

    if (addname not in ['.', '..'] and
        isCruft(os.path.basename(addname), cruftpatts)):
        stats.crufts += 1
        trace('--Skipped cruft item', addname)

    elif os.path.islink(FWP(addname)) and not atlinks:
        stats.symlinks += 1
        trace2('Adding  link  ~item', addname, zipatpath, trace)
        addSymlink(FWP(addname), zipatpath, zipfile, trace)

    elif os.path.isfile(FWP(addname)):
        stats.files += 1
        trace2('Adding  file ', addname, zipatpath, trace)
        zipfile.write(filename=FWP(addname), arcname=zipatpath)
        addModtimeUTC(zipfile, FWP(addname))    # UTC modtimes [1.2]

    elif os.path.isdir(FWP(addname)):
        addEntireDir(addname, zipfile,
                     stats, zipatpath,
                     storedirs, cruftpatts, atlinks, trace)

    else: # fifo, etc.
        stats.unknowns += 1
        trace('--Skipped unknown type:', addname)

zipfile.close()
return stats       # [1.1] printed at shell
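
# Illustration only: a minimal sketch of the skip/keep semantics described
# in the CRUFT notes of the docstring above.  The helper is_cruft_sketch()
# is hypothetical and is not ziptools' isCruft(), whose code is not shown
# here; it assumes plain fnmatch matching on a basename, and that a "keep"
# match overrides a "skip" match.

```python
import fnmatch

def is_cruft_sketch(filename, cruftpatts):
    """Return True if filename matches a skip pattern and no keep pattern."""
    if not cruftpatts:
        return False                               # {} means skip nothing

    def matches(patterns):
        return any(fnmatch.fnmatch(filename, patt) for patt in patterns)

    return (matches(cruftpatts.get('skip', [])) and
            not matches(cruftpatts.get('keep', [])))

patts = {'skip': ['.*', '*.pyc'],                  # hidden files, bytecode
         'keep': ['.htaccess']}                    # retained despite '.*'

names = ['index.html', '.DS_Store', '.htaccess', 'mod.pyc']
kept  = [name for name in names if not is_cruft_sketch(name, patts)]
print(kept)                                        # ['index.html', '.htaccess']
```

# With ziptools importable, a dictionary of this form would be passed along
# with pre-expanded names, e.g. createzipfile('site.zip', glob.glob('*.html'),
# cruftpatts=patts); that call is not run in this sketch.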

#===============================================================================

def showpath(pathto, pathtoWasRelative):
r"""jun25 (else py3.14? makes \? escapes errors)
-----------------------------------------------------------------------
Extract helper: for message-display only, and on Windows only, try to
undo the \\?\ prefix and to-absolute mapping for paths.  This may or
may not be exactly what was given, but is better than always showing
an absolute path in messages, and avoiding the just-in-case FWP()
described in extract() would require an extensive extract() rewrite.
This used to be a nested function; it probably shouldn't have been.

[1.3] Python's os.path.relpath() raises an exception on Windows if
pathto is on a different drive than an optional second argument 
which defaults to the current working directory (e.g., pathto on 
D: but the console in C:).  This can arise only when pathto was 
relative (e.g., for "D:folder" but not for "D:\folder").  Since this 
is rare and used only for message display (and Python doesn't support 
a CWD per drive on Windows), skip the exc and show the full path.
-----------------------------------------------------------------------
"""
if RunningOnWindows:
    pathto = UFWP(pathto)                       # strip \\?\
    if pathtoWasRelative:
        try:
            pathto = os.path.relpath(pathto)    # relative to '.'
        except:
            pass                                # abandon ship [1.3]
return pathto
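
# Illustration only: a minimal sketch of the cross-drive case noted in the
# [1.3] paragraph above.  It uses the standard ntpath module (the Windows
# flavor of os.path) so that it behaves the same on any host, and an
# explicit start folder instead of the CWD.  The helper name
# show_relative_or_full() is hypothetical; ziptools' own showpath() uses
# a bare except rather than the ValueError caught here.

```python
import ntpath

def show_relative_or_full(pathto, start):
    """Return pathto relative to start, or pathto itself across drives."""
    try:
        return ntpath.relpath(pathto, start)
    except ValueError:                       # paths are on different drives
        return pathto                        # display-only: show full path

print(show_relative_or_full(r'C:\a\b\c', r'C:\a'))          # b\c
print(show_relative_or_full(r'D:\folder\x', r'C:\Users'))   # D:\folder\x
```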

#===============================================================================

def trace3(zippath, unzippath, trace):
"""
-----------------------------------------------------------------------
[1.3] Used by extracts.  To reduce output volume, ziptools 1.3 (September
2021) now prints just one output line for items whose zipfile and
unzip-device pathnames are the same.  This happens whenever a zipped item
is extracted directly to "." (the CWD), instead of another target given
in the command or function call.  This mimics the single/double-line
output for creates in trace2().

os.path.normpath() might suffice for Windows, but seems overkill;
os.path.normcase() might help too, but this is a slippery slope,
and this is just a cosmetic issue in output lines.

Could path separators in zippath (the zipfile) be '\' on Windows?
Yes, but it would reflect a bug in the zip tool used there: path 
separators in zipfiles must be Unix '/' per the zip standard,
and '\' is a valid filename character on Unix.  The code here
works either way, and a worst-case miss just means 2 output lines.
ziptools creates always use '/' on Windows; other zips should too.
-----------------------------------------------------------------------
"""

# folders: drop trailing slash in zipfile to compare
zippathX = zippath.rstrip('/\\')    # or r'\/'

# Windows: match Unix slashes in zipfile to compare
if RunningOnWindows and '/' in zippath:
    unzippathX = unzippath.replace('\\', '/')
else:
    unzippathX = unzippath

if zippathX == unzippathX:
    # new lite format
    trace('Extracted %s' % zippath)
else:
    # original format
    trace('Extracted %s\n\t\t=> %s' % (zippath, unzippath))
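
# Illustration only: a minimal sketch of the same-path test above, with the
# platform check turned into a hypothetical on_windows parameter so both
# branches can be tried on any host.  The name same_display_path() is an
# assumption made for this example, not a ziptools function.

```python
def same_display_path(zippath, unzippath, on_windows=False):
    """Return True if one output line suffices for this extracted item."""
    zippathX = zippath.rstrip('/\\')                 # folders: drop trailing slash
    if on_windows and '/' in zippath:
        unzippathX = unzippath.replace('\\', '/')    # match the zipfile's slashes
    else:
        unzippathX = unzippath
    return zippathX == unzippathX

print(same_display_path('dir/sub/', 'dir/sub'))                             # True
print(same_display_path('dir/file.txt', 'dir\\file.txt', on_windows=True))  # True
print(same_display_path('dir/file.txt', 'out/dir/file.txt'))                # False
```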

#===============================================================================

# Disable zipfile's auto-mangle on Windows.  There is no good way to
# customize its private _sanitize method, so do this bad way instead.
# A class with __getitem__ that raises LookupError is likely slower.

ZipFile._windows_illegal_name_trans_table = {None: None}

def trymangle(zipinfo, pathto, nomangle=False, trace=print):
"""
-----------------------------------------------------------------------
[1.3] On extract errors, see if nonportable filename characters could
be to blame, and if so try again with all of these replaced with "_"
(e.g., 'a|b?c.ext' becomes 'a_b_c.ext') to appease unzip filesystems.

This is attempted only on Windows.  Android 11 shared storage has a 
bug which precludes mangling; Linux writes to FAT32 and exFAT drives
are allowed to fail with skips to avoid perpetual diffs; and macOS
silently munges to/from Unicode privates on FAT32 and exFAT drives.
In most cases, fix-nonportable-filenames.py is the better option.

Return False if mangling has been disabled or there are no characters 
to mangle, and True if mangling has been applied to any parts of the 
unzip pathname in the zipfile.  The zipfile.filename must be changed 
in place here, because zipfile.extract() pulls it from there only;  
this requires passing the original name in some contexts (symlinks).

Mangled names are reported in both run output and tallies.  Items 
that still fail after this attempt are now reported as well and 
skipped so the rest of the archive is made available.  As a bonus,
the new coding now mangles symlink names as needed too.

Rationale
=========
ziptools disables the underlying zipfile module's filename mangling
(see _windows_illegal_name_trans_table above), and performs it here 
instead on failures.  The rationale for this is described in the user 
guide, available both in this module's package and online:

- ../_README.html#nomangle
- learning-python.com/ziptools/ziptools/_README.html#nomangle

In short, Python's zipfile module always silently mangles all items' 
names this way on Windows, despite its potential to overwrite files
and break syncs back to the source.  The primary purpose for catching 
and handling mangling here is to make this both explicit and optional:
mangles are reported and tallied, and -nomangle turns them off in full.

This update also attempted to support mangling for Android shared
storage that emulates FAT32, but passed due to a bug in Android 11:
folders with nonportable characters are created but accept no files,
mangling the full path here would leave doppelgänger folders, and 
removing or renaming nonportable paths may drop used content.

ziptools also provides a fixer script, fix-nonportable-filenames.py,
which can be run prior to transferring content to analyze or mangle
names, and avoid interoperability issues completely.  This is the 
recommended alternative for Android shared; here, mangling must be 
limited to Windows, to avoid generating bogus folders on Android.

Illegal characters
==================
In all cases, over-aggressive mangling just means that '_' replacements 
will be applied; under-aggressive will result in item skips and reports.
Path separators '/' and '\' are subtle, implicit, and special cases:

- On Windows, any '/' recorded in the zipfile path are changed to '\' by 
  a prestep, and any '\' are then consumed by a pathname split; hence,
  both are treated as path separators, and neither will be mangled.  
  This works whether '\' are from the prestep or recorded in the zipfile,
  and the prestep is performed in zipfile too so this can do no different.

- On Unix, '/' is consumed by a pathname split and hence won't be mangled.
  '\' is legal on Unix and not treated as a path separator; it would be 
  mangled here, but won't cause failures on Unix that trigger this code.

- On Android shared storage, '/' is consumed by a pathname split because 
  it's Unix (really, not Windows), but '\' may fail in FUSE drivers.
  Due to a bug in Android 11, no mangling is performed on Android, so any
  remaining '\' will trigger failures and skips (of files, not folders).
  Due to Android's convolution, it's recommended to run the provided 
  filename fixer script before unzipping to its shared storage.

More on Backslashes
====================
When a filename contains a backslash on Unix, code in both ziptools
and the underlying Python zipfile module it uses will treat the '\'
as a path separator on Windows, and create a subdirectory on extracts.
This is a flaw in Python's module (https://bugs.python.org/issue36534),
so we can do no better here - Python's module will make subdirectories 
for names that don't need to be mangled and hence never reach ziptools'
code, and ziptools mangling on failures must be consistent with this. 

For reference, here is the current code link to Python's extract code:
    https://github.com/python/cpython/
        blob/99495b8afffdc62145598516dbdf99e64b6249bd/Lib/zipfile.py#L1651 

Although this might be improved by mangling '\' before all '/' are 
replaced with '\' for Windows, this would break invalid zipfile entries 
which use '\' as a separator, and must await a fix in Python itself in 
any event.  Running the fixer script is the best work-around for this 
today, and avoids other issues in Python's mangling; back-sync and 
file-overwrite problems are inherent in any name-mangling scheme.

Coding notes
============
This is coded as a brute-force, last-ditch effort to extract a failed 
item, and is followed by a simple skip if this fails too.  The skip is
better than unzip termination as before (the rest of the archive can be
extracted), though it can leave incomplete data; users should check the
tallies at the end of run output for skips, and inspect run messages.

Mangling is attempted only after unmangled-name failure, because the 
underlying Python zipfile module's code is not easily customized, it's 
difficult to get and interpret a path's filesystem to mangle up front, 
and the exceptions may vary.  For more details, see the Python demos:

- docetc/illegal-filenames-demo-1.3/py-windows-ntfs-illegal.txt
- docetc/illegal-filenames-demo-1.3/py-android11-shared-app-illegal.txt

In brief, filename rules vary slightly between Windows and Android;
Android shared storage limits names but its app-specific storage does
not; Windows limits folder names but Android shared storage does not; 
and the exceptions raised on errors differ between the two platforms. 

Subtle: unlike fix-nonportable-filenames.py, this cannot add a numeric 
suffix to make mangled names unique: unzips allow existing folders' 
content to be overwritten, and we cannot tell here if a collision is 
a mangling accident, or a user-intended overwrite.  Yes, yuck.

Caveat: like the UTC timestamps extension, this code is tightly bound
to the current coding of Python's zipfiles; changes may break this.
-----------------------------------------------------------------------
"""

if not RunningOnWindows:
    # only mangle on Windows: Android shared storage botches folders
    return False

elif nomangle:
    # no changes if disabled in ziptools command or call
    return False

else:
    # split and consume path separators 
    zippath0 = zipinfo.filename                     # unzip path recorded in zipfile
    zippath1 = zippath0.replace('/', os.path.sep)   # unix+android no-op, windows /=>\
    zippath2 = os.path.splitdrive(zippath1)[1]      # drop a c: on windows else :=>_
    zipparts = zippath2.split(os.path.sep)          # unix+android on /, windows on \
    
    # illegal chars
    nonportables = ' \x00 / \\ | < > ? * : " '      # for filesystems, not platforms
    nonportables = nonportables.replace(' ', '')    # drop space used for readability

    if not any(c in part for part in zipparts for c in nonportables):
        # none found: mangling won't help
        return False

    else:
        # mangle the entire path
        replacements = {ord(c): '_' for c in nonportables}
        mangledparts = [part.translate(replacements) for part in zipparts]

        # join with zip / even on windows: trailing / means dir in zipfile
        mangledpath  = '/'.join(mangledparts)

        # replace in zipfile structure: required by zipfile.extract()
        zipinfo.filename = mangledpath
        message = '--Name mangled:\n    from... %s\n    to..... %s'
        trace(message % (zippath0, mangledpath))
        return True
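
# Illustration only: a minimal sketch of the replacement step above, using
# the same str.translate() table on each path part.  It is a simplification
# that splits on '/' only and omits the drive and Windows-separator presteps
# of trymangle(); the helper name mangle_path_sketch() is hypothetical.

```python
NONPORTABLES = '\x00/\\|<>?*:"'        # same character set as trymangle()

def mangle_path_sketch(zippath):
    """Replace nonportable characters in each '/'-separated part with '_'."""
    table = {ord(char): '_' for char in NONPORTABLES}
    parts = zippath.split('/')                    # separators are consumed here
    return '/'.join(part.translate(table) for part in parts)

print(mangle_path_sketch('dir/a|b?c.ext'))        # dir/a_b_c.ext
print(mangle_path_sketch('dir/sub?/'))            # dir/sub_/
```

# The trailing '/' that marks a folder in a zipfile is retained, because the
# split leaves an empty final part that is rejoined unchanged.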

#===============================================================================

def extractzipfile(zipname,                 # pathname of zipfile to extract from
                   pathto='.',              # pathname of folder to extract to
                   nofixlinks=False,        # do not translate symlink separators?
                   trace=print,             # trace router (or lambda *p, **k: None)
                   permissions=False,       # propagate saved permissions? [1.1]
                   nomangle=False):         # don't mod bad filename chars to '_' on errors?
"""
-----------------------------------------------------------------------
Unzip an entire zipfile at zipname to "pathto", which is created if
it doesn't exist.  Items from the archive are stored under "pathto",
using whatever subpaths with which they are recorded in the archive.

Note that compression is passed for writing, but is auto-detected for
reading here.  Pass "trace=lambda *p, **k: None" for silent operation.
This function does no cruft-file skipping, as it is assumed to operate
in tandem with the zip creation tools here; see Mergeall's script
nuke-cruft-files.py to remove cruft in other tools' zips if needed.

[1.1] This now returns an ExtractStats with #files/folders/symlinks/unknowns
and a repr for display when used in shell scripts (see class above).
[1.3] The returned stats object now also has #mangled/skipped, though 
these are not printed by its repr unless they are nonzero; they're rare.

MODTIMES:
   At least through the latest 3.X, Python's zipfile library module does 
   record original files' modification times in the zipfiles it creates, 
   but does NOT retain files' original modification time when extracting:
   their modification times are all set to unzip time.  This is clearly 
   a defect, which will hopefully be addressed soon (a similar issue for
   permissions has been posted - see ahead).

   The workaround here manually propagates the files' original mod
   times in the zip as a post-extract step.  It's more code than an
   extractall(pathto), but this version works, and allows extracted
   files to be listed individually in the script's output.

   See this file's main docstring for details on symlinks support here;
   links and their paths are made portable between Unix and Windows by
   translating their path separators to the hosting platform's scheme,
   but "nofixlinks" can be used to suppress path separator replacement.

   UPDATE as of [1.2], errors while writing modtimes are ignored with a 
   message in output (the modtime isn't updated, but the extract proceeds).
   The only context in which this is known to happen is on Android's 2016 
   Nougat and earlier.  Modtime-update failures are silent elsewhere.
   It's unlikely that we'll try to update a symlink's modtime on pre-Oreo  
   Android (symlink creates fail and make a stub file), but it's handled.
   chmod also raises an error before Oreo, but it was already caught and 
   ignored with a message (and can be avoided by not using "-permissions").

FOLDER MODTIMES: 
   Py docs suggest that os.utime() doesn't work for folders' modtime 
   on Windows, but it does.  Still, a simple extract would change all 
   non-empty folders' modtimes to the unzip time, just by virtue of 
   writing files into those folders.  This isn't an issue for Mergeall:
   only files compare by modtime, and dirs are just structural.  The 
   issue is avoided here, though, by resetting folder modtimes to their
   original values in the zipfile AFTER all files have been written.

   The net effect: assuming the zip records folders as individual items
   (see create above), this preserves original modtimes for BOTH files
   and folders across zips, unlike many other zip tools.  Cut-and-paste,
   drag-and-drop, and xcopy can also change folder modtimes on Windows,
   so be sure to zip folders that have not been copied this way if you
   wish to test this script's folder modtime retention.

ABOUT SAVEPATH: 
   The written-to "savepath" returned by zipfile.extract() may not be 
   just os.path.join(pathto, filename).  extract() also removes any 
   leading slashes, Windows drive and UNC network names, and ".." 
   up-references in "filename" before appending it to "pathto", to ensure
   that the item is stored relative to "pathto" regardless of any absolute,
   drive- or server-rooted, or parent-relative names in the zipfile's items.
   zipfile.write() drops all but "..", which zipfile.extract() discards.
   The local extractSymlink() behaves like zipfile.extract() in this regard.

WINDOWS LONG PATHS: 
   To support long pathnames on Windows, this always prefixes the pathto
   target dir with '\\?\' on Windows (only), so that all file-tool calls in
   zipfile and zipsymlinks just work for too-long paths -- the length of paths 
   joined to archive names is unknown here.  This internal transform is 
   hidden from users in messages, by dropping the prefix and mapping pathto 
   back to relative if it was not given as absolute initially (see showpath()).

LARGE FILES: 
   allowZip64=True uses ZIP64 extensions which support very large files.  Such
   files are supported unevenly in other tools, but work with the create and 
   extract tools here.  It's True by default in Python 3.4+ only, and seems 
   unused when unzipping (ZIP64 fields are used).  See createzipfile() for more.

PERMISSIONS: 
   UPDATE: as of [1.1], ziptools now manually propagates permissions on extracts 
   (unzips) for files, folders, and links, but only if this is explicitly 
   enabled via command/function arguments.  This option should generally 
   be used only when unzipping from zipfiles known to have originated on 
   Unix, and when unzipping back to Unix.  Most use cases that require 
   permissions to survive trips to/from zips probably will satisfy this 
   guideline.  More permissions notes:

   - This option is enabled on unzips only.  Zips have saved permissions 
     since 1.0, though zipsymlinks.py required a change in 1.1 to set 
     per-link permission bits (instead of using a constant).  

   - Not all filesystems support Unix-style permissions.  On exFAT, for 
     instance, os.chmod() silently fails to change anything, and this
     upgrade has no effect.  It's okay to copy a zipfile to/from exFAT, 
     but don't unzip on exFAT if you care about keeping permissions.

   - The new permissions argument is last, perhaps inconsistently, for
     1.0 compatibility.  Use keyword arguments to avoid future flux.

   - The related Mergeall system gets permissions propagation "for free"
     because it uses shutil.copystat() to copy metadata from one file to 
     another.  That's not an option here, because "from" is a zip entry.
     See Mergeall's code at https://learning-python.com/mergeall.html.

   [Former caveat's notes: 
      Python's zipfile module preserves Unix-style permissions on creates 
      (zips) but not extracts (unzips).  This is a known Python bug; see: 
      https://bugs.python.org/issue15795.  ziptools 1.0 didn't try to work 
      around this one because it's subtle (e.g., Unix permissions cannot be 
      applied if the zip host was not Unix, but the host-type code may not 
      be set reliably or correctly in all zips).  Preserving executable 
      permissions on items extracted from zipfiles may also be a security risk,
      but that's not much of an excuse: it's fine for zips that you create.]

MAC OS EXFAT BUG FIX: 
   There is a bizarre but real bug on Mac OS (discovered on El Capitan) 
   that requires utime() to be run *twice* to set modtimes on exFAT drive
   symlinks.  Details omitted here for space: see Mergeall's cpall.py script
   for background (https://learning-python.com/programs).  In short, modtime
   is usually updated by the second call, not first:

      >>> p = os.readlink('tlink')
      >>> t = os.lstat('tlink').st_mtime
      >>> os.symlink(p, '/Volumes/EXT256-1/tlink2')
      >>> os.utime('/Volumes/EXT256-1/tlink2', (t, t), follow_symlinks=False)
      >>> os.utime('/Volumes/EXT256-1/tlink2', (t, t), follow_symlinks=False)

   This incurs a rarely-run and harmless extra call for non-exFAT drives,
   but suffices to properly propagate modtimes to exFAT on Mac OS.

   UPDATE [1.1]: permissions aren't impacted by this bug (os.chmod() is 
   a no-op on exFAT, per above), but it also impacts exFAT _folder_ modtimes 
   on Mac OS (only files work properly on exFAT).  Fixed in 1.1 to utime() 
   twice for folders on Mac OS too.  There are a handful of ways to check
   for exFAT explicitly on Unix (e.g., lsvfs, mount, and df -T exfat), 
   but all require brittle output parsing, and none are portable to 
   Windows; accept the trivial utime()*2 speed hit on Mac OS instead.

MODTIMES - DST AND TIMEZONE: 
   UPDATE: as of [1.2], ziptools now stores UTC timestamps for item
   modtimes in zip extra fields, and uses them instead of zip's "local 
   time" on extracts.  This means that modtimes of zipfiles zipped and 
   unzipped by ziptools are immune to changes in both DST and timezone.  
   For more details, see the README's "_README.html#utctimestamps12",
   and see module zipmodtimeutc.py for most of the implementation.
   The former local-time scheme is still used for zipfiles without UTC. 
    
   [Former caveat's notes: 
      Extracts here do nothing about adjusting modtimes of zipped items 
      for the local timezone and DST policy, except to pass -1 to Python's
      time.mktime()'s DST flag to defer to libraries.  The net effect may 
      or may not agree with another unzip tool, and does not address 
      timezone changes.  For more background, see this related note:
      https://learning-python.com/post-release-updates.html#unziptimes.]

ILLEGAL FILENAME CHARACTERS
   UPDATE: as of [1.3], filenames having illegal characters for the unzip
   platform are now mangled to use '_' replacements here explicitly, and 
   reported in both the run's output and its final tallies line.  Name
   mangling is attempted whenever an item fails.  It is applied only on 
   Windows, and for all filesystems there; Windows disallows illegal 
   characters across the platform, and Android 11 shared storage has a
   folders bug which precludes mangling there (see trymangle()).  

   As a bonus, the new coding mangles symlink names too if needed, and 
   continues with the unzip if any item fails post mangle.  Because
   mangling has a rare potential for data loss, it can also be disabled 
   with nomangle, and a utility script is provided as a manual alternative; 
   see trymangle() for more details.
   
   [Former caveat's notes:
      For most filenames, Python's underlying zipfile module munges illegal
      characters to "_" when they are extracted on Windows, but a symlink 
      with illegal Windows filename characters will be skipped with a message 
      in the output here (an unlikely case, given Windows' limited symlink 
      support - see zipsymlinks.py).]

ERRONEOUS PATH SEPARATORS
   ziptools' creates (zips) always use Unix '/' for path separators in 
   zipfiles on all platforms, per the zip standard.  While it's not 
   impossible that some Windows tools may record path separators as '\',
   there is no way to recognize these as special here, because '\' is 
   also a valid filename character on Unix.  Hence on Unix, any Windows 
   '\' are taken as part of a filename, and won't generate folders.  If  
   these arise, see the web for fixes, and consider using other Windows 
   zip tools that aren't an affront to standards and interoperability.

TBD: FLAGS ET AL.
   Should ziptools also propagate file flags on Unix?  It already does
   symlinks, UTC modtimes, and UNIX permissions explicitly.  Unix flags 
   have sketchy symlink support across Pythons (e.g., see os.lstat(),
   os.lchflags()); may or may not be easily smuggled in zips; and no 
   use case for them is known to ziptools developers.  There are also 
   the worm-cans of extended attributes and macOS resource forks; pass.
-----------------------------------------------------------------------
"""        
    
trace('Unzipping from', zipname, 'to', pathto)
dirmodtimes = []
stats = ExtractStats()

# always prefix with \\?\ on Windows: joined-path lengths are unknown;
# hence, on Windows 'savepath' result is also \\?\-prefixed and absolute;

pathtoWasRelative = not os.path.isabs(pathto)   # user gave relative?
pathto = FWP(pathto, force=True)                # add \\?\, make abs

#
# extract all items in zip
#
zipfile = ZipFile(zipname, mode='r', allowZip64=True)
for zipinfo in zipfile.infolist():              # for all items in zip
    origname = zipinfo.filename                 # before trymangle mods

    # 
    # extract one item
    #
    try:
        if isSymlink(zipinfo):
            # read/save link path: stubs on non-mangle failures
            trace('(Link)', end=' ')
            try:
                savepath = extractSymlink(
                       zipinfo, pathto, zipfile, nofixlinks, trace)
            except:
                # retry with mangled name? [1.3]
                if trymangle(zipinfo, pathto, nomangle, trace):
                    savepath = extractSymlink(
                           zipinfo, pathto, zipfile, nofixlinks, trace, origname)
                    stats.mangled += 1
                else:
                    raise  # reraise

        else:
            # create file or dir: skip on all failures
            try:
                savepath = zipfile.extract(zipinfo, pathto) 
            except:
                # retry with mangled name? [1.3]
                if trymangle(zipinfo, pathto, nomangle, trace):
                    savepath = zipfile.extract(zipinfo, pathto) 
                    stats.mangled += 1
                else:
                    raise  # reraise

    except Exception as E:
        # continue with rest on any item failure post mangle retry [1.3]
        stats.skipped += 1
        trace('**SKIP - item failed and skipped:', zipinfo.filename)
        trace('Python exception: %s, %s' % (E.__class__.__name__, E))
        continue  # next zipinfo in for loop
            
    # show both from+to paths iff they differ
    filename = zipinfo.filename                          # item's path in zip 
    showname = showpath(savepath, pathtoWasRelative)     # undo fwp on windows           
    trace3(filename, showname, trace)                    # show 1 or 2 lines [1.3]

    # 
    # propagate permissions from/to Unix for all, iff enabled [1.1]
    #
    if permissions:
        try:                                          # create saved perms
            perms = zipinfo.external_attr >> 16       # to lower 16 bits
            if perms != 0:

                if os.path.islink(savepath):
                    # mod link itself, where supported
                    # not on Windows, Py3.2 and earlier
                    # Mac OS bug moot: no-op on exFAT

                    if (hasattr(os, 'supports_follow_symlinks') and
                        os.chmod in os.supports_follow_symlinks):
                        os.chmod(savepath, perms, follow_symlinks=False)

                    # Unix Py 2.X and 3.2- have lchmod, but not f_s
                    elif hasattr(os, 'lchmod'):
                        os.lchmod(savepath, perms)

                else:
                    # mod file or dir, where supported (exFAT=no-op)
                    os.chmod(savepath, perms) 
        except:
            trace('--Error setting permissions')         # e.g., pre-Oreo Android

    # 
    # propagate modtime to files, links (and dirs on some platforms)
    #
    zipinfo.filename = origname                          # lookup premangle [1.3]
    datetime = getModtimeUTCorLocal(zipinfo, zipfile)    # UTC if present [1.2]

    if os.path.islink(savepath):
        # reset modtime of link itself where supported
        # but not on Windows or Py3.2-: keep now time
        # and call _twice_ on Mac for exFAT drives bug  

        stats.symlinks += 1
        if (hasattr(os, 'supports_follow_symlinks') and  # iff utime does links
            os.utime in os.supports_follow_symlinks):
            try:
                os.utime(savepath, (datetime, datetime), follow_symlinks=False)
            except:
                trace('--Error setting link modtime')    # pre-Oreo Android [1.2]
            else:
                # go again for Mac OS exFAT bug
                if RunningOnMacOS:
                    os.utime(savepath, (datetime, datetime), follow_symlinks=False)

    elif os.path.isfile(savepath):
        # reset (non-link) file modtime now              # no Mac OS exFAT bug 
        stats.files += 1
        try:
            os.utime(savepath, (datetime, datetime))     # dest time = src time 
        except:
            trace('--Error setting file modtime')        # pre-Oreo Android [1.2]

    elif os.path.isdir(savepath):
        # defer (non-link) dir till after add files
        stats.folders += 1
        dirmodtimes.append((savepath, datetime))         # where supported

    else:
        # bad type in zipfile
        stats.unknowns += 1
        assert False, 'Unknown type extracted'           # should never happen

# 
# reset (non-link) dir modtimes now, post file adds
#
for (savepath, datetime) in dirmodtimes:
    try:
        os.utime(savepath, (datetime, datetime))         # reset dir mtime now
    except:                                              # pre-Oreo Android [1.2]
        trace('--Error setting folder modtime')
    else:                                                # but ok on Windows/Unix
        # go again for Mac OS exFAT bug [1.1]
        if RunningOnMacOS:
            os.utime(savepath, (datetime, datetime))

zipfile.close()
return stats       # to be printed at shell [1.1]

#===============================================================================

# see ../selftest.py for former main code cut here for new pkg structure
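
#===============================================================================

# Illustrative sketch only, not part of ziptools' API: how the Unix permission
# bits read by the "permissions" logic above (zipinfo.external_attr >> 16) are
# laid out in a zip entry. Assumption: the entry was written with Unix-style
# attributes, so the upper 16 bits of ZipInfo.external_attr hold an st_mode
# value. The helper name demo_saved_perms is hypothetical, and the demo uses
# an in-memory archive rather than any real file.

def demo_saved_perms(mode=0o100644):
    # local imports under private names: avoid clashing with names used above
    import io, stat
    import zipfile as zipmod

    buffer = io.BytesIO()
    with zipmod.ZipFile(buffer, 'w') as archive:
        info = zipmod.ZipInfo('demo.txt')
        info.external_attr = mode << 16              # st_mode in upper 16 bits
        archive.writestr(info, 'data')

    with zipmod.ZipFile(buffer) as archive:
        saved = archive.getinfo('demo.txt').external_attr >> 16

    # permission bits alone, plus whether the type bits mark a regular file
    return stat.S_IMODE(saved), stat.S_ISREG(saved)

# Calling demo_saved_perms() should yield the permission bits 0o644 along with
# a true regular-file flag; a zero "saved" value would mean no Unix mode was
# recorded, which is the case the "perms != 0" test above skips.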