# File: mergeall-products/unzipped/backup.py

r"""jun25 (else py3.14? makes ? escapes errors)

backup.py: automatic backup/restore subsystem (part of the mergeall system [2.0])

Summary:

Make backup copies of all files and directories in the TO directory that will be destructively replaced or deleted in-place during a mergeall run. These items' prior versions in the TO tree are saved in the automatically-created __bkp__ folder at the top of the TO archive, with their full directory paths. Backups are not synchronized across trees, but are automatically pruned by age when their number exceeds a limit. This option makes mergeall generally safer, as unwanted or failed changes can be later undone by restoring backup copies.

2.1 extends this model to support complete rollback of a prior run, by merging a backup subfolder to an archive root. It augments backups to also record files added in a __bkp__\subfolder\__added__.txt file, and runs a special merge mode which does not delete unique items in TO, but does delete all items in TO that are listed in the backup's __added__.txt file (which is itself removed from TO after the merge). The net effect restores the TO tree's prior state, as long as the restore run is made before any additional changes in TO.
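
The rollback idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not this module's code: note_addition and remove_listed are hypothetical names, only plain files are handled, and Python 3.X is assumed.

```python
import io, os, tempfile

ADDS = '__added__.txt'

def note_addition(runroot, relpath):
    # hypothetical helper: append one archive-relative path per line, as UTF-8
    os.makedirs(runroot, exist_ok=True)
    with io.open(os.path.join(runroot, ADDS), 'a', encoding='utf-8') as adds:
        adds.write(relpath + '\n')

def remove_listed(runroot, toroot):
    # hypothetical helper: delete each listed file still present in TO
    removed = []
    addspath = os.path.join(runroot, ADDS)
    if os.path.isfile(addspath):
        with io.open(addspath, encoding='utf-8') as adds:
            for line in adds:
                target = os.path.join(toroot, line.rstrip('\n'))
                if os.path.isfile(target):
                    os.remove(target)
                    removed.append(target)
    return removed

toroot = tempfile.mkdtemp()
runroot = os.path.join(toroot, '__bkp__', 'date250601-time120000')
newfile = os.path.join(toroot, 'new.txt')
open(newfile, 'w').close()               # pretend a merge added this file
note_addition(runroot, 'new.txt')
print(remove_listed(runroot, toroot))    # the added file is removed again
```

A second call to remove_listed finds nothing left to delete, which mirrors the note that restores should be run before further changes are made in TO.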

Details:

Because a given item name may appear at multiple places in the archive tree, replaced and deleted items are backed up in folders of this form (the mergeall log gives a more linear list of changed items):

TO-archive-root\__bkp__\date-time\full-archive-path-less-root\filename
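
As a hedged sketch of how a backup path of this form might be built (backup_path is a hypothetical helper for illustration only, and the timestamp is passed in rather than generated):

```python
import os

def backup_path(toroot, stamp, pathto):
    # hypothetical helper: mirror the item's archive path under __bkp__/stamp
    todir, tofile = os.path.split(pathto)
    archtail = os.path.relpath(todir, toroot)    # path below the archive root
    if archtail == os.curdir:
        archtail = ''                            # item sits at the archive root
    return os.path.join(toroot, '__bkp__', stamp, archtail, tofile)

root  = 'archive'
stamp = 'date250601-time120000'
item  = os.path.join(root, 'docs', 'notes', 'todo.txt')
print(backup_path(root, stamp, item))
```

Keeping the full path below the root means two same-named files in different folders never collide in the backup.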

Items in the __bkp__ folder are local to the TO archive copy, and save items changed in that copy only. Backup folders are not propagated to other archive copies by mergeall: unlike normal archive items, they are not synchronized across trees. Hence any changes in __bkp__ folders are not themselves backed up. Backup subfolders in __bkp__ are, however, automatically pruned by age when their number exceeds a limit (changeable in the code below). This allows changes from any of multiple mergeall runs to be selected for restores, if needed.

Any error while writing a backup copy of an item causes the replacement or deletion of that item to be skipped, as the operation is unsafe. The item will register as a difference on the next mergeall run, if not resolved manually.

Note that users can make arbitrary changes to their __bkp__ folders (including deleting them altogether), as they are not synchronized and are created automatically whenever needed. For instance, if the default pruning limit results in too much space being taken by __bkp__, you may delete portions of it freely.

Also note that __bkp__ folders may register differences in diffall.py runs. As each run has its own uniquely-named subfolders, these will usually be few, and can generally be ignored. Per-run folders' content differences may be useful, however, to compare and analyze backup copies from different runs; run diffall on the subfolders themselves.

Purpose:

Though this option makes mergeall runs safer in general, it was primarily intended as an automatic guard against propagating corrupted files across multiple archive copies, and a better alternative to manual backup copies or multiple archive stores cycled by date manually. Merges generally work without issue if all but one archive copy are treated as read-only; however, unwanted or incorrect changes may spread through archive copies if changes are made in an archive copy and then synched back to the main copy. In such cases, the original's versions of files may be quickly lost, if not backed up.

By backing up items to be changed in each TO copy automatically and retaining multiple runs' backups, bad copies can generally be undone easily by taking the most recent backup's versions, without falling back to potentially much older full archive copies. Backup folders also serve as a record of the most recent mergeall runs against a tree.


More about __bkp__ folders:

Stores __bkp__ folder at top of TO archive so it doesn't become disjoint from it, but does not include __bkp__ in the mergeall synchronization -- its contents are local to the TO archive, and not copied to/from other copies automatically. This is so, because:

  1. Propagating backups between copies in round-robin trips between three (or more) devices seems potentially very confusing, and device source would be unclear.

  2. Synchronizing would delete copies on TO not also on FROM, thereby limiting TO from having more than one backup, in one-way propagation use cases.

  3. Automatic backup pruning wouldn't be possible, as it violates the principle that changes in the archive must all be disjoint during a single mergeall run. If __bkp__ were included in the synchronization process, one of its folders on the TO archive could be both pruned by the backup system here and deleted due to an absence in the FROM archive -- regardless of the timing, the second deletion attempt would fail with an exception.

More about disjoint changes:

More generally, mergeall works at all only because its updates are mutually exclusive, and disjoint (a given name can appear in just one change category). Its changes are limited to these items, in this order:

  a) Same-named files (replaced in TO)
  b) Unique files and directories in TO (deleted from TO)
  c) Unique files and directories in FROM (copied to TO)
  d) Same-named file/directory type mismatches between TO and FROM (replaced in TO)

Once any item is classified and changed in one of these categories, it is not further inspected or altered. Hence, each change in the TO tree is limited to its TO tree location, and cannot be impacted by, or appear as part of, any other change category, regardless of the timing of changes made. Order does matter for renames on case-insensitive machines like Windows, but renames are sound as long as deletes occur before adds (see mergeall.py's mergetrees() docstring for more details).

Backups rely on the fact that the change sets are disjoint: a file can be saved for only one category (a, b, or d), and new copies (c) are never saved and thus cannot overwrite saves for deletions, even for mixed-case differences on Windows.
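
The disjointness of the four categories can be illustrated with plain set operations. This is only a sketch for one folder level, assuming hypothetical name-to-type maps as input; it is not how mergeall.py's comparison phase is coded.

```python
def classify(from_items, to_items):
    # from_items, to_items: hypothetical {name: 'file' or 'dir'} maps for one folder
    from_names, to_names = set(from_items), set(to_items)
    common = from_names & to_names
    mismatched = {name for name in common if from_items[name] != to_items[name]}
    return {
        'a_same_named':    common - mismatched,      # same type: files replaced if they differ
        'b_unique_to':     to_names - from_names,    # deleted from TO
        'c_unique_from':   from_names - to_names,    # copied to TO
        'd_type_mismatch': mismatched,               # replaced in TO
    }

cats = classify({'x.txt': 'file', 'new': 'dir', 'y': 'file'},
                {'x.txt': 'file', 'old.txt': 'file', 'y': 'dir'})
for label in sorted(cats):
    print(label, sorted(cats[label]))
```

Because each name lands in exactly one set, a backup saved for one category cannot be overwritten by a change in another.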

Synchronizing the __bkp__ tree between archives would violate this disjointedness rule, however, because automatic backup subfolder prunes may delete an item also scheduled for deletion in category (b). This is true even though all differences are detected before the changes phase begins, and only removing __bkp__ items from the set of backed-up items does not circumvent this; synchronization changes may still intersect with prunes.

Therefore, top-level __bkp__ folders are removed from the set of synchronized items in the archives, and an archive copy retains only backups from mergeall runs where it served the role of TO archive. Backups can be manually copied into normal archive folders to propagate them to other archive copies if desired, but their management then becomes a user task; only items in the special-cased __bkp__ folders are automatically pruned when their items count limit is exceeded.


General implementation notes:

Caveat: backup paths may be too long in some cases on some platforms. The original file's full archive path is recreated and required, because the same name may appear in more than one place in the tree (and filenames created by concatenating path parts seem more likely to exceed platform length limits).

Does not store a single '.bkp' alongside original: would need to special-case to avoid .bkp.bkp... accumulation, and a '.bkp' may be a valid to-be-archived user file name.

Uses copy instead of move (rename): need to retain original if code here fails, and also want to retain the original's mod times in the copy. os.rename also proved unreliable on Windows, especially across devices.

All errors return with an exception: caller handles. In mergeall, the backup exception here causes the removal or deletion to be skipped, as the operation is then unsafe.
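
A caller might act on a backup failure along these lines. This is a sketch under assumptions: update_with_backup is a hypothetical wrapper, and backup and update are stand-in callables rather than mergeall.py's actual functions.

```python
def update_with_backup(item, backup, update):
    # backup and update are hypothetical callables standing in for the real steps
    try:
        backup(item)
    except Exception as why:
        print('backup failed, update skipped:', item, why)
        return False                   # item stays a difference for the next run
    update(item)                       # destructive step runs only after a good backup
    return True

def failing_backup(item):
    raise OSError('disk full')

changed = []
print(update_with_backup('a.txt', failing_backup, changed.append))
print(update_with_backup('b.txt', lambda item: None, changed.append))
print(changed)
```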

Backs up files replaced or deleted, but does not back up new files added, as these would be just redundant copies of new data.

Backup folders are excluded from the synchronization process by removing only top-level __bkp__ items from os.listdir results in the comparisons procedure. This proved more reliable than os.rename moves to/from a temporary folder (see unused code ahead).
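
A rough sketch of that top-level-only filter follows. The name listdir_nobkp is hypothetical, and the real comparison code in mergeall.py may be structured differently.

```python
import os, tempfile

def listdir_nobkp(dirpath, archiveroot):
    # hypothetical helper: hide __bkp__ only when listing the archive root itself
    names = os.listdir(dirpath)
    if os.path.abspath(dirpath) == os.path.abspath(archiveroot):
        names = [name for name in names if name != '__bkp__']
    return sorted(names)

root = tempfile.mkdtemp()
for sub in ('__bkp__', 'docs', os.path.join('docs', '__bkp__')):
    os.mkdir(os.path.join(root, sub))

print(listdir_nobkp(root, root))                          # root-level __bkp__ is hidden
print(listdir_nobkp(os.path.join(root, 'docs'), root))    # nested one is a normal item
```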

TBD: should backups be stored outside the archive itself? Originally placed in __bkp__ at root of archive, and synchronized so that prior versions are not stored on just one device only (which may fail). The downside of this is that, when this option is used, backups can accumulate quickly, and deletes are removed from the archive's actual folders but linger on the device taking up space. Could backup replaced files only, but that's less secure.

 RESOLVED: later opted to store __bkp__ in the TO archive for association, but
 not synchronize it across other archive copies.  This was required, because
 prunes violated the disjoint updates rule which is the logical basis of mergeall.

TBD: should backups be pruned? Currently, this is automatic, lest old backups accumulate. The downside of this is that the user may not want old backups removed. As an alternative, the end user could be expected to manage backups, and could also be expected to select the backups folder. This would be flexible, but complicates the command line and GUI with another directory choice, and deleting backup folders manually seems a substantial extra admin task.

RESOLVED: backups are stored in the automatically-created __bkp__ at the top of the
TO archives, and are automatically pruned by age after N (default 10) copies have
accumulated.  Given that __bkp__ is now local to each archive copy, this policy
allows ample backup retention -- any of the last N runs can be unwound if needed.
Users may also manually delete large backup folders before they are pruned for space.
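
The pruning policy can be sketched as follows, assuming run folders are named by timestamp so that a name sort is also an age sort. The name select_prunes is hypothetical, and the slice mirrors the one used in prunebkpdirs() below.

```python
def select_prunes(names, maxbackups=10):
    # hypothetical helper: names are run folders like 'date250601-time120000'
    newest_first = sorted(names, reverse=True)    # timestamp names sort by age
    return newest_first[maxbackups - 1:]          # leave room for the new run's folder

names = ['date250101-time090000', 'date250301-time090000', 'date250201-time090000']
print(select_prunes(names, maxbackups=3))
```

Keeping maxbackups - 1 existing folders means the run about to start brings the total back up to the limit.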

TBD: __bkp__ folders will register differences in diffall.py runs for their top-level per-run folders. These can be simply ignored in reports. diffall could skip __bkp__, like mergeall, but this seems too tight a dependency between the two programs to enforce.

"""

###################
# CODE STARTS HERE
###################

from __future__ import print_function    # Py 2.X
import os, time, glob, shutil, sys, time, errno, stat, unicodedata

# this module is largely platform-neutral
RunningOnMac     = sys.platform.startswith('darwin')
RunningOnWindows = sys.platform.startswith('win')

import mergeall     # for copyfile, copytree (from won't work here: recursive)
indent1 = '....'    # to distinguish messages here from main mergeall logic
indent2 = '____'    # rmdir retries: not just for pruning (also for uniqueto)

# [2.1] from configs file, unless absent or errors
try:
    from mergeall_configs import MAXBACKUPS
except:
    MAXBACKUPS = 10    # keep up to this many backup folders in each archive copy's __bkp__

#-------------------------------------------------------------
# [2.3] Use UTF8 Unicode encoding for __added__.txt restore
# files in __bkp__, not platform default.  These files contain
# filenames which may have arbitrary characters.  Python 2.X's
# codecs.open files are always binary mode: must specialize
# line-ends here too.  [3.2] ditto for deltas-set __added__.
#-------------------------------------------------------------

ADDENC = 'utf-8'

if sys.version[0] == '3':
    unicode_open = open              # what 3.X probably should have done?
    unicode_linesep = '\n'           # 3.X text mode files expand \n as needed
else:
    import codecs
    unicode_open = codecs.open       # 2.X compatibility (or use from...as...)
    unicode_linesep = os.linesep     # 2.X codecs always binary: no \n expansion

#-------------------------------------------------------------
# Make just 1 subfolder per run, on first file backed up.
# Else may make 1 new subfolder for each second in run!
# Now also used by noteaddition, to get run's subfolder:
# either backupitem or noteaddition may be called first.
#-------------------------------------------------------------

datetimestamp = None

# try to prune just once per run, on first file backed up
pruned = False

# if -quiet, print backups message on first backup only [2.4]
firstbkpmsg = True

# [3.0] fix too-long paths on Windows
from fixlongpaths import FWP

# [3.3] fix Unicode variants in __added__.txt paths
from fixunicodedups import matchUnicodePathnames

def makedirs_ifneeded(dirpath):
    r"""jun25 (else py3.14? makes ? escapes errors)
    --------------------------------------------------------------------------------------
    [2.3] Run an os.makedirs() call portably on Python 3.X or 2.X to create any and all
    parts of a directory path as needed.  Only 3.X has the exist_ok flag to avoid an
    exception if a part already exists, and 2.X doesn't have 3.X's detailed exceptions.
    Alternative: could also call os.path.exists(dirpath) in an "if", but slightly slower.

    Note: Python's os.makedirs() in Lib\os.py is recursive, but probably doesn't need
    to be.  Because it scans a linear directory path, a simple loop should suffice, and
    yield simpler code (alas, "batteries included" means you get what's shipped in the
    box).  Recursion is really required only for arbitrary shapes, such as folder trees,
    and even then can be replaced with explicit stacks (see Learning Python 5E, p555-561).
    --------------------------------------------------------------------------------------
    """
    if sys.version[0] == '3':
        # python 3.X
        os.makedirs(FWP(dirpath), exist_ok=True)          # 'recursive' mkdir, as needed
    else:
        # python 2.X
        try:
            os.makedirs(FWP(dirpath))                     # 2.X has no exist_ok
        except OSError as E:                              # 2.X requires errno test
            import errno                                  # only need here on exc
            if E.errno != errno.EEXIST: raise             # reraise all others

def backupitem(pathto, toroot, dobackup, quiet):
    """
    ----------------------------------------------------------------------------------------
    If enabled by command-line (or by proxy via GUI toggle or console reply), make a backup
    copy of items (files and directories) in the TO tree before they are destructively
    replaced or deleted in-place.  This includes a:

    1) File or link in the TO tree
           that is about to be replaced by a newer same-named file in the FROM tree
    2) File or directory in the TO tree
           that is about to be deleted due to absence in the FROM tree
    3) File, directory, or link in TO
           that is about to be replaced due to a dir/file/link type mismatch in the FROM tree
    4) File, directory, or link in TO
           that is about to be deleted by a -restore run because it's listed in __added__.txt

    Unique files in FROM copied to TO are not backed up, as this is not a destructive action,
    though they are listed by noteaddition in the backup's __added__.txt file for the run.

    "pathto" is where the item to be destroyed resides; "toroot" is the top of the archive in
    the TO tree, where the backup will be stored in a __bkp__ subfolder under pathto's tail.
    Does not back up if not "dobackup": backups not enabled per user or rollback.py script.
    Prints only one general backup log message if "quiet", to minimize logfile clutter.

    Either this or noteaddition may be called first: prune + timestamp if not yet done.
    No longer needed, because __bkp__ not synched: "if pathto.startswith(bkproot): return".

    [3.0] Use newly-added "strict" arg to copytree() to force it to propagate its first
    file exception to here, instead of printing an error message and continuing.  We need
    to pass the exception to this function's caller, so the item's update is cancelled if
    its backup fails.  Else, we might delete a TO folder without backing up parts of it!

    [3.0] Don't use copytree()'s new "skipcruft" here: we're backing up data already
    present in the TO tree.

    [3.0] Handle symlinks (to both files or dirs) by passing them to copyfile() just
    like normal/simple files; copyfile() tests for links up-front and copies specially.
    Symlinks are copied, not followed: backup the link itself, not what it refers to!

    [3.2] Now also called to backup __added__.txt items removed from TO by -restore
    runs, to allow delta-set changes to be rolled back (and rollbacks to be unrolled...).
    ----------------------------------------------------------------------------------------
    """
    global firstbkpmsg

    # do nothing if "-backup" not passed by command, launcher, or rollback.py
    if not dobackup:
        return  # avoid nesting

    assert pathto.startswith(toroot)       # sanity check: changed file must be in TO tree
    todir, tofile = '?', '?'               # initialize for early exceptions

    try:
        # prune old backups the first time through here (or noteaddition)
        try:
            prunebkpdirs(toroot)
        except:
            print(indent1 + 'prune failed, but backups and mergeall continued')
            print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

        # make run's subfolder timestamp first time here (or noteaddition)
        datetimestamp = makeruntimestamp()

        # verify or create the backup-to path in the TO copy
        bkproot = os.path.join(toroot, '__bkp__')                   # toroot is cmdargs.dirto
        todir, tofile = os.path.split(pathto)
        archtail = todir[(len(toroot) + len(os.sep)):]              # remove prefix=cmdargs.dirto
        bkppath = os.path.join(bkproot, datetimestamp, archtail)
        makedirs_ifneeded(bkppath)                                  # 'recursive' mkdir, as needed

        # copy file or dir over to backup copy
        copytopath = os.path.join(bkppath, tofile)
        if not quiet:
            print(indent1 + 'backing up %s to %s' % (tofile, copytopath))
        elif firstbkpmsg:
            # [2.4] suppress per-file messages (superfluous?), but indicate backups mode once
            allbkpsroot = os.path.join(bkproot, datetimestamp)
            print(indent1 + 'backing up all items to %s' % allbkpsroot)    # [3.0] not bkppath!
            firstbkpmsg = False

        if (os.path.isfile(FWP(pathto)) or                        # [3.0] treat links like files
            os.path.islink(FWP(pathto))):                         # [3.0] fix long windows paths
            mergeall.copyfile(pathto, copytopath)                 # this never catches excs

        elif os.path.isdir(FWP(pathto)):
            os.mkdir(FWP(copytopath))                             # [3.0] fix long Windows paths
            mergeall.copytree(pathto, copytopath, strict=True)    # [3.0] fail on any except

        else:
            assert False, ('unknown file type: ' + pathto)        # e.g. fifos: punt

    except:
        print(indent1 + '**Error backing up %s in %s' % (tofile, todir))
        raise   # reraise: handle in caller - cancel update, as it would be destructive

def makeruntimestamp():
    """
    ---------------------------------------------------------------------------------------
    Make the run's unique timestamp, to be used for its subfolder name in the TO tree's
    __bkp__ backups folder.  Factored to here, as this may be triggered by either
    backupitem or noteaddition, either of which may be called first during a mergeall run.
    ---------------------------------------------------------------------------------------
    """
    global datetimestamp

    if not datetimestamp:
        datetimestamp = time.strftime('date%y%m%d-time%H%M%S')  # backup's unique top dir name
    return datetimestamp

def prunebkpdirs(toroot, maxbackups=MAXBACKUPS):
    """
    ---------------------------------------------------------------------------------------
    On first backup in session, auto-delete the oldest backup dir(s) in the TO archive
    if needed, keeping just the most recent N.  Most of this was adapted from the frigcal
    GUI's backup system.  TBD: this could be left to user, but seems likely to accumulate.
    Caller handles any exceptions here: this pre-merge step shouldn't be fatal - proceed.

    [3.0] Recoded to skip a failed directory and continue the prune to process other
    folders, instead of ending the prune at the first failure.  Else, this may miss
    folders in the unlikely event that a failure of a more recent backup folder (e.g.,
    permissions) prevents the prune from reaching earlier backups later on the list...
    which can only ever happen if the #backups has been reduced in the configs file.
    Users must address the failure to allow the failing folder to be pruned eventually.
    Callers are still notified with an exception, but details will be displayed here.
    ---------------------------------------------------------------------------------------
    """
    global pruned

    if pruned:
        return
    else:
        pruned = True
        bkproot = os.path.join(toroot, '__bkp__')
        if os.path.exists(bkproot):

            backuppatt  = 'date*-time*'
            currbackups = glob.glob(os.path.join(bkproot, backuppatt))
            currbackups.sort(reverse=True)
            prunes = currbackups[(maxbackups - 1):]              # earliest last, via names sort

            anyfailed = False
            for prunee in prunes:                                # globs have full paths
                print(indent1 + 'pruning', prunee)               # normally 0 or 1, unless failed
                try:
                    shutil.rmtree(FWP(prunee, force=True), onerror=rmtreeworkaround)
                except Exception:
                    anyfailed = True
                    print(indent1 + 'this prunee failed, but pruning continued')
                    print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

            assert not anyfailed, 'Some prunes had errors'       # [3.0] notify caller of excs

def noteaddition(pathto, toroot, dobackup):
    r"""jun25 (else py3.14? makes ? escapes errors)
    --------------------------------------------------------------------------------------
    [2.1] Log unique FROM items (files and dirs) added to the TO tree in a text file,
    with 1 file path (relative to the TO tree's root) per line.  The logged items are
    stored in file: toroot\__bkp__\subfolder\__added__.txt.  This allows additions to be
    automatically removed by "-restore" as part of a later run to perform a complete
    rollback.  The adds file also serves as run documentation, in addition to logfiles.
    Either this or backupitem may be called first: prune + timestamp if not yet done.

    [3.2] The archtail length calc here can be thrown off by a trailing / or \ in
    toroot (a.k.a. cmdargs_dirto); it's now removed in getargs() to avoid the drama.
    This happened only when toroot cmdline arg had a trailing slash, and only for
    top-level, unnested adds, but truncated the leftmost character in this case.
    For items nested in folders, a double '//' left by the comparison-phase walker
    spared the calcs here, and backupitem() was made immune by its split() call.
    This also impacted the new deltas.py, where the issue was first uncovered.
    This does not impact diffall.py or cpall.py (they simply treat two // as one).

    [3.3] Note only: this doesn't print per-item messages, because the mergeall.py
    caller does (for the new copied item recorded here).  By contrast, deltas.py now
    does print a message in similar code that adds to an __added__.txt file in delta
    sets (for future TO removals), because the mergeall.py print code is not run.
    --------------------------------------------------------------------------------------
    """
    if not dobackup:
        return  # avoid nesting

    try:
        # prune old backups the first time through here (or backupitem)
        try:
            prunebkpdirs(toroot)
        except:
            print(indent1 + 'prune failed, but note-add and mergeall continued')
            print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

        # make run's subfolder timestamp first time here (or backupitem)
        datetimestamp = makeruntimestamp()

        # build and make the adds file's path
        addsname = '__added__.txt'
        bkproot = os.path.join(toroot, '__bkp__')                    # toroot is cmdargs.dirto
        runroot = os.path.join(bkproot, datetimestamp)               # bkp subfolder for this run
        addspath = os.path.join(runroot, addsname)                   # adds file in run's subfolder
        makedirs_ifneeded(runroot)                                   # 'recursive' mkdir, as needed

        # write copied item's relative path to adds file
        # [2.3] use utf8 for filenames, not platform default
        archtail = pathto[(len(toroot) + len(os.sep)):]              # remove prefix=cmdargs.dirto
        addsfile = unicode_open(addspath, encoding=ADDENC, mode='a') # add to end of file
        try:                                                         # OLD: use default encoding in 3.X
            addsfile.write(archtail + unicode_linesep)               # 3.X expands \n, 2.X does not
        finally:
            addsfile.close()

    except:
        # do NOT reraise: no need to cancel the update, as adds are non-destructive
        print('**Error noting add of %s' % pathto)

    else:
        # don't issue a trace message here, as it seems gratuitous in the logs
        # print(indent1 + 'noted addition in TO of', archtail)
        pass

def trymangle(delpath):
    """
    ----------------------------------------------------------------------
    [3.2] For removeprioradds(): on Windows, mangle nonportable filename
    characters to "_" to match other tools' mods.  Do so always, instead
    of just on errors: there's no way the unmangled form could be stored,
    and os.path type tests fail with unmangled names.

    Mangling can happen when content is copied from Unix to Windows, by
    ziptools default unzips or similar.  Mergeall in general assumes that
    FROM names are mangled as needed by unzips or other copies before
    syncs are run on Windows, but __added__.txt lists evade such transforms.
    Hence, this code must mangle items listed in __added__.txt to match the
    names under which they are stored on TO on Windows.

    Caveats: names may be mangled differently by some tools (e.g., macOS
    copies to FAT32/exFAT use Unicode privates), but the unmangled form
    clearly won't work.  This also risks removing files whose names match
    the mangled name only coincidentally, but that seems astronomically
    unlikely, and skipping removals seems worse.  See ziptools.ziptools.py
    and ziptools' _README.html#nomangle for more details; per the latter,
    run the included fix-nonportable-filenames.py before transferring
    content to skip filename interoperability issues like this altogether,
    and always for transfers to Android shared storage not mangled here.

    Theory: this use case cannot arise when propagating content with
    Mergeall alone (it never mangles), or using intermediate drives with
    filesystems or platforms that disallow nonportable names (e.g., Linux
    refuses to write such names to FAT32 and exFAT).  However, some other
    tools mangle names on Windows (e.g., ziptools), and some drive/platform
    combos allow but munge names (e.g., macOS munges to and from Unicode
    privates on exFAT).  In particular, changes propagated indirectly by
    Mergeall's deltas.py and ziptools may wind up mangled on only Windows:
    comparisons to a proxy drive with unmangled or munged characters may
    record unique TO names in __added__.txt in unmangled form, but these
    names are mangled on Windows if content was unzipped there by ziptools.

    TBD: should mangling also be done for FAT/exFAT write errors on Linux?
    ziptools does not mangle on Linux (it skips with error messages), but
    other tools may.  Really, the filename fixer script should be run in
    this use case - unlike macOS (which munges nonportable chars to/from
    Unicode privates covertly), Linux mangles will be perpetual diffs.
    ----------------------------------------------------------------------
    """
    startpath = delpath

    # drop drive first, so ':' not mangled
    drive, delpath = os.path.splitdrive(delpath)     # 'c:', '\folder\file'

    # split path to parts
    delparts = delpath.split(os.path.sep)            # only on windows \ here, [0]=''

    # illegal chars
    nonportables = ' \x00 / \\ | < > ? * : " '       # for filesystems, not platforms
    nonportables = nonportables.replace(' ', '')     # drop space used for readability

    if not any(c in part for part in delparts for c in nonportables):
        # none found: mangling won't help
        return startpath
    else:
        # mangle entire path, and report
        replacements = {ord(c): '_' for c in nonportables}
        mangledparts = [part.translate(replacements) for part in delparts]
        mangledpath  = os.path.sep.join(mangledparts)
        mangledpath  = drive + mangledpath
        print(indent1 + '--Name mangled: "%s" => "%s"' % (startpath, mangledpath))
        return mangledpath

def removeprioradds(fromroot, toroot, dobackup, quiet): """ ------------------------------------------------------------------------------------ [2.1] Remove items listed in fromroot's added.txt file from the toroot tree, if added.txt and the listed files are present, as part of a complete rollback from backups (or deltas-set apply) in a mergall.py "-restore" run.
This is a pre-merge step: order matters for renames on Windows (the merge must delete and then add, else delete may remove differently cased names).

Assumes fromroot is a __bkp__ subfolder (or at least has an __added__.txt), but 
does not fail if not -- in all cases, ignore exceptions here.  The user may have
deleted __added__.txt to back out removals+replacements only (not adds), and may 
have created a custom __added__.txt elsewhere in another tree to be merged.

We need to care about closing the file on exceptions; this is now a pre-merge step.
The __bkp__ folder's __added__.txt will be copied over to the TO root by the normal
merge; it's deleted manually later, rather than skipping the name during the merge.

CAVEAT: because noteaddition() records added items using the path syntax of the
platform on which the prior mergeall ran, it's not possible to remove prior
additions on a different platform having incompatible path syntax without editing
either the additions file or the code here.  A file added on Windows, for example,
will be noted with "\" in its path, which likely won't work in a restore on Linux.
This could be addressed by always using "/" in additions file paths and running
os.path.normpath() to convert to "\" on Windows only, but this seems a rare use case. 
As is, restores with additions should be run on the same platform as the prior merge.

  [3.0] UPDATE: this former caveat has been lifted, by converting paths in the
  backup's __added__.txt file for the hosting platform's separators.  Thus,
  backups made on Windows can now be restored on Unix, and vice versa.
  [3.2] CAVEAT: this may turn '\' in Unix filenames into path separators, but there 
  seems no reliable way to know if __added__.txt was created on Unix or Windows. 

[3.0] Handle symlinks (to both files or dirs) by passing them to os.remove() just
like normal/simple files; shutil.rmtree() has issues with symlinks at the top;
symlinks are removed, not followed: delete the link itself, not what it refers to!

[3.0] Changed arguably-confusing message format here from "restore removed file:"
to the more consistent "removed added file:" -- it was removed, not restored.

[3.2] Backup deletions here: to support rollbacks of delta sets made by the new 
deltas.py, this now always backs up the items it removes from TO, instead of simply
deleting them.  For deltas applied with mergeall.py's "-restore -backup", this means
that items which were unique in TO will be put back by a later run with "-restore", 
along with other undos.  This also makes it possible to roll back true rollbacks 
made with "-restore -backup", restoring an archive to its former post-sync state;
this use case is unlikely, but is supported at a small cost in extra backups size. 

[3.2] Mangle __added__.txt names on Windows, so they match mangled names that may
have been saved by ziptools extracts for indirect deltas.py syncs; see trymangle().

[3.3] Morph __added__.txt pathnames to the new TO device for delta applies, in 
case any component has an equivalent but different decoded Unicode representation. 
Also change "removed added..." to generic "removed listed..." for delta-sets usage,
and overloaded "-quiet" to suppress messages for normalized names (here, in paths).
------------------------------------------------------------------------------------
"""
    addsname = '__added__.txt'
    addspath = os.path.join(fromroot, addsname)
    numfilesdel = numdirsdel = 0

    if os.path.exists(addspath) and os.path.isfile(addspath):
        # [2.3] use utf8, not platform default
        addsfile = unicode_open(addspath, encoding=ADDENC)  # propagate open() exceptions: cancel merge
        try:                                                # OLD: adds file uses default encoding in 3.X
            while True:                                     # decodes can fail - catch via while, not for
                try:
                    line = addsfile.readline()
                except:
                    print('**Error: restore cannot read added file name: file retained')
                    print(sys.exc_info()[0], sys.exc_info()[1])
                    continue
                else:
                    if not line: break  # eof

                # [3.0] make restore paths portable
                delpath = line.rstrip()
                delpath = delpath.replace('/', os.sep).replace('\\', os.sep)
                delpath = os.path.join(toroot, delpath)

                # [3.2] replace nonportable characters on windows
                if RunningOnWindows:
                    delpath = trymangle(delpath)

                # [3.3] morph Unicode in path to the TO device tree
                tracer = print if not quiet else lambda *args: None
                delpath = matchUnicodePathnames(delpath, tracer)

                if os.path.isfile(FWP(delpath)) or os.path.islink(FWP(delpath)):   # [3.0] +symlinks
                    try:                                                           # [3.0] longpaths
                        backupitem(delpath, toroot, dobackup, quiet)               # [3.2] backup
                        os.remove(FWP(delpath))
                    except:
                        print('**Error: restore cannot delete file, retained:', delpath)
                        print(sys.exc_info()[0], sys.exc_info()[1])
                    else:
                        numfilesdel += 1
                        print(indent1 + 'removed listed file:', delpath)

                elif os.path.isdir(FWP(delpath)):
                    try:
                        backupitem(delpath, toroot, dobackup, quiet)               # [3.2] backup
                        shutil.rmtree(FWP(delpath, force=True), onerror=rmtreeworkaround)
                    except:
                        print('**Error: restore cannot delete dir, retained:', delpath)
                        print(sys.exc_info()[0], sys.exc_info()[1])
                    else:
                        numdirsdel += 1
                        print(indent1 + 'removed listed dir:', delpath)

                else:
                    print('**Error: restore skipped missing or unknown type file:', delpath)

        except:
            print('**Error during prior adds removal')    # others? don't reraise: do merge
            print(sys.exc_info()[0], sys.exc_info()[1])
        finally:
            addsfile.close()                              # close, except or not (non-CPython)

    return (numfilesdel, numdirsdel)                      # sums: add to merge's delete counts

def dropaddsfile(toroot):
    """
    -------------------------------------------------------------------------------
    In "-restore" mode, as a post-merge step get rid of the __added__.txt that
    the normal merge may have copied over to TO's root.  This is a special case,
    but it's quicker to drop it forcibly from the root here than to check for it
    as a skipped filename at each tree level during the merge (though the merge's
    code supports skipping __bkp__ at top, __added__.txt otherwise).

    [3.2] This is no longer used: __added__.txt is now skipped during the
    comparisons phase, but only at the trees' top levels; see mergeall.py.
    -------------------------------------------------------------------------------
    """
    addsname = '__added__.txt'
    mergedaddspath = os.path.join(toroot, addsname)
    if os.path.exists(mergedaddspath):
        os.remove(mergedaddspath)
        return True
    else:
        return False   # don't adjust merge's counters or print message

def rmtreeworkaround(function, fullpath, exc_info):
    """
---------------------------------------------------------------------------------------
Catch and try to recover from failures in Python's shutil.rmtree() folder removal tool,
which calls this function automatically on all system errors.

---------------------------------------------------------------------------------------
[2.0] PENDING DELETION FAILURES:

On Windows, deletes may be marked as pending, and not finalized atomically,
leaving an item in place after the delete call returns.  This can cause
rmtree (shutil or custom) operations to fail with a directory-not-empty
error in rare cases, subject to devices and other activity on the machine.

This seems a shortcoming (bug?) in shutil.rmtree for Windows, and may be
improved in the future.  In fact, Python's own test system uses a custom
rmtree with wait loops to avoid the issue.  Here, update failures are mostly
harmless (leaving a difference to be resolved on the next run), and rare
(seen on only 1 machine in 1 year's usage), but errors are better avoided.

Short of low-level C API possibilities, the two solutions seem to be to move
(os.rename) to a temp file and delete from there, or fall into a brief wait loop
to watch for the file removal to be finalized.  The former is subject to some
os.rename oddness (see ahead), and the latter is used in Python's own test system
for rmtree calls.  Adopt the latter here -- this function is a callback on errors
in shutil.rmtree(), and retries the rmdir in a wait-but-bounded loop.

Note that this applies to, and is used by, _both_ backup folder removals here
and general content-tree removals in mergeall.py for unique dirs in the TO
tree; it's here because it was first observed during backup folder removals.
The fix here mimics Python's test system's wait-timing technique of exponentially
increasing delay times up to half a second.  Usually 0 or 1, but at most 10,
retries are run, with delays from .001 to .512 seconds (to see how this is
computed, run code [x = 0.001, while x < 1.0: print(x); x *= 2]).
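
The delay schedule and the bounded retry can be sketched on their own as follows.
The names backoff_delays and retry are hypothetical, and the sleep parameter is an
assumption added so the loop can be exercised without real waits.

```python
import time

def backoff_delays(first=0.001, limit=1.0):
    # Yield delays that double from first for as long as they stay under limit.
    delay = first
    while delay < limit:
        yield delay
        delay *= 2

def retry(action, sleep=time.sleep):
    # Call action once per delay, sleeping after each failure.  Returns the
    # number of tries used, or re-raises the last OSError if every try fails.
    tries = 0
    last = None
    for delay in backoff_delays():
        tries += 1
        try:
            action()
        except OSError as exc:
            last = exc
            sleep(delay)
        else:
            return tries
    raise last

print(len(list(backoff_delays())))
```

With the defaults the schedule has ten entries, so a call that never succeeds waits
for roughly one second in total before giving up.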

Caveats: Deletes that are only pending seem a curious property for a filesystem,
and this fix feels hackish.  But this is harmless (it kicks in only on os.rmdir
failures, adding a minor delay), and there's no budget for further research...
Could watch for not empty (ENOTEMPTY) only, but other errors are not inconceivable.

Related threads (though something more authoritative from Microsoft would be nice):
http://stackoverflow.com/questions/3764072/c-win32-how-to-wait-for-a-pending-delete-to-complete
http://bugs.python.org/issue19811

Note: shutil.rmtree could also be replaced with the following (sans some Unix cruft):
    for (root, dirs, files) in os.walk(top, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
but this doesn't fix the Windows pending-deletes issue, and may be less robust
and portable than shutil's time-honed alternative.

UPDATE: the retry loop has now been seen to fire in more use cases: both when a
file is truly in use (in which case, the loop and removal ultimately fail), and
when it is not (in which case, the loop generally runs one or two times, and the
removal succeeds).  Strange, but true...

---------------------------------------------------------------------------------------   
[3.0] READ-ONLY PERMISSION FAILURES (pass):

As a different issue, rmtree operations can also fail due to read-only files in
the tree.  To work around this, an onerror handler like the following can be used,
which is portable to both Unix-en and Windows, and works like a Unix "rm -rf":

import stat
def onerror(func, path, exc_info):           # this is portable code 
    if not os.access(path, os.W_OK):         # read-only permission?
        os.chmod(path, stat.S_IWRITE)        # change to allow writes
        func(path)                           # and retry operation 
    else:
        raise

This workaround was also incorporated into the general onerror handler below as
a first step, before attempting the retry loop described above.  However, this
code is currently DISABLED (via the False), because it makes no sense to override
read-only permissions in this single context only (what about simple file deletes?),
and the user may have marked an item read-only on purpose to protect it.  Instead,
users are expected to fix their read-only permissions and run mergeall again.

It can be argued that permission changes may be okay in this context, because
users understand that mergeall intends to remove items, and read-only items will
leave trash behind if not fixed.  Moreover, files on Windows are sometimes marked
read-only without any user action (e.g., camera card copies), causing rmtree to
fail unexpectedly.  This argument was rejected in the end, in favor of mergeall's
overarching policy that your data is your property; read-only files should never
be deleted without user intervention, even if this incurs extra manual steps.

---------------------------------------------------------------------------------------   
[3.0]..[3.3] FILE-NOT-FOUND ERRORS:

On macOS, a hidden AppleDouble "._xxx" resource-fork file is automatically
deleted with its "xxx" data-fork (real) file.  This occasionally causes
exceptions in Python's shutil.rmtree, if the AppleDouble is deleted
automatically _before_ rmtree gets around to deleting it manually.  Though
rare, this has been seen to happen both on fast internal SSD and slow USB
flash drives.  Skip the exception (and file) here for Mac "._" names only.
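
The test for this case reduces to a small predicate, sketched here with a
hypothetical name and an explicit on_mac argument (the code below uses the
RunningOnMac global and compares the exception type directly).

```python
import os

def is_ignorable_missing(path, exc_type, on_mac):
    # True only for a not-found error on a '._' AppleDouble name on macOS.
    return (on_mac
            and issubclass(exc_type, FileNotFoundError)
            and os.path.basename(path).startswith('._'))

print(is_ignorable_missing('/Volumes/USB/dir/._photo.jpg', FileNotFoundError, True))
print(is_ignorable_missing('/Volumes/USB/dir/photo.jpg', FileNotFoundError, True))
```

Every other combination falls through, so the error is re-raised to the rmtree caller.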

A similar coupling can occur on Windows (e.g., for web-site folders and
their HTML files, though this may be an Explorer kludge).  As of [3.2], this
case is now handled explicitly for folders before the pending-deletion loop.
Should this simply always ignore all file-not-found errors?  A file is gone
if it's gone, but file-not-found might be triggered in other contexts that
shouldn't be muted (e.g., long paths).

UPDATE: Windows long paths seem moot here.  They can trigger file-not-found 
too, but this should never arise in Mergeall.  It prefixes all paths by FWP() 
to lift the default path-length limit, before they are passed to system calls.
Hence, this context seems impossible, though file-not-found may arise otherwise.

UPDATE: per testing, Windows auto-deletion of folders with their same-named
files appears to happen in Explorer only, and not in the shell or Python.  
Conversely, macOS auto-deletion of "._" AppleDouble files with their non-"._"
files IS automatic in all contexts tested, including the shell and Python
(create and view a file on an exFAT or FAT drive to see for yourself).  This
seems to be embedded deeply in macOS's libraries, and likely reflects the 
odd and proprietary resource/data-fork files split on that platform--a topic
generally best ignored... except in POSIX tree-removal code like this.

UPDATE [3.3]: per later research in the Android Deltas Sync system, Windows 
auto-deletes are part of a Windows concept known as Connected Files, which
can be tweaked in the registry; Explorer just happens to recognize this
concept.  Still, auto-deletes do NOT happen in either the Windows shell or 
Python file ops, so the test here is moot, and has been disabled as of [3.3]. 
For a fuller exploration, visit the parallel common.rmtreeworkaround() tools 
in A-D-S, at https://learning-python.com/android-deltas-sync/common.py.
---------------------------------------------------------------------------------------
"""

    """
    # NO: why chmod in this context only?
    # try to fix read-only errors first? [3.0]
    if (not os.access(fullpath, os.W_OK)
        and function in [os.rmdir, os.remove, os.unlink]):
        try:
            print(indent2 + 'fixing read-only on', fullpath)
            os.chmod(fullpath, stat.S_IWRITE)
            function(fullpath)
        except:
            pass     # fail: try other workaround below (or not? tbd)
        else:
            return   # okay: this fix worked, proceed with shutil.rmtree
    """

    # Windows only, directory deletes only [2.0]
    if RunningOnWindows and function == os.rmdir:

        """
        # NO: moot per above - disable till use case arises [3.3]
        # assume auto-removed with associated file by Windows, or other [3.2]
        if exc_info[0] == FileNotFoundError:
            msg = '**Note: ignored FileNotFoundError for Windows dir'
            print(msg, fullpath)
            return                                   # folder deleted with file: proceed
        """

        # wait for pending deletes of contents
        timeout = 0.001                              # nit: need to try iff ENOTEMPTY
        while timeout < 1.0:                         # 10 tries only, increasing delays
            print(indent2 + 'retrying rmdir')        # set off, but not just for pruning
            try:
                os.rmdir(fullpath)                   # rerun the failed delete (post FWP!)
            except os.error as exc:
                if exc.errno == errno.ENOENT:        # no such file (not-empty=ENOTEMPTY)
                    return                           # it's now gone: proceed with rmtree
                else:
                    time.sleep(timeout)              # wait for a fraction of second (.001=1 msec)
                    timeout *= 2                     # and try again, with longer delay
            else:
                return                               # it's now gone: proceed with rmtree

    # macOS only, ignore file-not-found for AppleDouble files [3.0]
    if exc_info[0] == FileNotFoundError:

        if RunningOnMac:
            itemname = os.path.basename(fullpath)
            if itemname.startswith('._'):
                # assume auto-removed with associated file by macOS, or other
                print('**Note: ignored FileNotFoundError for AppleDouble', fullpath)
                return

        elif RunningOnWindows:
            # NO: this seems moot - isdir() should be False if FileNotFound [3.3];
            # if this once had a purpose in a prior coding, it's been lost to time;
            # it may have been for files auto-deleted with folders - that's moot too;
            pass
            """
            if os.path.isdir(fullpath):
                # assume removed by Windows, or other
                print('**Note: ignored FileNotFoundError for Windows dir', fullpath)
                return   # or just ignore all not-found excs on Windows?
            """

    raise  # all other cases, or wait loop end: reraise exception to kill rmtree caller

r'''DEFUNCT (r""" jun25, else py3.14? makes ? escapes errors)

==============================================================================================
THE FOLLOWING FUNCTIONS ARE NO LONGER USED (but retained as examples and lessons)

Instead of the following, restructured mergeall's recursive comparison algorithm to
skip '__bkp__' items in the top-level os.listdir result only.

Testing in Python 3.X showed os.rename to be unreliable.  On Windows, it fails when the
directories are on different devices (e.g., a USB stick and C:, possibly due to differing
file systems).  It also generated unexplainable permission errors on one Windows test
machine, even when the source and destination were on the same file system.  As recursive
copies and deletes are slow, recoded comparisons to skip the folders in pure Python code
instead.  The shutil.move call tries os.rename and falls back on copy+delete too, but it's
prone to the same issues seen here.

import tempfile, stat

def excludebkpdirs(toroot, fromroot):
    """
    ---------------------------------------------------------------------------------------
    Remove both archives' __bkp__ dirs from consideration, before diffs detection begins.
    To avoid complicating and slowing (or rewriting) change detection, simply move (rename)
    these out to a temp dir, and restore them after change detection finishes.  They will
    not register changes, and so won't be propagated to any other archives.

    This and restorebkpdirs are coded fairly defensively, as this requires system calls;
    os.rename has been seen to fail for a true temp dir on Windows due to permissions
    (for no readily apparent reason...), so resort to program's cwd as a fallback option.

    On Windows, the destination of os.rename cannot exist, even for dirs; use new subdirs.
    mkdtemp adds random 6-character sequences to dir name till unique; add pid to be sure.
    Caveat: this may run up against directory path-length limits on some platforms?
    ---------------------------------------------------------------------------------------
    """
    global tempdir, temptobkp, tempfrombkp
    tempdir = temptobkp = tempfrombkp = None
    exists, join = os.path.exists, os.path.join
    try:
        tobkp   = join(toroot, '__bkp__')
        frombkp = join(fromroot, '__bkp__')
        if exists(tobkp) or exists(frombkp):
            try:
                tempdir = tempfile.mkdtemp(prefix='mergeall-', suffix=str(os.getpid()))
                if RunningOnWindows:
                    os.chmod(tempdir, stat.S_IWRITE)   # may require force writeable?
                open('temp.txt', 'w').write('try rename\n')
                try:
                    os.rename('temp.txt', join(tempdir, 'temp.txt'))
                except:
                    os.remove('temp.txt')
                    raise  # reraise
                else:
                    os.remove(join(tempdir, 'temp.txt'))
            except:
                print('using cwd as temp dir fallback')         # show sys.exc_info()[0,1]?
                print(sys.exc_info()[0], sys.exc_info()[1])
                tempdir = os.getcwd()                           # temp unusable; or os.curdir

            if exists(tobkp):
                print('excluding', tobkp)
                temptobkp = join(tempdir, 'to.__bkp__')
                os.rename(tobkp, temptobkp)                     # quick move, not copy

            if exists(frombkp):
                print('excluding', frombkp)
                tempfrombkp = join(tempdir, 'from.__bkp__')
                os.rename(frombkp, tempfrombkp)                 # either can exist or not
    except:
        print('Cannot move __bkp__ to temp: rerun after manually moving out of archive')
        assert False, 'mergeall changes cancelled'

def restorebkpdirs(toroot, fromroot):
    """
    ---------------------------------------------------------------------------------------
    Restore __bkp__ folders from temp dir, after diffs detection, and before changes begin.
    See excludebkpdirs above for more details.
    ---------------------------------------------------------------------------------------
    """
    global tempdir, temptobkp, tempfrombkp
    join = os.path.join
    try:
        if temptobkp:
            tobkp = join(toroot, '__bkp__')
            print('restoring', tobkp)
            os.rename(temptobkp, tobkp)

        if tempfrombkp:
            frombkp = join(fromroot, '__bkp__')
            print('restoring', frombkp)
            os.rename(tempfrombkp, frombkp)
    except:
        print('Cannot restore __bkp__ from temp, changes cancelled: restore from %s' % tempdir)
        assert False, 'mergeall changes cancelled'
    else:
        if tempdir != None and tempdir != os.getcwd():    # not if fallback to cwd!
            os.rmdir(tempdir)

def isbkpdir(path, archroot):
    """
    ---------------------------------------------------------------------------------------
    Original idea: Call this from mergeall to skip a TO or FROM __bkp__ path during diffs
    detection phase.  Because these are skipped, they won't trigger any changes in the
    changes phase.  This is required to avoid including __bkp__ in synchronization (see
    top docstring).  Later replaced with os.rename moves, which was later replaced with
    comparison recoding.
    ---------------------------------------------------------------------------------------
    """
    bkproot = os.path.join(archroot, '__bkp__')
    return os.path.normpath(path).startswith(os.path.normpath(bkproot))   # equate / and \; case?

    # that is...
    #return path[(len(archroot) + len(os.sep)):] == '__bkp__'

DEFUNCT'''