msg51299
Author: Enoch Julias (enochjul)
Date: 2006-10-31 05:05
Add a call to file.truncate() to inform Windows of the size of the target file in makefile(). This helps guide cluster allocation in NTFS to avoid fragmentation. |
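A minimal sketch of the idea (not the actual patch; the helper name and the use of shutil.copyfileobj are illustrative): the target file's final size is announced with truncate() before the member data is copied in, so the filesystem can try to reserve a contiguous run of clusters up front.

    import shutil

    def extract_with_size_hint(member_fileobj, targetpath, size):
        # Illustrative only -- not tarfile's real makefile().  The point is
        # the order of operations: set the final size first, then write.
        with open(targetpath, "wb") as target:
            target.truncate(size)   # tell the OS how big the file will become
            target.seek(0)          # write from the start, over the reserved space
            shutil.copyfileobj(member_fileobj, target)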
|
|
msg51300
Author: Lars Gustäbel (lars.gustaebel) *
Date: 2006-11-01 15:27
Is this merely an NTFS problem, or does it affect FAT filesystems as well? How do you detect file fragmentation? Doesn't this problem apply to every other module or script that writes to file objects? Shouldn't a decent filesystem be able to handle growing files correctly on its own?
|
|
msg51301
Author: Enoch Julias (enochjul)
Date: 2006-11-06 17:19
I have not tested FAT/FAT32 yet, as I no longer use those filesystems. The Disk Defragmenter tool in Windows 2000/XP reports the number of fragmented files and directories.

NTFS does handle growing files, but the operating system can only do so much without knowing the final size of the file. Extracting an archive that contains only a few files does not cause fragmentation. With many files, however, it is much more likely that the default allocation algorithm will fail to find contiguous clusters for some of them. The outcome also depends on how fragmented the free space on a particular partition is, and on whether other processes are writing to files on the same partition at the same time. Some details of the cluster allocation algorithm used in Windows can be found at http://support.microsoft.com/kb/841551.
|
|
msg51302
Author: Lars Gustäbel (lars.gustaebel) *
Date: 2006-11-06 21:57
Personally, I think disk defragmenters are evil ;-) They create the very need they are supposed to satisfy. On Linux we have no defragmenters, so we don't worry about it. Your proposal is essentially a performance hack for a particular filesystem. In principle, this problem exists for all filesystems on all platforms. Fragmentation is, IMO, the filesystem's problem, and it is not so much a state as a process: filesystems fragment over time and you can't do anything about it. For those who care, disk defragmenters were invented. It is not tarfile.py's job to deal with a fragmented filesystem; that is simply too low-level. I admit it is a small patch, but I'm -1 on applying it.
|
|
msg51303
Author: Josiah Carlson (josiahcarlson) *
Date: 2006-11-08 16:33
I disagree with user gustaebel. We should add automatic truncate() calls on every platform where it could help, in every place where it makes sense: tarfile, zipfile, wherever we can. It would make sense to write a single function that all of those modules can call, so there is only one place to update if and when changes occur. If that function were not part of the public Python API, it would not need to wait until 2.6, unless it were considered a feature addition rather than a bugfix. One would have to wait for a response from Martin or Anthony to know which it is; I could not say for sure whether generally performance-enhancing changes count as bugfixes or as feature additions.
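A sketch of what such a shared helper might look like; the name _copy_with_preallocation is made up here, and no such function exists in the stdlib:

    import shutil

    def _copy_with_preallocation(source, target, size=None):
        # Hypothetical shared helper for tarfile, zipfile, etc.  When the
        # final size is known, announce it before writing so filesystems
        # that honour the hint (e.g. NTFS) can allocate contiguous clusters.
        if size:
            target.truncate(size)
            target.seek(0)
        shutil.copyfileobj(source, target)

Each archive module would pass the member size it already knows (tarinfo.size, ZipInfo.file_size), keeping a single place to adjust if the behaviour turns out to be platform-specific.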
|
|
msg51304
Author: Lars Gustäbel (lars.gustaebel) *
Date: 2006-11-08 21:30
You both still fail to convince me, and I still don't see a need for action. The only case at the moment where this addition makes sense (in your opinion) is Windows with the NTFS filesystem, and only under certain conditions; NTFS already has a preallocation algorithm to deal with this. We don't know whether there is any advantage on FAT filesystems. On Linux, for example, there is a plethora of supported filesystems; some may take advantage of the hint, others may not. Who knows? We can't even detect which filesystem type we are currently writing to. Apart from that, the behaviour of truncate(arg) with arg greater than the file size appears to be system-dependent. So, IMO, this is a very specific optimization targeted at a single platform. The TarFile class is easily subclassable: just override the makefile() method and add the two lines of code. I think that's what ActiveState's Python Cookbook is for. BTW, I like my files to grow bit by bit: in case of an error, I can detect that a file was not extracted completely by comparing file sizes. Furthermore, a file that grows is the more common behaviour and closer to what a programmer using this module would expect.
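For readers who want the behaviour without a change to the stdlib, a subclass along these lines would do. This is only a sketch: makefile() is an internal method, so its exact signature may vary between Python versions; the (tarinfo, targetpath) form is assumed here, and reading the member via extractfile() stands in for whatever the stock implementation does.

    import shutil
    import tarfile

    class PreallocatingTarFile(tarfile.TarFile):
        def makefile(self, tarinfo, targetpath):
            # Same job as the stock makefile(), plus the size hint.
            source = self.extractfile(tarinfo)
            try:
                with open(targetpath, "wb") as target:
                    target.truncate(tarinfo.size)  # the proposed size hint
                    target.seek(0)
                    shutil.copyfileobj(source, target)
            finally:
                source.close()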
|
|
msg51305
Author: Lars Gustäbel (lars.gustaebel) *
Date: 2006-12-23 19:03
Any progress on this one? |
|
|
msg51306
Author: Lars Gustäbel (lars.gustaebel) *
Date: 2007-01-22 19:08
Closed due to lack of interest. |
|
|