msg91196
Author: Jacob Rus (jrus)
Date: 2009-08-02 19:19
See the discussion started right at the end of July at http://mail.python.org/pipermail/python-dev/2009-July/090928.html and continued in August at http://mail.python.org/pipermail/python-dev/2009-August/thread.html

Basically, the mimetypes module is fragile and very confusing code, built up over years of feature creep without refactoring or careful overall design. I'd like to cut it down to a more manageable size, fix some bugs, update the included list of MIME types, and use some of the nice features available in Python 2.2 and later. Ideally, someone reading through the module once would be able to understand what it does.

Patches to be attached shortly.
|
|
msg91200
Author: Jacob Rus (jrus)
Date: 2009-08-02 20:08
This diff should leave the semantics of the module essentially unchanged (including the lazy loading of the default files), and should also leave the particular MIME types unchanged, even though they are out of date and should be updated; a subsequent version will address that, perhaps after some discussion.
|
|
msg91203
Author: Jacob Rus (jrus)
Date: 2009-08-02 20:23
Here is a version of the patch that does away with the lazy loading: the defaults come from a small handful of easy-to-parse files (~40 KB), so if the import takes an extra eye-blink, it shouldn't be too big a deal.
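Eager loading simply means parsing the files when the module is imported. A sketch of a parser for the Apache mime.types format (a MIME type followed by zero or more extensions per line, with a word starting with `#` beginning a comment), applied at import time to an illustrative excerpt; `parse_mime_types` and `DEFAULT_ROWS` are hypothetical names:

```python
import io
import itertools

# Illustrative excerpt in the Apache mime.types format (not the full file).
_SAMPLE = """\
# MIME type            Extensions
text/html              html htm
text/plain             txt text
image/jpeg             jpeg jpg jpe
application/x-unused   # no extensions on this line, only a comment
"""


def parse_mime_types(lines):
    """Yield (mime_type, extension) pairs from mime.types-style lines."""
    for line in lines:
        # Keep only the words before the first one that starts a comment.
        words = list(itertools.takewhile(lambda w: not w.startswith("#"), line.split()))
        if not words:
            continue
        mime_type, *extensions = words
        for ext in extensions:
            yield mime_type, "." + ext


# Eager loading: the table is built once, at import time.
DEFAULT_ROWS = list(parse_mime_types(io.StringIO(_SAMPLE)))
```

The standard library's `MimeTypes.readfp()` handles the same format in a similar way, truncating each line at the first word that starts with `#`.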
|
|
msg91204
Author: Jacob Rus (jrus)
Date: 2009-08-02 20:26
A fixed version of the patch from 2009-08-02 20:08.
|
|
msg91205
Author: Jacob Rus (jrus)
Date: 2009-08-02 20:43
This version (#4) switches to expressing the default types as a list of tuples instead of a dict, so that we can include duplicate rows and have "reverse" (type -> extension) lookups behave properly once we start changing the actual content of the defaults. The types_map and common_types dictionaries (aliases to the types_map property of the singleton MimeTypes object) have been left behaving as before, for backwards compatibility. The tests still pass.
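Why a list of tuples helps: a dict literal cannot hold two entries for `.jpg`, but a list can. A sketch, assuming later rows win for extension -> type lookups while every row contributes to the type -> extension map; `DEFAULT_TYPES` and `build_maps` are hypothetical names, not the patch's:

```python
# (mime_type, extension) rows; duplicates for one extension are allowed.
DEFAULT_TYPES = [
    ("image/jpg", ".jpg"),    # non-registered alternate, listed first
    ("image/jpeg", ".jpg"),   # canonical type, listed later so it wins for ".jpg"
    ("image/jpeg", ".jpeg"),
    ("text/plain", ".txt"),
]


def build_maps(rows):
    """Return (extension -> type, type -> [extensions]) built from the rows."""
    types_map = {}       # later rows overwrite earlier ones
    extensions_map = {}  # every row contributes, in row order
    for mime_type, ext in rows:
        types_map[ext] = mime_type
        exts = extensions_map.setdefault(mime_type, [])
        if ext not in exts:
            exts.append(ext)
    return types_map, extensions_map


types_map, extensions_map = build_maps(DEFAULT_TYPES)
```

Here `types_map[".jpg"]` is `"image/jpeg"`, yet a reverse lookup for the alternate `"image/jpg"` still finds `.jpg`.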
|
|
msg91208
Author: Jacob Rus (jrus)
Date: 2009-08-02 21:52
Here is a list I generated of all the current Apache mime.types:

I would just as soon include this in the Python standard library, either the Apache file as is or these Python object literals (perhaps in a file outside of mimetypes.py), and then *not* import from Apache files by default, to cut down on external dependencies. Several alternate MIME types for various file types should also be added to this list, in earlier positions, so that the later canonical entries take precedence for extension -> type lookups and the alternates are only used in the type -> extension map.

The only issue is that some users may have added to their Apache mime.types files to get Mailman or other Python programs to do what they want, so I'm not entirely sure to what extent we should stay 100% backwards compatible in such edge cases.

My personal opinion is that the 'strict' option is unnecessary and should be made to do nothing: users are more likely to want the predictable behavior, where an unorthodox type gives back the proper extension, than the behavior where their code fails unless they pass a flag. I don't see any reason for a user to want a 'type doesn't exist' message back for non-registered types. This isn't a "test for IANA registration" module.
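For context on the 'strict' option: in the standard library's mimetypes module, `guess_type()` and `guess_extension()` only consult the non-standard ("common") types when called with `strict=False`. A small demonstration using a fresh `MimeTypes` instance and an invented type registered as non-strict (`application/x-hypothetical` and `.hypo` are made up for illustration):

```python
import mimetypes

db = mimetypes.MimeTypes()
# Register an invented type in the non-strict table only.
db.add_type("application/x-hypothetical", ".hypo", strict=False)

strict_guess = db.guess_type("report.hypo")                  # strict=True is the default
lenient_guess = db.guess_type("report.hypo", strict=False)
strict_ext = db.guess_extension("application/x-hypothetical")
lenient_ext = db.guess_extension("application/x-hypothetical", strict=False)
```

With the default `strict=True`, both lookups come back empty (`(None, None)` and `None`); with `strict=False` they return the registered type and `.hypo`. Under the proposal above, presumably both calls would behave like the lenient ones.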
|
|
msg91489
Author: Jacob Rus (jrus)
Date: 2009-08-11 22:58
Plone uses this, which has *much* more complexity than the standard library needs, but it might be nice to pick up its code for pulling types out of the Windows registry, for instance: http://svn.plone.org/svn/archetypes/Products.MimetypesRegistry/trunk/Products/MimetypesRegistry/MimeTypesRegistry.py
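The general technique for reading MIME types from the Windows registry is to enumerate the extension subkeys under HKEY_CLASSES_ROOT and read their "Content Type" values. A sketch using the standard `winreg` module; this is not the Plone code, `read_registry_types` is a hypothetical name, and it returns an empty dict on other platforms. Later versions of the standard library's mimetypes module also gained a `MimeTypes.read_windows_registry()` method for this purpose.

```python
import sys


def read_registry_types():
    """Return {extension: mime_type} from HKEY_CLASSES_ROOT; empty off Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # only available on Windows

    result = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "") as root:
        index = 0
        while True:
            try:
                subkey_name = winreg.EnumKey(root, index)
            except OSError:  # raised when there are no more subkeys
                break
            index += 1
            if not subkey_name.startswith("."):
                continue  # only extension keys such as ".txt" are of interest
            try:
                with winreg.OpenKey(root, subkey_name) as subkey:
                    value, value_type = winreg.QueryValueEx(subkey, "Content Type")
            except OSError:  # no "Content Type" value for this extension
                continue
            if value_type == winreg.REG_SZ and value:
                result[subkey_name.lower()] = value
    return result
```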
|
|
msg91583
Author: Jacob Rus (jrus)
Date: 2009-08-15 02:20
Okay, here's a version of this patch that (a) adds deprecation warnings and (b) doesn't bother with lazy initialization. It should still be almost completely backwards compatible with the previous mimetypes module.
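A common way to add deprecation warnings while keeping old names working is to wrap the old entry point so that it warns and then delegates. A sketch with hypothetical names (`old_lookup`, `new_lookup`, `deprecated`); the patch's actual deprecations may differ:

```python
import functools
import warnings


def new_lookup(filename):
    """Hypothetical replacement API."""
    return "text/plain" if filename.endswith(".txt") else None


def deprecated(replacement):
    """Decorator: warn that the wrapped function is deprecated in favour of replacement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated; use {replacement.__name__}() instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(new_lookup)
def old_lookup(filename):
    """Hypothetical old API kept for backwards compatibility."""
    return new_lookup(filename)
```

Existing callers keep getting the same results; they only see a `DeprecationWarning` if warnings of that category are enabled.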
|
|
msg91585
Author: Jacob Rus (jrus)
Date: 2009-08-15 02:30
And at Rietveld, patch version 5: http://codereview.appspot.com/107042
|
|
msg91884
Author: R. David Murray (r.david.murray)
Date: 2009-08-23 02:17
See also issue 6763.
|
|
msg93829
Author: Alyssa Coghlan (ncoghlan)
Date: 2009-10-10 14:38
Putting this here for the record rather than leaving it in Rietveld:

I appreciate the desire for a cleaner API for handling MIME types, but this isn't the way to get it. Finding projects that have their own mimetypes implementations, asking them why they created their own rather than using the standard one, seeing what features those APIs have in common, etc., are all things that need to be done before making major changes to a standard library API.

What you see as a critical bug (custom MimeTypes instances inheriting their initial settings from the mimetypes._db instance), you can bet some developers are relying on as a feature. If code is in the standard library, someone, somewhere, is relying on it working exactly the way it does now. Even bug fixes can sometimes break code that was designed to work around the bug.

The concept of having a master copy that new instances are cloned from isn't even particularly objectionable, so long as people clearly understand that is what is going on (e.g. this happens with decimal.DefaultContext being used as the basis for new decimal.Context instances).

With code this old, 'softly, softly' is the way to go, and the fewer user-visible changes in semantics the better.
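The decimal precedent can be made concrete: fields not passed to the `decimal.Context` constructor are copied from `decimal.DefaultContext`, so changing that master copy affects contexts created afterwards but not earlier ones. A short demonstration (the template is restored at the end, since it is process-wide state):

```python
import decimal

original_prec = decimal.DefaultContext.prec
before = decimal.Context()                 # created while the template has its original value
try:
    decimal.DefaultContext.prec = 12       # modify the master copy
    after = decimal.Context()              # unspecified fields are copied from DefaultContext
    explicit = decimal.Context(prec=5)     # explicitly passed fields take precedence
finally:
    decimal.DefaultContext.prec = original_prec  # undo the global change
```

`after.prec` is 12 while `before.prec` keeps the original value: exactly the "cloned from a master copy" behaviour that is fine as long as it is documented and understood.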
|
|
msg128251
Author: Éric Araujo (eric.araujo)
Date: 2011-02-09 23:18
Thanks for working on cleaning up that module. I have to agree with Nick, though (see also my minor comments on Rietveld): code in the stdlib just can’t move as freely as code outside of it. I’m updating the version to 3.3, given that this patch adds new features and refactors things (stable branches only get bug fixes).
|
|
msg209140
Author: Alyssa Coghlan (ncoghlan)
Date: 2014-01-25 03:41
Note that I still believe substantial improvements could be made without a wholesale rewrite of the module that poses significant backwards compatibility risks. Just improving the documentation of how the list of types is populated could likely help some users, as would updating the default list we use if we can't retrieve one from the environment.

Alternatively, even if we can't get anyone interested in such a refactoring task, it may be feasible to introduce an improved MIME type handling interface that is easier to maintain and keep up to date, again without risking backwards compatibility issues for users of the current module.

Some potentially relevant links for anyone wanting to investigate improving the standard library's MIME type support:

The discussions with Jacob in Rietveld regarding his original approach: https://codereview.appspot.com/107042

PyPI libraries:
https://pypi.python.org/pypi/mimeparse/
https://pypi.python.org/pypi/mime
https://pypi.python.org/pypi/zope.mimetype
https://pypi.python.org/pypi/Products.MimetypesRegistry (Jacob pointed this one out above)

The various PyPI wrappers around libmagic and the *nix "file" utility are also of potential interest for research purposes (but aren't especially useful on Windows, where those tools are significantly less likely to be available).
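On the documentation point: in current Python 3 versions, the module-level tables are populated by `mimetypes.init()`, which starts from the built-in defaults, reads the Windows registry where available, and then reads whichever of the paths in `mimetypes.knownfiles` exist on the system. A quick way to see which of those files contribute on a given machine (`present_sources` is just an illustrative name):

```python
import mimetypes
import os


def present_sources():
    """Return the entries of mimetypes.knownfiles that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


mimetypes.init()  # (re)builds the module-level tables from the defaults and any files found
sources = present_sources()
```

An empty result means only the built-in defaults (plus the registry on Windows) are in use, which is the case where an up-to-date default list matters most.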
|
|