[Python-Dev] standard library mimetypes module pathologically broken?

Jacob Rus jacobolus at gmail.com
Sat Aug 1 01:07:34 CEST 2009


Brett Cannon wrote:

 * It creates a _default_mime_types() function which declares a bunch of global variables, and then immediately calls _default_mime_types() below the definition. There is literally no difference in result between this and just putting those variables at the top level of the file, so I have no idea why this function exists, except to make the code more confusing.

It could potentially be used for testing, but that's a guess.

Here's an abridged version of this function. I can’t see any reason for it.

    def _default_mime_types():
        global suffix_map
        global encodings_map
        global types_map
        global common_types

        suffix_map = {
            '.tgz': '.tar.gz', #...
        }
        encodings_map = {
            '.gz': 'gzip', #...
        }
        types_map = {
            '.a'      : 'application/octet-stream', #...
        }
        common_types = {
            '.jpg' : 'image/jpg', #...
        }

    _default_mime_types()

As R. David pointed out, it is being used by regrtest to clean up after running the test suite.
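To make the regrtest point concrete, here is a minimal sketch of why wrapping the assignments in a function differs from plain top-level assignments: calling the function again rebinds the globals to fresh default dictionaries. The function name `_load_defaults` and the `.xyz` entry are hypothetical stand-ins, not the module's actual code.

```python
# Hypothetical miniature of the pattern; not the real mimetypes source.
def _load_defaults():
    """Rebind the module-level map to a fresh dictionary of defaults."""
    global types_map
    types_map = {
        '.a': 'application/octet-stream',
    }

_load_defaults()                         # initial setup at import time

# Some caller (or a test) mutates the module-level state.
types_map['.xyz'] = 'chemical/x-xyz'
polluted = '.xyz' in types_map

# Calling the function again discards the mutation, which is the kind of
# cleanup a test runner needs between tests.
_load_defaults()
restored = '.xyz' not in types_map
```

Note that the second call creates a new dictionary, so any code still holding a reference to the old one keeps seeing the mutated copy.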

Yeah, basically the issue is that the default mime types should be separate objects from the final set built after Apache's files have been parsed and custom additions have been made. If the ones at the top level are renamed and never modified after creation, if new objects holding all the updated data are put at the existing names, and if the test code is changed to reset the objects at those names from the default objects, I think that may fix things. I'll try to write some potential patches in the next day or two and submit them here for advice.
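A rough sketch of the arrangement described above, under stated assumptions: the names `DEFAULT_TYPES_MAP`, `add_type_sketch`, and `reset_to_defaults` are hypothetical and are not part of the mimetypes module, and the read-only wrapper is only one possible way to keep the defaults unmodified.

```python
from types import MappingProxyType

# Defaults live in their own object and are never modified after creation.
# (Hypothetical name; a read-only view guards against accidental writes.)
DEFAULT_TYPES_MAP = MappingProxyType({
    '.a': 'application/octet-stream',
})

# The public, mutable map starts as a copy of the defaults.
types_map = dict(DEFAULT_TYPES_MAP)

def add_type_sketch(ext, mime_type):
    """Record a custom addition in the working map only."""
    types_map[ext] = mime_type

def reset_to_defaults():
    """Restore the working map from the defaults, e.g. after a test run."""
    # Mutate in place so existing references to types_map stay valid.
    types_map.clear()
    types_map.update(DEFAULT_TYPES_MAP)

add_type_sketch('.xyz', 'chemical/x-xyz')
had_custom = '.xyz' in types_map
reset_to_defaults()
```

Resetting in place rather than rebinding the name is a design choice of this sketch: code that did `from module import types_map` earlier would still see the restored contents.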

The problem is that the semantics as documented are really ambiguous, and what I would consider the reasonable interpretation is different from what the code actually does. So anyone using this code naively is going to run into trouble, and anyone relying on how the code actually works is going behind the back of the docs, but they sort of have to in order to use much of the functionality of the module.

I agree this puts us in a tricky spot. Well, perhaps the docs can be updated to match the code where cleanup would change the semantics.

I think that would make the docs extremely confusing, and I’m not even sure it would be possible. The current semantics are vaguely okay if an API consumer sticks to straightforward use cases, meaning those which don’t break when the current docs are followed (anything complicated is going to break unless the code is read a few times). Assuming only such uses, it would be possible to swap out most of the implementation for something relatively straightforward. But if any of the edges are pushed, the semantics quickly turn insane, to the point that I’m not sure they’re documentable. Anyone expecting the code to work that way is going to have a buggy program anyway, so I’m not sure it makes sense to bend over backwards to leave that particular set of bugs unchanged.

Cheers, Jacob Rus


