Issue 1097597: SimpleHTTPServer sends wrong Content-Length header

On Microsoft Windows, text files use \r\n for newline. The SimpleHTTPServer class's "send_head()" method opens files with "r" or "rb" mode depending on the MIME type. Files opened in "r" mode will have \r\n -> \n translation performed automatically, so the stream of bytes sent to the client will be smaller than the size of the file on disk.

Unfortunately, the send_head() method sets the Content-Length header using the file size on disk, without compensating for the \r\n -> \n translation.
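The mismatch is easy to reproduce. The following is a minimal sketch, assuming Python 3, where text mode translates \r\n to \n on every platform rather than only on Windows; the function name and the temporary file are illustrative and not part of SimpleHTTPServer.

```python
import os
import tempfile


def size_on_disk_vs_text_read(raw_bytes):
    """Return (size on disk, length of the text-mode read) for raw_bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        disk_size = os.path.getsize(path)
        # Text mode with the default newline=None translates \r\n to \n,
        # so the string read back is shorter than the file on disk.
        with open(path, "r", encoding="ascii") as f:
            text_length = len(f.read())
        return disk_size, text_length
    finally:
        os.remove(path)


disk_size, text_length = size_on_disk_vs_text_read(b"line one\r\nline two\r\n")
print(disk_size, text_length)
```

Each \r\n pair in the file accounts for one byte of difference between the two numbers, which is exactly the amount by which the Content-Length header overstates the body.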

I remedied this on my copy thusly:

  if mode == "r":
    content = f.read()
    contentLength = str(len(content))
    f.seek(0)
  else:
    contentLength = str(os.fstat(f.fileno())[6])

  self.send_header("Content-Length", contentLength)

This may not be as inefficient as it seems: the entire file was going to be read in anyway for the newline translation.

Hmmm. The code could be slightly simpler:

  if mode == "r":
    contentLength = len(f.read())
    f.seek(0)
  else:
    contentLength = os.fstat(f.fileno())[6]

  self.send_header("Content-Length", str(contentLength))

The documentation for SimpleHTTPServer in Python 2.3.4 for Windows says:

A 'Content-type:' with the guessed content type is output, and then a blank line, signifying end of headers, and then the contents of the file. The file is always opened in binary mode.

Actually, after Content-type, the Content-Length header is sent.

It would probably be nice if "Content-Length" were "Content-length" or if "Content-type" were "Content-Type", for consistency. The latter is probably best, per RFC 2616.

By the way, clients weren't caching the files I sent. I added another line after the Content-Length handling:

  self.send_header("Expires", "Fri, 31 Dec 2100 12:00:00 GMT")

This is egregiously wrong in the general case and just fine in my case.
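For the general case, an expiry computed relative to the current time is one possible alternative to a hard-coded date. This is only a sketch, assuming Python 3; expires_header_value and max_age_seconds are hypothetical names, and the right lifetime depends entirely on the files being served.

```python
import time
from email.utils import formatdate


def expires_header_value(max_age_seconds, now=None):
    """Format an HTTP date max_age_seconds after now (a Unix timestamp)."""
    if now is None:
        now = time.time()
    # usegmt=True produces the "GMT" suffix that HTTP dates use.
    return formatdate(now + max_age_seconds, usegmt=True)


# Example: a value for an Expires header one hour from now.
print(expires_header_value(3600))
```

The result could then be passed to self.send_header("Expires", ...) in place of the fixed string.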

Logged In: YES user_id=341410

Would it be wrong to open all files with a mode of 'rb', regardless of file type?

While I don't know MIME embeddings all that well, I do have experience with email, and most codecs that use MIME embeddings (like base 64, 85, 95, etc.) are \r, \n, and \r\n agnostic.
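To illustrate the suggestion, here is a minimal sketch, assuming Python 3, of opening a file in binary mode regardless of type. The helper name read_binary_with_length is hypothetical; it only shows that the size reported by fstat then matches the bytes that would be sent.

```python
import os
import tempfile


def read_binary_with_length(path):
    """Open path in binary mode; return (Content-Length value, body bytes)."""
    with open(path, "rb") as f:
        # st_size is the same field the original snippets read as index 6.
        content_length = os.fstat(f.fileno()).st_size
        body = f.read()
    return content_length, body


# Demonstration with a temporary file that contains \r\n line endings.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\r\nb\r\n")
demo_length, demo_body = read_binary_with_length(demo_path)
os.remove(demo_path)
print(demo_length, len(demo_body))
```

Because no newline translation happens in binary mode, the header and the body agree without reading the file twice.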