[Python-Dev] python hangs when parsing a bad-formed email

Alberto Casado Martín alberto.casado.martin at gmail.com
Tue Apr 22 09:43:02 CEST 2008


Hi all,

First of all, sorry if this isn't the right list for this post, and sorry for my English.

As the subject says, I'm having problems with the attached email. When I try to build an email object by reading the attached file, the Python process hangs and uses all the CPU.

I have debugged my code to find where it happens, and I found that it is in the _parsegen method of the FeedParser class. I know that the email format is wrong, but I don't know why Python hangs.

Below I paste the code, marking where it hangs.

    def _parsegen(self):
        # Create a new message and start by parsing headers.
        self._new_message()
        headers = []
        # Collect the headers, searching for a line that doesn't match the RFC
        # 2822 header or continuation pattern (including an empty line).
        for line in self._input:
            if line is NeedMoreData:
                yield NeedMoreData
                continue
            if not headerRE.match(line):
                # If we saw the RFC defined header/body separator
                # (i.e. newline), just throw it away. Otherwise the line is
                # part of the body so push it back.
                if not NLCRE.match(line):
                    self._input.unreadline(line)
                break
            headers.append(line)
        # Done with the headers, so parse them and figure out what we're
        # supposed to see in the body of the message.
        self._parse_headers(headers)
        # Headers-only parsing is a backwards compatibility hack, which was
        # necessary in the older parser, which could throw errors. All
        # remaining lines in the input are thrown into the message body.
        if self._headersonly:
            lines = []
            while True:
                line = self._input.readline()
                if line is NeedMoreData:
                    yield NeedMoreData
                    continue
                if line == '':
                    break
                lines.append(line)
            self._cur.set_payload(EMPTYSTRING.join(lines))
            return
        if self._cur.get_content_type() == 'message/delivery-status':
            # !!!!!! AT THIS POINT IT HANGS AND STARTS TO USE ALL THE CPU FOR THE PROCESS
            # message/delivery-status contains blocks of headers separated by
            # a blank line. We'll represent each header block as a separate
            # nested message object, but the processing is a bit different
            # than standard message/* types because there is no body for the
            # nested messages. A blank line separates the subparts.
            ...

I have worked around the problem by adding this line to the _parse_headers method:

    def _parse_headers(self, lines):
        # Passed a list of lines that make up the headers for the current msg
        lastheader = ''
        lastvalue = []
        for lineno, line in enumerate(lines):
            # Check for continuation
            if line[0] in ' \t':
                if not lastheader:
                    # The first line of the headers was a continuation. This
                    # is illegal, so let's note the defect, store the illegal
                    # line, and ignore it for purposes of headers.
                    defect = errors.FirstHeaderLineIsContinuationDefect(line)
                    self._cur.defects.append(defect)
                    continue
                # !!!!!!! IF THE CONTINUATION LINE IS NOT EMPTY, ADD THE LINE TO THE HEADER.
                if line.strip()!='':
                    lastvalue.append(line)
                continue
            if lastheader:
                ...

I don't know why it hangs, and I'm not sure why it works with this line.
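
What the extra check does in isolation is at least easy to show: whitespace-only continuation lines are no longer added to the header value. A minimal standalone sketch follows. It is not the email package code, and fold_values and skip_blank are names invented for the illustration; it also leaves out the defect handling for a leading continuation line.

```python
def fold_values(lines, skip_blank=False):
    """Group each header line with its continuation lines.

    Simplified illustration only (hypothetical helper, not part of the
    email package). Every line is assumed to be non-empty.
    """
    headers = []  # each entry is a list: one header line plus its continuations
    for line in lines:
        if line[0] in ' \t' and headers:
            # Continuation line: it belongs to the previous header.
            if skip_blank and line.strip() == '':
                # The extra check: ignore whitespace-only continuations.
                continue
            headers[-1].append(line)
        else:
            # A new header line (a leading continuation also lands here).
            headers.append([line])
    return [''.join(parts) for parts in headers]


# Made-up header with a whitespace-only continuation line in the middle.
sample = ['Content-Type: text/plain;\n', '   \n', '\tcharset="us-ascii"\n']
print(repr(fold_values(sample)))
print(repr(fold_values(sample, skip_blank=True)))
```

With skip_blank=True the folded value no longer contains the whitespace-only line, which is the only difference the workaround makes to the header text.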

I have tried to parse this email with Python 2.3.3 (SunOS, gcc) and Python 2.5.1 (SunOS, gcc), as well as on Windows XP and Linux SUSE 10, and I always get the same result.

    bash-3.00$ python
    Python 2.5.1 (r251:54863, Feb 28 2008, 07:48:25)
    [GCC 3.4.6] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import email
    >>> fp = open('raro.txt')
    >>> mail = email.message_from_file(fp)
    (never returns)
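
In case it helps anyone reproduce this without locking up a terminal, here is a minimal sketch that runs the same parse in a child interpreter and gives up after a timeout. The helper name parse_in_child and the 10-second default are my own choices for illustration, and it assumes Python 2.6 or later for Popen.kill().

```python
import subprocess
import sys
import time


def parse_in_child(path, timeout=10.0):
    """Parse the file at `path` in a child process.

    Returns 'ok' if parsing finished, 'timeout' if the child had to be
    killed, and 'error' if the child exited with a non-zero status.
    Hypothetical helper for illustration only.
    """
    code = "import email, sys; email.message_from_file(open(sys.argv[1]))"
    child = subprocess.Popen([sys.executable, "-c", code, path])
    deadline = time.time() + timeout
    while child.poll() is None:
        if time.time() > deadline:
            child.kill()  # stop the runaway parse
            child.wait()  # reap the child process
            return 'timeout'
        time.sleep(0.1)
    return 'ok' if child.returncode == 0 else 'error'
```

Calling parse_in_child('raro.txt') would then report a timeout instead of hanging the calling process, if the parse never returns as described above.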

I would be grateful if someone could tell me what is happening.

Best Regards.

Alberto Casado.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: raro.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080422/46c9e000/attachment-0001.txt>


