Issue 1243730: Big speedup in email message parsing

Created on 2005-07-23 22:07 by lpd, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description
t.dif	lpd,2005-07-23 22:07	Patches for email/FeedParser.py
1243730.diff	barry,2006-05-28 01:12	review

Messages (5)
msg48615 - (view)	Author: L. Peter Deutsch (lpd)	Date: 2005-07-23 22:07
Python 2.4.1, Red Hat Linux 7.3. Speeds up message parsing on files with large attachments by approximately 4x, mostly by replacing REs by direct string processing.
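To illustrate the general idea of replacing REs with direct string processing, here is a minimal sketch for one task a feed parser performs: splitting a chunk into lines while keeping the line endings. It is not taken from t.dif; the pattern and both function names are assumptions chosen for illustration, and the code targets current Python 3 rather than 2.4.

```python
import re

# Hypothetical newline-splitting pattern, similar in spirit to what a
# feed parser might use; not copied from the patch.
NEWLINE_RE = re.compile(r'(\r\n|\r|\n)')


def split_with_regex(data):
    """Split data into lines, keeping line endings, via re.split()."""
    parts = NEWLINE_RE.split(data)
    lines = []
    # parts alternates: text, separator, text, separator, ..., trailing text
    for i in range(0, len(parts) - 1, 2):
        lines.append(parts[i] + parts[i + 1])
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def split_with_str(data):
    """Same result for \\r, \\n and \\r\\n input, using a built-in str method.

    Note: str.splitlines() also breaks on a few other characters
    (for example \\x0b, \\x0c, \\x85), so the two functions are only
    equivalent when the input does not contain those.
    """
    return data.splitlines(True)


sample = "Subject: test\r\n\r\nline one\nline two\rtail"
print(split_with_regex(sample) == split_with_str(sample))
```

Whether the string-method version is actually faster depends on the interpreter version and the input, so it should be timed rather than assumed.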
msg48616 - (view)	Author: Steve Holden (holdenweb) *	Date: 2006-05-25 22:55
Logged In: YES user_id=88157 A first examination reveals no particular speedup on an email with approximately 30 MB of attachments. Can the OP perhaps provide some code and test data I could time to verify the assertions of speedup? Otherwise I can't see much point in applying the patch.
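A timing harness of the kind requested here could look like the following sketch. It builds a synthetic multipart message with one random binary attachment and reports the best of a few parse times. The helper names `build_message` and `time_parse`, the attachment size, and the repeat count are all assumptions; the code uses the Python 3 email package, not the 2.4 one the patch was written against.

```python
import email
import os
import time
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(attachment_bytes):
    """Return the text of a multipart message with one binary attachment.

    MIMEApplication base64-encodes its payload by default, so the
    resulting text is roughly 4/3 the size of the attachment.
    """
    msg = MIMEMultipart()
    msg['Subject'] = 'parser timing test'
    msg.attach(MIMEText('body text'))
    msg.attach(MIMEApplication(os.urandom(attachment_bytes)))
    return msg.as_string()


def time_parse(text, repeats=3):
    """Parse text several times and return the best wall-clock time in seconds."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        email.message_from_string(text)
        best = min(best, time.perf_counter() - start)
    return best


# Small size so the sketch runs quickly; raise it to test large attachments.
text = build_message(256 * 1024)
print('best parse time: %.4f s' % time_parse(text))
```

Running the same script against the patched and unpatched parser, with the same attachment size, would give a like-for-like comparison.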
msg48617 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2006-05-28 01:12
Logged In: YES user_id=12800 Here's a slightly better version, cleaned up for style and applicable to Python 2.5 (which is the only place I'd feel comfortable applying it). I've verified that this provides about a 3x speed up at least for some messages with really big attachments.
msg124717 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-12-27 17:17
Since this is a performance hack and is considerably invasive of the feedparser code (and needs updating), I'm deferring it to 3.3.
msg184191 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-03-14 20:46
Test fails with stack overflow:

======================================================================
ERROR: test_pushCR_LF (email.test.test_email.TestIterators)
FeedParser BufferedSubFile.push() assumed it received complete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython2.7/Lib/email/test/test_email.py", line 2585, in test_pushCR_LF
    bsf.push(il)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 140, in push
    parts = _splitlines(data)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
...
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
RuntimeError: maximum recursion depth exceeded
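The traceback shows the patched `_splitlines` recursing into itself until the interpreter's recursion limit is hit. One way to avoid that class of failure is to split without recursion and hold back any trailing piece that might be an incomplete line, including a bare `\r` that could be the first half of `\r\n`. The sketch below shows that approach; `LineBuffer` is a hypothetical class written for illustration and is not the code from either patch or from `BufferedSubFile`.

```python
class LineBuffer:
    """Accumulate pushed chunks and expose only complete lines."""

    def __init__(self):
        self._partial = ''   # trailing text not yet known to be a full line
        self.lines = []      # complete lines, line endings preserved

    def push(self, data):
        """Add a chunk; no recursion, one split per call."""
        data = self._partial + data
        self._partial = ''
        parts = data.splitlines(True)
        # Hold back the last piece unless it ends in '\n'. This covers both
        # an unterminated line and a bare '\r' that may be half of '\r\n'.
        if parts and not parts[-1].endswith('\n'):
            self._partial = parts.pop()
        self.lines.extend(parts)

    def close(self):
        """Flush whatever is left as a final (possibly unterminated) line."""
        if self._partial:
            self.lines.append(self._partial)
            self._partial = ''


buf = LineBuffer()
for chunk in ('a\r', '\nb\r\n', 'c'):
    buf.push(chunk)
buf.close()
print(buf.lines)
```

A line that arrives in very many small chunks is re-concatenated on each push, so this sketch favors simplicity over speed.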

History
Date	User	Action	Args
2022-04-11 14:56:12	admin	set	status: pending -> open; github: 42213
2017-10-28 12:23:16	serhiy.storchaka	set	status: open -> pending
2014-01-23 21:32:49	serhiy.storchaka	set	versions: + Python 3.5, - Python 3.4
2013-04-13 17:08:13	serhiy.storchaka	set	stage: patch review -> needs patch
2013-03-14 20:46:00	serhiy.storchaka	set	nosy: + serhiy.storchaka; messages: +
2013-03-14 08:00:07	ezio.melotti	set	versions: + Python 3.4, - Python 3.3
2012-05-16 01:23:15	r.david.murray	set	keywords: - easy; assignee: r.david.murray ->; components: + email
2010-12-27 17:17:26	r.david.murray	set	nosy: barry, holdenweb, lpd, ajaksu2, r.david.murray; versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2; messages: +; stage: test needed -> patch review
2010-07-20 03:18:04	BreamoreBoy	set	versions: + Python 3.2
2010-05-05 13:45:09	barry	set	assignee: barry -> r.david.murray; nosy: + r.david.murray
2009-04-22 14:36:47	ajaksu2	set	keywords: + easy
2009-03-20 22:12:45	ajaksu2	set	versions: + Python 3.1, Python 2.7, - Python 2.5; nosy: + ajaksu2; components: + Library (Lib), - Interpreter Core; type: performance; stage: test needed
2005-07-23 22:07:37	lpd	create