Issue 1302: Fixes for profile/cprofile (original) (raw)

Created on 2007-10-19 22:18 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
py3k_profile_fix.patches	christian.heimes,2007-10-19 22:31
py3k_profile_fix2.patches	christian.heimes,2007-10-21 04:31
py3k_profile_fix-3.patch	alexandre.vassalotti,2007-10-22 01:10
py3k_profile_fix3.patch	christian.heimes,2007-10-25 10:47

Messages (13)
msg56569 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-19 22:18
The patch fixes the output for profile and cProfile. Another patch from Alexandre and me added additional calls to the UTF-8 codec.
msg56570 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-19 22:29
There is no patch. =)
msg56571 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-19 22:31
Sure there is a patch ... well it's ... uhm ... it's hidden under your bed. O:-)
msg56621 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-21 04:31
The new patch does a far more better job. I had the idea after a discussion with Alexandre on #python and a small debugging session. The tests for profile, cProfile and doctest are failing on my Linux box because Py_FileSystemDefaultEncoding is "UTF-8" but PyUnicode_Decode() has a shortcut for "utf-8" only. I've added a tolower loop and shortcuts for UTF-16 and UTF-32 to PyUnicode_Decode(). I've also disabled the tests on system with a fs encoding that isn't built in.
msg56635 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-22 01:10
I don't think it's possible to add shortcuts in PyUnicode_Decode for UTF-16 and UTF-32 because the byte-order can be different depending of the platform. So, these two need to pass through the codecs module. I am sure if it's better, but I factored out the normalization routine into its own function.
msg56637 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-22 01:17
Alexandre Vassalotti wrote: > I don't think it's possible to add shortcuts in PyUnicode_Decode for > UTF-16 and UTF-32 because the byte-order can be different depending of > the platform. So, these two need to pass through the codecs module. utf-16 and utf-32 are the the names for the native codecs. The explicit names are e.g. utf-16-be or utf-32-le. The last argument 0 also means "native byte order". I used a shorter algorithm to optimize the normalization for the special cases of the strcmp() shortcuts. Your version is fine but takes several CPU cycles longer. I don't think it has a large performance impact. ;) Christian
msg56718 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-24 19:07
I like Chris's patch better, but he should use (sizeof(lower)-1) instead of 8, and he should use the TOLOWER macro from bytes_methods.h. Please submit a new one and I'll check it in!
msg56739 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-25 10:47
Why should I use (sizeof(lower)-1)? Do you mean PyMem_Malloc(strlen(encoding) + 1)? Changes since last patch: * added #include "bytes_methods.h" in unicodeobject.c * fix all is* to use the macros from bytes_methods.h * use malloc/free the lower version instead of using a fixed buffer * replace '_' with '-' in encoding I still think that a fixed buffer of 12 chars (strlen(iso-8859-1) + 1 + \0) is sufficient and faster. * also check for iso-8859-1
msg56764 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-25 23:48
Committed revision 58659. I'm sorry I confused you; I was fine with the version that has char lower[N] but I wanted you to not repeat N-1 later in the code. See what I checked in. :-)
msg56765 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-26 00:00
Guido van Rossum wrote: > Guido van Rossum added the comment: > > Committed revision 58659. > > I'm sorry I confused you; I was fine with the version that has char > lower[N] but I wanted you to not repeat N-1 later in the code. See what > I checked in. :-) I still don't understand why you are using (sizeof lower) - 2 and what &lower[(sizeof lower) - 2] returns. Is it the memory address of lower[17]?
msg56766 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-26 00:19
> I still don't understand why you are using (sizeof lower) - 2 It is simply to avoid duplicating the constant (a.k.a. the Don't Repeat Yourself (DRY) rule). > and what &lower[(sizeof lower) - 2] returns. Is it the memory address > of lower[17]? It the address of lower[18] to be exact. (l < &lower[(sizeof lower) - 2]) is simply tricky notation to check the bound of the array. Personally, I used like to subtract pointer, ((lower - l + 1) < (sizeof lower)) to get the bound. But now, I find Guido's trick more cute (and less error-prone). :)
msg56767 - (view)	Author: Alexandre Vassalotti (alexandre.vassalotti) *	Date: 2007-10-26 00:29
> Personally, I used like to subtract pointer, ((lower - l + 1) < > (sizeof lower)) to get the bound. Doh. I got it backward. It's (l - lower + 1), not (lower - l + 1). > But now, I find Guido's trick more cute (and less error-prone). At least, I got that right. :)
msg56768 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-10-26 00:30
Alexandre Vassalotti wrote: > It the address of lower[18] to be exact. (l < &lower[(sizeof lower) - > 2]) is simply tricky notation to check the bound of the array. > Personally, I used like to subtract pointer, ((lower - l + 1) < (sizeof > lower)) to get the bound. But now, I find Guido's trick more cute (and > less error-prone). :) Wow, that's a cool trick. The old wizard has still some astonishing tricks in the sleeves of his robe. :]

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45643
2008-01-06 22:29:45	admin	set	keywords: - py3kversions: Python 3.0
2007-10-26 00:30:36	christian.heimes	set	messages: +
2007-10-26 00:29:04	alexandre.vassalotti	set	messages: +
2007-10-26 00:19:29	alexandre.vassalotti	set	messages: +
2007-10-26 00:00:48	christian.heimes	set	messages: +
2007-10-25 23:48:39	gvanrossum	set	status: open -> closedresolution: acceptedmessages: +
2007-10-25 10:47:36	christian.heimes	set	files: + py3k_profile_fix3.patchmessages: +
2007-10-24 19:07:21	gvanrossum	set	priority: normalmessages: +
2007-10-22 01:17:03	christian.heimes	set	messages: +
2007-10-22 01:11:00	alexandre.vassalotti	set	files: + py3k_profile_fix-3.patchmessages: +
2007-10-21 04:32:01	christian.heimes	set	files: + py3k_profile_fix2.patchesnosy: + alexandre.vassalottitype: behaviormessages: +
2007-10-20 04:58:37	loewis	set	keywords: + patch
2007-10-19 22:31:03	christian.heimes	set	files: + py3k_profile_fix.patchesmessages: +
2007-10-19 22:29:48	brett.cannon	set	keywords: + py3kassignee: gvanrossummessages: + nosy: + brett.cannon
2007-10-19 22🔞02	christian.heimes	create