Issue 8469: struct - please make sizes explicit

Created on 2010-04-20 12:30 by kiilerix, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
struct.diff kiilerix, 2010-06-12 23:35 ideas for further improvements
Messages (10)
msg103699 - Author: Mads Kiilerich (kiilerix) * Date: 2010-04-20 12:30
The struct module is often used (at least by me) to implement protocols and binary formats. That makes the exact sizes (number of bits/bytes) of the different types very important. Please add the sizes to, for example, the table at http://docs.python.org/library/struct . I know that some of the sizes vary with the platform, and in these cases it is fine to define them in terms of the C types, but for Python programmers writing cross-platform code such variable types don't matter and are "never" used. (I assume that it is possible to specify all possible types in a cross-platform way, but I'm not sure and the answer is not obvious from the documentation.)
msg103712 - Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-04-20 14:11
It is very easy to generate the size table programmatically:

>>> import struct
>>> for c in "xcbB?hHiIlLqQfdspP":
...     print(c, struct.calcsize(c))
...
x 1
c 1
b 1
B 1
? 1
h 2
H 2
i 4
I 4
l 8
L 8
q 8
Q 8
f 4
d 8
s 1
p 1
P 8

However, all values above except the trivial 1-byte entries are platform dependent, and the C types are already well documented.
msg103720 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-04-20 14:30
As Alexander says, *all* the sizes except those for bytes are platform-dependent: there are platforms where sizeof(short) isn't 2, for example, or where sizeof(int) isn't 4. It would be possible to add the 'standard' sizes to that table (i.e. the sizes that you get when using '<', '>', etc.); would that be helpful? If you're trying to write cross-platform code then you should probably be using standard size, alignment and byte order anyway.
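The advice to use standard size and byte order for cross-platform code can be illustrated with a short sketch. The record layout below (a 2-byte field followed by a 4-byte field) and the name HEADER_FORMAT are hypothetical, chosen only for illustration; the struct calls themselves are the documented ones.

```python
import struct

# Hypothetical record layout, for illustration only: an unsigned short
# followed by an unsigned int, in standard size with no padding.
HEADER_FORMAT = "HI"

# With an explicit byte-order prefix the sizes are fixed: H is 2 bytes, I is 4.
header_size = struct.calcsize("<" + HEADER_FORMAT)

little = struct.pack("<" + HEADER_FORMAT, 1, 2)  # little-endian
big = struct.pack(">" + HEADER_FORMAT, 1, 2)     # big-endian

print(header_size)
print(little)
print(big)

# Round trip: unpacking with the same format recovers the values.
values = struct.unpack("<" + HEADER_FORMAT, little)
print(values)
```

Without the prefix, the same format would use native sizes and alignment, so the packed length could differ between platforms.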
msg103721 - Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-04-20 14:44
On Tue, Apr 20, 2010 at 10:30 AM, Mark Dickinson <report@bugs.python.org> wrote:
..
> It would be possible to add the 'standard' sizes to that table (i.e. the sizes that you get when using '<', '>', etc.); would that be helpful?

The documentation already includes standard sizes in text:

"Standard size and alignment are as follows: no alignment is required for any type (so you have to use pad bytes); short is 2 bytes; int and long are 4 bytes; long long (__int64 on Windows) is 8 bytes; float and double are 32-bit and 64-bit IEEE floating point numbers, respectively. _Bool is 1 byte."

It may be helpful to add a "Standard size" column to the code table, with a footnote that it only applies when the <, > or ! code is used and that for native sizes one should consult struct.calcsize().
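A rough sketch of what such a two-column table might contain, generated with struct.calcsize(). The helper name size_table is hypothetical, and the native column depends on the machine running the code, so only the standard column is stable.

```python
import struct

def size_table(codes="xcbB?hHiIlLqQfd"):
    """Return (code, native size, standard size) rows.

    Hypothetical helper for illustration. 's', 'p' and 'P' are left out:
    the first two take a count, and 'P' has no standard size.
    """
    rows = []
    for code in codes:
        native = struct.calcsize("@" + code)    # platform dependent
        standard = struct.calcsize("<" + code)  # fixed by the struct module
        rows.append((code, native, standard))
    return rows

rows = size_table()
for code, native, standard in rows:
    print(code, native, standard)
```

On a typical 64-bit Linux build the rows for 'l' and 'L' are where the two columns differ (8 native, 4 standard), which matches the sizes quoted earlier in this thread.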
msg103806 - Author: Mads Kiilerich (kiilerix) * Date: 2010-04-21 08:05
The more I read the documentation and your comments, the more I can see that the implementation is OK and that the documentation is "complete" and can be read correctly. Please take this as constructive feedback for improving the documentation, to make it easier to understand and harder to read incorrectly.

Yes, adding a "Standard size" column would have been very helpful. (I had missed the section on "standard" sizes.)

"Standard" is a very general term, and it is slightly confusing that standard isn't the default. Could the term "platform independent" (or "fixed"?) be added as an explanation of "standard" - or perhaps used instead?

Programming skills and platform knowledge at C level should not be a requirement to understand and use struct, so perhaps the references to C should be less high-profile, and perhaps something like this could be added: "All sizes except trivial 1-byte entries (whatever that means) are platform dependent - use calcsize to get the size on your platform."

Perhaps the sections explaining 's', 'p', 'ILqQ', 'P' and '?' could be changed to (foot)notes to the table, to make it easier to see where they belong and whether they can be skipped.

Perhaps "@" in the byte order table could be replaced with "@ (default)"? (And perhaps drop "If the first character is not one of these, '@' is assumed.")

The byte order character must come first in the format string and is key to understanding the other format characters, so perhaps everything related to it should come first?
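The behaviour described in the last two points can be checked with a minimal sketch. The format "bi" is an arbitrary example and the variable names are made up for illustration.

```python
import struct

# With no prefix, '@' (native size and alignment) is assumed.
default_size = struct.calcsize("bi")
explicit_native_size = struct.calcsize("@bi")

# '<' selects standard sizes and no alignment: 1 byte for 'b' plus 4 for 'i'.
standard_size = struct.calcsize("<bi")

print(default_size, explicit_native_size, standard_size)

# The byte order character is only accepted as the first character.
try:
    struct.calcsize("b<i")
    prefix_in_middle_accepted = True
except struct.error:
    prefix_in_middle_accepted = False

print(prefix_in_middle_accepted)
```

The native result is usually larger than the standard one here because padding is inserted before the int, but the exact value depends on the platform.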
msg103807 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-04-21 08:23
Thanks for the doc suggestions. Actually, the current docs were revised recently; this issue is a helpful reminder to me that those doc revisions need to be backported. :) If you want to see the current docs, look at:

http://docs.python.org/dev/library/struct.html

I'm +0 on adding the standard sizes to the table of format codes. I also agree it might make sense to swap the 'Format Character' section and the 'Byte Order, Size and Alignment' section. That's all for now; I'll look at this properly sometime soon.

The standard/native terminology is fairly ingrained; I'm not sure whether it's really worth changing it, but we can look at the explanations and make sure that they're clear.

> Programming skills and platform knowledge at C level should not be a
> requirement to understand and use struct, so perhaps the references to
> C should be less high-profile,

Agreed, though I think the references to C should certainly be there, since they will help some users, and since part of the struct module's raison d'etre is to allow communication with data written/read by C programs.

The note about ILqQ returning Python longs might be better omitted; the difference between int and long should be irrelevant to most users.
msg107682 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-12 18:53
I've added sizes to the table, reordered some of the sections, and made a couple of other tweaks (like renaming the 'Objects' section to 'Classes') in r81957 (trunk) and r81955-81956 (py3k). I'll backport these changes to release26-maint and release31-maint; leaving open for that.
msg107685 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-12 19:20
Merged to maintenance branches in r81959 (release26-maint) and r81960 (release31-maint). Closing.
msg107717 - Author: Mads Kiilerich (kiilerix) * Date: 2010-06-12 23:35
Thanks for improving the documentation! A couple of comments for possible further improvements: I think it would be helpful to also see an early notice about how to achieve platform independence, versus the default of the local platform. And perhaps the description of "standard" could be improved. Perhaps something like the attached patch could be used. It is relative to release26-maint/Doc/library/struct.rst rev 81959.
msg107855 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-15 08:53
Thanks for the additional suggestions and patch. I've implemented most of them in revisions r81992 through r81995. I've left the note about 'native size and alignment': native alignment *is* determined using sizeof, and I think this is important information. I've also re-added the information that the 'f' and 'd' formats use IEEE binary32 and binary64, as a note to the format characters table. And I've moved the information that the 'P' format is only available in native mode to the 'format characters' section. Additional suggestions for improvements welcome!
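Two of the points in this last message can be checked with a short sketch: that 'P' is rejected outside native mode, and that 'f' and 'd' pack to IEEE binary32 and binary64 in standard mode. The helper name is_valid_format is hypothetical.

```python
import struct

def is_valid_format(fmt):
    """Return True if struct accepts the format string (hypothetical helper)."""
    try:
        struct.calcsize(fmt)
        return True
    except struct.error:
        return False

# 'P' (void *) is only available in native mode.
native_pointer_ok = is_valid_format("P")
standard_pointer_ok = is_valid_format("<P")
print(native_pointer_ok, standard_pointer_ok)

# 1.0 is 0x3F800000 in binary32 and 0x3FF0000000000000 in binary64;
# '<' stores the least significant byte first.
one_as_binary32 = struct.pack("<f", 1.0)
one_as_binary64 = struct.pack("<d", 1.0)
print(one_as_binary32.hex(), one_as_binary64.hex())
```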
History
Date User Action Args
2022-04-11 14:57:00 admin set github: 52715
2010-06-17 17:54:12 mark.dickinson set status: open -> closed
2010-06-15 08:53:00 mark.dickinson set status: closed -> open; messages: +
2010-06-12 23:35:48 kiilerix set files: + struct.diff; keywords: + patch; messages: +
2010-06-12 19:20:56 mark.dickinson set status: open -> closed; versions: + Python 3.1, Python 2.7, Python 3.2; messages: +; resolution: fixed; stage: resolved
2010-06-12 18:53:20 mark.dickinson set messages: +
2010-05-29 20:58:15 mark.dickinson set priority: low
2010-05-29 20:56:52 eric.araujo set priority: normal -> (no value); nosy: mark.dickinson, kiilerix, Alexander.Belopolsky; components: + Documentation, - Library (Lib)
2010-04-21 08:23:02 mark.dickinson set messages: +
2010-04-21 08:05:09 kiilerix set messages: +
2010-04-20 14:44:29 Alexander.Belopolsky set messages: +
2010-04-20 14:30:46 mark.dickinson set assignee: mark.dickinson; messages: +; nosy: + mark.dickinson
2010-04-20 14:11:40 Alexander.Belopolsky set nosy: + Alexander.Belopolsky; messages: +
2010-04-20 12:30:48 kiilerix create