[Python-Dev] Python-Dev Digest, Vol 102, Issue 35

python neo_python at 126.com
Mon Jan 16 11:23:51 CET 2012

Previous message: [Python-Dev] Dinsdale is no more
Next message: [Python-Dev] Python-Dev Digest, Vol 102, Issue 35
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

jbk

Send Python-Dev mailing list submissions to python-dev at python.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://mail.python.org/mailman/listinfo/python-dev
or, via email, send a message with subject or body 'help' to
    python-dev-request at python.org

You can reach the person managing the list at
    python-dev-owner at python.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of Python-Dev digest..."

Today's Topics:

   1. Re: Status of the fix for the hash collision vulnerability (Gregory P. Smith)
   2. Re: Status of the fix for the hash collision vulnerability (Barry Warsaw)
   3. Re: Sphinx version for Python 2.x docs (Éric Araujo)
   4. Re: Status of the fix for the hash collision vulnerability (martin at v.loewis.de)
   5. Re: Status of the fix for the hash collision vulnerability (Guido van Rossum)
   6. Re: [Python-checkins] cpython: add test, which was missing from d64ac9ab4cd0 (Nick Coghlan)
   7. Re: Status of the fix for the hash collision vulnerability (Terry Reedy)
   8. Re: Status of the fix for the hash collision vulnerability (Jack Diederich)
   9. Re: cpython: Implement PEP 380 - 'yield from' (closes #11682) (Nick Coghlan)
  10. Re: Status of the fix for the hash collision vulnerability (Nick Coghlan)

----------------------------------------------------------------------

Message: 1
Date: Fri, 13 Jan 2012 19:06:00 -0800
From: "Gregory P. Smith" <greg at krypto.org>
Cc: python-dev at python.org
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <CAGE7PNKkHW-WqiuQC9bhqxnoU77f+eprsq3nqmycstM3JZag at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at krypto.org> wrote:

> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at python.org> wrote:
>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum <guido at python.org> wrote:
>>>> Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having >1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing.
>>>
>>> Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens.
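[Editorial sketch, not part of the original thread.] The PYTHONHASHSEED switch mentioned above can be illustrated as follows. This assumes a CPython build that honours the PYTHONHASHSEED environment variable (at the time of this thread it was still a proposal); the helper names are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter started with PYTHONHASHSEED=seed.

    Hypothetical helper: the hash seed is fixed at interpreter start-up,
    so a child process is needed to observe a different seed.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


def seed_is_reproducible(text="python-dev", seed=0):
    """True if two separate interpreter runs with the same seed agree on hash(text)."""
    first = hash_in_fresh_interpreter(text, seed)
    second = hash_in_fresh_interpreter(text, seed)
    return first == second


if __name__ == "__main__":
    # With a fixed seed the string hash is stable from run to run, which is
    # what order-dependent tests would rely on.
    print(seed_is_reproducible())
```

A fixed seed only restores run-to-run stability; it does not make code that depends on dict or set ordering any more correct.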

>> Fair enough. But I'm now uncomfortable with turning this on for bugfix releases. I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we don't want for those.
>>
>> FWIW, I don't believe in the SafeDict solution -- you never know which dicts you have to change.
>
> Agreed.
>
> Of the three options Victor listed only one is good.
>
> I don't like SafeDict. -1. It puts the onus on the coder to always get everything right with regards to data that came from outside the process never ending up hashed in a non-safe dict or set anywhere. "Safe" needs to be the default option for all hash tables.
>
> I don't like the "too many hash collisions" exception. -1. It provides non-deterministic application behavior for data driven applications with no way for them to predict when it'll happen or where and prepare for it. It may work in practice for many applications but is simply odd behavior.
>
> I do like randomly seeding the hash. +1. This is easy. It can easily be back ported to any Python version.
>
> It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We will provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies.
>
> What an implementation looks like:
>
> http://pastebin.com/9ydETTag
>
> some stuff to be filled in, but this is all that is really required. add logic to allow a particular seed to be specified or forced to 0 from the command line or environment. add the logic to grab random bytes. add the autoconf glue to disable it. done.
>
> -gps

This approach worked fine for Perl 9 years ago.
https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371

-gps
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120113/3fb82673/attachment-0001.html>

------------------------------

Message: 2
Date: Sat, 14 Jan 2012 04:19:38 +0100
From: Barry Warsaw <barry at python.org>
To: python-dev at python.org
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <20120114041938.098fd14b at rivendell>
Content-Type: text/plain; charset=US-ASCII

On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote:

> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens.
>
> Fair enough. But I'm now uncomfortable with turning this on for bugfix releases. I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we don't want for those.

+1

-Barry

------------------------------

Message: 3
Date: Sat, 14 Jan 2012 04:24:52 +0100
From: Éric Araujo <merwok at netwok.org>
To: <python-dev at python.org>
Subject: Re: [Python-Dev] Sphinx version for Python 2.x docs
Message-ID: <ff8dc5d4bd1c5d3583c3ff9c18e2445e at netwok.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi Sandro,

Thanks for getting the ball rolling on this. One style for markup, one Sphinx version to code our extensions against and one location for the documenting guidelines will make our work a bit easier.

> During the build process, there are some warnings that I can understand:

I assume you mean “can’t”, as you later ask how to fix them.
As a general rule, they’re only warnings, so they don’t break the build, only some links or stylings, so I think it’s okay to ignore them *right now*.

> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal

That’s a mistake I did in cefe4f38fa0e. This sentence should be removed.

> Doc/library/stdtypes.rst:2372: WARNING: more than one target found for cross-reference u'next':

Need to use :meth:`.next` to let Sphinx find the right target (more info on request :)

> Doc/library/sys.rst:651: WARNING: unknown keyword: None

Should use ``None``.

> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not

I don’t know if these should work (i.e. create a link to the appropriate language reference section) or abuse the markup (there are “not” and “in” keywords, but no “not in” keyword, so use ``not in``). I’d say ignore them.

Cheers

------------------------------

Message: 4
Date: Sat, 14 Jan 2012 04:45:57 +0100
From: martin at v.loewis.de
To: python-dev at python.org
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA at webmail.df.eu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; DelSp=Yes

> What an implementation looks like:
>
> http://pastebin.com/9ydETTag
>
> some stuff to be filled in, but this is all that is really required.

I think this statement (and the patch) is wrong. You also need to change the byte string hashing, at least for 2.x. This I consider the biggest flaw in that approach - other people may have written string-like objects which continue to compare equal to a string but now hash different.

Regards,
Martin

------------------------------

Message: 5
Date: Fri, 13 Jan 2012 20:00:54 -0800
From: Guido van Rossum <guido at python.org>
To: "Gregory P. Smith" <greg at krypto.org>
Cc: Antoine Pitrou <solipsis at pitrou.net>, python-dev at python.org
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <CAP7+vJL+Qrz0oiqbLPCg3QxVqZLjbOeMQpeQykiidiGC2uN9FQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at krypto.org> wrote:

> It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We will provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies.

No, that is not how we usually take compatibility between bugfix releases. "Your code is already broken" is not an argument to break forcefully what worked (even if by happenstance) before. The difference between CPython and Jython (or between different CPython feature releases) also isn't relevant -- historically we have often bent over backwards to avoid changing behavior that was technically undefined, if we believed it would affect a significant fraction of users. I don't think anyone doubts that this will break lots of code (at least, the arguments I've heard have been "their code is broken", not "nobody does that").

> This approach worked fine for Perl 9 years ago.
>
> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371

I don't know what the Perl attitude about breaking undefined behavior between micro versions was at the time. But ours is pretty clear -- don't do it.

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120113/16511835/attachment-0001.html>

------------------------------

Message: 6
Date: Sat, 14 Jan 2012 15:16:32 +1000
From: Nick Coghlan <ncoghlan at gmail.com>
To: python-dev at python.org
Cc: python-checkins at python.org
Subject: Re: [Python-Dev] [Python-checkins] cpython: add test, which was missing from d64ac9ab4cd0
Message-ID: <CADiSq7fcjLgkrjQEqBhb0oNu9eiLnHhovtoZRDzNSTDvjzx3ZQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson <python-checkins at python.org> wrote:

> http://hg.python.org/cpython/rev/be85914b611c
> changeset:   74363:be85914b611c
> parent:      74361:609482c6710e
> user:        Benjamin Peterson <benjamin at python.org>
> date:        Fri Jan 13 14:39:38 2012 -0500
> summary:
>   add test, which was missing from d64ac9ab4cd0

Ah, that's where that came from, thanks.

I still haven't fully trained myself to use hg import instead of patch, which would avoid precisely this kind of error :P

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

------------------------------

Message: 7
Date: Sat, 14 Jan 2012 00:43:04 -0500
From: Terry Reedy <tjreedy at udel.edu>
To: python-dev at python.org
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <jer4lp$qe4$1 at dough.gmane.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 1/13/2012 8:58 PM, Gregory P. Smith wrote:

> It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken.

Given that the doc says "Return the hash value of the object", I do not think we should be so hard-nosed. The above clearly implies that there is such a thing as the Python hash value for an object. And indeed, that has been true across many versions.
If we had written "Return a hash value for the object, which can vary from run to run", the case would be different.

--
Terry Jan Reedy

------------------------------

Message: 8
Date: Sat, 14 Jan 2012 01:24:54 -0500
From: Jack Diederich <jackdied at gmail.com>
To: Guido van Rossum <guido at python.org>
Cc: Python Dev <Python-Dev at python.org>
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <CACLn2+3Z1EW8Rxox7Zif=20P2SDHxYhv+Wo6dhXKKnO09+-uxQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at python.org> wrote:

> Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having >1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing.

Python's dicts are designed to avoid hash conflicts by resizing and keeping the available slots bountiful. 1000 conflicts sounds like a number that couldn't be hit accidentally unless you had a single dict using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're good). The hashes also look to exploit cache locality but that is very unlikely to get one thousand conflicts by chance. If you get that many there is an attack.

> This is depending on how the counting is done (I didn't look at MAL's patch), and assuming that increasing the hash table size will generally reduce collisions if items collide but their hashes are different.

The patch counts conflicts on an individual insert and not lifetime conflicts. Looks sane to me.

> That said, even with collision counting I'd like a way to disable it without changing the code, e.g. a flag or environment variable.

Agreed. Paranoid people can turn the behavior off and if it ever were to become a problem in practice we could point people to a solution.
-Jack

------------------------------

Message: 9
Date: Sat, 14 Jan 2012 16:53:39 +1000
From: Nick Coghlan <ncoghlan at gmail.com>
To: Georg Brandl <g.brandl at gmx.net>
Cc: python-dev at python.org
Subject: Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682)
Message-ID: <CADiSq7dA6P8U3MiweM9=s-q49+y0KndeQX=ZNGWog-dZ-hzMA at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> On 01/13/2012 12:43 PM, nick.coghlan wrote:
>> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
>
> There should probably be a "versionadded" somewhere on this page.

Good catch, I added versionchanged notes to this page, simple_stmts and the StopIteration entry in the library reference.

>> PEP 3155: Qualified name for classes and functions
>> ==================================================
>
> This looks like a spurious (and syntax-breaking) change.

Yeah, it was an error I introduced last time I merged from default. Fixed.

>> diff --git a/Grammar/Grammar b/Grammar/Grammar
>> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
>> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test
>
> This looks like a change without effect?

Fixed.

It was a lingering after-effect of Greg's original patch (which also modified the function call syntax to allow "yield from" expressions with extra parens). I reverted the change to the function call syntax, but forgot to ditch the added parens while doing so.

>> diff --git a/Include/genobject.h b/Include/genobject.h
>> -     /* List of weak reference. */
>> -     PyObject *gi_weakreflist;
>> +        /* List of weak reference. */
>> +        PyObject *gi_weakreflist;
>>  } PyGenObject;
>
> While these change tabs into spaces, it should be 4 spaces, not 8.

Fixed.

>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>
> Does this API need to be public? If yes, it needs to be documented.
Hmm, good point - that one needs a bit of thought, so I've put it on the tracker: http://bugs.python.org/issue13783

(that issue also covers your comments regarding the docstring for this function and whether or not we even need the StopIteration instance creation API)

>> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
>> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>> -#define BUILD_SLICE  133     /* Number of items */
>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>> +#define BUILD_SLICE     133     /* Number of items */
>
> Not sure putting these and all the other cosmetic changes into an already
> big patch is such a good idea...

I agree, but it's one of the challenges of a long-lived branch like the PEP 380 one (I believe some of these cosmetic changes started life in Greg's original patch and separating them out would have been quite a pain). Anyone that wants to see the gory details of the branch history can take a look at my bitbucket repo:

https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29

>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>> --- a/Objects/abstract.c
>> +++ b/Objects/abstract.c
>> @@ -2267,7 +2267,6 @@
>>
>>      func = PyObject_GetAttrString(o, name);
>>      if (func == NULL) {
>> -        PyErr_SetString(PyExc_AttributeError, name);
>>          return 0;
>>      }
>>
>> @@ -2311,7 +2310,6 @@
>>
>>      func = PyObject_GetAttrString(o, name);
>>      if (func == NULL) {
>> -        PyErr_SetString(PyExc_AttributeError, name);
>>          return 0;
>>      }
>>      va_start(va, format);
>
> These two changes also look suspiciously unrelated?

IIRC, I removed those lines while working on the patch because the message they produce (just the attribute name) is worse than the one produced by the call to PyObject_GetAttrString (which also includes the type of the object being accessed). Leaving the original exceptions alone helped me track down some failures I was getting at the time.

I've now made the various CallMethod helper APIs in abstract.c (1 public, 3 private) consistently leave the GetAttr exception alone and added an explicit C API note to NEWS.

(Vaguely related tangent: the new code added by the patch probably has a few parts that could benefit from the new GetAttrId private API)

>> diff --git a/Objects/genobject.c b/Objects/genobject.c
>> +        } else {
>> +            PyObject *e = PyStopIteration_Create(result);
>> +            if (e != NULL) {
>> +                PyErr_SetObject(PyExc_StopIteration, e);
>> +                Py_DECREF(e);
>> +            }
>
> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here
> anyway?

I think you're right - so noted in the tracker issue about the C API additions.

Thanks for the thorough review, a fresh set of eyes is very helpful :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

------------------------------

Message: 10
Date: Sat, 14 Jan 2012 17:01:48 +1000
From: Nick Coghlan <ncoghlan at gmail.com>
To: Jack Diederich <jackdied at gmail.com>
Cc: Guido van Rossum <guido at python.org>, Python Dev <Python-Dev at python.org>
Subject: Re: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <CADiSq7cmNjM8mEEhktFjA5Ss+K0Z8uCF7tmMucn56dWOzVFUQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich <jackdied at gmail.com> wrote:
>> This is depending on how the counting is done (I didn't look at MAL's patch), and assuming that increasing the hash table size will generally reduce collisions if items collide but their hashes are different.
>
> The patch counts conflicts on an individual insert and not lifetime conflicts. Looks sane to me.

Having a hard limit on the worst-case behaviour certainly sounds like an attractive prospect. And there's nothing to worry about in terms of secrecy or sufficient randomness - by default, attackers cannot generate more than 1000 hash collisions in one lookup, period.

>> That said, even with collision counting I'd like a way to disable it without changing the code, e.g. a flag or environment variable.
>
> Agreed. Paranoid people can turn the behavior off and if it ever were to become a problem in practice we could point people to a solution.

Does MAL's patch allow the limit to be set on a per-dict basis (including setting it to None to disable collision limiting completely)? If people have data sets that need to tolerate that kind of collision level (and haven't already decided to move to a data structure other than the builtin dict), then it may make sense to allow them to remove the limit when using trusted input.

For maintenance versions though, it would definitely need to be possible to switch it off without touching the code.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

------------------------------
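
[Editorial sketch, not part of the original thread.] The per-insert collision counting discussed in messages 8 and 10 might look roughly like the toy table below. This is not MAL's patch and not CPython's dict: `CountingTable`, `TooManyCollisions`, `Colliding` and the per-table `limit` argument (with `None` meaning "no limit", as Nick asks about) are all hypothetical names chosen for illustration, and linear probing is used only to keep the example short.

```python
class TooManyCollisions(Exception):
    """Hypothetical error raised when a single probe sequence hits too many occupied slots."""


class CountingTable:
    """Toy open-addressing hash table with linear probing and a per-operation collision limit."""

    def __init__(self, size=8, limit=1000):
        self.slots = [None] * size   # size must stay a power of two
        self.used = 0
        self.limit = limit           # None disables the check entirely

    def _probe(self, key):
        """Return the slot index for key, counting collisions for this one operation only."""
        mask = len(self.slots) - 1
        i = hash(key) & mask
        collisions = 0
        while True:
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                return i
            collisions += 1
            if self.limit is not None and collisions > self.limit:
                raise TooManyCollisions(key)
            i = (i + 1) & mask

    def _resize(self):
        """Double the table and reinsert every entry."""
        old = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 2)
        for key, value in old:
            self.slots[self._probe(key)] = (key, value)

    def __setitem__(self, key, value):
        # Keep the load factor below 2/3 so a free slot always exists.
        if (self.used + 1) * 3 >= len(self.slots) * 2:
            self._resize()
        i = self._probe(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


class Colliding:
    """Key with a constant hash, so resizing never separates instances."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n
```

With `limit=10`, inserting `Colliding(0)` through `Colliding(10)` succeeds and the next colliding key raises `TooManyCollisions`; with `limit=None` the table degrades to quadratic behaviour instead, which is the trade-off the thread weighs for trusted input.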

Python-Dev mailing list
Python-Dev at python.org
http://mail.python.org/mailman/listinfo/python-dev

End of Python-Dev Digest, Vol 102, Issue 35
*******************************************
