Issue 10021: Format parser is too permissive
Created on 2010-10-04 16:24 by belopolsky, last changed 2022-04-11 14:57 by admin.
Messages (11) | ||
---|---|---|
msg117961 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-10-04 16:24 |
According to the Format String Syntax section [1], attribute_name must be an identifier. However, the parser does not catch a violation of this rule and happily passes non-identifier strings to __getattribute__:

>>> class X:
...     def __getattribute__(self, a): return 'foo'
...
>>> '{.$#@}'.format(X())
'foo'

If this is a desirable feature, I think it should be clearly documented because in some cases, for example when formatted objects are proxies to database entries, passing arbitrary strings to __getattribute__ may be wasteful at best and a security hole at worst.

[1] http://docs.python.org/dev/py3k/library/string.html#format-string-syntax | ||
msg117964 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-10-04 16:38 |
PEP 3101 has the following: """ Implementation note: The implementation of this proposal is not required to enforce the rule about a simple or dotted name being a valid Python identifier. Instead, it will rely on the getattr function of the underlying object to throw an exception if the identifier is not legal. The str.format() function will have a minimalist parser which only attempts to figure out when it is "done" with an identifier (by finding a '.' or a ']', or '}', etc.). """ Apparently CPython takes advantage of this note in its implementation. Thus this is not a bug, but I think this implementation note should be added to the CPython documentation. | ||
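A minimal sketch of what the quoted PEP note means in practice, assuming a current CPython 3 interpreter. `Plain` and `format_error` are hypothetical names introduced only for illustration. A non-identifier attribute name is handed to the attribute lookup, so the failure surfaces there, while a structurally broken field is rejected by the parser itself.

```python
class Plain:
    """Hypothetical class with no custom attribute hooks."""


def format_error(fmt, obj):
    """Return the exception type raised by fmt.format(obj), or None."""
    try:
        fmt.format(obj)
    except Exception as exc:
        return type(exc)
    return None


# '$#@' is not validated by the parser; the attribute lookup rejects it.
print(format_error('{.$#@}', Plain()))   # AttributeError

# A missing closing brace is a parse problem, reported by the parser.
print(format_error('{.$#@', Plain()))    # ValueError
```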
msg117965 | Author: Eric V. Smith (eric.smith) |
Date: 2010-10-04 16:54 |
Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through. I don't see this as any different from:

>>> class X:
...     def __getattribute__(self, a): return 'foo'
...
>>> getattr(X(), '$#@')
'foo' | ||
msg117966 | Author: Benjamin Peterson (benjamin.peterson) |
Date: 2010-10-04 16:58 |
2010/10/4 Eric Smith <report@bugs.python.org>:
> Eric Smith <eric@trueblade.com> added the comment:
>
> Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

You can always use "str.isidentifier()" (I don't remember if there's a C API). | ||
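As a rough illustration of the `str.isidentifier()` suggestion, here is a sketch of a pre-check done in pure Python rather than inside the parser. `check_attribute_names` is a hypothetical helper, not part of the standard library; it relies on `string.Formatter.parse` to split the format string and does not inspect replacement fields nested inside format specs.

```python
import re
import string


def check_attribute_names(fmt):
    """Raise ValueError if a '.name' part of a replacement field is not an identifier.

    Hypothetical helper: nested fields inside format specs are not inspected.
    """
    for _literal, field_name, _spec, _conv in string.Formatter().parse(fmt):
        if field_name is None:
            continue  # trailing literal text, no replacement field
        # Drop '[...]' index parts so dots inside them are not read as attribute access.
        without_indexes = re.sub(r'\[[^\]]*\]', '', field_name)
        # The first piece is the argument name or number; the rest are attribute names.
        for attr in without_indexes.split('.')[1:]:
            if not attr.isidentifier():
                raise ValueError(
                    f'invalid attribute name {attr!r} in field {field_name!r}'
                )
    return fmt
```

Calling `check_attribute_names('{0.real}')` returns the string unchanged, while `check_attribute_names('{.$#@}')` raises `ValueError` before any attribute lookup happens.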
msg117967 | Author: Eric V. Smith (eric.smith) |
Date: 2010-10-04 17:02 |
Ah, but I don't need to in order to comply with the PEP! | ||
msg117969 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-10-04 17:10 |
On Mon, Oct 4, 2010 at 1:02 PM, Eric Smith <report@bugs.python.org> wrote:
..
> Ah, but I don't need to in order to comply with the PEP!

This is true and this is the reason I changed this issue from bug to doc. I seem to remember this having been discussed before, but I cannot find the right thread. There are at least two reasons the CPython docs should mention this:

1. From the current documentation, users are likely to expect a ValueError from format(".$#@", ..) rather than an AttributeError.
2. Naive proxy objects may implement a __getattribute__ that blindly inserts the attribute name into database queries, leading to all kinds of undesired behavior. | ||
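The second concern can be sketched as follows. `NaiveProxy`, `GuardedProxy`, and the `entries` table name are hypothetical, no real database is involved, and the sketch uses `__getattr__` instead of `__getattribute__` only to keep ordinary attribute access simple.

```python
class NaiveProxy:
    """Hypothetical proxy that turns any missing attribute into a query string."""

    def __init__(self, table):
        self.table = table

    def __getattr__(self, name):
        # No validation: whatever str.format passes in ends up in the query text.
        return f'SELECT {name} FROM {self.table}'


class GuardedProxy(NaiveProxy):
    """Same proxy, but it rejects names that are not identifiers."""

    def __getattr__(self, name):
        if not name.isidentifier():
            raise AttributeError(name)
        return super().__getattr__(name)


# The attribute name is everything between the dot and the closing brace.
print('{.id; DROP TABLE entries; --}'.format(NaiveProxy('entries')))
```

With `GuardedProxy('entries')` the same format string raises `AttributeError`, which is the behavior the PEP note relies on the underlying object to provide.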
msg117971 | Author: Eric V. Smith (eric.smith) |
Date: 2010-10-04 17:37 |
I agree it should be documented as a CPython-specific behavior. I should also add a CPython-specific test for it, modeled on your code (if one doesn't already exist). I'll look into it. | ||
msg117992 | Author: Mark Dickinson (mark.dickinson) |
Date: 2010-10-05 07:21 |
> I seem to remember this having been discussed before, but I cannot find the right thread.

It came up in the issue 7951 discussion, I think. | ||
msg118009 | Author: Benjamin Peterson (benjamin.peterson) |
Date: 2010-10-05 13:57 |
This should not be classified as an "implementation detail". Either we should document it and cause other implementations to support it or check it ourselves. | ||
msg118011 | Author: Eric V. Smith (eric.smith) |
Date: 2010-10-05 14:32 |
I agree that it being an implementation detail is not a good thing. I think we should just document the current CPython behavior as the language standard: once parsed, any string after a dot is passed to getattr. I can't see why we should pay the penalty of validating it as an identifier when the behavior is well defined and matches my getattr example in msg 117965. | ||
msg118232 | Author: Terry J. Reedy (terry.reedy) |
Date: 2010-10-08 22:55 |
This is a bug report in that there is a discrepancy between the grammar in the doc and the behavior. Laxness can lead to portability problems if CPython is lax compared to a normal reading of the spec and another implementation takes the spec seriously. I agree that implementation details that lead to an exception here and not there, or vice versa, are best avoided.

For getattr:
'''
getattr(object, name[, default])
Return the value of the named attribute of object. name must be a string.
'''
the doc is careful to just say that name must be a string, not specifically an identifier. Given that, I suppose "attribute_name ::= identifier" should be changed to match so that string formats can always (all implementations) also access non-identifier attributes. |
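The point about non-identifier attributes can be checked with a short sketch, assuming CPython. `types.SimpleNamespace` is used only as a convenient object with an instance `__dict__`, and the variable name `ns` is arbitrary. The attribute set via `setattr` cannot be reached with dot syntax in source code, yet both `getattr` and `str.format` reach it.

```python
from types import SimpleNamespace

ns = SimpleNamespace()

# setattr accepts any string as a name, identifier or not.
setattr(ns, '$#@', 42)

# getattr and str.format both reach the same attribute.
value_via_getattr = getattr(ns, '$#@')
value_via_format = '{.$#@}'.format(ns)
padded_via_format = '{0.$#@:>4}'.format(ns)

print(value_via_getattr, repr(value_via_format), repr(padded_via_format))
```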
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:07 | admin | set | github: 54230 |
2015-10-02 21:38:35 | belopolsky | set | stage: needs patch; versions: + Python 3.6, - Python 3.2 |
2010-10-08 22:55:18 | terry.reedy | set | nosy: + terry.reedy; messages: + |
2010-10-05 14:32:45 | eric.smith | set | messages: + |
2010-10-05 13:57:46 | benjamin.peterson | set | messages: + |
2010-10-05 07:21:54 | mark.dickinson | set | messages: + |
2010-10-04 17:37:26 | eric.smith | set | messages: + |
2010-10-04 17:10:32 | belopolsky | set | messages: + |
2010-10-04 17:02:57 | eric.smith | set | messages: + |
2010-10-04 16:58:45 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + |
2010-10-04 16:54:41 | eric.smith | set | messages: + |
2010-10-04 16:46:10 | mark.dickinson | set | nosy: + mark.dickinson, eric.smith |
2010-10-04 16:39:21 | belopolsky | set | assignee: docs@python; components: + Documentation, - Interpreter Core; nosy: + docs@python |
2010-10-04 16:38:10 | belopolsky | set | messages: + |
2010-10-04 16:29:00 | eric.araujo | set | nosy: + eric.araujo |
2010-10-04 16:24:04 | belopolsky | create |