Issue 10021: Format parser is too permissive

Created on 2010-10-04 16:24 by belopolsky, last changed 2022-04-11 14:57 by admin.

Messages (11)
msg117961 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 16:24
According to the Format String Syntax section [1], attribute_name must be an identifier. However, the parser does not catch a violation of this rule and happily passes non-identifier strings to __getattribute__:

>>> class X:
...     def __getattribute__(self, a): return 'foo'
...
>>> '{.$#@}'.format(X())
'foo'

If this is a desirable feature, I think it should be clearly documented because in some cases, for example when formatted objects are proxies to database entries, passing arbitrary strings to __getattribute__ may be wasteful at best and a security hole at worst.

[1] http://docs.python.org/dev/py3k/library/string.html#format-string-syntax
msg117964 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 16:38
PEP 3101 has the following:

"""
Implementation note: The implementation of this proposal is not required to enforce the rule about a simple or dotted name being a valid Python identifier. Instead, it will rely on the getattr function of the underlying object to throw an exception if the identifier is not legal. The str.format() function will have a minimalist parser which only attempts to figure out when it is "done" with an identifier (by finding a '.' or a ']', or '}', etc.).
"""

Apparently CPython takes advantage of this note in its implementation. Thus this is not a bug, but I think this implementation note should be added to CPython documentation.
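The "minimalist parser" in the note can be observed from Python through the standard library's string.Formatter, which in CPython sits on the same field-name parsing as str.format(). The following is a minimal sketch, not a statement about the C internals. The class X repeats the one from the first message, and the explicit index 0 is used instead of auto-numbering only to keep the example portable across versions.

```python
import string


class X:
    # Same idea as the class in the report: every lookup returns 'foo'.
    def __getattribute__(self, name):
        return 'foo'


formatter = string.Formatter()

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
# The field name comes back as a raw string; nothing checks that the part
# after the dot is a valid identifier.
parsed = list(formatter.parse('{0.$#@}'))

# get_field() resolves the field name against the arguments and returns
# (object, first_key); the attribute part is passed on to getattr().
value, first_key = formatter.get_field('0.$#@', (X(),), {})

print(parsed)
print(value, first_key)
```

The parser only decides where the field name ends; whether '$#@' means anything is left to the object being formatted.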
msg117965 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 16:54
Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through. I don't see this as any different from:

>>> class X:
...     def __getattribute__(self, a): return 'foo'
...
>>> getattr(X(), '$#@')
'foo'
msg117966 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-04 16:58
2010/10/4 Eric Smith <report@bugs.python.org>:
>
> Eric Smith <eric@trueblade.com> added the comment:
>
> Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

You can always use "str.isidentifier()" (I don't remember if there's a C API).
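For callers who want the stricter reading of the grammar today, the str.isidentifier() suggestion can be applied outside the parser. The sketch below is one possible pre-check, not anything CPython does. The function name non_identifier_attributes is hypothetical, and the scan assumes a well-formed format string and does not look inside nested format specs.

```python
import string


def non_identifier_attributes(format_string):
    """Return attribute names in replacement fields that are not identifiers.

    Hypothetical helper: it reuses string.Formatter.parse() to find the
    field names and then walks each one by hand.
    """
    bad = []
    for _, field_name, _, _ in string.Formatter().parse(format_string):
        if field_name is None:  # trailing literal text, no field
            continue
        i, n = 0, len(field_name)
        # Skip the leading argument name or index.
        while i < n and field_name[i] not in '.[':
            i += 1
        while i < n:
            if field_name[i] == '[':
                # Skip an index expression such as [0] or [key].
                end = field_name.find(']', i)
                if end == -1:
                    break
                i = end + 1
            elif field_name[i] == '.':
                # Collect the attribute name up to the next '.' or '['.
                j = i + 1
                while j < n and field_name[j] not in '.[':
                    j += 1
                name = field_name[i + 1:j]
                if not name.isidentifier():
                    bad.append(name)
                i = j
            else:
                break  # malformed field; leave the error to str.format()
    return bad


print(non_identifier_attributes('{0.$#@}'))
print(non_identifier_attributes('{user.id[0].x} and {0.name}'))
```

Note that str.isidentifier() also accepts keywords such as 'class', so this check is about syntax only.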
msg117967 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 17:02
Ah, but I don't need to in order to comply with the PEP!
msg117969 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-04 17:10
On Mon, Oct 4, 2010 at 1:02 PM, Eric Smith <report@bugs.python.org> wrote:
..
> Ah, but I don't need to in order to comply with the PEP!

This is true and this is the reason I changed this issue from bug to doc. I seem to remember this having been discussed before, but I cannot find the right thread.

There are at least two reasons the CPython docs should mention this:

1. From the current documentation, users are likely to expect a ValueError from format(".$#@", ..) rather than an AttributeError.
2. Naive proxy objects may implement __getattribute__ that blindly inserts the attribute name into database queries, leading to all kinds of undesired behaviors.
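Both reasons can be illustrated with a short sketch. RowProxy and format_error are hypothetical names introduced only for this example, and the dict stands in for a database row. The proxy shows one defensive option, rejecting non-identifier names before they reach any query layer; it is not a recommendation from this thread.

```python
class RowProxy:
    """Hypothetical proxy; the dict stands in for a database row."""

    def __init__(self, row):
        self._row = row

    def __getattr__(self, name):
        # Called only when normal lookup fails. Refuse anything that is not
        # an identifier before it could be used to build a query.
        if not name.isidentifier() or name not in self._row:
            raise AttributeError(name)
        return self._row[name]


def format_error(fmt, obj):
    """Return the exception type raised by fmt.format(obj), or None."""
    try:
        fmt.format(obj)
    except Exception as exc:
        return type(exc)
    return None


# Reason 1: a non-identifier attribute name surfaces as AttributeError from
# the object, not as ValueError from the format parser.
print(format_error('{0.$#@}', object()))

# Reason 2: the proxy sees the raw string and has to validate it itself.
book = RowProxy({'title': 'Dune'})
print('{0.title}'.format(book))
print(format_error('{0.$#@}', book))
```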
msg117971 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-04 17:37
I agree it should be documented as a CPython-specific behavior. I should also add a CPython-specific test for it, modeled on your code (if one doesn't already exist). I'll look into it.
msg117992 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-10-05 07:21
> I seem to remember this having been discussed before, but I cannot find the right thread.

It came up in the issue 7951 discussion, I think.
msg118009 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-10-05 13:57
This should not be classified as an "implementation detail". Either we should document it and cause other implementations to support it or check it ourselves.
msg118011 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-10-05 14:32
I agree that it being an implementation detail is not a good thing. I think we should just document the current CPython behavior as the language standard: once parsed, any string after a dot is passed to getattr. I can't see why we should pay the penalty of validating it as an identifier when the behavior is well defined and matches my getattr example in msg 117965.
msg118232 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-10-08 22:55
This is a bug report in that there is a discrepancy between the grammar in the doc and the behavior. Laxness can lead to portability problems if CPython is lax compared to a normal reading of the spec and another implementation takes the spec seriously. I agree that implementation details that lead to an exception here and not there, or vice versa, are best avoided.

For getattr:

'''
getattr(object, name[, default])
Return the value of the named attribute of object. name must be a string.
'''

the doc is careful to just say that name must be a string, not specifically an identifier. Given that, I suppose "attribute_name ::= identifier" should be changed to match so that string formats can always (all implementations) also access non-identifier attributes.
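The case described here, a real attribute whose name is not an identifier, is easy to construct with setattr. The sketch below uses types.SimpleNamespace purely as a convenient container; the attribute name 'first name' is an arbitrary example.

```python
import types

record = types.SimpleNamespace()

# setattr() and getattr() only require the name to be a string.
setattr(record, 'first name', 'Ada')

via_getattr = getattr(record, 'first name')
via_format = '{0.first name}'.format(record)

print(via_getattr, via_format)
```

Under the relaxed grammar, both lookups are expected to agree on every implementation, which is what they do on CPython today.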
History
Date User Action Args
2022-04-11 14:57:07 admin set github: 54230
2015-10-02 21:38:35 belopolsky set stage: needs patch; versions: + Python 3.6, - Python 3.2
2010-10-08 22:55:18 terry.reedy set nosy: + terry.reedy; messages: +
2010-10-05 14:32:45 eric.smith set messages: +
2010-10-05 13:57:46 benjamin.peterson set messages: +
2010-10-05 07:21:54 mark.dickinson set messages: +
2010-10-04 17:37:26 eric.smith set messages: +
2010-10-04 17:10:32 belopolsky set messages: +
2010-10-04 17:02:57 eric.smith set messages: +
2010-10-04 16:58:45 benjamin.peterson set nosy: + benjamin.peterson; messages: +
2010-10-04 16:54:41 eric.smith set messages: +
2010-10-04 16:46:10 mark.dickinson set nosy: + mark.dickinson, eric.smith
2010-10-04 16:39:21 belopolsky set assignee: docs@python; components: + Documentation, - Interpreter Core; nosy: + docs@python
2010-10-04 16:38:10 belopolsky set messages: +
2010-10-04 16:29:00 eric.araujo set nosy: + eric.araujo
2010-10-04 16:24:04 belopolsky create