[Python-Dev] cpython: replace custom validation logic in the parse module with a simple DFA validator (original) (raw)

Benjamin Peterson benjamin at python.org
Mon Jun 6 02:51:54 EDT 2016


On Sat, Jun 4, 2016, at 17:26, Christian Heimes wrote:

On 2016-06-02 11:32, benjamin.peterson wrote: > https://hg.python.org/cpython/rev/4a9159ea2536 > changeset: 101601:4a9159ea2536 > user: Benjamin Peterson <benjamin at python.org> > date: Thu Jun 02 11:30:18 2016 -0700 > summary: > replace custom validation logic in the parse module with a simple DFA validator (closes #26526) > > Patch from A. Skrobov. > > files: > Misc/NEWS | 3 + > Modules/parsermodule.c | 2545 +-------------------------- > 2 files changed, 96 insertions(+), 2452 deletions(-) > > > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -22,6 +22,9 @@ > Library > ------- > > +- Issue #26526: Replace custom parse tree validation in the parser > + module with a simple DFA validator. > + > - Issue #27114: Fix SSLContext.loadwindowsstorecerts fails with > PermissionError > > diff --git a/Modules/parsermodule.c b/Modules/parsermodule.c > --- a/Modules/parsermodule.c > +++ b/Modules/parsermodule.c > @@ -670,9 +670,75 @@ > > > static node* buildnodetree(PyObject *tuple); > -static int validateexprtree(node *tree); > -static int validatefileinput(node *tree); > -static int validateencodingdecl(node *tree); > + > +static int > +validatenode(node *tree) > +{ > + int type = TYPE(tree); > + int nch = NCH(tree); > + dfa *ntdfa; > + state *dfastate; > + int pos, arc; > + > + assert(ISNONTERMINAL(type)); > + type -= NTOFFSET; > + if (type >= PyParserGrammar.gndfas) { > + PyErrFormat(parsererror, "Unrecognized node type %d.", TYPE(tree)); > + return 0; > + } > + ntdfa = &PyParserGrammar.gdfa[type]; > + REQ(tree, ntdfa->dtype); > + > + /* Run the DFA for this nonterminal. */ > + dfastate = &ntdfa->dstate[ntdfa->dinitial]; > + for (pos = 0; pos < nch; ++pos) {_ _> + node *ch = CHILD(tree, pos); > + int chtype = TYPE(ch); > + for (arc = 0; arc < dfastate->snarcs; ++arc) { > + short alabel = dfastate->sarc[arc].albl; > + assert(alabel < PyParserGrammar.gll.llnlabels);_ _> + if (PyParserGrammar.gll.lllabel[alabel].lbtype == chtype) { > + /* The child is acceptable; if non-terminal, validate it recursively. */ > + if (ISNONTERMINAL(chtype) && !validatenode(ch)) > + return 0; > + > + /* Update the state, and move on to the next child. */ > + dfastate = &ntdfa->dstate[dfastate->sarc[arc].aarrow]; > + goto arcfound; > + } > + } > + /* What would this state have accepted? */ > + { > + short alabel = dfastate->sarc->albl; > + int nexttype; > + if (!alabel) /* Wouldn't accept any more children */ > + goto illegalnumchildren; > + > + nexttype = PyParserGrammar.gll.lllabel[alabel].lbtype; > + if (ISNONTERMINAL(nexttype)) > + PyErrFormat(parsererror, "Expected node type %d, got %d.", > + nexttype, chtype); > + else > + PyErrFormat(parsererror, "Illegal terminal: expected %s.", > + PyParserTokenNames[nexttype]);

Coverity doesn't that line: CID 1362505 (#1 of 1): Out-of-bounds read (OVERRUN) 20. overrun-local: Overrunning array PyParserTokenNames of 58 8-byte elements at element index 255 (byte offset 2040) using index nexttype (which evaluates to 255).

I don't think this can cause a problem because it doesn't ever come from user-provided input.



More information about the Python-Dev mailing list