[Python-Dev] a different approach to argument parsing

Russ Cox rsc@plan9.bell-labs.com
Tue, 12 Feb 2002 03:44:19 -0500


[Hi. I'm responsible for the Plan 9 port of Python; I typically just lurk here.]

Regarding the argument parsing discussion, it seems like many of the "features" of the various argument parsing packages are aimed at the fact that in C (whence this all originated) the original getopt interface wasn't so great. To use getopt, you end up specifying the argument set twice: once to the parser and then once when processing the list of returned results. Packages like Optik make this a little better by letting you wrap up the actual processing in some form and hand that to the parser too. Still, you have to wrap up your argument parsing into little actions; the getopt style processing loop is usually a bit clearer. Ultimately, I find getopt unsatisfactory because of the duplication; and I find Optik and other similar packages unsatisfactory because of the contortions you have to go through to invoke them. I don't mean to pick on Optik, since many others appear to behave in similar ways, but it seems to be the yardstick. For concreteness, I'd much rather write:

if o=='-n' or o=='--num':
    ncopies = opt.optarg(opt.typecast(int, 'integer'))

than:

parser.add_option("-n", "--num", action="store", type="int", dest="ncopies")

The second strikes me as clumsy at best.
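
To make the getopt duplication concrete, here is a minimal sketch using the standard getopt module. It is written in present-day Python 3 rather than the Python 2.2 of this message, and the function name parse and the defaults are only illustrative. The option set appears once in the spec strings handed to getopt and a second time in the processing loop.

```python
import getopt

def parse(argv):
    # First mention of the option set: the spec strings given to the parser.
    opts, args = getopt.getopt(argv, "f:n:q", ["file=", "num=", "quiet"])
    report, ncopies, verbose = "default.file", 1, True
    # Second mention: the same options again, when processing the results.
    for o, a in opts:
        if o in ("-f", "--file"):
            report = a
        elif o in ("-n", "--num"):
            ncopies = int(a)
        elif o in ("-q", "--quiet"):
            verbose = False
    return report, ncopies, verbose, args
```

Adding an option means touching both places, and nothing checks that the two stay in agreement.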

The Plan 9 argument parser (for C) avoids these problems by making the parser itself small enough to be a collection of preprocessor macros. Although the implementation is ugly, the external interface that programmers see is trivial. A modified version of the example at http://optik.sourceforge.net would be rendered:

char *usagemessage = 
"usage: example [-f FILE] [-h] [-q] who where\n"
"\n"
"    -h            show this help message\n"
"    -f FILE       write report to FILE\n"
"    -q            don't print status messages to stdout\n";

void
usage(void)
{
    write(2, usagemessage, strlen(usagemessage));
    exits("usage");
}

void
main(int argc, char **argv)
{
    ...
    ARGBEGIN{
    case 'f':
        report = EARGF(usage());
        break;
    case 'q':
        verbose = 0;
        break;
    case 'h':
    default:
        usage();
    }ARGEND
    if(argc != 2)
        usage();
    ...

[This is documented at http://plan9.bell-labs.com/magic/man2html/2/ARGBEGIN, for anyone who is curious.]

Notice that the argument parsing machinery only gets the argument parameters in one place, and is kept so simple because it is driven by what happens in the actions: if I run "example -frsc" and the f option case doesn't call EARGF() to fetch the "rsc", the next iteration through the loop will be for option 'r'; a priori there's no way to tell.
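
The action-driven behaviour described above can be sketched in present-day Python 3 as follows. Flags and seen are hypothetical names, not part of the attached parser, and the class handles single-letter flags only. Whether "-frsc" means -f with argument "rsc" or four separate flags depends entirely on whether the handler asks for an argument.

```python
class Flags:
    """Tiny stand-in for ARGBEGIN: single-letter flags only (illustrative)."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.rest = ""  # unread characters of the current flag cluster

    def __iter__(self):
        while self.argv and self.argv[0].startswith("-") and self.argv[0] != "-":
            arg = self.argv.pop(0)
            if arg == "--":
                break
            self.rest = arg[1:]
            while self.rest:
                flag, self.rest = self.rest[0], self.rest[1:]
                yield flag

    def optarg(self):
        # Plays the role of EARGF: use the rest of the cluster if there is
        # one, otherwise the next word (IndexError if none is left).
        if self.rest:
            value, self.rest = self.rest, ""
            return value
        return self.argv.pop(0)

def seen(argv, takes_arg):
    """Record what the loop sees when only flags in takes_arg fetch an argument."""
    parser = Flags(argv)
    out = []
    for flag in parser:
        out.append((flag, parser.optarg() if flag in takes_arg else None))
    return out
```

With this sketch, seen(["-frsc"], {"f"}) gives [("f", "rsc")], while seen(["-frsc"], set()) gives one entry each for f, r, s and c.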

Now that Python has generators, it is easy to do a similar sort of thing, so that the argument parsing can be kept very simple. The running example would be written using the attached argument parser as:

usagemessage=\
'''usage: example.py [-h] [-f FILE] [-n N] [-q] who where
    -h, --help                  show this help message
    -f FILE, --file=FILE        write report to FILE
    -n N, --num=N               print N copies of the report
    -q, --quiet                 don't print status messages to stdout
'''

def main():
    opt = OptionParser(usage=usagemessage)
    report = 'default.file'
    ncopies = 1
    verbose = 1
    for o in opt:
        if o=='-f' or o=='--file':
            report = opt.optarg()
        elif o=='-n' or o=='--num':
            ncopies = opt.optarg(opt.typecast(int, 'integer'))
        elif o=='-q' or o=='--quiet':
            verbose = 0
        else:
            opt.error('unknown option '+o)
    if len(opt.args()) != 2:
        opt.error('incorrect argument count')
    print 'report=%s, ncopies=%s verbose=%s' % (report, ncopies, verbose)
    print 'arguments: ', opt.args()

It's fairly clear what's going on, and the option parser itself is very simple too. While it may not have all the bells and whistles that some packages do, I think its simplicity makes most of them irrelevant. It or something like it might be the right approach to take to present a simpler interface.

The simplicity of the interface has the benefit that users (potentially anyone who writes a Python program) don't have to learn a lot of stuff to parse their command-line arguments. Suppose I want to write a program with an option that takes two arguments instead of one. Given the Optik-style example it's not at all clear how to do this. Given the above example, there's one obvious thing to try: call opt.optarg() twice. That sort of thing.
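
A minimal sketch of the two-argument case, in present-day Python 3. Words is a hypothetical, heavily reduced stand-in for the attached parser (whole-word options only), and the -s/--size option is invented for illustration; the point is only that the handler calls optarg() twice.

```python
class Words:
    """Hypothetical stand-in for the parser: whole-word options only."""

    def __init__(self, argv):
        self.argv = list(argv)

    def __iter__(self):
        # Yield each leading word that looks like an option.
        while self.argv and self.argv[0].startswith("-"):
            yield self.argv.pop(0)

    def optarg(self):
        # Hand back the next word as the current option's argument.
        return self.argv.pop(0)

def parse(argv):
    opt = Words(argv)
    size = None
    for o in opt:
        if o in ("-s", "--size"):
            # An option with two arguments: just fetch twice.
            width = int(opt.optarg())
            height = int(opt.optarg())
            size = (width, height)
    return size, opt.argv
```

Here parse(["--size", "640", "480", "pic.png"]) returns ((640, 480), ["pic.png"]).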

Addressing the benchmark set by Optik:

[1]

* it ties short options and long options together, so once you define your options you never have to worry about the fact that -f and --file are the same

Here the code does that for you, and if you want to use some other convention, you're not tied to anything. (You do have to tie -f and --file in the usage message too, see answer to [3].)

[2]

* it's strongly typed: if you say option --foo expects an int, then Optik makes sure the user supplied a string that can be int()'ified, and supplies that int to you

There are plenty of ways you could consider adding this. The easiest is what I did in the example. The optarg argument fetcher takes a function to transform the argument before returning. Here, our function calls opt.error() if the argument cannot be converted to an int. The extra bells and whistles that Optik provides (choice sets, etc.) can be added in this manner as well, as external functions that the parser doesn't care about, or as internally-supplied helper functions that the user can call if he wants.
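
As a sketch of how a choice set could live outside the parser, the following Python 3 function builds a transformer of the kind optarg() accepts. The name choice and its error parameter are assumptions made for illustration; error stands in for opt.error and is expected to raise or exit.

```python
def choice(allowed, error):
    """Return a transformer that accepts only strings in `allowed`."""
    def check(text):
        if text not in allowed:
            error("expected one of " + ", ".join(sorted(allowed))
                  + ", got " + repr(text))
        return text
    return check
```

In a parsing loop this would be used along the lines of mode = opt.optarg(choice({"fast", "slow"}, opt.error)); the parser itself never needs to know that choice sets exist.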

[3]

* it automatically generates full help based on snippets of help text you supply with each option

This is the one shortcoming: you have to write the usage message yourself. I feel that the benefit of having much clearer argument parsing makes it worth bearing this burden. Also, tools like Optik have to work fairly hard to present the usage message in a reasonable manner, and if they don't do what you want you either have to write extension code or just write your own usage message anyway. I'd rather give this up and get the rest of the benefits.

[4]

* it has a wide range of "actions" -- ie. what to do with the value supplied with each option. Eg. you can store that value in a variable, append it to a list, pass it to an arbitrary callback function, etc.

Here the code provides the widest possible range of actions: you run arbitrary code for each option, and it's all in one place rather than scattered.

[5]

* you can add new types and actions by subclassing -- how to do this is documented and tested

The need for new actions is obviated by not having actions at all.

The need for new types could be addressed by the argument transformer, although I'm not really happy with that and wouldn't mind seeing it go away. In particular,

ncopies = opt.optarg(opt.typecast(int, 'integer'))

seems a bit more convoluted and slightly ad hoc compared to the straightforward:

try:
    ncopies = int(opt.optarg())
except ValueError:
    opt.error(opt.curopt+' requires an integer argument')

especially when the requirements get complicated, like the integer has to be prime. Perhaps a hybrid is best, using a collection of standard transformers for the common cases and falling back on actual code for the tough ones.
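
For the "integer has to be prime" case, falling back on actual code might look like this sketch in present-day Python 3. The names is_prime and prime_arg are invented here, and error again stands in for opt.error, which is assumed to raise or exit.

```python
def is_prime(n):
    """Trial division; fine for command-line sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_arg(text, error):
    """Convert an option argument to a prime int, or report via error()."""
    try:
        n = int(text)
    except ValueError:
        return error("requires an integer argument, got %r" % text)
    if not is_prime(n):
        return error("requires a prime argument, got %d" % n)
    return n
```

Called as prime_arg(opt.optarg(), opt.error), this keeps the unusual requirement in ordinary code, while the common conversions can still go through a stock transformer.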

[6]

* it's dead easy to implement simple, straightforward, GNU/POSIX-style command-line options, but using callbacks you can be as insanely flexible as you like

Here, ditto, except you don't have to use callbacks in order to be as insanely flexible as you like.

[7]

* provides lots of mechanism and only a tiny bit of policy (namely, the --help and (optionally) --version options -- and you can trash that convention if you're determined to be anti-social)

In this version there is very little mechanism (no need for lots), and no policy. It would be easy enough to add the --help and --version hacks as a standard subclass.
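
One way such a subclass could look, as a sketch in present-day Python 3. BaseParser is a hypothetical, much-reduced stand-in for the attached OptionParser, and StandardParser with its version and out parameters is invented for illustration. The subclass intercepts the two conventional options and passes everything else through to the caller's loop.

```python
import sys

class BaseParser:
    """Hypothetical, much-reduced stand-in for OptionParser."""

    def __init__(self, argv, usage=""):
        self.argv = list(argv)
        self.usage = usage

    def __iter__(self):
        # Yield each leading word that looks like an option.
        while self.argv and self.argv[0].startswith("-"):
            yield self.argv.pop(0)

class StandardParser(BaseParser):
    """Adds the --help / --version policy on top of the bare mechanism."""

    def __init__(self, argv, usage="", version="", out=sys.stdout):
        BaseParser.__init__(self, argv, usage)
        self.version = version
        self.out = out

    def __iter__(self):
        for o in BaseParser.__iter__(self):
            if o in ("-h", "--help"):
                self.out.write(self.usage)
                raise SystemExit(0)
            elif o == "--version":
                self.out.write(self.version + "\n")
                raise SystemExit(0)
            else:
                yield o  # everything else goes to the caller's own loop
```

Programs that dislike the convention simply use the base class, so the policy stays optional.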

Anyhow, there it is. I've attached the code for the parser, which I just whipped up tonight. If people think this is a promising thing to explore and someone else wants to take over exploring, great. If yes promising but no takers, I'm willing to keep at it.

Russ

--- opt.py
from __future__ import generators
import sys, copy

class OptionError(Exception): pass

class OptionParser:
    def __init__(self, argv=sys.argv, usage=None):
        self.argv0 = argv[0]
        self.argv = argv[1:]
        self.usage = usage
        # start with no pending option or argument, so optarg()
        # works even when the first option is a bare long option
        self.curopt = None
        self.curarg = ''

    def __iter__(self):
        # this assumes the "
        while self.argv:
            # stop at a lone '-' or at the first non-option word
            if self.argv[0]=='-' or self.argv[0][:1]!='-':
                break
            a = self.argv.pop(0)
            if a=='--':
                break
            if a[0:2]=='--':
                i = a.find('=')
                if i==-1:
                    self.curopt = a
                    yield self.curopt
                    self.curopt = None
                else:
                    self.curarg = a[i+1:]
                    self.curopt = a[0:i]
                    yield self.curopt
                    if self.curarg:  # wasn't fetched with optarg
                        self.error(self.curopt+' does not take an argument')
                    self.curopt = None
                continue
            # a cluster of short options; optarg() may consume the rest
            self.curarg = a[1:]
            while self.curarg:
                a = self.curarg[0:1]
                self.curarg = self.curarg[1:]
                self.curopt = '-'+a
                yield self.curopt
                self.curopt = None

    def optarg(self, fn=lambda x:x):
        if self.curarg:
            ret = self.curarg
            self.curarg=''
        else:
            try:
                ret = self.argv.pop(0)
            except IndexError:
                self.error(self.curopt+' requires argument')
        return fn(ret)

    def _typecast(self, t, x, desc=None):
        try:
            return t(x)
        except ValueError:
            d = desc
            if d is None:
                d = str(t)
            self.error(self.curopt+' requires '+d+' argument')

    def typecast(self, t, desc=None):
        return lambda x: self._typecast(t, x, desc)

    def args(self):
        return self.argv

    def error(self, msg):
        if self.usage is not None:
            sys.stderr.write('option error: '+msg+'\n\n'+self.usage)
            sys.stderr.flush()
            sys.exit(1)  # nonzero status: this is an error exit
        else:
            raise OptionError(msg)

########

import sys

usagemessage=\
'''usage: example.py [-h] [-f FILE] [-n N] [-q] who where
    -h, --help                  show this help message
    -f FILE, --file=FILE        write report to FILE
    -n N, --num=N               print N copies of the report
    -q, --quiet                 don't print status messages to stdout
'''

def main():
    opt = OptionParser(usage=usagemessage)
    report = 'default.file'
    ncopies = 1
    verbose = 1
    for o in opt:
        if o=='-f' or o=='--file':
            report = opt.optarg()
        elif o=='-n' or o=='--num':
            ncopies = opt.optarg(opt.typecast(int, 'integer'))
        elif o=='-q' or o=='--quiet':
            verbose = 0
        else:
            opt.error('unknown option '+o)
    if len(opt.args()) != 2:
        opt.error('incorrect argument count')
    print 'report=%s, ncopies=%s verbose=%s' % (report, ncopies, verbose)
    print 'arguments: ', opt.args()

if __name__=='__main__':
    main()