[Python-Dev] Shorter float repr in Python 3.1?

Mark Dickinson dickinsm at gmail.com
Tue Apr 7 16:39:47 CEST 2009


Executive summary (details and discussion points below)

Some time ago, Noam Raphael pointed out that for a float x, repr(x) can often be much shorter than it currently is, without sacrificing the property that eval(repr(x)) == x, and proposed changing Python accordingly. See

http://bugs.python.org/issue1580

For example, instead of the current behaviour:

Python 3.1a2+ (py3k:71353:71354, Apr 7 2009, 12:55:16)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.029999999999999999
>>> 0.04
0.040000000000000001
>>> 0.04 == eval(repr(0.04))
True

we'd have this:

Python 3.1a2+ (py3k-short-float-repr:71350:71352M, Apr 7 2009, )
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.01
0.01
>>> 0.02
0.02
>>> 0.03
0.03
>>> 0.04
0.04
>>> 0.04 == eval(repr(0.04))
True
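As a rough illustration of the two sessions above, the sketch below puts the 17-significant-digit form next to whatever repr() the running interpreter produces. The helper name seventeen_digits is made up for this example, and it assumes the old repr corresponds to '%.17g' formatting. sys.float_repr_style reports 'short' on interpreters that use the shorter repr.

```python
import sys


def seventeen_digits(x):
    # 17 significant digits always suffice to round-trip an IEEE 754 double.
    return '%.17g' % x


for x in (0.01, 0.02, 0.03, 0.04):
    long_form = seventeen_digits(x)
    short_form = repr(x)
    # Both strings should convert back to exactly the same float.
    print(long_form, short_form, float(long_form) == x, float(short_form) == x)

# 'short' where the shorter repr is in use, 'legacy' otherwise.
print(sys.float_repr_style)
```

Both columns round-trip; the only difference is how many digits are shown.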

Initial attempts to implement this encountered various difficulties, and at some point Tim Peters pointed out (I'm paraphrasing horribly here) that one can't have all three of {fast, easy, correct}.

One PyCon 2009 sprint later, Eric Smith and I have produced the py3k-short-float-repr branch, which implements short repr of floats and also does some major cleaning up of the current float formatting functions. We've gone for the {fast, correct} pairing. We'd like to get this into Python 3.1.

Any thoughts/objections/counter-proposals/...?

More details

Our solution is based on an adaptation of David Gay's 'perfect rounding' code for inclusion in Python. To make eval(repr(x)) roundtripping work, one needs to have correctly rounded float -> decimal and decimal -> float conversions: Gay's code provides correctly rounded dtoa and strtod functions for these two conversions. His code is well-known and well-tested: it's used as the basis of the glibc strtod, and is also in OS X. It's available from
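To make the round-trip requirement concrete, here is a deliberately naive sketch. It is not how Gay's dtoa works; it is closer to the 'easy but slow' corner of the trade-off. It tries increasing precisions until the string converts back to the same float. The name shortest_by_search is hypothetical, and the approach relies on the platform's float <-> string conversions being correctly rounded. Because it only tries the correctly rounded string at each length, it is not guaranteed to find the very shortest round-tripping string in every case.

```python
def shortest_by_search(x):
    """Return a short decimal string s with float(s) == x (finite x only)."""
    for precision in range(1, 18):
        candidate = '%.*g' % (precision, x)
        # Accept the first candidate that converts back to the same float.
        if float(candidate) == x:
            return candidate
    # 17 significant digits are always enough for an IEEE 754 double.
    return '%.17g' % x


for value in (0.03, 0.1, 0.1 + 0.2, 1e200):
    print(shortest_by_search(value))
```

Each call may do up to 17 format/parse round trips, which is the kind of cost a dedicated shortest-digits algorithm avoids.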

http://www.netlib.org/fp/dtoa.c

So our branch contains a new file Python/dtoa.c, which is a cut-down version of Gay's original file. (We've removed stuff for VAX and IBM floating-point formats, hex NaNs, hex floating-point formats, locale-aware interpretation of the decimal separator, K&R headers, code for correct setting of the inexact flag, and various other bits and pieces that Python doesn't care about.)

Most of the rest of the work is in the existing file Python/pystrtod.c. Every float -> string or string -> float conversion goes through a function in this file at some point.

Gay's code also provides the opportunity to clean up the current float formatting code, and Eric has reworked a lot of the float formatting in the py3k-short-float-repr branch. This reworking should make finishing off the implementation of things like thousands separators much more straightforward.

One example of this: the previous string -> float conversion used the system strtod, which is locale-aware, so the code had to first replace the '.' by the current locale's decimal separator, then call strtod. There was a similar dance in the reverse direction when doing float -> string conversion. Both these are now unnecessary.
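The 'dance' described above can be sketched roughly as follows. Everything here is a stand-in for illustration: locale_aware_strtod is a hypothetical imitation of a locale-aware C strtod (the real function would stop parsing at an unexpected character rather than raise), and the decimal separator is passed in explicitly instead of being read from the current locale.

```python
def locale_aware_strtod(text, decimal_point):
    """Hypothetical stand-in for a strtod that honours the locale separator."""
    if decimal_point != '.' and '.' in text:
        # In this locale '.' is not the decimal separator, so reject it.
        raise ValueError('unexpected "." in %r' % text)
    return float(text.replace(decimal_point, '.'))


def parse_with_locale_dance(text, decimal_point):
    """Old-style approach: rewrite '.' for the locale, then call strtod."""
    localized = text.replace('.', decimal_point)
    return locale_aware_strtod(localized, decimal_point)


print(parse_with_locale_dance('1.5', ','))   # separator rewritten first
print(parse_with_locale_dance('1.5', '.'))   # nothing to rewrite
```

With a locale-independent conversion routine, the rewriting step in parse_with_locale_dance is simply not needed.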

The current code is pretty close to ready for merging to py3k. I've uploaded a patchset to Rietveld:

http://codereview.appspot.com/33084/show

Apart from the short float repr, and a couple of bugfixes, all behaviour should be unchanged from before. There are a few exceptions:

Discussion points

(1) Any objections to including this in py3k? If there's controversy, then I guess we'll need a PEP.

(2) Should other Python implementations (Jython, IronPython, etc.) be expected to use short float repr, or should it just be considered an implementation detail of CPython? I propose the latter, except that all implementations should be required to satisfy eval(repr(x)) == x for finite floats x.
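The requirement proposed in (2) is easy to spot-check on any implementation. The sketch below draws random 64-bit patterns, reinterprets them as doubles, and checks eval(repr(x)) == x for the finite ones. The helper names are invented for this example, and a random sample is of course only a sanity check, not a proof. Infinities and NaNs are skipped because their reprs ('inf', 'nan') are not Python expressions.

```python
import math
import random
import struct


def random_finite_float(rng):
    """Return a finite double built from a random 64-bit pattern."""
    while True:
        bits = rng.getrandbits(64)
        x = struct.unpack('<d', struct.pack('<Q', bits))[0]
        if math.isfinite(x):
            return x


def roundtrips(x):
    """True if evaluating repr(x) gives back a float equal to x."""
    return eval(repr(x)) == x


rng = random.Random(2009)
samples = [random_finite_float(rng) for _ in range(10000)]
print(all(roundtrips(x) for x in samples))
```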

(3) There's a PEP 3101 line we don't know what to do with. In py3k, we currently have:

>>> format(1e200, '<')
'1.0e+200'

but in our py3k-short-float-repr branch:

>>> format(1e200, '<')
'1e+200'

Which is correct? The py3k behaviour comes from the 'Standard Format Specifiers' section of PEP 3101, where it says:

""" The available floating point presentation types are:

[... list of other format codes omitted here ...]

'' (None) - similar to 'g', except that it prints at least one digit after the decimal point. """

It's that 'at least one digit after the decimal point' bit that's at issue. I understood this to apply only to floats converted to a string without an exponent; this is the way that repr and str work, adding a .0 to floats formatted without an exponent, but leaving the .0 out when the exponent is present.

Should the .0 always be added? Or is it required only when it would be necessary to distinguish a float string from an integer string?

My preference is for the latter (i.e., format(x, '<') should behave in the same way as repr and str in this respect). But I'm biased, not least because the other behaviour would be a pain to implement. Does anyone care?

This email is already too long. I'll stop now.

Mark


