Issue 532180: fix xmlrpclib float marshalling bug

Created on 2002-03-19 22:28 by bquinlan, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description
xmlrpc-float.diff	bquinlan, 2002-03-19 22:28	xmlrpclib patch
xmlrpc-float2.diff	bquinlan, 2002-03-20 20:48	Revised double marshalling code
Test-double.py	bquinlan, 2002-03-20 20:48	Test suite for double marshalling code

Messages (17)
msg39277	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-19 22:28
As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead.
msg39278	Author: Martin v. Löwis (loewis) *	Date: 2002-03-20 07:28
It seems the repr of the float is computed twice in every case. I recommend saving the result of the first computation.
msg39279	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 15:02
Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n).
msg39280	Author: Martin v. Löwis (loewis) *	Date: 2002-03-20 16:03
You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says

# There is no representation for infinity or negative
# infinity or "not a number". At this time, only decimal
# point notation is allowed, a plus or a minus, followed by
# any number of numeric characters, followed by a period
# and any number of numeric characters. Whitespace is not
# allowed. The range of allowable values is
# implementation-dependent, is not specified.

That would be best validated with a regular expression.
msg39281	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 16:23
The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind) will implement it as given. What do other implementations do?
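A literal reading of the grammar quoted from the spec can be sketched as a regular expression. This is only an illustration: the names `SPEC_DOUBLE` and `is_spec_double` are hypothetical, and treating the sign as mandatory is one interpretation of the wording, not something xmlrpclib implements.

```python
import re

# Hypothetical pattern for a literal reading of the spec text:
# a plus or a minus, any number of digits, a period, any number of digits.
SPEC_DOUBLE = re.compile(r"[+-][0-9]*\.[0-9]*")


def is_spec_double(text):
    """Return True if the whole string matches the literal grammar."""
    return SPEC_DOUBLE.fullmatch(text) is not None
```

Under this reading "+1.5" and even "+." are accepted, while "1.0", "+1e20" and "inf" are rejected, which is the oddity the message above points out.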
msg39282	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-20 17:31
Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch.
msg39283	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 17:53
"%f" can produce exponent notation too, which is also not allowed by this pseudo-spec.

    r = repr(some_double)
    if 'n' in r or 'N' in r:
        raise ValueError(...)

is robust, will work fine x-platform, and isn't insane.
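The check suggested in this message can be wrapped in a small runnable function. This is a minimal sketch, not the code from the attached patch: the function name `check_finite_repr` and the error message are assumptions made for illustration.

```python
def check_finite_repr(value):
    """Return repr(value), or raise ValueError for an infinity or NaN."""
    r = repr(value)
    # No ordinary float repr contains the letter n, while spellings such
    # as 'inf', 'INF', 'nan', 'NaN' and '-1.#IND' all do.
    if "n" in r or "N" in r:
        raise ValueError("cannot marshal %s as an XML-RPC double" % r)
    return r
```

On current Python versions, `math.isfinite(value)` expresses the same test without inspecting the string.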
msg39284	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 18:08
Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" intended to outlaw exponent notation.
msg39285	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-20 18:57
Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say:

    f    Floating point decimal format

I wonder if it is the underlying C library refusing to write large float values in decimal format.
msg39286	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 19:04
Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account. I can't parse your question about the C library (like, I don't know what you mean by "decimal format").
msg39287	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-20 19:32
I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq-strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case.
msg39288	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 20:13
If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself).
msg39289	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-20 20:48
Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, could you please just check in your infinity/NaN detection code (also part of my patch)?
msg39290	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 22:53
I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, so leaves important questions unanswered. OTOH, it seems to me that the point of this protocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers.
msg39291	Author: Brian Quinlan (bquinlan) *	Date: 2002-03-20 23:24
OK, this floating point stuff is over my head.

Is it OK that it loses accuracy? - No
Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes
Is it OK that it raises OverflowError for the normal double 1e-300? - No

Would exposing and using the C %f specifier, along with repr, make for identical roundtrips?
msg39292	Author: Tim Peters (tim.peters) *	Date: 2002-03-20 23:55
Python's internal format buffers are too small to use C %f in its full generality, so you're suggesting something there that's much harder to get done than you suspect. Note that %f isn't a cure-all anyway, as in either Python or C, e.g., '%f' % 1e-10 throws away all information, producing a string of zeroes. What you did is usually much better than that. Let's wait to hear what /F wants to do. If he's inclined to take this part of the spec at face value, I can work with him to write a "conforming" float->string that's numerically sound. Else it's a lot of tedious work for no reason.
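The loss of information with '%f', and one way a roundtrip-safe conversion without an exponent might look today, can be sketched as follows. This is not the code from the attached patch and was not proposed in this thread: the function name `float_to_plain_decimal` is hypothetical, and the approach assumes a current Python 3 where repr() of a float reads back as the same float and the `decimal` module is available. It does not add the leading plus sign that a literal reading of the spec would require.

```python
import math
from decimal import Decimal


def float_to_plain_decimal(value):
    """Format a finite float without an exponent, preserving roundtrips."""
    if not math.isfinite(value):
        raise ValueError("infinity and NaN have no XML-RPC representation")
    # repr() yields a string that reads back as the same float; Decimal
    # re-expresses exactly that string in positional notation.
    text = format(Decimal(repr(value)), "f")
    if "." not in text:
        text += ".0"  # keep a decimal point, as the quoted grammar expects
    return text


print("%f" % 1e-10)                   # 0.000000
print(float_to_plain_decimal(1e-10))  # 0.0000000001
print(float_to_plain_decimal(1e20))   # 100000000000000000000.0
```

The price is the one already raised in the thread: values such as 1e300 or 1e-300 become strings several hundred characters long.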
msg39293	Author: Martin v. Löwis (loewis) *	Date: 2003-03-28 23:25
I'll conclude that it is a lot of tedious work for no reason, and close this patch.

History
Date	User	Action	Args
2022-04-10 16:05:07	admin	set	github: 36287
2002-03-19 22:28:46	bquinlan	create