[Python-Dev] Performance of various marshallers

Skip Montanaro skip@pobox.com (Skip Montanaro)
Tue, 2 Oct 2001 19:33:59 -0500


>> That's precisely why py-xmlrpc is faster.  Should it behave some
>> other way?  I don't think there is another XML-RPC parser out there
>> that is available from Python but that doesn't use Python.

Paul> Okay, so we agree that the fast part is probably not so much the
Paul> parser but the handing of data to Python. So why rewrite a parser?
Paul> Nothing requires an Expat-using XML-RPC implementation to call
Paul> back into Python for every element. It can collect the results in
Paul> C and then call Python when it has values.

You're asking the wrong person. Shilad will be the only person who can describe his motivations. We happen to work in the same building, but we don't work for the same company. That's a coincidence about on par with the chances of winning the Powerball lottery. We never met each other formally until about a week ago. Not trying to put words in his mouth, but my guess would be that he was not approaching it as an XML problem, but as a parsing problem.

>> I don't see how you can't make that connection.  XML-RPC
>> has a fixed vocabulary and never needs to look at intermediate
>> results.

Paul> Let me suggest an analogy. Someone writes "CGIPython". It uses a
Paul> specially optimized parser designed for parsing only Python CGI
Paul> scripts.  Do you think it would run much faster than the regular
Paul> Python parser?

Bad analogy. CGI scripts can contain the entire realm of "stuff" that goes into any other Python program. XML-RPC encodings can't contain arbitrary XML tags or attributes. A better analogy would have been (Martin's I think) hypothetical Swallow - a subset of Python that could be efficiently compiled.
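To make the fixed-vocabulary point concrete, here is a minimal sketch, assuming a current Python 3 and its standard xml.parsers.expat module. The names XMLRPC_TAGS and unknown_tags are made up for illustration, and the tag list is my reading of the XML-RPC spec, not anything taken from py-xmlrpc.

```python
import xml.parsers.expat

# Element names defined by the XML-RPC spec (assumed complete for this sketch).
XMLRPC_TAGS = frozenset({
    "methodCall", "methodName", "methodResponse", "params", "param",
    "fault", "value", "i4", "int", "boolean", "string", "double",
    "dateTime.iso8601", "base64", "struct", "member", "name",
    "array", "data",
})


def unknown_tags(document):
    """Return the element names in document that XML-RPC does not define."""
    seen = set()
    parser = xml.parsers.expat.ParserCreate()
    # Record every start tag; attributes are ignored.
    parser.StartElementHandler = lambda tag, attrs: seen.add(tag)
    parser.Parse(document, True)
    return seen - XMLRPC_TAGS


call = (
    "<methodCall><methodName>echo</methodName><params>"
    "<param><value><int>1</int></value></param>"
    "</params></methodCall>"
)
print(sorted(unknown_tags(call)))
print(sorted(unknown_tags("<html><body>hi</body></html>")))
```

A well-formed XML-RPC call should leave the first result empty, while arbitrary XML shows up in the second. A decoder that only ever has to recognize this handful of names is the kind of subset I mean.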

Paul> I don't personally see much benefit using XML if you don't adhere
Paul> to the XML spec.  Just perusing the code quickly I believe I've
Paul> found a few bugs that it would not have had if it built on Expat
Paul> or some other XML parser.

Paul, you have to stop looking at XML-RPC with your Elton John-style XML-colored glasses. XML-RPC is not meant to be some sort of highly structured hierarchical data representation that you can sniff around in with arbitrary XML tools of one sort or another. That its on-the-wire representation happens to be XML is almost ridiculously unimportant. Dave Winer created an RPC tool that used XML at about the same time every computer journalist was wetting their pants every time they heard the letters X-M-L. Many implementations were able to leverage existing XML parsing tools to get going quickly, and Dave got some well-deserved publicity that he and XML-RPC wouldn't have gotten if he'd chosen some other serialization format like Pickle, or invented something new. Next step: make it go faster. Can that be done with standard XML tools? Yeah, I'm sure it can be. Not everybody approaches the problem with the same background you have though.

Paul>  1. It doesn't handle ? syntax.

Paul>  2. It doesn't handle <methodCall > (extra whitespace)

Paul>  3. I strongly suspect it won't handle comments in the XML.

Paul>  4. It won't handle the mandatory UTF-16 encoding from XML

Paul>  5. It won't handle CDATA sections.

Fine. I'm sure Shilad appreciates the input. I think your approach to bug detection and reporting could have been a bit less heavy-handed.

As for handling things like CDATA, UTF-16 and extra whitespace after tag names, I suspect some other XML-RPC packages would exhibit similar problems if they were exposed to a standards-toting XML gunslinger like yourself. That it's not a problem in practice is probably because the set of XML-RPC encoding and decoding software is fairly small and the software that encodes into XML-RPC is fairly well-behaved.
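For anyone who wants to try a few of these cases against a decoder, here is a minimal sketch, assuming a current Python 3 whose standard xmlrpc.client module sits on top of Expat. It says nothing about how py-xmlrpc behaves; the payload and the method name "echo" are made up for illustration. It covers the extra whitespace, comment and CDATA cases from the list above and leaves UTF-16 alone.

```python
import xmlrpc.client

# A hand-written request using constructs a hand-rolled parser might trip on:
# whitespace after the tag name, a comment, and a CDATA section.
payload = """<?xml version="1.0"?>
<methodCall >
  <!-- a comment the decoder should skip -->
  <methodName>echo</methodName>
  <params>
    <param><value><string><![CDATA[a < b]]></string></value></param>
    <param><value><int>42</int></value></param>
  </params>
</methodCall>
"""

# loads() returns the decoded parameters as a tuple, plus the method name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

Swapping in another decoder's entry point for xmlrpc.client.loads is a quick way to see which of these cases it accepts.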

XML-RPC's widespread availability and practical interoperability (the XML-RPC website lists 48 implementations) probably owe more to the cooperative nature of the people involved than to the purity of the parsers.

Skip