Rewrite of IBM doublebyte charsets

Xueming Shen Xueming.Shen at Sun.COM
Thu May 21 19:52:35 UTC 2009


Ulf Zibis wrote:

> On 21.05.2009 01:48, Xueming Shen wrote:
>
>> Thanks for the 5 minutes :-)
>>
>> Your FindXYZcoderBugs tests are indeed very helpful for catching most of the "inconsistent" behaviors between different paths by feeding in "random" inputs. TestIBMDB.java diffs the behavior of the old implementation against the new implementation for all "decode-able" bytes and "encode-able" chars, so it gives us some guarantee.
>
> Why do we try to stick to the old behaviour for malformed and/or unmappable input if we don't diff new against old? Then we could also try to treat malformed and/or unmappable input as accurately as possible. As you mentioned, most users don't distinguish between the two, so they won't be affected. On the other hand, users who do make this distinction would probably be happy to get more accurate results, even if they are not identical to the results returned so far.

This is the approach/plan I decided to go with to achieve the goals I listed last time. Sticking with the old behavior for now makes it easy, or say possible, to push in such a big change. You don't want to get stuck on this kind of "arguable" issue when it's not the main goal of the project, and detour to defend or argue whether this is the "correct" change and, if it is correct, whether breaking compatibility is the right thing to do and whether there are people who depend on the old behavior.

If you are just starting a new implementation, you definitely should do all the "right" things. It is a different story when you maintain an existing product.

As I said last time, with this change the implementation and the data structures are now really open and ready for further optimization (instead of looking at a big chunk of data without knowing where it comes from). You can now work on the issues, if any, one by one, including starting the argument about which errors should be "malformed" and which should be "unmappable".

We're (I'm) 60% done after this :-)
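For readers following the thread, here is a minimal sketch of the two ideas discussed above: telling "malformed" apart from "unmappable" results, and diffing two implementations over every two-byte input. It is not TestIBMDB.java or the FindXYZcoderBugs tests. The class and method names (CoderDiffSketch, decodeOutcome, encodeOutcome, countDecodeDiffs) are hypothetical, and standard charsets stand in for the old and new IBM doublebyte coders so that the sketch runs on any JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CoderDiffSketch {

    // Turn a CoderResult into a short label that keeps "malformed" and
    // "unmappable" apart instead of folding them into one error.
    static String label(CoderResult cr, String okText) {
        if (cr.isMalformed()) return "malformed[" + cr.length() + "]";
        if (cr.isUnmappable()) return "unmappable[" + cr.length() + "]";
        if (cr.isOverflow()) return "overflow";
        return "ok:" + okText;
    }

    // Decode one byte sequence with both error actions set to REPORT.
    static String decodeOutcome(Charset cs, byte[] input) {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(16);
        CoderResult cr = dec.decode(ByteBuffer.wrap(input), out, true);
        if (cr.isUnderflow()) cr = dec.flush(out);
        out.flip();
        return label(cr, out.toString());
    }

    // Encode one string with both error actions set to REPORT.
    static String encodeOutcome(Charset cs, String input) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = ByteBuffer.allocate(64);
        CoderResult cr = enc.encode(CharBuffer.wrap(input), out, true);
        if (cr.isUnderflow()) cr = enc.flush(out);
        return label(cr, out.position() + " bytes");
    }

    // Feed every two-byte sequence to both charsets and count the inputs
    // whose outcomes (decoded text or error kind and length) differ.
    static int countDecodeDiffs(Charset oldCs, Charset newCs) {
        int diffs = 0;
        byte[] in = new byte[2];
        for (int b1 = 0; b1 < 256; b1++) {
            for (int b2 = 0; b2 < 256; b2++) {
                in[0] = (byte) b1;
                in[1] = (byte) b2;
                if (!decodeOutcome(oldCs, in).equals(decodeOutcome(newCs, in))) {
                    diffs++;
                }
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // An invalid UTF-8 lead byte is reported as malformed input.
        System.out.println(decodeOutcome(StandardCharsets.UTF_8, new byte[] {(byte) 0xFF}));
        // A char with no US-ASCII mapping is reported as unmappable.
        System.out.println(encodeOutcome(StandardCharsets.US_ASCII, "\u00e9"));
        // Stand-in for "old vs. new": a charset diffed against itself.
        System.out.println(countDecodeDiffs(StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

To compare real implementations, one would pass the old and the new charset objects to countDecodeDiffs. Whether a given IBM charset is available depends on the JDK build. Because the labels include the error kind and length, such a diff would also show changes in malformed/unmappable reporting, which is the compatibility question raised above.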


More information about the core-libs-dev mailing list