RFR for 6443578: Continuation lines in JAR manifests do not follow RFC-822 (original) (raw)
Philipp Kunz philipp.kunz at paratix.ch
Fri May 4 08:25:18 UTC 2018
- Previous message: RFR for 6202130: java.util.jar.Attributes.writeMain() can't handle multi-byte chars
- Next message: RFR for 6443578 and 6202130: UTF-8 in Manifests / interesting impl and use of deprecated code
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Sherman,
Thanks again for pointing out that fewer changes are sufficient. I prepared another patch for 6443578.
http://files.paratix.ch/jdk/6443578/webrev.02/ http://files.paratix.ch/jdk/6443578/webrev.02.zip
The approach is basically, not to put a line break where a utf character continuation byte follows which can be tested with a bit mask.
For telling first from continuation bytes of utf characters I re-used a method isNotUtfContinuationByte from either StringCoding or UTF_8.Decoder. Unfortunately I found no way not to duplicate it which I consider subject of already known bug 6184334. In case this patch is accepted, I suggest to add a comment to 6184334 that refers to the new copy in Manifest#make72Safe.
LineBrokenMultiByteCharacter should not be removed or not so immediately at least because someone might attempt to sign an older jarfile created without that patch with a newer jarsigner that already contains it if the jar previously contained digests presumably from another signature.
Looking forward to any feedback and with kind regards, Philipp
On 03.05.2018 08:56, Xueming Shen wrote: > Philipp, >> I kinda recalled JDK-6202130 which I read long time ago and I believed > it's not a bug but > just wanted to leave it open and for double check, then it's off the > priority list somehow... >> Reread the code below I think it handles utf8 and the 72-length-limit > appropriately, though > a little tricky, >> (1) value.getByte("utf8") + deprecated String(byte[], int, int, int); > which creates a String that each "char" inside that String object > actually represents one byte > value of the resulted utf8 byte sequence, with its utf16 value = > [hi-byte= 0 and low-byte="utf8-byte"] >> (2) append this String to the StringBuffer (--> StringBuilder), now > the sb.length() actually is the > length of the utf8 byte sequence, make72Safe(...) is used to > adjust the length to 72 working > on "chars" inside the buffer. >> (3) write out the adjusted buffer via DataOutputStream.writeBytes() > in which the "write" cuts off the hi-byte of that utf16, so you > actually get the original > utf-8 byte sequence output to the stream. >> Sure the implementation looks "interesting" and uses deprecated String > constructor, but it was > written 2 decades ago. >> Back then I think we were thinking maybe the fix is to simply remove > the "misleading" comment > line below in the source code, if the implementation actually supports > utf-8 + 72-limit. >> " * XXX Need to handle UTF8 values and break up lines longer than > 72 bytes" >> Any reason/test case to believe the mechanism does not work? >> BTW the logic of line below in writeMain() appears to be incorrect. I > do have another bug for it. > " if ((version != null) && > !(name.equalsIgnoreCase(vername))) {" >> ---------------------- >> StringBuffer buffer = new StringBuffer(name); > buffer.append(": "); >> String value = (String) e.getValue(); > if (value != null) { > byte[] vb = value.getBytes("UTF8"); > value = new String(vb, 0, 0, vb.length); > } > buffer.append(value); >> Manifest.make72Safe(buffer); > buffer.append("\r\n"); > out.writeBytes(buffer.toString()); >> ------------------------- >> sherman >>>> On 5/1/18, 10:21 PM, Philipp Kunz wrote: >> Hi, >>>> Recently, I tried to fix only bug 6202130 with the intention to fix >> bug 6443578 later with the intention to get some opportunity for >> feedback, but haven't got any, and propose now a fix for both >> together which in my opinion makes more sense. >>>> See attached patch. >>>> Some considerations, assumptions, and explanations >>>> * In my opinion, the code for writing manifests was distributed in the >> two classes Attributes and Manifest in an elegant way but somewhat >> difficult to explain the coherence. I chose to group the code that >> writes manifests into a new class ManifestWriter. The main incentive >> for that was to prevent or reduce duplicated code I would have had >> to change twice otherwise. This also results in a source file of a >> suitable size. >> * I could not support the assumption that the write and writeMain >> methods in Attributes couldn't be referenced anywhere so I >> deprecated them rather than having them removed. >> * I assumed the patch will not make it into JDK 10 and, hence, the >> deprecated annotations are attributed with since = 11. >> * I could not figure out any reason for the use of DataOutputStream >> and did not use it. >> * Performance-wise I assume that the code is approximately comparable >> to the previous version. The biggest improvement in this respect I >> hope comes from removing the String that contains the byte array >> constructed with deprecated String(byte[], int, int, int) and then >> copying it over again to a StringBuffer and from there to a String >> again and then Characters. On the other hand, keeping whole >> characters together when breaking lines might make it slightly >> slower. I hope my changes are an overall improvement, but I haven't >> measured it. >> * For telling first from continuation bytes of utf-8 characters apart >> I re-used a method isNotUtfContinuationByte from either StringCoding >> or UTF8.Decoder. Unfortunately I found no way not to duplicate it. >> * Where it said before "XXX Need to handle UTF8 values and break up >> lines longer than 72 bytes" in Attributes#writeMain I did not dare >> to remove the comment completely because it still does not deal >> correctly with version headers longer than 72 bytes and the set of >> allowed values. I changed it accordingly. Two similar comments are >> removed in the patch. >> * I added two tests, WriteDeprecated and NullKeysAndValues, to >> demonstrate compatibility as good as I could. Might however not be >> desired to keep and having to maintain. >> * LineBrokenMultiByteCharacter for jarsigner should not be removed or >> not so immediately because someone might attempt to sign an older >> jarfile created without that patch with a newer jarsigner that >> already contains it. >>>>>>>> suggested changes or additions to the bug database: (i have no >> permissions to edit it myself) >>>> * Re-combine copies of isNotUtfContinuationByte (three by now). >> Relates to 6184334. Worth to file another issue? >> * Manifest versions have specific specifications, cannot break across >> lines and can contain a subset of characters only. Bug 6910466 >> relates but is not exactly the same. If someone else is convinced >> that writing a manifest should issue a warning or any other way to >> deal with a version that does not conform to the specification, I'd >> suggest to create a separate bug for that. >>>>>> Now, I would be glad if someone sponsored a review. This is only my >> third attempt to submit a patch which is why I chose a lesser >> important subject to fix in order to get familiar and now I >> understand it's not the most attractive patch to review. Please don't >> hesitate to suggest what I could do better or differently. >>>> As a bonus, with these changes, manifest files will always be >> displayed correctly with just any utf capable viewer even if they >> contain multi-byte utf characters that would have been broken across >> a line break with the current/previous implementation and all >> manifests will become also valid strings in Java. >>>
Gruss Philipp
Paratix GmbH St Peterhofstatt 11 8001 Zürich
+41 (0)76 397 79 35 philipp.kunz at paratix.ch <mailto:philipp.kunz at paratix.ch> -------------- next part -------------- A non-text attachment was scrubbed... Name: 6443578.patch Type: text/x-patch Size: 43396 bytes Desc: not available URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20180504/c029c971/6443578-0001.patch>
- Previous message: RFR for 6202130: java.util.jar.Attributes.writeMain() can't handle multi-byte chars
- Next message: RFR for 6443578 and 6202130: UTF-8 in Manifests / interesting impl and use of deprecated code
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]