RFR for 6443578 and 6202130: UTF-8 in Manifests (original) (raw)

Philipp Kunz philipp.kunz at paratix.ch
Wed May 2 05:21:37 UTC 2018


Hi,

Recently, I tried to fix only bug 6202130 with the intention to fix bug 6443578 later with the intention to get some opportunity for feedback, but haven't got any, and propose now a fix for both together which in my opinion makes more sense.

See attached patch.

Some considerations, assumptions, and explanations

suggested changes or additions to the bug database: (i have no permissions to edit it myself)

Now, I would be glad if someone sponsored a review. This is only my third attempt to submit a patch which is why I chose a lesser important subject to fix in order to get familiar and now I understand it's not the most attractive patch to review. Please don't hesitate to suggest what I could do better or differently.

As a bonus, with these changes, manifest files will always be displayed correctly with just any utf capable viewer even if they contain multi-byte utf characters that would have been broken across a line break with the current/previous implementation and all manifests will become also valid strings in Java.

Regards, Philipp

On 20.04.2018 00:58, Philipp Kunz wrote:

Hi,

I tried to fix bug 6202130 about manifest utf support and come up now with a test as suggested in the bug's comments that shows that utf charset actually works before removing the comments from the code. When I wanted to remove the XXX comments about utf it occurred to me that version attributes ("Signature-Version" and "Manifest-Version") would never be broken across lines and should anyway not support the whole utf character set which sounds more like related to bugs 6910466 or 4935610 but it's not a real fit. Therefore, I could not remove one such comment of Attributes#writeMain but I changed it. The first comment in bug 6202130 mentions only two comments but there are three in Attributes. In the attached patch I removed only two of three and changed the remaining third to not mention utf anymore. At the moment, at least until 6443578 is fixed, multi-byte utf characters can be broken across lines. It might be worth a consideration to test that explicitly as well but then I guess there is not much of a point in testing the current behavior that will change with 6443578, hopefully soon. There are in my opinion enough characters broken across lines in the attached test that demonstrate that this still works like it did before. I would have preferred also to remove the calls to deprecated String(byte[], int, int, int) but then figured it relates more to bug 6443578 than 6202130 and now prefer to do that in another separate patch. Bug 6202130 also states that lines are broken by String.length not by byte length. While it looks so at first glance, I could not confirm. The combination of getBytes("UTF8"), String(byte[], int, int, int), and then DataOutputStream.writeBytes(String) in that combination does not drop high-bytes because every byte (whether a whole character or only a part of a multi-byte character) becomes a character in String(...) containing that byte in its low-byte which will be read again from writeBytes(...). Or put in a different way, every utf encoded byte becomes a character and multi-byte utf characters are converted into multiple string characters containing one byte each in their lower bytes. The current solution is not nice, but at least works. With that respect I'd like to suggest to deprecate DataOutputStream.writeBytes(String) because it does something not exactly expected when guessing from its name and that would suit a byte[] parameter better very much like it has been done with String(byte[], int, int, int). Any advice about the procedure to deprecate something? I was surprised that it was not trivial to list all valid utf characters. If someone has a better idea than isValidUtfCharacter in the attached test, let me know. Altogether, I would not consider 6202130 resolved completely, unless maybe all remaining points are copied to 6443578 and maybe another bug about valid values for "Signature-Version" and "Manifest-Version" if at all desired. But still I consider the attached patch an improvement and most of the remainder can then be solved in 6443578 and so far I am looking forward to any kind of feedback. Regards, Philipp

diff -r 2ace90aec488 src/java.base/share/classes/java/util/jar/Attributes.java --- a/src/java.base/share/classes/java/util/jar/Attributes.java Mon Apr 30 21:56:54 2018 -0400 +++ b/src/java.base/share/classes/java/util/jar/Attributes.java Wed May 02 07:20:46 2018 +0200 @@ -296,27 +296,13 @@

  /*
   * Writes the current attributes to the specified data output stream.

IOException { diff -r 2ace90aec488 src/java.base/share/classes/java/util/jar/Manifest.java --- a/src/java.base/share/classes/java/util/jar/Manifest.java Mon Apr 30 21:56:54 2018 -0400 +++ b/src/java.base/share/classes/java/util/jar/Manifest.java Wed May 02 07:20:46 2018 +0200 @@ -26,7 +26,6 @@ package java.util.jar;

import java.io.FilterInputStream; -import java.io.DataOutputStream; import java.io.InputStream; import java.io.OutputStream; import java.io.IOException; @@ -143,31 +142,19 @@ * @exception IOException if an I/O error has occurred * @see #getMainAttributes */

diff -r 2ace90aec488 src/java.base/share/classes/java/util/jar/ManifestWriter.java --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/java.base/share/classes/java/util/jar/ManifestWriter.java Wed May 02 07:20:46 2018 +0200 @@ -0,0 +1,224 @@ +/*

href="{@docRoot}/../specs/jar/jar.html#Notes_on_Manifest_and_Signature_Files">

form.

of 72

writes

current line

break

IOException {

line width

characters max

earlier

occurs

non-characters

order marks

order marks

length");

line break

followed by

break)

to place

to the

have to be

space

relative to

pairs)

to the

int p,

value)

before

of the

the same

required to

@@ -37,8 +41,10 @@

@@ -49,6 +55,11 @@ final static int MAX_HEADER_NAME_LENGTH = 70;

  /**

header

header * name length or that an exception occurs already on the attempt to write * an invalid one otherwise and that no invalid manifest can be written. *

could

width, such

could

width, such * a situation should never happen, which is the subject of this test.

one-byte utf

multi-byte

subject of

this test. */

@@ -111,18 +129,18 @@ }

  /**

provided * the manifest is valid otherwise. *

href="{@docRoot}/../specs/jar/jar.html#Notes_on_Manifest_and_Signature_Files">

earlier or

href="{@docRoot}/../specs/jar/jar.html#Notes_on_Manifest_and_Signature_Files">

(not

sure

sure * that manifest files broken at 70 bytes line width written with the * previous version of {@link Manifest} before this fix still work well. */ @@ -186,7 +204,8 @@ * form. */ String lineBrokenSectionName = breakLines(lineWidth, "Name: "

" + value);

@@ -264,7 +283,7 @@ return mfBytes; }

"----------------------------------------------" + "---------------------|-|-|"; // |-positions: ---68-70-72 System.out.println(sepLine); @@ -272,13 +291,179 @@ System.out.println(sepLine); }

name, + static void assertMainAndSectionValues(Manifest mf, String name, String value) { String mainValue = mf.getMainAttributes().getValue(name); - String sectionValue = mf.getAttributes(name).getValue(name);

      assertEquals(value, mainValue, "value different in main section");

section"); }

a/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java Mon Apr 30 21:56:54 2018 -0400 +++ b/test/jdk/sun/security/tools/jarsigner/LineBrokenMultiByteCharacter.java Wed May 02 07:20:46 2018 +0200 @@ -21,35 +21,48 @@

-/* - * @test - * @bug 6695402 - * @summary verify signatures of jars containing classes with names - * with multi-byte unicode characters broken across lines - * @library /test/lib - */

import java.io.IOException; import java.io.InputStream; import java.nio.file.Files; import java.nio.file.Paths; +import java.util.HashMap; +import java.util.jar.Attributes.Name; import java.util.jar.JarFile; -import java.util.jar.Attributes.Name; import java.util.jar.JarEntry; +import java.util.zip.ZipEntry; +import java.util.zip.ZipFile;

import static java.nio.charset.StandardCharsets.UTF_8;

import jdk.test.lib.SecurityTools; import jdk.test.lib.util.JarUtils;

+import org.testng.annotations.; +import static org.testng.Assert.; + +/**

letter * at its exact position. - * - * because no file with such a name exists {@link JarUtils} will add the + *

+ * Because no file with such a name exists {@link JarUtils} will add the * name itself as contents to the jar entry which would have contained a * compiled class in the original bug. For this test, the contents of the * files contained in the jar file is not important as long as they get @@ -58,55 +71,157 @@ * @see #verifyClassNameLineBroken(JarFile, String) */ static final String testClassName =

"LineBrokenMultiByteCharacterA1234567890B1234567890C123456789D1234\u00E9xyz.class";

existing + * signed jar that already contains an entry with this name which of course + * has to be distinct from the one tested. + */ static final String anotherName =

"LineBrokenMultiByteCharacterA1234567890B1234567890C123456789D1234567890.class";

1.0\r\n\r\n".getBytes(UTF_8);

earlier jdk

"-genkeypair", "-storepass", "changeit", "-keypass", "changeit", "-storetype", "JKS", "-alias", alias, "-dname", "CN=X", "-validity", "366") .shouldHaveExitValue(0);

1.0\r\n").getBytes(UTF_8));

"-storetype",

alias)

refJar.getInputStream(mfEntry).readAllBytes();

testClassName);

Exception {

testClassName);

Exception {

anotherName); SecurityTools.jarsigner("-keystore", keystoreFileName, "-storetype", "JKS", "-storepass", "changeit", "-debug", initialFileName, alias).shouldHaveExitValue(0); JarUtils.updateJar(initialFileName, updatedFileName, testClassName);

anotherName);

"-storetype",

testClassName);

"-storetype", "JKS", "-storepass", "changeit", "-debug", jarFileName, alias) .shouldHaveExitValue(0); @@ -114,34 +229,34 @@ try ( JarFile jar = new JarFile(jarFileName); ) {

expectBrokenEAcute); verifyCodeSigners(jar, jar.getJarEntry(testClassName)); } }

  /**

renaming an

renaming an * identifier so that the multi-byte encoded character would not any longer * be broken across a line break. *

based on

based on * the manifest and not based on the signature file because at the moment, * the signature file does not even contain the desired entry at all.

@@ -156,10 +271,12 @@ bytesMatched = 0; } }

@@ -175,11 +292,16 @@ // a check for the presence of code signers is sufficient to check // bug JDK-6695402. no need to also verify the actual code signers // attributes here.

jarEntry.getName());

jarFileName,

-------------- next part -------------- A non-text attachment was scrubbed... Name: 6372077and6443578.patch Type: text/x-patch Size: 79654 bytes Desc: not available URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20180502/3367c096/6372077and6443578-0001.patch>



More information about the core-libs-dev mailing list