Hashing files/bytes Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents

forax at univ-mlv.fr forax at univ-mlv.fr
Wed May 2 06:53:21 UTC 2018


----- Original Message -----

From: "John Rose" <john.r.rose at oracle.com>
To: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Paul Sandoz" <paul.sandoz at oracle.com>, "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Wednesday, May 2, 2018 07:35:38
Subject: Re: Hashing files/bytes Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents

Here's another potential stacking:

Define an interface ByteSequence, similar to CharSequence, as a zero-copy reference to some stored bytes somewhere. (Give it a long length.) Define bulk methods on it like hash and mismatch and transferTo. Then make File and ByteBuffer implement it. Deal with the cross-product of source and destination types underneath the interface. (Also I want ByteSequence as a way to encapsulate resource data for class files and condy, using zero-copy methods. The types byte[] and String don't scale and require copies.)
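As a rough illustration of the shape John describes, here is a minimal sketch of such an interface. Every name in it (byteAt, of, the exact signatures) is an assumption made for illustration; no ByteSequence type exists in the JDK, and a real design would presumably replace the naive per-byte defaults with bulk, zero-copy operations.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Hypothetical interface: nothing like this exists in the JDK.
interface ByteSequence {
    long length();              // long, so sequences larger than 2 GB can be described
    byte byteAt(long index);    // assumed accessor name, by analogy with CharSequence.charAt

    // Index of the first differing byte, or -1 if both sequences are equal.
    default long mismatch(ByteSequence other) {
        long common = Math.min(length(), other.length());
        for (long i = 0; i < common; i++) {
            if (byteAt(i) != other.byteAt(i)) {
                return i;
            }
        }
        return length() == other.length() ? -1 : common;
    }

    // Writes every byte to out and returns the number of bytes written.
    default long transferTo(OutputStream out) throws IOException {
        long n = length();
        for (long i = 0; i < n; i++) {
            out.write(byteAt(i));
        }
        return n;
    }

    // Feeds every byte to the given digest and completes it.
    default byte[] hash(MessageDigest digest) {
        long n = length();
        for (long i = 0; i < n; i++) {
            digest.update(byteAt(i));
        }
        return digest.digest();
    }

    // Adapter over the remaining bytes of a ByteBuffer; no bytes are copied.
    static ByteSequence of(ByteBuffer buffer) {
        ByteBuffer view = buffer.slice();   // independent position and limit
        return new ByteSequence() {
            @Override public long length() { return view.limit(); }
            @Override public byte byteAt(long index) { return view.get(Math.toIntExact(index)); }
        };
    }
}
```

The sketch uses an adapter (ByteSequence.of) rather than having ByteBuffer implement the interface directly, only because existing JDK classes cannot be retrofitted from outside the JDK.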

Your ByteSequence is ByteBuffer! A ByteBuffer can be a mapped file or wrap a byte array, mismatch is compareTo, transferTo is put(ByteBuffer), and hash should be messageDigest.digest(ByteBuffer), which doesn't exist but should.
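A small sketch of the correspondences Rémi lists, using only existing ByteBuffer, FileChannel and MessageDigest calls. Since MessageDigest.digest(ByteBuffer) does not exist, the hash goes through update(ByteBuffer) followed by digest(). The class and method names are illustrative assumptions, not an existing API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class ByteBufferAsByteSequence {

    // Hash the remaining bytes of a buffer without first copying them into a byte[].
    // Works on a duplicate so the caller's position is left untouched.
    static byte[] digest(String algorithm, ByteBuffer bytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        md.update(bytes.duplicate());
        return md.digest();
    }

    // Map a whole file read-only. A single mapping cannot exceed Integer.MAX_VALUE bytes.
    static ByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // "transferTo" expressed as put(ByteBuffer): copy src's remaining bytes into a new heap buffer.
    static ByteBuffer copyOf(ByteBuffer src) {
        ByteBuffer dst = ByteBuffer.allocate(src.remaining());
        dst.put(src.duplicate());
        return dst.flip();
    }
}
```

With these, a mapped file and a wrapped array can be compared with mapped.compareTo(wrapped) == 0. One limitation worth keeping in mind is that a ByteBuffer is indexed by int, whereas the ByteSequence proposal asks for a long length.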

— John

Rémi

On May 1, 2018, at 3:04 PM, forax at univ-mlv.fr wrote:

----- Original Message -----

From: "Paul Sandoz" <paul.sandoz at oracle.com>
To: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Alan Bateman" <Alan.Bateman at oracle.com>, "nio-dev" <nio-dev at openjdk.java.net>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
Sent: Tuesday, May 1, 2018 00:37:57
Subject: Hashing files/bytes Re: RFR(JDK11/NIO) 8202285: (fs) Add a method to Files for comparing file contents

Thanks, better than I expected with the transferTo method we recently added, but I think we could do even better for the ease-of-use case of "give me the hash of this file's contents or these bytes or this byte buffer".

Yes, it can be a nice addition to java.nio.file.Files, and in that case the documentation of the method that compares contents can refer to this new method.

Paul. Rémi

On Apr 30, 2018, at 3:23 PM, Remi Forax <forax at univ-mlv.fr> wrote:

To Remi’s point this might dissuade/guide developers from using this method when there are other more efficient techniques available when operating at larger scales. However, it is unfortunately harder than it should be in Java to hash the contents of a file, a byte[] or ByteBuffer, according to some chosen algorithm (or a good default).

it's 6 lines of code

    var digest = MessageDigest.getInstance("SHA1");
    try(var input = Files.newInputStream(Path.of("myfile.txt"));
        var output = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
      input.transferTo(output);
    }
    var hash = digest.digest();

or 3 lines if you don't mind loading the whole file in memory

    var digest = MessageDigest.getInstance("SHA1");
    digest.update(Files.readAllBytes(Path.of("myfile.txt")));
    var hash = digest.digest();

Paul.

Rémi
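For readers who want to try the "file, a byte[] or ByteBuffer" convenience discussed in this message, the snippets can be wrapped into overloads along the following lines. This is only a sketch: ContentHash and its method names are hypothetical and are not part of java.nio.file.Files or any other JDK class, and the algorithm is left as a parameter rather than picking a default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of java.nio.file.Files.
final class ContentHash {
    private ContentHash() {}

    // Streams the file through the digest, so the whole file is never held in memory.
    static byte[] of(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream input = Files.newInputStream(file);
             OutputStream output = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
            input.transferTo(output);
        }
        return digest.digest();
    }

    // Hashes an in-memory byte array.
    static byte[] of(byte[] bytes, String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).digest(bytes);
    }

    // Hashes the remaining bytes of a buffer; a duplicate keeps the caller's position unchanged.
    static byte[] of(ByteBuffer bytes, String algorithm) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        digest.update(bytes.duplicate());
        return digest.digest();
    }
}
```

A call such as ContentHash.of(Path.of("myfile.txt"), "SHA-256") then yields the same bytes as the six-line version, with the checked exceptions surfacing at a single call site.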


