A human-readable hexadecimal string representing the MD5 sum of the contents of a file, or even just a text string, can be extremely valuable in programming contexts as well as when handling user input. The MD5 concept is covered in detail on Wikipedia.org.
MD5 is very valuable, but the process of calculating MD5 for an array of bytes in Java is not as straightforward as it may seem. In many languages and operating systems (like Linux) calculating MD5 is as simple as running 'md5sum [--text|--binary] [filename]'; the Java implementation is lower level, and is meant to be able to MD5 arbitrary lengths of binary data. In addition, the Java API handles other digest/checksum algorithms, such as SHA and MD2.
Here is a complete walkthrough for calculating MD5 from a file (as a common example).
First, we need to get an implementation of java.security.MessageDigest that is meant to handle MD5 specifically:
    MessageDigest digest = MessageDigest.getInstance("MD5");
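As a rough sketch of how the algorithm lookup behaves, the following checks whether a digest name is known to the running JVM and lists the names the installed providers register. The class name DigestLookup and the helper isAvailable are made up for illustration; MessageDigest.getInstance and Security.getAlgorithms are standard JDK calls.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Set;

// Hypothetical helper class, not part of the original article.
final class DigestLookup {

    // Returns true if this JVM has a provider for the named digest algorithm.
    static boolean isAvailable(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            // getInstance signals an unknown algorithm name with this checked exception.
            return false;
        }
    }

    public static void main(String[] args) {
        // Names of every MessageDigest algorithm registered with the installed providers.
        Set<String> names = Security.getAlgorithms("MessageDigest");
        System.out.println("Available digests: " + names);
        System.out.println("MD5 available: " + isAvailable("MD5"));
    }
}
```

The exact list printed depends on the JVM and its security providers, so it is best treated as a diagnostic rather than something to rely on in code.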
There are several algorithms typically available. The standard algorithm names, and which of them are typically available, are documented in the MessageDigest javadoc.
Now that we have this digest, we need to start calculating the MD5, using a piece of the data at a time. To do this, we need to break the binary data up into byte[] chunks. For this process we can use a standard java.io.InputStream (in this case a FileInputStream), buffering into a byte[].
    File f = new File("c:\\myfile.txt");
    InputStream is = new FileInputStream(f);
    byte[] buffer = new byte[8192];
    int read = 0;
    try {
        // read() returns the number of bytes placed in the buffer, or -1 at end of stream
        while ((read = is.read(buffer)) > 0) {
            // --- MD5 work here.---
        }
    }
    catch (IOException e) {
        throw new RuntimeException("Unable to process file for MD5", e);
    }
    finally {
        try {
            is.close();
        }
        catch (IOException e) {
            throw new RuntimeException("Unable to close input stream for MD5 calculation", e);
        }
    }
This loop should look familiar to most people, and doesn't really have any unique MD5 component in it. This algorithm is, in general, the same that could be used to copy a file in Java or numerous other IO processing tasks. Note that if you are MD5-ing something completely different from a file, you don't need this loop. You could easily be processing user input, segments from an XML file, or even a serialized object. The point is, all MD5 needs is byte data.
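To illustrate the point that MD5 only needs byte data, here is a minimal sketch that digests a string with no read loop at all. The class name Md5OfString and the method md5Bytes are hypothetical, and UTF-8 is an assumed encoding; any encoding works as long as it is chosen explicitly, since different encodings yield different bytes and therefore different sums. Turning the resulting bytes into a hex string is covered further on.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, not part of the original article.
final class Md5OfString {

    // MD5 of the string's UTF-8 bytes (the encoding is an assumption of this sketch).
    static byte[] md5Bytes(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // digest(byte[]) performs a final update with the data and completes the calculation.
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        byte[] md5sum = md5Bytes("hello");
        // An MD5 sum is always 16 bytes, whatever the input length.
        System.out.println(md5sum.length + " bytes: " + Arrays.toString(md5sum));
    }
}
```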
During each iteration of the loop, we need to take the newly read data and tell the message digest to add it to the calculation. We do this by calling MessageDigest.update(byte[]) (or one of its variants). In this case, we can use the variant that also takes an offset and a length:
digest.update(buffer, 0, read);
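The offset and length matter because the final read rarely fills the whole buffer; passing the entire buffer would feed stale bytes into the digest. As a small sketch (the class ChunkedUpdate and its methods are invented for this example), feeding data in pieces through update gives the same result as digesting it in one call:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original article.
final class ChunkedUpdate {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    // Digest all of the data in a single call.
    static byte[] oneShot(byte[] data) {
        return newMd5().digest(data);
    }

    // Digest the same data in pieces of at most chunkSize bytes.
    static byte[] chunked(byte[] data, int chunkSize) {
        MessageDigest digest = newMd5();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            // The last piece may be shorter than chunkSize, just like the last file read.
            int length = Math.min(chunkSize, data.length - offset);
            digest.update(data, offset, length);
        }
        return digest.digest();
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.UTF_8);
        System.out.println(MessageDigest.isEqual(oneShot(data), chunked(data, 7))); // prints true
    }
}
```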
The last step, which is often important but not required in all cases, is to take the finished result of the MD5 calculation and turn it into a manageable format. By default, the MessageDigest class provides a byte[] representing the finished calculation. It is common practice to pass MD5 data around as a hexadecimal text string. I've seen common solutions using Apache Commons libraries (such as the org.apache.commons.codec.binary.Hex.encodeHex(byte[]) method in the Commons Codec project), although it is also possible to perform the conversion with java.math.BigInteger (albeit possibly at a higher processing expense). The only trick with BigInteger is the sign: the first constructor argument is the signum, and passing 1 tells BigInteger to treat the bytes as a positive magnitude (hence the '1' in the following code):
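    BigInteger bigInt = new BigInteger(1, md5sum);
    // bigInt.toString(16) would drop leading zeros, so pad to the 32 hex digits of an MD5 sum
    String output = String.format("%032x", bigInt);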
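For comparison, a hand-rolled conversion that emits two hex characters per byte is sketched below. It is not the Commons Codec implementation, just an illustration of the same idea; the class HexEncoding and method toHex are hypothetical names. The sample bytes also show why a bare BigInteger.toString(16) is risky: leading zero digits disappear.

```java
import java.math.BigInteger;

// Hypothetical helper class, not part of the original article.
final class HexEncoding {

    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Two hex characters per byte, so leading zero bytes are preserved.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int value = bytes[i] & 0xff;                // treat the byte as unsigned
            out[i * 2] = HEX_DIGITS[value >>> 4];       // high nibble
            out[i * 2 + 1] = HEX_DIGITS[value & 0x0f];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0f, (byte) 0xff};
        System.out.println(toHex(sample));                          // prints 000fff
        System.out.println(new BigInteger(1, sample).toString(16)); // prints fff
    }
}
```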
Here is the full example:
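    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class Md5Example { // the class name is arbitrary
        public static void main(String[] args) throws NoSuchAlgorithmException, FileNotFoundException {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            File f = new File("c:\\myfile.txt");
            InputStream is = new FileInputStream(f);
            byte[] buffer = new byte[8192];
            int read = 0;
            try {
                // feed the file to the digest one buffer at a time
                while ((read = is.read(buffer)) > 0) {
                    digest.update(buffer, 0, read);
                }
                byte[] md5sum = digest.digest();
                BigInteger bigInt = new BigInteger(1, md5sum);
                // pad to 32 hex digits; bigInt.toString(16) would drop leading zeros
                String output = String.format("%032x", bigInt);
                System.out.println("MD5: " + output);
            }
            catch (IOException e) {
                throw new RuntimeException("Unable to process file for MD5", e);
            }
            finally {
                try {
                    is.close();
                }
                catch (IOException e) {
                    throw new RuntimeException("Unable to close input stream for MD5 calculation", e);
                }
            }
        }
    }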
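On Java 7 or later, the same steps can be packaged as a reusable method, with try-with-resources taking over the job of the finally block. This is only a sketch under those assumptions: the class FileMd5 and both md5Hex overloads are hypothetical names, and the choice to wrap checked exceptions in unchecked ones is a design preference rather than anything the original code requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original article.
final class FileMd5 {

    // Reads the stream to its end and returns the MD5 sum as 32 lowercase hex digits.
    // The caller remains responsible for closing the stream.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            return String.format("%032x", new BigInteger(1, digest.digest()));
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to process stream for MD5", e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    // Opens the file, digests it, and closes it again even if reading fails.
    static String md5Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Unable to open or close " + file, e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java FileMd5 <file>");
            return;
        }
        System.out.println("MD5: " + md5Hex(Paths.get(args[0])));
    }
}
```

Accepting an InputStream in the core method keeps the digest logic independent of where the bytes come from, which matches the earlier point that a file is only one possible source.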
Very good, straightforward explanation.
I wouldn't worry about the overhead of the BigInteger; there's only one and it's created once.
Hand-written code for converting the array of bytes may be faster, but not by much. Besides, the loop reading the file will take orders of magnitude more time, so any performance gain would be immeasurable.
Paul,
Actually, the reason I brought up the BigInteger is that we have an application that deals with a significant number of files that have to be MD5'ed, and the Hex algorithm in Apache proved faster for us in frequent invocations in a tight loop over files.
In general, however, I agree: performance optimizations shouldn't be made just 'because'. It was effectively free for us since we already had Apache Codec in our classpath.
Getting MD5 Sums in Java
Posted at 1:42 PM on Nov 15, 2006 by R.J. Lorimer.