Our reference is a typical Java application packaged as a JAR archive (d). All other formats will be compared to this one.
(a) TAR is an uncompressed archive format. It includes a lot of
information like owner, rights, symbolic links, ... Each file has a
quite big (and almost empty) header.
(b) The jar command can build uncompressed archives (option -0). The
resulting file should be considered as the reference for uncompressed
archives.
(c,e) The pack200 command has a repack option (-r). On both, the gain
is really minimal so I don't know what the purpose of this option is.
One important point is that (a..e) provide direct access to a
particular file, whereas (f..l) do not. This feature is very important
because of dynamic class loading.
(f) This is the uncompressed result of the pack200 command. What we
see is a 20% gain resulting from a "simple" reorganisation of the
archive.
(g) This is a TAR archive compressed with gzip. This file is 22%
smaller and contains more information than the standard JAR file.
(h,i) This is the uncompressed JAR compressed with gzip and bzip2
respectively.
(j) The TAR file compressed with bzip2 is quite common and provides a
very good compression ratio.
(k,l) The pack200 format is an uncompressed archive format. These files
should then be compressed. Surprisingly, gzip gives a better result
than bzip2.
(m) We used the options -O (--no-keep-file-order) and -E9
(--effort=9). The additional gain over (l) doesn't justify the use of
these options.
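
Note (b) contrasts uncompressed archives (jar -0) with the standard compressed JAR. As a rough illustration of that difference, the sketch below builds a one-entry archive in memory twice with java.util.zip, once with a STORED entry and once with the default DEFLATED method. The class name StoredVsDeflated, the helper buildArchive and the placeholder data are invented for this example; it does not reproduce the measurements in this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: compares a stored (uncompressed) entry with a deflated one. */
class StoredVsDeflated {

    /** Builds a one-entry archive in memory and returns its bytes. */
    static byte[] buildArchive(String entryName, byte[] content, boolean compress) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            ZipEntry entry = new ZipEntry(entryName);
            if (!compress) {
                // STORED entries must declare their size and CRC-32 up front.
                CRC32 crc = new CRC32();
                crc.update(content, 0, content.length);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(content.length);
                entry.setCompressedSize(content.length);
                entry.setCrc(crc.getValue());
            }
            zip.putNextEntry(entry);
            zip.write(content, 0, content.length);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sample = new byte[10_000]; // placeholder data, highly compressible
        System.out.println("stored:   " + buildArchive("Demo.class", sample, false).length + " bytes");
        System.out.println("deflated: " + buildArchive("Demo.class", sample, true).length + " bytes");
    }
}
```

Real class files compress far less well than this placeholder data, so the ratio printed here says nothing about the ratios in the table.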
Conclusion
The new pack200 format does an excellent job of reducing the size of
JAR archives. Combined with gzip, it yields an impressive 56% gain.
This is more than enough to justify the addition of this new API to
the JRE. The main problem with it is the lack of direct access to the
contained files (you have to unpack the JAR archive first). I think it
would be very useful to include this feature in the JRE. When will we
see a PackFile class (similar to JarFile)? If that is not possible, the
JCP should propose a new file format that combines the high compression
ratio of Pack200 with direct access. Another interesting point is that
this API is based on interfaces (this is so rare that it should be
mentioned), which means you can implement your own Packer.
PS: Support for Pack200 files in JDistro is planned.
Pack200: reduce JAR file size by 56.18%
URL: Pack200 Javadoc
At 9:43 AM on Apr 26, 2005, Guillaume Desnoix wrote:
Pack200 is a new file format for distributing Java archives. There is also a minimal API.
     Size (bytes)  Filename             %Size  %Gain
(a)       7362560  appli.tar           218.80      -
(b)       5882349  appli-0.jar         174.80      -
(c)       5881813  appli-0repack.jar   174.80      -
(d)       3364097  appli.jar           100.00      -
(e)       3311845  appli-repack.jar     98.44   1.56
(f)       2683596  appli.pack           79.77  20.23
(g)       2591313  appli.tar.gz         77.02  22.98
(h)       2571792  appli-0gz9.jar.gz    76.44  23.56
(i)       2347698  appli-0bz9.jar.bz2   69.78  30.22
(j)       2299132  appli.tar.bz2        68.34  31.66
(k)       1519341  appli-bz9.pack.bz2   45.16  54.84
(l)       1474374  appli.pack.gz        43.82  56.18
(m)       1453391  appli-OE9.pack.gz    43.20  56.80
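
The %Size and %Gain columns are ratios against the reference archive (d). A minimal sketch of that arithmetic follows; the class and method names are invented for illustration, and because the code does not round the way the table was produced, its results can differ from the table in the last digit.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes the %Size and %Gain columns for one row. */
class ArchiveRatio {

    /** Size of an archive as a percentage of the reference size. */
    static double percentOfReference(long size, long referenceSize) {
        return 100.0 * size / referenceSize;
    }

    /** Space saved relative to the reference, in percent (negative if larger). */
    static double percentGain(long size, long referenceSize) {
        return 100.0 - percentOfReference(size, referenceSize);
    }

    public static void main(String[] args) {
        long reference = 3364097L; // (d) appli.jar
        long packGz = 1474374L;    // (l) appli.pack.gz
        System.out.println(String.format(Locale.ROOT, "%%Size=%.2f %%Gain=%.2f",
                percentOfReference(packGz, reference),
                percentGain(packGz, reference)));
    }
}
```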
20 replies so far
Re: Pack200: 56.18%
What problem is this solving?
Re: Pack200: reduce JAR file size by 56.18%
> The main problem with it is the lack of direct access to the
> contained files (you have to unpack the JAR archive first). I think
> it would be very useful to include this feature in the JRE. When
> will we see a PackFile class (similar to JarFile)? If it is not
> possible, the JCP should propose a new file format with the high
> compression ratio of Pack200 and direct access.
Have you ever looked at the specification for the format? It's frightening in its complexity. The only fixed-length field within the entire structure is the magic number (0xCAFED00D) that identifies the file format, so finding any particular piece of data would be a real problem.
It simply isn't possible to access individual files within a Pack200 archive. The data for each class is interwoven with all the others so you'd have a difficult time finding all the bits you need to assemble the class. Plus, as far as I can tell, the decompression step relies on data from *all* of the files in the archive so it just isn't possible to decompress a single file without also processing all the others.
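
As a small illustration of the one fixed-length field mentioned above, the following sketch checks the first bytes of a file for the Pack200 magic number and, for comparison, for the class-file, ZIP/JAR and gzip signatures. The class name MagicSniffer and its sniff method are invented for this example, and the sketch looks only at the leading bytes; it does not parse any of the formats.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: guesses a file type from its leading bytes. */
class MagicSniffer {
    static final int PACK200_MAGIC = 0xCAFED00D;    // magic number quoted in the reply above
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE; // Java class file
    static final int ZIP_LOCAL_HEADER = 0x504B0304; // "PK\3\4", start of a ZIP/JAR file

    static String sniff(byte[] head) {
        // gzip streams start with the two bytes 0x1F 0x8B.
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return "gzip";
        }
        if (head.length < 4) {
            return "unknown";
        }
        // ByteBuffer reads big-endian by default, matching the byte order of these signatures.
        int magic = ByteBuffer.wrap(head, 0, 4).getInt();
        switch (magic) {
            case PACK200_MAGIC:    return "pack200";
            case CLASS_FILE_MAGIC: return "class";
            case ZIP_LOCAL_HEADER: return "zip";
            default:               return "unknown";
        }
    }
}
```

A gzip-compressed pack file is reported as gzip here, since only the outer container is visible from the first bytes.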
LZMA can beat gzip and bzip2
I've compressed packed jars with LZMA and that can shrink even better than pack200 + gzip. The LZMA compressor is used by 7z; a Java SDK is available too. Here are my results using the jar file from JDiskReport 1.2.2:
I suggest that you post the jar file you used, so others can experiment with it.
Regards,
Karsten
References:
LZMA - http://www.7-zip.org/sdk.html
JDiskReport - http://www.jgoodies.com/downloads/
How does this affect startup speed?
That's well and good. But the main question I have is, how does this affect startup speed? These files need to be decompressed before being executed, and having a very high compression might jeopardize the start up speed. Memory consumption might suffer too, depending on how the compression algorithm works.
Re: How does this affect startup speed?
Pack200 is strictly a network transmission format. If you were distributing your app, say via WebStart, then it would be downloaded and extracted to a JAR. From then on, it is executed from the JAR file. You would never execute it directly from the Pack200 file.
Re: LZMA can beat gzip and bzip2
Is LZMA supported by Java's built-in compression library? I'm guessing not, unless its output is compatible with the DEFLATE algorithm.
Re: Pack200: reduce JAR file size by 56.18%
I have an idea, but I don't know whether it's valuable or not.
It looks like class files are big because of the constant pool: most of its entries are strings from signatures, and most classes share the same strings.
So could we define a new class file format that shares strings (the shared constant pool would be saved as a single file in the jar file)? When a class is loaded, we would rebuild it for the JVM (or we might have to rewrite the JVM to understand such a new class format).
It looks like a .NET assembly, which shares the metadata for the entire assembly, but with fewer modifications to the class format (only the format, not the semantics).
Re: How does this affect startup speed?
As a transmission format, this starts to make sense to me. As a local storage format, decompression of class data on the fly is pretty grotesque - even just jar compression is grotesque - because it adds an element of indeterminacy - heavy classloading can cause an app to seem unresponsive or behave unpredictably, performance-wise, which is why it's almost always best to create jars with no compression at all.
In a perfect world, loading a class would be memory mapping a segment of data from an archive and using that data in place. NIO can't quite do that yet (you still do memory copies to actually access the data), but it's close.
Tim Boudreau, NetBeans.org
Evangelist/Senior Staff Engineer, Sun Microsystems
Re: LZMA can beat gzip and bzip2
The Java 5 core doesn't contain an LZMA decompressor. However, there's a Java SDK for LZMA compression/decompression, see the reference to the SDK given in my previous post.
- Karsten
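
For comparison, this is roughly what the built-in route looks like: java.util.zip ships gzip streams (which use DEFLATE), so a round trip needs no extra library. This is a minimal sketch only; the class name GzipRoundTrip and its two helpers are invented here, and it says nothing about how LZMA would compare.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip round trip using only java.util.zip. */
class GzipRoundTrip {

    /** Compresses the given bytes into a gzip stream held in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data, 0, data.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses a gzip stream held in memory back into the original bytes. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```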
Re: Pack200: reduce JAR file size by 56.18%
By the way, another, complementary way of reducing jar size is static analysis. It has its drawbacks and may require knowledge of the application, in particular of whether it does dynamic classloading, but for certain apps it's well worth the effort.
See http://sadun-util.sourceforge.net/pack.html
Re: How does this affect startup speed?
> As a transmission format, this starts to make sense
> to me. As a local storage format, decompression of
> class data on the fly is pretty grotesque - even just
> jar compression is grotesque - because it adds an
> element of indeterminacy - heavy classloading can
> cause an app to seem unresponsive or behave
> unpredictably, performance-wise, which is why it's
> almost always best to create jars with no compression
> at all.
Yep. We had one client that was *very* concerned with start-up time. Profiling showed that a significant portion of the time was spent just doing IO in the classloaders. So we told them to make sure that any classes that need to be loaded during start-up MUST be uncompressed. That was a hard sell cuz in the J2ME world, footprint is a *very* important concern too. But profiling the app showed that loading the uncompressed class files took less than half as long.
> In a perfect world, loading a class would be memory
> mapping a segment of data from an archive and using
> that data in place. NIO can't quite do that yet (you
> still do memory copies to actually access the data),
> but it's close.
Yep, which is why a number of J2ME VMs support a "ROMized" form of a class/JAR file that is directly executable. Of course, the in-memory layout of the class data is very much dependent on the VM, so this kind of thing will always be proprietary.
Check out ProGuard
ProGuard does all of that and more. It analyzes your class files to locate unused fields and methods, and removes them. By default, it also obfuscates the names, replacing them with one- or two-character names, which further reduces the class size, but you can turn that feature off if you prefer.
Re: Pack200: reduce JAR file size by 56.18%
That's precisely what the Pack200 format does. It removes *all* duplication of any kind, including duplicate constant pool entries. Constant pool entries are also grouped by type and sorted to improve their compressibility.
If you're interested, you should look up JSR 200 (which defines the Pack200 format). It includes references to previous experiments to reduce classfile size. As I recall, one of the earlier formats was essentially a JAR file with a common constant pool for all classes, just as you described.
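
The common-constant-pool idea can be illustrated with a toy example. The sketch below is not the Pack200 encoding: it only shows how strings repeated across several classes can be stored once in a shared pool and referenced by index. The class SharedPool and the sample constants are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a constant pool shared by several classes (not the Pack200 format). */
class SharedPool {
    private final Map<String, Integer> indexByString = new LinkedHashMap<>();

    /** Returns the pool index of the string, adding it on first use. */
    int intern(String constant) {
        Integer existing = indexByString.get(constant);
        if (existing != null) {
            return existing;
        }
        int index = indexByString.size();
        indexByString.put(constant, index);
        return index;
    }

    /** Replaces each constant of one class by its index in the shared pool. */
    int[] encode(List<String> classConstants) {
        int[] indices = new int[classConstants.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = intern(classConstants.get(i));
        }
        return indices;
    }

    /** Number of distinct strings stored so far. */
    int size() {
        return indexByString.size();
    }
}
```

With two classes that both reference java/lang/Object and the descriptor ()V, the pool stores each of those strings once, and each class keeps only small integer references.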
Re: LZMA can beat gzip and bzip2
Hi, unfortunately I cannot post the jar file. I should have chosen one I could. Here are the results with 7zip:
Size (bytes)  Filename            %Size  %Gain
     3364097  appli-std.jar      100.00   0.00
     2683596  appli-std.pack      79.77  20.23
     1474374  appli-std.pack.gz   43.82  56.18
     1297993  appli-std.pack.7z   38.58  61.42
7zip provides an additional 5%. Excellent!
Regards, Guillaume