Pack200: reduce JAR file size by 56.18%

URL: Pack200 Javadoc

At 9:43 AM on Apr 26, 2005, Guillaume Desnoix DeveloperZone Top 100 wrote:

Pack200 is a new file format for distributing Java archives. There is also a minimal API.



     Size (bytes)  Filename             %Size  %Gain
(a)       7362560  appli.tar           218.80      -
(b)       5882349  appli-0.jar         174.80      -
(c)       5881813  appli-0repack.jar   174.80      -
(d)       3364097  appli.jar           100.00      -
(e)       3311845  appli-repack.jar     98.44   1.56
(f)       2683596  appli.pack           79.77  20.23
(g)       2591313  appli.tar.gz         77.02  22.98
(h)       2571792  appli-0gz9.jar.gz    76.44  23.56
(i)       2347698  appli-0bz9.jar.bz2   69.78  30.22
(j)       2299132  appli.tar.bz2        68.34  31.66
(k)       1519341  appli-bz9.pack.bz2   45.16  54.84
(l)       1474374  appli.pack.gz        43.82  56.18
(m)       1453391  appli-OE9.pack.gz    43.20  56.80


Our reference is a typical Java application packaged as a JAR archive (d). All other formats are compared against this one.

  • (a) TAR is an uncompressed archive format. It stores a lot of metadata such as owner, permissions and symbolic links. Each file has a fairly large (and mostly empty) header.
  • (b) The jar command can build uncompressed archives (option -0). The resulting file should be considered the reference for uncompressed archives.
  • (c,e) The pack200 command has a repack option (-r). In both cases the gain is minimal, so I don't know what the purpose of this option is.
    One important point is that (a..e) provide direct access to a particular file, whereas (f..m) don't. This is very important because of dynamic class loading.
  • (f) This is the uncompressed output of the pack200 command. It shows a 20% gain resulting from a "simple" reorganisation of the archive.
  • (g) This is a TAR archive compressed with gzip. It is about 23% smaller than the standard JAR file, yet contains more information.
  • (h,i) These are the uncompressed JAR (b) compressed with gzip and bzip2 respectively.
  • (j) The TAR file compressed with bzip2 is quite common and provides a very good compression ratio.
  • (k,l) The pack200 output (f) is itself uncompressed, so it should be compressed afterwards. Surprisingly, gzip gives a better result than bzip2.
  • (m) We used the options -O (--no-keep-file-order) and -E9 (--effort=9). The additional gain over (l) doesn't justify the use of these options.
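
For the JAR-based formats (b..e), that direct access comes from the ZIP central directory at the end of the archive, which maps each entry name to its offset; java.util.jar.JarFile uses it to read one class without touching the others. A minimal sketch (the class name JarRandomAccess, its helpers and the sample entry names are hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarRandomAccess {

    /** Writes a temporary JAR with entries pkg/Class0.class .. pkg/Class(n-1).class, each holding "payload-i". */
    static File writeSampleJar(int entries) {
        try {
            File jar = File.createTempFile("sample", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar.toPath()))) {
                for (int i = 0; i < entries; i++) {
                    out.putNextEntry(new JarEntry("pkg/Class" + i + ".class"));
                    out.write(("payload-" + i).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads a single entry. JarFile (a ZipFile) looks the name up in the central
     * directory and seeks straight to that entry's data, so no other entry is
     * decompressed. Returns null if the entry does not exist.
     */
    static byte[] readEntry(File jar, String name) {
        try (JarFile jf = new JarFile(jar)) {
            JarEntry entry = jf.getJarEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = jf.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toByteArray();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File jar = writeSampleJar(1000);
        byte[] data = readEntry(jar, "pkg/Class742.class");
        System.out.println(new String(data, StandardCharsets.UTF_8)); // payload-742
    }
}
```

Reading pkg/Class742.class out of the 1000-entry sample only decompresses that one entry.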

Conclusion

The new Pack200 format does an excellent job of reducing the size of JAR archives. Combined with gzip, it yields an impressive 56% gain. This is more than enough to justify the addition of this new API to the JRE. The main problem with it is the lack of direct access to the contained files (you have to unpack the JAR archive first). I think it would be very useful to include this feature in the JRE. When will we see a PackFile class (similar to JarFile)? If it is not possible, the JCP should propose a new file format with the high compression ratio of Pack200 and direct access. Another interesting point is that this API is based on interfaces (this is so rare that it deserves mention). It means you can implement your own Packer.

PS: Support for Pack200 files in JDistro is planned.
1 . At 1:33 PM on Apr 26, 2005, Nils Kilden-Pedersen wrote:

Re: Pack200: 56.18%

What problem is this solving?
Easy relational to object mapping: O/R Broker
2 . At 2:31 PM on Apr 26, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Re: Pack200: reduce JAR file size by 56.18%

> The main problem with it is the lack of direct access to the
> contained files (you have to unpack the JAR archive first). I think
> it would be very useful to include this feature in the JRE. When
> will we see a PackFile class (similar to JarFile)? If it is not
> possible, the JCP should propose a new file format with the high
> compression ratio of Pack200 and direct access.

Have you ever looked at the specification for the format? It's frightening in its complexity. The only fixed-length field within the entire structure is the magic number (0xCAFED00D) that identifies the file format, so finding any particular piece of data would be a real problem.
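
To see why having no fixed-length fields rules out random access, consider a stream of variable-length integers. The sketch below uses a simple LEB128-style varint as a stand-in; Pack200's own variable-length codings are more elaborate, so this illustrates the idea, not the real format (class and method names are hypothetical). Since no value has a fixed size, locating value n means decoding values 0 through n-1 first.

```java
import java.io.ByteArrayOutputStream;

final class VarintScan {

    /** Appends value (treated as unsigned 32-bit) as a varint: 7 data bits per byte, high bit = "more bytes follow". */
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /**
     * Returns the n-th (0-based) varint in buf; n must be less than the number of
     * encoded values. There is no offset table, so every preceding value has to be
     * decoded just to find where value n starts.
     */
    static int nthVarint(byte[] buf, int n) {
        int pos = 0;
        for (int i = 0; ; i++) {
            int result = 0;
            int shift = 0;
            int b;
            do {
                b = buf[pos++] & 0xFF;
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (i == n) {
                return result;
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] values = {5, 300, 70000, 1, 128};
        for (int v : values) {
            writeVarint(out, v);
        }
        byte[] buf = out.toByteArray();
        System.out.println(buf.length + " bytes for " + values.length + " values"); // 9 bytes for 5 values
        System.out.println(nthVarint(buf, 3)); // 1
    }
}
```

The five sample values fit in 9 bytes instead of 20 as fixed 4-byte ints, which is the size win; the price is that nthVarint has to walk the buffer from the start.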

It simply isn't possible to access individual files within a Pack200 archive. The data for each class is interwoven with all the others so you'd have a difficult time finding all the bits you need to assemble the class. Plus, as far as I can tell, the decompression step relies on data from *all* of the files in the archive so it just isn't possible to decompress a single file without also processing all the others.
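
The "interwoven" layout can be pictured with a toy model. Pack200 groups values of the same kind from all classes into shared sequences (the JSR 200 specification calls them bands) instead of storing each class contiguously. The sketch below is a heavily simplified, hypothetical stand-in for that idea, not the real band structure: rebuilding even one class's field list requires the field counts of every class before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy "band" layout: each list holds one kind of value for ALL classes, in class order. */
final class BandLayout {
    final List<String> classNames = new ArrayList<>();
    final List<Integer> fieldCounts = new ArrayList<>();
    final List<String> fieldNames = new ArrayList<>(); // every class's fields, concatenated

    void addClass(String name, List<String> fields) {
        classNames.add(name);
        fieldCounts.add(fields.size());
        fieldNames.addAll(fields);
    }

    /**
     * Rebuilds the field list of class k. Its fields start after the fields of
     * classes 0..k-1, so the counts of every earlier class must be summed first.
     */
    List<String> fieldsOf(int k) {
        int start = 0;
        for (int i = 0; i < k; i++) {
            start += fieldCounts.get(i);
        }
        return new ArrayList<>(fieldNames.subList(start, start + fieldCounts.get(k)));
    }

    public static void main(String[] args) {
        BandLayout bands = new BandLayout();
        bands.addClass("A", Arrays.asList("x", "y"));
        bands.addClass("B", Collections.<String>emptyList());
        bands.addClass("C", Arrays.asList("count", "name", "next"));
        System.out.println(bands.classNames.get(2) + " " + bands.fieldsOf(2)); // C [count, name, next]
    }
}
```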
J2ME programmers count bytes the way a super-model counts calories.
3 . At 3:03 PM on Apr 26, 2005, Karsten Lentzsch wrote:

LZMA can beat gzip and bzip2

I've compressed packed JARs with LZMA, and they shrink even further than with pack200 + gzip. LZMA is the compressor used by 7-Zip (7z); a Java SDK is available too. Here are my results using the JAR file from JDiskReport 1.2.2:

Size (bytes)  File          %Size
      920028  jdr.jar      100.00
      606310  jdr.pack      65.90
      298804  jdr.pack.gz   32.47
      269680  jdr.pack.7z   29.31


I suggest that you post the jar file you used, so others can experiment with it.

Regards,
Karsten

References:
LZMA - http://www.7-zip.org/sdk.html
JDiskReport - http://www.jgoodies.com/downloads/
4 . At 3:45 PM on Apr 26, 2005, Philip Goh wrote:

How does this affect startup speed?

That's all well and good, but my main question is: how does this affect startup speed? These files need to be decompressed before being executed, and a very high compression ratio might hurt startup speed. Memory consumption might suffer too, depending on how the compression algorithm works.
5 . At 3:59 PM on Apr 26, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Re: How does this affect startup speed?

Pack200 is strictly a network transmission format. If you were distributing your app, say via Java Web Start, it would be downloaded and unpacked into a JAR. From then on, it is executed from the JAR file. You would never execute it directly from the Pack200 file.
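
On the receiving side, a client typically has to tell a raw .pack, a gzip-wrapped .pack.gz and a plain JAR apart before unpacking. A minimal sketch that only inspects leading bytes: gzip streams start with 1F 8B, ZIP/JAR files with "PK", and a Pack200 stream with the magic number 0xCAFED00D. Class and method names are hypothetical. It deliberately stops at detection: the JDK's own java.util.jar.Pack200 API was deprecated in JDK 11 and removed in JDK 14, so the unpack step itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PackSniffer {
    static final int PACK200_MAGIC = 0xCAFED00D;

    /** Opens data, transparently unwrapping gzip if it starts with the gzip magic 1F 8B. */
    private static InputStream open(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        if (data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    /** True if data starts with the Pack200 magic, either directly (.pack) or inside gzip (.pack.gz). */
    static boolean isPack200(byte[] data) {
        try (InputStream in = open(data)) {
            byte[] head = new byte[4];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    return false; // too short to hold a 4-byte magic number
                }
                n += r;
            }
            int magic = ((head[0] & 0xFF) << 24) | ((head[1] & 0xFF) << 16)
                    | ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
            return magic == PACK200_MAGIC;
        } catch (IOException e) {
            return false; // e.g. a corrupt gzip header: treat as "not a Pack200 stream"
        }
    }

    /** Demo helper: gzip-compresses bytes in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pack = {(byte) 0xCA, (byte) 0xFE, (byte) 0xD0, (byte) 0x0D, 0x07}; // magic + one arbitrary byte
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4", as at the start of a JAR
        System.out.println(isPack200(pack));       // true
        System.out.println(isPack200(gzip(pack))); // true
        System.out.println(isPack200(zipHeader));  // false
    }
}
```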
6 . At 4:06 PM on Apr 26, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Re: LZMA can beat gzip and bzip2

Is LZMA supported by Java's built-in compression library? I'm guessing not, unless its output is compatible with the DEFLATE algorithm.
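
For reference, java.util.zip only implements DEFLATE (raw, or wrapped as zlib, gzip or ZIP); there is no LZMA codec in it. A minimal round trip through Deflater and Inflater, the same algorithm behind .jar and .gz files (the class name DeflateRoundTrip is hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflateRoundTrip {

    /** Compresses input into zlib-wrapped DEFLATE data (Deflater's default format). */
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    /** Inverse of compress; rejects truncated or corrupt input. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated DEFLATE stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt DEFLATE stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000]; // all zeros: extremely repetitive
        byte[] packed = compress(data, Deflater.BEST_COMPRESSION);
        byte[] restored = decompress(packed);
        System.out.println(data.length + " -> " + packed.length + " bytes, round trip ok: "
                + Arrays.equals(data, restored));
    }
}
```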
7 . At 11:46 PM on Apr 26, 2005, wang zaixiang wrote:

Re: Pack200: reduce JAR file size by 56.18%

I have an idea, but I don't know whether it's valuable or not.

It looks like class files are bigger because of the constant pool. Most of its entries are strings from signatures, and most classes share largely the same strings.

So could we define a new class file format that shares strings (the shared constant pool saved as a single file in the JAR), and rebuild each class for the JVM when loading it? (Or maybe we would rewrite the JVM to understand such a new class format.)

It looks like .NET assemblies, which share metadata across an entire assembly, but with fewer modifications to the class format (just the format, not the semantics).
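
The proposal can be sketched as one string pool shared by every class in an archive, with each class keeping only indices into it. A toy sketch under that assumption (class, methods and sample strings are hypothetical); a real format would also have to encode the indices compactly and rebuild a proper per-class constant pool before handing the class to the JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedStringPool {
    private final List<String> pool = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    /** Returns the pool index for s, adding it only the first time it is seen. */
    int intern(String s) {
        Integer i = index.get(s);
        if (i == null) {
            i = pool.size();
            pool.add(s);
            index.put(s, i);
        }
        return i;
    }

    /** Encodes one class's strings as indices into the shared pool. */
    int[] encode(List<String> classStrings) {
        int[] refs = new int[classStrings.size()];
        for (int k = 0; k < refs.length; k++) {
            refs[k] = intern(classStrings.get(k));
        }
        return refs;
    }

    /** Rebuilds a class's strings from its indices, as a loader would before defining the class. */
    List<String> decode(int[] refs) {
        List<String> out = new ArrayList<>();
        for (int r : refs) {
            out.add(pool.get(r));
        }
        return out;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SharedStringPool shared = new SharedStringPool();
        shared.encode(Arrays.asList("java/lang/Object", "toString", "()Ljava/lang/String;", "Foo"));
        int[] bar = shared.encode(Arrays.asList("java/lang/Object", "toString", "()Ljava/lang/String;", "Bar"));
        System.out.println(shared.size());      // 5
        System.out.println(shared.decode(bar)); // [java/lang/Object, toString, ()Ljava/lang/String;, Bar]
    }
}
```

Two classes with eight strings between them need only five pool entries, because the strings they share are stored once.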
8 . At 1:03 AM on Apr 27, 2005, Tim Boudreau DeveloperZone Top 100 wrote:

Re: How does this affect startup speed?

As a transmission format, this starts to make sense to me. As a local storage format, decompressing class data on the fly is pretty grotesque (even plain JAR compression is grotesque) because it adds an element of indeterminacy: heavy classloading can cause an app to seem unresponsive or behave unpredictably, performance-wise, which is why it's almost always best to create JARs with no compression at all.

In a perfect world, loading a class would mean memory-mapping a segment of data from an archive and using that data in place. NIO can't quite do that yet (you still copy memory to actually access the data), but it's close.
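
Mapping part of a file is already possible with FileChannel.map; a copy is still needed once the bytes have to end up in a byte[]. A minimal sketch (class, helper names and the sample file layout are hypothetical). ClassLoader also has a protected defineClass overload that accepts a ByteBuffer, though whether the VM copies the data internally is implementation-specific.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSegment {

    /** Maps bytes [offset, offset + length) of a file read-only; the OS typically pages them in on demand. */
    static ByteBuffer mapSegment(Path file, long offset, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Per the FileChannel.map contract, the mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the mapped bytes onto the heap: the extra copy that byte[]-based APIs still need. */
    static byte[] toArray(ByteBuffer segment) {
        byte[] copy = new byte[segment.remaining()];
        segment.duplicate().get(copy); // duplicate() leaves the original buffer's position untouched
        return copy;
    }

    /** Demo helper: writes ASCII text to a temporary file. */
    static Path writeTemp(String content) {
        try {
            Path p = Files.createTempFile("segment", ".bin");
            p.toFile().deleteOnExit();
            return Files.write(p, content.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp("HEADERpayloadTRAILER");
        ByteBuffer segment = mapSegment(file, 6, 7); // just the "payload" bytes
        System.out.println(new String(toArray(segment), StandardCharsets.US_ASCII)); // payload
    }
}
```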
Tim Boudreau
NetBeans.org
Evangelist/Senior Staff Engineer, Sun Microsystems
9 . At 3:21 AM on Apr 27, 2005, Karsten Lentzsch wrote:

Re: LZMA can beat gzip and bzip2

The Java 5 core doesn't contain an LZMA decompressor. However, there's a Java SDK for LZMA compression/decompression; see the reference to the SDK given in my previous post.

- Karsten
10 . At 9:36 AM on Apr 27, 2005, Cristiano Sadun wrote:

Re: Pack200: reduce JAR file size by 56.18%

By the way, another (complementary) way to reduce JAR size is static analysis. It has its drawbacks and may require knowledge of the application, particularly if it does dynamic classloading, but for certain apps it's well worth the effort.

See http://sadun-util.sourceforge.net/pack.html
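
The heart of that static analysis is a reachability pass: start from the entry points, follow class references, and drop whatever is never reached. A toy sketch that assumes the class-reference graph has already been extracted (a real tool reads it from the bytecode); all class names are hypothetical. Classes loaded only by name at run time, for example through Class.forName, have no incoming edge, which is the dynamic-classloading caveat above, so they must be added as extra roots by hand.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachableClasses {

    /** Breadth-first search over a class -> referenced-classes graph, starting from the roots. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> seen = new LinkedHashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            for (String dep : refs.getOrDefault(cls, Collections.<String>emptyList())) {
                if (seen.add(dep)) { // first visit: schedule its own references
                    queue.add(dep);
                }
            }
        }
        return seen; // everything else is a candidate for removal
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("Main", Arrays.asList("Parser", "Ui"));
        refs.put("Parser", Arrays.asList("Lexer"));
        refs.put("OldReport", Arrays.asList("Parser")); // nothing references OldReport
        // PluginImpl is only loaded via Class.forName, so no edge points to it.
        System.out.println(reachable(refs, Arrays.asList("Main"))); // [Main, Parser, Ui, Lexer]
        System.out.println(reachable(refs, Arrays.asList("Main", "PluginImpl")).contains("PluginImpl")); // true
    }
}
```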
11 . At 11:18 AM on Apr 27, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Re: How does this affect startup speed?

> As a transmission format, this starts to make sense to me. As a
> local storage format, decompressing class data on the fly is pretty
> grotesque (even plain JAR compression is grotesque) because it adds
> an element of indeterminacy: heavy classloading can cause an app to
> seem unresponsive or behave unpredictably, performance-wise, which
> is why it's almost always best to create JARs with no compression
> at all.

Yep. We had one client that was *very* concerned with startup time. Profiling showed that a significant portion of the time was spent just doing I/O in the classloaders. So we told them that any classes that need to be loaded during startup MUST be uncompressed. That was a hard sell because in the J2ME world, footprint is a *very* important concern too. But profiling the app showed that loading the uncompressed class files took less than half as long.
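
That policy (start-up classes uncompressed, everything else compressed) maps onto per-entry compression methods in a JAR: ZipEntry.STORED is what jar -0 uses for every entry, while ZipEntry.DEFLATED is the default. A minimal sketch; the class and helper names, and passing the start-up set in as a plain Set, are assumptions for illustration (in practice the set would come from profiling).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;

final class StartupAwareJar {

    /** Writes entries to a temp JAR: names in startupSet are STORED (like jar -0), the rest DEFLATED. */
    static File write(Map<String, byte[]> entries, Set<String> startupSet) {
        try {
            File jar = File.createTempFile("app", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar.toPath()))) {
                for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                    byte[] data = e.getValue();
                    JarEntry entry = new JarEntry(e.getKey());
                    if (startupSet.contains(e.getKey())) {
                        // STORED entries must declare size and CRC-32 before the data is written.
                        CRC32 crc = new CRC32();
                        crc.update(data, 0, data.length);
                        entry.setMethod(ZipEntry.STORED);
                        entry.setSize(data.length);
                        entry.setCompressedSize(data.length);
                        entry.setCrc(crc.getValue());
                    } else {
                        entry.setMethod(ZipEntry.DEFLATED);
                    }
                    out.putNextEntry(entry);
                    out.write(data);
                    out.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the compression method recorded for an entry (ZipEntry.STORED or ZipEntry.DEFLATED). */
    static int methodOf(File jar, String name) {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getJarEntry(name).getMethod();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        entries.put("app/Main.class", new byte[300]);
        entries.put("app/Report.class", new byte[300]);
        File jar = write(entries, Collections.singleton("app/Main.class"));
        for (String name : entries.keySet()) {
            System.out.println(name + ": " + (methodOf(jar, name) == ZipEntry.STORED ? "stored" : "deflated"));
        }
    }
}
```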

> In a perfect world, loading a class would mean memory-mapping a
> segment of data from an archive and using that data in place. NIO
> can't quite do that yet (you still copy memory to actually access
> the data), but it's close.

Yep, which is why a number of J2ME VMs support a "ROMized" form of a class/JAR file that is directly executable. Of course, the in-memory layout of the class data is very much dependent on the VM, so this kind of thing will always be proprietary.
12 . At 11:26 AM on Apr 27, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Check out ProGuard

ProGuard does all of that and more. It analyzes your class files to locate unused fields and methods, and removes them. By default, it also obfuscates the names, replacing them with one- or two-character names, which further reduces the class size, but you can turn that feature off if you prefer.
13 . At 11:32 AM on Apr 27, 2005, Kevin Riff DeveloperZone Top 100 wrote:

Re: Pack200: reduce JAR file size by 56.18%

That's precisely what the Pack200 format does. It removes *all* duplication of any kind, including duplicate constant pool entries. Constant pool entries are also grouped by type and sorted to improve their compressibility.

If you're interested, you should look up JSR 200 (which defines the Pack200 format). It includes references to previous experiments to reduce class file size. As I recall, one of the earlier formats was essentially a JAR file with a common constant pool for all classes, just as you described.
14 . At 1:42 PM on Apr 27, 2005, Guillaume Desnoix DeveloperZone Top 100 wrote:

Re: LZMA can beat gzip and bzip2

Hi, unfortunately I cannot post the JAR file. I should have chosen one I could. Here are the results with 7-Zip:

Size (bytes)  Filename            %Size  %Gain
     3364097  appli-std.jar      100.00   0.00
     2683596  appli-std.pack      79.77  20.23
     1474374  appli-std.pack.gz   43.82  56.18
     1297993  appli-std.pack.7z   38.58  61.42

7-Zip provides an additional 5 percentage points. Excellent!
Regards, Guillaume
JDistro (shared runtime and swing desktop) -- J NLP (application catalog) -- Alma (source code tool) -- Slaf (swing look and feel) -- Pixels Loupanthère
