Performance: Potentially Save Memory When Using Strings
At 12:00 AM on Mar 28, 2005, R.J. Lorimer wrote:
Optimizing too early in a development cycle is the root of all evil. Wow, what a way to start a tip about optimizing application memory footprints. I simply want to make sure that what is said in this tip is understood to be a potential solution to *real application problems* and not a fix-all for sloppy coding. After all, we're all smart, and we know that if we try to optimize before developing a feature, we don't have enough information to optimize properly. OK, on with the tip.
I must say, I am shamelessly borrowing the information for this tip from Jerome Lanneluc - one of the JDT-Core developers of Eclipse (JDT-Core being the core libraries of the Java Development Tools). The reason is twofold: 1) it presents a real-world example of aggressive optimization based on real-world profiling and analysis, and 2) it's a very interesting (and initially counter-intuitive) tip.
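As some of you may know, Eclipse 3.1 has many primary 'goals', and one of those is to improve the scalability and responsiveness of Eclipse. I must say, even if they don't make all of their goals in this arena (due to time constraints), they are certainly putting the right tools and metrics in place to make things right in the near future. I digress, however. My point is that the Eclipse team is always actively looking at the profiling and performance metrics (that are now part of nightly builds), and is analyzing how they can cut the fat, so to speak. With that, Jerome had this to say late last week (on the general Eclipse developers' mailing list):
> I just noticed that String#substring(...) shares the underlying char array.
> So if you have a big string, take a substring of it, throw away the big
> string, you will still hold on to the big char array.
> A simple way to solve this is to make a copy of the substring using the
> String(String) constructor.
> I saved 550KB in JDT Core by changing the following code:
> String simpleName = new String(fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.')));
> Jerome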
So, long story short - using the copy constructor of the String class, in some cases, can free up large chunks of memory (in the form of character arrays) for garbage collection.
Analyzing his post, there are some things we should be aware of. First, there is an assumption that the larger string being used initially is being thrown away. If it is not, the plain substring call (without the copy) is the better choice, because the shared array is being kept alive anyway. Second, there is also an assumption that the difference in size of the strings is significant enough to accept the expense of creating an additional String object, creating a new character array, and then copying into it. Keep in mind, JDT Core is often going to want to do things like take an entire source file, and substring out just a portion of a method, or even just a field declaration. That is a pretty significant amount of String to be holding on to if you just want to refer to a field declaration.
So, should you go off changing every place you use substring to use the copy constructor? Obviously not. The point of this post is that careful analysis of the problem (apparently in Eclipse + JDT Core's case, it was memory consumption by Strings) often results in discoveries that you as a developer would otherwise not make just by looking at the code. At least to me, the plain substring call appears to be more efficient than the version wrapped in the copy constructor. If the source code wasn't documented or I wasn't aware of the situation, I might be prone to revert the performance fix back to the more standard form, because it seemed unnatural. Why is it important to back unnatural changes like this with performance metrics? Because you can never know all of the side effects of your development choices until you run them through some testing and analysis.
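To make the tip concrete, here is a minimal sketch of the pattern in Jerome's message. It assumes a JVM whose `substring()` shares the parent's char array, which is the behavior described in his post and an implementation detail rather than a documented guarantee. The class and method names (`SimpleNames`, `simpleName`) are made up for illustration and are not JDT Core code. Unlike the quoted snippet, this version skips the dot itself by adding 1 to the index.

```java
// Illustrative sketch only; not taken from JDT Core.
final class SimpleNames {
    private SimpleNames() {}

    /** Returns the text after the last '.', requested as an independent copy. */
    static String simpleName(String fullyQualifiedName) {
        // lastIndexOf returns -1 when there is no dot, so lastDot + 1 is 0
        // and the whole name is kept in that case.
        int lastDot = fullyQualifiedName.lastIndexOf('.');
        String possiblyShared = fullyQualifiedName.substring(lastDot + 1);
        // On JVMs where substring() shares the parent's char[], the copy
        // constructor lets the large parent array become collectable once
        // the parent string itself is no longer referenced.
        return new String(possiblyShared);
    }

    public static void main(String[] args) {
        System.out.println(simpleName("org.eclipse.jdt.core.IType"));
    }
}
```

Whether the extra allocation pays off depends on the two assumptions discussed above: the large string really is discarded, and the size difference is worth the copy. Only profiling can confirm that for a given application.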
Until next time,
R.J. Lorimer
rj -at- javalobby.org
http://www.coffee-bytes.com
Re: Performance: Potentially Save Memory When Using Strings
At work we found that preallocating StringBuffers with a size big enough to hold the entire string was an "optimization" that was eventually killing our system from a memory perspective.
We had a lot of code that looked like:
StringBuffer sb = new StringBuffer(512);
sb.append("some stuff less, but much less than 512");
return sb.toString();
All of our strings would be 512 long. We found that we should make that return:
return new String(sb.toString());
using the copy constructor as you suggest to truncate the strings to reasonable lengths for any string that was going to be sticking around for the life of the application.
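A small sketch of the situation described above, for readers who want to see the gap between capacity and content. It only relies on the standard `StringBuffer.capacity()` and `length()` methods; the class and method names are hypothetical, and whether `toString()` actually shares the oversized buffer depends on the JVM's implementation.

```java
// Illustrative sketch only; names are made up for this example.
final class BufferCapacityDemo {
    private BufferCapacityDemo() {}

    /** Builds a short string in a deliberately oversized buffer. */
    static StringBuffer oversizedBuffer() {
        StringBuffer sb = new StringBuffer(512);
        sb.append("short text");
        return sb;
    }

    /** Requests an independent copy sized to the content, as in the post. */
    static String compactCopy(StringBuffer sb) {
        return new String(sb.toString());
    }

    public static void main(String[] args) {
        StringBuffer sb = oversizedBuffer();
        // The buffer reserves 512 chars but only 10 are in use.
        System.out.println("capacity=" + sb.capacity() + ", length=" + sb.length());
        System.out.println(compactCopy(sb));
    }
}
```

The copy is only worth making for strings that stay around for a long time; for short-lived strings the oversized buffer is collected soon anyway.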
Stephen - http://ostermiller.org
Re: Performance: Potentially Save Memory When Using Strings
Is it guaranteed that the copy constructor will not use the same array? I mean... is that included in some specification document somewhere, or is this trick implementation-dependent?
Re: Performance: Potentially Save Memory When Using Strings
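Guillermo,
It is, as far as I can tell, an implementation detail - although the documentation seems to at least imply that some copying is occurring - and since the bulk of a String object is the character array, it may be implying that this is the intended behavior (emphasis mine):
> Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string.
> Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
Because they can be assured that the passed-in String is immutable (unlike a passed-in char[]), they don't explicitly need to copy over the subsequence the String is targeting, so what you say is indeed a valid point.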
Regards,
Re: Performance: Potentially Save Memory When Using Strings
An interesting topic, and I have noticed IDEs such as Visual C++ in the past providing vastly elaborate combinations of checkboxes and tabbed panes to allow developers the chance to obtain or refute warnings about under-performant development practice - would it be too much to ask of Sun or other Java VM implementors to provide a 'best practice' checklist so that IDE vendors such as Eclipse and NetBeans can provide these warnings via language settings in their products?
Re: Performance: Potentially Save Memory When Using Strings
I haven't checked myself, but can't tools like PMD (and their associated IDE plugins) do this job?
Romain Guy's Java Weblog, #ProgX, Jext
Re: Performance: Potentially Save Memory When Using Strings
You have to ask why are the char arrays backing strings shared in the first place? Most strings are not shared. This entails each string holding two int fields, for the start and length, so on a 32-bit CPU, this is 8 bytes wasted per string, on top of what is wasted by holding on to unnecessarily-large char arrays. In fact, there is no reason to allocate two objects per string -- the String instance itself could hold characters, saving even more memory. All in all, Sun's bad design decisions lead to huge waste. How anybody can deny that Java uses too much memory is beyond me.
Re: Performance: Potentially Save Memory When Using Strings
Always look forward to your cheery posts, Slava
Re: Performance: Potentially Save Memory When Using Strings
The cool thing is you can guess the content of the post just by seeing his name in your inbox when you receive a notification of reply in a thread ;-)
Romain Guy's Java Weblog, #ProgX, Jext
Depends on the use
> You have to ask why are the char arrays backing
> strings shared in the first place?
Because they are needed to provide sharing.
> Most strings are not shared.
That really depends. Long-lasting strings are often not shared (but sometimes they are). Short-lived strings really gain from sharing their array. The second case is typical when you are working with a long string (extraction, pattern matching, ...). You can't say one is better than the other; it depends on the context.
> This entails each string holding two int
> fields, for the start and length, so on a 32-bit CPU,
> this is 8 bytes wasted per string, on top of what is
> wasted by holding on to unnecessarily-large char
> arrays.
Yes. OTOH, it is better and faster to have start/length/ref (12 bytes) than to copy a long string of 80 chars (160 bytes). Also in the first case, length() is immediate.
> In fact, there is no reason to allocate two
> objects per string -- the String instance itself
> could hold characters, saving even more memory.
Agreed.
> All in all, Sun's bad design decisions lead to huge
> waste. How anybody can deny that Java uses too much
> memory is beyond me.
Agreed.
String should have been an interface, allowing different implementations depending on the use. Strings should be immutable (very practical) but the way they manage their chars should be hidden.
should give you something like (bytecode):
And of course a way to transform a String into an internal String:
The advantages of an interface (referring to an immutable object) are obvious, and it saves as much memory as possible.
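As one possible reading of the "String as an interface" idea, here is a minimal sketch. Everything in it is hypothetical: `Text`, `SharedText`, `subText` and `compact` are invented names, not JDK types, and this is not the code the poster originally attached. It shows an immutable text whose sub-texts share one char array (start/length/reference), plus an explicit way to get a compact copy when a small piece must outlive a large original.

```java
import java.util.Arrays;

// Hypothetical interface: callers never see how the characters are stored.
interface Text {
    int length();
    char charAt(int index);
    Text subText(int start, int end);
}

/** Shares one char[] between a text and its sub-texts. */
final class SharedText implements Text {
    private final char[] chars;
    private final int offset;
    private final int length;

    SharedText(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedText(char[] chars, int offset, int length) {
        this.chars = chars;
        this.offset = offset;
        this.length = length;
    }

    public int length() {
        return length;
    }

    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return chars[offset + index];
    }

    /** No copying: the result points into the same array. */
    public Text subText(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new SharedText(chars, offset + start, end - start);
    }

    /** Copies only the visible range so the larger array can be collected. */
    Text compact() {
        return new SharedText(Arrays.copyOfRange(chars, offset, offset + length), 0, length);
    }

    @Override
    public String toString() {
        return new String(chars, offset, length);
    }
}
```

With this shape, short-lived extraction code keeps the cheap sharing behavior, while long-lived holders call `compact()` explicitly instead of relying on knowledge of the implementation.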
Java weakness
> The cool thing is you can guess the content of the
> post just by seeing his name in your inbox when you
> receive a notification of reply in a thread ;-)
The good thing is Slava always brings interesting and reasonable issues leading to good discussions. You may disagree, but Slava's remarks are always backed up by arguments. In this case, String in Java uses too much memory (compared to other languages), cannot deal with Unicode (chars above 65536) and can result in memory leaks (small strings referring to huge arrays). These are issues that should have been, and could have been, avoided.
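On the Unicode point, it may help to see how characters above U+FFFF look in a String. The sketch below assumes J2SE 5.0 or later, where `String.codePointCount` and `Character.toChars(int)` are available; the class name is made up for the example. Such a character occupies two `char` values (a surrogate pair), so `length()` and the code point count differ.

```java
// Illustrative sketch only; requires J2SE 5.0 or later.
final class SupplementaryDemo {
    private SupplementaryDemo() {}

    /** Counts Unicode code points, treating a surrogate pair as one. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) lies above U+FFFF.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars=" + clef.length() + ", codePoints=" + codePoints(clef));
    }
}
```

Code that indexes by `char` therefore still needs care around surrogate pairs, which is one way to read the complaint above.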
Re: Java weakness
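It was just a friendly cutting remark to Slava. No harm intended (hence the smiley). I don't always agree with Slava, but I do with your post. Java is far from being perfect and we all know that. The String issues you are talking about are interesting. Yet, Java does support Unicode 2.0 (but I don't know if it is fully supported).
Having worked a lot with Strings in Jext I can only agree with you two concerning the memory waste. By the way, memory waste seems to be one of the flaws that almost everyone finds in Java. The simplest Swing application already takes 15 or 20 MB of system memory (I haven't checked for a long time though).
Does anyone know why Sun implemented String that way? This would be interesting to know.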
Romain Guy's Java Weblog, #ProgX, Jext
Re: Performance: Potentially Save Memory When Using Strings
Couldn't this be helped by simply adding the following methods to the API so that the developer can easily decide how the new strings should be created?
String.substring(int start, boolean useNewCharArray)
String.substring(int start, int end, boolean useNewCharArray)
StringBuffer.toString(boolean useNewCharArray)
So instead of:
String simpleName = new String(fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.')));
You can use:
String simpleName = fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.'), true);
IMHO, it's much easier to understand what's going on in the second line.
Otherwise, unless you've actually looked at the source code for the String and StringBuffer classes (which I've had to do in the past), I don't know how you're even supposed to know when the char arrays are being shared. It's not specified in the Javadocs.
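The proposed overloads do not exist on `String`, but the same intent can be approximated today with a static helper. The sketch below is hypothetical (the `Substrings` class and its methods are invented for illustration) and simply wraps the copy constructor behind the boolean flag suggested above.

```java
// Hypothetical helper; these methods are not part of the JDK.
final class Substrings {
    private Substrings() {}

    /** Like s.substring(start), optionally requesting an independent copy. */
    static String substring(String s, int start, boolean useNewCharArray) {
        String sub = s.substring(start);
        return useNewCharArray ? new String(sub) : sub;
    }

    /** Like s.substring(start, end), optionally requesting an independent copy. */
    static String substring(String s, int start, int end, boolean useNewCharArray) {
        String sub = s.substring(start, end);
        return useNewCharArray ? new String(sub) : sub;
    }

    public static void main(String[] args) {
        String name = "java.lang.String";
        System.out.println(substring(name, name.lastIndexOf('.') + 1, true));
    }
}
```

A helper like this at least puts the intent in one documented place, so a later maintainer is less likely to "simplify" the copy away.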
Re: Performance: Potentially Save Memory When Using Strings
James,
Yes, you are right, it is a failing of a.) the API and b.) the documentation.
Strings have always been one of the most 'black magic' parts of Java - I don't think I've ever seen a new Java developer not get confused by all of the various bits of trivia regarding literals, interning, immutability, etc.
In addition, as another reader pointed out, the underlying implementation is Sun JVM specific. That helps to complicate matters.
Unfortunately (as the Eclipse example shows), implementation details like this can make real differences in how efficiently external developers can use the API, so practical need (the need for application scalability) overrides the purist in all of us.
C'est La Vie!
Re: Java weakness
> Does anyone know why Sun implemented String that
> way? This would be interesting to know.
Probably simply for performance.
I look at how we use strings in our J2EE app, and since we don't "hold on" to any of them for any significant period of time, the shared structure of a string makes OUR use of strings faster overall.
We can substring et al all day long but at the end of the function, all of the strings vanish anyway, so it doesn't affect us if it's shared.
Also, in many contexts, this behavior will SAVE memory, because it is reusing the underlying char array as much as practical rather than aggressively allocating new arrays, choosing only to create new buffers when entirely new strings are created. The example where you slurp in a big string and hold on to a snippet is simply an edge case in the model that I think most users do not encounter.
But I can certainly say that no matter what Sun did for their implementation, someone would find fault with it because the high level behavior didn't suit their application. They had to make an implementation choice using whatever metrics they were using at the time.
This behavior isn't exposed through the JavaDoc because it's an implementation behavior that doesn't really show through the API, and the Sun Java API is simply that, an API, whereas the Sun JVM is an implementation of that API. We have yet to know whether IBM or JRockit, etc. use the same mechanisms in their String implementations.
It's a good article though, as it does bring to light something that may be necessary to be aware of in some future project. But I'm not going to condemn the design decision, because I don't think that for a general-purpose String class it's a horribly bad design.