NetBeans 6 delivers great updates to the Matisse GUI builder. Spend a few minutes with Roman Strobl and get an expert briefing on what's new and what has changed. (sponsored)
In this, the third and final installation of Andres' Introduction to Groovy series, you learn about how Groovy handles variable numbers of arguments, named parameters, currying, and more about Groovy operators. Including, some new operators.
Swing Fuse (actually just Fuse), is a framework designed to make it easier to create your own custom desktop components. In this article, Daniel Spiewak shows you how to get started and provides sample source code you can download.
Willam Louth shows how he uses JXInsight Probes to investigate probable performance issues with code bases that he is not familiar with. He also highlights possible pitfalls in creating a benchmark, as well as in the analysis of results.
Replies:
52 -
Pages:
4
[
1234
| Next
]
Threads:
[
Previous
|
Next
]
I worked late this long week-end to improve the implementation of Javolution high-performance
FastMap
. But finally, I believe that I got it! The "Swiss Knife/Universal /Holy Grail" map for developers! But judge for yourself:
Performance
on par with HashMap; but LinkedHashMap behavior (keep insertion order)
Thread-safe without synchronization when marked shared (including map collection's iterations). Unlike ConcurrentHashMap access never blocks.
Predictable response time (real-time), no large resize/rehash performed ever.
Support for custom key/value comparators (e.g. custom hashcode implementations).
Support for custom entries implementation (e.g. to implement efficient caching, the map's entries may include a time-stamp).
RTSJ-Safe, FastMap can be allocated in ImmortalMemory (static instances) without causing memory leak when entries are removed (unlike standard maps)
Utilities to print statistic, provide an unmodifiable view, etc...
What do you think? Is it the map "Holy Grail" ?
Note for the technical savvy: The new implementation uses a different approach for collision. Basically all entries are stored in the table, if there is a collision the entry is placed in the next available slot. The number of entries is always less than half the size of the table (creating enough holes). Large maps are recursively broken into smaller maps to maintain time predictability.
With regard to "get(key)" access HashMap, LinkedHashMap and FastMap do about the same thing. Basically, they are indexing a table based upon the key hash code. The first difference is that FastMap will not rehash the hash code unless it can determine that the hashcodes are not evenly distributed. For example, the default hash code for an object using the Sun VM is evenly distributed; on the other hand IBM VM uses the memory address for default hashcode (or something like that) which is far from optimal. A test is performed at startup to determine if the platform has an evenly distributed default hashcode (if not, rehash is performed). When hashcodes are calculated (e.g. String), they are generally well distributed.
Not having to rehash saves few computational cycles at least on the Sun platform (access to the entry is a direct:
table[hashcode & (table.length - 1)]
).
Memory-wise, what takes the most space are the entries objects (not the tables). This is why FastMap and LinkedHashMap have a larger footprint than HashMap (the entries have two additional fields: next and previous). FastMap entries do not need the extra field to link to the entries in the same slot (otherwise it would be difficult to maintain thread-safety when resizing). Colliding entries are stored on adjacent slots.
Time-wise, allocating large arrays (HashMap/LinkedHashMap) is likely to create memory problems and delays (fragmentation, significant gc delays, etc). Try to allocate arrays greater than 1 million elements and you will see what I am talking about. FastMap does not have this issue because it resizes only up to a certain size then after that, forwards to sub-maps (which themselves resize up to a certain size and so on).
> the default hash code for an object using
> the Sun VM is evenly distributed; on the other hand
> IBM VM uses the memory address for default hashcode
> (or something like that) which is far from optimal.
I think both use the object's address, but the difference in quality should be due to different memory allocation / GC strategies. It seems IBM uses a segregated heap strategy (where all objects of any given size range are grouped), so if you put in the hashtable several keys that are objects of similar size (like it's common in microbenchmarks), they tend to be allocated within a narrow range of memory pages and the variance of default hashcodes suffer.
I wouldn't worry about this anyway, because no decent Java program (remarkably a high-perf or RT program) would hash objects that lack a proper hashCode() implementation. People who do that deserve to suffer the performance cost... there are other costs too, typically the JVM will expand the size of any objects that use the default hashCode(), because the return of this method must be stable while its address is not (due to GC), so if the JVM generates an address-based hashcode, it has to store this hashcode in the object header so future calls will return the same value even if the object moves. Some JVMs can expand the object header dynamically (piggybacking on the copying performed by GC), so you only pay this space cost after hashing the object. But it's an important cost anyway. Objects with custom hashCode() methods will not have that cost, unless they explicitly cache the hashcode in a field (like java.lang.String does, for example).
Hi,
Sometime ago we have been moving our project to custom implementations of maps and set, in our case it was done to reduce memory footprint (getting rid of Entry objects allow us to reduce heap size of application by 20%). Our FlatHashSet places object in next table slot in case of hash collision (but we do not use Entry objects at all). We also considered implementing concurrent version of set using AtomicArray but come to conclusion that correct implementation will have very slow object removal edge cases. So I was very interested to see your implementation.
I've downloaded sources for 5.2.2, but what did I see? Old cursed double locked antipattern.
Basically FastMap synchronizes most write access leaving read access unsynchronized, worse write access synchronizes on whole map even if data is stored using submaps. But I didn't lose hope at that moment, double locking is incomprehensible beast, I still could not see exact scenario there your map will break. So I just wrote a test - a bunch of thread iterating with using objects with bad hash distribution. And this test has failed, on single core, without hyperthreading, plain old Athlon rock. Both synchronized HashMap and ConcurrentHashMap did work fine, but FastMap failed.
(sorry somthing wrong with this forum it has cut massage tail)
Below is test source (maptest.java)
and test results for 3 kinds of map on my Box (Athlon XP (Barton) @ 1.84 GHz)
By the way perfomance results (higher are better)
synchronized HashMap - 11 (parrots)
ConcurrentHashMap - 82 (parrots)
synchronized HashMap - 25 (parrots)
(But it is not perfomance test but correctness test, so I've added benchmark just not to get too bored)
I still hope I just did something wrong, and looking for comments
I just looked at your Java source and it seems that you call setShared(false); which makes it non-threadsafe. You print out that you set it to true, but the code you uploaded shows it set to false.
I know that you are waiting for it, so here comes my "let's kill the javolution" benchmark.
I had prepared it in the way to penalize the FastMap really hard. I fully accept that it might be good in normal usage, but I just want to show it is not 'universal' - there are certain use cases (however pathological they might be) where it is non-predicatable and considerably slower then jdk HashMap.
Non-predactibility means that depending on the order in which I remove the values from map(which should hash perfectly afaik), same operation can take around 5 times longer or shorter. While you have managed to remove rehashing part out of the equation, performance (at least removal) can be severely affected by the order in which values are processed.
Same benchmark is repeated for jdk HashMap and as expected it shows same numbers for both orders of removal. As an extra, it is around 5 times faster for FastMap-friendly case and 25 times faster FastMap-unfriendly case.
There is nothing wrong with your implementation - just collision resolution you are using is by very definition removal-unfriendly plus can show very different performance depending on the order of the operations.
My results are around
EASY 90 ms
HARD 380 ms
JDKEASY 17 ms
JDKHARD 17 ms
import java.util.HashMap;
import javolution.util.FastMap;
publicclass Test {
staticint SIZE = 200000;
static Object value = new Object();
static Integer[] keys;
publicstaticvoid main(String[] args) {
keys = new Integer[SIZE];
for ( int i =0; i < SIZE; i++ ) {
keys[i] = new Integer(i);
}
for ( int i =0; i < 10; i++ ) {
test();
testJDK();
}
}
publicstaticvoid test() {
long start,end;
FastMap map = new FastMap(SIZE*2 + 1);
// heat up the map
for ( int i =0; i < SIZE; i++ ) {
map.put(keys[i], value);
}
map.clear();
for ( int i =SIZE-1; i >=0; i-- ) {
map.put(keys[i], value);
}
start = System.nanoTime();
for ( int i =SIZE-1; i >=0; i-- ) {
map.remove(keys[i]);
}
end = System.nanoTime();
System.out.println("EASY " + (end-start)/1000000.0 + "ms");
map.clear();
for ( int i =0; i < SIZE; i++ ) {
map.put(keys[i], value);
}
start = System.nanoTime();
for ( int i =0; i < SIZE; i++ ) {
map.remove(keys[i]);
}
end = System.nanoTime();
System.out.println("HARD " + (end-start)/1000000.0 + "ms");
}
publicstaticvoid testJDK() {
HashMap map = new HashMap(SIZE*2 + 1);
// heat up the map
for ( int i =0; i < SIZE; i++ ) {
map.put(keys[i], value);
}
map.clear();
for ( int i =SIZE-1; i >=0; i-- ) {
map.put(keys[i], value);
}
long start = System.nanoTime();
for ( int i =SIZE-1; i >=0; i-- ) {
map.remove(keys[i]);
}
long end = System.nanoTime();
System.out.println("JDKEASY " + (end-start)/1000000.0 + "ms");
map.clear();
for ( int i =0; i < SIZE; i++ ) {
map.put(keys[i], value);
}
start = System.nanoTime();
for ( int i =0; i < SIZE; i++ ) {
map.remove(keys[i]);
}
end = System.nanoTime();
System.out.println("JDKHARD " + (end-start)/1000000.0 + "ms");
}
}
This is the first version and there is plenty of room for improvement. I am also very attentive to any problem detected with regards to thread-safety (I believe it is thread-safe but I might have miss something).
Object removal is indeed far from optimal. The current implementation is "getting" the entry then "removing" the entry (going back to sub-maps) when it could do both at once. There is almost a 2x improvement right there.
Also, I did not tune any of the size parameters, for example having a table only twice the number of entries might not be enough ?!
I am very open to any code improvement, do not hesitate to send me your suggestions (or patches).
Jean-Marie Dautelle - Marlboro, MA
-- Javolution: Everything should be made as simple as possible...
-- JScience: But not simpler!
> because now the _isShared boolean is not volatile or
> access in a synchronized lock.
>
> So setting it to true at one point then it still
> could be that other threads don't see that
> immediately..
Good point! We will move it to constructor.
Jean-Marie Dautelle - Marlboro, MA
-- Javolution: Everything should be made as simple as possible...
-- JScience: But not simpler!
Just fix it and run for yourself.
My bad, I was checking if setShared() changes any thing, and posted wrong source. And it does with setShared(true) it breaks sometimes but with setShared(false) it almost not working at all.
> Just fix it and run for yourself.
> My bad, I was checking if setShared() changes any
> thing, and posted wrong source. And it does with
> setShared(true) it breaks sometimes but with
> setShared(false) it almost not working at all.
If shared is false, it is not expected to work.
Now if shared is true and it breaks that is more interesting. Any idea of where the problem might be?
Jean-Marie Dautelle - Marlboro, MA
-- Javolution: Everything should be made as simple as possible...
-- JScience: But not simpler!
Do not trust me, run test for yourself, check if it is not broken someway (but other maps pass it, so I think it is OK). I would be happy to be proven wrong.
The problem IMHO is double locking. You are trying to avoid synchronization for reads, it was proven thousand time that it is impossible (at least without using atomic compareAndSet). Also you even do not bother yourself with putting write barriers. Sorry for not providing examples from your code, analyzing concurrent aspect by source code is a really damaging task for brain of mere mortal.
Also I do not really have interest in something that should synchronize for every non-value-replace modification on whole map. Good luck with investigation.
P.S.
Just run test on hyperthreading enabled single core, number of fails of FastMap is much worse, synchronized HashMap and ConcurrentHashMap are still doing job fine. Also I put Doug Lee's pre java 1.5 ConcurrentHashMap, and it also has passed the test.
I found the problem! It is in the pack(int slot)
What happens is that the thread removing an entry might move entries from other threads (to fill the hole). A simple change of:
_entries[slot & mask] = null;
to
_entries[slot & mask] = NULL;
Solves the problem (we fill the hole without moving anyone, except that now I have to test for NULL when looking for a slot available).
I will update Javolution shortly!
Thanks (great test)!
Jean-Marie Dautelle - Marlboro, MA
-- Javolution: Everything should be made as simple as possible...
-- JScience: But not simpler!
Good to hear, but I still very pessimistic that it is possible to implement working double-checked locking in java memory model. I'm eager to put my teeth on fixed version.
Universal Map Implementation
URL: Javolution
At 10:18 AM on Sep 5, 2007, Jean-Marie Dautelle
wrote:
Fresh Jobs for Developers Post a job opportunity
What do you think? Is it the map "Holy Grail"
Note for the technical savvy: The new implementation uses a different approach for collision. Basically all entries are stored in the table, if there is a collision the entry is placed in the next available slot. The number of entries is always less than half the size of the table (creating enough holes). Large maps are recursively broken into smaller maps to maintain time predictability.
With regard to "get(key)" access HashMap, LinkedHashMap and FastMap do about the same thing. Basically, they are indexing a table based upon the key hash code. The first difference is that FastMap will not rehash the hash code unless it can determine that the hashcodes are not evenly distributed. For example, the default hash code for an object using the Sun VM is evenly distributed; on the other hand IBM VM uses the memory address for default hashcode (or something like that) which is far from optimal. A test is performed at startup to determine if the platform has an evenly distributed default hashcode (if not, rehash is performed). When hashcodes are calculated (e.g. String), they are generally well distributed.
Not having to rehash saves few computational cycles at least on the Sun platform (access to the entry is a direct:
table[hashcode & (table.length - 1)]).Memory-wise, what takes the most space are the entries objects (not the tables). This is why FastMap and LinkedHashMap have a larger footprint than HashMap (the entries have two additional fields: next and previous). FastMap entries do not need the extra field to link to the entries in the same slot (otherwise it would be difficult to maintain thread-safety when resizing). Colliding entries are stored on adjacent slots.
Time-wise, allocating large arrays (HashMap/LinkedHashMap) is likely to create memory problems and delays (fragmentation, significant gc delays, etc). Try to allocate arrays greater than 1 million elements and you will see what I am talking about. FastMap does not have this issue because it resizes only up to a certain size then after that, forwards to sub-maps (which themselves resize up to a certain size and so on).
52 replies so far (
Post your own)
Re: Universal Map Implementation
What! Not optionally weak and/or soft references for keys and/or values???Just kidding. Great work.
Cheers,
MiG Java Calendar Component, MiG Layout for Swing/SWT (Vote -> JDK)
Re: Universal Map Implementation
> the default hash code for an object using> the Sun VM is evenly distributed; on the other hand
> IBM VM uses the memory address for default hashcode
> (or something like that) which is far from optimal.
I think both use the object's address, but the difference in quality should be due to different memory allocation / GC strategies. It seems IBM uses a segregated heap strategy (where all objects of any given size range are grouped), so if you put in the hashtable several keys that are objects of similar size (like it's common in microbenchmarks), they tend to be allocated within a narrow range of memory pages and the variance of default hashcodes suffer.
I wouldn't worry about this anyway, because no decent Java program (remarkably a high-perf or RT program) would hash objects that lack a proper hashCode() implementation. People who do that deserve to suffer the performance cost... there are other costs too, typically the JVM will expand the size of any objects that use the default hashCode(), because the return of this method must be stable while its address is not (due to GC), so if the JVM generates an address-based hashcode, it has to store this hashcode in the object header so future calls will return the same value even if the object moves. Some JVMs can expand the object header dynamically (piggybacking on the copying performed by GC), so you only pay this space cost after hashing the object. But it's an important cost anyway. Objects with custom hashCode() methods will not have that cost, unless they explicitly cache the hashcode in a field (like java.lang.String does, for example).
It is just not thread safe
Hi,Sometime ago we have been moving our project to custom implementations of maps and set, in our case it was done to reduce memory footprint (getting rid of Entry objects allow us to reduce heap size of application by 20%). Our FlatHashSet places object in next table slot in case of hash collision (but we do not use Entry objects at all). We also considered implementing concurrent version of set using AtomicArray but come to conclusion that correct implementation will have very slow object removal edge cases. So I was very interested to see your implementation.
I've downloaded sources for 5.2.2, but what did I see? Old cursed double locked antipattern.
Basically FastMap synchronizes most write access leaving read access unsynchronized, worse write access synchronizes on whole map even if data is stored using submaps. But I didn't lose hope at that moment, double locking is incomprehensible beast, I still could not see exact scenario there your map will break. So I just wrote a test - a bunch of thread iterating with using objects with bad hash distribution. And this test has failed, on single core, without hyperthreading, plain old Athlon rock. Both synchronized HashMap and ConcurrentHashMap did work fine, but FastMap failed.
Below is test source
Re: Universal Map Implementation
shouldn't setShared be a constructor property (and then final if possible)because now the _isShared boolean is not volatile or access in a synchronized lock.
So setting it to true at one point then it still could be that other threads don't see that immediately..
CONTINUE: It is just not thread safe
(sorry somthing wrong with this forum it has cut massage tail)Below is test source (maptest.java)
and test results for 3 kinds of map on my Box (Athlon XP (Barton) @ 1.84 GHz)
By the way perfomance results (higher are better)
synchronized HashMap - 11 (parrots)
ConcurrentHashMap - 82 (parrots)
synchronized HashMap - 25 (parrots)
(But it is not perfomance test but correctness test, so I've added benchmark just not to get too bored)
I still hope I just did something wrong, and looking for comments
Re: CONTINUE: It is just not thread safe
I just looked at your Java source and it seems that you call setShared(false); which makes it non-threadsafe. You print out that you set it to true, but the code you uploaded shows it set to false.Removals are slow
Hello Jean-Marie,I know that you are waiting for it, so here comes my "let's kill the javolution" benchmark.
I had prepared it in the way to penalize the FastMap really hard. I fully accept that it might be good in normal usage, but I just want to show it is not 'universal' - there are certain use cases (however pathological they might be) where it is non-predicatable and considerably slower then jdk HashMap.
Non-predactibility means that depending on the order in which I remove the values from map(which should hash perfectly afaik), same operation can take around 5 times longer or shorter. While you have managed to remove rehashing part out of the equation, performance (at least removal) can be severely affected by the order in which values are processed.
Same benchmark is repeated for jdk HashMap and as expected it shows same numbers for both orders of removal. As an extra, it is around 5 times faster for FastMap-friendly case and 25 times faster FastMap-unfriendly case.
There is nothing wrong with your implementation - just collision resolution you are using is by very definition removal-unfriendly plus can show very different performance depending on the order of the operations.
My results are around
EASY 90 ms
HARD 380 ms
JDKEASY 17 ms
JDKHARD 17 ms
import java.util.HashMap; import javolution.util.FastMap; public class Test { static int SIZE = 200000; static Object value = new Object(); static Integer[] keys; public static void main(String[] args) { keys = new Integer[SIZE]; for ( int i =0; i < SIZE; i++ ) { keys[i] = new Integer(i); } for ( int i =0; i < 10; i++ ) { test(); testJDK(); } } public static void test() { long start,end; FastMap map = new FastMap(SIZE*2 + 1); // heat up the map for ( int i =0; i < SIZE; i++ ) { map.put(keys[i], value); } map.clear(); for ( int i =SIZE-1; i >=0; i-- ) { map.put(keys[i], value); } start = System.nanoTime(); for ( int i =SIZE-1; i >=0; i-- ) { map.remove(keys[i]); } end = System.nanoTime(); System.out.println("EASY " + (end-start)/1000000.0 + "ms"); map.clear(); for ( int i =0; i < SIZE; i++ ) { map.put(keys[i], value); } start = System.nanoTime(); for ( int i =0; i < SIZE; i++ ) { map.remove(keys[i]); } end = System.nanoTime(); System.out.println("HARD " + (end-start)/1000000.0 + "ms"); } public static void testJDK() { HashMap map = new HashMap(SIZE*2 + 1); // heat up the map for ( int i =0; i < SIZE; i++ ) { map.put(keys[i], value); } map.clear(); for ( int i =SIZE-1; i >=0; i-- ) { map.put(keys[i], value); } long start = System.nanoTime(); for ( int i =SIZE-1; i >=0; i-- ) { map.remove(keys[i]); } long end = System.nanoTime(); System.out.println("JDKEASY " + (end-start)/1000000.0 + "ms"); map.clear(); for ( int i =0; i < SIZE; i++ ) { map.put(keys[i], value); } start = System.nanoTime(); for ( int i =0; i < SIZE; i++ ) { map.remove(keys[i]); } end = System.nanoTime(); System.out.println("JDKHARD " + (end-start)/1000000.0 + "ms"); } }Re: Removals are slow
This is the first version and there is plenty of room for improvement. I am also very attentive to any problem detected with regards to thread-safety (I believe it is thread-safe but I might have miss something).Object removal is indeed far from optimal. The current implementation is "getting" the entry then "removing" the entry (going back to sub-maps) when it could do both at once. There is almost a 2x improvement right there.
Also, I did not tune any of the size parameters, for example having a table only twice the number of entries might not be enough ?!
I am very open to any code improvement, do not hesitate to send me your suggestions (or patches).
-- Javolution: Everything should be made as simple as possible... -- JScience: But not simpler!
Re: Universal Map Implementation
> because now the _isShared boolean is not volatile or> access in a synchronized lock.
>
> So setting it to true at one point then it still
> could be that other threads don't see that
> immediately..
Good point! We will move it to constructor.
-- Javolution: Everything should be made as simple as possible... -- JScience: But not simpler!
Re: CONTINUE: It is just not thread safe
Just fix it and run for yourself.My bad, I was checking if setShared() changes any thing, and posted wrong source. And it does with setShared(true) it breaks sometimes but with setShared(false) it almost not working at all.
Re: CONTINUE: It is just not thread safe
> Just fix it and run for yourself.> My bad, I was checking if setShared() changes any
> thing, and posted wrong source. And it does with
> setShared(true) it breaks sometimes but with
> setShared(false) it almost not working at all.
If shared is false, it is not expected to work.
Now if shared is true and it breaks that is more interesting. Any idea of where the problem might be?
-- Javolution: Everything should be made as simple as possible... -- JScience: But not simpler!
Re: CONTINUE: It is just not thread safe
Do not trust me, run test for yourself, check if it is not broken someway (but other maps pass it, so I think it is OK). I would be happy to be proven wrong.The problem IMHO is double locking. You are trying to avoid synchronization for reads, it was proven thousand time that it is impossible (at least without using atomic compareAndSet). Also you even do not bother yourself with putting write barriers. Sorry for not providing examples from your code, analyzing concurrent aspect by source code is a really damaging task for brain of mere mortal.
Also I do not really have interest in something that should synchronize for every non-value-replace modification on whole map. Good luck with investigation.
P.S.
Just run test on hyperthreading enabled single core, number of fails of FastMap is much worse, synchronized HashMap and ConcurrentHashMap are still doing job fine. Also I put Doug Lee's pre java 1.5 ConcurrentHashMap, and it also has passed the test.
FOUND IT!
I found the problem! It is in the pack(int slot)What happens is that the thread removing an entry might move entries from other threads (to fill the hole). A simple change of:
to
Solves the problem (we fill the hole without moving anyone, except that now I have to test for NULL when looking for a slot available).
I will update Javolution shortly!
Thanks (great test)!
-- Javolution: Everything should be made as simple as possible... -- JScience: But not simpler!
Re: FOUND IT!
Good to hear, but I still very pessimistic that it is possible to implement working double-checked locking in java memory model. I'm eager to put my teeth on fixed version.