I've written multiple articles here at Javalobby on the Hibernate O/R mapping tool, and
they have usually been well received.
Hibernate isn't just the popular kid on the block, however; it is a very powerful, consistent,
and reliable database mapping tool. Mapping between Java objects and relational
databases has many facets that you must be aware of. Hibernate does a particularly good
job of making the process simple to start with, while providing the facilities to
scale well and meet exceedingly complex mapping demands.
One of the primary concerns when mapping between a database and your Java application is
performance. A common worry among people who haven't spent much time working with
Hibernate is that O/R mapping tools will limit your ability to make
performance-enhancing changes to particular queries and retrievals. Today I want to discuss
two facets of the Hibernate infrastructure that address certain performance
concerns: the second-level cache and the query cache.
The Second Level Cache
The second-level cache is called 'second-level' because there is already a cache operating for you
in Hibernate for the duration you have a session open. From the Hibernate documentation:
A Hibernate Session is a transaction-level cache of persistent data. It is possible to configure a cluster or JVM-level (SessionFactory-level) cache on a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be careful. Caches are never aware of changes made to the persistent store by another application (though they may be configured to regularly expire cached data).
As implied above, this 'second-level' cache exists as long as the session factory is alive. The second-level cache
holds on to the 'data' for all properties and associations (and collections if requested) for individual entities that are marked to be cached.
I'm not here to re-hash the details provided clearly by the Hibernate documentation, so for detailed information about selecting a
cache provider and configuring Hibernate to use it, look at Section 20.2 of the Hibernate documentation.
Suffice it to say, the important part is the 'cache' element which you add to your mapping file:
For most readers, what I'm describing here is probably nothing new. Bear with me, I'll get to the
goodies in a minute.
The Query Cache
The other cache that is available is the query cache. The query cache effectively holds on to the identifiers for an
individual query. As described in the documentation:
Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. So the query cache should always be used in conjunction with the second-level cache.
Once enabled via the configuration of Hibernate, it is simply a matter of calling setCacheable(true) on your Query or Criteria object.
Now, on to the inner workings.
How The Second-Level Cache Works
One of the keys to understanding how the caches can help you is to understand how they work internally (at least conceptually).
The second-level cache is typically the more important one to understand, particularly when dealing with large, complex object graphs
that may be queried and loaded often. The first thing to realize about the second-level cache is that it doesn't cache instances of the
object type being cached; instead it caches the individual values for the properties of that object. So, conceptually, for an object like this:
import java.util.Set;

public class Person {
    private Person parent;
    private Set<Person> children;

    public void setParent(Person p) { parent = p; }
    public void setChildren(Set<Person> set) { children = set; }
    public Set<Person> getChildren() { return children; }
    public Person getParent() { return parent; }
}
So, in this case, Hibernate is holding on to 3 strings, and one serializable
identifier for the 'many-to-one' parent relationship. Let me reiterate that
Hibernate is *not* holding on to actual instances of the objects. Why is this
important? Two reasons. One, Hibernate doesn't have to worry that client code (i.e. your code)
will manipulate the objects in a way that will disrupt the cache, and two, the
relationships and associations do not become 'stale', and are easy to keep up to date
as they are simply identifiers. The cache is not a tree of objects, and can instead just
be a conceptual map of arrays. I continually use the word conceptual, because Hibernate
does much more behind the scenes with this cache but, unless you plan on implementing your own provider,
you don't need to worry about it. Hibernate calls the state these objects are in 'dehydrated', as it
holds all the important bits of the object, but in a form that is easier for Hibernate to control. I'll use the terms
'dehydrate' and 'hydrate' to describe converting between this broken-apart data and an object
of your domain (hydrating being the process of creating an instance of your domain class and populating it with
this data, dehydrating being the inverse).
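To make the 'map of arrays' idea concrete, here is a minimal sketch in plain Java. It is only a conceptual model, not Hibernate's real implementation: the class name DehydratedCacheSketch, the firstName and lastName properties, and the sample values are all assumptions made for illustration. It stores each person as an array of property values plus the parent's identifier, and builds a fresh instance on every hydrate call.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only; this is not Hibernate's actual cache implementation.
class DehydratedCacheSketch {

    // Hypothetical domain class; firstName and lastName are illustrative properties.
    static class Person {
        Long id;
        String firstName;
        String lastName;
        Person parent;
    }

    // Entity id -> property values. The parent association is kept as an id, never as an object.
    private final Map<Long, Object[]> entries = new HashMap<>();

    // Dehydrate: break the object apart into plain values and store them.
    void dehydrate(Person p) {
        Long parentId = (p.parent == null) ? null : p.parent.id;
        entries.put(p.id, new Object[] { p.firstName, p.lastName, parentId });
    }

    // Hydrate: build a fresh instance from the stored values.
    Person hydrate(Long id) {
        Object[] state = entries.get(id);
        if (state == null) {
            return null; // cache miss; a real ORM would fall back to the database here
        }
        Person p = new Person();
        p.id = id;
        p.firstName = (String) state[0];
        p.lastName = (String) state[1];
        Long parentId = (Long) state[2];
        if (parentId != null) {
            p.parent = hydrate(parentId); // the association is resolved through the cache by id
        }
        return p;
    }

    public static void main(String[] args) {
        DehydratedCacheSketch cache = new DehydratedCacheSketch();

        Person john = new Person();
        john.id = 1L;
        john.firstName = "John";
        john.lastName = "Doe";

        Person joey = new Person();
        joey.id = 2L;
        joey.firstName = "Joey";
        joey.lastName = "Doe";
        joey.parent = john;

        cache.dehydrate(john);
        cache.dehydrate(joey);

        Person a = cache.hydrate(2L);
        Person b = cache.hydrate(2L);
        a.firstName = "Changed"; // client code mutating one instance

        System.out.println(a != b);             // true: every hydrate returns a new instance
        System.out.println(b.firstName);        // Joey: the cached values were not disturbed
        System.out.println(b.parent.firstName); // John: parent was looked up by its id
    }
}
```

Because only values and identifiers are stored, changing the instance held by client code cannot corrupt what the cache holds, which is the first of the two reasons given above.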
Now, the astute among you may have noticed that I have omitted the 'children' association altogether
from the cache. This was intentional. Hibernate gives you the granularity to decide which
associations should be cached, and which associations should be re-determined during the hydration of
an object of the cached type from the second-level cache. This provides control for when associations
can potentially be altered by another class that doesn't explicitly cascade the change to the cached class.
This is a very powerful construct to have. The default setting is to not cache associations; if you are not
aware of this, and simply turn on caching quickly without really reading into how caching works in Hibernate,
you will add the overhead of managing the cache without gaining much of the benefit. After all, the primary
benefit of caching is to have complex associations available without having to do subsequent database selects - as
I have mentioned before, n+1 database queries can quickly become a serious performance bottleneck.
In this case we're dealing just with our Person class, and we know that the association will be managed properly -
so let's update our mapping to reflect that we want the 'children' association to be cached.
Once again, let me point out that all we are caching is the IDs of the related parts.
So, if we were to load the person with ID '1' from the database (ignoring any joining for the time being) without the cache, we
would have these selects issued:
select * from Person where id=1 ; load the person with id 1
select * from Person where parent_id=1 ; load the children of 1 (will return 2, 3)
select * from Person where parent_id=2 ; load any potential children of 2 (will return none)
select * from Person where parent_id=3 ; load any potential children of 3 (will return none)
*With* the cache (assuming it was fully loaded), a direct load will actually issue NO select statements,
because it can simply look up based on an identifier. If, however, we didn't have the associations cached, we'd have
these SQL statements invoked for us:
select * from Person where parent_id=1 ; load the children of 1 (will return 2, 3)
select * from Person where parent_id=2 ; load any potential children of 2 (will return none)
select * from Person where parent_id=3 ; load any potential children of 3 (will return none)
That's nearly the same number of SQL statements as when we didn't use the cache at all! This is why it's important to cache associations whenever
possible.
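The select counts above can be reproduced with a small toy model. This is a sketch under stated assumptions, not Hibernate code: AssociationCacheSketch, its fields, and the three-row Person table (1 is the parent of 2 and 3) are hypothetical stand-ins. A counter is incremented wherever a real ORM would have to go to the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model for counting selects; not Hibernate code.
class AssociationCacheSketch {

    // Stand-in for the Person table: person id -> parent id (null for the root).
    private final Map<Long, Long> parentOf = new LinkedHashMap<>();
    // Ids of people whose property values are already cached.
    private final Set<Long> entityCache = new HashSet<>();
    // Parent id -> ids of its children; only filled when association caching is on.
    private final Map<Long, List<Long>> collectionCache = new HashMap<>();
    private final boolean cacheCollections;
    private int selects = 0;

    AssociationCacheSketch(boolean cacheCollections) {
        this.cacheCollections = cacheCollections;
        parentOf.put(1L, null);
        parentOf.put(2L, 1L);
        parentOf.put(3L, 1L);
    }

    // Loads a person and its whole tree of children; returns how many selects that took.
    int load(long id) {
        int before = selects;
        if (!entityCache.contains(id)) {
            selects++; // select * from Person where id=?
            entityCache.add(id);
        }
        loadChildren(id);
        return selects - before;
    }

    private void loadChildren(long parentId) {
        List<Long> childIds = collectionCache.get(parentId);
        if (childIds == null) {
            selects++; // select * from Person where parent_id=?
            childIds = new ArrayList<>();
            for (Map.Entry<Long, Long> row : parentOf.entrySet()) {
                Long rowParent = row.getValue();
                if (rowParent != null && rowParent.longValue() == parentId) {
                    childIds.add(row.getKey());
                    entityCache.add(row.getKey()); // the select returned the full rows
                }
            }
            if (cacheCollections) {
                collectionCache.put(parentId, childIds);
            }
        }
        for (Long childId : childIds) {
            loadChildren(childId);
        }
    }

    public static void main(String[] args) {
        AssociationCacheSketch entitiesOnly = new AssociationCacheSketch(false);
        System.out.println(entitiesOnly.load(1L)); // 4: cold cache, same as no cache at all
        System.out.println(entitiesOnly.load(1L)); // 3: children are re-selected every time

        AssociationCacheSketch withAssociations = new AssociationCacheSketch(true);
        System.out.println(withAssociations.load(1L)); // 4: cold cache
        System.out.println(withAssociations.load(1L)); // 0: everything is resolved by id
    }
}
```

The second load in each scenario is the interesting one: caching only the entity saves a single select, while caching the association as well removes all of them.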
Let's say that we wanted to look up entries based on a more complex query than directly by ID, such as by name. In this
case, Hibernate must still issue an SQL statement to get the base data set for the query. So, for instance, this code:
Query query = session.createQuery("from Person as p where p.firstName=?");
query.setString(0, "John");
List l = query.list();
... would invoke a single select (assuming our associations were cached).
select * from Person where firstName='John'
This single select will then return '1', and then the cache will be used for all other lookups as we have everything cached.
This mandatory single select is where the query cache comes in.
How the Query Cache Works
The query cache is actually much like the association caching described above; it's really little more than a list of
identifiers for a particular type (although again, it is much more complex internally). Let's say we performed a query like this:
Query query = session.createQuery("from Person as p where p.parent.id=? and p.firstName=?");
query.setInt(0, Integer.valueOf(1));
query.setString(1, "Joey");
query.setCacheable(true);
List l = query.list();
The query cache works something like this:
*------------------------------------------------------------------------------------*
|                                    Query Cache                                     |
|------------------------------------------------------------------------------------|
| ["from Person as p where p.parent.id=? and p.firstName=?", [ 1, "Joey" ]] -> [ 2 ] |
*------------------------------------------------------------------------------------*
The combination of the query and the values provided as parameters to that query is used as a key, and the value
is the list of identifiers for that query. Note that this becomes more complex from an internal perspective as you
begin to consider that a query can have the effect of altering associations to objects returned by that query; not to mention
the fact that a query may not return whole objects, but may in fact only return scalar values (when you have supplied a select
clause, for instance). That being said, this is a sound and reliable way to think of the query cache conceptually.
If the second-level cache is enabled (which it should be for objects returned via a query cache), then the object of type Person with
an id of 2 will be pulled from the cache (if available in the cache) and then hydrated, along with any associations.
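The key-to-identifiers idea can be sketched in a few lines of Java. As before, this is a conceptual model under assumptions rather than Hibernate's implementation: QueryCacheSketch, QueryKey, the three-row stand-in table, and the names "John" and "Jane" are hypothetical, and the only query it can 'run' is the one shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Conceptual sketch of a query cache; not Hibernate's implementation.
class QueryCacheSketch {

    static final String HQL = "from Person as p where p.parent.id=? and p.firstName=?";

    // Cache key: the query text plus the bound parameter values.
    static final class QueryKey {
        final String hql;
        final List<Object> params;

        QueryKey(String hql, List<Object> params) {
            this.hql = hql;
            this.params = params;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof QueryKey)) {
                return false;
            }
            QueryKey other = (QueryKey) o;
            return hql.equals(other.hql) && params.equals(other.params);
        }

        @Override
        public int hashCode() {
            return Objects.hash(hql, params);
        }
    }

    // Stand-in for the Person table: id -> { firstName, parentId }.
    private final Map<Long, Object[]> table = new HashMap<>();
    // Query key -> identifiers of the matching entities (never the entities themselves).
    private final Map<QueryKey, List<Long>> queryCache = new HashMap<>();
    private int selects = 0;

    QueryCacheSketch() {
        table.put(1L, new Object[] { "John", null });
        table.put(2L, new Object[] { "Joey", 1L });
        table.put(3L, new Object[] { "Jane", 1L });
    }

    // Returns the ids matching the query, touching the "database" only on a cache miss.
    List<Long> findByParentAndFirstName(Long parentId, String firstName) {
        QueryKey key = new QueryKey(HQL, Arrays.<Object>asList(parentId, firstName));
        List<Long> ids = queryCache.get(key);
        if (ids == null) {
            selects++; // one select to find the matching identifiers
            ids = new ArrayList<>();
            for (Map.Entry<Long, Object[]> row : table.entrySet()) {
                Object[] state = row.getValue();
                if (Objects.equals(state[1], parentId) && Objects.equals(state[0], firstName)) {
                    ids.add(row.getKey());
                }
            }
            queryCache.put(key, ids);
        }
        return ids;
    }

    int selectCount() {
        return selects;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        System.out.println(cache.findByParentAndFirstName(1L, "Joey")); // [2]
        System.out.println(cache.findByParentAndFirstName(1L, "Joey")); // [2]
        System.out.println(cache.selectCount()); // 1: the repeat was answered from the cache
        System.out.println(cache.findByParentAndFirstName(1L, "Jane")); // [3]
        System.out.println(cache.selectCount()); // 2: different parameters form a different key
    }
}
```

Note that the sketch returns identifiers only. Turning them into Person objects is the job of the second-level cache, which is why the two caches are meant to be used together.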
I hope that this quick glance into how Hibernate holds on to values gives you some idea as to how information can be cached by Hibernate,
and will give you some insight into how you can manage your mappings so that you can squeeze the maximum performance possible out of your
application.
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
I have a few questions about your article:
1) if you want to use query cache then independently do you have to add
entries on the object that will be returned from the query.
2) If the object that is returned from query has enabled filter on set how does that factor into query cache and object cache.
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
If the Query Cache caches the results of ad-hoc queries, is Hibernate smart enough to know when those results are invalidated and flush them from its cache? In other words, in the example above, let's say that the object with ID "2" which currently has its name set to "Joey" has the name changed to "Joseph". The second level cache gets updated automatically, but what about the query cache? The original query should no longer return the object with ID "2". How is the query cache notified of this?
Thanks, this article is a pretty big help in evaluating how Hibernate works and whether it meets the needs of the project I'm working on!
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
I have been testing JBoss Cache with Hibernate over the last few days.
# load will load the object from the cache if it exists and the database otherwise. If it loads the object from the database, it populates the cache.
# createQuery with setCacheable=false. This will get every object and populate the cache. If an object was already in the cache, it will use that. So, if you specify "from foo" with an empty cache, and there are 10 foo objects, it will first grab all 10 objects from the database, and populate the cache with them. In the meantime, if foo#2 is changed in the database, but not the cache, "from foo" will get the cached version. If foo#2 is eliminated from the cache, "from foo" will get 1 and 3-10 from the cache, but will get foo#2 from the database and repopulate the cache.
# createQuery with setCacheable=true. This is basically the same as createQuery with setCacheable=false, with a few differences. First, an extra entry will be made into the query cache. Second, if you issued a query with "from foo", and then manually removed foo#2 from the cache, and then changed foo#2, and then redid the query, "from foo", would it still hold the old foo#2? That is what I expected, but no! It uses the query results from "from foo", but, those only have the keys. A lookup is performed on each key. Since the key does not exist anymore, it goes back to the database and repopulates the cache.
# If one issues a createQuery with "FROM foo WHERE name=?", and the name changes, there will be 2 queries in the query cache, and both will point to the same id. The query cache with the original name is not automatically notified. This means, I think, one must manually do this and organize one's code well, or understand that the query cache should only be used for an acceptable time period whereby it can be slightly incorrect. For example, a report that has to be up to date for the last 10 minutes. Or, one can live with only having the objects cached but not the query cache itself...it is hinted at that caching with the query cache is a bit rare.
# createSQLQuery with a partial list of mapped columns fails (as it should), so that means that if you use custom sql to populate something with addEntity, it will always be fully cached.
# createQuery with a left join fetch adds everything to the cache. This means that explicit set caching will probably not be necessary, which is good. Eager fetching with the left join is naturally synchronized with what goes into the cache.
# When using createSQLQuery, one must use an explicit .addScalar or .addEntity if setCacheable is true, or hibernate gets confused on casting.
# To set an expiration time on custom sql which has no class mapping, use setCacheRegion on createSQLQuery, and then set that in the eviction policy.
# If one uses the eviction policy code, the /default must be set
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
It is amazing to me that people actually use this stuff. ORM adds so much complexity it is unbelievable. In my opinion, it is cheaper to just buy a bigger database than to pay specialized developers to develop/maintain this kind of stuff.
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
> In my opinion, it is cheaper to just
> buy a bigger database than to pay specialized
> developers to develop/maintain this kind of stuff.
It's not clear to me how buying a bigger database would solve the issues that ORM solves (simplifying data access and abstracting that access from the database, for starters.)
As far as optimization (of anything) goes--it's rarely easy, but it's probably simpler with Hibernate (tweak a few config files) than with, say, a hand-rolled JDBC framework (write your own caching and then hook it into your framework everywhere that it's needed--good luck getting that right.)
Hibernate: Truly Understanding the Second-Level and Query Caches
At 1:22 AM on Sep 26, 2005, R.J. Lorimer wrote:
Configuration of the query cache is described in Section 20.4 of the Hibernate documentation.
Until next time,
R.J. Lorimer
Contributing Editor - rj -at- javalobby.org
Author - http://www.coffee-bytes.com
Software Consultant - http://www.crosslogic.com
Re: Hibernate: Truly Understanding the Second-Level and Query Caches
Very good article. If someone needs some tips in Spanish, I recommend the following post about the Second Level Cache
Bye,
Andrea