
Search support: Hibernate/Lucene and the Compass framework

URL: Shay Banon's KimchyBlog

At 7:40 AM on Nov 20, 2006, Rick Ross wrote:

No matter what kind of web application you’re writing, you probably need to provide some degree of search support. The powerful, open source Lucene package is one of the most widely used search systems, and the Compass framework leverages Lucene to expose a broad range of search features to your application.

Recently the Hibernate team has begun to enhance Hibernate's Lucene support, offering richer integration between the ORM persistence of Hibernate and the full-text search of Lucene. This has apparently raised some questions about where Compass fits into the picture, so Compass founder Shay Banon has written up a Q&A about how he sees the bigger picture.

It’s an interesting read, and one of the main conclusions seems to be that comparing Compass to the Lucene-enhanced Hibernate is really comparing apples to oranges. Shay seems patient and restrained as he speaks to the questions raised in public list discussions, but it’s evident that the goals and features of Compass are much broader in scope than what Hibernate provides.

What are you using to power search features in your applications? Have you tried using Compass, and what kind of results did you observe? Have you leveraged Lucene by making direct API calls? Is your underlying database engine giving you everything you need? These are big questions that affect all of us, so I think it would be a good idea to hear how people are approaching search functionality in real-world Java apps and beyond.
1 . At 9:39 AM on Nov 20, 2006, Mark N DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

Well, even if one is not writing a web app, search support can be needed/useful.


I have been eyeing including search for domain objects for some time. I've not used it yet in any projects due to scope and time. I do have a project that I am starting up again that will require it, though. So I will get a chance to see which works better.

I have had an opportunity to deal with a 3rd party application that tightly integrates advanced search and the database. It uses SQL Server and SQL Server's Full Text search. For large datasets, it is horrible. It takes way too long to index. Part of the issue is how the application does it. Part of it is SQL Server (2000 --- 2005 might help some). So I would say that integration is good. Tight integration is bad.

I have, however, written an application (with a Browser UI) that allows advanced search of Sharepoint Documents and Metadata (it uses the .Net version of Lucene) and it works very well. It provides features that a database couldn't. Well, not without including a search engine. But as I have said above, that has its limitations.
2 . At 7:25 PM on Nov 20, 2006, Nick Minutello wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

I think there are two questions/issues here. The first obviously has a bearing on the second:

1) What role indexing/searching plays in your app
2) What you use: (Compass, Hibernate or Lucene-directly)

I have used Lucene directly and also have used Compass.
Needless to say, I use Hibernate.
But that said, I've not used Hibernate's Lucene support.

So, getting to the first issue, what role does the indexing/searching play in your app?

Well, for the applications I have worked on (and this also correlates with applications I have worked with - like Jira) the indexing/search plays 2-3 roles:
(In general, the applications I have worked on have had very structured data)

1) Google-like search. The user wants to type in anything they can remember about some entity, and find it.
In the case of a simple Counterparty lookup, it may be the ID/symbol/name of the Counterparty. It may be the name of the City in which the Counterparty has an Office. It may be one of the contact names for Counterparty in Beijing.

Most traditional applications solve this problem by querying on the database - and in order to have decent performance, they offload some of the work to the user. They present the user with a complicated "Advanced Search" form with a dozen fields that they can pass into their SQL query. Sadly there is nothing "advanced" about that search.
Creating a compound "all" field to search on with Lucene is orders of magnitude faster and provides the user with a much nicer interface. More like Google.

2) Filtering. This is useful when you want to specify tight boundaries for the criteria and get a list of results - and you want it to be very fast. Jira is a good example of this - many links off the front page are links to pre-determined Filters. E.g. Assigned-to-me, Reported-by-me, Resolved-recently. In many ways, this is a form of reporting.

3) Online Reporting. This is similar to filtering, except that you are merely collecting the numbers. To get the numbers, you may use HitCollectors or you may run a number of sequential Lucene searches and get the hit-counts.
In many cases, this provides very targeted "reporting" features which are tightly integrated into the application - and working on the live data (unlike offline reporting).
Jira is a good example of this as well. You can save & run many Filters, you can get 2-dimensional (X vs. Y) reports, you can also get graphs and charts of the results - or trend charts over time.

4) Unstructured Content Search. There is another usage which is common - but I personally have not had call to use - and that is essentially content searching. I.e. performing searches on unstructured data - like text in a wiki page or forum post. A valid usage - but not one I have had great call to use in my job.
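
As a rough illustration of the compound "all" field from item 1 and the hit counts from item 3, here is a minimal plain-Java sketch. It does not use the Lucene API: the class name, the in-memory inverted index and the sample counterparties are all assumptions made for the example, standing in for what a real index would do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a search index; not the Lucene API.
class AllFieldSearchSketch {
    // Inverted index over one compound "all" field: token -> ids of matching entities.
    private final Map<String, Set<String>> allField = new HashMap<>();

    // Concatenate every searchable value into one "all" text and index its tokens.
    public void index(String id, String... fieldValues) {
        String all = id + " " + String.join(" ", fieldValues);
        for (String token : all.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                allField.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Every query term must occur somewhere in the "all" field (AND semantics).
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String token : query.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            Set<String> ids = allField.getOrDefault(token, Collections.<String>emptySet());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    // Reporting only needs the number of hits, not the entities themselves.
    public int hitCount(String query) {
        return search(query).size();
    }

    public static void main(String[] args) {
        AllFieldSearchSketch idx = new AllFieldSearchSketch();
        idx.index("CP-1", "Acme Trading", "Beijing", "Li Wei");
        idx.index("CP-2", "Acme Holdings", "London", "Jane Smith");
        System.out.println(idx.search("acme beijing")); // prints [CP-1]
        System.out.println(idx.hitCount("acme"));       // prints 2
    }
}
```

The user types whatever they remember (a name, a city, a contact) into one box, and the same structure answers "how many match?" without loading any entity.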

So given the above usage scenarios, which of Compass/Hibernate/Lucene-direct would you use?

Option 1: Direct Lucene.
The main issue with using Lucene directly is the amount of code you end up writing keeping your data and index in synch, and the amount of code you write to index/store data in the index and getting the data back out again.
There are also other issues with index transactionality, etc.

Compass / Hibernate Search?
So, as for Compass vs. Hibernate (and keeping in mind I haven't used Hibernate's Lucene support), the key difference I have picked up on is the way objects are "re-hydrated". Hibernate stores no data in the Lucene index - it just uses the ids to feed to Hibernate to load the results.
In the applications that I have used Lucene for searching, this loses one very big advantage: speed.
Using Compass (and before Compass, this is the way we did it by hand) we store minimal "summary" data in the index so that we can render the search results without hitting the database at all. We only hit the database when someone wants to view a given item. This means that search/filtering/reporting is very very fast.
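
To make the difference concrete, here is a minimal plain-Java sketch of the two ways of rendering a result list. Everything in it (the `Hit` and `Database` classes, the load counter) is a hypothetical stand-in rather than Compass or Hibernate code, and the database path is deliberately the naive one-load-per-hit version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SummaryInIndexSketch {
    // What the index returns per hit: the id plus a small stored "summary" field.
    static final class Hit {
        final long id;
        final String summary;
        Hit(long id, String summary) { this.id = id; this.summary = summary; }
    }

    // Stand-in for the database; counts how often it is hit.
    static final class Database {
        int loads = 0;
        final Map<Long, String> rows = new HashMap<>();
        String load(long id) { loads++; return rows.get(id); }
    }

    // Summary stored in the index: the result list never touches the database.
    static List<String> renderFromIndex(List<Hit> hits) {
        List<String> lines = new ArrayList<>();
        for (Hit h : hits) lines.add(h.id + ": " + h.summary);
        return lines;
    }

    // Ids only in the index: each result is re-hydrated from the database.
    static List<String> renderViaDatabase(List<Hit> hits, Database db) {
        List<String> lines = new ArrayList<>();
        for (Hit h : hits) lines.add(h.id + ": " + db.load(h.id));
        return lines;
    }

    public static void main(String[] args) {
        Database db = new Database();
        db.rows.put(1L, "Acme Trading, Beijing");
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1L, "Acme Trading, Beijing"));
        renderFromIndex(hits);
        System.out.println(db.loads); // prints 0
        renderViaDatabase(hits, db);
        System.out.println(db.loads); // prints 1
    }
}
```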

Now, you could argue that you could use the Hibernate L2 cache… But to get similar performance, it would mean your search results would have to have 100% hit rate (only way to guarantee that is to have lots of RAM or little data).
Also, reporting (ala Hit Collectors) would result in bringing all your data back - which would not perform well at all.

In applications where I have used Lucene for searching, it also coincided that the database was very slow & non-performant. Having a very fast search that didn’t load the database gave users exactly what they were searching for - and in milliseconds. This allowed us to isolate the database and eventually change & fix it.
Also, getting some 36,000 hits in around 70ms was a big PR-boost for the project - even though they didn’t actually load those 36K results, they got the overwhelming message "this application is fast" - and that’s all they needed to know since all the problems with their existing app stemmed from the fact that it was excruciatingly slow.
Say what you want about smoke and mirrors, but the google-like brag "Results 1 - 10 of about 19,600,000. (0.07 seconds)" won the users over from day 0.


The downside to using Compass, if you are already using Hibernate, is that you have duplicated annotations or mapping files - and another Hibernate-like thing to configure.
In practice, the somewhat duplicated mapping only occurs for a small subset of your total object model - that subset you need to search on.
Also, Compass, at least so far, has been very easy to configure and use.
The similarity in terms of mapping and configuration between the two is both appealing (the concepts & Spring integration are very analogous - hence familiar) - but also it can be a bit galling when you have to make the same property rename in two mapping files. It's been a bit annoying at times, but to be honest, not a big drama in the scheme of things.


Ultimately, Hibernate and Compass are both moving in the right direction. More applications should make more use of indexing.
It provides user functionality that is difficult to achieve otherwise.

-Nick
3 . At 10:26 AM on Nov 21, 2006, Mark N DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

Nick,
Great post. Helps me realize that I am going in the right direction. And it cements my theory that doing it via the db (db engine calls the search engine) is not the right way.

As I read this, I am thinking that it can be used to solve ad hoc querying issues. Thoughts?
4 . At 12:01 PM on Nov 21, 2006, Shay Banon wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

If by ad hoc querying issues you mean another option for querying your data instead of using SQL, then this is something that I have noticed happen many times.

Once search capabilities are introduced into the application, very quickly the query capabilities of the search engine are used not only for a google like text box. As Nick noted, they can be used for reporting, filtering, and so on (similar to what Jira does). Usually they are much faster, and offload the pressure from the database.
5 . At 12:40 PM on Nov 21, 2006, Mark N DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

Most users don't know SQL. But they can use Google. Sort of. I really should have said ad hoc reporting. But that and ad hoc querying go hand in hand.

In the project I did that uses Lucene (.Net), the users liked the "Advanced" search better than the Google Textbox.
6 . At 1:36 PM on Nov 22, 2006, Nick Minutello wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

>> the users liked the "Advanced" search better than the Google Textbox

Hmmmmm :-/

Ya, this is a favourite rant of mine :-)

It's *NOT* Advanced Search.
It's neither Advanced. Nor Search.

It's not Advanced because you are making the user do more work rather than less. It's not Search because typically users are doing reporting rather than trying to find something.

In my experience, the Google-like search can be less popular because
1) It doesn't give them the results they expect (i.e. the Google-like search isn't smart enough)
2) They don't know what is possible in the Google-like search
3) They can't remember what they have to search for

To address (1) you have to get crafty with indexing, weighting, query parsing and the like to give the user what they naturally expect. It takes some effort squeezing that information out of the users - but the rewards can be well worth it.

To address (2) you have to build in some hints when they are searching. I never got the chance on my last app to do this - but I wanted to dynamically suggest useful terms that would narrow down their search results. We just had static ones.

To address (3) - well, sometimes you can't. You have to keep the search simple - and do (2) - and still be resigned to the fact that some users don't think in a way that lets them make use of it.

As for Reporting (and if that word strikes the fear of BusinessObjects into your heart - choose another metaphor like Explorer, Navigator, Sniffer) Lucene can be great. Allows the user to build their own "reports", dump them to excel - and usually can be delivered much much quicker than any "Reporting Solution". Jira is a good example of how far this can be taken - with reports, portlets & graphs, etc.
7 . At 4:05 PM on Nov 22, 2006, Mark N DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

> >> the users liked the "Advanced" search better than
> the Google Textbox
>
> Hmmmmm :-/
>
> Ya, this is a favourite rant of mine :-)
>
> Its *NOT* Advanced Search.
> Its neither Advanced. Nor Search.
>
Hence my quotes around the word "Advanced". I did that based on your previous comments. Advanced definitely doesn't describe what it does. But one can't call it "Dumbed Down Query Interface". :)
I guess I need to check out Jira to get some ideas.
8 . At 1:31 PM on Nov 29, 2006, Jilles van Gurp DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

I've recently used hibernate-lucene integration. Despite the lack of documentation it is fairly easy to set up. There are some gotchas (like the documentation being slightly inaccurate & incomplete). Otherwise it seems to more or less work.

Essentially how it works is that an event listener is registered on the delete, insert and update events that hibernate uses for processing any data object changes. Additionally you have to annotate the fields you want indexed using various annotations. This essentially tells hibernate how to create a lucene Document instance for your data objects.
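
The listener idea can be sketched in plain Java as follows. This is only an illustration of the pattern: `ChangeListener`, `IndexUpdater` and `Store` are hypothetical names invented for the example, not Hibernate's actual event or annotation API, and a map stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexSyncSketch {
    // Hypothetical callback interface; not Hibernate's listener API.
    interface ChangeListener {
        void onSave(String id, Map<String, String> indexedFields); // insert or update
        void onDelete(String id);
    }

    // Stand-in for the search index: id -> indexed field values.
    static final class IndexUpdater implements ChangeListener {
        final Map<String, Map<String, String>> documents = new HashMap<>();

        public void onSave(String id, Map<String, String> indexedFields) {
            // Replacing the whole entry covers both insert and update.
            documents.put(id, new HashMap<>(indexedFields));
        }

        public void onDelete(String id) {
            documents.remove(id);
        }
    }

    // Stand-in for the persistence layer that fires an event on every change.
    static final class Store {
        private final List<ChangeListener> listeners = new ArrayList<>();

        void register(ChangeListener listener) { listeners.add(listener); }

        void save(String id, Map<String, String> indexedFields) {
            for (ChangeListener l : listeners) l.onSave(id, indexedFields);
        }

        void delete(String id) {
            for (ChangeListener l : listeners) l.onDelete(id);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        IndexUpdater index = new IndexUpdater();
        store.register(index);
        store.save("42", Collections.singletonMap("title", "Search support"));
        System.out.println(index.documents.containsKey("42")); // prints true
        store.delete("42");
        System.out.println(index.documents.isEmpty());          // prints true
    }
}
```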

This all works fairly well but I had problems scaling it up. The problem with lucene is that while reading the index is threadsafe, writing it isn't. When integrating with a clustered or multi threaded application server, that tends to make things hard because only one thread in the entire cluster can have an IndexWriter instance. An additional problem is that deleting objects is done through the IndexReader class, which is otherwise threadsafe except you can't delete and write at the same time :-/.

During my evaluations I produced several index lock exceptions which clearly indicates that hibernate has not provided a proper solution for this problem yet. Of course their implementation is clearly marked as experimental. But including the functionality with hibernate 3.2 suggests a level of maturity that clearly has not been realized yet. But despite this, it seems interesting enough to keep an eye on in the future.

As for compass, I have 0 experience with it. I'm not interested in yet another persistence framework. It would have to support the JEE 5 annotations and integrate properly (i.e. without much work/configuration) with application servers for me to even spend time on evaluating it.

I also have experience with the lucene API. It is easy to use and it is not that much work to construct lucene Document instances out of a javabean. The real issue is when you want transactional semantics to apply. In other words when you delete something from the db, the index should be updated as part of the transaction instead of afterwards. This is hard and I definitely don't really need it. So, I'm currently building a standalone lucene based indexer for my database search needs.

If your object models are simple and you don't need transactional semantics to extend to the index, just use lucene directly. It's not that hard and you can access all features of lucene directly.
9 . At 5:11 PM on Nov 29, 2006, Mark N DeveloperZone Top 100 wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

> As for compass, I have 0 experience with it. I'm not
> interested in yet another persistence framework. It
> would have to support the JEE 5 annotations and
> integrate properly (i.e. without much
> work/configuration) with application servers for me
> to even spend time on evaluating it.

Then you probably should evaluate it. It is more mature, is not another persistence framework, supports Java 5 annotations and is simpler than doing it yourself.
10 . At 9:35 AM on Nov 30, 2006, Emmanuel Bernard wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

> I've recently used hibernate-lucene integration.
> Despite the lack of documentation

I've improved it, wait for the next release :-)

> documentation being slightly inaccurate &
> incomplete). Otherwise it seems to more or less

Please open some JIRA issues so that I can fix it

> This all works fairly well but I had problems scaling
> it up. The problem with lucene is that while reading
> the index is threadsafe, writing it isn't. When
> integrating with a clustered or multi threaded
> application server, that tends to make things hard
> because only one thread in the entire cluster can
> have a IndexWriter instance. An additional problem is
> that deleting objects is done through the IndexReader
> class, which is otherwise threadsafe except you can't
> delete and write at the same time :-/.
>
> During my evaluations I produced several index lock
> exceptions which clearly indicates that hibernate has
> not provided a proper solution for this problem yet.

The version currently in SVN head (working on a release right now) reduces the window of index writing and increases its speed by batching per transaction.
As for a clustered environment, I have opened up the design so that centralizing the writing process is fairly easy to support.
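
The general idea of batching index work per transaction can be sketched in plain Java like this. It is not the Hibernate implementation: the class, the `Transaction` type and the lock counter are assumptions made for illustration, a map stands in for the index, and the lock only covers a single JVM, so it says nothing about the clustered case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchedIndexWriterSketch {
    private final Object writeLock = new Object();              // single-writer guard
    private final Map<String, String> index = new HashMap<>();  // stand-in for the index
    int lockAcquisitions = 0;                                    // how often a write window was opened

    // Collects index work while the transaction runs and applies it once at commit.
    final class Transaction {
        // Each entry is {id, text}; a null text means "delete this id".
        private final List<String[]> pending = new ArrayList<>();

        void addOrUpdate(String id, String text) { pending.add(new String[] {id, text}); }

        void delete(String id) { pending.add(new String[] {id, null}); }

        void commit() {
            synchronized (writeLock) {
                lockAcquisitions++;
                for (String[] op : pending) {
                    if (op[1] == null) index.remove(op[0]);
                    else index.put(op[0], op[1]);
                }
            }
            pending.clear();
        }
    }

    Transaction begin() { return new Transaction(); }

    int size() {
        synchronized (writeLock) { return index.size(); }
    }

    public static void main(String[] args) {
        BatchedIndexWriterSketch writer = new BatchedIndexWriterSketch();
        Transaction tx = writer.begin();
        tx.addOrUpdate("1", "first");
        tx.addOrUpdate("2", "second");
        tx.delete("1");
        tx.commit();
        System.out.println(writer.size());             // prints 1
        System.out.println(writer.lockAcquisitions);   // prints 1
    }
}
```

Three operations cost one write window instead of three, which is the kind of reduction in lock contention described above.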

> Of course their implementation is clearly marked as
> experimental. But including the functionality with
> hibernate 3.2 suggest a level of maturity that
> clearly has not been realized yet. But despite this,
> it seems interesting enough to keep an eye on in the
> future.

That's a drawback we are facing. We have a mature core and a mature annotations (JPA) implementation. It includes the Lucene integration, which is not yet mature. So the version number reflects the global project and we mark features that are not yet mature as experimental :-)

Thanks for your feedback
11 . At 9:43 AM on Nov 30, 2006, Emmanuel Bernard wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

Also, not from this post, but object rehydration is not as bad as it sounds. When you think of session level cache + second level cache + batch-size, the number of DB roundtrips shouldn't be that bad. And if your DB is local (i.e. like your index), there is no reason on earth why a Lucene Document access (i.e. access to the files containing data - not index) would be slower than a DB access.
12 . At 4:03 PM on Dec 1, 2006, Shay Banon wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

> Also, not from this post, but object rehydratation is
> not as bad as it sounds when you think of session
> level cache + second level cache + batch-size, the
> number of DB roundtrips shouldn't be that bad, and if
> your DB is local (ie like your index), there is no
> reason on earth why a Lucene Document access (ie
> access to the files containing data - not index)
> would be slower than a DB access

Of course, when you perform the search, you actually do have to get the Lucene Document in order to at least get the ids from it in order to fetch the data from the database. Since you already hit the index storage, it would be much faster to simply read more data from the index (the same document, just holding more information) than hitting the database. This does not mean that you need to store all your data in the index (as something that can be fetched, not searched), just that you can store enough information in order to show the search results to the user.

Also, using second level cache is not a solution that applies to all applications and depends on your application's available RAM.
13 . At 5:13 PM on Dec 1, 2006, Emmanuel Bernard wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

This is not what I said. I was comparing a Lucene access to a DB access.

"it would be much faster to simply read more data from the index"
Let's do an academic comparison
1. Lucene indexing and storing the needed data and no DB access
2. Lucene indexing the needed data (no storage) and a DB access (no 2nd level cache)

The index size of 2. will be smaller and the number of files representing the index will be lower, hence reducing the number of (and increasing the speed of) file accesses to randomly retrieve the documents (after a query). Of course it depends on what is stored.
OTOH, you need to do a DB access which can be local / remote with the data pre cached in DB memory or not etc etc

Which one is faster, it depends. But it is certainly not obvious.

Even in the worst case scenario, time to process 2. is time to process 1. + time for DB access.
If you use batch-size smartly, we are talking about 1 query for n results. Maybe 15 to 20ms overhead? Most systems can afford that.
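
The "1 query for n results" point can be sketched in plain Java as below. The `Database` class and its query counter are hypothetical stand-ins (a real setup would issue something comparable to a SQL `WHERE id IN (...)` per batch), and the example makes no claim about actual timings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchLoadSketch {
    // Stand-in for the database; counts round trips.
    static final class Database {
        int queries = 0;
        final Map<Long, String> rows = new HashMap<>();

        // One round trip for many ids.
        Map<Long, String> loadAll(List<Long> ids) {
            queries++;
            Map<Long, String> found = new LinkedHashMap<>();
            for (Long id : ids) found.put(id, rows.get(id));
            return found;
        }
    }

    // Rehydrate search hits in chunks of batchSize, keeping the hit order.
    static List<String> rehydrate(List<Long> hitIds, Database db, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<String> out = new ArrayList<>();
        for (int from = 0; from < hitIds.size(); from += batchSize) {
            List<Long> chunk = hitIds.subList(from, Math.min(from + batchSize, hitIds.size()));
            Map<Long, String> loaded = db.loadAll(chunk);
            for (Long id : chunk) out.add(loaded.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        Database db = new Database();
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 20; i++) {
            db.rows.put(i, "row" + i);
            ids.add(i);
        }
        rehydrate(ids, db, 20);
        System.out.println(db.queries); // prints 1
    }
}
```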

If you want to have more fun, add a second level cache to the loop and it will really depend ;-)
14 . At 1:43 PM on Dec 8, 2006, Nick Minutello wrote:

Re: Search support: Hibernate/Lucene and the Compass framework

>> I'm not interested in yet another persistence framework.
It's not a persistence framework.
It's an indexing/searching framework.

It deals with the hibernate synchronisation & transaction/threading problems you encountered.

Also, while the JavaBean to/from Document code is pretty simple, once your domain model gets complicated enough, and your search requirements complicated enough, that code becomes rather repetitive. Ideal candidate for a declarative approach.
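
A declarative approach might look roughly like the plain-Java sketch below. The `@Indexed` annotation, the `Counterparty` bean and the `toDocument` helper are hypothetical names invented for this example, not the Compass or Hibernate annotation set, and a map stands in for a Lucene Document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class DeclarativeMappingSketch {
    // Hypothetical marker annotation; an empty name means "use the Java field name".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {
        String name() default "";
    }

    // Example bean: only annotated fields end up in the document.
    static final class Counterparty {
        @Indexed final String name;
        @Indexed(name = "office") final String city;
        final String internalNote; // not annotated, so never indexed

        Counterparty(String name, String city, String internalNote) {
            this.name = name;
            this.city = city;
            this.internalNote = internalNote;
        }
    }

    // One generic mapper replaces the hand-written bean-to-document code per class.
    static Map<String, String> toDocument(Object bean) {
        Map<String, String> doc = new TreeMap<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            Indexed indexed = field.getAnnotation(Indexed.class);
            if (indexed == null) continue;
            field.setAccessible(true);
            try {
                Object value = field.get(bean);
                if (value != null) {
                    String key = indexed.name().isEmpty() ? field.getName() : indexed.name();
                    doc.put(key, value.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Counterparty cp = new Counterparty("Acme", "Beijing", "do not index");
        System.out.println(toDocument(cp)); // prints {name=Acme, office=Beijing}
    }
}
```

Renaming or adding a searchable property then means touching one annotation rather than the conversion code in both directions.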

>>It would have to support the JEE 5 annotations
There are JEE annotations for indexing?
