About the Reviewer
Steve Banks has a Bachelor's degree in Computer Engineering. He has over ten years of software development and integration experience in a variety of industries, including government, health care, banking, and process manufacturing, using Java and related technologies. He is currently working as a Senior Consultant with Object Partners, Inc., in the Twin Cities.

Lucene in Action

Author(s): Otis Gospodnetic and Erik Hatcher
Publisher: Manning Publications Co.
PubDate: December 2004
Reviewer: Steve Banks




One Minute Review


Pros
  • Good background on searching, indexing
  • Lots of examples
  • Increasingly complicated theory and examples
Cons
  • Numerous examples interrupt reading flow
  • Constant referencing of previous/future section(s)
    can be distracting

Sections

Intent & Audience

Lucene in Action is the authoritative guide to using Lucene, an open-source IR (Information Retrieval) library. This book guides you through how to index your data, search, sort, filter, and highlight your search results, as well as how to integrate Lucene into your application.

This book is intended for Java developers, both those with an immediate need for search capabilities and those curious about searching and indexing for future reference. Readers should also be familiar with JUnit, as the code examples are real, working JUnit test cases.

Relevance of material

Lucene in Action definitely lives up to its name. The authors dive right into relevant, if simple, examples, and one could immediately start using Lucene after only reading the first few chapters. Further chapters delve into more complex indexing and searching capabilities.

When the book was written in 2004, Lucene was at version 1.4. At the time of this writing, Lucene is at version 2.0.0, which removed all of the methods deprecated in version 1.9 (a compatibility release between versions 1.4.3 and 2.0.0). The core concepts and components remain the same; only the API has been cleaned up.

Chapter highlights


Chapter 1 - Meet Lucene
Background on what Lucene is, where it came from, and what it can do for you. A simple example, illustrating the core components of Lucene, is an excellent jumpstart to using Lucene.

Chapter 2 - Indexing
How indexing works, including performance tuning and index optimization. In-memory indexing (RAMDirectory) is compared to file-based indexing (FSDirectory). The section on concurrency, thread-safety and locking becomes especially important for scenarios involving multiple readers/writers/indexes.
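
For readers new to the idea, the following is a minimal sketch of what an in-memory index does conceptually. It is not Lucene's API and not code from the book: the class ToyInvertedIndex and its methods are hypothetical names chosen for illustration, and the tokenization (lowercase, split on non-word characters) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index. Hypothetical illustration only, NOT Lucene's API.
class ToyInvertedIndex {
    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Stores the text, splits it into lowercase terms, and returns the new document id.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of all documents containing the given single term.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : new TreeSet<>(hits);
    }

    // Returns the original text stored for a document id.
    String document(int docId) {
        return documents.get(docId);
    }
}
```

Adding two documents and then calling search with a term returns the ids of the documents that contain it. A real library such as Lucene adds persistence, scoring, locking, and much more on top of this basic term-to-document mapping.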

Chapter 3 - Adding search to your application
Covers the most common ways to search using Lucene, including how Lucene scores and ranks search results, and is probably ample material for the majority of applications using Lucene.

Chapter 4 - Analysis
Detailed look at Lucene's analysis process, which converts your data into searchable terms in the index. A variety of analyzers are examined, including "sounds like", synonym, and foreign language analyzers.
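
As a rough illustration of what an analysis step involves, here is a minimal sketch that lowercases text, drops a few stop words, and injects synonyms. It is not Lucene's Analyzer API and not taken from the book: ToyAnalyzer, its stop-word list, and the synonym map are all hypothetical and chosen only to show the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy analysis chain. Hypothetical illustration only, NOT Lucene's Analyzer API.
class ToyAnalyzer {
    // A tiny, assumed stop-word list; real analyzers use larger, language-specific lists.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "in"));

    private final Map<String, List<String>> synonyms;

    ToyAnalyzer(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Converts raw text into the list of terms that would be placed in an index.
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty tokens and stop words
            }
            terms.add(token);
            // Emit any configured synonyms right after the original term.
            List<String> extra = synonyms.get(token);
            terms.addAll(extra == null ? Collections.<String>emptyList() : extra);
        }
        return terms;
    }
}
```

With a synonym map from "quick" to "fast", analyzing "The Quick fox" yields the terms quick, fast, fox, so a later search for "fast" could match the original text.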

Chapter 5 - Advanced search techniques
The more sophisticated searching capabilities of Lucene, including sorting, span queries, filtering, and term vectors. There is also a section on searching across multiple indexes, even remotely located indexes.

Chapter 6 - Extending search
The previous chapters covered the built-in functionality of Lucene - this chapter explores how to extend Lucene, covering customized sorting and filtering, extending QueryParser, implementing your own HitCollector, and tuning query performance.

Chapter 7 - Parsing common document formats
Covers parsing and indexing document types other than plain text - PDF, Microsoft Word, HTML, XML and RTF - using third-party tools to extract text. The chapter ends with an example document-handling framework.

Chapter 8 - Tools and extensions
Various Lucene Sandbox components and third-party tools, covering areas such as analyzers, Ant, databases, JavaScript, and WordNet. A nice reference of extensions to help prevent you from "reinventing the wheel".

Chapter 9 - Lucene ports
Discussion of currently available Lucene ports to different programming languages, including C++, C#, Perl, and Python. The LuceneImplementations wiki page lists several others.

Chapter 10 - Case studies
Seven case studies showcasing Lucene in all its glory, illustrating a diverse and interesting range of use. The PoweredBy wiki page lists numerous other examples.

Appendix A: Installing Lucene
Similar to the "Getting Started" section on Lucene's homepage.

Appendix B: Lucene index format
Detailed discussion of Lucene's index structure.

Appendix C: Resources
List of resources related to searching and indexing, including Doug Cutting's (creator of Lucene) publications.

Rating


Relevance
Readability
Overall

While the material covered in this book is excellent, the delivery occasionally stumbles - the practice of frequently referring to previous and/or future sections, while necessary, can be distracting. The JUnit examples, while providing an excellent balance to the theory, can also interrupt the reading flow. But those same JUnit test cases also help the book live up to its "In Action" name, as they are real, working examples.

This is definitely the book to have if you're planning on using Lucene in your application, or are interested in what Lucene can do for you.