Steve Banks has a Bachelor's degree in Computer Engineering. He has over ten years of software development and integration experience in a variety of industries, including government, health care, banking, and process manufacturing, using Java and related technologies. He is currently working as a Senior Consultant with Object Partners, Inc., in the Twin Cities.
| Author(s): | Otis Gospodnetic and Erik Hatcher |
|---|---|
| Publisher: | Manning Publications Co. |
| PubDate: | December 2004 |
| Reviewer: | Steve Banks |
Lucene in Action is the authoritative guide to using Lucene, an open-source IR (Information Retrieval) library. The book shows you how to index your data, how to search it, and how to sort, filter, and highlight the search results, as well as how to integrate Lucene into your application.
This book is intended for Java developers, both those with an immediate need for searching capabilities and those curious about searching and indexing for future reference. Readers should also be familiar with JUnit, as the code examples are real, working JUnit test cases.
Lucene in Action definitely lives up to its name. The authors dive right into relevant, if simple, examples, and one could immediately start using Lucene after only reading the first few chapters. Further chapters delve into more complex indexing and searching capabilities.
When the book was written in 2004, Lucene was at version 1.4. At the time of this writing, Lucene is at version 2.0.0, which removed all of the methods deprecated in version 1.9 (a compatibility release between versions 1.4.3 and 2.0.0). The core concepts and components remain the same; only the API has been cleaned up.
Chapter 1 - Meet Lucene
Background on what Lucene is, where it came from, and what it can do for you. A simple example, illustrating the core components of Lucene, is an excellent jumpstart to using Lucene.
Chapter 2 - Indexing
How indexing works, including performance tuning and index optimization. In-memory indexing (RAMDirectory) is compared to file-based indexing (FSDirectory). The section on concurrency, thread-safety, and locking becomes especially important for scenarios involving multiple readers, writers, or indexes.
Chapter 3 - Adding search to your application
Covers the most common ways to search using Lucene, including how Lucene scores and ranks search results, and is probably ample material for the majority of applications using Lucene.
Chapter 4 - Analysis
Detailed look at Lucene's analysis process, which converts your data into searchable terms in the index. A variety of analyzers are examined, including "sounds like", synonym, and foreign language analyzers.
Chapter 5 - Advanced search techniques
The more sophisticated searching capabilities of Lucene, including sorting, span queries, filtering, and term vectors. There is also a section on searching across multiple indexes, even remotely located indexes.
Chapter 6 - Extending search
The previous chapters covered the built-in functionality of Lucene - this chapter explores how to extend Lucene, covering customized sorting and filtering, extending QueryParser, implementing your own HitCollector, and tuning query performance.
Chapter 7 - Parsing common document formats
Covers parsing and indexing document types other than plain text - PDF, Microsoft Word, HTML, XML and RTF - using third-party tools to extract text. The chapter ends with an example document-handling framework.
Chapter 8 - Tools and extensions
Various Lucene Sandbox components and third-party tools, covering areas such as analyzers, Ant, databases, JavaScript, and WordNet. A nice reference of extensions to help prevent you from "reinventing the wheel".
Chapter 9 - Lucene ports
Discussion of currently available Lucene ports to different programming languages, including C++, C#, Perl, and Python. The LuceneImplementations wiki page lists several others.
Chapter 10 - Case studies
Seven case studies showcasing Lucene in all its glory, illustrating a diverse and interesting range of use. The PoweredBy wiki page lists numerous other examples.
Appendix A: Installing Lucene
Similar to the "Getting Started" section on Lucene's homepage.
Appendix B: Lucene index format
Detailed discussion of Lucene's index structure.
Appendix C: Resources
List of resources related to searching and indexing, including Doug Cutting's (creator of Lucene) publications.
| Relevance | |
|---|---|
| Readability | |
| Overall | |
While the material covered in this book is excellent, the delivery occasionally stumbles - the practice of frequently referring to previous or future sections, while necessary, can be distracting. The JUnit examples, while providing an excellent balance to the theory, can also interrupt the reading flow. But those same JUnit test cases also help the book live up to its "In Action" name, as they are real, working examples.
This is definitely the book to have if you're planning on using Lucene in your application, or are interested in what Lucene can do for you.