Vendor Endeca is at Forum this year in case you’re thinking about doing the same thing to your OPAC that Emily Lynema and Andrew Pace described in this presentation.
Andrew Pace, head of IT at the North Carolina State University libraries, explained that Endeca enabled them to implement faceted search on their catalog.
He cited Roy Tennant’s statement that the usual current OPAC “should be removed from public view.” NCSU decided to look into some of the “next generation” library search tools they might use to make their library search better. A list he showed included Aquabrowser, WorldCat.org, Georgia PINES, Koha, etc. He demoed a few: Clusty creates faceted-type display on the fly; dice.com uses Endeca for faceted search; Amazon.com has a faceted display; EBSCO databases have Grokker interfaces available.
Existing catalogs are hard to use. They grew out of back-office processing systems. The vast majority of libraries are living with the OPAC bundled with their ILS. At NCSU, for example, the search logs showed a lot of broad topical searches that did not retrieve useful results (too many, too irrelevant); even known-item search didn’t support features such as spell correction or relevance ranking. The display of search results also had problems: users could not browse or link from valuable metadata like subject headings, at least not in a reasonable number of steps. Nor could they adequately filter on aspects of the item record like proximity and availability (is it in my branch, checked in?).
What is Endeca? A search and information access technology provider for Circuit City, CompUSA, Lowe’s, and many other e-commerce sites.
Why Endeca for NCSU? Better subject access through facets, better response time, better natural language searching, true browse without any query at all.
Emily Lynema, systems librarian, demonstrated Endeca at NCSU, using general terms like “Java” and “Civil war” and clicking on the facet list at left to filter a search for more specific results. Also, she demonstrated browsing without any search term, and removing applied filters when a search was narrowed down too much.
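For readers who haven't seen faceted search up close, the interaction she demonstrated can be sketched in a few lines of Python. This only illustrates the general technique and is not Endeca's API; the sample records, field names, and both helper functions are made up for the example.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"title": "Java in a Nutshell", "format": "Book", "subject": "Computer science"},
    {"title": "Java: A Travel Guide", "format": "Book", "subject": "Travel"},
    {"title": "Learning Java", "format": "eBook", "subject": "Computer science"},
    {"title": "The Civil War", "format": "DVD", "subject": "History"},
]


def search(records, query="", filters=None):
    """Return records whose title contains the query and that match every filter.

    An empty query with no filters returns everything, which corresponds to
    browsing without a search term.
    """
    filters = filters or {}
    q = query.lower()
    return [
        r for r in records
        if q in r["title"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, fields=("format", "subject")):
    """Count how many results fall under each value of each facet field."""
    return {field: Counter(r[field] for r in results) for field in fields}


hits = search(RECORDS, "java")                 # broad search
counts = facet_counts(hits)                    # numbers shown in the facet list
narrowed = search(RECORDS, "java", {"subject": "Computer science"})  # click a facet
```

Removing an applied filter is just running the same search again with that entry left out of `filters`.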
They created their own customized algorithm in Endeca for relevance ranking. They prefer a heading containing exactly the query as entered, then a phrase match in the record; the phrase in the title is preferred over the phrase somewhere in the contents. They had to refine their algorithm after initial release, based on user responses.
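A rough way to picture that ordering is as a set of tiers, where a record in a better tier always sorts ahead of one in a worse tier. The sketch below is my own reading of the description, not NCSU's actual Endeca configuration; the tier numbers, the `title` and `contents` field names, and the functions are assumptions.

```python
def relevance_tier(record, query):
    """Assign a tier to a record; lower numbers rank higher.

    0: the title is exactly the query as entered
    1: the query appears as a phrase in the title
    2: the query appears as a phrase elsewhere in the record
    3: no phrase match at all
    """
    q = query.strip().lower()
    title = record["title"].lower()
    contents = record.get("contents", "").lower()
    if title == q:
        return 0
    if q in title:
        return 1
    if q in contents:
        return 2
    return 3


def rank(records, query):
    """Sort records by tier; sorted() is stable, so ties keep their input order."""
    return sorted(records, key=lambda r: relevance_tier(r, query))


# Hypothetical records for illustration.
SAMPLE = [
    {"title": "Essays on American history", "contents": "Includes a chapter on the civil war"},
    {"title": "The Civil War: a narrative", "contents": ""},
    {"title": "Civil War", "contents": ""},
    {"title": "Gardening basics", "contents": ""},
]

ranked_titles = [r["title"] for r in rank(SAMPLE, "civil war")]
```

A real ranking would break ties within a tier using further signals; this sketch simply leaves tied records in their original order.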
The display includes a facet list at left, removable filters and category browsing at top, search results at right. I find the display cluttered, but I forget what their old display used to look like!
True browse is now available by any of the facets they have set up. The public interface currently only shows browse by LC classification, but they have the option to set up other ways to browse.
Automatic spelling correction means if a user enters “dictionary of organic compunds” and there are fewer than 5 results, the catalog will automatically also retrieve results for “dictionary of organic compounds.” The system also suggests “Did you mean?” and can do automatic stemming.
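The threshold behavior is easy to sketch. The version below uses Python's standard `difflib` to find a near match for each unknown word; that is a stand-in I chose for illustration, not how Endeca does it, and the titles, the toy keyword search, and the function names are all hypothetical. Only the cutoff of 5 results comes from the talk.

```python
import difflib

MIN_RESULTS = 5  # threshold mentioned in the presentation

# Hypothetical titles standing in for the catalog.
TITLES = [
    "Dictionary of organic compounds",
    "Organic chemistry",
    "Dictionary of inorganic compounds",
]


def keyword_search(titles, query):
    """Toy search: return titles that contain every query word as a whole word."""
    terms = query.lower().split()
    return [t for t in titles if all(term in t.lower().split() for term in terms)]


def correct_query(query, vocabulary):
    """Replace each word missing from the vocabulary with its closest match, if any."""
    corrected = []
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
        else:
            close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
            corrected.append(close[0] if close else term)
    return " ".join(corrected)


def search_with_correction(titles, query):
    """Search as entered; if too few results, also search the corrected query.

    Returns (results, suggestion), where suggestion is None if no correction was used.
    """
    results = keyword_search(titles, query)
    if len(results) >= MIN_RESULTS:
        return results, None
    vocabulary = sorted({word for t in titles for word in t.lower().split()})
    normalized = " ".join(query.lower().split())
    suggestion = correct_query(query, vocabulary)
    if suggestion == normalized:
        return results, None
    extra = [t for t in keyword_search(titles, suggestion) if t not in results]
    return results + extra, suggestion


results, suggestion = search_with_correction(TITLES, "dictionary of organic compunds")
```

The returned `suggestion` is also what a “Did you mean?” prompt would display.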
SirsiDynix Unicorn ILS and Web2 online catalog are still used. Endeca handles the keyword search, Web2 handles the authority search and the detail page display. They have to export MARC records nightly from the ILS into a format Endeca can use. The Endeca system indexes the data into its internal engine. The libraries’ web site can be completely controlled by NCSU; their web application interacts with the Endeca engine through an API.
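The nightly export step amounts to flattening MARC fields into whatever shape the search engine ingests. The presentation did not describe Endeca's input format, so the sketch below assumes a plain dictionary of field names to value lists; the output field names and the simplified `(tag, value)` record representation are hypothetical, though the MARC tags themselves are the standard ones (245 title, 650 topical subject, 655 genre, 050 LC call number).

```python
# MARC tag -> flat field name. The field names on the right are hypothetical.
TAG_TO_FIELD = {
    "245": "title",
    "650": "subject_topic",
    "655": "subject_genre",
    "050": "lc_classification",
}


def flatten(marc_fields):
    """Turn a list of (tag, value) pairs into a dict of field name -> list of values.

    The (tag, value) pairs are a simplified stand-in for a real MARC record;
    tags that are not mapped are dropped.
    """
    flat = {}
    for tag, value in marc_fields:
        field = TAG_TO_FIELD.get(tag)
        if field is not None:
            flat.setdefault(field, []).append(value)
    return flat


record = [
    ("245", "Learning Java"),
    ("650", "Java (Computer program language)"),
    ("650", "Object-oriented programming"),
    ("999", "local holdings data"),
]
flat_record = flatten(record)
```

Because the export runs one way, from the ILS outward, any change made in the ILS during the day would not show up in the flattened data until the next run.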
Staff resources: implementation team of seven
- 5 IT staff, 1 cataloging librarian, 1 reference librarian
- Functional requirements: 40-60 hours total
- Java-trained IT librarian: full time about 14 weeks
- IT project manager: about 25% time for 20 weeks
Total timeline: about six months
Major decision points:
- What facets should be used (Endeca will help walk through)
- Designing the user interface: eliminate information overload as much as possible while providing enough information to enable navigation
- Toss the old OPAC or integrate it into the interface? They integrated (“Search begins with” box searches the old Web2 authority records)
- What type of relevance ranking algorithm, for author vs. topical vs. title searches
Working with a non-library vendor can have a lot of special challenges:
- Data formats that are library-specific
- Data consistency between ILS and Endeca (due to one-way export from ILS)
- Data issues especially with older cataloging practices still persisting in catalog records
Usage statistics for the new catalog:
- Search 68%
- Search + navigation (facet refinements after a search) 21%
- Navigation (true browse) 11%
They have a number of other statistics such as navigation types selected (mostly topical — either subject topic, subject genre, or LC classification).
Usability testing: a limited test group of 5 on the new catalog and 5 on the old catalog. The results showed that, in general, completing tasks in the new catalog was rated easier and took less time than in the old catalog.
For students, relevance ranking is key; only 13% continue past the first page of search results. Faceted browsing is intuitive. Library jargon (e.g., “keyword anywhere”) continued to confuse students. Users experienced with the old system were suspicious of features in the new one (they expected a simple search box to retrieve completely unusable results).
In general, they have found that the new system does retrieve more relevant results.
Andrew returned to talk about future directions, including:
- Experiment with FRBR
- Integrate the catalog with other search tools (like their website search) through web services
- Enrich the catalog with external web services
- Use Endeca to index local collections
The problem with data silos continues: vendor databases, serials lists, OPAC, etc. What needs to happen in the future is true interoperability among data stores. Our metadata needs to be more visible to other “storefronts” where users go for information. Their Endeca implementation not only creates a better search interface for the OPAC itself, but creates a more interoperable data platform for integrating their OPAC data into other services.
For more info: NCSU Endeca project site will contain the slides from this presentation.