Facet Forward: Faceted Navigation of Federated Search Results for Cultural Heritage Materials

Presenters: Danielle Cunniff Plumer, David Dorman, Mark Phillips.

This session reviewed three projects that take different approaches to faceted searching.

Danielle Plumer – Texas Heritage Digitization Initiative. http://texasheritageonline.org/

The Initiative is a statewide plan that unifies previously created pockets of digital collections rather than building a centralized database. They have an OAI harvester (described later by Mark Phillips), a real-time search, and, coming soon, a web search.

There are metadata synchronization issues, item description issues, and differences in how the different systems display information.

To work around these differences, they assign “Collection Level Metadata” in an institution “profile”. These values become facets that help the user narrow searches and identify the institutions.
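
As a rough illustration of the idea, the sketch below copies collection-level values from an institution profile onto each item and then counts them as facets. The profile contents, the field names (`institution_type`, `region`) and the function names are all hypothetical; the session did not describe the actual data model.

```python
from collections import Counter

# Hypothetical institution profiles holding collection-level metadata.
PROFILES = {
    "inst-a": {"institution_type": "University library", "region": "North Texas"},
    "inst-b": {"institution_type": "Museum", "region": "Gulf Coast"},
}


def attach_profile(record, profiles):
    """Return a copy of the record with its institution's profile fields added."""
    merged = dict(record)
    merged.update(profiles.get(record["institution"], {}))
    return merged


def facet_counts(records, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(r[field] for r in records if field in r)


def narrow(records, field, value):
    """Keep only the records whose facet field equals the chosen value."""
    return [r for r in records if r.get(field) == value]
```

Because the facet values come from the profile rather than from each item's own description, items from the same institution always fall under the same facet, however inconsistent the item-level metadata is.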

David Dorman – Faceted Searching in a Metasearch Environment: The Index Data Experience.

Index Data (http://www.indexdata.dk/) is a 13-year-old company that develops and supports open source software.

Talked about a product called MasterKey, which searches multiple datasets and presents the results with a left-hand faceted navigation menu. The system runs real-time queries against databases or web pages. Search results are returned on a single page, with each record linking out to the actual record/item.
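
The general pattern (query several sources at once, merge the hits, then derive facets from the merged list) can be sketched as follows. This is not MasterKey's implementation; the source callables, the hit fields and the function names are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def search_all(sources, query):
    """Query every source in parallel and merge the hits into one list.

    `sources` maps a source name to a callable that takes the query and
    returns a list of dicts (hypothetical shape: title, type, url).
    """
    def run(item):
        name, search = item
        # Tag each hit with the source it came from so it can be a facet.
        return [dict(hit, source=name) for hit in search(query)]

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        batches = list(pool.map(run, sources.items()))
    return [hit for batch in batches for hit in batch]


def build_facets(hits, fields):
    """For each facet field, list (value, count) pairs, most common first."""
    facets = {}
    for field in fields:
        counter = Counter()
        for hit in hits:
            value = hit.get(field)
            if value:
                counter[value] += 1
        facets[field] = counter.most_common()
    return facets
```

A hit that lacks a value for a field simply does not appear under that facet, which is one concrete way the richness of the metadata limits the quality of the navigation menu.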

The richness of the metadata dictates the quality of the faceted search results. Relevance ranking of these results is still to come, as is normalization of unstructured data. With the implementation of standards (he recommends Z39.50), there are many opportunities for improved faceted searching.

Mark Phillips – UNT

Took us through using OAI-PMH to provide access to collections that are not in Z39.50 systems, as part of the Texas Heritage Digitization Initiative.

The pieces are the data itself, held in a repository at each data provider, and a harvester run by the service provider to collect it.

OAI-PMH has six verbs, which act as commands: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord.
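
Each verb is sent as an ordinary HTTP request with a `verb` parameter. A minimal sketch of building such a request URL is shown below; the base URL in the usage note is a placeholder, and the helper name `oai_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **params):
    """Build the URL for one OAI-PMH request against a repository's base URL."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: %r" % verb)
    query = {"verb": verb}
    # Skip parameters the caller left as None.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)
```

For example, `oai_request_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")` asks a repository for all of its records in unqualified Dublin Core.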

Step 1 – Harvester. Many are available at openarchives.org, and anyone can harvest! They settled on OCLC’s one-page harvester, which they customized. It creates one huge XML file for each data provider.
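
To give a sense of what the harvested XML contains, here is a minimal sketch that reads one ListRecords response using only the Python standard library. It is not the OCLC harvester; the function name and the returned dictionary shape are assumptions, while the namespaces are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.

    Each record is {"id": ..., "fields": {dc_element: [values]}}.
    The token is None when the repository has no further pages.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        for el in rec.iter():
            # Keep only Dublin Core elements that actually carry text.
            if el.tag.startswith(DC) and el.text and el.text.strip():
                fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append({"id": identifier.strip(), "fields": fields})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    token = (token or "").strip() or None
    return records, token
```

A harvester repeats the request with the returned resumption token until the token comes back empty, which is how a large repository ends up as one big file.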

Step 2 – Turn the XML files into searchable data. They use Python conversion scripts to create a Dublin Core document for each record in a Solr format for indexing.
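
A conversion of this kind might look like the sketch below, which writes Dublin Core values into Solr's XML update format (`<add><doc><field name="...">`). The `dc_` field-name prefix and the input record shape are assumptions; the project's actual Solr schema was not shown.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(records):
    """Serialize records as a Solr XML <add> message.

    Each record is assumed to look like
    {"id": "...", "fields": {"title": ["..."], "subject": ["...", "..."]}}.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = record["id"]
        for name, values in sorted(record["fields"].items()):
            for value in values:
                # Repeated Dublin Core elements become repeated Solr fields.
                ET.SubElement(doc, "field", name="dc_" + name).text = value
    return ET.tostring(add, encoding="unicode")
```

Repeated fields such as subject only index correctly if the corresponding Solr fields are declared multi-valued in the schema.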

Step 3 – mod_python-based SRU access, with CQL queries converted to Lucene queries.
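
The CQL-to-Lucene step can be illustrated with a deliberately small converter. It is only a sketch under several assumptions: it handles whitespace-separated `index relation term` clauses joined by `and`, `or` and `not`, ignores parentheses and the rest of CQL, and maps indexes to hypothetical Lucene field names. It does not reflect the project's actual code.

```python
import shlex

# Hypothetical mapping from CQL indexes to Lucene field names.
INDEX_MAP = {"dc.title": "title", "dc.creator": "creator", "dc.subject": "subject"}
BOOLEANS = {"and": "AND", "or": "OR", "not": "NOT"}
RELATIONS = {"=", "any", "all"}


def _quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _clause(index, relation, term):
    """Translate one 'index relation term' clause."""
    if index not in INDEX_MAP:
        raise ValueError("unsupported index: %r" % index)
    field = INDEX_MAP[index]
    if relation == "=":
        return "%s:%s" % (field, _quote(term))
    joiner = " OR " if relation == "any" else " AND "
    return "%s:(%s)" % (field, joiner.join(_quote(w) for w in term.split()))


def cql_to_lucene(cql):
    """Convert a flat CQL query (no parentheses) into Lucene query syntax."""
    tokens = shlex.split(cql)
    parts, i, expect_clause = [], 0, True
    while i < len(tokens):
        if not expect_clause:
            op = BOOLEANS.get(tokens[i].lower())
            if op is None:
                raise ValueError("expected a boolean, got %r" % tokens[i])
            parts.append(op)
            i, expect_clause = i + 1, True
        elif i + 2 < len(tokens) and tokens[i + 1].lower() in RELATIONS:
            parts.append(_clause(tokens[i], tokens[i + 1].lower(), tokens[i + 2]))
            i, expect_clause = i + 3, False
        else:
            parts.append(_quote(tokens[i]))  # bare term: search default field
            i, expect_clause = i + 1, False
    if expect_clause:
        raise ValueError("query is empty or ends with a boolean")
    return " ".join(parts)
```

Under these assumptions, `dc.title = "texas history" and dc.creator any "smith jones"` becomes `title:"texas history" AND creator:("smith" OR "jones")`.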

Challenges:

  • Speed
  • The ability to display thumbnails – not a standardized part of Dublin Core
  • Getting repositories to use OAI-PMH – not all of them have it initially enabled.
  • Customizing OAI-PMH output

And all this feeds into the Texas Heritage Digitization Initiative.