Greenstone Digital Libraries: Installation to Production

Sunday, June 26th, 10:30am – 12:00pm
Session descr. from the LITA site: Greenstone digital library software is a comprehensive, multilingual open-source system for constructing, presenting, and maintaining digital collections. Greenstone developer Ian H. Witten will introduce Greenstone and demonstrate installation and collection building. Washington Research Library Consortium and University of Chicago Library representatives will discuss Greenstone implementations at their organizations, including software requirements and selection, collection and interface customization and use of METS-encoded metadata. Laura Sheble will present results from the 2004 Greenstone User Survey.

Speaker 1: Ian Witten, University of Waikato, developers of the Greenstone library system

Goals of Greenstone have been:
-to be able to present collections of digital material and to support custom presentation of these colls.
-large-scale support, up to several GB of text
-support associated/linked images, movies, etc.
-serve on web or publish to CD
-run anywhere, on any platform, and with support for many languages
-non-exclusive as to format
-non-prescriptive as to metadata, etc.

Easy to install, supports full text or fielded search. Extensible.

-Open source (SourceForge)
-5,000 copies downloaded each year
-supports 38 languages
-Supported by some important international agencies; UNESCO distributes and provides Greenstone training

Ian did a demo of the Greenstone system (I believe he said he was showing version 2):
Running the librarian interface, he demoed creation of a new collection with these main steps:
“Gather” – drag/drop images and other Beatles miscellany into a collection window. Greenstone detects mime types, prompts to install plugins for mime types not previously encountered (MP3 and MARC)

“Enrich” – optional step to add metadata, which Ian skipped for demo purposes

“Design” to create indexes. Uses any available extracted metadata if metadata not explicitly provided in the Enrich step (titles from MP3 and HTML files, etc.)

“Build” to build the collection
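The fallback described in the "Design" step (use extracted metadata when none was entered under "Enrich") can be sketched roughly as below. This is only an illustration of the idea, not Greenstone's code: the function name choose_index_metadata and the sample field names are assumptions made for the example.

```python
def choose_index_metadata(explicit, extracted):
    """Merge two metadata dicts, preferring non-empty explicit values.

    explicit:  values a person typed in during an "Enrich"-like step
    extracted: values pulled automatically from the files (MP3 tags, HTML titles)
    Both arguments and this function are hypothetical, for illustration only.
    """
    merged = dict(extracted)  # start from whatever was extracted
    for field, value in explicit.items():
        if value:  # only non-empty explicit values override extracted ones
            merged[field] = value
    return merged


# The Enrich step was skipped, so extracted values are what gets indexed.
extracted = {"Title": "Love Me Do", "Artist": "The Beatles"}
print(choose_index_metadata({}, extracted))

# If someone had entered a title by hand, it would win over the extracted one.
print(choose_index_metadata({"Title": "Love Me Do (single)"}, extracted))
```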

Demo’ed a search for “love” in full text & title. Shows thumbnails of images, which it creates as the image files are imported in the “Gather” phase.

Building a more sophisticated collection for Beatles miscellany took about 1.5; this involved adding a MIDI plugin, adding metadata for the objects, adding DC classifiers, and adding a browse-by-media-type function.

Greenstone 3 is a complete rewrite and is in the works; it can be downloaded in beta form now but is not recommended for production use. V2 is still the supported/recommended product. Changes coming in 3: generates XML rather than HTML, METS is the foundation and underlying collection format, Java-based and uses SOAP.

Speaker 2: Alison Zhang, Washington Research Library Consortium

WRLC is 8 academic libraries in the DC area

In 2002, received an IMLS grant to provide dig. collections in a consortial environment

Needed power and flexibility from a digital library delivery system. Features sought:
User interface: good browse, powerful search, customizable, collection-based indexing and labeling, linkable digital objects & metadata, multipage object display (books or other complex text objects), support for multiple formats (MD?), support for standard schema, federated search
Staff interface: ease of use, support for Dublin Core, support master and derivative vers. of objects, templates, direct view of digital objects, allow search edit and delete of records, support global changes/updates, local authority control.

None of the software evaluated met all requirements, so decided to customize two open source packages: DCDot for metadata creation and Greenstone for display/user int. Neither supports federated search or multipage object view.

Most of staff interface is DCDot-based, customized. Created own multipage viewer.

Example collections, for which customized HTML templates were built (17 dig. collections built since 2002 using Greenstone): Art images Collection, Finding Aids collection (EAD-based, first Greenstone customer to do this).

Delved a bit into the details of how to customize Greenstone, referred us to the doc. she wrote which is linked to from the Greenstone site: “Customizing the Greenstone User Interface”

Customizing DCDot – most customization involved Perl. Created templates, implemented a drop-down authority list that updates dynamically as additions are made.

Created own collection management system to tie everything together and are in the process of replacing DCDot with another management interface, possibly DSpace.

Speaker 3: Tod Olson, University of Chicago Library

Chopin Scores project: over 400 scores from Chopin’s early period.

Tabbed user interface display, choose to view bibliographic desc. or the document itself, which has a multipage browse feature.

Built this project on AACR2 MARC from library catalog. Preservation scans and structural metadata were input into a relational database. MARC was transformed to MODS, which were then combined with images and structural md to create a METS record. METS transformed via XSLT into the Greenstone structure. Tod explained in some detail which bits of the METS structmap, etc. were mapped to the Greenstone format.
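To make the "MODS plus images plus structural md into a METS record" step a bit more concrete, here is a minimal sketch using only Python's standard library. It is not U of C's actual pipeline: the function build_mets, the IDs (DMD1, FILE1, ...), and the TYPE and USE values are assumptions for illustration. The namespace URIs are the standard METS, MODS, and XLink ones. The later XSLT step into the Greenstone structure is not shown.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def build_mets(title, page_images):
    """Build a tiny METS tree: one MODS title, one file and one div per page."""
    mets = ET.Element(f"{{{METS}}}mets")

    # Descriptive metadata: a MODS record wrapped inside a dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS}}}title").text = title

    # File inventory: one entry per scanned page image.
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")

    # Structural map: page order, each page pointing at its image file.
    struct = ET.SubElement(mets, f"{{{METS}}}structMap", TYPE="physical")
    score = ET.SubElement(struct, f"{{{METS}}}div", TYPE="score", DMDID="DMD1")

    for number, href in enumerate(page_images, start=1):
        file_id = f"FILE{number}"
        file_el = ET.SubElement(group, f"{{{METS}}}file", ID=file_id)
        ET.SubElement(
            file_el,
            f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": href},
        )
        page = ET.SubElement(score, f"{{{METS}}}div", TYPE="page", ORDER=str(number))
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)

    return mets


# Hypothetical title and file names, just to exercise the function.
record = build_mets("Mazurka", ["page1.tif", "page2.tif"])
print(ET.tostring(record, encoding="unicode"))
```

The multipage browse feature mentioned above would presumably be driven by the ordered page divs in the structMap, which is why each page carries an ORDER attribute here.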

Features of Greenstone3 that U of C looks forward to: support for Lucene or MG/MGPP (Greenstone internal indexing component), METS as internal structure, MySQL support, XML/XSLT for presentation, continued support for existing Greenstone2 data.

Proof-of-concept Music Information Retrieval (MIR) component:
Scores in the collection are matched to existing MIDI examples. Pitch intervals are encoded as text, which is added to the document metadata.

User can input a tune into a keyboard. This MIDI file is similarly encoded as text, then a search looks for matches in the document metadata. It actually works!
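A rough sketch of how the interval-as-text idea might work is below. Tod did not describe the exact encoding, so the token scheme here (u2 = up two semitones, d1 = down one, r = repeated note) and the function names are assumptions. Pitches are given as MIDI note numbers. Because only the intervals are stored, a tune played in a different key still matches.

```python
def encode_intervals(pitches):
    """Turn a list of MIDI note numbers into a string of interval tokens."""
    tokens = []
    for previous, current in zip(pitches, pitches[1:]):
        step = current - previous
        if step > 0:
            tokens.append(f"u{step}")   # up by `step` semitones
        elif step < 0:
            tokens.append(f"d{-step}")  # down by `-step` semitones
        else:
            tokens.append("r")          # repeated note
    return " ".join(tokens)


def tune_matches(query_pitches, document_intervals):
    """Check whether the query's interval string occurs in the document's."""
    query = encode_intervals(query_pitches)
    if not query:
        return False  # fewer than two notes gives nothing to search for
    # Pad with spaces so "u1" cannot match inside a longer token like "u12".
    return f" {query} " in f" {document_intervals} "


# Text that would be stored in the document metadata for a score.
score_text = encode_intervals([60, 62, 64, 65, 67, 65, 64])
print(score_text)

# The same opening phrase played a fifth higher still matches.
print(tune_matches([67, 69, 71, 72], score_text))

# A leap of a fifth does not occur in the score, so no match.
print(tune_matches([60, 67], score_text))
```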

Chopin Early Editions

Speaker 4: Laura Sheble, Wayne State
Greenstone User survey

Created a user survey to get feedback on Greenstone support mechanisms

