Greenstone Digital Libraries II

Fortunately, my co-blogger came in on time and got the first part before laptop death set in. Meanwhile, I arrived late and got only the latter part of the session.

There were probably about a hundred people at this session and the handouts ran out. Handouts and more have now been posted. (Thanks, Kyle!)

I’ll add a few more notes to complement Claire’s summary:

Allison Zhang

Out of the box the UI was very basic, but it was customizable. Allison showed screenshots of different collections they had worked on, each with customized graphics.

Two different kinds of materials were included in their projects: Digital collections (Dublin Core encoded) and Finding Aids (EAD encoded).

The collections are available at http://www.aladin.wrlc.org/dl/. Allison highlighted the Treasure Chest of Fun and Fact, which is a collection of some 20 years of comic books. For this collection they created metadata for each story, plus structural metadata to link from page to page within each story, and from each story in an issue to others in the same issue. They set up a conditional display format in Greenstone to show the table of contents for an issue differently from the page display. At the top of each page of content, there are links to page back and forth through the story. Her main point in showing all this: You must design your metadata.

Tod Olson

The Chopin Early Editions collection was designed to allow browsing by genres, e.g., nocturnes. In this project they used metadata to drive custom navigation features. Tod showed the pathway from data (catalog records, scanned images, structural metadata) to the integrated record in Greenstone. See PDF of his slides.

Custom page-turner metadata contains previous-page and next-page information; if my notes are correct these were generated from the record creation process, a great idea!
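To make the page-turner idea concrete, here is a minimal sketch of how previous-page and next-page values could be derived from an ordered list of pages while records are being created. The field names (`page`, `previous_page`, `next_page`) and the page identifiers are my own placeholders, not the names used in Tod's project or in Greenstone.

```python
from typing import Dict, List, Optional


def build_page_links(page_ids: List[str]) -> List[Dict[str, Optional[str]]]:
    """Return one record per page with hypothetical previous/next fields."""
    records = []
    for i, page_id in enumerate(page_ids):
        records.append({
            "page": page_id,
            # The first page has no predecessor, the last page no successor.
            "previous_page": page_ids[i - 1] if i > 0 else None,
            "next_page": page_ids[i + 1] if i + 1 < len(page_ids) else None,
        })
    return records


# Placeholder identifiers for a three-page score.
links = build_page_links(["p001", "p002", "p003"])
for record in links:
    print(record)
```

Because the links fall out of the page order, nobody has to key them in by hand, which is presumably why generating them at record creation time is attractive.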

They are using Greenstone 3, with METS as the internal storage format and MySQL for metadata storage.

The proof-of-concept project allowing music search by "playing" a displayed keyboard image was very impressive. Encoding the pitch intervals rather than the specific notes makes it much more likely to work. It’s good to see some progress on music searching like this.
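A rough sketch of why intervals help, assuming melodies are represented as MIDI-style note numbers (my assumption; the session did not say how the demo stores pitches). The function names are hypothetical. A tune played in a different key has different notes but the same sequence of intervals, so matching on intervals still finds it.

```python
from typing import List


def to_intervals(pitches: List[int]) -> List[int]:
    """Convert absolute pitches into semitone steps between neighbours."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_melody(query: List[int], tune: List[int]) -> List[int]:
    """Return start positions in `tune` where the query's intervals occur."""
    q = to_intervals(query)
    t = to_intervals(tune)
    if not q:
        return []  # a query of fewer than two notes has no intervals
    return [i for i in range(len(t) - len(q) + 1) if t[i:i + len(q)] == q]


tune = [60, 62, 64, 65, 67]   # C D E F G, as assumed MIDI note numbers
query = [67, 69, 71]          # G A B: same shape as C D E, different key
matches = find_melody(query, tune)
print(matches)
```

An exact-note search for G A B would find nothing in this tune, while the interval search matches the opening C D E.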

Laura Sheble

The primary focus of the survey was support needs, but they also asked about characteristics of Greenstone users’ technical environment and collections:
24 questions – general support mechanisms
8 questions – collections and target audiences
8 questions – contact info for follow-up and directory

Unfortunately, the legends on the charts were barely visible, if at all, from where I was sitting, so I caught only the few findings that Laura mentioned aloud; for more detail you can check the Handouts. A few selected highlights:

  • Total valid responses: 54
  • Most installations were on Windows OSs, fewer on Linux/UNIX
  • Large percentage of installations were in US/Canada
  • 93% of respondents were actively developing collections
  • About half were developing 0 – 1 collections, about a quarter were working on 6 or more
  • Half were university-affiliated, 20% regional or international centers, and surprisingly to me, about 7% were commercial enterprises
  • About 30% had a multilingual target population
  • Among the major support needs, many respondents cited local training and training materials, and support organizations in their own country. Interestingly, many of them were actively developing training materials.

The survey is closed for this particular statistical analysis, but is still open for responses:
Greenstone survey

Q & A on Greenstone

Q. Is Greenstone 3 available?
A. It’s available; serving is in an advanced development state but building is still under development.

Q. What is it written in?
A. Greenstone 3 is Java-based.

Q. OCLC CONTENTdm has resources behind it, while Greenstone is perceived to be hard to use. Wouldn’t I want to go with a commercial, supported product?
A. There is a consulting company, DL Consulting. But open source software, by its nature, is not usually well marketed or professionally supported and documented.

Q. We are a non-profit with an existing base of XML and our own DTD. Can this work with Greenstone?
A. You could develop a plugin to translate your existing data to Greenstone internal format.

Q. Does Greenstone support DSpace?
A. It supports importing from DSpace internal format.

Q. JPEG2000?
A. Pretty much any image format, using ImageMagick (I wasn’t clear whether that is required for this).

Q. Fedora?
A. There’s work underway to support it as a storage format with Greenstone for the indexing and presentation.

Q. What’s the maximum size of a Greenstone collection?
A. About 11 million records. The BBC metadata currently contains about 3 million and is using Greenstone. There is a known bug after about 20 GB of text, but due to Greenstone’s compression algorithms it takes a long time to hit that limit. Due to support for cross-collection searching, though, you can segment a large corpus into smaller collections. Most people do currently use it for smaller collections.

The joys of the Intercontinental

Here is the room where the Greenstone session was held (the King Arthur Court room)!

Greenstone Room Photo