Gretchen Gueguen began the session by giving an overview of the work done by the University of Maryland Libraries. UM has identified four basic types of digital collections:
- Thematic collections, sometimes containing multiple types of digital objects, tightly organized around a single subject (example: Documenting the American South).
- Object collections, generally containing multiple object types not organized into topical collections (example: Indiana University Digital Library Program).
- Packaged collections, containing secondary source materials about a topic but few, if any, primary source digital objects (example: Romantic Circles).
- After-the-fact collections, aggregating work from other locations into a single repository (example: NINES).
This is an unusual way of distinguishing between types of digital projects, and the examples could have been better chosen, but the scheme may still prove useful.
Gretchen identified three building blocks of digital projects: metadata, vocabulary, and interface design. Under metadata she discussed what I would have called interoperability protocols, particularly Z39.50 and OAI-PMH. Under vocabularies, she contrasted pre-coordinated vocabularies such as LCSH with post-coordinated vocabularies, which are more typical of the online environment. She also addressed local vocabularies, noting that without adequate control they can become unmanageable. Hierarchical vocabularies also present opportunities for interface design, though she advocated offering multiple hierarchies and multiple modes of access. She recommended the interface used in the Documenting the American South project, which includes browse lists based on LCSH.
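To make the pre-/post-coordination distinction concrete, here is a minimal Python sketch over a toy in-memory catalog. The record IDs and LCSH-style headings are hypothetical examples, not records from any real collection. In the pre-coordinated case the cataloguer has already combined place, topic, and period into one heading string, so a search must match that string whole; in the post-coordinated case each item carries separate terms that the searcher combines at query time.

```python
# Toy catalog for illustration; IDs and headings are hypothetical examples
# written in LCSH style, not records from any real collection.
CATALOG = {
    "item-1": {
        "pre": ["United States--History--Civil War, 1861-1865"],
        "post": {"United States", "History", "Civil War, 1861-1865"},
    },
    "item-2": {
        "pre": ["North Carolina--Social life and customs"],
        "post": {"North Carolina", "Social life and customs"},
    },
    "item-3": {
        "pre": ["United States--Social life and customs"],
        "post": {"United States", "Social life and customs"},
    },
}


def search_precoordinated(catalog, heading):
    """Return IDs whose catalogued headings include `heading` exactly.

    Pre-coordinated headings are matched as whole strings, so the searcher
    must supply the full heading, subdivisions and all.
    """
    return sorted(item for item, rec in catalog.items() if heading in rec["pre"])


def search_postcoordinated(catalog, *terms):
    """Return IDs tagged with every one of `terms` (a Boolean AND).

    The terms are combined at search time rather than by the cataloguer.
    """
    wanted = set(terms)
    return sorted(item for item, rec in catalog.items() if wanted <= rec["post"])


if __name__ == "__main__":
    print(search_precoordinated(CATALOG, "United States--Social life and customs"))  # ['item-3']
    print(search_postcoordinated(CATALOG, "Social life and customs"))  # ['item-2', 'item-3']
    print(search_postcoordinated(CATALOG, "United States", "Social life and customs"))  # ['item-3']
```

Searching the post-coordinated terms for "Social life and customs" alone finds both item-2 and item-3, whereas the pre-coordinated lookup succeeds only when the complete heading string is supplied. The trade-off is that post-coordination loses the explicit relationship the heading encoded (which term is the place, which the topic).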
Gretchen followed this discussion with a review of the Thomas MacGreevy Archive, a humanities research project originally intended to provide access to the writings of and about the Irish poet and critic Thomas MacGreevy (1893-1967). As the archive has grown, however, it has come to include additional types of digital objects, and the original encoding scheme for the textbase (TEI P4) has not accommodated them well. The need for a redesign allowed the digital collections team to test some of their ideas about metadata, controlled vocabularies, and interface design. The team extended their use of TEI P4 to cover the other types of objects (possible, though perhaps not the easiest approach!) and added terms to their locally constrained controlled vocabulary to make these objects easier to browse. The team also redesigned the search interface to make the site more user-friendly. Gretchen did note that if they were starting the project from scratch, they probably would not use TEI, although she also commented that TEI P5 would be better suited to a collection of multiple object types because of its support for XML namespaces.
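TEI P4 elements carried no namespace, while TEI P5 places all TEI elements in the namespace `http://www.tei-c.org/ns/1.0`. That lets a processor tell TEI markup apart from elements belonging to other vocabularies. The Python sketch below illustrates the point with the standard library's ElementTree. The `obj:` namespace, its `image` element, and the document content are hypothetical, and the fragment is deliberately simplified: it would not validate against a TEI schema, and where foreign elements may legally appear depends on the project's TEI customization.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Hypothetical namespace for a non-TEI object description (not a real standard).
OBJ_NS = "http://example.org/ns/object"

# Simplified, schema-invalid sample for illustration only.
SAMPLE = f"""<TEI xmlns="{TEI_NS}" xmlns:obj="{OBJ_NS}">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample item</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Transcribed text.</p>
      <obj:image href="item-001.jpg" mimeType="image/jpeg"/>
    </body>
  </text>
</TEI>"""


def elements_by_namespace(xml_text):
    """Map each namespace URI to the local names of its elements, in document order."""
    root = ET.fromstring(xml_text)
    groups = {}
    for el in root.iter():
        # ElementTree spells namespaced tags as "{uri}localname".
        if el.tag.startswith("{"):
            uri, _, local = el.tag[1:].partition("}")
        else:
            uri, local = "", el.tag
        groups.setdefault(uri, []).append(local)
    return groups


if __name__ == "__main__":
    for uri, names in elements_by_namespace(SAMPLE).items():
        print(uri, names)
```

Because the object description sits in its own namespace, a TEI-aware tool can process or ignore it without confusing it with a TEI element of the same local name.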
Jennifer O’Brien Roper then spoke about the University of Maryland’s Digital Repository, which uses Fedora as the underlying repository, with a customized interface for each collection. During the question and answer period, it became clear that this is entirely separate from DRUM, the Digital Repository at the University of Maryland, which is a DSpace installation. As part of developing this repository, UM also created a rich metadata standard, University of Maryland Descriptive Metadata (UMDM), which combines elements from Dublin Core and VRA Core into a custom DTD. Their system was based on one originally developed by the University of Virginia, which has since switched to MODS. The descriptive metadata is then embedded in a METS wrapper that adds structural metadata about the files making up each object.
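A minimal sketch of that kind of wrapping, using Python's standard-library ElementTree. The METS element and attribute names (`dmdSec`, `mdWrap`, `xmlData`, `fileSec`, `FLocat`, `structMap`, `fptr`) come from the METS schema, but the descriptive record is a stand-in: the actual UMDM DTD is not reproduced here, so `descMeta`, `title`, and `subject` are hypothetical element names, as are the `OTHERMDTYPE` and `USE` values and the file paths.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag):
    """Qualify a METS element name in ElementTree's {uri}name form."""
    return f"{{{METS_NS}}}{tag}"


def wrap_in_mets(descriptive, file_paths):
    """Embed one descriptive record and its files in a minimal METS document."""
    root = ET.Element(mets("mets"))
    # Descriptive metadata travels inside dmdSec/mdWrap/xmlData.
    # OTHERMDTYPE="UMDM" is an assumed label, not confirmed UM practice.
    dmd = ET.SubElement(root, mets("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, mets("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="UMDM")
    ET.SubElement(wrap, mets("xmlData")).append(descriptive)
    # fileSec lists the files; structMap orders them and links back to DMD1.
    file_grp = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"), USE="master")
    div = ET.SubElement(ET.SubElement(root, mets("structMap")), mets("div"), DMDID="DMD1")
    for n, path in enumerate(file_paths, start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(file_grp, mets("file"), ID=file_id)
        ET.SubElement(file_el, mets("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})
        ET.SubElement(div, mets("fptr"), FILEID=file_id)
    return root


# Hypothetical stand-in for a UMDM record; element names are not from the real DTD.
record = ET.Element("descMeta")
ET.SubElement(record, "title").text = "Sample photograph"
ET.SubElement(record, "subject").text = "Campus life"

doc = wrap_in_mets(record, ["images/0001.tif", "images/0002.tif"])

if __name__ == "__main__":
    print(ET.tostring(doc, encoding="unicode"))
```

The `dmdSec` carries the description of the intellectual object, while `fileSec` and `structMap` record which files make up that object and in what order; the shared `DMD1` and `FILEn` IDs tie the sections together.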
UM uses an ingest system to take metadata from disparate sources and normalize it into UMDM. The ingest system is form-based and includes drop-down lists for controlled vocabulary terms. Every item must have at least one subject heading from a common vocabulary; the UM Technical Services Department will create authority files as appropriate so that the vocabularies can evolve. Additional terms can be selected from standard thesauri, including LCSH, the Getty Art & Architecture Thesaurus, and the Thesaurus for Graphic Materials. The next step, beginning in 2007, will be to develop a browsing system based on these common subject terms. It will not be dynamic; instead, the browse indexes will be regenerated weekly.
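The ingest rules and the weekly batch indexing described above can be sketched in Python roughly as follows. Everything specific here is an assumption for illustration: the source field names, the common-vocabulary terms, the vocabulary codes (`lcsh`, `aat`, `tgm`), the sample records, and the record shape, in which each subject is a `(vocabulary, term)` pair. The sketch normalizes records from two hypothetical sources, rejects any record lacking a common-vocabulary subject, and then builds a static term-to-items index; scheduling it weekly (for example with cron) is left outside the code.

```python
import json
from collections import defaultdict

# Hypothetical common vocabulary; in practice the terms would come from the
# authority files maintained by Technical Services.
COMMON_VOCAB = {"Agriculture", "Campus life", "Maryland history"}
# Hypothetical codes for LCSH, the Getty AAT, and the Thesaurus for Graphic Materials.
EXTERNAL_VOCABS = {"lcsh", "aat", "tgm"}

# Hypothetical per-source field names, mapped onto one normalized shape.
FIELD_MAPS = {
    "spreadsheet": {"Item ID": "id", "Title": "title", "Subjects": "subjects"},
    "legacy_db": {"pid": "id", "item_name": "title", "topics": "subjects"},
}


def normalize(raw, source):
    """Rename a source's fields to the common names, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


def validate(record):
    """Return a list of problems; an empty list means the record may be ingested."""
    problems = []
    subjects = record.get("subjects", [])
    if not any(vocab == "common" and term in COMMON_VOCAB for vocab, term in subjects):
        problems.append("needs at least one subject from the common vocabulary")
    for vocab, term in subjects:
        if vocab == "common" and term not in COMMON_VOCAB:
            problems.append(f"{term!r} is not in the common vocabulary")
        elif vocab != "common" and vocab not in EXTERNAL_VOCABS:
            problems.append(f"unknown vocabulary {vocab!r} for term {term!r}")
    return problems


def build_browse_index(records):
    """Batch step: list record IDs under each common-vocabulary term."""
    index = defaultdict(list)
    for record in records:
        for vocab, term in record["subjects"]:
            if vocab == "common":
                index[term].append(record["id"])
    return {term: sorted(ids) for term, ids in sorted(index.items())}


# Hypothetical incoming records from two different sources.
incoming = [
    normalize({"Item ID": "a1", "Title": "Barn, ca. 1920",
               "Subjects": [("common", "Agriculture"), ("aat", "barns")]}, "spreadsheet"),
    normalize({"pid": "b7", "item_name": "Homecoming parade",
               "topics": [("common", "Campus life"), ("lcsh", "Parades")]}, "legacy_db"),
    normalize({"pid": "c3", "item_name": "Untitled", "topics": [("lcsh", "Tobacco")]}, "legacy_db"),
]
accepted = [record for record in incoming if not validate(record)]
# The JSON could be written to a static file that the browse pages read.
browse_json = json.dumps(build_browse_index(accepted), indent=2)
```

Generating the index in a batch keeps browsing cheap at request time, since the browse pages only read a prebuilt file, at the cost that newly ingested items do not appear under a subject until the next run.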