MODS and MADS: Current implementations and future directions

ALA Annual Conference 2010

Sunday, June 28, 2010 10:30 to noon

Intro, Jenn Riley: Metadata Librarian, Indiana U. Digital Library Program

The MODS 3.4 schema was released in June 2010. The MODS/MADS editorial committee is considering the overall direction for MODS 4.0. MODS 3.4 has:

support for RDA descriptions
better handling of subject vocabularies (specify the vocabulary at the relevant subject subelements; specify vocabularies and terms by URI)
Better support for multilingual cataloging
expanded the use of the usage attribute
expanded use of the displayLabel attribute.
Ability to bind a specific name to a title to create a Uniform title.
The ability to mark selected elements as containing cataloger-supplied data (rather than brackets, etc.).
Various changes to make the schema itself more consistent, easier to manage and of greater utility to other applications importing elements from the MODS namespace.
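
As a rough illustration of a few of these additions, here is a minimal Python sketch that builds a small MODS fragment with the standard library. It uses the `nameTitleGroup`, `supplied`, `authorityURI` and `valueURI` attributes that correspond to the features listed above. The name, titles and example.org URIs are placeholders invented for the example, not real authority records, and the fragment is not meant to be a complete description.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def build_record():
    """Build an illustrative MODS 3.4 fragment (all values are placeholders)."""
    mods = ET.Element(m("mods"), {"version": "3.4"})

    # Bind a specific name to a uniform title via a shared nameTitleGroup value.
    name = ET.SubElement(mods, m("name"), {"type": "personal", "nameTitleGroup": "1"})
    ET.SubElement(name, m("namePart")).text = "Example, Author"
    uniform = ET.SubElement(mods, m("titleInfo"), {"type": "uniform", "nameTitleGroup": "1"})
    ET.SubElement(uniform, m("title")).text = "Example uniform title"

    # Mark a cataloger-supplied title with supplied="yes" instead of brackets.
    supplied = ET.SubElement(mods, m("titleInfo"), {"supplied": "yes"})
    ET.SubElement(supplied, m("title")).text = "Untitled photograph"

    # Identify the vocabulary and the term by URI on the subject subelement.
    subject = ET.SubElement(mods, m("subject"))
    topic = ET.SubElement(subject, m("topic"), {
        "authorityURI": "http://example.org/vocab",       # placeholder vocabulary URI
        "valueURI": "http://example.org/vocab/term/123",  # placeholder term URI
    })
    topic.text = "Example topic"
    return mods


record = build_record()
print(ET.tostring(record, encoding="unicode"))
```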

For MODS 4.0 they are thinking of a more formal data model, maybe RDF. They want to encourage linked data and hope that the more formal model may help. Give feedback on the MODS listserv.

Speakers:

Bill Leonard, Library and Archives Canada

In 2004 the National Archives and the National Library of Canada merged. This meant that they had to merge all of their data and records. They have a federated search across both the archival and bibliographic descriptions: http://www.collectionscanada.gc.ca/lac-bac/search/all

They are also building a trusted digital repository (TDR) using these metadata standards:

METS
PREMIS
MODS
Government of Canada records management metadata standard (records are received this way and then stripped down to the archival core set, eventually mapped to MODS to be placed in the TDR)
archival core set
ARK

MODS is the common schema for all the descriptions within the TDR.

Another project is Canadiana Authorities: http://www.collectionscanada.gc.ca/databases/canadiana-authorities/index-e.html. This is a new search interface to LAC’s authorities (name, name/title and series title). Its benefits are that the data is clean, that searching XML offers greater flexibility, and that the data is controlled and normalized.

In the future, bibliographic data is to be available in a parallel service. They need to fix character set issues, ingest heading data from the archival descriptions, enhance the bibliographic authority data with biographical or administrative history notes, and make Canadiana Authorities available as a web service.

Sally McCallum and Rebecca Guenther, LOC

“Using MODS for Discovery of LC’s Rich Collections”

In 2007 LC went through strategic planning. A goal was to enable “seamless access” to all of LC’s resources but also support subtype access – e.g. digital only, moving image only, photo only, etc.

The LC landscape at the time was horribly diverse. Like really, really crazy with the systems, databases, and repositories all over the place that were of various ages and depth, and obviously, platforms and standards.

They considered two scenarios. The first was federated search, which would mean keeping the silos and doing a metasearch, but this is really only useful for a limited number of systems for special services and is very hard to do fast in real time. The other option was to combine the metadata in one system. They created a unified database to search (after harmonizing metadata). The metadata would focus on LC collections and would provide just enough information to know whether the item is obtainable. They tested the unified database with a native XML architecture. To implement this they used MarkLogic XML Server.

As a sampling: they had about 17 million OPAC records, 1.3 million American Memory (mostly MARC), Performing Arts Encyclopedia about 50K (MODS in METS), LC Web Archive about 6K (MODS), Handbook of Latin American Studies about 160K (MARC), and so on and so on and so forth….whew.

They decided to use METS as a wrapper and MODS for descriptive metadata for all items…plus MARCXML if there is MARC; EAD if it’s a finding aid; TEI if there is text of a veteran’s narrative (one of the projects); KML for geographic coordinates; ALTO for newspapers; parts of ONIX for summaries and cover art. MODS article-level data lists the “host item” (the whole that it is a part of) as <relatedItem> (for the Latin American Handbook project); <location> is used for holdings.
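
To make the wrapper idea a bit more concrete, here is a minimal Python sketch of an article-level MODS record placed inside a METS descriptive metadata section, with the host item in <relatedItem type="host"> and holdings in <location>. It is only a sketch under assumptions: the titles and location are invented, the function name `wrap_article` is hypothetical, and the METS document omits the structMap that a valid METS file requires. It is not LC's actual profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def mets(tag):
    """Return a namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def mods(tag):
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def wrap_article(article_title, host_title, physical_location):
    """Build a METS element whose dmdSec wraps one MODS article record."""
    root = ET.Element(mets("mets"))
    dmd = ET.SubElement(root, mets("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, mets("mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, mets("xmlData"))

    record = ET.SubElement(xml_data, mods("mods"))
    title_info = ET.SubElement(record, mods("titleInfo"))
    ET.SubElement(title_info, mods("title")).text = article_title

    # The whole that the article is part of goes in relatedItem type="host".
    host = ET.SubElement(record, mods("relatedItem"), {"type": "host"})
    host_title_info = ET.SubElement(host, mods("titleInfo"))
    ET.SubElement(host_title_info, mods("title")).text = host_title

    # Holdings information goes in location.
    location = ET.SubElement(record, mods("location"))
    ET.SubElement(location, mods("physicalLocation")).text = physical_location
    return root


doc = wrap_article("Example article", "Example handbook, vol. 1", "Example reading room")
print(ET.tostring(doc, encoding="unicode"))
```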

MODS advantages:

compatibility with what they already have
less detail than MARC so you can map MARC
other standards are easy to map to MODS as well (easier than MARC at least)

[MODS helps by providing a common format. As a derivative of MARC it retains a lot of the richness of MARC while playing well with less granular data. It does support faceted discovery.]

XML advantages

can use XQuery
use of XML-based standards like SRU, OAI
easy to use tools like RSS and SIMILE
linking to authority data in id.loc.gov and later even incorporating it into MarkLogic
enhanced faceting
geospatial searching

They will phase databases into the service and will launch before they are all in there.

The metadata for digital content working group also set specific goals, including: improve discovery of LC digital content through metadata; provide access to metadata in silos; share content and metadata; provide 2.0 services; establish consistent metadata and create it consistently going forward.

Their strategies for improving discovery:

allow search from external sites
focus on reusability
improve and enrich metadata
other use cases like navigation by facet, reuse for different delivery systems
supply metadata for partner sites like YouTube and Flickr.

Steps to do this:

establish master list of elements across projects based on MODS
annotate list with best practices
create metadata profiles for each initiative
identify where metadata remediation is needed to apply best practices

Metadata profiles are available at: http://www.loc.gov/standards/mdc (only American Memory, LC Web Archive, and Performing Arts Encyclopedia are available now, but they are working on the others)

The upshot?

Standardize metadata across LC to provide seamless access
profiles provide compatibility with existing metadata
quality is improved by applying best practices detailed in the metadata set
LC is generating HTML meta tags with values extracted from metadata source for use by search engines
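
The last point, generating HTML meta tags from a metadata source, can be sketched in a few lines of Python. This is only an illustration under assumptions: the talk did not say which MODS elements LC maps or which meta names it emits, so the choice of title, abstract and subject topics, the meta names, the sample record and the function name `meta_tags` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented sample record, used only to exercise the function below.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example "quoted" title</title></titleInfo>
  <abstract>A short example summary.</abstract>
  <subject><topic>Example topic</topic></subject>
  <subject><topic>Another topic</topic></subject>
</mods>"""


def meta_tags(mods_xml):
    """Return a list of HTML meta tag strings derived from one MODS record."""
    root = ET.fromstring(mods_xml)
    title = root.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)
    abstract = root.findtext("mods:abstract", default="", namespaces=NS)
    topics = [t.text for t in root.findall("mods:subject/mods:topic", NS) if t.text]

    pairs = [
        ("title", title),
        ("description", abstract),
        ("keywords", ", ".join(topics)),
    ]
    # Escape values so quotes and angle brackets cannot break the attribute.
    return [
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
        if value
    ]


tags = meta_tags(SAMPLE_MODS)
print("\n".join(tags))
```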

Amanda Harlan, Baylor University

“Texas Digital Library’s Electronic Dissertations and Thesis Descriptive Metadata Guidelines and Vireo ETD Submission System”

Historical overview of the ETD guidelines and the Vireo submission system:

TDL aggregates digital material from across Texas libraries. In 2005, they had to decide on a metadata standard for ETDs. MARC and DC were both dismissed. MODS was chosen because:

based on MARC
easier to work with b/c of XML
captured info more easily
the only extension needed would be for degree
there was already some experimenting with MODS as separate bitstreams attached to DSpace items. Eventually Manakin allowed them to work with the MODS record rather than the DC that DSpace normally uses

Their schema includes 16 top-level elements.

TDL originally felt that a federated collection of ETDs was a priority. As they developed it, it became the current Vireo system. Each institution has its own instance and repository, and then TDL would federate the metadata through OAI. In Vireo, MODS was used as the descriptive metadata. They worked on standardizing workflows across institutions for students submitting ETDs, grad school processing and then library processing. The basic workflow includes:

students ingest
staff and students verify
published

TDL designed Vireo to provide a tool and interface for each step. The turnkey solution can be deployed at each institution.
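
The OAI federation mentioned above usually means OAI-PMH harvesting: a central service asks each institution's repository for its records and merges the results. The Python sketch below shows the shape of a ListRecords request and how record identifiers could be pulled out of a response. The base URL, the `mods` metadata prefix, the sample response and both function names are assumptions made for the example; each repository advertises its own supported prefixes, and nothing here is taken from TDL's actual setup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Invented, heavily trimmed response used only to exercise the parser below.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo-a.example.edu:etd/1</identifier></header></record>
    <record><header><identifier>oai:repo-a.example.edu:etd/2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="mods", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the record header identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = f".//{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"
    return [el.text for el in root.findall(path)]


url = list_records_url("https://repo-a.example.edu/oai")
identifiers = record_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers)
```

A real harvester would also follow resumption tokens and handle errors; those are left out to keep the sketch short.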

Other issues to address included TDL’s level of involvement (full, from ingestion to publication) and the author’s rights. The license is similar to Creative Commons; each institution could change the license if needed, but always had non-exclusive rights.

Currently, they have published guidelines and an application profile for MODS descriptive metadata that were created in 2008, but these haven’t been updated. Vireo is currently on version 1.0.3. There are three institutions in full production and five others in various stages of piloting. TDL also hosts a production and lab instance of Vireo for every TDL member, even if they don’t use them or run their own, so they can play around with it and decide if they want to use it. They also created a Vireo User’s Group to identify and report defects and recommend enhancements for the next version. This group provides support for all users of Vireo.

For the future, they hope that the guidelines and profile will be reviewed as more institutions get involved, both to cover new needs not originally identified and to keep up with changes in standards. They hope to release Vireo as open source in the fall. Two other institutions use it (UIUC and MIT), so they hope to incorporate customizations made there.

Karen D. Miller, Northwestern University

“Crosswalking EAD to MODS at Northwestern University Library”

Do you know what EAD is? If not, find out: http://www.loc.gov/ead/

Basically, EAD is a description of an archival collection in two parts: a description of the collection as a whole and then a description of the parts, organized by format or topic in series, boxes and folders or entities and fonds.

NUL created an EAD portal that includes finding aids from archives and special collections. In addition, there is a special project for a collection of East African photographs that is one quite large EAD.

Why crosswalk it to MODS? They are storing a MODS version of the record for their digital repository, which includes a cross-collection search that searches items originally described in MARC, TEI, etc. They create a MODS record for each container that they capture in the collection. The higher-level containers can provide data for quite rich MODS. They create subject headings and other information for each of these in the MODS record that don’t necessarily appear in that format in the EAD.

For sub-containers they can’t necessarily just inherit information from superior containers because info at that level may be too general (e.g. this folder contains German, French and English). They decided that parents to children/children to parents would be the only inheritance recorded (rather than child to grandparent, or folder to series). This info is put in the MODS <relatedItem>, but just the <title> and <identifier>. This way you can go up or down the chain and get that info as needed.
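
The parent/child rule described above can be sketched as a small recursive crosswalk in Python. This is an illustration under assumptions, not Northwestern's code: the `Container` class and function names are hypothetical, the use of type="host" for the parent and type="constituent" for children is my guess at a reasonable encoding, and the title is wrapped in <titleInfo> because that is where MODS places <title>.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


@dataclass
class Container:
    """One EAD container (series, box, folder...) reduced to what we need."""
    identifier: str
    title: str
    children: List["Container"] = field(default_factory=list)


def q(tag: str) -> str:
    """Return a namespace-qualified MODS tag name."""
    return f"{{{MODS_NS}}}{tag}"


def add_title_and_id(element: ET.Element, container: Container) -> None:
    """Record only the title and identifier of a container."""
    title_info = ET.SubElement(element, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = container.title
    ET.SubElement(element, q("identifier")).text = container.identifier


def container_to_mods(container: Container, parent: Optional[Container]) -> ET.Element:
    """Build a MODS record that links one level up and one level down only."""
    record = ET.Element(q("mods"))
    add_title_and_id(record, container)
    if parent is not None:
        host = ET.SubElement(record, q("relatedItem"), {"type": "host"})
        add_title_and_id(host, parent)
    for child in container.children:
        part = ET.SubElement(record, q("relatedItem"), {"type": "constituent"})
        add_title_and_id(part, child)
    return record


def crosswalk(container: Container,
              parent: Optional[Container] = None) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (identifier, MODS record) for a container and all its descendants."""
    yield container.identifier, container_to_mods(container, parent)
    for child in container.children:
        yield from crosswalk(child, parent=container)


folder = Container("folder-1", "Folder 1")
box = Container("box-1", "Box 1", [folder])
series = Container("series-1", "Series 1", [box])
records = dict(crosswalk(series))
print(ET.tostring(records["box-1"], encoding="unicode"))
```

Because each record points only to its immediate parent and children, a folder record never mentions the series directly; following the host links one record at a time walks up the chain.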

They have then created a way to view the EAD container list tree in a left-hand column and then click on each container to see the generated MODS record and digitized item on the right. Cool! Check it out at: http://repository.library.northwestern.edu/winterton/browse.html#actiontgetAllPhotos