2005

XML and Authority Control

The XML and Authority Control program, sponsored by the LITA-ALCTS CCS Authority Control in the Online Environment Interest Group (ACIG), took place in a dim, cavernous room in McCormick Center. The room was remarkably full at the beginning, with sitters on the floor around the walls (this reporter apologizes for taking two chairs, one for herself, one for her Mac!). The program announcement contains the full titles and affiliations for the speakers, as well as the titles for their presentations, which I will not repeat below.

Sally McCallum from LC was the first speaker, and she started by talking about XML in general, characterizing it as “just another data structure.” She pointed out that XML has more capability for hierarchy than MARC21, and noted that this could be either good or bad. She briefly discussed the place of XML schemas in the definition of tags and structure.

After a rapid-fire review of why we want to do XML at all, Sally went over what METS is (which I suspect was a bit confusing for newcomers) and talked about other new XML-based metadata formats for rights, preservation, and technical information, as well as MARCXML as a precursor to MODS and MADS. [It was a very quick overview, and probably difficult to digest for the uninitiated (way too many acronyms, for one thing); a handout with the acronyms explained might have been useful.]

Sally finally moved on to MADS, and described the structure and tags for the authority structures within MADS. The top-level tags are authority, related, and variant, and each has the same substructure. She also described the different attributes for the related and variant terms, most of which seem to be intended to keep most of the MARC21 authority structures intact.
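To make the shape of such a record concrete, here is a minimal sketch in Python. Only the three top-level tag names (authority, related, variant) come from the talk as reported above. The `name`/`namePart` substructure, the `type` attribute, the omission of an XML namespace, and the sample headings are assumptions made for illustration, not a rendering of the actual MADS schema.

```python
import xml.etree.ElementTree as ET


def build_record(authorized, variants=(), related=()):
    """Build a MADS-like record (illustrative only, not schema-valid MADS).

    Each top-level element gets the same assumed substructure:
    <name type="personal"><namePart>...</namePart></name>
    """
    mads = ET.Element("mads")
    groups = (
        ("authority", [authorized]),
        ("variant", variants),
        ("related", related),
    )
    for tag, headings in groups:
        for heading in headings:
            top = ET.SubElement(mads, tag)
            name = ET.SubElement(top, "name", type="personal")
            ET.SubElement(name, "namePart").text = heading
    return mads


# Hypothetical headings, invented for this example.
record = build_record("Doe, Jane", variants=["Doe, J."])
print(ET.tostring(record, encoding="unicode"))
```

Because all three top-level elements share one substructure, a single loop can build them; that is the practical payoff of the design as it was described.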

Other elements have been added to accommodate other communities, and Sally explained affiliation and fieldOfActivity and their potential uses, even when (as in the case of affiliation) LC and the library community have no history or intent to populate or use the information.

Sally presented the change from numeric to English tags as a positive move, but clearly there is an issue with internationalization in this approach. She suggested that others could substitute different language tags and make transformations. This strikes me as a very problematic assumption; it would perhaps be much cleaner to keep the numerics and substitute appropriate language names within a user interface.
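The alternative suggested here (numeric tags in the data, language-specific names only at display time) might look something like the following sketch. The tag numbers 100 and 400 are the MARC 21 authority fields for a personal name heading and its see-from tracing; the `LABELS` table, the wording of the labels (especially the French ones), and the `display_label` function are hypothetical.

```python
# Hypothetical display labels, keyed by numeric tag and then by language code.
# The stored record keeps the numeric tag; only the interface text varies.
LABELS = {
    "100": {
        "en": "Heading: personal name",
        "fr": "Vedette : nom de personne",
    },
    "400": {
        "en": "See from tracing: personal name",
    },
}


def display_label(tag, lang, fallback="en"):
    """Return a label for `tag` in `lang`, else the fallback language, else the tag itself."""
    by_lang = LABELS.get(tag, {})
    return by_lang.get(lang) or by_lang.get(fallback) or tag


print(display_label("100", "fr"))
print(display_label("400", "fr"))  # no French label, so the English one is used
```

Under this arrangement, adding a language means adding entries to a lookup table rather than transforming the records themselves.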

Several MADS features were cited, among them linking capabilities and special attributes available for all elements (such as language and script). Sally invited attendees to participate in the MADS development (carried out on the MODS mailing list). In response to questions, she was very reluctant to predict how the two formats would be used after an “experimental” period.

The second speaker was Diane Boehr, from NLM, who described NLM’s planned exposure of their authority files in XML. NLM needed the capability to create a centralized, shared authority file for some of their web-based products. This project involved only names, not subjects (those are dealt with by their MeSH section), using their Voyager authority file as their central repository.

Interestingly, though this was a centralized file, not all participants were required to use the same content guidelines (e.g., AACR2), and they chose to perceive the variations as a way to enrich their cross-reference structure. They developed a DTD to enable this, which included different reference structures and accommodated different preferences and practices across NLM projects.

As a NACO participant, NLM still needed to pay attention to AACR2 and MARC21 when redistributing information. They used a combination of $8, a $0[initials], a $9 N value and some local heading tags to manage these variant needs and to suppress NACO distribution for information that was not NACO compliant. She showed some examples of this approach that had the interesting effect of reinforcing the reality that MARC21 is still both standard and flexible.

The NLM DTD did not use MADS because they needed to match the attributes of their bibliographic DTD and did not require direct correlation with every MARC field. They would like to go to XML schemas but are not ready to do so because of their legacy of DTD use.

Diane gave extensive explanations (rather too extensive for my tastes, but I can’t speak for the rest of the audience!) of the other characteristics of their work, which seems primarily oriented towards controlling output for various purposes. [Note for subsequent programs: XML on the screen is virtually useless—please provide handouts if you need to discuss details!]

Diane cited the primary advantages of their shared authority system as improved quality control and the ability to expose multiple, customizable outputs. She also asserted that their approach to multiple languages was better than the multiple, linked authorities approach used by MARC21 (for example, the way the Canadians use separate French and English records).

Louisa Kwok, the third speaker, discussed the work at the Hong Kong University of Science & Technology Library to define an XML-based schema to mark up multi-lingual and multi-script attributes of name information. Their implementation was intended to deal with the problem of identifying Chinese authors using the romanized forms of names used in traditional authority records.

She differentiated between name access control and authority control, which she defined as the difference between designating an authorized form and providing extensive, enriched support for identification of name usage. She further explained a “person model” based on FRBR: Person→Name→Name form. Variations in name form can be due to language, script or Romanization scheme, but all are used to facilitate access to the person and his or her works. The XML format is based on MARCXML (not MADS). Sadly, too many of her examples were unviewable from the audience; again, handouts would have been extremely helpful. She also demonstrated (too much, in my opinion) the workflow and processes for enhancing regular authority records to create the Name Access Control records. The room cleared out a bit after her presentation, even before a stretch break was declared.
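The Person→Name→Name form chain described above can be sketched as a small data structure. This is only an illustration of the idea, not the HKUST schema: the class and field names, the use of ISO 15924 script codes, and the identifier are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameForm:
    text: str
    script: str                         # ISO 15924 code, e.g. "Hant" or "Latn" (assumed convention)
    romanization: Optional[str] = None  # scheme used, if this form is romanized


@dataclass
class Name:
    forms: List[NameForm] = field(default_factory=list)


@dataclass
class Person:
    identifier: str                     # hypothetical local identifier
    names: List[Name] = field(default_factory=list)

    def matches(self, query: str) -> bool:
        """True if the query equals any form of any name, ignoring case."""
        wanted = query.casefold()
        return any(
            form.text.casefold() == wanted
            for name in self.names
            for form in name.forms
        )


person = Person(
    identifier="person-0001",
    names=[Name(forms=[
        NameForm("毛澤東", script="Hant"),
        NameForm("Mao Zedong", script="Latn", romanization="pinyin"),
        NameForm("Mao Tse-tung", script="Latn", romanization="wade-giles"),
    ])],
)

print(person.matches("mao tse-tung"))  # True
```

No form is privileged as the authorized one here; every form leads equally to the person, which is the distinction between name access control and authority control as she drew it.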

Kevin Clarke, fourth speaker, began personfully with an explanation of goals for the XOBIS program. This project has attempted to combine the schemas for bibliographic and authority data, rather than separate them, relying on reusable components. According to the program announcement, “XOBIS hopes to foster the use of traditional library metadata in the digital realm,” though they’ve chosen to do so by using very different terminology. As an example, “Being” is used to describe “Specific identities of tangible or intangible beings (living or dead), including personifications.” Thank goodness we’re not losing track of those personifications!

Interestingly, in the midst of this differently modeled and denominated structure remains the “main entry,” presumably for the purpose of being able to map back to the old world. Clarke sees relationships (a big part of XOBIS) as a “growth industry” for catalogers, since these relationships are hard for machines to do well.

“What about the billion or so records that would have to be converted?” asks Clarke, rhetorically. A big job, he admits, and the audience tittered. A big job, indeed, and it’s difficult to see where the big payoff would occur.

Thom Hickey, the fifth speaker (oh dear, is this fair, to be the fifth of six on a Sunday afternoon?), discussed OCLC’s web service experiments with authority control. Thom first demonstrated an authority control component built for a DSpace implementation, which was limited in scope but pointed out the need for changes in the software to accommodate authorized forms. The service needs to be moved to standard protocols, ranking should be improved, it now links to OCLC rather than LC, and the files are not complete. Attention also needs to be paid to sustainability questions for what is now a free, though researchy, service.

The most valuable of Thom’s points came where a cool demonstration gave way to discussion of what’s still needed for persistent, useful services for authority data: things like free resolution services, persistent identifiers, and competent middleware. He also discussed the gaps in the traditional authority files themselves, which make support for institutional repositories and international libraries spotty.

There were also some tantalizing bits on the Virtual International Authority File and the OPAC Network In Europe Shared Authority Control, though Thom’s approach was at such a different level from the previous speakers that it was difficult to relate his points to their actual implementations.

Joanna Pong was the final speaker, describing the HKCAN, a cooperative project of seven academic libraries in Hong Kong, implementing an approach to the language, script and Romanization issues for Chinese names. Joanna launched quickly into slides with complex authority record representations, testing the focus and stamina of the audience. I admit that by that time my attention had flagged, and her soft, liltingly accented but speedy delivery made her difficult to follow, unless she was reading the slides. Of all the presentations, this one suffered from an excess of “how-we-did-it-good” without much reference to other approaches, including that of the other Hong Kong library presented earlier in the afternoon.

The IG should be commended for putting together such a rich and interesting program, but flogged gently with a wet issue of Cognotes for cramming six speakers into three hours with no chance for any discussion or even an opportunity for the speakers to ask questions of one another (often a good way to bring up good questions when the room is too large for good participation).

Diane Hillmann
Dih1@cornell.edu