Project Shibboleth: Issues and Answers

Saturday, June 25th, 1:30-3:30 p.m.
ALA Annual Conference, Chicago

This was an excellent program that had a solid mix of speakers who could address the various aspects of Shibboleth, from the technical underpinnings to considerations for implementing it on a broad scale to perspectives of librarians on the types of changes to user behavior that Shib (for short) might require. The program allowed me to get a solid grounding in Shib without getting too technical or too theoretical. The audience was a good size and a healthy mix of techie and non-techie folks (in fact, at one point one of the presenters asked how many people in the audience were librarians from outside the systems department, and over half raised their hands).

Arrived a bit late because of the same old story, buses and lunch and tigers and bears, oh my…

When I arrived, Keith Hazelton of Internet2 was rolling along in his presentation of the fundamentals of Shibboleth and the ten main verbs to remember when thinking about how Shibboleth works. Hazelton’s main point was to bring home the fact that Shib is the “deliver” verb in the long chain of events involved in authentication (AuthN) and authorization (AuthZ). It is an open-source tool for facilitating identity management between institutions (universities and colleges) and service providers (JSTOR, EBSCO, etc.).

Hazelton discussed the problem of fragmentation, which occurs when multiple IT systems within the same institution have the ability to create a “person record”. At some point, all of those different person records have to be joined together to make one coherent person, a unified source of identity information.

Shibboleth is simply a means of moving identity information from an institution to a service provider. It is up to the institution to handle creating the person and defining who he or she is through attributes (name, status, dob, etc.) and up to the service provider to open the gate and allow the person in. Shib simply carries (delivers) identity information between the two.

Hazelton explained how this works by using the example of the three musketeers. The questions of identity are, What groups do I belong to? What roles am I in? One might answer, I am Porthos, a member of the musketeers. This would be my affiliation. On the other side of the equation, you have the privileges that are granted to persons of a given affiliation. Do you get to bear arms? Do you get to go into the palace? The privileges are easy to follow when there are only one or four people who belong to a single group, but the more people and groups you have, the more complicated it is to manage their privileges. Affiliation is the nexus point: people are lumped into affiliations, and privileges are attached to each affiliation. Grouper is the name for what puts the people into groups; Signet is the name for the piece of Shib that says what groups get to do what things (who gets to go into the palace). Hazelton’s metaphor was interesting and, unlike some metaphors used at sessions that shall remain unnamed, seemed to carry through well down into the details.

Hazelton followed this metaphorical discussion with the concepts of how systems talk to each other to authorize a user’s access to a given resource. The “systems of record” (for example, HR and registrar databases) “provision” out information on a person (IAM information). This info needs to be delivered to services, which then make the AuthZ decision. Shib is the middle portion of this equation, delivering IAM information to services that are capable of making an authorization decision.
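For the code-minded, here is a minimal sketch of the division of labor Hazelton described, written in Python. It is not how Shibboleth, Grouper, or Signet are actually implemented; the group names, privilege names, and function names are all made up to follow the musketeer metaphor. The point is only that the institution decides who belongs to what, the middle step delivers attributes without judging them, and the service makes the AuthZ decision.

```python
# Hypothetical data following the musketeer metaphor; none of it comes from a real system.
GROUP_MEMBERS = {
    "musketeers": {"athos", "porthos", "aramis"},
    "cardinals_guard": {"jussac"},
}

# Which affiliations carry which privileges (the "who gets into the palace" table).
GROUP_PRIVILEGES = {
    "musketeers": {"bear_arms", "enter_palace"},
    "cardinals_guard": {"bear_arms"},
}


def affiliations_for(person):
    """Institution side: which groups does this person belong to?"""
    return sorted(
        group for group, members in GROUP_MEMBERS.items() if person in members
    )


def deliver(person):
    """The 'deliver' step: hand over attributes, make no decision."""
    return {"affiliations": affiliations_for(person)}


def authorize(attributes, privilege):
    """Service side: decide using only the delivered attributes."""
    return any(
        privilege in GROUP_PRIVILEGES.get(group, set())
        for group in attributes["affiliations"]
    )
```

With this toy data, `authorize(deliver("porthos"), "enter_palace")` is `True` and `authorize(deliver("jussac"), "enter_palace")` is `False`. Note that `authorize` never sees a name, only an affiliation.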

Hazelton concluded with the fact that not many people are rushing to adopt Shib, in large part because it’s open source. In order to succeed, it needs partners. The move to adopt Shib needs to come from the ground up.

Hazelton also quickly flashed the ten verbs to know with Shib, upon which he based his presentation, so I’ll quickly capture them here:

Manage affil/groups
Manage privileges
Deliver
Authenticate
Authorize
Log

Next up was Chris Shillum talking about Shibboleth and ScienceDirect (Elsevier)

Shillum’s focus was on what Elsevier has been doing to integrate Shib into ScienceDirect. He showed screen shots taken earlier in the week from a live implementation showing how Shib looks to a typical user searching databases.

The first screen shot showed ScienceDirect. Clicking on a link took us to a WAYF (“where are you from”) service, a simple application that provides a list of all institutions in the “federation” (the group of institutions sharing the Shib implementation). The WAYF knows where to send the user once they identify their home institution. The user selects their institution (the identity provider) and is sent to their university’s login page (the University of Southern California in the example). Upon successful login, the user was sent back to ScienceDirect with the same rights and privileges as if they had been authorized by IP, etc.
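The WAYF step is simple enough to sketch in a few lines of Python. This is an illustration under my own assumptions, not Elsevier's or Shibboleth's code: the institution names, the URLs, and the `return` query parameter are all hypothetical. It shows the two things a WAYF has to do, which are listing the federation's members and sending the user to the login page of the one they pick, remembering where to come back to.

```python
from urllib.parse import urlencode

# Hypothetical federation membership: institution name -> login page URL.
FEDERATION = {
    "Example University": "https://login.example.edu/sso",
    "Sample College": "https://idp.sample.example/login",
}


def list_institutions():
    """What the WAYF page shows: every institution in the federation."""
    return sorted(FEDERATION)


def wayf_redirect(institution, return_url):
    """Build the URL that sends the user to their home institution's login page.

    return_url is where the service provider wants the user sent back
    after a successful login.
    """
    if institution not in FEDERATION:
        raise KeyError(f"{institution!r} is not in the federation")
    query = urlencode({"return": return_url})
    return f"{FEDERATION[institution]}?{query}"
```

Calling `wayf_redirect("Example University", "https://sp.example.com/article/1")` gives a login URL on `login.example.edu` with the article address carried along in the query string; an institution outside the federation raises a `KeyError`.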

Shillum reported that Elsevier sees Shib as a win-win for both sides, especially since there is less administrative overhead involved. Shillum said the previous presentation gave all the concepts and ideas on the backend, demonstrating how Shib is making the delivery happen, but this example shows how simple, straightforward, and seamless it really is for the user.

Shillum cited the following benefits of Shib:
1) Shib could replace IP filtering. Removes administrative burden of maintaining IPs.
2) Shib decouples customer’s network architecture from authentication, allowing departmental purchases, which is very hard with IPs if a department doesn’t have its own range of IPs.
3) Also allows personalized access to remote resources using local credentials, instead of having to remember personalization passwords and logins for many different vendors, so the service system always remembers your customizations without you having to remember a separate password and identity.
4) Shib removes the need for users to remember different usernames and passwords and avoids problems of proxy servers for remote access. Bottom line: Shib helps provide the broadest access to the community.

Shillum then talked about some of the logistics and details of the Shib pilot project that ScienceDirect did with Dartmouth, Georgetown, NYU, UC San Diego, and Penn State.

He concluded with some of the issues surrounding Shib and vendor implementations. The technology is new, complex and rapidly changing. Federations of institutions are in their very early stages. Uptake is key in moving to the next level of adoption and ubiquity. We are at a critical point. To make uptake work, we need to make implementation easier for smaller customers and vendors. Elsevier is committed to making access easier for users and will continue to support Shib.

The next presentation, more brief than the others, was by Chris Zagar, from Useful Utilities/EZproxy.

Zagar pointed out that Shib should eliminate the need for proxy servers sometime in the future, but it isn’t going to happen overnight that everyone will be shibbolized. There will be a path where some are, and some aren’t. Zagar offered a roadmap for the transition through a handout that covered the terminology of Shib and how, during a transitional period, “shibbolized” and “non-shibbolized” resources can work together in an EZproxy environment.

Zagar pointed out that many people probably have an environment with a variety of links to databases, etc., that point to an EZproxy server. During the transition to a Shib environment, users can use traditional EZproxy login methods that they are familiar with. This would be a transitional mode, providing a link where people can go to the Shibboleth WAYF service. Users would select an identity provider (their home institution), then use their standard netID and password. From the user’s perspective, they then just wind up in the database. To the user, this looks very similar to what they are used to with EZproxy, and they don’t have to see all of the intermediate, behind the scenes steps that go into making it happen.

Next up to the podium was John Paschoud of the London School of Economics Library, who talked about “Building a UK infrastructure for access management using shibboleth”

John brought a good deal of experience with Shib implementation from the library perspective, along with a dry British humor that lightened the mood and brought a little cadence to the afternoon. He opened with the observation that “Britain is just like America, but squashed into a very small space.”

Paschoud had a point in saying this, which was that America and Britain share similarities in the concentration of their academic activity, but the institutions are just very close together in Britain. Paschoud focused on a centralized identity management service that has been developed for British higher ed institutions called Athens, which allows about 2.5 million users to authenticate into about 250 online resources. It is a national, UK-only system that accomplishes essentially the same thing Shib is meant to. The problem is that it is UK-only and that there is quite a bit of administrative overhead to a centralized authentication service.

The implementation of the Athens service gives the UK certain advantages in looking at Shib, particularly the fact that they have been working for 5 years on how to develop “federation” policies and practices. Federations are organizations with a common purpose (e.g. education and research) who trust each other and sign up to a set of rules, for example who is authorized to access what resources. They also have an established body (JISC) with some centralized authority over an unruly community.

Paschoud went on to talk about the timeline for developing a Shib infrastructure: look for institutions willing to adopt Shib, create a service for early adopters, work with publishers to make their products Shib compliant, and make national data services Shib compliant. Essentially, Paschoud was outlining a timeline going out to 2006-2007 for the whole project of “shibbolizing” the UK federations.

The big take away from Paschoud’s presentation is that identity management is happening at a national level (sort of) in the UK, but that it requires a large central database which everyone has to go through. Shib decentralizes the whole operation and puts the identity management and the authorization on the individual institutions, creating less overhead by eliminating the centralized service.

The last speaker was Mike Neuman of Georgetown University, whose presentation was “Shibboleth: Problems and Promise.”

Neuman is in the interesting but unenviable position of reporting both to the University Librarian and the CIO. His presentation was geared towards the functionality of Shib for those in the library but not in systems.

Georgetown became involved with Shib in the fall of 2001, when it joined the Elsevier Shib pilot project with ScienceDirect.

Neuman wanted to review some of the challenges noted by reference librarians as they considered how things would work with Shib as opposed to how they work now in an IP/EZproxy environment. Among librarians there is generally a feeling that change to the status quo is not a good thing, Paschoud pointed out. But he said that some aspects of the status quo maybe need to be eliminated, and Shib might be a good way to do that.

Challenge 1: library patron groups often differ from the University directory groups. Patron groups often include non-University members, and license categories can include non-university members who may not necessarily be in the authentication directory of netIDs. For example, EBSCO has language that allows “authorized users”—so who exactly is this? Are these people that have legitimate claim to privileges, but aren’t strictly university employees? Hospital staff at Georgetown are not university employees, but they have certain privileges in the library. Shib is a way to deal with this disjoint with the university scheme of groups by allowing each individual user to have a unique identity with attributes that indicate very specifically who they are and what they have access to.

2. Licenses complicate authN and authZ models by throwing in all kinds of exceptions and limitations, such as limits on simultaneous access, restrictions on use if you’re on campus as opposed to off campus, and licenses that limit access to specific libraries or departments. E.g., the Law, Health, and Engineering libraries could conceivably be getting the same journal from different vendors. Or all three libraries could be working with the same vendor, but getting a different package of services for each library.

3. Library gate keeping. For example, sometimes resource passwords are controlled for budgetary reasons, such as resources that are licensed on pay-per-view or pay-per-search basis. At end of semester, based on budget resources, a librarian may decide to not give out the password to a resource and point user to another database. Neuman claimed that retaining gatekeeping control such as this would be complicated in a Shib environment.

Neuman also raised the problem of delinquent patrons, who can sometimes be denied access privileges. Here there is the need for an attribute to indicate a person is blocked from using a resource or all resources. AND librarians need to have access to place and remove these blocks.

4. Initially, patrons may feel inconvenienced by Shib on campus, since IP authentication currently means they don’t have to log into resources from on campus. Dual access procedures (Shib and non-Shib) could confuse patrons.

Neuman continued by saying that it might sound from the challenges like they have been discouraged at Georgetown, but this is not true. They are very interested in continuing to explore Shib with other vendors as well as with Elsevier. In fact, Georgetown received funding from the Mellon Foundation for a proof of concept project exploring Shibboleth and scholarly communication. This is still a conceptual project, but the funding is there and planning is moving forward. The project essentially will explore the ways in which Shib can be used to promote interlinkages among various resources (for example, from a professional society publication to related resources in other databases) that are exposed in different ways to users with different identities and attributes.

The program ended with an interesting head table discussion that involved some friendly debate on several points that Neuman raised in his presentation. Essentially Hazelton wanted to make the point that some of the concerns raised by the reference librarians at Georgetown that seemed to disfavor the implementation of Shib were less about Shib itself and more about the complex identity environments that we have to deal with. Hazelton reiterated that Shib is only, in essence, the delivery boy, and it will deliver whatever it is told to between the home institution and the service provider.

The final take-away was that Shibboleth represents a remarkable opportunity to simplify the tremendous overhead of authentication and authorization in today’s academic resource environment, where many shifting groups of users with complicated identities need to gain access to a bewildering array of licensed resources, all with different terms and conditions. There is a base of support for Shib among both vendors and institutions, but in order for it to succeed as a viable alternative to IP authentication and proxy services, there needs to be a bottom-up groundswell of interest from institutions and vendors alike.

Who Do You Trust?


RUSA/MARS User Access to Services Committee

Do You Trust Your IT Staff? Do They Trust You?

Talk about a topic close to my heart! Though somehow, in thinking about it before the conference, in my mind it morphed into “Does Your IT Staff Hate You?” In fact, around Wednesday, I had to talk to someone in IT via phone, and I asked him if he hated me. He said no. Do I trust him enough to believe him? /humor

This program was held on the fourth floor of McCormick, which had had a power failure. No elevators or escalators, few lights, no A/C. I felt like I was in a Gibson novel as I traveled long, seemingly endless, dark service corridors and stairs. Maybe it was Stephenson? Heinlein? When my friend and I finally encountered humans, I said, “Our kind!”

There were four panelists. If you want the names, see Genny’s post. Two were from Chicago Public—their IT director and a public services person. They pretty much love each other. Basically, they feel they are entering a much more comfortable period, where the pace of library technology change is not as overwhelming as it was just a few years ago. Kareem, the IT guy, talked about how IT must service needs, like a utility. He talked about taking out the human element when it doesn’t add value; but when the human element does add value (i.e., reference work), honoring, respecting, and supporting that. <3

The next speaker was an IT director from an academic library in WV. He felt the tension between IT and public services is improving, in part because newer technology is more reliable.

The A/C started working–hurray! I still haven’t gotten over the trek up, though.

The last panelist was from an academic library. She was a techie before she went into public service, which has been useful, as she can translate between the two languages/cultures. She said that the PCs belong to the users–the users are the focus. The relationship with IT has a push me-pull me quality. The tension between security and access; standardization vs. creativity. The planning process needs to be intertwined. Active partnership. However, there does need to be work around IT accepting anecdotal information from librarians. The reference interview as a usability test.

There were good questions and comments. The topic obviously resonated. Librarians still primarily deal with users F2F. It’s not as sexy as digitization, or chat, or RSS. But IT needs to be reminded to treat public service staff as important customers.

I loved some of the ideas: librarians as translators; user needs as primary focus; bulletin boards for public service staff to discuss technology.

I still feel the “tension”, such a nice word, dealing with IT. My feelings were a bit hurt by Kareem saying that librarians shouldn’t do any troubleshooting or anything on PCs. I have years of experience. I can take apart a CPU, a printer, I have little screwdrivers, tweezers and needle nose pliers; I know what proportional versus fixed means in a variety of settings. These aren’t my primary skills, but still. Now you don’t need or want me involved?

Our staff PCs are so locked down–as much as the public–seems like someone doesn’t trust me. I have come a long way since I threatened a PC/LAN technician with a paring knife if he tried to touch “my” PC (long ago in a distant land; believe me, I had my reasons.) I’ve learned that the best way to deal with IT is to bake them brownies and offer personal interest. I do not hate IT staff–they are among my work friends. But can I tell you about the last IT committee I was on? No, I won’t, but trust me; I do not trust IT. Do you?

ACRL President’s Program in a TiVo®-lutionary Age

Frances Maloy, ACRL President, 2004-2005, opened the ACRL President’s Program on Monday, June 27, 2005 with a report on the year in review and a statement on the organization’s progress that brought cheers and applause. Frances set the stage for a terrific program saying, “After being President for a year, I have concluded that ACRL rocks!”

Awards and Speakers
The 2005-2006 President-Elect, Camilla Alire, was introduced. She noted marketing and advocacy to be her passions and priorities for the upcoming ACRL year. Awards were presented to libraries and librarians who demonstrate excellence and employ best practices. A humorous video created by Pierce College Library followed and introduced the members of their 2005 Excellence in Academic Libraries Award-winning institution. The Time for a Reality Check: Academic Librarians in a TiVo®-lutionary Age program speakers presented next as a panel.

Beloit College Mindset List
Tom McBride, Keefer and Keefer Prof. of Humanities, Beloit College
Ron Nief, Director of Public Affairs, Beloit College

Tom and Ron complemented each other well as joint-presenters, which is not surprising since the duo has represented the Beloit College Mindset List in various forums since the project’s inception in 1998. Ron clarified that the purpose of the list is not to provide a historical chronology of adolescents’ lives (or to insult students… or to make us feel old), but rather to raise awareness that students entering college today have been influenced by a society different from our own. Generational differences create unique reference points. Tom shared anecdotes of reactions and criticisms to the project, while emphasizing its usefulness. The Mindset List can be an interest-grabbing marketing tool that lets the public know what institutions of higher education are good at. That accomplishment, Tom asserts, is “clos[ing] the intergenerational caverns that can hinder education in colleges and libraries.”

Some predictions for the Class of 2024?

  • Most students entering college in Fall 2020 will have been born in 2004 or 2005.
  • They have never purchased a CD.
  • Some of their grandparents may have served in Vietnam.
  • Courses have always been electronic.
  • Gas prices have always been above $5/gallon.
  • Libraries are thriving institutions of vital knowledge and information 🙂

Thousands of requests to use the list are received each year from organizations like NBC and MTV, clergy who work with youth, U.S. Armed Service recruiters, institutions of higher education and even other countries (e.g. New Zealand).

    Here and There Simultaneously

    David M. Silver, Assistant Prof., Dept. of Communications, Univ. of Washington and founder of the Resource Center for Cyberculture Studies

    David described himself as a teacher first and foremost, which came across in the thoughtful nature of his presentation and the easy relationship he quickly developed with the audience. As a library advocate (my descriptor of choice), David sees technologies like DVDs, TiVo, mp3s, and cell phones as a part of college students’ cultural reality. Like the facts and figures organized into Mindset Lists, such realities influence the relationships students have with their libraries and vice-versa.

    While students may be physically rooted in one spot, their minds are often focused on happenings elsewhere (e.g. communicating with friends across campus via text-message while studying). Students are comfortable living “here and there” simultaneously and academic libraries need to adapt to users in this TiVo®-lutionary age. (TiVo® is a television accessory that automatically finds and records programs you want, as the online ad states, “all while you’re out living life.”)

    Although we cannot directly instill curiosity or a desire to know more in students, libraries can work to create a public space that encourages contemplation, provides room for questions and promotes active learning. “We are living in an age of annotation,” David said, where cultural history is constantly being “reconstituted and redistributed.” Libraries can encourage student participation in the growth of cultural memory by acting as, or contributing to, the “information commons” on campus. Such public places offer a crossroads where students check email, study, meet friends, mingle with faculty and perhaps learn something new from a display of art or academic work.

    At the University of Washington, for example, the library used public space to display political cartoons from around the world after 9/11, where individuals could write notes and follow threads of commentary. In addition, libraries offer reading materials that can transport the inquiring minds of adults and children alike to distant lands and new realms of thought ripe for exploration. Libraries can indeed become a “pinnacle of the simultaneous ‘here and there.’”

    Additional Notes
    David is also co-director of The September Project, a grassroots effort that looks to community organizations like libraries- academic and public alike- to plan engaging awareness events on the weekend of September 11.

    In the abundance of information sheets, books, pamphlets and vendor give-aways collected over the course of the conference (FYI, I am now the proud owner of a little red object that enables me to open, clean and write on CDs), this session distributed two handouts I feel are note-worthy for their content and simplicity:

    1) a 10 page program brochure including the President’s Report to Council- an asset for ACRL members in attendance, and

    2) to-the-point evaluation forms (which I saw being returned!)- a great tool to further future program planning.

    Overall, an interesting and inspiring program. I walked away with a better understanding of student library users and their expectations based on culture and circumstance. Librarians will do well to ponder the sometimes weighty, but often quite enlightening, realities of generational change and cultural evolution when reflecting on how to provide more effective services.

    MODS, MARC, and Metadata Interoperability PART 2

    Speaker 4 (first speaker of second half): Ann Caldwell, Brown University

    Overview of digital initiatives @ Brown. The CDI was created in Oct. 01; metadata specialist position (Ann’s) created October 2002.

    Brown metadata model: Ann’s position includes all metadata, not just descriptive. Using METS to package, chose MODS over DC. Their model enables both shallow and deep discovery. For example: an art image can be searched in native VRA format in Luna, but in central repository as MODS. Everything has a MODS record.

    Early projects. Were at first only dealing with library materials – sheet music, etc. Used existing MARC-MODS tools. Still have no metadata creation staff but got interns from Univ. R.I. Library school. 150 hours/sem = 3 credit hours. Many students are on second careers and very focussed on their work. Began using NoteTab Pro, which they had also been using for EAD-creation.

    Broadening the base. Word got around campus very quickly that this was going on. Faculty and other groups began coming in with very creative projects, some hybrid of own materials and library materials.

    CDI dropped their current work to help faculty. The Scholarly Technology Group (part of IT) were contacted to be sure the CDI was not duplicating their efforts. They weren’t; STG wasn’t doing any metadata work to speak of.

    How to build MODS records. Some from MARC, some from scratch, some extracted from other dbs (FMPro) to convert to XML.

    NoteTab Pro: cheap. Downloaded the EAD “clip library” and modified it. Very flexible. All MODS, all METS records built in NTP. Programmed it to prompt the user through a series of templates. Constantly making changes to this.

    VRA & EAD records mapped to MODS transformed with XSLT. VRA records exported from image cataloging system (FMPro-based). Not all elements retained from VRA -> MODS. “subjects in VRA get a little squishy” EAD component-level content captured and converted to MODS on a 1-1 basis.

    What’s in the records? Have established a bare minimum, every MODS record validated against stylesheet for minimum and also certain local requirements. Don’t have subject analysis on all records.
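To make the "bare minimum" idea concrete, here is a small Python sketch of checking a MODS record for a required set of elements. Ann described doing this validation with a stylesheet; I have swapped in Python's standard `xml.etree.ElementTree` purely for illustration. The three required elements are my own guess at what a minimum might look like, since the talk did not list Brown's actual requirements. The only thing taken from the real standard is the MODS version 3 namespace.

```python
import xml.etree.ElementTree as ET

# The MODS v3 namespace as published by the Library of Congress.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical local minimum; NOT Brown's actual list of required elements.
REQUIRED_PATHS = {
    "title": "mods:titleInfo/mods:title",
    "type of resource": "mods:typeOfResource",
    "identifier": "mods:identifier",
}

# A made-up record that is deliberately missing an identifier.
SAMPLE_RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample sheet music</title></titleInfo>
  <typeOfResource>notated music</typeOfResource>
</mods>"""


def missing_elements(mods_xml):
    """Return the labels of required elements that are absent or empty."""
    root = ET.fromstring(mods_xml)
    missing = []
    for label, path in REQUIRED_PATHS.items():
        element = root.find(path, MODS_NS)
        if element is None or not (element.text or "").strip():
            missing.append(label)
    return missing
```

Running `missing_elements(SAMPLE_RECORD)` reports that only the identifier is missing. A real check would also have to cope with repeated elements and local requirements, which this sketch ignores.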

    Storage and display: records mapped into PHP/MySQL (homegrown). All mapped into relational tables to enable the cross-collection searching. Records retrieved through search are displayed with stylesheets.

    Ann had several examples of table displays and a schematic diagram of the system [see her ppt.] She demo’ed searching the Brown repository.

    Current status. July ’04 added 1.66 professional position and some additional paraprofessional staff (didn’t catch the number). Still no additional staff for the metadata component. 20+ active projects now. Have started to work with audio and video. Audio hasn’t been a problem but video serving is still being addressed at the university level.

    There are now some main Technical Services staff generating MODS.

    Future directions: NoteTab works OK for some but some users (particularly outside the library) really want a web interface. The scientific/medical communities at Brown are very interested in adding content but don’t have time for description. Looking at TEI this summer; the STG group have had great success training students to do TEI encoding. Looking at the overall staffing, looking for efficiency opportunities. Digital backlog is now larger than the analog backlog.
    Brown digital library site.

    DEMO: NoteTab Pro. Showed MODS tools (building MODS through prompts), using NTP to create METS record and package. THIS WAS A VERY COOL DEMO! Can’t really do it justice here in the notes.

    Speaker 5 (second speaker from second half): Terry Reese, Oregon State University

    Terry is the Digital Production Unit Head @ OSU and was named a 2005 “mover and shaker” by Library Journal. Terry has a software dev. background.

    Started by giving some background on metadata interoperability and metadata tools: proliferation of metadata schemes; differences in best practices are also a source of some problems. Cited the Indiana study that showed metadata creation costs of about $3/book for copy cataloging, $27/book original, $20/thesis.

    In some cases, things are being cataloged more than once: things that go into DSpace or ContentDM. Now, they only create one record and derive/repurpose for other uses.

    Challenges of interoperability: one-to-many, many-to-one transformations (this is the problem of going from less to greater semantic richness, or vice versa, same problem Moen touched on in his talk). Other problems include different hierarchies and “spare parts” – leftover content that doesn’t fit anywhere. It may be better to discard than to try to make non-fitting data fit?

    MARCEdit crosswalking tool uses MARCXML as control schema to facilitate transformations. Due to the nature of its design (network, or star; it looks like a wheel), no more than two transformations will take place.
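The star design is easier to see in code than in prose, so here is a toy Python sketch of the idea. It is emphatically not MARCEdit's implementation: the format names, the field names, and the MARC-like hub tags are simplified stand-ins I made up. Every format maps to and from one control ("hub") format, so any conversion takes at most two hops, and anything that has no home in the hub simply falls out, which is the "spare parts" problem in miniature.

```python
# Toy mappings from each format's fields into a hub format.
# All field names and tags here are hypothetical simplifications.
TO_HUB = {
    "dc": {"title": "245a", "creator": "100a"},
    "mods": {"titleInfo.title": "245a", "name.namePart": "100a"},
}

# The reverse direction is just each mapping inverted.
FROM_HUB = {
    fmt: {hub_field: field for field, hub_field in mapping.items()}
    for fmt, mapping in TO_HUB.items()
}


def remap(record, mapping):
    """Rename fields per the mapping; unmapped fields are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def crosswalk(record, source, target):
    """Convert a record between formats via the hub.

    Returns the converted record and the number of transformations used,
    which is never more than two.
    """
    steps = 0
    if source != "hub":
        record = remap(record, TO_HUB[source])
        steps += 1
    if target != "hub":
        record = remap(record, FROM_HUB[target])
        steps += 1
    return record, steps
```

Converting `{"title": "Les Trois Mousquetaires", "creator": "Dumas", "rights": "public domain"}` from `"dc"` to `"mods"` takes two steps and loses the `rights` field, because this toy hub has nowhere to put it. The practical appeal of the design is that adding a new format means writing one pair of mappings to the hub rather than a pair for every other format.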

    DEMO of MARCEdit. Transformed an EAD record to MARC. It also has a MARC editor for people who aren’t comfortable editing MODS directly.

    Also has an OAI harvester built in to grab OAI records and transform them into MARC. They use it at OSU to grab DSpace records and input into the library catalog.

    This was a great Demo and there is a lot more to Terry’s presentation than I’m reflecting in these notes. His PPT will have more detail. It was a very impressive tool and a wonderful way to end this long session; it gave me the sense that non-programmers could get their hands on some tools and actually do some transforming. A great way to become familiar with these various schema. See Terry’s site for links to MARCedit and other goodies.

    MODS, MARC, and Metadata Interoperability PART 1

    MARC Formats Interest Group (LITA/ALCTS)
    Monday, June 27, 1:30 pm – 5:30 pm

    Description from LITA site: Libraries face challenges in integrating descriptive metadata for electronic resources with traditional cataloging data. This program will address the repurposing of MARC data and metadata interoperability in a broader context. It will then introduce the Library of Congress’ Metadata Object Description Schema (MODS) and present specific project applications of MODS. Finally, the program will offer scenarios for coordinating MARC and non-MARC metadata processes in an integrated metadata management design and introduce tools for simplifying interoperability.
    Speakers: Dr. William Moen, University of North Texas SLIS; Rebecca Guenther, Library of Congress; Ann Caldwell, Brown University; Marty Kurth, Cornell University; Terry Reese, Oregon State University

    This was an extremely dense but immensely useful session; PowerPoint presentations will be available online at the ALCTS site some time soon (as of June 28 they are not yet linked).

    Speaker 1: William Moen, Texas Center for Digital Knowledge, University of North Texas
    Summary from Claire: Moen put into very succinct and very clear language the reasons why we (librarians but more specifically catalogers) have to begin to know standards other than our own.

    Speaking on metadata interaction, integration and interoperability

    Problem statement … is there a problem? We used to think of interoperability as a systems problem; we now understand that there are different levels to the problem. There are many metadata schema, some well-documented and well-known (AACR2), others less so. Ditto for content standards. There are also a variety of syntaxes (MARC and XML, for example). Lorcan Dempsey calls this our “vital and diverse metadata ecology.” We don’t really have a problem UNLESS we expect these various standards to interact, which of course we do.

    So we are moving from a systems-oriented definition of interoperability to a user-oriented definition. Moen suggests a preliminary framework to help scope the work. Look at communities of practice: who is our community? Libraries, archives and museums are fairly tightly-knit communities with a good understanding of standards. As we try to cross into other communities, however, the costs of interoperability go up.

    Communities of practice, two types:
    -Networks of professionals (librarians, etc.) have similar language and shared meanings
    -Information communities are looser organizations, and include the creators of information, managers of information (librarians/catalogers), and users.

    Godfrey Rust (complete citation for this and other references will be in Moen’s ppt preso when it goes online) divides things into: PEOPLE, STUFF and AGREEMENTS.

    Interoperability cost vs. functionality. William Arms’ curve of cost v functionality (graph & cite in ppt). OAI harvesting, for example, has lightweight requirements, so it is easy to implement but less functional. Federated searching/Z39.50 is highly functional but more costly to implement.

    The library has developed very sophisticated structures over time. In the larger scheme of things, over time, probably these structures will not be as broadly adopted. The time is now: this is our opportunity to act if we want to try to see our standards adopted more broadly.

    There probably will never be ONE canonical metadata scheme BUT we may all be able to agree on XML, which is a great step forwards. Some apparently simple schemes like Dublin Core turn out not to be so easy to implement in actual practice. We do not want to be further marginalized; we want to (have to) learn to play with others and get over the “not invented here” syndrome.

    Mechanisms to address interoperability (with the fundamental assumption that there will NOT be one basic standard):
    -Application profiles

    Crosswalks and mapping. Mapping is the intellectual process of analyzing the standards and making matches. The crosswalk is the instantiation of the map. 1998 NISO white paper on crosswalks. This activity is successful when accomplished by someone who really knows the standards on both ends of the map: catalog librarians who know AACR2 will be responsible for becoming knowledgeable about other standards so that they can lead the mapping/crosswalking activity.

    Difficult decisions to be made while mapping include: should it be one-way only or reversible? Reversible/round-trip: MARCXML <-> MARC. MARC -> MODS, however, is not round-trip; there is some loss of data, albeit perhaps slight. Is the mapping one-to-one, one-to-many, many-to-one, etc.? Other difficulties include vocabularies: how to go from controlled to uncontrolled? For example, how does one indicate in Dublin Core that the subject is an LC heading?
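
    A toy example helps show why a many-to-one mapping cannot make the round trip. The field labels below are simplified stand-ins that I made up for illustration, not a real MARC-to-DC crosswalk.

```python
# Toy illustration of a many-to-one mapping that cannot be reversed.
# The field labels are simplified stand-ins, not a real crosswalk.

# Several rich source fields collapse into one simpler target element.
MANY_TO_ONE = {
    "personal_name": "creator",
    "corporate_name": "creator",
    "meeting_name": "creator",
    "title": "title",
}

def to_simple(rich):
    """Map the rich record onto the simpler element set."""
    simple = {}
    for field, value in rich.items():
        simple.setdefault(MANY_TO_ONE[field], []).append(value)
    return simple

def back_to_rich(simple):
    """Reverse the map; it must guess the source field, so it always
    picks the first candidate listed in MANY_TO_ONE."""
    first_source = {}
    for src, dst in MANY_TO_ONE.items():
        first_source.setdefault(dst, src)
    rich = {}
    for element, values in simple.items():
        for value in values:
            rich.setdefault(first_source[element], []).append(value)
    return rich

rich = {
    "personal_name": "Example, Author",
    "corporate_name": "Example Library",
    "title": "Notes",
}
simple = to_simple(rich)
restored = back_to_rich(simple)
```

    After the round trip, `restored` has both names filed under `personal_name`: the distinction between a person and a corporate body was discarded on the way out and cannot be rebuilt on the way back.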

    Mapping to an interoperable core. OCLC is working on this problem, trying to come up with something rich enough to act as a core: all things map to the core and then out again to other forms. They’ve been looking at MARC as the possible basis. [note: see Terry Reese’s presentation on MarcEdit; he was the last speaker in this program]

    Application profiles: same elements used in different ways, and with different meanings. These uses can refine the standard definition of the element as long as the fundamental meaning is unchanged.

    Registries are necessary for application profiles to be successful. Ex: UK Schemas, EU Cores, others (see ppt)

    RDF is the foundation of the semantic web and is a grammar for expressing terms and semantics. Moen admits his difficulty with RDF: he considers it important but struggles to explain it.

    Conclusions: Libraries are just ONE of the communities; we do not have a central role, but we may have a privileged role thanks to our long experience. Some librarians continue to think that cataloging is different from metadata generation. We have to think about interacting with other communities. The challenge is to develop tools to hide the differences between formats (hide them from users of our systems). See Roy Tennant’s recent article about transparency. Moen demoed an SRW search on LOC which can show the data in MODS format or in XML, or in DC, etc. This is a good example of transparency: give the data to the user in a format that they can use.

    Speaker 2: Marty Kurth, Cornell University Metadata Services

    Provides services to faculty and others on campus. Interested in repurposing the library’s MARC. Metadata management design. What does all of this metadata mean for our shops and how do we set up systems and services that support interoperability over time? His preso is based on an article for Library Hi Tech that he co-authored in 2004 (22:2).

    Explains what is meant by ‘repurposing MARC data:’ being able to reuse MARC outside of the library catalog. Example collections: Making of America (MOA), Historical Math monographs, HEARTH home ec. collection, May anti-slavery, Historical literature of agriculture. All 5 of these dl projects had print counterparts and thus MARC to build on.

    Metadata processing involves: mapping, defining relationships between schemas; transformation, the process of moving between schemes; and management, coordinating the tasks and the resources.

    Metadata management challenges: workflows are not yet well established. Mapping and transformation are not happening all in one place; they are happening all over the library and may not be well documented, or if they are, the documentation may be scattered. Goal was to move from projects to process.

    Why is repurposing MARC a logical place to begin? Firstly, we’ve got lots of it. Allows them to maximize the potential of the data. MARC mapping can be expensive; cost goes down as tools are developed. Typically this work is done by specialized staff for whom opportunity costs are expensive. It can be messy and difficult, it probably will generate multiple versions of data and records, etc. Thus, a good challenge.

    Collection-specific mapping variations are inevitable. MOA, May, HEARTH all involve TEI. Handling of date transformation between MARC and TEI, for example, varied between the MOA and the May collections. The mapping was further complicated because each project was delivered with a different platform (DLXS, EnCompass, and DPub). Each project had slightly different needs. Work was performed in different areas of the library.

    MARC mapping models. How to deal with the collection specificity? Looked at LOC MARC-> DC, but made local decisions on additional fields. Sought feedback on this library-wide.

    Managing transformations. Transformations also vary from collection to collection. Some were performed by vendors. Scripting and XSLT transformations were later implemented. The library catalog is still the database of record. The scripted approach to transformation extracts the MARC, transforms it into XML, and combines it with other data including admin and technical md, OCR’ed text, etc. The XSLT approach involved writing transformations to accommodate the possible entirety of any MARC record; the metadata staff then customize the XSLT for their particular collections. It is easier to tweak and modify as the project unfolds. Documentation is critical and had been lacking in the past. It is a key component in management of metadata over time.

    Metadata management: coordinating the intellectual work AND managing the tools and files that are products. The tools and process are resources to be managed. Important to know the user community for these tools and their needs for using and accessing them.

    Strategies: inventory existing relationships and processes (this is not something Cornell has specifically done). Identify the staff who will be responsible over time and who will mentor. Requires strategic buy-in. Important to communicate the importance of this more than once. [Marty’s ppt. here gives a useful example of such an inventory]

    Concrete next steps: how do we build a culture to embrace this? Develop reusable transformation tools. Build library consensus on mapping. Create a culture and a practice of sharing and revising. External stakeholder discussions, library-wide. Talk about the risks of NOT managing tools. Think about creating a repository for metadata management tools that is searchable.

    Speaker 3: Rebecca Guenther, Library of Congress

    Rich descriptive metadata in XML: MODS. Overview: background on MARC & XML, MODS intro, MODS’s relationship to other schemes.

    MARC and XML. We have large investments in MARC. Cataloging is an early form of metadata. Trying to retool to exploit flexibility of XML. Also trying to anticipate receiving metadata in other formats in XML or as part of a digital object.

    Evolution of MARC21. Until now, MARC has been both a syntax and an element set. In the current environment, XML is being used more and more, and more tools are available. Diagram showing transformation from MARC21 to XML. First transform to MARCXML in order to be able to do other things (validation, etc.)

    MARC 21 in XML. MARCXML is lossless and capable of round-trip to MARC. Once it is in XML, we can then use stylesheets/XSLT to present in different environments/interfaces.
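
    As a rough illustration of why MARCXML can make the round trip, the sketch below reads a made-up MARCXML record into plain tuples and writes it back out using only Python's standard library. It is my own simplification, not LC's tooling: it handles data fields only (no leader or control fields) and round-trips between XML and an in-memory structure rather than binary MARC.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace

# A made-up record for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Example, Author</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title</subfield>
    <subfield code="c">a made-up record</subfield>
  </datafield>
</record>"""

def parse(xml_text):
    """Read datafields into (tag, ind1, ind2, [(code, value), ...]) tuples."""
    root = ET.fromstring(xml_text)
    fields = []
    for df in root.findall(f"{{{NS}}}datafield"):
        subs = [(sf.get("code"), sf.text)
                for sf in df.findall(f"{{{NS}}}subfield")]
        fields.append((df.get("tag"), df.get("ind1"), df.get("ind2"), subs))
    return fields

def serialize(fields):
    """Write the tuples back out as a MARCXML record string."""
    root = ET.Element(f"{{{NS}}}record")
    for tag, ind1, ind2, subs in fields:
        df = ET.SubElement(root, f"{{{NS}}}datafield",
                           {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subs:
            ET.SubElement(df, f"{{{NS}}}subfield", {"code": code}).text = value
    return ET.tostring(root, encoding="unicode")

fields = parse(SAMPLE)
round_tripped = parse(serialize(fields))
```

    Because MARCXML keeps the tags, indicators, and subfield codes exactly as they are, `round_tripped` comes back identical to `fields`; nothing has to be guessed on the return trip.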

    MODS is a derivative of MARC. It uses XML Schema. It was initially thought of for library applications, but they are seeing other uses and implementations.

    Why bother? There is an emerging initiative to reuse metadata in XML: SRU/SRW, METS, OAI, etc. Looking for something richer than Dublin Core. Before MODS, not much in between MARC and Dublin Core. MODS is a core element set for convergence between MARC and non-MARC XML.

    Advantages of MODS: it is compatible with existing library database descriptions. Richer than d.c., simpler than MARC, partly because the language is more readable than numerical tags. The hierarchical structure more readily supports rich description of complex objects.

    Features of MODS. Uses language-based tags which share definitions with MARC. Description is rule agnostic. Elements are reusable and not limited as to number of sub-elements. For example, the name tag can be used throughout the record, in author fields but also as part of related item-subject. Redundant elements can be repackaged more efficiently [Rebecca’s ppt will be useful here to clarify these points]

    Status of MODS. Started a MODS listserv in 2002. 3.0 has been stable for about a year. 3.1 is coming out soon; it doesn’t change anything in 3.0 but has been reordered to be compatible with MADS (Metadata Authority Description Schema). Registered with NISO.

    Relationship to other schema. General-purpose and compatible with MARC. More broad than many other formats (EAD, ONYX, etc.) Difference between MODS and Dublin Core: MODS has structure, DC is flat. Can more precisely modify/qualify fields in MODS, for example, publication info can be related to date in MODS, can’t in DC. MODS is more compatible with library data. MODS can include record management information.
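
    The "structure vs. flat" point is easier to see with a small example. The sketch below builds the same invented publication history (an original edition and a reprint) both ways, using Python's standard library. The MODS and DC element names are real, but the data and the unnamespaced `record` wrapper around the DC elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"        # MODS namespace
DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element set namespace

# Invented publication history: an original edition and a later reprint.
EDITIONS = [("Example Press", "1999"), ("Reprint House", "2004")]

def build_mods(editions):
    """Each publisher/date pair gets its own originInfo wrapper."""
    mods = ET.Element(f"{{{MODS}}}mods")
    for publisher, year in editions:
        origin = ET.SubElement(mods, f"{{{MODS}}}originInfo")
        ET.SubElement(origin, f"{{{MODS}}}publisher").text = publisher
        ET.SubElement(origin, f"{{{MODS}}}dateIssued").text = year
    return mods

def build_dc(editions):
    """Flat DC: publishers and dates are sibling elements with no grouping."""
    record = ET.Element("record")  # hypothetical wrapper element
    for publisher, _ in editions:
        ET.SubElement(record, f"{{{DC}}}publisher").text = publisher
    for _, year in editions:
        ET.SubElement(record, f"{{{DC}}}date").text = year
    return record

def mods_pairs(mods):
    """Recover (publisher, date) pairs from the MODS structure."""
    return [(o.findtext(f"{{{MODS}}}publisher"),
             o.findtext(f"{{{MODS}}}dateIssued"))
            for o in mods.findall(f"{{{MODS}}}originInfo")]

def dc_values(record):
    """All DC can offer is two independent lists."""
    publishers = [e.text for e in record.findall(f"{{{DC}}}publisher")]
    dates = [e.text for e in record.findall(f"{{{DC}}}date")]
    return publishers, dates
```

    `mods_pairs` can say which publisher goes with which date because the `originInfo` wrapper records it. `dc_values` returns two lists, and any pairing between them is a guess based on element order.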

    MARCXML vs MODS. Demo’ed music records in MARC, MARCXML, MODS. May not be exactly the same specificity when converting from MARC-MODS but most of the record converts.

    LC uses of MODS. Using to describe electronic resources (AV project, web archiving). METS. SRU/SRW implementation offers records in MODS (this is one of the available choices).

    MINERVA web archiving project. Exploring born-digital materials. Used MODS native (vs. creating as MARC and then converting to MODS); perhaps will some day put into the library catalog, but perhaps not. For web archiving, created 1 collection-level record, individual MODS records for each object.

    Election 2002 web archiving: cataloged the data, creating MODS records for each site, some of which were captured more than one time. Other web archiving projects, yet to be cataloged: 9/11, 107th Congress, 2004 election.

    Demo’ed 2002 election archive. Used XSLT to transform MODS to HTML. Link to the archived site. Showing MODS in XML – date captured data includes start and end points for capture. Decided not to link to the live site, which in many cases disappeared almost immediately after the election anyhow.

    107th congress website archiving. Did in-house (MODS cataloging at LC). Used XMLSPY to catalog. Built own search and browse. Browse has drop-down menus to select the house or senate ctte.

    Iraq war. Now have an input form for the catalogers to use as they catalog w/drop-down menus, etc.

    I Hear America Singing project. METS + FEDORA w/MODS. METS packages all metadata and all digital objects, including sounds, CD covers and other images, etc.

    Other MODS projects. MusicAustralia and Screen Sound Australia are using MODS as an exchange format.

    Directions for MODS. Continue to explore interactions with METS. Continue to use for digital library projects @ LC. Richer linking capabilities than MARC. Website archiving. Looking at MODS tools, looking at using it with OAI as an alternative to D.C.

    Q&A for the first three speakers
    Q. When will MODS 3.1 be out?
    R.G. Had hoped last week, but within next few weeks. 4.0 will be a complete rewrite and is in the works but will take more time, require broader discussion, etc.

    Q. As Cornell attempts to shift from a projects-oriented approach to a program-oriented approach, what will happen with the collection-specific approach, and have they talked about using MODS?
    M.K. Talk about it all the time but there is some political drag to this idea.

    Q. About LC web archiving; are any of the keywords or other data automatically extracted from web sites as they are archived/cataloged?
    R.G. Yes, worked with their IT folks who extracted from the HTML. For the Milstein project (music project from I Hear America Singing) the metadata was all manually created, not extracted.

    Q. Will MINERVA records go into library catalog?
    R.G. Initially thought the ILS was where all the records had to go, but with the emergence of federated search, they no longer think this is the case.

    Q. MARC records are dynamic and maintenance is possible (update an authority record, all records linking to it are updated)
    M.K. Still consider library catalog to be the catalog of record. Haven’t established periodicity for refresh but it is possible to do this, built in to their design.


    Tiny Trackers: Protecting Privacy in an RFID World

    Thankfully this RFID session was much warmer than the experts panel at the Hotel Intercontinental the previous day. Interestingly, I found it to be less well attended. About half of the seats in the ballroom were filled up. I suspect that the LITA top technology trends program drew a lot of potential audience members away.

    Overall I found the panelists — Jim Lichtenberg, Jackie Griffin, and David Molnar — to be entertaining and informative. I was familiar with much of the content but learned there is still work to be done as privacy issues have not yet been completely resolved in library RFID.

    Lichtenberg, a library technology consultant and regular Library Journal contributor, provided an overview of the technology. We’re still at a point where many librarians don’t fully understand how the technology operates and so this explanation was welcome even if it was a bit repetitive for the more RFID experienced members of the audience.

    Lichtenberg used Alice in Wonderland — to much humorous effect — to explain how using RFID is like “going down the rabbit hole.” He says the technology is truly transformative and although we don’t really know what the result will look like at the bottom of the well it will be a wonderland when we arrive.

    People are simultaneously excited by and terrified of RFID because it is the leading edge of a much more profound transformation of society, says Lichtenberg. He predicts we will experience more intense change in the next 20 to 25 years than we did with the advent of the Internet. The reason? Rapid advances in nanotechnology, biotechnology, information technology and the cognitive sciences (NBIC). Lichtenberg discussed current research which could lead to nano robots being surgically implanted into humans to repair tissue and biotechnology that could lead to the reversal of the effects of aging.

    Accelerated technological change IS frightening. The key issue is the creative tension between the benefits of the technology and the need to protect privacy. At this point Lichtenberg listed the advantages (widely available, relatively inexpensive, better inventory control, increased self-check etc.) and disadvantages (high start up costs, indirect return on investment, immaturity of middleware, lower than expected accuracy and immature standards). Lichtenberg says that libraries considering implementation need to focus on supporting their clients and better understand their needs. Normally we think of RFID data-flow in libraries in only one direction. Information passes from tag to reader to middleware to library systems. The backwards flow of information, says Lichtenberg, will actually provide more important business intelligence for libraries. He reminded me very much of Lawrence McCrank from the Saturday RFID program with his call for intelligent and creative applications for library RFID which better serve our users’ needs. RFID can be used to push information to people.

    Lichtenberg wrapped up his presentation with a metaphor of a glass bottomed boat. As time goes by the muddy waters will clear and RFID will allow us to understand exactly what’s going on in the library. It’s not about tags and readers but transparency. What can we learn with the data?

    The next panelist was Jackie Griffin, director of the Berkeley Public Library. The Berkeley Library has come under much public scrutiny during their implementation of RFID. Griffin explained the history of the local movement protesting the installation and provided advice to librarians considering RFID so that they could avoid making similar mistakes.

    Berkeleyans, says Griffin, have a long tradition of protecting free speech. The Board of Library Trustees (BOLT) approved the library’s purchase of RFID over a year ago but the protesting didn’t begin until after the San Francisco Public Library proposed their implementation. Groups such as the Electronic Frontier Foundation, the American Civil Liberties Union, and Berkeleyans Organized for Library Defense (BOLD) have weighed in against RFID in libraries. The most recent protest was only a week ago. A small number of protesters went to Berkeley City Hall to request that funding allocated to the library be removed if they continue with RFID.

    At this point, most of the conversion has been completed and it’s unlikely that RFID won’t be used at Berkeley. Griffin says she is very comfortable with the decision to go with RFID. The library has had enormous expenses for repetitive strain injuries. These expenses were enumerated by a consultant hired by the City of Berkeley to analyze costs. In addition, the library had capital funding to double the size of their building but they had no corresponding increase in operating budget. In order to serve more people with the same number of staff they needed to turn to technological solutions.

    During the course of investigating RFID Griffin was very involved in work outside the library protesting the Patriot Act. Griffin has a long history working with the intellectual freedom committee of the California Library Association. She is aware of privacy issues and government interference with freedom to read but says it didn’t occur to her at the time that RFID would be a risk. She cautioned the audience to be very aware of the potential consequences of any action they may take with technology.

    Once she was aware of the risk she had Lee Tien of the EFF come and speak to the library committee managing the project. She also consulted with authoritative experts such as David Molnar, a UC Berkeley doctoral engineering student, and the Samuelson law clinic (which specializes in the legal implications of emerging technology). These experts helped Griffin and her staff to draft their RFP and to develop best practices (which are posted on their public web site). They interviewed five vendors and selected the one they felt best addressed the issues.

    Griffin says that a bigger intellectual freedom issue is access to information. Many public schools in Berkeley lack media specialists and 30% of Berkeleyans do not have a computer at home. If there is such a concern that library RFID tags may be used by the government to interfere with things people read then the real question is what the government is doing. Griffin says that RFID has allowed the Berkeley Public Library to reopen on Sundays and to return their book purchasing budget to near normal levels.

    The final speaker was David Molnar. Molnar continues to be interested in RFID security issues and he provided the nitty gritty details about how a library RFID system could be compromised. These risks are outlined in his paper, “Privacy and Security in Library RFID,” and they include hotlisting, denial of service attacks, and vandalism. He discussed questions for librarians to consider in order to evaluate the risk to their constituencies.

    The first question is determining what is on the tag. Every library implementation that he has seen only uses barcode information and possibly, depending on the vendor, a security bit. Limiting the information on the tag limits what an adversary can do. Some might argue that it’s just a barcode which can’t be mapped to a book title without information from the integrated library system. Although libraries secure their ILS, it is now even more important to do so.

    Older library RFID tags which use the ISO 15693 standard have a static identifier burned on at time of manufacture. Some libraries have unique prefixes in their barcodes which can be used to make inferences. Any persistent identifier enables tracking via hotlisting, which is the creation of a separate database of items you know in advance. For example, you could read the tags of every copy of Osama Bin Laden’s biography. Then you could use your reader and preexisting database list (the hotlist) to identify scholars of the Middle East.
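
    Here is a minimal sketch of hotlisting as I understood it from the talk. The point is that the attacker never touches the library's ILS; a persistent identifier plus a list compiled in advance is enough. The tag IDs, labels, and sightings are all invented.

```python
# Hotlisting sketch: the attacker never needs the library's ILS.
# Tag IDs, labels, and sightings below are invented for illustration.

def build_hotlist(scans):
    """scans: (tag_id, label) pairs collected in advance, for example by
    reading tags on the shelf, where the title is printed on the cover."""
    return dict(scans)

def match_sightings(hotlist, sightings):
    """sightings: (location, tag_id) pairs picked up later by any reader.
    Returns the sightings whose tag is on the hotlist."""
    return [(location, hotlist[tag_id])
            for location, tag_id in sightings
            if tag_id in hotlist]

hotlist = build_hotlist([
    ("TAG-0001", "biography, copy 1"),
    ("TAG-0002", "biography, copy 2"),
])
sightings = [
    ("cafe door", "TAG-0002"),   # on the hotlist
    ("cafe door", "TAG-9999"),   # unknown tag, ignored
]
hits = match_sightings(hotlist, sightings)
```

    Nothing here depends on what the identifier means. Any value that stays the same from one read to the next can serve as the lookup key.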

    The second question is who can read the tags? Anybody can obtain a reader that can detect the 13.56 MHz frequency. The largest observed range he’s aware of is 3 feet, but getting a read from that distance requires a specialized antenna. Most reads are only viable in the range of inches. It’s the ubiquity of readers which will be a problem. When readers are installed at every Starbucks then those people carrying the Bin Laden book can possibly be tracked.

    The third question is who can write the tags? If a tag is re-writable then it needs to be locked against vandals in a security bit denial of service attack. Vandals can write their own information to the tag and lock it against any further writes, effectively destroying the tag for library applications. Tag writing issues can also affect future upgrades to the system if a proprietary read/write protocol can’t be handled by another vendor’s system. Rewriting the tag at checkout could fix the hotlisting issue by removing the barcode as the item is removed from the library. Nobody sells such a solution yet and there are robustness issues. If the item is not checked back into the library properly then it won’t have the barcode information anymore.

    The final question is what type of encryption does the wireless communication between tag and reader have? Most systems are not securely encrypted and can be sniffed. There can be several meanings to the term encryption and you must understand what your vendor is doing if they say their product has it. If they encrypt using a proprietary encoding, then each library that uses the same vendor will have the same type of coding. Since it’s not different per library it can be reverse engineered. Some vendors encrypt the barcode information with a per-library key. This leads to static data and brings back the hotlisting and data tracking concerns. Finally, some vendors password-protect the ability to read tags. How does the reader know which password to use? Is it the same for all tags or for each tag? The answer has serious implications. If it’s the same, then you’re back to the static identifier problem.
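
    The "encrypted but still static" problem can be sketched in a few lines. This is only an illustration under my own assumptions: the key and barcodes are hypothetical, and a keyed hash (HMAC) stands in for whatever cipher a vendor might use. The only property that matters here is that the scheme is deterministic.

```python
import hashlib
import hmac

# Hypothetical per-library secret; a real system would manage this differently.
LIBRARY_KEY = b"hypothetical-per-library-key"

def static_token(barcode: str) -> str:
    """Stand-in for deterministic encryption under one per-library key:
    the same barcode always yields the same value on the tag."""
    return hmac.new(LIBRARY_KEY, barcode.encode(), hashlib.sha256).hexdigest()

# Two reads of the same item, possibly weeks apart and at different places.
first_read = static_token("31234000123456")
second_read = static_token("31234000123456")
other_item = static_token("31234000999999")

# An eavesdropper without the key cannot recover the barcode, but can still
# record the token once and recognize it again later.
eavesdropper_hotlist = {first_read: "the book I saw on the shelf"}
recognized = eavesdropper_hotlist.get(second_read)
```

    The eavesdropper never learns the barcode, yet `recognized` still finds the item, because the value on the tag never changes between reads.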

    RFID security is a multilayer problem, says Molnar, and you need to include privacy issues and the appropriate questions in your RFP. He recommends that libraries minimize the amount of data they put on a tag and that they test out vendors’ products in real-world settings. Tag readers are relatively inexpensive and can be used with open source software called RF-Dump. Can you crack the system you’re interested in?

    There was a question and answer session between the audience and the panel members. Most of the questions were addressed to Jackie Griffin regarding the size of the collection and the protests in Berkeley. There had been reports in the news which conflated the purchase of RFID with library layoffs. Griffin says that they received a better budget than they anticipated and the layoffs didn’t happen. Library staff is getting more enthusiastic about RFID now that they feel it isn’t a threat to their livelihood.

    One audience member asked the panelists to comment on Lee Tien’s published remarks regarding the library as a moral compass regarding the use of RFID (if it’s ok for a library, then it must be ok everywhere). Lichtenberg says libraries are going to be far, far, far from the only place using RFID. The technology is so pervasive that libraries aren’t going to dictate its mind-share. Griffin says that what the profession is doing to discuss RFID is amazing. She is not aware of other communities raising and discussing these issues. Molnar says that the economics of RFID will improve with the tags becoming cheaper and increasingly used by industry.

    In sum, the panel was informative and a great review of the many questions librarians should consider if they purchase RFID. RFID security is still a research issue but the technology will not go away. Librarians are doing an excellent job raising awareness and discussing the issues but there is a need for more creativity in designing applications for library RFID that truly serve the library user.

    LITA Councilor’s Report

    [Updates underscored] Council 3 wrapped up at 12:15 on June 29, very timely and very congenial. Over our three Council meetings we had several items on the agenda that are very LITA-relevant. I will try to provide links to documents if/when they go online. Special thanks to our charming parliamentarian, Eli Mina, who in his Peter Lorre voice gently keeps us in line, and to ALA ED Keith Fiels for providing a rock-solid wi-fi connection that made many Councilors very happy. The days when 200 librarians could vanish for a week without any connectivity are long behind all of us. Note: we had over 27,000 registrants for this conference–an all-time record, 1,000 registrants higher than the runner-up (San Francisco, the last time we were there) and 7,000 more than Orlando.

    Council items LITA-worthy:

    Resolution in support of community broadband, out of Committee on Legislation, with strong encouragement from OITP. Pat Mullin commented at LITA Board that this is very “motherhood and apple pie.” In my own home town, Palo Alto, a “fiber to the home” initiative was ultimately stopped by (among other things) the chilling effect of potential legal challenges. Yet I can tell you that I had Councilors asking me what the heck this issue was about. No one on Council voted against it, but I wish COL had issued this resolution earlier, with a one-page or even half-page backgrounder (or even an email to Council), so the resolution’s vote could be a “teachable moment.” If you have community broadband stories, please do share!

    Biometrics. Initially Intellectual Freedom Committee was prepared to put a resolution on Council’s agenda very emphatically opposed to biometrics in libraries. This resolution was in response to two public library systems (Buffalo, NY and Naperville, IL) that are using biometrics for user login on Internet computers. I set aside my personal convictions about intellectual freedom (can I say, “EEK! Fingerprints?!”) to note that the resolution was poorly written and had not been reviewed by either Committee on Legislation or LITA. (This suggests an opportunity for reinvigorating lines of communication between LITA and IFC.) As strongly as I feel about intellectual freedom, I feel even more strongly that resolutions related to technology, particularly technologies new to libraries, need LITA’s input and should go forward with the most technologically correct insights and language possible. Expect opportunities for discussion prior to Midwinter and a hearing in San Antonio. I’d love to get input from LITA folk on this topic.

    Reduced division dues for retired ALA members. Remember that item on the ballot about lowering the quorum for ALA membership meetings? Well, it passed, and because it did, membership meetings have had enough people present (75 members based on current membership) to pass resolutions that Council then has to take action on. In this case, members expressed their concern about the need for reduced dues for division members.

    Initially I was prepared to vote against this resolution, as other division councilors were insisting that nobody else had business telling the divisions how to do business (“states’ rights,” as one Councilor put it). But after hearing the discussion on the floor, I ultimately agreed that it is reasonable for members to ask divisions to begin to discuss the dues structure for retired members. It is not a mandate—just a request to add this to our fiscal discussion. With a 66,000 member organization largely over 35, how we take care of our retired members will become increasingly significant. We gave the membership a tool to communicate with ALA governance, and we should be glad they are willing to use it to catch our attention on matters important to all of us.

    Minor tweak to ALA Strategic Plan. Someone in ALCTS alerted Bruce Johnson, ALCTS Chapter Councilor, that the ALA draft strategic plan mentioned state and federal standards but did not mention international standards. Bruce drafted substitute language, I seconded, and after some discussion, the new language passed. Note that I was able to cite LITA’s Strategic Plan in this discussion. Go LITA!

    On other fronts, Council passed resolutions related to workplace speech, “Threats to Library Materials Related to Sex, Gender, Identity, or Sexual Orientation” (encouraging state chapters to fight these heinous and hypocritical attacks on library collections), the Iraq War (primarily asking the nation to move its priorities from the Iraq occupation to services such as libraries), disinformation (we voted against it), and immigrants’ rights to free public libraries (particularly important in light of the latest attempts to implement national identity cards). A “Resolution on Equal Access to Resources in Non-Roman Alphabets in Libraries” passed Membership, but was defeated in Council (a resolution on cataloging that doesn’t emanate from ALCTS is doomed to fail). A resolution calling for ALA to establish a formal list of “Endangered Libraries” was referred to BARC (Budget Analysis and Review Committee) for fiscal review, but is not likely to pass when it reemerges. I voted for a resolution calling for ALA to ask U.S. News and World Report magazine to update its rankings of library schools, but as expected the resolution failed. Among our tribute resolutions was one for “The Hollywood Librarian: Librarians in Cinema and Society,” a forthcoming film by the irrepressible Anne Seidl, who among her many accomplishments is a GIS geek.

    I have other reports I would like to process and run by you folks, such as the conclusions of the Committee on Conferences, but I’ll trickle these out over time. I enjoy representing LITA on Council, and encourage your comments, questions, and feedback. Let me know how I can serve you better!

    Greenstone Digital Libraries II

    Fortunately, my co-blogger came in on time and got the first part before laptop death set in. Meanwhile, I arrived late and got only the latter part of the session.

    There were probably about a hundred people at this session and the handouts ran out. Handouts and more have now been posted. (Thanks, Kyle!)

    I’ll add a few more notes to complement Claire’s summary:

    Allison Zhang

    The UI out of the box was very basic but it was customizable. Allison showed screen shots of different collections they had worked on, each with customized graphics.

    Two different kinds of materials were included in their projects: Digital collections (Dublin Core encoded) and Finding Aids (EAD encoded).

    The collections are available online. Allison highlighted the Treasure Chest of Fun and Fact, a collection of some 20 years of comic books. For this collection they created metadata for each story, plus structural metadata to link from page to page within each story, and from each story in an issue to others in the same issue. They set up a conditional display format in Greenstone to show the table of contents for an issue differently from the page display. At the top of each page of content, there are links to page back and forth through the story. Her main point in showing all this: You must design your metadata.

    Tod Olson

    The Chopin Early Editions collection was designed to allow browsing by genres, e.g., nocturnes. In this project they used metadata to drive custom navigation features. Tod showed the pathway from data (catalog records, scanned images, structural metadata) to the integrated record in Greenstone. See PDF of his slides.

    Custom page-turner metadata contains previous-page and next-page information; if my notes are correct these were generated from the record creation process, a great idea!
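
    To make the page-turner idea concrete, here is a minimal sketch of how previous-page and next-page values could be derived while page records are being created. This is my own illustration, not Tod's code: the function name, the field names, and the page identifiers are all made up for the example, and it assumes nothing more than an ordered list of page IDs.

```python
def add_page_turner_links(page_ids):
    """Build one record per page with previous/next pointers.

    Hypothetical helper: assumes page_ids is already in reading order.
    """
    records = []
    last = len(page_ids) - 1
    for i, page_id in enumerate(page_ids):
        records.append({
            "page": page_id,
            # The first page has no previous page, the last has no next page.
            "previous": page_ids[i - 1] if i > 0 else None,
            "next": page_ids[i + 1] if i < last else None,
        })
    return records


# Illustrative page identifiers, not taken from the actual collection.
pages = ["p001", "p002", "p003"]
links = add_page_turner_links(pages)
for record in links:
    print(record)
```

    Because the links are computed once, when the records are built, the display layer only has to read two fields to draw the page-back and page-forward links.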

    They are using Greenstone 3, with METS as the internal storage format and MySQL for metadata storage.

    The proof-of-concept project allowing music search by "playing" a displayed keyboard image was very impressive. Encoding the pitch intervals rather than the specific notes makes it much more likely to work. It’s good to see some progress on music searching like this.

    Laura Sheble

    The primary focus of the survey was support needs, but they also asked about characteristics of Greenstone users’ technical environment and collections:
    24 questions – general support mechanisms
    8 questions – collections and target audiences
    8 questions – contact info for follow-up and directory

    Unfortunately the legends on the charts were barely visible, if at all, from where I was sitting, so I only caught the few findings that Laura stated aloud; for more detail you can check the Handouts. A few selected highlights:

    • Total valid responses: 54
    • Most installations were on Windows OSs, fewer on Linux/UNIX
    • Large percentage of installations were in US/Canada
    • 93% of respondents were actively developing collections
    • About half were developing 0 – 1 collections, about a quarter were working on 6 or more
    • Half were university-affiliated, 20% regional or international centers, and surprisingly to me, about 7% were commercial enterprises
    • About 30% had a multilingual target population
    • Of the major support needs, many respondents cited local training and materials and local support organizations in their own country as needed. Interestingly, many of them were actively developing training materials.

    The survey is closed for this particular statistical analysis, but is still open for responses:
    Greenstone survey

    Q & A on Greenstone

    Q. Is Greenstone 3 available?
    A. It’s available; serving is in an advanced development state but building is still under development.

    Q. What is it written in?
    A. Greenstone 3 is Java-based.

    Q. OCLC ContentDM has resources behind it, while Greenstone is perceived to be hard to use. Wouldn’t I want to go with a commercial, supported product?
    A. There is a consulting company, DL Consulting. But open source software, by its nature, is not usually well marketed or professionally supported and documented.

    Q. We are a non-profit with an existing base of XML and our own DTD. Can this work with Greenstone?
    A. You could develop a plugin to translate your existing data to Greenstone internal format.

    Q. Does Greenstone support DSpace?
    A. It supports importing from DSpace internal format.

    Q. JPEG2000?
    A. Pretty much any image format (using ImageMagick; I wasn’t clear whether that is a required plugin for this).

    Q. Fedora?
    A. There’s work underway to support it as a storage format with Greenstone for the indexing and presentation.

    Q. What’s the maximum size of a Greenstone collection?
    A. About 11 million records. The BBC metadata currently contains about 3 million and is using Greenstone. There is a known bug after about 20 GB of text, but due to Greenstone’s compression algorithms it takes a long time to hit that limit. Due to support for cross-collection searching, though, you can segment a large corpus into smaller collections. Most people do currently use it for smaller collections.

    The joys of the Intercontinental

    Here is the room where the Greenstone session was held (the King Arthur Court room)!

    Greenstone Room Photo