2006

Copyright and mass digitization

This session was well buried in the programming and meeting announcements, but the room was still full for an interesting discussion of copyright issues in the dawning era of mass digitization. The session was sponsored by the OITP Advisory Committee and moderated by Clifford Lynch, who was introduced with the remark that he needed no introduction to this crowd, other than to say he is a “force for good.”

Panel participants included Jonathan Band, Esq., a copyright attorney; Liz Bishoff, Executive Director, Colorado Digitization Project; and Dan Greenstein, University Librarian for Systemwide Planning and the California Digital Library.

Clifford Lynch gave a short introductory statement laying out the context for the discussion at hand: large-scale digitization and its implications for libraries and other cultural memory institutions, such as museums, historical societies, and even public broadcasting. The Google program has captured the most press recently, but it is not the first and it won’t be the last large-scale digitization program. Since 1971, Project Gutenberg has been making out-of-copyright works of literature and social science available online. It has been going systematically for 30+ years, with tens of thousands of works. A great deal of digitization is going on within libraries as well. Major examples are American Memory and NYPL’s image library, as well as numerous projects that run on grant funding or local funding. We are moving into a period where massive collections will be available. How do these relate to library collections? What does it mean to make the material available? What is appropriate to make available, beyond the question of whether a work is in or out of copyright? The selection criteria for American Memory were offered as one example. And what does it mean, technically, to make these things available?

The three panelists then shared short statements on the subject at hand.

Jonathan Band:
Jonathan’s goal was to lay out four different approaches to the problem of copyright and mass digitization that various projects have adopted. The first is to digitize only works in the public domain, for example, the Microsoft deal with the British Library to digitize 100,000 books. This is a valid approach, but limited, since only about 20% of the world’s books are now in the public domain, and that number is not likely to grow in the near future. Copyright terms keep getting extended, perhaps indefinitely, and there is no way to know when a large portion of these works will enter the public domain.

Another approach is to digitize works with the permission of authors and to use tools such as Creative Commons licensing. Google and Amazon both do this with their publisher programs.

The third approach is best exemplified by Google’s library project, and is the most controversial. This approach relies on the Fair Use argument to assert that the copying of entire works is legal since only “snippets” are ever displayed to users.

The fourth approach takes up the question of orphaned works. These are works that are technically still within copyright, but whose owners cannot be located for whatever reason. They tend to be works of high cultural but low economic value. The U.S. Copyright Office is expected to come out with a proposal on how to deal with orphaned works at the end of this month. It will deliver a recommendation, which hopefully will lead to legislation. The recommendation would be that we should encourage the use and reuse of these works of cultural value that have limited economic value. The legislation would potentially include a provision that if the owner reappeared and claimed their copyright, the library could take the items down with no damages owed. Currently, minimum fines for copyright infringement could run into the hundreds of thousands of dollars for, say, a collection of photographs from the 1930s, a major disincentive to a historical society or other institution thinking of digitizing them.

Liz Bishoff:
Liz wanted to turn attention not only to the legal and ethical issues, but also to the pragmatic issues of mass digitization. We have moved into a transition time in which the question is no longer “can we digitize on a massive scale?” but “can we do it on a massive scale economically?” We have proven that it can be done technically. Now we have to find ways of working with publishers, as well as ways of building interfaces to make the collections usable.

In a landscape of digitization that includes parties outside of cultural heritage institutions, we need to build on our expertise to provide what other institutions and corporations cannot. We need to bring our archives and special collections out of the catacombs, where we have preserved them for 100 years, and to our users. Staff need to be retrained and redirected. We need new funding and digital asset management to deliver these collections.

Liz pointed out that these archival collections must be drawn together with the products being created by students and faculty, by the members of the community, to establish open access strategies and build institutional repositories. We can’t believe that the library catalog is the gateway to all knowledge anymore.

As for what this means to ALA, the association needs to address this issue in education above all. The digital library curriculum needs to be taken up in earnest in our graduate programs–no one is talking about it. We also need legislation at the federal level that will make mass digitization by cultural institutions easier. Most of all, ALA needs to develop policy statements on these issues so that we do not have to leave it to the President to make statements that are his own “personal view and not the association’s.”

Dan Greenstein:
Dan rounded off the panel by taking up the question of what all this means to libraries. He made the case that libraries need to be much smarter about how we manage our collections and how we allocate resources in light of the potential of mass digital collections. His points were strong and well-reasoned. Not many librarians would argue with them, yet we have not come close to putting any of this into practice.

We need to stop renting content (Dan wanted to be clear that he was purposely using the term “renting” instead of “licensing”, since “license” gives some impression that we own the content) to realize substantial annual savings. There are major opportunities in collection management if we do this right. OCLC indicates that over 26 million items are held by 10 institutions or more. At the 8 University of California government repositories, 93% of government content is redundant. This could represent huge savings in shelf space and money. Looking at the early JSTOR titles alone (23,000 copies), the UC libraries hold 5 copies of each between them. This small collection alone could represent 3.8 million dollars in savings.

Digitally reformatting collections means we can do less “care and feeding” of physical collections and realize substantial savings. The goal of keeping large physical collections on site has always been access, but with digital copies we can save the huge amounts of money that we pour into the housing, climate control, etc., needed to provide physical access. We also get the opportunity to provide richer service environments to our users by taking content that OCA, Google, and others are putting out there and building a customized layer of services on top of it.

Dan identified four major issues for libraries in approaching mass digitization: 1) Massively digitized collections need to exist at a third-party repository in perpetuity. It’s not good enough for vendors or institutions to say, “we don’t preserve it, we just host it.” Individual libraries should not be in the business of massive data archiving; our expertise should be building services for delivering the data to users. 2) Collections need to be built in a way that supports repurposing the presentation and configuration for local communities. 3) The terms and conditions of access in formatting and developing collections need to be transparent. As an institution building services on top of data, I need to know the underpinnings, the standards, etc. 4) Money. Do we really need new money? If we’re not pouring money into online rental, if we’re saving on collection management, if we don’t have to support the backend digital management process, we can make a fundamental change in how we allocate our resources.

The program closed with a round of Q&A, which, though stimulating, I will spare the blog since this post is growing long.