Google and Libraries: What’s in Store for Google Print and Google Scholar

Boy, that was a packed program! I thought yesterday’s “Top Technology Trends” was packed. Today there were even more people. (see photos…)

Participants

What everyone came to see was the panel discussion featuring Google’s Adam Smith along with representatives from the five libraries that have agreed to let Google digitize their books. In order of seating, that was John Price Wilkin (Michigan), Catherine Tierney (Stanford), Ronald Milne (Oxford), Dale Flecker (Harvard), and John Balow (NYPL). Maurice York (Emory) on the far left was moderator.

Google Print

Although the program was subtitled “What’s in Store for Google Print and Google Scholar”, most of the attention was paid to Google Print — quite rightly because it involves libraries handing over to Google the very things that make them unique, namely, their collections.

It soon became clear, however, that some of the libraries appear to be engaged in “pilot projects”. Harvard, for example, is starting out with 40,000 volumes.

Why Google?

The motivation for doing this was obvious: Google has the kind of deep pockets (or claims it has) needed to digitize entire libraries at a rate far faster than the libraries themselves could manage. It also has, in the words of Dale Flecker, the “nerve” to do it. When Google told Michigan it wanted to digitize their entire collection, John Price Wilkin called it an “amusing story”.

What the libraries get in return are their books back (natch) plus a digital copy of the material. What the libraries will do with their copy isn’t immediately clear. John Balow conceded that these are still “early days”.

Even Adam Smith admitted that Google is in “research-mode” concerning some aspects of both Scholar and Print. Its technology “continues to evolve”.

Access & Preservation

What isn’t in doubt is the increased access these titles will have once they’re part of Google. “It’s all about access,” Ronald Milne emphasized. For Oxford, the notion is to bring the “Republic of Letters” into the 21st Century.

Not addressed are issues of preservation. Indeed, with the exception of Michigan, most didn’t think this was a “preservation project”. The kind of “industrial” process that Google is using (mum’s the word on what it actually is) can only be used on books that are in good shape.

That said, it would be “more possible”, in the words of Catherine Tierney, for libraries like Stanford to concentrate on their more unique materials — with Google handling its part. Dale Flecker thought it might also make things cheaper.

Copyright

Google intends to scan everything including books not yet in the public domain. The user will only see “snippets” of works where Google has no agreement (read, permission) with the publisher. This naturally raises questions of copyright infringement.

Adam Smith stressed that Google wasn’t setting up a “book distribution system but an indexing system”. That said, Smith admitted that copyright is a “complicated issue”. He suggested a public listing of “orphaned” works post-1923 so everyone would know what was in copyright and what wasn’t.

Trust Google?

One recurring theme was whether it made sense to put so many (library) eggs in the basket of what ultimately is a profit-driven corporation whose first loyalty is to its stockholders.

None of the representatives seemed disturbed by this. As John Balow explained only half seriously, “We rely on the generosity of strangers. This is just another day of work.”

And what if Google should pull out?

“Time will tell,” John Price Wilkin concluded.

Links

Google Library Digitization Agreement With University Of Michigan… (Search Engine Watch).
Includes link to the U-Mich/Google Agreement

Michigan Digitization Project
Good information about Michigan’s project plus links to the other Libraries in the Agreement.

Don’t Get Goggle-Eyed Over Google’s Plan to Digitize. (Mark Y. Herring, Chronicle of Higher Ed. March 11, 2005).
Looks at the Agreement with a Grain of Salt.

Review of Google Scholar (Martin Myhill, Charleston Advisor – April 2005)
Balanced — even helpful — review of Scholar.

10 thoughts on “Google and Libraries: What’s in Store for Google Print and Google Scholar”

  1. On the Google Scholar front: Adam Smith mentioned that Google had received permission to use full text from everyone “except Elsevier and ACS”.

    He also mentioned collaboration with OCLC Worldcat.

    Also, before I forget, perhaps the most “poignant” moment — well, kind of — was when Peter Murray (OhioLink) got up near the end and publicly thanked Google “for not being evil”.

    The funniest moment was when someone asked Google “to give 10 million dollars to OCLC.” Smith from Google responded: “Thank you for your recommendation.”

  2. For quick access, since I don’t yet have editing capabilities on the ALA web site (c’mon, ALA web office, get me in there….), I have posted the handout for the program and the questions the panel was asked on my personal web space. I’ll get these moved over to the Emerging Technologies Interest Group page once I get access to edit it.

    The program handout (Quick Facts on the Google Five) is at:
    http://leep.lis.uiuc.edu/publish/mcyork/google/QuickFactsOnG5.pdf

    The panel questions are at:
    http://leep.lis.uiuc.edu/publish/mcyork/google/GoogleQuestions-ALAChicago.pdf

    These questions were drawn from a poll I sent out on various discussion lists, such as LITA-L and web4lib, as well as from various articles that have been published in the last few months. Thanks to everyone who contributed questions and topics.

  3. Maurice, thanks for the additions. To tell you the truth, by the time I made it to the hall, all the handouts had already been grabbed up! So I wasn’t even able to get a copy for myself.

    Also, this might not be possible, but it’d be nice to have a copy of Adam’s presentation. He may not want to distribute it but there were a couple things I missed while standing up — there were no more seats left — and typing furiously with one hand while using the other to hold up my laptop.

    Tough conditions for a blogger.

  4. Google Print uses JPEG 2000

    Heard during the Google and Libraries: What’s in Store for Google Print and Google Scholar presentation at ALA: participating libraries can receive either G4 TIFF or JPEG 2000 image files for the scanned books. No word yet as to whether they are JP2s

  5. Was there any discussion of Google digitizing federal government documents, which are mostly in the public domain? If so, did you get the feeling that Google was trying to find collections of government publications (i.e. whole depository libraries) or just happened to digitize government pubs that came with other library materials?

    Google Print could have some interesting interactions with a federal effort to coordinate the digitization of the so-called “Legacy Collection”:

    DIGITIZATION OF THE LEGACY COLLECTION: GPO continues development of a Registry of U.S. Government Publication Digitization Projects. The Registry project will be discussed at the GODORT Government Information Technology Committee (GITCO) meeting on Sunday, June 26 from 2:00 p.m. to 5:30 p.m. A GITCO working group is assisting GPO in the development of the Registry. It is being designed to complement the GODORT Clearinghouse of Government Documents Digital Projects and will contain records for projects that include digitized copies of publications originating from the U.S. Government. GPO and volunteer contributors from libraries, government agencies, and other non-profit institutions will help to develop the Registry by inputting records about digitization projects that are planned, in progress, or complete. The anticipated launch date is in fall 2005, when GPO’s Digital Media Services will start digitizing FDLP legacy publications. A draft white paper on the priorities for digitization is circulating internally and will be posted shortly for public review and comment.

  6. Hi Daniel,

    I probably missed a few things but I don’t recall Adam from Google mentioning GovDocs. He did go over future developments for a brief moment at the beginning of the presentation. He mentioned “adding books; international, multi-lingual products; include works from other projects, create products that all libraries can leverage.”

    Of course, simply because he didn’t mention it, doesn’t mean they aren’t thinking of it. I got the impression that they were trying to grab everything in sight. You’d think the “free” stuff would come first.

  7. I’m not sure that Adam will want to distribute the presentation, but I will ask.

    I don’t think gov docs were addressed at this program, but at the Google program at ACRL in Minneapolis, a member of the audience asked about government repositories and government information. I believe Adam’s response was as Leo guesses: that Google is interested in all sources of information and is considering every avenue for adding as much content as possible to increase the value of the index.

    I think it is important to note in this context that Adam placed a heavy emphasis on the fact that Google is all about “indexing, indexing, indexing”, not delivery of content. Like Google’s main index of web pages, Google Print is intended as a discovery tool, not as a vehicle for transferring or archiving the objects themselves (in the same way that you can view a cached copy of a web page in Google, but the cached copy is not what you would look at if you really wanted to look at the web site).

  8. Leo and Maurice,

    Thank you both for your answers about gov docs. I have to admit that from a govdocs perspective, the most exciting thing for me would be to have a copy of the publications Google was digitizing, like how the University of Michigan is receiving a digital copy of every book Google is digitizing. Although it’d be nice to have most gov docs indexed by Google as well.

  9. […] Since the rest of the sessions I went to have already been covered, I won’t bother going into detail. Monday’s Google and Libraries session, left me feeling a bit concerned with how in love with Google John Price-Wilkin from the University of Michigan seemed. While the representatives at the other Universities expressed legitimate concerns about working with Google, Michigan didn’t seem to have any (see their contract with Google for more details on the deal). While the others did not see Google Print as a “preservation” project, the Price-Wilkin said that the initiative was about preservation for the University of Michigan. Unfortunately (or unsurprisingly) no one talked about how they were going to preserve these digital surrogates. Contrary to Mr. Price-Wilkin’s statements, digitization is NOT preservation. It’s about providing access. The average lifespan of any file format is approximately three to five years, making it impossible to access the files on contemporary machines without some preservation measures being taken. There are a variety of methods for digital preservation, however all are expensive or time consuming, and all are incomplete solutions. Without employing some form of digital preservation, however, the files will quickly become obsolete and inaccessible. And researchers have suggested that preservation could end up costing a good deal more than the original digitization effort. I’d just like to hear that these schools have some sort of plan for preserving the digital surrogates, because there is no guarantee that Google will stick with it. In addition to the copyright issues and the fact that none of the schools seemed to know what they will do with their own copies, I still have some diabout this whole thing. Like Roy Tennant, I wonder what impact the Google Print project will have fair use and the funding of other non-Google digitization projects. […]

  10. For anyone interested in copyright issues, specifically with respect to U.Michigan’s plans to digitize all 7 million items as soon as possible, there are new resources and links now at Google-Watch. I am the person who filed the freedom of information request with the University that revealed their confidential agreement with Google.

    The main page is at http://www.google-watch.org/umich.html

    There’s a page that focuses on my Section 108 concerns and my attempts to get U.Michigan to reconsider their legal position at http://www.google-watch.org/busted.html

    I feel that Google must be stopped, and the easiest way to do this is to deny Google access to copyrighted material by serving a court order on U.Michigan based on Section 108. Some legal scholars feel that this would be a slam-dunk under current copyright law, and it’s mainly a matter of finding an organization with legal standing to prepare a cease and desist, and the determination to take it all the way up to the Supreme Court (Google has indemnified U.Michigan against this, and U.Michigan will most likely fight it). Libraries are not the rights holders, so I’m trying to interest the Association of American Publishers, or the Authors Guild, or the National Writers Union, or some other umbrella organization that represents rights holders.

    I know that librarians are anti-copyright, but in this particular case they should seriously consider helping any efforts to stop Google. A better deal from a more public-spirited source of funding will come along within ten years. Rights holders and libraries would be smart to deny Google access at this point in the history of library digitization. Google is not the way to go.
