Web 2.0 – Becoming Library 2.0

Stephen Abram, VP of Innovation at SirsiDynix, closed out the 2006 LITA Forum on Sunday morning.

One of his first statistics was that libraries collectively ship and circulate more than Amazon.com every day. But we’re not like Amazon in a lot of other ways. We’ve decided we should be making decisions for our patrons, instead of letting them choose. Why don’t we have Amazon-style recommendation engines? We could at least give the user a choice of whether or not to keep their history private.

It all comes down to the user – we need to understand them better before we do something like charge in to fix the OPAC, convinced we know what is wrong with it.

Libraries do community better than Google – it's our trump card, but only for now. We can't cede that ground to the giants. Google Scholar has deals with about 200 database providers. What if they decide to parse your Gmail account and deliver scholarly articles targeted very precisely at you? In five years they'll have 50 million books online to apply that methodology to as well.

Abram went on to talk about how students are going to use Google and the like heavily no matter what we do. We need to focus more on teaching them how to use the tools in Google well – advanced queries instead of just a couple of words. We can also educate our high school and college age users about the reality of Google. How many students know just how extensively search engine optimizers manipulate the top Google hits on hot button issues?

According to Abram, Millennial users have higher IQs and more advanced brains, but ultimately believe their skills are more advanced than they really are. They're format agnostic, and don't want to have to deal with separate databases for their searches. He says we need to be constantly looking at realities like this, on a rolling five-year planning horizon – who's on their way to us?

And our future users really are changing – video games support a wide variety of learning styles beyond the traditional ones, while sites like Facebook let students build a sustainable social network for life, even if the definition of 'friend' on social networking sites differs from what previous generations mean by the word.

Abram showed an image of a giant Swiss Army knife with dozens of tools – no matter how useful each tool may be, you can't tell what each one is until you unfold it. The same is true of the tools in libraries, so we need to be more transparent to make up for it.

What we need to do is create an experience. Be the fabric of the community, not appended to it. To do this, we need radical trust – that's what can create Library 2.0. But there's no single step-by-step route to that destination. We will necessarily go through a process of trial and error. So don't be afraid to experiment! It's going to be required.

Ultimately, delivering information isn’t our job – it’s improving the quality of the questions.

To make these changes in how we deliver service and relate to our users, we're going to need more time in the day. Productivity tools exist now to help us toward that goal, if we're willing to take advantage of them. RFID and self-service checkout are two examples.

In conclusion, we need to rededicate ourselves to a focus on the end user. Not just today, but for life, taking into account how their needs change over time. How do we become that librarian 2.0? We play! Keep up with new technology, don't be afraid of it. Try new things and see what happens. You can do it on your own, or, even better, institutionalize the change like the Public Library of Charlotte & Mecklenburg County's "Learning 2.0" initiative.

Low threshold strategies for libraries to support “other” types of digital publishing

Robert H. McDonald and Shane Nackerud summarized two different aspects of low threshold digital publishing. Robert covered Florida State University’s program of various institutional repository tools, and Shane outlined the University of Minnesota’s UThink blogging platform.

One of the big advantages of an institutional repository program to FSU was that it gave them something to highlight during the recent SACS accreditation process. Their philosophy for the project revolves around the idea of "barrier-free access".

The structure involves three main tiers:
-The actual Institutional Repository (run on Bepress' EdiKit and PKP's Open Journal Systems)
-Outreach / Communication (blogs, web sites, wikis, etc)
-Finding Aids for the stored materials

EdiKit is a hosted service and easy to implement. Ex Libris' DigiTool service is going to be the main site for submitting documents to the repository.

The open source content management system Drupal is used to manage the repository's web site. A big plus is that it automates the creation of a wide variety of RSS feeds. Robert considers our users' RSS readers to be valuable real estate, and any effort we can make to reach them there is useful.

Implementing MediaWiki took more training than the organizers expected – fundamentally, people just aren't familiar with the edit-it-yourself model.

Future plans: get more faculty using the system, promote the basic ideas of open access journals, and work more on integrating everything together.

Robert’s presentation is online here: http://www.rmcdonald.info/presentations/lita2006

Shane gave us a tour of the history and current implementation of the University of Minnesota's UThink blogging system. Essentially, the university provides free blog hosting to all students and faculty. Over 3500 blogs are currently hosted, more than 1000 of which are still being actively posted to. Together they hold more than 45,000 posts.

The system was built on a relatively low-end server with 120 GB of hard drive space. Even with this limitation, UThink has still been able to let students upload files such as MP3s for running a podcast. The blogs themselves run on the Movable Type platform. Quite a bit of behind-the-scenes tweaking was needed to tie the blogs into students' existing campus network accounts, but it all works seamlessly now.

One main goal in the creation of UThink was to retain the content students are blogging elsewhere as a sort of cultural memory of the university. Additionally the system promotes intellectual freedom, changes attitudes about the library, and helps form communities of interest. The most popular blog the system ever hosted, for example, was all about the sports teams on campus.

Interestingly, only 60% of the blogs are run by undergrad students. Shane theorizes that a lot of undergrad students have existing blogs elsewhere already set up when they come to campus, and don’t feel a particular need to change systems. Anecdotally supporting this, grad students tend to have most of the personal blogs (as opposed to class blogs, for example). Once students graduate, they retain access to their blog as long as they log in at least once every six months.

Two main types of academic use have emerged. Either a professor uses a blog to start discussion, or a professor requires students to maintain their own blog on class related matters.

Unexpected uses have also shown up. For example, other official campus sites outside the library have used the blogs’ RSS feeds to populate their own content.
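To make that kind of reuse concrete, here is a minimal sketch of how another campus page might pull headlines from a blog's feed. It assumes a plain RSS 2.0 feed; the feed contents, URLs, and the `latest_items` helper are all hypothetical, and nothing here describes how the actual campus sites did it.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed from a UThink-style blog; titles and URLs are invented.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Campus Blog</title>
    <item>
      <title>Homecoming recap</title>
      <link>https://blog.example.edu/homecoming</link>
    </item>
    <item>
      <title>Library hours extended for finals</title>
      <link>https://blog.example.edu/finals-hours</link>
    </item>
  </channel>
</rss>"""


def latest_items(feed_xml: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, link) pairs for the first `limit` items in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # findtext() returns the default if the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
        if len(items) >= limit:
            break
    return items


# A host page could render these as a simple list of headlines.
for title, link in latest_items(SAMPLE_FEED):
    print(f"{title}: {link}")
```

A real page would fetch the feed over HTTP and cache it rather than hitting the blog server on every page view.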

One of the biggest hurdles in maintaining the system is comment spam. UThink recently added a CAPTCHA system (which requires a user to type in letters from an image) to combat it. Also, some students don't use the service because it is not anonymous.

Pluses of running the UThink program:
-An opportunity to defend intellectual freedom (as when a local business threatened to sue if a disparaging post wasn’t taken down – Shane stood his ground and they went away)
-An opportunity for education in the area of RSS, podcasting, design, etc.
-A massive cultural memory repository has emerged – imagine if something like this was running around the time of September 11th, for example.

Lessons learned from the program:
-Serve those who want to be served
-Work within the current academic processes
-Using UThink to enhance existing library services has been more difficult than expected, but it has opened doors for discussion.
-A committee is needed – this is time consuming! Shane did most of the work himself, but would do it differently a second time.
-In the end, intellectual freedom and cultural memory are the big winners.

CUIPID 4: Building a faceted searching and browsing interface for your library catalog

(note: the preconference material referred to the software as CUIPID 3, but a new version has since been completed)

CUIPID (pronounced “cupid”) is the University of Rochester’s Catalog User Interface Platform for Iterative Development. Built by their Digital Initiatives Unit, it serves as an experimental base for library catalog enhancements.

David Lindahl and Jeff Suszczynski were on hand to walk us through what CUIPID is and to offer some insight into the development process.

First we learned a bit about just what the Digital Initiatives Unit is. The staff of eight (drawn from a wide variety of non-librarian disciplines, including an anthropologist) performs constant research into user needs via work practice studies and other methods. They just finished a comprehensive two-year study of how undergrads write papers. We saw a video clip from interviews conducted at night in a U of R dorm, which was both informative and quite funny. Some interesting facts learned: freshmen don't stop at just the first three results of a search, and they are not afraid of the reference desk. Most are also less capable with technology than expected.

On to CUIPID, which has gone through a lot of changes. Version 1 used a small subset of records in MARCXML format, and solved 80% of UI issues they had previously identified. It collapsed similar editions via text matching, used the Google spellcheck API, etc. But unfortunately it wouldn’t scale up well – the license for the Verity indexing tool they used would be prohibitively expensive to use for the full sized catalog.
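Collapsing similar editions "via text matching" can be as simple as normalizing title and author strings and grouping records that end up with the same key. A minimal sketch, assuming that kind of normalization; the record fields and sample data are hypothetical, and CUIPID's actual matching rules weren't described in the session.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def collapse_editions(records):
    """Group records whose normalized (title, author) pair matches."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["author"]))
        groups[key].append(rec)
    return groups


# Hypothetical catalog records: two editions of one work, plus an unrelated title.
records = [
    {"id": 1, "title": "Moby-Dick; or, The Whale", "author": "Melville, Herman", "year": 1851},
    {"id": 2, "title": "Moby Dick, or The Whale", "author": "Melville, Herman.", "year": 1992},
    {"id": 3, "title": "Pride and Prejudice", "author": "Austen, Jane", "year": 1813},
]

for (title, author), editions in collapse_editions(records).items():
    print(title, "/", author, "->", [rec["year"] for rec in editions])
```

String matching like this is cheap but brittle (translations, subtitles, and uniform titles slip through), which is part of why FRBR-style work clustering comes up later in these notes.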

Next came SARA (I didn’t catch what the acronym stands for). It was a home made metasearch engine, covering all types of material the library holds – books, web sites, video, subject guides, databases, etc. It ran multiple concurrent queries to the catalog – no need to select author, title, etc. before the search. Users would narrow results by type later. Unfortunately, SARA had extremely debilitating performance issues.
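SARA's approach of firing several queries at once and letting users narrow by material type afterward could be sketched roughly like this. The catalog data, field names, and helper functions are invented for illustration; SARA's actual implementation wasn't covered, and a thread pool over an in-memory list stands in for real concurrent requests to a catalog.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalog; a real system would query separate indexes or services.
CATALOG = [
    {"id": 1, "title": "The Hobbit", "author": "Tolkien, J. R. R.", "subject": "Fantasy fiction", "type": "book"},
    {"id": 2, "title": "Tolkien: A Biography", "author": "Carpenter, Humphrey", "subject": "Authors, English", "type": "book"},
    {"id": 3, "title": "The Hobbit", "author": "Bass, Jules", "subject": "Animated films", "type": "video"},
]


def search_field(field, query):
    """Case-insensitive substring match against a single field."""
    q = query.lower()
    return [rec for rec in CATALOG if q in rec[field].lower()]


def metasearch(query, fields=("title", "author", "subject")):
    """Query every field concurrently, so the user never picks author/title first."""
    with ThreadPoolExecutor(max_workers=len(fields)) as pool:
        result_lists = list(pool.map(lambda f: search_field(f, query), fields))
    merged = {}
    for results in result_lists:
        for rec in results:
            merged.setdefault(rec["id"], rec)  # de-duplicate hits found via several fields
    return list(merged.values())


def narrow_by_type(results, material_type):
    """Let the user filter by material type after the search, as SARA did."""
    return [rec for rec in results if rec["type"] == material_type]


hits = metasearch("hobbit")
print([rec["title"] for rec in narrow_by_type(hits, "video")])
```

The performance problems the presenters described are easy to imagine here: every search multiplies the load on the backend by the number of fields and sources queried.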

CUIPID 2 was built on a trial version of TextML, an XML database product. Its interface and features were similar to SARA's, and it also faced scalability issues. In addition, TextML would not have been free for the full version.

CUIPID 3 indexed more than 3 million records, using MS SQL server and ColdFusion for the interface. It was pretty similar to the current CUIPID 4, which was covered in more detail:

CUIPID 4’s features are going to be slowly integrated into the U of R’s existing Voyager catalog. The name has been informally changed to simply C4, “because it’s fun” to say. It follows the previously described faceted method of searching, letting the user drill down to the correct categories of results. Their inspiration for this system came from web sites like Sears’ and Home Depot’s.

It interfaces fully with the current list of student login information, allowing services like placing holds and recalls. There are a number of relatively small features that are nice touches – displaying contextually appropriate metadata, for example. So for a movie result the director gets displayed, instead of the author for a book. The system makes extensive use of various APIs, pulling in external data like Amazon's book covers and reviews, recent blog posts about a title via Technorati, etc.

Describing CUIPID 4 admittedly sounds sort of dry. But we got to see a live demo of the system, and it really blew me away. The interface is very intuitive, response time is fast, and it seems to be a pretty polished product even now.

Features to be added in the future include replacing the Amazon images with local copies, improved acceptance of Unicode in catalog records, holdings records, and FRBR functionality (either homegrown or via OCLC's system).

A separate project of the Digital Initiatives Unit was mentioned briefly – the eXtensible Catalog (XC). While still in early pre-planning, ultimately they hope to make the XC an open source catalog to hold all types of collections. It will be designed to be experimented with, and be compatible with your existing ILS and any form of scripting (PHP, ASP, CF, etc.). Sounds like a very exciting project to me – more information is at www.extensiblecatalog.info.

This presentation had a huge amount of data for me to take in, but I’m glad I went. It was really interesting to see some of these catalog innovations in practice.

Putting All The Pieces Together – Cyberinfrastructure

Tim Daniels and Doug Goans from Georgia State University presented this talk.

I had never heard the term ‘cyberinfrastructure’ before this presentation, and am still not sure I entirely grasp the basics. But here’s what I picked up:

Cyberinfrastructure is really a mindset or an overall vision at an institution. It takes a more global view of collaboration – if we collaborate well with each other already, why not take it further?

An example of this mindset in action might be collaborating statewide on agreed metadata, or consortium catalogs. I felt like a lot of this session tied back into yesterday’s Death of the OPAC topic. It also fits well with the basic idea of economies of scale.

But first, cyberinfrastructure has to be set up within your own campus. According to a rough survey taken by the presenters, only 43% of libraries have a defined technology plan, and that has to change.

Internal library IT workers should get out among the broader campus IT community more, and take advantage of those resources. Of course, there will be issues between the two levels, such as funding priorities. But these can be overcome. Security is another issue, especially given recent breaches that get a lot of press.

Once a larger pool of IT resources is developed to draw from, take advantage of it! Add new services like virtual reference, multimedia tutorials, e-reserves, etc. These tools can often be transparent to faculty. Be sure to point out resources to them that might traditionally fall outside of their subject areas.

Perhaps most importantly, clearly spelling out the scope of a cyberinfrastructure helps keep things open and accessible down the road.

The concept of cyberinfrastructure is still evolving – the presenters mentioned that it was a relatively new idea to them as well.

Presentation materials are online here.

Archiving & Preserving the Web

Kristine Hanna was the main speaker for this session, and Linda Frueh also contributed. Both are from the Internet Archive.

The session opened with a brief outline of the history of the Internet Archive. They were founded in 1996, and are a non profit organization dedicated to, well, archiving the Internet. They crawl two billion pages a month, plus other media files like audio clips. These snapshots are then stored and made available online. Currently the archive holds 55 billion pages from 55 million sites! To put this in perspective, Kristine estimated that if printed out the pages would reach to the moon and back 19 times.

IA makes no distinction between what should be archived and what shouldn't – the web is so ephemeral that they're focused on just grabbing the data for now. All software used in the process is open source and developed from partnerships between IA and other organizations. This includes their crawler, the "Wayback Machine" method of displaying the stored sites, a search engine, and the file format of the archives.

In my mind I had always pictured the Internet Archive as a giant behemoth of an organization. But in reality they have just forty employees! As someone pointed out, that means that 5% of their entire organization was here today. They are completely non profit, and even services that have a fee (such as custom archives for organizations like the Library of Congress) are done at cost.

I was also unaware of all the special projects IA takes on. They’ve branched out a bit from the general archive, and also create special collections around big events like Hurricane Katrina.

As I mentioned earlier, IA works with a number of clients on special projects as well. Clients include the Virginia and North Carolina state governments. Other projects reach well beyond a single state: working with France, they crawled and archived the entire .fr domain! They did the same with .au in Australia.

As a relatively small organization, safe backup of all this information is an important issue. IA follows the Lots of Copies Keeps Stuff Safe philosophy, running mirror servers in places like Egypt in addition to the main California facility. Because this is such a huge amount of data and IA doesn’t have access to the higher bandwidth of Internet2, the backups are actually physically shipped around the world on massive racks of hard drives.
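The payoff of keeping lots of copies is that damage to any one copy can be detected, and repaired, by comparing it against the others. A minimal sketch of that idea: hash each replica with SHA-256 and flag any copy that disagrees with the majority. This is loosely in the spirit of LOCKSS-style voting, not a description of the Internet Archive's actual procedures; the site names and data are made up.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Fixity value for one copy of an object."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare checksums across copies and treat the majority digest as correct.

    Returns (majority_digest, sorted list of sites whose copy disagrees).
    With no clear majority (e.g. two copies that differ), this cannot tell
    which copy is right; Counter.most_common() just picks one of the tied values.
    """
    digests = {site: digest(data) for site, data in replicas.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies of one archived page; "mirror-3" has a flipped byte.
copies = {
    "california": b"archived page",
    "egypt": b"archived page",
    "mirror-3": b"archived pag3",
}
majority, damaged = audit_replicas(copies)
print("copies needing repair:", damaged)
```

Any damaged copy can then be re-copied from a site that matches the majority digest.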

Both Kristine and Linda emphasized that they are not librarians. Instead, they say that the Internet Archive works only as “technical partners” to existing organizations and their expertise. And their services are “…only good because we get lots of user feedback.” In some cases entire projects are suggested by users, including a new archive of topographical maps of the United States.

During Q&A, the presenters noted that if any content owner would like their sites removed from the archive, they need only ask. Also, the IA crawler obeys robots.txt files and will ignore servers if directed to. Internet Archive isn’t large enough and doesn’t have enough money to get into the legal area necessary to clarify these issues. But, Kristine also mentioned that part of “archiving it all” means getting the “bad” stuff along with the good – pornography, ads, etc.
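For site owners wondering what "obeys robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser` to check a URL against a site's rules before crawling it. The robots.txt contents, URLs, and the `ExampleArchiveBot` user agent are hypothetical; this is not the Internet Archive's crawler code.

```python
from urllib import robotparser

# Hypothetical robots.txt for a site that wants one directory kept out of crawls.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt: str) -> robotparser.RobotFileParser:
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # can_fetch() refuses everything until a load time is recorded; make that explicit.
    rp.modified()
    return rp


RULES = build_rules(ROBOTS_TXT)


def allowed(url: str, user_agent: str = "ExampleArchiveBot") -> bool:
    """True if a polite crawler identifying as `user_agent` may fetch `url`."""
    return RULES.can_fetch(user_agent, url)


print(allowed("https://example.org/index.html"))
print(allowed("https://example.org/private/notes.html"))
```

A `Disallow` rule under `User-agent: *` applies to every crawler that honors the file; a site could also name a specific crawler's user agent to exclude only that one.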

The Internet Archive’s book scanning project was also brought up during Q&A. So far they’ve scanned 80 thousand books, and the main barrier to moving faster is a lack of money to build more “scribe” machines to scan books. All books scanned so far are public domain.

The session closed with brief mentions of two upcoming projects from the Internet Archive:

  • Searching the archive by a method other than a known URL is in the works.
  • The archive of 1996-2000, the so-called “historical web”, will be broadened.