CIL 2006 – Future of Catalogs

Roy Tennant, California Digital Library
Andrew Pace, North Carolina State University (NCSU)

Packed house, which was to be expected. It’s like E.F. Hutton when Roy and Andrew speak. (Do you remember that commercial?) And on top of that, everyone has an interest in OPACs – reference librarians, access services librarians, catalog librarians, etc. Lots of energy and opinions in the room.

Roy Tennant
Hoping to kill off the word OPAC. It harks back to the days when public access was an afterthought.

What catalogs do well:
(I’m just talking about a finding tool)
1. Inventory control
2. Known item searching
3. Location searching (within four walls)

What they don’t do well:
1. Any search beyond known item
2. Searching anything beyond books
3. Displaying results by logical groupings (FRBR) – show complexity when needed
4. Faceted browsing
5. Relevance ranking
6. Recommendation services

How did we get here?
Automation began in back room – let’s automate circulation
Moved into public areas as afterthought
Optimized for our own needs

Key Problems:
-conflated management with discovery purposes
-created stovepipe systems (data goes in, but hard to get out)
-abdicated responsibility to vendors (disempowered)
-slow to exploit new opportunities
-little collaboration between libraries on building software

Assertions:
-the library catalog is one finding tool among many
-acknowledge the limitations of the catalog
-users may want to broaden their search beyond the library collection

Future of catalogs:
-one system among many (interoperable)
-function well alone
-refocused on local inventory (limit to what is in your building)
-it will not be the central search tool for the library

Signs of life:
Bibliographic Services Task Force (UC)
Library of Congress – the changing nature of the catalog and its integration with other discovery tools
Open Source PINES consortium – demo.gapines.org
RedLightGreen – redlightgreen.com (concepts of FRBR, subjects displayed with first search result)
OCLC Research – Curioser (drill down by publisher, language, date, type)
CSU San Marcos – MetaLib – books by availability (no time limit, 24 hours, 5 to 10 days)

Andrew Pace
NCSU Libraries Catalog

State of catalogs – obsessed with authority searching
NextGen OPAC – visual search, clustered search
NCSU Search results “narrow by” shown on left (Topic, Genre, Format, Language, Region)
*each link the user clicks drills down further – introducing complexity at the point of need
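
This drill-down pattern is easy to prototype once records carry structured fields. Here's a minimal sketch, with invented records and facet names rather than NCSU's actual Endeca configuration:

<code>
# Toy sketch of faceted "narrow by" navigation; records and field names are invented.
from collections import Counter

records = [
    {"title": "Hamlet", "format": "Book", "language": "English", "topic": "Drama"},
    {"title": "Hamlet (DVD)", "format": "Video", "language": "English", "topic": "Drama"},
    {"title": "Calculus", "format": "Book", "language": "English", "topic": "Mathematics"},
]

def facet_counts(results, facet):
    """Count how many results fall under each value of a facet (for the left-hand list)."""
    return Counter(r[facet] for r in results)

def narrow(results, facet, value):
    """Drill down: keep only the results matching the clicked facet value."""
    return [r for r in results if r[facet] == value]

print(facet_counts(records, "format"))      # Counter({'Book': 2, 'Video': 1})
books = narrow(records, "format", "Book")   # user clicks Format > Book
print(facet_counts(books, "topic"))         # counts recompute against the narrowed set
</code>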

Pursuit of Features:
-Speed
-Relevance ranking
-Faceted browsing
-Spell checking
-Automatic stemming (a search for "child" also brings up "children") – see the sketch below
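
The stemming example involves an irregular plural, so a plain suffix rule wouldn't catch it on its own. Here's a toy sketch of folding such variants together at query time – a hand-rolled illustration, not how Endeca actually does it:

<code>
# Toy query expansion so "child" also matches "children"; the word list is invented,
# and real systems use proper stemmers/lemmatizers rather than a hand-rolled map.
IRREGULAR_PLURALS = {"child": "children", "woman": "women", "mouse": "mice"}

def expand_term(term):
    variants = {term}
    if term in IRREGULAR_PLURALS:
        variants.add(IRREGULAR_PLURALS[term])
    elif term.endswith("s"):
        variants.add(term[:-1])          # crude singular form
    else:
        variants.add(term + "s")         # crude plural form
    return variants

print(expand_term("child"))   # {'child', 'children'}
print(expand_term("books"))   # {'books', 'book'}
</code>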

Purchase Decision:
-Authority infrastructure underutilized
-lots of broad topical keywords

Technical Overview:
-Endeca ProFind co-exists with the SirsiDynix Unicorn ILS and the Web2 online catalog
-Endeca indexes MARC records from Unicorn
-Endeca ProFind software
-Full details at http://www.lib.ncsu.edu/news/libraries.php?p=1998&more=1&c=1&tb=1&pb=1

Did someone say MARC is dead?
-Endeca likes flat file for ingest

Yes, flat files limited to just the needed fields. I think it’s great that they are re-thinking MARC. It really opens up some opportunities to repurpose our data.
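
Here's roughly what that flattening step could look like with the pymarc library – the field choices, delimiter, and file names are my own assumptions, not NCSU's actual ingest routine:

<code>
# Rough sketch: flatten MARC records to a tab-delimited file with only the fields an
# index needs. Uses pymarc; the filenames and field list here are placeholders.
import csv
from pymarc import MARCReader

def field_value(record, tag):
    fields = record.get_fields(tag)
    return fields[0].value() if fields else ""

with open("records.mrc", "rb") as marc_in, open("records.tsv", "w", newline="") as flat_out:
    writer = csv.writer(flat_out, delimiter="\t")
    writer.writerow(["title", "author", "subjects"])
    for record in MARCReader(marc_in):
        subjects = "; ".join(f.value() for f in record.get_fields("650"))
        writer.writerow([field_value(record, "245"), field_value(record, "100"), subjects])
</code>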

Endeca Catalog search (http://www.lib.ncsu.edu/catalog/)
3 search tabs, default to keyword
10 dimensions of MARC data

Challenges:
-Using LCSH like it’s never been used before
-Using LC classification
-Integration with Web2 (the NCSU online catalog)
-The danger of creating featuritis (scoping the project)

Usability Testing:
-around 150,000 searches, a large test group
-most used dimension is LC classification, next used dimension is subject:topic

Future Plans:
-FRBR-ized display
-Patron generated taxonomy?
-Death of authority searching

Andrew mentioned that we need to find a way to get our ILS, ERM, open web, and repositories to play together – this is just a start…

And I have to quote one of his closing lines: “And nobody in that vendor hall can sell you a product that will allow you to get your ILS, ERM, open web, and repositories to play together.” Amen.

It was an informative session and I’ll be taking away some new ideas for digital library applications. Some new maxims:
1. introduce complexity at point of need
2. test basic access points with your users and stick to them across your applications
3. store your data in a readable/writeable format (flat file)
4. prioritize a single manifestation of the work in the first result – let the user select the type of work later
(Hamlet – the work -> users select the play, movie, screenplay, etc.)
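
For maxim 4, the core move is clustering many records onto one work and deferring the format choice. A naive sketch of that grouping – the matching key here is a crude stand-in for real FRBR work-set algorithms:

<code>
# Naive sketch of FRBR-ish grouping: cluster records by a normalized author/title key
# and show one line per work, with formats offered as a second step. Toy data only.
from collections import defaultdict

records = [
    {"title": "Hamlet", "author": "Shakespeare, William", "format": "Book"},
    {"title": "Hamlet", "author": "Shakespeare, William", "format": "DVD"},
    {"title": "Hamlet", "author": "Shakespeare, William", "format": "Screenplay"},
]

def work_key(record):
    return (record["author"].lower().strip(), record["title"].lower().strip())

works = defaultdict(list)
for record in records:
    works[work_key(record)].append(record)

for (author, title), manifestations in works.items():
    formats = sorted({m["format"] for m in manifestations})
    print(f"{title.title()} / {author.title()} – available as: {', '.join(formats)}")
</code>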

CIL 2006 – Dead and Emerging Technologies

I’ve been hesitant about posting my reaction to this late-night session. It’s usually lots of fun and you can get a good sense of library technology trends on the bleeding edge. More of the same this year. The theme was Library 1.0 versus Library 2.0. It’s all fair game, but I have to agree that some of this 2.0 stuff is pretty familiar. I’ll let you be the judge. Read on for commentary after the brain dump…

Michael Stephens
Provided a quick look at current 2.0 trends
Gamers are entering our workspace
Remix Culture – Remixing library data
37% of libraries have blogs – “Our work is not done.”
Dream, plan for your users and have fun.

Amanda Etches-Johnson
Presented a survey of old library technology plans; a fun juxtaposition because the old stuff reads like the new Library 2.0 evangelism
“It’s nothing new; it’s an evolution – new tools to fulfill our visions”
subject guides become wikis
newsletters become blogs
HTML becomes RSS
virtual reference becomes IM
RSS – repurpose our content
* bring out the social

Aaron Schmidt
Painted a picture of web 2.0 – reimagining the social fabric of the web
Revitalized OPACs – WPopac
Google Office – Writely, Gmail
Questioning Google’s policies in China – a threat to the social web?
Digg – votes for best content
Mashups – Google Maps api
Bluetooth Watches
Origami project (MS) mobile computer
Myspace.com – meeting teens where they are
Yahoo Messenger – new version
2.0 – people in our online libraries

Bill Spence
Very funny piece; used humor to point out that what’s old (1.0) is not necessarily dead
2.0 vs. 1.0 – is it really better?
USB is the dominant format (he did not see a floppy used during the first day of the conference)
OS – DOS is not dead
Notepad – “I use it every day!”
Pine is not dead
Browsers – had a demo of Mosaic

Darlene Fichter
Framed the discussion of Library 2.0 with Tim O’Reilly’s definition of Web 2.0
Web 2.0 = small pieces loosely joined, read/write web (from O’Reilly)
content development – participation from our users
the long way home to an article – a musical interlude from Supertramp illustrating the awful process of finding an article (played to big laughs…)
“embrace it (new web 2.0 technologies) – become a digital native”
Digital Read/Write Participants
Learning with others outside the field (sociality of what we do)
tolerance for “not in control”
library 2.0 = books and stuff + radical trust + participation

Marshall Breeding
Sobering take on current library applications
OPACs – our search technologies lag behind new user expectations
Don’t count on users starting with us – this is a Google World
Libraries must build own search and retrieval interfaces – effective, elegant interfaces
Push library content into other information spaces – SOAP and web services
New interfaces – AquaBrowser library, Endeca Guided Search
Library adoption of new software occurs at glacial pace….
Web 2.0 – beware of Gartner Group hype cycle of new tech (http://www.riarlington.com/hypecyc.html)
Fundamental mission of libraries is at stake – what is role for libraries in discovery process?
Focus on the strategic, not the cool.
“There are many Del.icio.us side courses, but focus on main course.”

Stephen Abram
Always a lot of fun, and this was a call to act and live in our users’ information environments
Get out of the box!
Live in user’s environment
“Libraries are about the experience, not search.”
Build some intelligence into the interface
Keep systems open (canning wireless example…)
Become part of MySpace and Facebook communities

Food for Thought (because I didn’t liveblog this):

It is what it is… I’m down with some of the program, but I think we have to know our channels. Are communities like MySpace receptive to a library component? To me, it feels like approaching a group of kids playing on a street corner and asking them if they need to use the library. It’s about context and recognizing the expectations of the community. Yes, libraries need to be active in places outside of our traditional settings, but intruding on a space where we are not expected could – and will – have negative consequences. Maybe library 2.0 is about informed participation in our users’ spaces.

Another reaction… I think Marshall is right on in calling for “building our own library applications.” Andrew Pace and Emily Lynema of NCSU took it upon themselves to reinvent the OPAC (http://www.lib.ncsu.edu/catalog/). Thanks to both of them for giving us a new vision. But I think we also need a grassroots effort. I’m talking about building simple apps and sharing the code. I’m talking about introducing web development approaches (read: AJAX) and finding ways to teach others how to do it. It won’t always be easy and sometimes our heads might hurt. At the same time, I feel we have to start learning this programming stuff somewhere. CIL seems like as good a place as any to introduce difficult concepts and start to tease out the “how tos”.

And one final thought: each of the panelists spoke about the importance of “becoming a digital native” (Darlene Fichter) – calling for librarians to experience the common applications of our users’ digital existence: Facebook, MySpace, Yahoo Music, etc. This is a great, solid concept. I would add that as we immerse ourselves in these environs, we should start to look for web conventions, interface design choices, and classification models that we can apply to our library applications.

CIL 2006 – Structured data, Web 2.0, Libraries

Lorcan Dempsey

Second day of the conference and my first post… It’s been busy, but exhilarating. This was a good session that really worked to bring together the possibilities of Web 2.0 for libraries. Lorcan began by emphasizing the need to make bib data work harder – to release the value locked up in library MARC and ISO markup.

Lorcan framed the conversation by using the definition of web 2.0 from Tim O’Reilly. Web 2.0 is:

  1. flat applications – APIs and mashups, RSS, web services; lightweight service composition
  2. rich interaction – AJAX, smooth applications within browser
  3. data is new functionality – collection, exploitation of metadata and bib data
  4. participation – social networking, social tagging

Lorcan walked through how these Web 2.0 features are appearing in OCLC Research applications.

Lightweight Service Composition Example
audience-level Greasemonkey script – an algorithm ranks a book according to WorldCat holdings (library holdings indicate the type of audience – children, adolescent, adult, research…)
http://www.oclc.org/middleeast/en/research/projects/audience/default.htm
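
As I understood it, the insight is that who holds a title hints at who it's for. A toy illustration of that reasoning – the holdings numbers, categories, and thresholds below are invented, not the OCLC algorithm:

<code>
# Toy illustration of inferring audience level from which kinds of libraries hold a title.
# The holdings data and the scoring rule are invented for illustration only.
holdings_by_library_type = {"school": 420, "public": 310, "academic": 12, "research": 3}

def audience_level(holdings):
    total = sum(holdings.values()) or 1
    juvenile_share = (holdings.get("school", 0) + holdings.get("public", 0)) / total
    research_share = (holdings.get("research", 0) + holdings.get("academic", 0)) / total
    if juvenile_share > 0.7:
        return "children/young adult"
    if research_share > 0.7:
        return "research"
    return "general adult"

print(audience_level(holdings_by_library_type))  # children/young adult
</code>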

Rich Interaction Example
Livesearch of LCSH – FRBR-inspired results, narrow by Dewey attributes. Brings the smooth interaction of AJAX to searching bib records and creates an application that is very responsive to user requests. (The new web site tools and technologies session talked about the “how to” of AJAX yesterday. Look for those slides on the conference web site – http://www.infotoday.com/cil2006/Presentations/ .)
http://lcsh.orhost.org/
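
The in-browser half of a live search is AJAX, but the server half can be tiny. Here's a toy sketch of a suggestion endpoint – the heading list, URL parameter, and port are made up, and this bears no relation to the OCLC prototype's code:

<code>
# Toy server-side half of a type-ahead ("live search") suggestion service.
# The heading list is invented; a real service would query an LCSH index instead.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

HEADINGS = ["Cookery", "Cookery, French", "Cooperative societies", "Copyright"]

class SuggestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0].lower()
        matches = [h for h in HEADINGS if h.lower().startswith(query)][:10]
        body = json.dumps(matches).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)  # the AJAX page would render these as suggestions

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SuggestHandler).serve_forever()
</code>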

Data is new functionality example
FictionFinder – an interface that supports searching and browsing of fiction works in WorldCat;
an algorithm based on which books libraries purchase most dictates the display order; faceted browse by work; special indexes created (fictitious character, literary form…)
http://www.oclc.org/research/projects/frbr/fictionfinder.htm

Lorcan closed by calling for libraries to enable people to prospect our data – keep them around. We must imagine new ways to mine our data to show different filters and views to our users.

Jason A. Clark for LITA

E-Matrix: NCSU Library Eresources Management System

Andrew Pace and Stephen Meyer, NCSU Libraries
Sunday, October 1, 2005

A great session with the bigger picture of eResource management in mind. Useful for any librarian looking to manage a dispersed and disparate set of library data. Andrew and Stephen have got their minds around what it means to administer records of database subscriptions, ejournals, and print journals while at the same time managing access and display issues for librarians and the users we work for. It’s an all-encompassing system that reworks how we can manage our growing eResources, and it will involve ALL departments in the library.

Andrew began with a review of Electronic Resource Management (ERM). ERM has been talked about for a while with little progress. The DLF as well as library vendors have started to put some thought into it. Companies and parties have also been looking to develop these systems. (Innovative Interfaces was first to market.) At its core, ERM is about re-envisioning collection management. For some heavy reading on the subject, check the DLF ERMI site (includes a link to DLF ERMI report). Andrew stressed that this was not something that happened overnight. There were rumblings of it around 1999… When it did get scoped out by an NCSU working committee, the E-Matrix had three objectives: managing acquisitions, providing access via discovery/display and collection management.

Stephen presented on scoping the data – deciding on what type of data needed to be included in the E-Matrix. It’s a large moving target, but it was limited to acquisitions data, licensing data, bibliographic data and subject (display) data. Both Andrew and Stephen emphasized that deciding what data they needed was the easy part. The contentious part was deciding what department would be the “authoritative data store” for the E-Matrix. (A role traditionally held by tech services and the Library OPAC…)

After a brief talk about licensing and acquisitions data (stuff that might stay behind the scenes a bit), Stephen continued with a quick rundown of the data for the public interface of E-Matrix. And this is where it got pretty interesting if you’re thinking about how to display multiple facets of resources for your users. The ERM committee asked a group of public service librarians to come up with a vocabulary to use based on the following facets:

  1. Container – type of resource (e.g., article database, online data set…)
  2. Content – what is inside resource (e.g., images, citations…)
  3. Aboutness – what is the resource about (e.g., general subjects – biology, education…)
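
To make the three facets concrete, here's a minimal sketch of how a resource record might carry them – the field names and vocabulary values are my own invention, not the E-Matrix schema or the committee's controlled list:

<code>
# Hypothetical model of an e-resource described by the three NCSU facets;
# the example vocabulary terms are invented, not the committee's actual list.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EResource:
    name: str
    container: str                                       # type of resource, e.g. "article database"
    content: List[str] = field(default_factory=list)     # what's inside, e.g. citations, images
    aboutness: List[str] = field(default_factory=list)   # broad subjects, e.g. biology

resource = EResource(
    name="Hypothetical Biology Index",
    container="article database",
    content=["citations", "abstracts"],
    aboutness=["biology", "ecology"],
)

# Each facet becomes an access point: list every resource containing images, or every
# resource about biology, without hand-maintaining separate subject pages.
</code>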

Just seeing these facets was really helpful. We can get a rich set of access points for our resources if we used each of these options. And that was Stephen’s next point as he showed a mock-up of the public interface of the E-Matrix. It was pretty text-heavy, but it offered lots of access points via tabs, and multiple displays. (A user might need to spend some time there before really getting comfortable with it.) Stephen did a walk-through of the display of the two major components of the E-Matrix – Databases and Journals. Before handing it back over to Andrew, he mentioned some future directions like the ability for librarians to create custom pages with a simple html select and drop-down form. Very cool stuff.

Andrew did talk a bit about the back-end specs: an Oracle database with PL/SQL, plus Java (JSP, Struts). The database schema was flashed on the screen and it was complex – over thirty tables, at least. “It’s a complicated problem” – Stephen. Andrew and Stephen aren’t sure how to share the code, but they would like to. Boston College Libraries has been working on an ERM and has a data dictionary and other documentation available at http://www.bc.edu/bc_org/avp/ulib/staff/erm/erm-db/. BC is not supporting the code – just offering documentation for those interested.

A great macro view of where libraries can go with managing eResources. And even if you can only use bits and pieces of the E-Matrix idea, you’re still going to be improving things. All kinds of information (including presentations) is available at http://www.lib.ncsu.edu/e-matrix/. For another take on the session, check Karen Coombs’s earlier post from her blog.

Utilizing the Benefits of Native XML Database Technologies

Alan Cornish – Systems Librarian, Washington State University Libraries

Another take on the session… You should also check Karen’s earlier post.

What’s a native XML database, exactly? Alan defines native XML as a document storage and retrieval model where an XML document is considered the basic unit of storage, the database is DTD- or schema-independent, and an XML-specific query language is used to manage, retrieve, and display data. No relational databases or SQL here, kids.

Alan gave a nice, brief overview of XML, DTDs and then introduced the software used for his project. Textml is XML server software from Ixiasoft. He also mentioned Cooktop – some freeware that actually worked as a pretty robust XML editor.

Alan demoed how Textml works, showed the query syntax, and drew comparisons with SQL statements. For those of us used to SQL syntax (SELECT * FROM tablename WHERE keyword = 'things'), this native XML stuff is a whole different ballgame – the query syntax matches XML syntax (i.e., the query is actually another XML document). It’s verbose and a little intimidating.

Here’s an example based on the same query above:

<code>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:apply-templates select="tablename"/>
  </xsl:template>
  <xsl:template match="tablename">
    <xsl:value-of select="things"/>
  </xsl:template>
</xsl:stylesheet>
</code>

This example is not spot on, but you get the idea. SELECT * FROM … is looking pretty good.
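
For a rough comparison, here's how that same retrieval might look with a generic XML toolkit – plain XPath via Python's lxml, not Textml's query dialect, and the element names simply mirror the made-up SQL above:

<code>
# Rough XPath equivalent of SELECT * FROM tablename WHERE keyword = 'things',
# using lxml; the element names just mirror the made-up SQL example above.
from lxml import etree

doc = etree.fromstring("""
<rows>
  <tablename><keyword>things</keyword><title>First match</title></tablename>
  <tablename><keyword>other</keyword><title>Not a match</title></tablename>
</rows>
""")

matches = doc.xpath("//tablename[keyword='things']")
for row in matches:
    print(row.findtext("title"))   # First match
</code>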

So, how can applications use native XML? Alan showed how a pilot version of the NWDA (Northwest Digital Archives) is going to use native XML to manage, find, and reuse archival finding aid content. Alan showed the XML index that runs the NWDA – well-formed XML documents that map out the relationships between items and collections. We also got to see how XML can be used to create a pretty standard search and retrieval interface: find titles, free-text search, browse records screen. Online XSL transformation creates the display for individual items, which could become a bottleneck if the XML document gets large. People asked questions about retrieval performance, and Alan pointed out that when search and retrieval performance lagged, it usually happened during the XSL transformation stage.

One of the more interesting tidbits from the session was Adobe’s Extensible Metadata Platform (XMP). XMP is an Adobe metadata option based on the W3C’s RDF that packages basic metadata and embeds it within PDF, JPEG, and TIFF files (Adobe-friendly formats). It’s really simple to add to Adobe documents: using the document properties menu in Acrobat Professional 7, you can enter your metadata. This is pretty cool stuff and could really be useful if you began a digitization project by marking up PDFs, JPEGs, or TIFFs with XMP. As proof of concept, Alan showed how to query the XMP packet using Textml. XMP would be really easy to use with an Electronic Theses and Dissertations project or any project that relies on Adobe document types.
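
Because the XMP packet is just RDF/XML embedded in the file, you can get at it without Adobe tools. A rough sketch using only the Python standard library – the filename is a placeholder, and a real project would probably reach for a dedicated XMP toolkit:

<code>
# Rough sketch: find the embedded XMP packet in a PDF/JPEG/TIFF by scanning for its
# x:xmpmeta wrapper, then read Dublin Core title values out of the RDF.
# "thesis.pdf" is a placeholder filename.
import xml.etree.ElementTree as ET

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

def extract_xmp_packet(path):
    data = open(path, "rb").read()
    start = data.find(b"<x:xmpmeta")
    end = data.find(b"</x:xmpmeta>")
    if start == -1 or end == -1:
        return None
    return data[start:end + len(b"</x:xmpmeta>")].decode("utf-8", errors="replace")

packet = extract_xmp_packet("thesis.pdf")
if packet:
    root = ET.fromstring(packet)
    for title in root.iter(DC_TITLE):
        # dc:title values sit inside an rdf:Alt/rdf:li language alternative
        print([li.text for li in title.iter() if li.text and li.text.strip()])
</code>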

Other XML server software equivalents to Textml: Tamino, NATIX, and eXist (open source). Try a Google search on “native xml server” for additional options.

It was another worthwhile session. Not sure I got my head around all of it… Oh well, it’s gettin’ late and I’m off for some food and grog.

Pervasive XML for the Digital Library: Tools, Tricks, and Techniques

Beth Goldsmith, Los Alamos National Laboratory

A nice session from yesterday that had the feel of an XML workshop. Beth offered a quick introduction to XML and XSLT (Extensible Stylesheet Language Transformations) and then got into the nitty-gritty of how Los Alamos is applying the technology.

Some of the highlights:

Beth mentioned one of the themes of Roy Tennant’s keynote: agility. When speaking of agility, Roy was talking about our need to develop applications and systems faster, but to do it in a way where different pieces of our applications can be re-used in different contexts – a modular approach moving away from the proprietary, vendor-specific model. (The poor OPAC was the whipping-boy example again.) Anyway, Beth’s point was that XML enables the modular approach and allows us to take data and re-purpose it.

Currently, the Los Alamos workflow looks something like this:

  1. Work with vendors to buy data
  2. Transform the data with XSLT into MARCXML (see the sketch after this list)
  3. Create search and retrieval applications to display the data to users
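
Step 2 is where XSLT does the heavy lifting. A minimal sketch of driving that transformation with Python's lxml – the stylesheet and file names are placeholders, not the actual Los Alamos ones:

<code>
# Minimal sketch of step 2: apply an XSLT stylesheet to vendor XML to produce MARCXML.
# Uses lxml; "vendor_records.xml" and "vendor_to_marcxml.xsl" are placeholder names.
from lxml import etree

source = etree.parse("vendor_records.xml")
transform = etree.XSLT(etree.parse("vendor_to_marcxml.xsl"))
marcxml = transform(source)

with open("records_marcxml.xml", "wb") as out:
    out.write(etree.tostring(marcxml, xml_declaration=True, encoding="UTF-8", pretty_print=True))
</code>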

Most of us wouldn’t have the personnel to do this type of work in-house, but it’s an intriguing idea. In many ways, we outsource the work and are bound to programmers outside of our industry to make our applications. Beth’s point was that we can start to take more control and empower from within. She pointed out that by spending a day or two teaching Los Alamos metadata librarians how to use XML and XSLT, she was able to bring the people with intimate knowledge of the metadata into the programming process – cutting out intermediaries.

The rest of the session got into the “how to” side of things. I don’t want to make your eyes glaze over…

Strengths of XML – valid, well-formed, can be transformed into many different objects

Drawbacks of XML – bloated file size (a simple, delimited text file transformed to MARCXML blew up to nearly 3 times its original size).

XML Tools and tips:

  • Altova XMLSpy – good for coding and modeling; its MapForce component can start to crosswalk between XML schemas
  • Browsers – IE and Mozilla have XML rendering engines that can read XSL stylesheets; great for debugging
  • References: check Standard XML libraries at SourceForge, get stylesheets at TopXML

It was a long and rich session and I’m really only scratching the surface. Check out the full version of the talk at http://library.lanl.gov/lww/articles/LITA2005/Presentation/