
A "Next generation" library catalog – Technologies (Part #3 of 5)

This is an outline of computer technologies for implementing a “next generation” library catalog. In two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. It provides its intended audience with a more effective means for finding and using data and information.

The full text of this document formatted as a single HTML page is available at:

http://dewey.library.nd.edu/morgan/ngc/

Technological design

Model for a “next generation” library catalog

Technically speaking, this “next generation” library catalog is the combination of a relational database with a full-text index. Access to this database/index combination is provided through open standards such as Z39.50, SRW/U, OpenURL, and OAI-PMH.

Database

The database is designed to contain XML files enhanced with facet/term combinations. Whenever possible these XML files (based on any DTD or XML schema) will represent a complete, atomic information resource such as a book marked up in TEI or an article in XHTML. When it is impossible or impractical to acquire the full text of a resource, a metadata record describing the resource will be used instead. In these cases MODS records, EAD files, or something like Dublin Core elements harvested from OAI repositories and packaged as RDF streams will be stored in the database.

The description of each XML record in the database will be enhanced with one or more facet/term combinations. The definition and creation of these combinations are left up to each library to articulate, but some facets might include: Formats, Audiences, Disciplines, XML Types, or Flags. Some example terms might include: Books, Journals, Articles, Freshman, Sophomores, Juniors, Astronomy, Music, Philosophy, MODS, TEI, RDF, Downloadable, Checked-out, or World-accessible.

Given this design, the database contains only four tables, as illustrated by the following entity-relationship diagram:

Entity-relationship diagram for the “next generation” library catalog
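
As a rough illustration of how small such a database can be, here is a minimal sketch using Python and SQLite. The table and column names (records, facets, terms, record_terms) are assumptions made for this example, as is the choice of SQLite; they are one plausible reading of a four-table design, not a transcription of the diagram.

```python
import sqlite3

# Hypothetical four-table layout: XML records, facets, terms that belong
# to a facet, and a join table linking records to terms.
SCHEMA = """
CREATE TABLE records (
    id  INTEGER PRIMARY KEY,
    xml TEXT NOT NULL
);
CREATE TABLE facets (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
    id       INTEGER PRIMARY KEY,
    facet_id INTEGER NOT NULL REFERENCES facets(id),
    name     TEXT NOT NULL,
    UNIQUE (facet_id, name)
);
CREATE TABLE record_terms (
    record_id INTEGER NOT NULL REFERENCES records(id),
    term_id   INTEGER NOT NULL REFERENCES terms(id),
    PRIMARY KEY (record_id, term_id)
);
"""


def build_catalog():
    """Create an empty in-memory catalog database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_record(conn, xml, facet_terms):
    """Store one XML record and tag it with (facet, term) pairs."""
    record_id = conn.execute(
        "INSERT INTO records (xml) VALUES (?)", (xml,)
    ).lastrowid
    for facet, term in facet_terms:
        # Create the facet and term on first use, then look up their ids.
        conn.execute("INSERT OR IGNORE INTO facets (name) VALUES (?)", (facet,))
        facet_id = conn.execute(
            "SELECT id FROM facets WHERE name = ?", (facet,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO terms (facet_id, name) VALUES (?, ?)",
            (facet_id, term),
        )
        term_id = conn.execute(
            "SELECT id FROM terms WHERE facet_id = ? AND name = ?",
            (facet_id, term),
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO record_terms VALUES (?, ?)",
            (record_id, term_id),
        )
    return record_id


def records_with(conn, facet, term):
    """Return the ids of all records tagged with the given facet/term."""
    rows = conn.execute(
        """
        SELECT r.id
        FROM records r
        JOIN record_terms rt ON rt.record_id = r.id
        JOIN terms t ON t.id = rt.term_id
        JOIN facets f ON f.id = t.facet_id
        WHERE f.name = ? AND t.name = ?
        ORDER BY r.id
        """,
        (facet, term),
    )
    return [row[0] for row in rows]
```

Under these assumptions, browsing by facet (say, everything whose Formats facet has the term Books) is a single join across the four tables, and each library remains free to define its own facets and terms as rows rather than as schema changes.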

Indexer

Databases are great for storing data and information, but, ironically, they are weak at finding data and information. Indexers are better suited to facilitating search, hence the need for an indexer in the “next generation” library catalog.

The entire content of each record in the database will be indexed. This means each record will be full-text searchable. To improve both precision and recall, each record will also be indexed according to its most significant XML elements. Because the database will contain a wide variety of XML files represented by different DTDs and schemas, a best-practices effort will be made to additionally map the XML elements to Dublin Core names. Using this approach, records from the index should be accessible via free-text terms/phrases, DTD/schema-specific element names, as well as a set of commonly used fields represented by Dublin Core. At the very least the indexer is expected to support Unicode and incremental updating. The native query language of the indexer is expected to support free-text, phrase, and field searching as well as Boolean operations, nested queries, right-hand truncation, stemming, sorting, and relevance ranking.
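
To make the mapping idea concrete, here is a minimal sketch in Python of turning an XML record into something an indexer could consume. Everything named here is an assumption for illustration: the DC_MAP table, the to_index_document function, and the simplified, namespace-free TEI and MODS element paths. A real mapping would be agreed on by each library and would need to handle namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from schema-specific element paths to Dublin Core
# names. The paths are simplified and ignore XML namespaces.
DC_MAP = {
    "tei": {
        "title": ".//titleStmt/title",
        "creator": ".//titleStmt/author",
    },
    "mods": {
        "title": ".//titleInfo/title",
        "creator": ".//name/namePart",
    },
}


def to_index_document(xml_text, xml_type):
    """Build a dict with a full-text field plus Dublin Core fields."""
    root = ET.fromstring(xml_text)

    # Full text: every text node in the record, whitespace-normalized.
    pieces = (text.strip() for text in root.itertext())
    doc = {"fulltext": " ".join(piece for piece in pieces if piece)}

    # Fielded data: look up each mapped element and file it under its
    # Dublin Core name.
    for dc_name, path in DC_MAP[xml_type].items():
        values = [
            el.text.strip()
            for el in root.findall(path)
            if el.text and el.text.strip()
        ]
        if values:
            doc[dc_name] = values
    return doc
```

The resulting dictionary could then be handed to whatever indexer is in use, giving both a free-text field and a small set of common fields that behave the same way regardless of the record's original DTD or schema.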

End-user access

End-user access to the database/index will be through standard protocols such as Z39.50, SRW/U, OpenURL, and OAI-PMH. Access to the system through the native interfaces of the database or indexer is discouraged. While the use of the protocols may bypass some “kewl” features of the underlying database or search engine, they will enable the system to be more modular in nature. Such an approach will enable one indexer to be supplanted by another with much greater ease. Similarly, when new databases become available or better protocol implementations are created, the older tools can be replaced with their improved counterparts.

This does not mean end-users will be expected to know the specific syntaxes of the abstract protocols in order to access the system. At the outset users will be presented with an HTML-based interface. From there they will be able to search and browse the system. Queries will be sent to the Z39.50 or SRW/U servers for processing and items will be returned supplemented with ways to expand and narrow the results. This search/browse process is repeated until the end-user identifies specific items of interest.
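
As an illustration of what the HTML interface might do behind the scenes, here is a minimal sketch in Python that turns form fields into a CQL query and wraps it in an SRU searchRetrieve URL. The function names and the base URL are hypothetical, and the dc.title and dc.creator index names assume the server exposes a Dublin Core context set; only the request parameters (operation, version, query, startRecord, maximumRecords) come from the SRU specification.

```python
from urllib.parse import urlencode


def cql_from_fields(fields):
    """Combine index/value pairs into a CQL query joined with 'and'."""
    clauses = []
    for index, value in fields.items():
        # Escape backslashes and double quotes inside the quoted term.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{index} = "{escaped}"')
    return " and ".join(clauses)


def sru_search_url(base_url, cql, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # catalog.example.org is a placeholder, not a real service.
    query = cql_from_fields({"dc.title": "walden", "dc.creator": "thoreau"})
    print(sru_search_url("http://catalog.example.org/sru", query))
```

Because the interface only ever emits a protocol-level request like this, the indexer behind the SRU server can be swapped out without touching the HTML front end.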

When an item from the search results is selected by the user, an OpenURL is sent to a resolver. The resolver will return a list of functions (services) that can be applied against the item. Borrow. Download. Edit. Review. Annotate. Share. Delete. Index. Search. Find More Like This One. Discuss. Find author and email them. Create citation. Summarize. Write paper. Trace idea backwards. Trace idea forwards. Print. Email. Collaborate. Save. Archive. Translate. Show definition. Show synonym. Graph. Chart. Prioritize. Evaluate. Rate. Rank. Illustrate. Authenticate source. Etc. This list of services will be generated based on the characteristics of the item and represents a set of things the user can do with the item. For example, full-text articles are downloadable, and physical books are borrowable. The list of services is limited only by the time, energy, and imagination of the implementors.
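
One way a resolver might derive services from an item's characteristics is a simple rule table, sketched here in Python. The SERVICE_RULES table and the services_for function are hypothetical, and the sketch assumes an item's characteristics arrive as the set of terms attached to its record (for example Books, Downloadable, or Checked-out).

```python
# Hypothetical rule table: each service is paired with a test over the
# set of terms describing an item.
SERVICE_RULES = [
    ("Download", lambda terms: "Downloadable" in terms),
    ("Borrow", lambda terms: "Books" in terms and "Checked-out" not in terms),
    ("Create citation", lambda terms: True),  # offered for every item
]


def services_for(terms):
    """Return the services whose rule matches the item's terms, in order."""
    terms = set(terms)
    return [name for name, applies in SERVICE_RULES if applies(terms)]
```

Under these rules a downloadable article would be offered Download and Create citation, while a book that is already checked out would be offered only Create citation. Adding a new service amounts to adding one more rule.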