technical skills of librarianship

The other day Someone asked me how they could move from reference work into more systems-oriented work in libraries. I was happy to share my thoughts on the topic, and below is what I said.

Food for thought.

Someone wrote:

> Anyway, I am seeking your advice. I am very hungry to move from my
> reference position into a systems related position, preferably one
> that centered on web application development for libraries….
>
> Can you offer any advice?

I can offer lots o’ advice, and I’m always happy to help out in these regards. 😉

I sincerely believe librarianship is overflowing with opportunities for people who want to exploit computers to facilitate library-like activities. While it is difficult to ignore the changes in Library Land brought about by globally networked computers (the Internet), I do not think libraries will be disappearing. Certainly the processes of librarianship (collecting, organizing, archiving, and disseminating data and information for the purposes of increasing knowledge and understanding) are still needed and desired by all sorts of communities. Put another way, there are plenty of opportunities for librarians who want to apply computers to traditional library tasks.

I suggest you spend time learning the following five technologies, listed in priority order, to increase your skill set:

1) XML – XML is a sort of modern-day alchemy. It represents a way to turn data into information, as well as a way to unambiguously transmit that information from one computer to another. Take the following string of characters: 1776. It is just an integer. It has no context. One thousand seven hundred and seventy-six what? By marking up the integer in an agreed-upon fashion, say `<year>1776</year>`, the integer takes on a new meaning. It has value, and thus it becomes information. Magic. Since the syntax for marking up XML is simple and agreed upon, there are many communities (government, business, academia, etc.) that use tools to read and use XML information. This represents a giant opportunity for the library community.
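Here is a minimal Python sketch of that idea using the standard library's `xml.etree.ElementTree`. The element names (`<event>`, `<name>`, `<year>`) are hypothetical and were chosen only for illustration. The point is that a program reading the markup, and not just the digits, can recover what the number means.

```python
import xml.etree.ElementTree as ET

# On its own, this string is just four digits with no context.
raw = "1776"

# The same digits marked up with element names. The vocabulary here
# (<event>, <name>, <year>) is hypothetical, chosen only for illustration.
marked_up = (
    "<event>"
    "<name>Declaration of Independence</name>"
    "<year>1776</year>"
    "</event>"
)

root = ET.fromstring(marked_up)
year = root.find("year")

# A program can now tell what the number *is*, not just what it looks like.
print(f"{year.tag}: {year.text}")  # year: 1776
print(f"{root.findtext('name')} happened in {int(year.text)}")
```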

All you need to learn XML is a plain-text editor and a relatively modern Web browser such as Firefox. Mark up, by hand, things like electronic texts. Associate them with CSS or XSLT stylesheets. Process them with an XML-aware application like a Web browser, and you end up with human-readable information. No programming. Just information and display.

Attached is an example I was working on yesterday. The XML file contains the results of a search against an SRU-accessible index, and the XSL file transforms that XML into HTML for display. Save both the XML and XSL files in the same directory, and then open the XML file with Firefox (or some other ‘modern’ Web browser) to see what I mean.
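The attachment is not reproduced here, but the same idea can be sketched in Python. The standard library has no XSLT processor (the third-party lxml package does). This sketch therefore walks the XML with `xml.etree.ElementTree` instead of applying a stylesheet. The `<results>`/`<record>` structure and its contents are a simplified, hypothetical stand-in for real SRU output, which is namespaced and considerably richer.

```python
import xml.etree.ElementTree as ET
from html import escape

# A simplified, hypothetical stand-in for a set of search results.
# Real SRU responses are namespaced and considerably richer.
results_xml = """<results query="dc.title=xml">
  <record><title>An Introduction to XML</title><creator>Smith, J.</creator></record>
  <record><title>XML &amp; Libraries</title><creator>Jones, B.</creator></record>
</results>"""


def results_to_html(xml_text: str) -> str:
    """Render search results as an HTML list, roughly the job an XSLT
    stylesheet does when a browser opens the XML file."""
    root = ET.fromstring(xml_text)
    items = []
    for record in root.findall("record"):
        # escape() re-encodes characters such as & and < for HTML output.
        title = escape(record.findtext("title", default="(untitled)"))
        creator = escape(record.findtext("creator", default="(unknown)"))
        items.append(f"<li><cite>{title}</cite> by {creator}</li>")
    query = escape(root.get("query", ""))
    return (
        "<html><body>\n"
        f"<h1>Results for {query}</h1>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>\n"
        "</body></html>"
    )


print(results_to_html(results_xml))
```

A stylesheet keeps the display rules outside of any program. That is why opening the XML file directly in a browser is such a low-cost way to experiment.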

2) Relational databases – Libraries love lists. We maintain lists of books, journals, Internet resources, authority files, controlled vocabulary terms, etc. In an electronic environment, lists are best created and maintained using relational databases. Learn how to design, create, and maintain relational databases using the lingua franca of relational database technology: SQL (Structured Query Language). There are many relational database applications to explore, including Microsoft Access, Filemaker Pro, MySQL, and Postgres. Access is widely supported in the desktop environment. Filemaker is cross-platform, has a great interface, exports XML, and includes an integrated HTTP (Web) server. Postgres is very standards-compliant but technologically a bit challenging. MySQL is widely popular but comes with little or no graphical interface of its own.

If you have access (no pun intended) to Filemaker, then I suggest starting with it, simply for the interface. Second, I suggest MySQL; it installs nicely on a desktop computer as well as on a server. Get a book like Databases for Mere Mortals in order to become familiar with the SQL language and normalization techniques. Learn the huge difference between “flat files” and real databases.

Excel and tab-delimited files are not “real” database applications!
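You can get a feel for normalization without installing anything. The following sketch uses SQLite through Python's built-in `sqlite3` module, swapped in here for Filemaker or MySQL. The table names, column names and sample rows are hypothetical. Authors are stored once and referenced by key, which a flat file or spreadsheet cannot enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: each author is stored once and referenced by key,
# instead of retyping the name in every row of a flat file.
cur.executescript("""
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(author_id)
);
""")

cur.executemany("INSERT INTO authors (author_id, name) VALUES (?, ?)",
                [(1, "Smith, Jane"), (2, "Jones, Bob")])
cur.executemany("INSERT INTO books (title, author_id) VALUES (?, ?)",
                [("Cataloging Basics", 1),
                 ("Reference Work Today", 1),
                 ("Systems for Libraries", 2)])

# Correcting an author's name is one UPDATE, not a search-and-replace
# across every row that mentions her.
cur.execute("UPDATE authors SET name = ? WHERE author_id = ?",
            ("Smith, Jane Q.", 1))

# A JOIN reassembles the "list" people actually want to see.
rows = cur.execute("""
    SELECT b.title, a.name
    FROM books AS b JOIN authors AS a ON a.author_id = b.author_id
    ORDER BY a.name, b.title
""").fetchall()

for title, name in rows:
    print(f"{title} -- {name}")
```

The SQL here (CREATE TABLE, INSERT, UPDATE, SELECT with a JOIN) largely carries over to MySQL or Postgres, with only minor dialect differences.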

3) Indexing – Believe it or not, databases suck at facilitating search, especially considering today’s user expectations regarding relevance ranking. In order to search a (relational) database, queries must be applied against specific fields; relational databases do not support free-text searching. Databases can fake it by applying queries to many fields, but this gets overly complicated very quickly. Furthermore, relational databases do not have the facility to rank search results according to some statistical analysis; they can only sort results numerically or alphabetically.
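Here is a sketch of why faking free-text search gets clumsy. It again uses SQLite via Python's `sqlite3`, with a hypothetical table, and searches three fields at once with `LIKE`. SQLite's `LIKE` is case-insensitive for ASCII text by default. Every field has to be named in the query, and results can only be ordered by a column, not by how relevant each record is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, subject TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO records (title, subject, notes) VALUES (?, ?, ?)",
    [
        ("Indexing for beginners", "Information retrieval",
         "Covers indexing and indexing tools"),
        ("Cataloging rules", "Indexing", "Mentions indexing once"),
        ("Web servers", "Computing", "Nothing relevant here"),
    ],
)


def fake_free_text(term):
    """Search several fields for a substring. Adding a searchable field
    means editing this query by hand."""
    pattern = f"%{term}%"
    return conn.execute(
        """SELECT title FROM records
           WHERE title LIKE ? OR subject LIKE ? OR notes LIKE ?
           ORDER BY title""",  # alphabetical order, not relevance
        (pattern, pattern, pattern),
    ).fetchall()


print(fake_free_text("indexing"))
```

The record that mentions the term most often comes second here, simply because "Cataloging" sorts before "Indexing".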

Indexers overcome these problems. Indexers read sets of documents, break them up into their atomistic parts (words), and write these parts to a file along with pointers back to the original documents. Searches are then applied to these lists. They work much like back-of-book indexes, except that all of the words in the documents are included, not just the ones a human thought were important.
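The core of an indexer is an inverted index: a mapping from each word to the documents that contain it. The following is a toy Python sketch of the idea, not a description of how swish-e is implemented. The document names and texts are hypothetical.

```python
import re
from collections import defaultdict

documents = {
    "doc1.html": "Libraries collect, organize, and disseminate information.",
    "doc2.html": "Indexers break documents into words.",
    "doc3.html": "Libraries love lists and indexes.",
}


def tokenize(text):
    # Lowercase, then keep runs of letters and digits as "words".
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of documents containing it (the 'pointers'
    back to the originals)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(documents)
print(sorted(search(index, "libraries")))        # ['doc1.html', 'doc3.html']
print(sorted(search(index, "libraries lists")))  # ['doc3.html']
```

Searching never rereads the documents. It only looks words up in the index, which is why indexes stay fast as collections grow.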

To learn about indexing, get a program called swish-e. Then acquire bunches of HTML or XML documents and save them in a directory. Turn swish-e loose on the directory to create an index, and then use swish-e to search the resulting index. As your experience matures you will learn how to write reports against your database, pipe them to your indexer, and search your database that way. Either way, you will end up with a much more powerful search interface than SQL SELECT statements provide.
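The workflow of writing a report against a database and handing it to an indexer can also be sketched in Python. This is a toy stand-in for swish-e, not its actual ranking algorithm. It scores matches with a simple TF-IDF sum: term frequency times a smoothed inverse document frequency. That is one common way to rank results statistically. The table and records are hypothetical.

```python
import math
import re
import sqlite3
from collections import Counter

# 1. A small database of hypothetical records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
conn.executemany(
    "INSERT INTO items (title, abstract) VALUES (?, ?)",
    [
        ("XML basics", "XML markup turns data into information"),
        ("Relational databases", "Tables, keys, and SQL"),
        ("Web publishing", "Serve XML and HTML with Apache"),
    ],
)

# 2. "Write a report": one plain-text document per database row.
report = {
    row_id: f"{title} {abstract}"
    for row_id, title, abstract in conn.execute("SELECT id, title, abstract FROM items")
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# 3. "Pipe it to the indexer": count each word per document, and count
#    how many documents each word appears in.
term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in report.items()}
doc_freq = Counter(word for counts in term_counts.values() for word in counts)
n_docs = len(term_counts)


def ranked_search(query):
    """Score documents with a simple TF-IDF sum and return (id, score)
    pairs, best match first."""
    scores = Counter()
    for word in tokenize(query):
        if word not in doc_freq:
            continue
        idf = math.log(n_docs / doc_freq[word]) + 1  # rarer words weigh more
        for doc_id, counts in term_counts.items():
            if word in counts:
                scores[doc_id] += counts[word] * idf
    return scores.most_common()


print(ranked_search("xml"))
```

Record 1 mentions "xml" twice and record 3 only once, so record 1 ranks first. This is the kind of relevance ordering that a plain SQL `ORDER BY` cannot express.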

Databases and indexing are two sides of the same information retrieval coin.

4) Web serving – Increasingly, people expect to acquire the information they require for learning, teaching, and research through a Web browser. In order to meet these expectations, libraries need to host Web (HTTP) servers.

Believe it or not, this is really easy. Get a copy of Apache, install it on any Internet-accessible computer, and start filling it up with stuff. You do not need a big bad computer for this, and I challenge you to fill it up with so much stuff that it becomes too slow. (Actually, the challenge here will be putting into practice the principles of good information architecture: specifically, asking and then answering questions about the context, content, and users of your server.) When your server gets full, you will have learned a whole lot and will be ready to go to the next step.
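Apache is the right tool for a real server. For experimenting on a desktop, though, Python's built-in `http.server` module can serve a directory of files. The sketch below needs Python 3.7 or later. It starts a server on a free local port, fetches a page from it the way a browser would, and shuts the server down. The page content is hypothetical. The Python documentation itself warns that `http.server` is not meant for production use.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request

# A throwaway document root holding one page (a stand-in for your "stuff").
docroot = pathlib.Path(tempfile.mkdtemp())
(docroot / "index.html").write_text("<h1>Hello from the library</h1>", encoding="utf-8")

# Serve files from docroot. Port 0 asks the operating system for any free port.
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(docroot))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Request the page over HTTP, just as a Web browser would.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html") as response:
    page = response.read().decode("utf-8")
print(page)

# Stop the serving loop and release the socket.
server.shutdown()
server.server_close()
```

From a command line, `python -m http.server 8000 --directory site` does roughly the same thing for a folder named `site` (a hypothetical name).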

5) Programming/scripting – Finally, you will want to “glue” all of the above technologies together into a coherent whole. You will want to streamline the data-entry and reporting processes. You will want these processes to run regularly and automatically. You will want graphical user interfaces to your XML data, relational databases, searching functionality, etc.

To make this happen you will need to write computer programs. I prefer Perl, but just about any computer language will do. Each has its own strengths and weaknesses. Java is probably already installed on your desktop computer. Perl is particularly strong for manipulating texts. PHP is particularly adept at Web applications. Programming requires a person to think very systematically, almost mathematically. The programmer must understand the step-by-step processes required of a system. There are no leaps of comprehension in computer programs. Computer programming is keen on syntax (as XML is), but once that syntax is mastered, real productivity occurs. (Ironically, marking up cataloging records using AACR II requires similar skills and attention to detail.)
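As an example of glue, here is a small Python sketch that reads rows out of a relational database and writes them out as XML. That XML could then be styled with XSLT, fed to an indexer, or placed on a Web server. Python stands in for Perl here. The `resources` table, its columns, the `export_resources` helper and the sample rows are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_resources(conn: sqlite3.Connection) -> str:
    """Hypothetical glue step: read rows from a relational database and
    emit them as an XML document."""
    root = ET.Element("resources")
    for rid, title, url in conn.execute(
        "SELECT id, title, url FROM resources ORDER BY title"
    ):
        item = ET.SubElement(root, "resource", id=str(rid))
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "url").text = url
    return ET.tostring(root, encoding="unicode")


# Sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO resources (title, url) VALUES (?, ?)",
    [("Zebra Index", "https://example.org/zebra"),
     ("Alpha Catalog", "https://example.org/alpha")],
)

xml_text = export_resources(conn)
print(xml_text)
```

You could save this as a script and schedule it with something like cron. The export would then run regularly and automatically, and no one would have to retype data.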

As far as I know, very few accredited library schools teach these skills to any great degree. Our profession is aging quickly, and there is not a critical mass of library practitioners who can apply these technologies while also understanding the principles of librarianship. At the same time, the processes of librarianship, with the possible exception of archiving, can be closely associated with the technologies outlined above:

* collection – XML, databases, and programming
* organization – XML, databases, HTTP servers, and programming
* preservation – XML (maybe)
* dissemination – XML, indexing, HTTP servers, and programming

Please do not be overwhelmed. All of these things can be learned and practiced on your desktop or home computer. They lend themselves better to server-class operating systems such as Unix/Linux, but learning about these operating systems is challenging in itself and not readily applicable to librarianship. All you need is the ability to read books, the desire to learn, and the time to do it.

Good luck, and I hope this helps.

—
Eric Lease Morgan
University Libraries of Notre Dame

August 7, 2005