2008

Do They Really Know What They Need?

User-Centered Design for Humanities Collections within a Digital Library – LITA Forum 2008

Mark Phillips and Kathleen Murray of the University of North Texas presented jointly on the challenges, goals, and outcomes of user-centered design for humanities collections within a digital library.

Mark is the Head of the Digital Projects Unit and has been involved with software development and digital content creation for the Portal to Texas History. Kathleen is a postdoctoral research associate working in the Digital Projects Unit at the University of North Texas Libraries. She has been involved in state-wide and national digital library projects and has presented at major library and information science conferences in the areas of needs assessment, digital libraries, and web archiving.

Mark and Kathleen took turns presenting the challenges. Mark started by giving the technical background of the IOGENE project.

They approached the project in three steps:

1.    Evaluate the Digital Projects Unit (DPU) infrastructure and plan a new one

2.    Consult the IOGENE user studies: what challenges do the users' needs bring to the table?

3.    Process Model

DPU
Their 2004 infrastructure was built on the Keystone digital library system.

Advantages: open source and highly customizable. They were able to really tweak the system to suit their needs. It provides automatic metadata entry forms and saves files as XML on a file system.

Disadvantages: The Keystone user group was very small, so if there was a problem, they had to solve it themselves. The system was often tied to very outdated software, which made it difficult to move forward. And since they had customized it so heavily, they were not able to share their code with any other users. Scaling was also difficult: as they grew, they had problems managing the system. The technologies were difficult to work with going forward and worked against the developers.

Their 2008 infrastructure uses Solr for indexing and the Django web framework.
Advantages: it uses open source technologies, is highly customizable and highly scalable, is standards-based (ARK, METS, DC, MODS, SRU), is designed around their specific requirements, and keeps XML files on a file system.

Disadvantages: in-house development takes time and money!

Components: Python, Django, Subversion, Trac, jQuery, Solr, OpenLayers, Apache, Ubuntu/SUSE, MySQL, memcached
Standards and specifications: METS, ARK, MODS, Dublin Core, Grabit, BagIt, PairTree, SRU, ARC/WARC
This was a great way to look at software development at the University of North Texas.
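PairTree, one of the specifications listed above, maps an object identifier such as an ARK to a directory path, which fits the "XML files on a file system" design. The talk did not show how UNT applies it, so what follows is only a minimal Python sketch of the mapping as we read the PairTree specification: hex-encode unsafe characters as ^xx, swap "/", ":" and "." for "=", "+" and ",", then split the result into two-character directory names. The function names and the sample ARK are hypothetical, chosen for illustration.

```python
# Minimal PairTree sketch (our reading of the spec; names are hypothetical).

# Characters the spec hex-encodes as ^xx, in addition to every byte
# outside visible ASCII (0x21-0x7E).
_HEX_ENCODE = set('"*+,<=>?\\^|')

# Single-character substitutions applied to the remaining characters.
_SWAP = {"/": "=", ":": "+", ".": ","}


def clean_id(identifier: str) -> str:
    """Turn an identifier into a string that is safe to use in directory names."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append(f"^{byte:02x}")  # e.g. "*" becomes "^2a"
        else:
            out.append(_SWAP.get(ch, ch))
    return "".join(out)


def id_to_ppath(identifier: str, root: str = "pairtree_root") -> str:
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_id(identifier)
    shorties = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root, *shorties]) + "/"


print(id_to_ppath("ark:/13030/xt2t3"))
```

For the sample ARK this prints pairtree_root/ar/k+/=1/30/30/=x/t2/t3/, so each object lands in a shallow, predictable directory that can be located from its identifier alone, without a database lookup.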

Kathleen:
Background on the IOGENE project: they have seen a great deal of growth. But why genealogy? It would be fun, an increasing number of people want to do genealogical research, a growing number of seniors are doing that research, and researchers increasingly use internet resources to do it.

In setting up this project, they first decided on an information retrieval framework for genealogists. For background, Kathleen gave a brief demonstration of how the following genealogy sites work: FamilySearch, Heritage Quest, and Ancestry.com. Each of these sites offers the searcher multiple ways to approach a search.

The Portal to Texas History has only one field in its basic search, but it also has an advanced search. The question was how to change the interface to best suit their users' needs.

To assess their portal, they used focus groups, usability testing, and comment log analysis. The participants were members of genealogical societies and portal users.

The key findings were that genealogists are primarily interested in names (first name and surname), locations (county, city, state, community), and time period, whereas their digital library places an emphasis on title, author/creator, and subject.

They decided that they needed to rethink their operations and planned direction in order to better serve their genealogical users by enabling discovery of their collections and content.
While their digital library is very standards-driven, the genealogical portals are more content-driven.

Challenges:
1.    Name searches: really needed and requested by the user groups. Priority was needed for “exact phrase” search, and users wanted visibility and guidance for “name” searches. Basing the data model on Dublin Core creates real problems here: the model gets “mushy” as you add in creator names, subject names, etc.
2.    Advanced search: they wanted county name, subject, era, Soundex code, and ‘names begin with’ options, and wanted to be able to identify the familiar object “types”. They also wanted to select the number of search results, including “all.”
3.    Relevance: this is a problem, because everyone defines it differently. The current approach is to search surname, location, and date together, then display the results by relevance in this order: exact phrase(s), adjacent terms, terms located near each other, then single terms. Luckily, they are able to modify their code fairly easily to accommodate most of these requests.
4.    Search results: for serials, they wanted to see the title as well as the table of contents. For objects that are part of the same series, they wanted one listing with all volumes listed. For “hits in text,” they wanted the number of hits within the text.
They wanted to know an object's size and whether it was available to download, and they wanted that shown automatically! Selected objects should open in new windows or tabs. They also wanted to limit results by first name or date range, and a grid display with limited metadata, that is, a limited number of fields displayed.

They were not able to give the users everything that they wanted! This is where the “education” side of librarianship comes into the process.

5.    Metadata: include location information and link to maps, include place names not commonly known or used, limit formatting to one page for printing, offer a simple text format vs. a tabular format, and open linked content in a new window or tab.
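Challenges 1 and 2 above mention name searches, a Soundex code option, and a ‘names begin with’ option. The talk did not say how these were built, so this is only a minimal Python sketch, assuming a plain list of surnames: the standard American Soundex code plus a simple prefix match. soundex and search_names are hypothetical names, not part of the portal's code.

```python
# American Soundex digit groups. Vowels and "y" get no digit;
# "h" and "w" are skipped entirely (they do not separate equal codes).
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
_CODE = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code (ASCII letters only)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODE.get(letters[0], "")  # first letter's digit still blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue
        digit = _CODE.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so equal codes across it both count
    return code.ljust(4, "0")


def search_names(names, query, mode="soundex"):
    """Hypothetical name lookup supporting the two advanced-search options."""
    if mode == "soundex":
        target = soundex(query)
        return [n for n in names if soundex(n) == target]
    if mode == "begins_with":
        prefix = query.lower()
        return [n for n in names if n.lower().startswith(prefix)]
    raise ValueError(f"unknown mode: {mode}")


surnames = ["Smith", "Smyth", "Schmidt", "Jones"]
print(search_names(surnames, "Smith"))
print(search_names(surnames, "Sm", mode="begins_with"))
```

With this sketch, a Soundex search for "Smith" also returns "Smyth" and "Schmidt" (all coded S530), the kind of spelling variation genealogists run into in older records, while the prefix search returns only "Smith" and "Smyth".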
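Challenge 3 above gives a ranking order: exact phrase, adjacent terms, terms near each other, then single terms. One way to approximate that order with Solr, which the 2008 infrastructure uses for indexing, is to OR together boosted clauses in Lucene query syntax. This is our own sketch, not the query the portal actually sends; the boost values (8, 4, 2) and the proximity window of 10 are arbitrary assumptions. Because Lucene adds up the scores of matching clauses, the boosts favor the higher tiers rather than guaranteeing a strict order.

```python
import re


def tiered_query(text: str, proximity: int = 10) -> str:
    """Build a Lucene query string that boosts phrase-like matches.

    Boost weights and the proximity window are illustrative assumptions.
    """
    # Keep only word characters (drops Lucene syntax such as quotes or colons)
    # and lowercase them so a typed "OR"/"AND" is not read as an operator
    # by the standard Lucene query parser.
    terms = [t.lower() for t in re.findall(r"\w+", text)]
    if not terms:
        raise ValueError("empty query")
    if len(terms) == 1:
        return terms[0]
    phrase = " ".join(terms)
    clauses = [f'"{phrase}"^8']  # tier 1: exact phrase
    if len(terms) > 2:
        # tier 2: each pair of adjacent terms (for two terms this equals tier 1)
        clauses += [f'"{a} {b}"^4' for a, b in zip(terms, terms[1:])]
    clauses.append(f'"{phrase}"~{proximity}^2')  # tier 3: terms near each other
    clauses += terms  # tier 4: any single term
    return " OR ".join(clauses)


print(tiered_query("Mary Ann Jones"))
```

For a two-word query such as "John Smith" the sketch produces "john smith"^8 OR "john smith"~10^2 OR john OR smith, so a record with the exact name outranks one where the words merely appear near each other.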
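Challenge 4 above asks for one listing per series with all volumes listed, plus a count of hits in the text. Solr's result grouping (the group=true and group.field parameters) can do this on the server; the Python sketch below shows the same idea client-side, assuming each search hit carries a hypothetical series field and a per-object hit count. The Hit record, group_by_series, and the sample data are made up for illustration and are not the portal's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One search hit. All fields are hypothetical, for illustration."""
    object_id: str
    title: str
    series: Optional[str]  # None for objects that are not part of a series
    hits_in_text: int      # number of matches inside the object's text


def group_by_series(hits):
    """Collapse hits into one listing per series; standalone objects stay separate.

    Listings keep the order of their first hit (dicts preserve insertion
    order), so the best-ranked volume decides where its series appears.
    """
    listings = {}
    for hit in hits:
        # Objects without a series are keyed by ID so they never merge by title.
        key = ("series", hit.series) if hit.series else ("object", hit.object_id)
        entry = listings.setdefault(
            key, {"title": hit.series or hit.title, "volumes": [], "hits_in_text": 0}
        )
        entry["volumes"].append(hit.title)
        entry["hits_in_text"] += hit.hits_in_text
    return list(listings.values())


results = group_by_series([
    Hit("a1", "Vol. 3", "County Directory", 4),
    Hit("b7", "Family Letters", None, 2),
    Hit("a2", "Vol. 1", "County Directory", 1),
])
for listing in results:
    print(listing["title"], listing["volumes"], listing["hits_in_text"])
```

Here the two County Directory volumes collapse into a single listing with a combined hit count of 5, while the standalone object keeps its own entry.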

In conclusion, they have their developers, their user design group, and their user studies. How can they best use these assets to create the best process model? It is a complex process, and one that involves all parties and much discussion. Unfortunately, in order to make the best use of time, money, and resources, not all of the users' requests will make it into the final product.