Utilizing the Benefits of Native XML Database Technologies

Alan Cornish – Systems Librarian, Washington State University Libraries

Another take on the session… You should also check Karen’s earlier post.

What’s a Native XML database exactly? Alan defines Native XML as a document storage and retrieval model where an XML doc is considered the basic unit of storage, the database is DTD or schema independent, and an XML-specific query language is used to manage, retrieve and display data. No relational databases or SQL here, kids.
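To picture that storage model, here is a minimal Python sketch. It is a toy, not how Textml or any real native XML server is built: the `TinyXmlStore` class and the sample documents are made up for illustration, and the "query language" is just the small XPath subset that Python's standard library supports. The point is that whole documents go in without any shared DTD or schema, and one path expression can still search across them.

```python
import xml.etree.ElementTree as ET


class TinyXmlStore:
    """Toy in-memory store (hypothetical): each entry is one whole XML document."""

    def __init__(self):
        self._docs = {}  # document name -> parsed root element

    def put(self, name, xml_text):
        # No DTD or schema check: any well-formed document is accepted.
        self._docs[name] = ET.fromstring(xml_text)

    def find(self, path):
        """Return the names of documents with at least one match for the path."""
        return sorted(
            name for name, root in self._docs.items()
            if root.find(path) is not None
        )


store = TinyXmlStore()
# Three documents with three unrelated structures.
store.put("letter.xml", "<letter><title>Trip notes</title><body>Walla Walla</body></letter>")
store.put("aid.xml", "<findingaid><collection><title>Family papers</title></collection></findingaid>")
store.put("memo.xml", "<memo><subject>Budget</subject></memo>")

matches = store.find(".//title")
print(matches)
```

The same path finds the `title` element whether it sits directly under the root or deeper inside a collection, which is roughly what "schema independent" buys you.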

Alan gave a nice, brief overview of XML and DTDs, then introduced the software used for his project. Textml is XML server software from Ixiasoft. He also mentioned Cooktop – some freeware that actually worked as a pretty robust XML editor.

Alan demoed how Textml works, showed query syntax and drew comparisons with SQL statements. For those of us used to SQL syntax (SELECT * FROM tablename WHERE keyword = things), this native XML stuff is a whole different ballgame: the query syntax matches XML syntax (i.e., the query is actually another XML document). It’s verbose and a little intimidating.

Here’s an example based on the same query above:

This example is not spot on, but you get the idea. SELECT * FROM … is looking pretty good.
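To make the "query is itself an XML document" idea concrete, here is a rough Python sketch. Big caveat: the `<query>`, `<select>` and `<where>` elements, their attributes, the sample records and the `run_query` helper are all invented for illustration. This is not Textml's actual query syntax, just the general shape of expressing the SQL statement above as XML and having a program interpret it.

```python
import xml.etree.ElementTree as ET

# Hypothetical query document. Element and attribute names are made up
# and are NOT Textml's real query syntax.
QUERY_XML = """
<query>
  <select>*</select>
  <where field="keyword" equals="things"/>
</query>
""".strip()

# Made-up records standing in for documents held by the server.
DOCS = {
    "a.xml": "<record><title>First</title><keyword>things</keyword></record>",
    "b.xml": "<record><title>Second</title><keyword>stuff</keyword></record>",
    "c.xml": "<record><title>Third</title><keyword>things</keyword></record>",
}


def run_query(query_xml, docs):
    """Return names of documents that satisfy the single <where> clause."""
    where = ET.fromstring(query_xml).find("where")
    field, wanted = where.get("field"), where.get("equals")
    hits = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        # Match if any element named `field` has exactly the wanted text.
        if any(el.text == wanted for el in root.iter(field)):
            hits.append(name)
    return sorted(hits)


result = run_query(QUERY_XML, DOCS)
print(result)
```

Even this stripped-down version takes several lines of markup to say what SQL says in one, which gives a feel for the verbosity.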

So, how can applications use native XML? Alan showed how a pilot version of the NWDA (Northwest Digital Archives) is going to use native XML to manage, find and reuse archival finding aid content. Alan showed the XML index that runs the NWDA, a well-formed XML document that maps out the relationships between items and collections. We also got to see how XML can be used to create a pretty standard search and retrieval interface: find titles, free text search, browse records screen. Online XSL transformation creates the display for individual items. This could be a bottleneck if the XML doc gets large. People asked questions about retrieval performance, and Alan pointed out that when search and retrieval performance lagged, it usually happened during the XSL transformation stage.

One of the more interesting tidbits from the session was Adobe’s Extensible Metadata Platform (XMP). XMP is an Adobe metadata option based on W3C’s RDF which packages basic metadata as an XML packet and embeds it within PDF, JPEG and TIFF files (Adobe type files). It’s really simple to add to Adobe documents. By using the document properties menu in Adobe Acrobat Professional 7, you can enter your metadata. This is pretty cool stuff and could really be useful if one could begin a digitization project marking up PDFs, JPEGs or TIFFs with XMP. As proof of concept, Alan showed how to query the XMP packet using Textml. XMP would be really easy to use with an Electronic Theses and Dissertations project or any project with a reliance on Adobe document types.
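Since an XMP packet is just RDF/XML, ordinary XML tools can read it once it has been pulled out of the file. The Python sketch below is only an illustration under a few assumptions: the packet text is hand-written sample data (the title and author are made up), it is already extracted from the PDF or image, the `<?xpacket?>` wrapper is left off for brevity, and `dublin_core` is a hypothetical helper name. The namespaces are the standard ones for XMP, RDF and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs used inside XMP packets.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample packet (made-up values), shown without the
# <?xpacket ...?> wrapper that surrounds it inside a real file.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">A Sample Thesis</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Jane Student</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def dublin_core(xmp_text):
    """Pull the Dublin Core title and creator list out of an XMP packet."""
    root = ET.fromstring(xmp_text)
    # dc:title is a language-alternative array; take the first entry.
    title = root.findtext(".//dc:title/rdf:Alt/rdf:li", default="", namespaces=NS)
    # dc:creator is an ordered sequence of names.
    creators = [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)]
    return {"title": title, "creators": creators}


metadata = dublin_core(SAMPLE_XMP)
print(metadata)
```

For something like a theses and dissertations project, fields pulled out this way could be loaded into whatever index or XML server the project uses.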

Other XML server software comparable to Textml: Tamino, NATIX and eXist (open source). Try a Google search on “native xml server” for additional options.

It was another worthwhile session. Not sure I got my head around all of it… Oh well, it’s gettin’ late and I’m off for some food and grog.