This is part four of my Linked Data Series. You can find the previous posts in my author feed. I hope everyone had a great holiday season. Are you ready for some more Linked Data goodness? Last semester I had the pleasure of interviewing Julie Hardesty, metadata extraordinaire (and analyst) at Indiana University, about Hydra, the Hydra Metadata Interest Group, and Linked Data. Below is a bio and a transcript of the interview.
Julie Hardesty is the Metadata Analyst at Indiana University Libraries. She manages metadata creation and use for digital library services and projects. She is reachable at firstname.lastname@example.org.
Can you tell us a little about the Hydra platform?
Sure, and thanks for inviting me to answer questions for the LITA Blog about Hydra and Linked Data! Hydra is a technology stack that involves several pieces of software – a Blacklight search interface with a Ruby on Rails framework and Apache Solr index working on top of the Fedora Commons digital repository system. The name Hydra also refers to the open source community that works to develop this software into different packages (called “Hydra Heads”) that can be used for management, search, and discovery of different types of digital objects. Examples of Hydra Heads that have come out of the Hydra Project so far include Avalon Media System for time-based media and Sufia for institutional repository-style collections.
What is the Hydra Metadata Interest Group and your current role in the group?
The Hydra Metadata Interest Group is a group within the Hydra Project that aims to provide metadata recommendations and best practices for Hydra Heads and Hydra implementations, so that every place implementing Hydra can do things the same way, using the same ontologies and working with similar base properties for defining and describing digital objects. I am the new facilitator for the group and try to keep the different working groups focused on deliverables and responding to the needs of the Hydra developer community. Before me, Karen Estlund from Penn State University served as facilitator. She was instrumental in organizing this group and the working groups that produced the recommendations we have so far for technical metadata and rights metadata. In the near-ish future, I am hoping we’ll see a recommendation for baseline descriptive metadata and a recommendation for referring to segments within a digitized file, regardless of format.
What is the group’s charge and/or purpose? What does the group hope to achieve?
The Hydra Metadata Interest Group is interested in working together on base metadata recommendations, as a possible next step after the successful community data modeling effort, the Portland Common Data Model. The larger goals of the Metadata Interest Group are to identify models that may help Hydra newcomers and to further interoperability among Hydra projects. The scope of this group will concentrate primarily on using Fedora 4. The group is ambitiously interested in best practices and helping with technical, structural, descriptive, and rights metadata, as well as Linked Data Platform (LDP) implementation issues.
The hope is to make recommendations for technical, rights, descriptive, and structural metadata such that the Hydra software developed by the community uses these best practices as a guide for different Hydra Heads and their implementations.
Can you speak about how Hydra currently leverages linked data technologies?
This is where keeping pace with the work happening in the open source community is critical and sometimes difficult to do if you are not an active developer. What I understand is that Fedora 4 implements the W3C’s Linked Data Platform specification and uses the Portland Common Data Model (PCDM) for structuring digital objects and relationships between them (examples include items in a collection, pages in a book, tracks on a CD). This means there are RDF statements that are completely made of URIs (subject, predicate, and object) that describe how digital objects relate to each other (things like objects that contain other objects; objects that are members of other objects; objects ordered in a particular way within other objects). This is Linked Data, although at this point I think I see it as more internal Linked Data. The latest development work from the Hydra community is using those relationships through the external triple store to send commands to Fedora for managing digital objects through a Hydra interface. There is an FAQ on Hydra and the Portland Common Data Model that is being kept current with these efforts. One outcome would be digital objects that can be shared at least between Hydra applications.
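As a rough illustration of what “RDF statements completely made of URIs” can look like, here is a minimal Python sketch. It only assumes the published PCDM namespace (`http://pcdm.org/models#`) and its `hasMember` property; the `example.org` resource URIs and the helper names `members_of` and `to_ntriples` are hypothetical, and this is not how Fedora 4 or Hydra store or query these statements internally. Ordering of members, which PCDM handles through separate ordering ontologies, is not modeled here.

```python
# Namespaces: PCDM and rdf:type are real vocabularies;
# every URI under BASE is a made-up example resource.
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/repo/"

# Each statement is a (subject, predicate, object) tuple of URIs.
triples = [
    (BASE + "collection1", RDF_TYPE, PCDM + "Collection"),
    (BASE + "book1", RDF_TYPE, PCDM + "Object"),
    (BASE + "collection1", PCDM + "hasMember", BASE + "book1"),
    (BASE + "book1", PCDM + "hasMember", BASE + "page1"),
    (BASE + "book1", PCDM + "hasMember", BASE + "page2"),
]


def members_of(subject, statements):
    """Return the URIs that `subject` points to via pcdm:hasMember."""
    return [o for s, p, o in statements
            if s == subject and p == PCDM + "hasMember"]


def to_ntriples(statements):
    """Serialize URI-only statements in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


if __name__ == "__main__":
    print(members_of(BASE + "book1", triples))
    print(to_ntriples(triples))
```

The point of the sketch is only that the subject, the relationship, and the object are all identified by URIs, so an application can follow the relationships without parsing any free text.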
For descriptive metadata, my understanding is that Hydra is not quite leveraging Linked Data… yet. If URIs are used in RDF statements that are stored in Fedora, Hydra software is currently still working through the issue of translating that URI to show the appropriate label in the end user interface, unless that label is also stored within the triple store. That is actually a focus of one of the metadata working groups, the Applied Linked Data Working Group.
What are some future, anticipated capabilities regarding Hydra and linked data?
That capability I was just referring to is one thing I think everyone hopes happens soon. Once URIs can be stored for all parts of a statement, such as “this photograph has creator Charles W. Cushman,” so that Charles W. Cushman only needs to be represented in the Fedora triple store as a URI but can show in the Hydra end-user interface as “Charles W. Cushman” – that might spawn some unicorns and rainbows.
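The URI-to-label problem described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the photograph and person URIs are hypothetical `example.org` placeholders, the label is assumed to be available as a `skos:prefLabel` statement in the same set of triples, and `label_for` is an invented helper, not part of any Hydra gem.

```python
# Real vocabulary terms (Dublin Core Terms and SKOS).
CREATOR = "http://purl.org/dc/terms/creator"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical resource URIs, used for illustration only.
PHOTO = "http://example.org/repo/photo1"
PERSON = "http://example.org/authority/cushman"

statements = [
    # The descriptive statement stores only a URI for the creator.
    (PHOTO, CREATOR, PERSON),
    # The human-readable label lives in a separate statement.
    (PERSON, PREF_LABEL, "Charles W. Cushman"),
]


def label_for(uri, triples):
    """Return the preferred label for `uri`, or the URI itself if none is known."""
    for s, p, o in triples:
        if s == uri and p == PREF_LABEL:
            return o
    return uri


def creators_for_display(resource, triples):
    """Look up creator URIs for `resource` and translate each to a display label."""
    return [label_for(o, triples)
            for s, p, o in triples
            if s == resource and p == CREATOR]


if __name__ == "__main__":
    print(creators_for_display(PHOTO, statements))
```

When no label statement is available locally, the sketch falls back to showing the raw URI, which is roughly the situation the answer above describes.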
Another major effort in the works is implementing PCDM in Hydra. Implementation work is happening right now on the Sufia Hydra Head with a base implementation called Curation Concerns being incorporated into the main Hydra software stack as its own Ruby gem. This involves Fedora 4’s understanding of PCDM classes and properties on objects (and implementing Linked Data Platform and ordering ontologies in addition to the new PCDM ontology). Hydra then has to offer interfaces so that digital objects can be organized and managed in relation to each other using this new data model. It’s pretty incredible to see an open source community working through all of these complicated issues and creating new possibilities for digital object management.
What challenges has the Hydra Metadata Interest Group faced concerning linked data?
We have an interest in making use of Linked Data principles as much as possible since that makes our digital collections that much more available and useful to the Internet world. Our recommendations are based around various RDF ontologies due to Fedora 4’s capabilities to handle RDF. The work happening in the Hydra Descriptive Metadata Working Group to define a baseline descriptive metadata set, and the ontologies used there, is where we will most likely want Linked Data URIs used as much as possible in the statements. It’s not an easy task to agree on a baseline set of descriptive metadata for various digital object types, but there is precedent in both the Europeana Data Model and the DPLA Application Profile. I would expect we’ll follow along similar lines, but it is a process to both reach consensus and have something that developers can use.
Do you have any advice for those interested in linked data?
I am more involved in the world of RDF than in the world of Linked Data at this point. Using RDF like we do in Hydra does not mean we are creating Linked Data. I think Linked Data comes as a next step after working in RDF. I am coming from a metadata world heavily involved in XML and XML schemas so to me this isn’t about getting started with Linked Data, it’s about understanding how to transition from XML to Linked Data (by way of RDF). I watch for reports on creating Linked Data and, more importantly, transitioning to Linked Data from current metadata standards and formats. Conferences such as Code4Lib (coming up in March 2016 in Philadelphia), Open Repositories (in Dublin, Ireland in June 2016) and the Digital Library Federation Forum (in Milwaukee in November 2016) are having a lot of discussion about this sort of work.
Is there anything we can do locally to prepare for linked data?
Recommended steps I have gleaned so far include cleaning the metadata you have now – syncing up names of people, places, and subjects so they are spelled and named the same across records; adding authority URIs whenever possible, which makes transformation to RDF with URIs easier later; and considering the data model you will move to when describing things using RDF. If you are using XML schemas right now, there isn’t necessarily a 1:1 relationship between XML schemas and RDF ontologies, so it might require introducing multiple RDF ontologies and creating a local namespace for descriptions that involve information that is unique to your institution (you become the authority). Lastly, keep in mind the difference between Linked Data and Linked Open Data, and if you are getting into publishing Linked Data sets, be sure that you are making them available for reuse and aggregation – it’s the entire point of the Web of Data that was imagined by Tim Berners-Lee when he first discussed Linked Data and RDF (http://www.w3.org/DesignIssues/LinkedData.html).
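The first two steps, syncing up name variants and adding authority URIs, could be prototyped along these lines. This is a minimal sketch, not a recommended workflow: the `AUTHORITY` table, its `example.org` URI, the record layout, and the `normalize` and `reconcile` helpers are all hypothetical, and real reconciliation would normally draw on an actual name authority file and need human review of near matches.

```python
import re


def normalize(name):
    """Build a comparison key: collapse whitespace, drop trailing punctuation, ignore case."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.rstrip(".,;").casefold()


# Hypothetical local authority table: comparison key -> (preferred form, URI).
AUTHORITY = {
    normalize("Cushman, Charles W."): (
        "Cushman, Charles W.",
        "http://example.org/authority/names/cushman-charles-w",
    ),
}


def reconcile(records):
    """Return copies of the records with a consistent creator name and an authority URI.

    Records whose creator is not in the authority table are kept unchanged,
    with `creator_uri` set to None so they can be reviewed by hand.
    """
    cleaned = []
    for record in records:
        match = AUTHORITY.get(normalize(record["creator"]))
        if match is None:
            cleaned.append({**record, "creator_uri": None})
        else:
            preferred, uri = match
            cleaned.append({**record, "creator": preferred, "creator_uri": uri})
    return cleaned


records = [
    {"id": "photo1", "creator": "Cushman, Charles W."},
    {"id": "photo2", "creator": "cushman,  charles w"},
    {"id": "photo3", "creator": "Unknown Photographer"},
]

if __name__ == "__main__":
    for row in reconcile(records):
        print(row)
```

Once every record carries a URI alongside the cleaned-up name, a later transformation to RDF can use the URI as the object of the statement instead of the text string.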
A big thank you to Julie for sharing her experiences and knowledge. She provided a plethora of resources during the interview, so go forth and explore! As always, please feel free to leave a comment or contact Julie/me privately. Until next time!