Happy Friday everyone! This is part five of my Linked Data Series. You can find the previous posts by going to my author page. Last week I was fortunate enough to attend Mashcat 2016 in Boston. It was a wonderful one-day conference. We had some very interesting conversations aimed at breaking down communication barriers in libraries (archives and museums), and I was able to meet some fantastic professionals (and students).
In addition to attending, I also presented a talk titled Finding Aid-LD: Implementing Linked Data in a Finding Aid Environment (slides). During the presentation I identified various Linked Data publishing strategies that are currently being implemented. I thought this would be a neat topic to post here as well, so today I’m going to give you the deets on Linked Data publishing strategies.
Survey of Publishing Strategies
Note that these strategies are not mutually exclusive. You can combine these strategies for any particular solution.
A data dump is a zipped file, or set of files, that contains a provider’s complete dataset.
Somebody wants to download a provider’s full dataset for research, reuse, etc.
A subject page is a document or set of documents that contains all the data about a resource. Subject pages are very similar to traditional metadata records. Common practice is to use content negotiation: when you go to a resource URI, the server redirects you to a human-readable or machine-readable document based on the HTTP Accept header of your request. A newer and increasingly popular practice is to embed RDFa into HTML documents. Google and the other big search engines index RDFa and other types of embedded metadata. RDFa is becoming an added layer on top of content negotiation, and in many cases an alternative to it altogether.
A person wants to dereference a resource URI and discover new knowledge by browsing through resource links.
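To make content negotiation a little more concrete, here is a rough sketch of how a server might pick a document format from the Accept header. The list of available formats and the function names are assumptions made up for this illustration, and the matching is deliberately simplified (it ignores the precedence rules for more specific media ranges), so treat it as a sketch and not as how any particular server does it.

```python
# Hypothetical list of formats a subject page server might offer.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when none is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_type, q))
    # sorted() is stable, so ties keep the order the client sent them in.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_representation(header, available=AVAILABLE):
    """Pick the first available format the client accepts, or None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            if media_type in ("*/*", candidate):
                return candidate
            # Handle ranges such as "text/*".
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None


if __name__ == "__main__":
    print(choose_representation("text/turtle, text/html;q=0.5"))
    print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A browser sending an HTML-first Accept header would get the human-readable page, while an RDF client asking for Turtle would get the machine-readable one. A real server would then typically redirect to the matching document.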
Triplestores and SPARQL Endpoints
Triplestores are databases for storing RDF triples. SPARQL is a query language for RDF, and it most commonly accesses RDF data through triplestores. SPARQL can run very complex, semantic queries on RDF and, when paired with a triplestore that supports reasoning, can infer new knowledge from the data. A SPARQL endpoint is a server access point where you run queries on a triplestore.
A researcher wants to run complex, semantic queries on the data. A reference librarian needs to perform a complex query during a reference session.
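As a small illustration of what talking to a SPARQL endpoint looks like, the sketch below builds an HTTP GET request that carries a query in the `query` parameter and asks for JSON results, then pulls values out of a results document. The endpoint URL, the query, and the helper names are all assumptions made up for this example. Nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# An illustrative query using the FOAF vocabulary: names of people
# that someone called "Alice" knows.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:name "Alice" .
  ?person foaf:knows ?friend .
  ?friend foaf:name ?name .
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) a GET request for a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results, variable):
    """Collect one variable's values from a parsed SPARQL JSON results document."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding
    ]


if __name__ == "__main__":
    request = build_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually run the query against a live endpoint, and the JSON that comes back could be parsed with `json.load` and handed to `extract_values`.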
Triple Pattern Fragments
A relatively new strategy is Triple Pattern Fragments (TPF). TPF aims to be an efficient solution for querying RDF data (you can read more about what I mean here). TPF breaks queries down into triple patterns (subject predicate object). Example:
Give me all the resources whose birthName is “Christopher Frank Carandini Lee”.
?subject <http://dbpedia.org/ontology/birthName> "Christopher Frank Carandini Lee"
There are currently two types of TPF software: TPF servers and TPF clients. The server answers simple triple pattern queries like the one shown above. The client combines triple pattern queries to run complex, SPARQL-like queries. According to their website, the TPF approach has lower server cost and higher availability when compared to SPARQL endpoints, which means that it might be a good alternative to them. The caveat is that a TPF client uses more bandwidth and does more of the work itself, so the client cost is higher.
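The division of labor between server and client can be sketched with a toy, in-memory version. Everything here is an assumption for illustration: the `example.org` resources, the sample triples, the `birthPlace` predicate, and the function names are made up, and a real TPF server works over HTTP and also pages its results. The point is only that the "server" matches one pattern at a time while the "client" joins the answers.

```python
BIRTH_NAME = "http://dbpedia.org/ontology/birthName"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"

# Made-up resources and triples, for illustration only.
LEE = "http://example.org/person/1"
OTHER = "http://example.org/person/2"
LONDON = "http://example.org/place/london"
BOSTON = "http://example.org/place/boston"

DATA = [
    (LEE, BIRTH_NAME, "Christopher Frank Carandini Lee"),
    (LEE, BIRTH_PLACE, LONDON),
    (OTHER, BIRTH_NAME, "Jane Example"),
    (OTHER, BIRTH_PLACE, BOSTON),
]


def match_pattern(triples, s=None, p=None, o=None):
    """'Server' side: return triples matching one pattern (None is a variable)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def names_of_people_born_in(triples, place):
    """'Client' side: answer a two-pattern query by joining simple requests."""
    results = []
    # First pattern: ?subject birthPlace <place>
    for subject, _, _ in match_pattern(triples, p=BIRTH_PLACE, o=place):
        # Second pattern, one request per subject found: <subject> birthName ?name
        for _, _, name in match_pattern(triples, s=subject, p=BIRTH_NAME):
            results.append((subject, name))
    return results


if __name__ == "__main__":
    print(match_pattern(DATA, p=BIRTH_NAME, o="Christopher Frank Carandini Lee"))
    print(names_of_people_born_in(DATA, LONDON))
```

Notice that the client issues one extra request for every subject the first pattern returns. That is where the extra bandwidth and client cost come from, while each individual request stays cheap for the server.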
Linked Data API
A Linked Data API is an effort to expose complex RDF data through simple RESTful APIs. The only such software that I’ve found is aptly named Linked Data API. According to the documentation, Linked Data API is an API layer that sits on top of a SPARQL endpoint. It can generate documents (subject pages) and run “sophisticated queries” (though I don’t think they can be as complex as SPARQL queries). I’ll confess that this strategy is the one I’m least knowledgeable about, so please feel free to delve into the documentation.
- Cool URIs for the Semantic Web
- Apache Jena
- Linked Data Fragments | In depth
- linkeddata.org, Tools
- Linked Data Platform
- SPARQL By Example
I hope this gives you a good idea of the plethora of ways to publish Linked Data. If you know of any others please list them in the comments. As always, I invite you to post questions and comments below or send them to me via email. Thanks for reading!