A Linked Data Journey: Survey of Publishing Strategies

Image Courtesy of Shelly under a CC BY 2.0 license.

Introduction

Happy Friday everyone! This is part five of my Linked Data Series. You can find the previous posts on my author page. Last week I was fortunate enough to attend Mashcat 2016 in Boston. It was a wonderful one-day conference. We had some very interesting conversations aimed at breaking down communication barriers in libraries, archives, and museums, and I was able to meet some fantastic professionals (and students).

In addition to attending, I also presented a talk titled Finding Aid-LD: Implementing Linked Data in a Finding Aid Environment (slides). During the presentation I identified various Linked Data publishing strategies that are currently being implemented. I thought this would be a neat topic to post here as well, so today I’m going to give you the deets on Linked Data publishing strategies.

Survey of Publishing Strategies

Note that these strategies are not mutually exclusive. You can combine them in any particular solution.

Data Dump

Details
A data dump is a zipped file, or a set of files, that contains a provider's complete dataset, usually in an RDF serialization such as N-Triples, Turtle, or RDF/XML.

Use-case
Somebody wants to download a provider’s full dataset for research, reuse, etc.
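To make the use-case a bit more concrete, here is a minimal sketch of what someone might do after downloading a dump. It assumes the zip contains N-Triples files ending in `.nt` (an assumption; providers package dumps in many ways), and its line parser is deliberately naive. It is not a full N-Triples parser, so for real work you'd want a proper RDF library. The file name and example URIs are hypothetical.

```python
import io
import re
import zipfile

# Naive N-Triples line pattern: subject, predicate, then everything up to the final " .".
# Not a complete N-Triples parser (no escape handling, no validation).
TRIPLE_LINE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

def iter_triples(zip_source):
    """Yield (subject, predicate, object) strings from every .nt file in a zipped dump."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".nt"):
                continue  # skip READMEs, licenses, and other non-RDF files
            with archive.open(name) as member:
                for raw in io.TextIOWrapper(member, encoding="utf-8"):
                    line = raw.strip()
                    if not line or line.startswith("#"):
                        continue  # blank lines and comments carry no triples
                    match = TRIPLE_LINE.match(line)
                    if match:
                        yield match.groups()

# Usage: build a tiny in-memory "dump" (hypothetical data) so the sketch runs offline.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "people.nt",
        '<http://example.org/person/1> <http://xmlns.com/foaf/0.1/name> "Ada Lovelace" .\n'
        '<http://example.org/person/1> <http://xmlns.com/foaf/0.1/knows> <http://example.org/person/2> .\n',
    )
    archive.writestr("README.txt", "Example dump")
buffer.seek(0)

triples = list(iter_triples(buffer))
print(len(triples))  # 2
```

Because the whole dataset is in hand, you can count, filter, or load it however you like, which is exactly why researchers tend to ask for dumps.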

Examples

Subject Pages

Details
A subject page is a document or set of documents that contains all the data about a resource. Subject pages are very similar to traditional metadata records. Common practice is to use content negotiation: when a client requests a resource URI, the server redirects it to a human-readable or machine-readable document based on the HTTP Accept header. A newer and increasingly popular practice is to embed RDFa into HTML documents. Google and the other big search engines index RDFa and other types of embedded metadata. RDFa is becoming an added layer on top of content negotiation, and in many cases an alternative to it altogether.
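Here's a rough sketch of the server-side decision behind content negotiation. The URL layout (`/resource/{id}` as the identifier, `/page/{id}` for HTML, `/data/{id}.ttl` and `/data/{id}.rdf` for RDF) is purely hypothetical, and the function only picks the redirect target; a real server would then send the redirect (303 See Other is the common Linked Data convention) and would likely use a framework's own Accept-header handling.

```python
# Hypothetical URL layout for the documents that describe a resource.
REPRESENTATIONS = {
    "text/html": "/page/{id}",
    "application/xhtml+xml": "/page/{id}",
    "text/turtle": "/data/{id}.ttl",
    "application/rdf+xml": "/data/{id}.rdf",
}

def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # media types without a q parameter default to full preference
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equally weighted types keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)

def negotiate(resource_id, accept_header):
    """Pick the document that a redirect for this resource URI should point to."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type].format(id=resource_id)
        if media_type in ("*/*", "text/*"):
            break  # wildcard: fall through to the human-readable page
    return REPRESENTATIONS["text/html"].format(id=resource_id)

print(negotiate("christopher-lee", "text/turtle;q=0.9, text/html;q=0.5"))  # /data/christopher-lee.ttl
print(negotiate("christopher-lee", "text/html,application/xhtml+xml"))    # /page/christopher-lee
```

So a browser ends up on the HTML page while an RDF-aware client asking for Turtle gets the data, even though both started from the same URI.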

Use-case
A person wants to dereference a resource URI and discover new knowledge by browsing through resource links.

Examples

Triplestores and SPARQL Endpoints

Details
Triplestores are databases for storing RDF triples/data. SPARQL is the query language for RDF, and it is most commonly run against data held in a triplestore. SPARQL can express very complex, semantic queries over RDF, and when the triplestore supports reasoning, those queries can also return facts that are inferred rather than explicitly stored. A SPARQL endpoint is a web service (a URL) that accepts SPARQL queries over HTTP and runs them against a triplestore.

Use-cases
A researcher wants to run complex, semantic queries on the data. A reference librarian needs to perform a complex query during a reference session.

Examples

DBpedia SPARQL endpoint
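As a sketch of what "going to" an endpoint looks like in code, the snippet below builds a SPARQL Protocol GET request (the query goes in the `query` parameter) and asks for JSON results. The endpoint URL `https://dbpedia.org/sparql` and the query are assumptions for illustration; the `FILTER (str(?name) = ...)` is there so the match works whether or not the stored name carries a language tag. Actually sending the request needs network access, so the usage lines are left as comments.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint URL

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person WHERE {
  ?person dbo:birthName ?name .
  FILTER (str(?name) = "Christopher Frank Carandini Lee")
}
LIMIT 10
"""

def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]

# Usage (requires network access):
# for row in run_query(DBPEDIA_ENDPOINT, QUERY):
#     print(row["person"]["value"])
```

All of the heavy lifting (matching, filtering, joining) happens on the server, which is what makes endpoints so powerful and also what makes them expensive to keep running.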

Software

Triple Pattern Fragments

Details
A relatively new strategy is Triple Pattern Fragments (TPF). TPF aims to be an efficient way to query RDF data (you can read more about what I mean here). TPF breaks queries down into triple patterns (subject predicate object), where any of the three positions can be a variable. Example:

Give me all the resources whose birthName is “Christopher Frank Carandini Lee”.
?subject <http://dbpedia.org/ontology/birthName> "Christopher Frank Carandini Lee"
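To show how little a TPF server has to do, here is a toy, in-memory version of answering one triple pattern, with `None` standing in for a variable. The function name, the paging parameters, and the dataset are all hypothetical; the subject URI for Christopher Lee is an assumed DBpedia-style URI, and the other triples are made up. Real TPF servers serve fragments over HTTP and include metadata (such as an estimated count) and hypermedia controls, which this sketch only gestures at by returning a total count.

```python
# Toy dataset of (subject, predicate, object) strings; subject URIs are assumed/made up.
BIRTH_NAME = "http://dbpedia.org/ontology/birthName"
DATA = [
    ("http://dbpedia.org/resource/Christopher_Lee", BIRTH_NAME, "Christopher Frank Carandini Lee"),
    ("http://example.org/person/2", BIRTH_NAME, "Someone Else"),
    ("http://example.org/person/2", "http://xmlns.com/foaf/0.1/name", "Someone"),
]

def triple_pattern_fragment(triples, subject=None, predicate=None, obj=None, page=1, page_size=100):
    """Answer a single triple pattern (None = variable); return (page_of_matches, total_count)."""
    matches = [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
    start = (page - 1) * page_size
    return matches[start:start + page_size], len(matches)

# ?subject <http://dbpedia.org/ontology/birthName> "Christopher Frank Carandini Lee"
items, total = triple_pattern_fragment(DATA, predicate=BIRTH_NAME, obj="Christopher Frank Carandini Lee")
print(total, [s for (s, _, _) in items])  # 1 ['http://dbpedia.org/resource/Christopher_Lee']
```

Because every request is this simple, a server can answer (and cache) fragments cheaply.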

Software
There are currently two types of TPF software: TPF servers and TPF clients. The server only answers simple triple pattern queries like the one shown above. The client breaks complex, SPARQL-like queries down into triple pattern requests and combines the results itself. According to the Linked Data Fragments website, TPF has a lower server cost and higher availability than SPARQL endpoints, which means it might be a good alternative to them. The trade-off is that a TPF client uses more bandwidth and has a higher client cost.
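Here's a simplified sketch of the client side: it answers a two-pattern query by requesting one fragment per pattern and per intermediate binding, then joining the results locally (a nested-loop join). Everything here is hypothetical (the `ex:` and `dbo:` shorthand strings, the data, and the "terms starting with ? are variables" convention), and `fetch_fragment` stands in for an HTTP request to a TPF server. Real TPF clients are smarter, for example using the count metadata servers return to decide which pattern to request first. The request counter hints at where the extra bandwidth goes.

```python
# Hypothetical toy data; each triple is (subject, predicate, object).
DATA = [
    ("ex:person1", "dbo:birthName", "Christopher Frank Carandini Lee"),
    ("ex:person1", "dbo:birthPlace", "ex:placeA"),
    ("ex:person2", "dbo:birthPlace", "ex:placeB"),
]

def is_var(term):
    # Convention for this sketch only: terms starting with "?" are variables.
    return term.startswith("?")

def fetch_fragment(triples, pattern):
    """Stand-in for one HTTP request to a TPF server: all triples matching a single pattern."""
    return [t for t in triples if all(is_var(q) or q == v for q, v in zip(pattern, t))]

def evaluate_bgp(triples, patterns):
    """Nested-loop join over triple patterns; returns (solutions, number_of_requests)."""
    solutions, requests = [{}], 0
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Substitute variables bound so far, so each request is still one triple pattern.
            bound = tuple(binding.get(term, term) for term in pattern)
            requests += 1
            for triple in fetch_fragment(triples, bound):
                candidate, ok = dict(binding), True
                for term, value in zip(bound, triple):
                    # A variable can occur twice in one pattern; its values must agree.
                    if is_var(term) and candidate.setdefault(term, value) != value:
                        ok = False
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions, requests

query = [
    ("?person", "dbo:birthName", "Christopher Frank Carandini Lee"),
    ("?person", "dbo:birthPlace", "?place"),
]
solutions, requests = evaluate_bgp(DATA, query)
print(solutions, requests)  # [{'?person': 'ex:person1', '?place': 'ex:placeA'}] 2
```

The server never sees the full query, only simple patterns; the joining work (and the extra round trips) move to the client.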

Linked Data API

Details
A Linked Data API is an effort to expose complex RDF data through simple RESTful APIs. The only such software that I’ve found is aptly named Linked Data API. According to the documentation, Linked Data API is an API layer that sits on top of a SPARQL endpoint. It can generate documents (subject pages) and run “sophisticated queries” (though I doubt they can be as complex as SPARQL queries). I’ll confess that this strategy is the one I’m least knowledgeable about, so please feel free to delve into the documentation.
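To illustrate the general idea (and not the Linked Data API's own configuration format, which I haven't reproduced here), here is a hypothetical sketch of an API layer: a simple REST path is mapped to a SPARQL query template, and the endpoint's SPARQL JSON results are flattened into plain key/value records. The route, the `example.org` URIs, and the sample results are all made up; only the shape of the results follows the standard SPARQL JSON results format.

```python
import json
import re

# Hypothetical route table: REST path pattern -> SPARQL query template.
# The id pattern is restrictive on purpose, so user input can't inject SPARQL.
ROUTES = [
    (re.compile(r"/people/(?P<id>[A-Za-z0-9_]+)"),
     """SELECT ?property ?value WHERE {{
  <http://example.org/person/{id}> ?property ?value .
}}"""),
]

def route_to_query(path):
    """Translate a simple REST path into the SPARQL query the API layer would send."""
    for pattern, template in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return template.format(**match.groupdict())
    raise LookupError(f"no route for {path}")

def flatten_bindings(results_json):
    """Turn SPARQL JSON results into plain {variable: value} dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: row[var]["value"] for var in variables if var in row}
        for row in data["results"]["bindings"]
    ]

# Usage with a hand-written (hypothetical) response from the endpoint:
SAMPLE = json.dumps({
    "head": {"vars": ["property", "value"]},
    "results": {"bindings": [
        {"property": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/name"},
         "value": {"type": "literal", "value": "Ada Lovelace"}},
    ]},
})
print(route_to_query("/people/ada"))
print(flatten_bindings(SAMPLE))
```

The client of such an API only ever sees `/people/ada` and a flat JSON record; the RDF and SPARQL stay behind the curtain.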

Conclusion

I hope this gives you a good idea of the plethora of ways to publish Linked Data. If you know of any others, please list them in the comments. As always, I invite you to post questions and comments below or send them to me via email. Thanks for reading!