ADE in the Library eBook Data Lifecycle

Reader: “Hey, I heard there is some sort of problem with those ebooks I checked out from the library?”

Librarian: “There are technical problems, potential legal problems, and philosophical problems – but not with the book itself or with your choice to read it.”

[Update 2014.10.14@12.04pm: more info on security in the library data lifecycle added at the bottom of this post]

As mentioned, there are (at least) three sides to the problem. Nate Hoffelder* discovered the technical problem with the way the current version (4) of Adobe Digital Editions (ADE) manages the ebook experience; his finding was confirmed by security researcher Benjamin Daniel Mussler and later reviewed by Eric Hellman. The technical problem, that arguably private data is sent in plain text from a reader’s device to a central data store, seems obvious now that it has been discovered. The potential legal problem stems from the reader-privacy laws in every state, which set expectations for data security, plus other laws that may apply. The philosophical problem has several facets, which can be reduced to the tension between privacy and convenience.

When a widely used software platform is found to be logging data unexpectedly and transmitting it for some unknown use, it causes great unease among users. When that transmission happens in plain text over easily intercepted channels, it causes anger among technologists who think a leading software developer should know better. When this is all happening in the library world, where privacy is highly valued, there is outrage, as expressed by LITA Board member Andromeda Yelton.

Here are the library profession’s basic positions:

  1. Each individual’s reading choices and behavior should be private (i.e., anonymized or, better, not tracked)
  2. Data gathered for user-desired functionality across devices should be private (i.e., anonymized)
  3. Insofar as there is any tracking of reading choices and behavior, there should be an opt-out option readily available to individuals (i.e., not buried in the fine print)
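Positions 1 and 2 ask for anonymized data. A technique that falls short of full anonymization, but shows what “not storing the raw identity” can look like, is keyed pseudonymization. The sketch below is a minimal Python illustration under stated assumptions, not anything Adobe or OverDrive is known to use: the function name, the patron ID format, and the idea that the library holds the key are all hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patron_id: str, secret_key: bytes) -> str:
    """Return a keyed HMAC-SHA256 pseudonym for a patron identifier.

    The same patron always maps to the same token under the same key, so
    activity can still be grouped per patron without storing the raw ID.
    Anyone holding secret_key can recompute the mapping, so this is
    pseudonymization, not anonymization.
    """
    return hmac.new(secret_key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical setup: a random key that, in this sketch, stays with the library
# and is never shared with whoever receives the pseudonymized data.
KEY = secrets.token_bytes(32)
token = pseudonymize("patron-123", KEY)
```

Because a given patron always maps to the same token, a detailed reading history attached to that token can sometimes still be linked back to a person, which is one reason position 1 prefers not tracking at all.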

In his October 9th post on The Digital Shift, Matt Enis reports that Adobe is working to correct the problem of data being transmitted in clear text but “maintains that its collection of this data is covered under its user agreement.” The data that corporations transmit should be limited to the data elements necessary to provide the functionality users want, and restricted enough that an individual’s activity remains private.
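The principle in the last sentence is usually called data minimization. A minimal Python sketch, assuming a hypothetical event dictionary and using the fields the primer names as the minimum for sync (user, book, page, timestamp), passes each outgoing event through an allowlist. The field names and sample values are illustrative, not ADE4’s actual data format.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical allowlist: only the fields a cross-device sync feature needs.
SYNC_FIELDS: FrozenSet[str] = frozenset({"user_id", "book_id", "page", "timestamp"})


def minimize(event: Dict[str, Any], allowed: FrozenSet[str] = SYNC_FIELDS) -> Dict[str, Any]:
    """Return a copy of event that keeps only allowlisted fields."""
    return {key: value for key, value in event.items() if key in allowed}


# Hypothetical event carrying extra fields a reading app might also have on hand.
event = {
    "user_id": "patron-123",
    "book_id": "book-456",
    "page": 57,
    "timestamp": 1413216000.0,
    "title": "Example Title",
    "device_ip": "203.0.113.7",
}
outgoing = minimize(event)
# outgoing == {"user_id": "patron-123", "book_id": "book-456", "page": 57, "timestamp": 1413216000.0}
```

An allowlist is used rather than a list of fields to strip, so any field added to the event later is withheld by default until someone decides it is actually needed.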

To join the conversation, begin to educate yourself using our ADE Primer, below, plus the following resources:

A Primer on how Adobe Digital Editions (ADE) works with library ebooks

I’m a reader and I want to use a library ebook
(via OverDrive or another download service):

  1. what will I need to install on my device(s)?
    (let’s assume a laptop, tablet, phone, and iPod)

    • laptop/computer: Adobe Digital Editions (ADE), activated with an Adobe ID
    • tablet, phone, iPod, etc.: Bluefire Reader (or compatible) app, activated with an Adobe ID
  2. how do the various devices know which page to show me next when I switch between them?
    • access and synchronization across devices are managed using the Adobe ID, the information associated with the ebook, and data tracked by ADE
  3. what technologies are behind the scenes?
    • the ADE-managed digital rights management (DRM) required by the ebook publisher
    • the ebook reader software/app
    • the internet
  4. what data is needed for the sync?
    • the minimum required data is arguably the UserID, BookID, and a page-accessed timestamp
    • the current ADE version, ADE4, tracks significantly more data than the minimums above
  5. how is that data shared between devices?
    • Users can access their ADE account from up to 6 different devices. When accessing the ID/account from a new device the user must “activate” the device by logging into the Adobe ID/Account to prove that the user is the legitimate account holder.
    • ADE4 shares all the ebook data it tracks in plain text over an unsecured internet channel
  6. what functionality would not work if this were suddenly not provided?
    • if ADE did not provide reader tracking data, each time a reader opened an ebook on a different device the reader would have to remember the page s/he was on and then navigate to that page to continue reading from where they left off
    • A computer can be activated anonymously in ADE; however, this prevents the items from being accessible on more than one computer/device. The ebooks would then be considered “owned” by that computer and would not be accessible from other devices.
    • if ADE were completely withdrawn from availability, ebook DRM would prevent use of ADE-managed DRM-protected ebooks
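Questions 2, 4, and 5 of the primer can be tied together in a small sketch. It is a hypothetical Python model, not Adobe’s actual protocol: the `SyncRecord` fields, the `merge` rule, and the endpoint URL are all assumptions. It carries only the minimum data from question 4, reconciles two devices by keeping the newer position, and refuses to upload over anything other than certificate-checked HTTPS, which addresses the plain-text problem in question 5.

```python
import json
import ssl
import urllib.request
from dataclasses import asdict, dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SyncRecord:
    """Minimal sync data from question 4 (field names are hypothetical)."""
    user_id: str      # the account (the Adobe ID, in the primer's terms)
    book_id: str      # which ebook
    page: int         # last page the reader was on
    timestamp: float  # when that page was accessed, in seconds since the epoch


def merge(a: SyncRecord, b: SyncRecord) -> SyncRecord:
    """Reconcile two devices' positions for one book: the newer record wins."""
    if (a.user_id, a.book_id) != (b.user_id, b.book_id):
        raise ValueError("records describe different accounts or books")
    return a if a.timestamp >= b.timestamp else b


def build_upload(endpoint: str, record: SyncRecord) -> urllib.request.Request:
    """Prepare a JSON POST of the record, refusing any endpoint that is not HTTPS."""
    if urlparse(endpoint).scheme != "https":
        raise ValueError("refusing to send reading data over an unencrypted channel")
    body = json.dumps(asdict(record)).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(endpoint: str, record: SyncRecord) -> int:
    """Send the record over TLS with certificate checking; return the HTTP status."""
    context = ssl.create_default_context()  # verifies server certificates and hostnames
    request = build_upload(endpoint, record)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.status


laptop = SyncRecord("patron-123", "book-456", page=41, timestamp=1000.0)
phone = SyncRecord("patron-123", "book-456", page=57, timestamp=2000.0)
current = merge(laptop, phone)  # the phone's position (page 57) is newer
```

Last-writer-wins is the simplest reconciliation rule and assumes device clocks are roughly correct. Calling `upload("https://sync.example.org/positions", current)` against a hypothetical server would POST those four fields and nothing else.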

From a technology point of view, the clear-text data being transmitted suggests it may be used for synchronization, but it seems first and foremost to support various licensing business models. Because Adobe might in the future have customers who want to use Adobe DRM to expire a book after a certain number of hours or pages read, Adobe may feel the need to collect that data. Adobe’s data collection seems to be working as intended here. Clear-text transmission is clearly a bug, but the transmission of data about patron reading habits to Adobe is a feature of the software.
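The gap between what sync needs and what a usage-limited license needs can be shown in a few lines. This Python sketch assumes a hypothetical license that ends a loan after a fixed number of distinct pages have been read; the class, the function names, and the rule itself are illustrations, not Adobe’s code. Sync only ever asks for the last position, while the license check needs the entire page history.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PageAccessLog:
    """Hypothetical per-page history: every (page, timestamp) the reader opened."""
    events: List[Tuple[int, float]] = field(default_factory=list)

    def record(self, page: int, timestamp: float) -> None:
        self.events.append((page, timestamp))

    def last_position(self) -> Optional[int]:
        """All that sync needs: the page from the most recent access."""
        if not self.events:
            return None
        page, _ = max(self.events, key=lambda event: event[1])
        return page

    def distinct_pages_read(self) -> int:
        """What a page-count license needs: the whole history, not just the last page."""
        return len({page for page, _ in self.events})


def license_expired(log: PageAccessLog, max_pages: int) -> bool:
    """Hypothetical licensing rule: the loan ends once max_pages distinct pages are read."""
    return log.distinct_pages_read() >= max_pages


log = PageAccessLog()
for page, ts in [(10, 1.0), (11, 2.0), (12, 3.0), (11, 4.0)]:
    log.record(page, ts)
```

A sync-only design could overwrite a single (page, timestamp) pair instead of appending to `events`, but `license_expired` could not be computed from that pair. An hours-read limit would likewise need timestamps from every session.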

The philosophical discussion which needs to happen around ebooks and DRM should include:

  • what data elements enable user-desired functionality
  • what data elements enable digital rights management
  • what data elements above are/are not within ALA’s stated professional ethics
  • whether tracking ebook user behavior is acceptable *at all*

From libraryland conversations around the issue so far, opinions have ranged from ‘tracking is not the problem, the clear-text transmission is’ to ‘tracking is very much a problem, it’s unacceptable.’

Issues like this highlight the need to revisit stated positions and evaluate where the balance point lies between accommodating user functionality and protecting against the collection of personally identifiable data or metadata.

*Post updated to correctly credit Nate Hoffelder as the original discoverer (my apologies!)

[Update with more on the library data cycle from Gary Price of INFOdocket below]

  • According to OverDrive: “It is our understanding that the reported issue involves Adobe Digital Editions 4, which is not used as part of the OverDrive app.” In other words, this ADE4 problem does not affect OverDrive’s apps for Android, iOS, etc.; it applies only to the ADE software installed on computers and laptops.
  • Drawing further on Gary’s long-running informational and educational posts about library data and privacy: there are data insecurities in the configuration of many library services that involve sharing library user data with third parties such as Adobe, Amazon, library catalog vendors, etc.
  • As Gary correctly points out: “issues with any third parties having access to library user data need to be discussed not only in the library community but also directly with users.”

Published by

AaronDobbs


12 thoughts on “ADE in the Library eBook Data Lifecycle”

  1. Thanks for writing this up, Aaron!

    I want to push a bit harder on the technical details, because I think the system can be even more limited while still providing for cloud sync.

    One, this information can be encrypted such that even Adobe never knows who’s reading what; it needs to know which account to send the data to, but the system can be designed so that the data is only ever decrypted on the user’s device.

    Two, I want to clarify that “tracks significantly more data than the minimums above” includes tracking every page the reader accessed and the timestamp of every access (when synchronization would require only the number of the last page accessed). I might be able to imagine a reader-useful feature that would need that level of data, but the use case that springs most readily to mind is reading analytics for publishers. And I can, again, imagine a relationship where libraries are comfortable providing that kind of information, but only if it were explicitly negotiated, with privacy safeguards.

  2. Nice explanation of some of the technical issues around the ADE and how 3rd party applications/services can really complicate things. I agree we need to have a conversation about the balance point between convenience and privacy.

    I can see both sides of the issue. I am the current chair of the privacy subcommittee of the ALA Intellectual Freedom Committee and in my day job, I am head of the Systems department for an academic library and spend a lot of time and effort trying to improve the user experience (i.e. make things more convenient) for students and researchers.

    Before we can even begin to determine a balance point, we need a better idea of how the various vendors deal with patron data. I’d love to see a description of the patron data lifecycle for each major vendor. This type of overview could help libraries make better purchasing decisions in light of their own policies and allow us as a profession to have a knowledgeable discussion about the issue. Maybe this is something LITA and the Intellectual Freedom Committee could work on together?
