Application of JPEG2000 in Archives & Libraries

Peter Murray
Concurrent session #1, LITA National Forum 2005
September 30, 2005

Started out with questions: who is thinking of it as an access technology? Who is thinking of it as a preservation technology?

Contributions from the audience: what are people interested in?
-Someone who just bought JPEG2000 is wondering about how to use it.
-People who are interested in archiving issues
-People who are using ContentDM are using JPEG2000; there are a few of those folks here.
-Can we use it for newspapers?
-VidiPax has customers who are asking about Motion JPEG2000 …
-What are the performance issues?
-LuraTech
-ExLibris, who OEMs the AWARE product
-Endeavor

Attributes the presentation in part to Robert Buckley, Research Fellow at Xerox; some of Peter’s slides are from Buckley’s presentation.

Key Messages:
Will begin with an intro to the format.
Talk about the “value proposition”
Opportunities for collaboration

What is JPEG2000?
Wavelet-based image compression standard. The same ISO committee that worked on JPEG worked on this. 2000 was the year that ISO officially passed Part 1 of the standard.

Conception:
-Improve the performance of JPEG
-Add features and capabilities not available with Baseline JPEG compression.

What is required to adopt a new technology?
1) Knowledge

JPEG2000 is one standard, but it has an evolving number of parts.
—–
Part 1, the core image coding part, was passed in 2000, followed by Part 2, which was extensions.
-Part 3: Motion JPEG2000
-Part 4: Conformance Testing
-Part 5: Reference Software
-Part 6: Compound image file format
Blah blah more parts…

Image codestream compression architecture: PART ONE

Wavelet Transform: see slides from XEROX (Peter is trying to get the rights to redistribute those). The format divides image data up into discrete blocks by size, by resolution, etc. so that parts of the codestream can be accessed to get derivatives of the image at various sizes, resolutions, etc. very efficiently. When all of the pieces, or blocks, are reassembled, the original image results.
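To make the idea concrete, here is a minimal sketch of a multi-resolution decomposition on one row of pixels. It uses a simple integer averaging/differencing (Haar-style) transform as a stand-in; this is not the wavelet filter JPEG2000 Part 1 actually uses (those are the 5/3 and 9/7 filters), and the function names and sample values are made up for illustration. The point is only that a reader can stop after the coarse data for a small derivative, or add the detail bands back to get the original exactly.

```python
def forward(samples):
    """One decomposition level: split pairs into averages and differences."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) >> 1 for a, b in pairs]   # half-resolution version
    high = [a - b for a, b in pairs]          # detail needed to undo the averaging
    return low, high


def inverse(low, high):
    """Exactly undo forward() using integer arithmetic only."""
    out = []
    for s, d in zip(low, high):
        a = s + ((d + 1) >> 1)
        out.extend([a, a - d])
    return out


def decompose(samples, levels):
    """Return (coarsest_low, details), with details[0] the finest band.

    Assumes len(samples) is divisible by 2 ** levels.
    """
    low, details = list(samples), []
    for _ in range(levels):
        low, high = forward(low)
        details.append(high)
    return low, details


def reconstruct(low, details, bands_used):
    """Rebuild using only the `bands_used` coarsest detail bands."""
    coarsest_first = list(reversed(details))
    for high in coarsest_first[:bands_used]:
        low = inverse(low, high)
    return low


row = [10, 12, 14, 200, 202, 198, 50, 52]   # made-up pixel row
low, details = decompose(row, levels=2)
thumbnail = reconstruct(low, details, 0)     # 2 samples
half_size = reconstruct(low, details, 1)     # 4 samples
full_size = reconstruct(low, details, 2)     # 8 samples, identical to `row`
```

Reading more of the stored bands gives a larger derivative; reading all of them gives back the original row with no loss.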

Can deliver JPEG 2000 images:
Progressively by size
Progressively by resolution
Progressively by Quality

JPEG2000 optimizes compression across the entire image, rather than by spatial blocks as JPEG does.

Color management in Part 1 is based on the sRGB color space. The people who were at the table when the discussions were happening felt sRGB was good enough.

Part 2 (JPX) is more capable. It supports other color spaces and full ICC profiles.
—-
Image components

Part 1 (JP2) supports 1- or 3-component images, plus optional masks; all JPEG 2000 compressed. 1 component would be b/w (grayscale) only; 3 components would be RGB.

Part 2 (JPX) supports anything for which there is a color spec, for example multispectral photography. Getting beyond just the Red, Green and Blue spectrum.

FILE FORMAT ARCHITECTURE

Initially, JPEG group only specified the compression, and didn’t address the file format. There were negative outcomes, with a proliferation of different JPEG file formats.

This time, decided to focus on the file formats.

A JPEG2000 file is a sequence of boxes with 3 fields each:
-length L
-type T
-data D

With such a file format, an application can read through the block (box) and figure out how long it is, skip over the L component to the type and see what kind of info it is. If it isn’t interested in that data, it has all of the information it needs to skip over it entirely.
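A rough sketch of that skip-ahead logic, assuming the usual box header layout of a 4-byte big-endian length followed by a 4-byte type, where the length counts the header itself. The special length values (1 for an 8-byte extended length, 0 for "runs to end of file") are handled as I understand them from the file format spec; the helper names and the sample data are made up.

```python
import struct


def iter_boxes(data):
    """Yield (box_type, payload) for each top-level box in `data` (bytes)."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if length == 1:
            # Extended length: an 8-byte length follows the type field.
            if pos + 16 > len(data):
                raise ValueError("truncated extended-length box at %d" % pos)
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif length == 0:
            # Box extends to the end of the file.
            length = len(data) - pos
        if length < header or pos + length > len(data):
            raise ValueError("malformed box at offset %d" % pos)
        yield box_type, data[pos + header:pos + length]
        pos += length  # jump straight to the next box


def make_box(box_type, payload):
    """Build a box with a plain 4-byte length (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = make_box(b"xml ", b"<note/>") + make_box(b"uuid", b"\x00" * 20)
# An application that only cares about XML skips everything else.
xml_payloads = [body for kind, body in iter_boxes(sample) if kind == b"xml "]
```

The reader never has to understand a box it is not interested in; the length alone is enough to step over it.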

BASIC JPEG2000 file:
-Begins with JPEG2000 signature box (declares itself as a member of the JPEG2000 file family)
-File type box
-Header box (image and color params)
-codestream box (actual image data)
-Metadata
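As an illustration of the first two items in that list, the sketch below checks the fixed 12-byte signature and then reads the file type box (brand, minor version, compatibility list). The byte values are the ones I know from the JP2 file format; the function name and the hand-built sample are hypothetical, and the code assumes the file type box uses a plain 4-byte length.

```python
import struct

# Signature box: length 12, type 'jP  ', fixed 4-byte content.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def read_file_type(data):
    """Return (brand, minor_version, compatibility_list) or raise ValueError."""
    if data[:12] != JP2_SIGNATURE:
        raise ValueError("not a JPEG 2000 family file")
    if len(data) < 28:
        raise ValueError("file too short for a file type box")
    length, box_type = struct.unpack_from(">I4s", data, 12)
    if box_type != b"ftyp" or length < 16 or 12 + length > len(data):
        raise ValueError("missing or malformed file type box")
    brand, minor = struct.unpack_from(">4sI", data, 20)
    # Remaining 4-byte entries list the profiles the file is compatible with.
    compat = [data[i:i + 4] for i in range(28, 12 + length, 4)]
    return brand, minor, compat


# Hand-built header for a plain JP2 file (illustrative, no image data).
sample = JP2_SIGNATURE + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
brand, minor, compat = read_file_type(sample)
```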

METADATA
You can pretty much put anything you like in it; allows for two types:
-XML box, any XML-formatted metadata
-Any other kind of data (UUID boxes), voice annotations, TIFFs, PDFs, etc.
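A small sketch of how the two metadata box types above could be assembled, assuming the standard box header (4-byte length, 4-byte type) and that a UUID box carries a 16-byte identifier followed by arbitrary data. The helper names, the Dublin Core snippet, and the UUID value are all made up for illustration.

```python
import struct
import uuid


def make_box(box_type, payload):
    """Prefix `payload` with a box header (length includes the 8 header bytes)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def xml_box(xml_text):
    """Wrap any XML-formatted metadata in an XML box."""
    return make_box(b"xml ", xml_text.encode("utf-8"))


def uuid_box(identifier, payload):
    """Wrap arbitrary bytes; the 16-byte UUID tells readers what the data is."""
    return make_box(b"uuid", identifier.bytes + payload)


# Hypothetical identifier an institution might assign to "embedded PDF".
PDF_PAYLOAD_ID = uuid.UUID("12345678-1234-5678-1234-567812345678")

record = xml_box("<dc:title>Sample page</dc:title>")
attachment = uuid_box(PDF_PAYLOAD_ID, b"%PDF-1.4 ...")
```

A reader that does not recognize the UUID can still skip the box using its length field.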

JPEG2000 FILE FORMAT FAMILY
-JP2 (JPEG 2000 Core)
-JPX (Extensions)
-MJ2 (timed sequence of JPEG2000 images). Not coded with interframe differences
-JPM (JPEG2000 Multi-layer). Documents where different parts of the image might be coded differently; for example, a newspaper article where the text can be bitonal but the photograph RGB.

Motion JPEG2000 was recently adopted by the Digital Cinema Initiatives (DCI): this will be the way movies are delivered to movie theaters.

JP2 HEADER BOX: TECHNICAL METADATA LIKELY TO BE ENCODED
-image header
-Bits per component
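For a sense of what that technical metadata looks like on disk, here is a sketch that decodes the payload of an image header box, assuming the 14-byte layout as I understand it from Part 1: height, width, number of components, bits per component, compression type, and two flag bytes. The function name, the returned dictionary keys, and the sample values are illustrative only.

```python
import struct


def parse_image_header(payload):
    """Decode the 14-byte payload of an image header ('ihdr') box."""
    if len(payload) != 14:
        raise ValueError("image header payload must be 14 bytes")
    height, width, components, bpc, compression, unknown_cs, ipr = struct.unpack(
        ">IIHBBBB", payload
    )
    info = {
        "height": height,
        "width": width,
        "components": components,
        "compression_type": compression,
        "colorspace_unknown": bool(unknown_cs),
        "has_ipr": bool(ipr),
    }
    if bpc == 255:
        # Depth varies per component and is stored in a separate box.
        info["bit_depth"] = None
        info["signed"] = None
    else:
        info["bit_depth"] = (bpc & 0x7F) + 1   # stored as depth minus one
        info["signed"] = bool(bpc & 0x80)      # high bit marks signed samples
    return info


# Made-up example: a 2000 x 3000 pixel, 3-component, 8-bit unsigned image.
sample = struct.pack(">IIHBBBB", 3000, 2000, 3, 7, 7, 0, 0)
info = parse_image_header(sample)
```

Fields like these are the natural candidates for mapping onto TIFF-style technical metadata.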

There has already been an initiative to map the JP2 headers to TIFF (see the American Memory site for info on TIFF headers).
Some things that don’t have direct mappings in the TIFF header to JP2 header info can possibly be mapped to Dublin Core instead.

Protection
-Security. JPX introduces a digital signature box, containing a checksum or digital signature
-Part 8 supports selective encryption and conditional access for the codestream.
For example, you could password-protect a certain layer of your JPEG2000 file. This may not necessarily be advisable for long-term archiving, but could perhaps be useful for secure transit between archives.

Error resilience
Variable length coders like JPEG2000 are vulnerable to errors that cause loss of synchronization. In Part 1, optional start of packet (SOP) synchronization markers are defined, so that an application reading in the file could resynchronize.
-Part 11, which deals with JPEG2000 for wireless, defines methods for protecting the codestream from errors in noisy environments.
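A very rough sketch of what resynchronizing on SOP markers could look like, assuming the marker segment layout as I understand it from Part 1: marker code 0xFF91, a 2-byte segment length of 4, then a 2-byte packet sequence number. The function name and the sample bytes are made up, and a real decoder would do considerably more validation than this.

```python
import struct

# SOP marker code followed by its fixed segment length (4).
SOP_PREFIX = b"\xff\x91\x00\x04"


def find_sop_markers(codestream):
    """Return (offset, packet_number) for each SOP marker segment found."""
    hits = []
    pos = codestream.find(SOP_PREFIX)
    while pos != -1:
        if pos + 6 <= len(codestream):
            (packet_number,) = struct.unpack_from(">H", codestream, pos + 4)
            hits.append((pos, packet_number))
        pos = codestream.find(SOP_PREFIX, pos + 1)
    return hits


# Made-up fragment: damaged bytes, then two packets each led by an SOP marker.
fragment = (
    b"\xaa\xbb"
    + SOP_PREFIX + b"\x00\x00" + b"data"
    + SOP_PREFIX + b"\x00\x01" + b"more"
)
markers = find_sop_markers(fragment)
```

After an error, a reader could jump to the next offset in `markers` and carry on decoding from that packet, using the sequence numbers to see how many packets were lost.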

Losing a certain amount of data from a JPEG2000 image will yield a loss of some kind, but it will not be as catastrophic as losing data from the middle of, say, a JPEG, an uncompressed TIFF, or a TIFF with compression. Every chunk of data and compression in JPEG2000 is applied across the entire image. You don’t have chunks of data that correspond to what we think of as chunks of an image, that is, a block with X,Y coordinates.

JPM – Multilayer JPEG2000 for compound document images.

JPSearch. Aims to provide a clear understanding of the image retrieval process. The library community should be active here.

——————–

JPEG2000 Practice in Archives and Libraries

What is required to adopt a new technology?
2) Is JPEG2000 better enough technology? This is the key question that we should be asking ourselves.

WHY USE JPEG2000?
-Open standard; royalty-free use. Write an encoder and decoder and pay royalties to no one. There are no patent issues for the encode/decode. The vendors can license their software. Writing an encoder is harder than writing a decoder.
-One asset supports multiple derivatives; one file for both lossless and lossy data.
-Region-of-interest (ROI) on coding and access. Can specify that certain parts of the image are very important and should be encoded at higher quality, for example.
-Easily handles large images. (Peter’s example: ERMapper brought very very large diskpacks with a 10 Terabyte image and browsed it as a JPEG2000 file).

Architecture for access and archiving with JPEG2000
-Part 9
Peter’s working on the architecture piece; capture and management of JPEG2000

JPEG2000 in use:
National Digital Newspaper Program (NDNP)
-Objectives and constraints

UConn’s Charles Olson project
Annotated Melville’s manuscripts
Received a grant for preservation treatment and digitization.
Have embedded various types of metadata: TEI headers (XML data), PDF (UUID data), and the entire EAD finding aid to provide the context, so the user can tell where this came from.

—–
What remains?

What is required to adopt a new technology?
3) Confirmation (dialog, and people to test: is this where we should be going?). Peter thinks this is better enough than what we have been doing. Harvard and LC are also starting to do this.

Final questions
Has anyone endorsed this as a standard? Library of Congress has put it on par with TIFF as a storage standard.

Archive groups haven’t endorsed it yet.

Is this replacing EXIF data? Will camera vendors do JPEG2000? Yes, there will be some new digital cameras this Christmas with JPEG2000 support.