Future-Proofing: Xena objects

From the JISC Future Proofing project blog

In this post, I’m just going to look at the normalised Xena object, which is an Archival Information Package (AIP). It’s encoded in a file format whose extension is .xena and which is identified by Windows as a ‘Xena preserved digital object’ type.

We’ve created these for normalisations of each of our principal records types – documents, spreadsheets, PowerPoint, PDFs, images and emails.

When we click on one of these it launches the Xena Viewer application. This gives an initial view of the AIP in the default NAA Package View. This is very useful as it gives a quick visual check of how well the conversion / normalisation has worked. (It seems particularly strong on rendering documents and common image formats.)

The AIP is quite a complex object, though. It is an XML wrapper that contains both the transformed object and the metadata about it. This follows the preservation model proposed by the Library of Congress Metadata Encoding and Transmission Standard (METS), which suggests that building an XML profile like this is the best way to make a preserved object self-describing over time.

The main components of a Xena AIP are (A) technical metadata about the wrapper and the conversion process, some of which is expressed as Dublin Core-compliant metadata, and some of which conforms to an XML namespace defined by the NAA; (B) a string of code that is the transformed file; and (C) a checksum.

The Xena Viewer offers three ways of seeing the package:

1) The default NAA Package view. For documents, this shows the representation of the transformed object in a large window. Underneath that window is a smaller scrolling window which displays the transformation metadata. Underneath this is an even smaller window displaying the package signature checksum.

2) The Raw XML view. In this view we can see the entire package as a stream of XML code. This makes clear the use of XML namespaces and Dublin Core elements for the metadata.

3) The XML Tree View. This makes clear the way the AIP is structured as a package of content and metadata.
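
To give a rough feel for what the Raw XML and Tree views are showing, here is a minimal Python sketch that walks a small XML package and prints one line per element. The package below is a made-up stand-in: the element names and the example.org namespace are our own assumptions and not the real Xena schema. Only the Dublin Core namespace URI is the genuine one.

```python
import xml.etree.ElementTree as ET

# Toy package. The "pkg" element names and the example.org namespace are
# hypothetical; only the Dublin Core namespace URI is real.
SAMPLE = """<pkg:package xmlns:pkg="http://example.org/package"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
  <pkg:meta>
    <dc:identifier>file:/example/report.doc</dc:identifier>
    <dc:format>application/msword</dc:format>
  </pkg:meta>
  <pkg:content>(transformed object goes here)</pkg:content>
  <pkg:signature>0123abcd</pkg:signature>
</pkg:package>"""


def outline(element, depth=0):
    """Return indented lines, one per element, like a simple tree view."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

ElementTree prints each tag with its namespace URI in curly braces, which makes it easy to see which elements are Dublin Core and which belong to the wrapper's own namespace.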

The default view changes slightly depending on object type:

For documents, the viewer window shows the OpenOffice (OO) representation in a manner that keeps some of the basic formatting intact. To be precise, it’s a MIME-compliant representation of the OO document.
For images, the viewer shows a Base64 rendering of an image file.
For PDF documents, the viewer integrates a set of Adobe Reader-like functions – search, zoom, properties, page navigation etc. This seems to be done by a JPedal Library GUI.
For spreadsheets and PowerPoint files, the viewer doesn’t show anything in the big window (but we’ll get to this when we follow the “Show in OpenOffice” option, which will be the subject of another post).
For emails, the Xena Viewer has a unique view that not only displays the message content, but also the email header, and any attachments, both in separate windows. We’ll look at this again in a future post about how Xena works with emails.
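
For anyone unfamiliar with Base64, mentioned above for images: it is a way of writing binary data using only plain text characters, which is what allows an image to sit inside an XML file. The sketch below is a general illustration using Python's standard library, not Xena's own code. The eight bytes used are the standard PNG file signature, standing in for a whole image.

```python
import base64

# The eight-byte PNG file signature, standing in for a complete image file.
image_bytes = b"\x89PNG\r\n\x1a\n"

# Encode the binary data as ASCII text that is safe to embed in XML.
encoded = base64.b64encode(image_bytes).decode("ascii")

# Decoding recovers exactly the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == image_bytes)
```

The round trip is lossless, so the original image can always be recovered from the text held in the package.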

As an Archival Information Package in OAIS terms, a Xena object is clearly compliant and suitable for preservation purposes. The only problem for our project is getting more of our own metadata into the AIP; in other words, integrating more metadata that records our digital preservation actions. Ideally, we would like to be able to perform a string of actions (DROID identification of an object, virus checking, checksum generation, etc.) and integrate the accumulated metadata into a single AIP. However, this is not in the scope of the project.
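
Purely as an illustration of what accumulating this kind of metadata might look like, here is a small Python sketch that runs two actions on an object and records the results in a single XML element. Everything in it is an assumption on our part: the element and attribute names are invented, the format check is a trivial stand-in for a real DROID identification, and virus checking is left out altogether. It is not something Xena does.

```python
import hashlib
import xml.etree.ElementTree as ET


def describe(name, data):
    """Run a few checks on `data` and collect the results in one XML element."""
    # "preservationActions" and "action" are hypothetical element names.
    record = ET.Element("preservationActions", {"object": name})

    # Action 1: fixity checksum over the object's bytes.
    checksum = ET.SubElement(
        record, "action", {"type": "checksum", "algorithm": "SHA-256"}
    )
    checksum.text = hashlib.sha256(data).hexdigest()

    # Action 2: stand-in for format identification. A real workflow would
    # call a tool such as DROID here instead of checking the first bytes.
    fmt = ET.SubElement(record, "action", {"type": "identification"})
    fmt.text = "PDF" if data.startswith(b"%PDF-") else "unknown"

    return record


record = describe("report.pdf", b"%PDF-1.4 example bytes")
print(ET.tostring(record, encoding="unicode"))
```

Each further action would simply append another element to the same record, which could then be carried alongside the content in one package.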
