Future-Proofing: DIPs from a Xena object – descriptive metadata

From the JISC Future Proofing project blog

In OAIS terms, the Xena digital object is our Archival Information Package (AIP). This is the object that would be stored and preserved in UoL’s archival storage system (if we had one).

For the purposes of this project, the records manager needs to be assured that we can render and deliver a readable, authentic version of the record from the Xena object. In OAIS terms, this could be seen as a Dissemination Information Package (DIP) derived from an AIP. Among other things, the DIP ought to include sufficient descriptive metadata.

We’ve defined a minimal set of the metadata we would need for records management purposes (names, and dates for authenticity) and for information retrieval, along with criteria for assessing the overall success of the transformations, such as legibility, presentation, look and feel, and basic functionality. See our previous post for the documentation of how we arrived at our criteria.

For most of the examples below we are looking at standard Xena normalisations, which can produce a DIP by the following methods:

Export the AIP to its nearest Open Office equivalent
Export the AIP to native format
Export the AIP to a format of our choice (an option we haven’t tried as yet)

Emails are slightly more complex; see below for more detail, and our previous post on emails.

Documents, spreadsheets and powerpoints

For these MS Office documents, Xena normalises by converting them to their nearest Open Office equivalent. We can view this Open Office version via the Xena Viewer simply by clicking on Show in OpenOffice.org.

In the case of a sample docx, this shows us the document with its original formatting intact. In Open Office, we can now look at File, Properties, and confirm the following descriptive metadata are intact (some of these we will revisit when we look at significant properties):

Title
Subject
Keywords
Comments
Company Name
Number of pages
Number of tables
Number of graphics
Number of OLE objects
Number of paragraphs
Number of words
Number of characters
Number of lines
Size
Date of creation / created by whom
Date of modification / modified by whom

The one property missing from the OO version is a discrete ‘Author’ field.
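
The File, Properties check could also be scripted rather than done by eye. The following is only a rough sketch, and it assumes the exported DIP is an OpenDocument file (for example .odt), which stores its descriptive metadata in a meta.xml entry inside the zip container. The function names and the field labels are our own invention for illustration; they are not part of Xena or OpenOffice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument namespaces used in meta.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Our own labels (hypothetical) mapped to the ODF metadata elements.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "comments": "dc:description",
    "created": "meta:creation-date",
    "created_by": "meta:initial-creator",
    "modified": "dc:date",
    "modified_by": "dc:creator",
}


def parse_odf_meta(meta_xml):
    """Return the descriptive metadata found in the bytes of a meta.xml."""
    root = ET.fromstring(meta_xml)
    meta = root.find("office:meta", NS)
    found = {}
    if meta is None:
        return found
    for label, tag in FIELDS.items():
        element = meta.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    keywords = [k.text for k in meta.findall("meta:keyword", NS) if k.text]
    if keywords:
        found["keywords"] = keywords
    # Counts (pages, tables, words ...) are attributes of one element.
    stats = meta.find("meta:document-statistic", NS)
    if stats is not None:
        prefix = "{%s}" % NS["meta"]
        for name, value in stats.attrib.items():
            if name.startswith(prefix):
                found[name[len(prefix):]] = value
    return found


def read_odf_metadata(path):
    """Open an exported OpenDocument file and parse its meta.xml."""
    with zipfile.ZipFile(path) as container:
        return parse_odf_meta(container.read("meta.xml"))
```

Calling read_odf_metadata on an exported file returns a dictionary that can be compared against the list above. Any label that does not appear in the result was not carried through the normalisation.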

PDFs

Xena Viewer has its own Adobe-like viewer for PDFs. When we look at the Document Properties for a normalised PDF, we get the following descriptive metadata:

File size
Page count
Title
Author
Creator (i.e. the application that created the original document)
Producer (i.e. the software that converted it to PDF)
Date of creation
Date of modification

Images

We may not need much descriptive metadata for individual images. We looked at the converted images using a free EXIF viewer tool and found that the date values (creation and modification) are intact in the normalised object.

Emails

For emails, the AIP will be either a standard Xena normalised object or a binary Xena object. Broadly, the former can export to an XML version and the latter can export back to its native format (.msg file). Whichever transformation we use, we are guaranteed a minimum set of descriptive metadata. This would be:

Title
Author
Recipient
Date
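
A check against this minimum set could be sketched as below. This is an illustration only: it assumes the four values have already been pulled out of the exported XML or .msg file into a dictionary by some other means, and the key names and the missing_fields function are hypothetical, not anything Xena provides.

```python
# Hypothetical key names for the minimum email metadata listed above.
REQUIRED_EMAIL_FIELDS = ("title", "author", "recipient", "date")


def missing_fields(metadata, required=REQUIRED_EMAIL_FIELDS):
    """Return the required fields that are absent or blank in metadata."""
    return [
        field
        for field in required
        if not str(metadata.get(field) or "").strip()
    ]
```

An empty list would mean the DIP carries the full minimum set; anything else names the fields to investigate.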

Results – progress so far

Below is a PDF of my table of analysis of descriptive metadata (and significant properties).

DescriptiveMetadata and SigProps

We also have Kit’s assessment of the transformations, including his comments on whether a reasonable amount of metadata is intact.

Kit20111208-Testing
