Future-Proofing: DIPs from a Xena object – descriptive metadata

From the JISC Future Proofing project blog

In OAIS terms, the Xena digital object is our Archival Information Package (AIP). This is the object that would be stored and preserved in UoL’s archival storage system (if we had one).

For the purposes of this project, the records manager needs to be assured that we can render and deliver a readable, authentic version of the record from the Xena object. In OAIS terms, this could be seen as a Dissemination Information Package (DIP) derived from an AIP. Among other things, the DIP ought to include sufficient descriptive metadata.

We’ve defined a minimal set of the descriptive metadata we would need for records management purposes (names and dates, for authenticity) and for information retrieval. We have also defined criteria for assessing the overall success of the transformations, such as legibility, presentation, look and feel, and basic functionality. See our previous post for how we arrived at these criteria.
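Recording whether each field in such a minimal set survives a transformation can be done with a simple field-by-field comparison. The sketch below assumes the metadata for the original record and for the DIP have already been extracted into plain key/value form. The field names in `REQUIRED_FIELDS` and the function `check_descriptive_metadata` are hypothetical illustrations, not part of Xena.

```python
# Hypothetical minimal field set: names and dates for authenticity,
# plus a title for retrieval. Adjust to the project's own criteria.
REQUIRED_FIELDS = ("title", "author", "created", "modified")


def check_descriptive_metadata(original: dict, dip: dict,
                               required=REQUIRED_FIELDS) -> dict:
    """Report, per required field, whether the DIP value is intact,
    changed or missing relative to the original record."""
    report = {}
    for field in required:
        value = dip.get(field)
        if value in (None, ""):
            report[field] = "missing"
        elif value != original.get(field):
            report[field] = "changed"
        else:
            report[field] = "intact"
    return report


# Illustrative values only, not results from the project.
original = {"title": "Committee minutes", "author": "J Smith",
            "created": "2010-03-01T09:30:00", "modified": "2010-03-02T10:00:00"}
dip = {"title": "Committee minutes",
       "created": "2010-03-01T09:30:00", "modified": "2010-03-02T10:00:00"}
print(check_descriptive_metadata(original, dip))
# {'title': 'intact', 'author': 'missing', 'created': 'intact', 'modified': 'intact'}
```

A per-field status is used instead of a single pass/fail result. This keeps the report close to a table of analysis, where each property is assessed separately.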

For most of the examples below, we are looking at standard Xena normalisations, from which a DIP can be produced in the following ways:

  • Export the AIP to its nearest Open Office equivalent
  • Export the AIP to its native format
  • Export the AIP to a format of our choice (an option we haven’t tried as yet)

Emails are slightly more complex. See below for more detail, and our previous post on emails.

Documents, spreadsheets and PowerPoint presentations

For these MS Office documents, Xena normalises by converting them to their nearest Open Office equivalent. We can view this Open Office version from the Xena Viewer simply by clicking ‘Show in OpenOffice.org’.

In the case of a sample .docx file, this shows us the document with its original formatting intact. In Open Office, we can then go to File > Properties and confirm that the following descriptive metadata are intact (we will revisit some of these when we look at significant properties):

  • Title
  • Subject
  • Keywords
  • Comments
  • Company Name
  • Number of pages
  • Number of tables
  • Number of graphics
  • Number of OLE objects
  • Number of paragraphs
  • Number of words
  • Number of characters
  • Number of lines
  • Size
  • Date of creation / created by whom
  • Date of modification / modified by whom

One missing property is a discrete ‘Author’ field in the Open Office version.
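Outside the Open Office GUI, the same properties can be read directly from the exported file. An OpenDocument file is a ZIP archive, and its descriptive metadata live in `meta.xml` under the `office:meta` element. The sketch below assumes an exported .odt/.ods/.odp file, not the Xena XML object itself. The mapping of dialog labels to ODF elements is an assumption, and `read_odf_metadata` is a hypothetical helper. ODF has `meta:initial-creator` (created by) and `dc:creator` (last modified by) but no separate author element. File size comes from the file system, and user-defined fields are not handled here.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed mapping from File > Properties labels to ODF meta.xml elements.
TEXT_FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "comments": "dc:description",
    "created_by": "meta:initial-creator",
    "created": "meta:creation-date",
    "modified_by": "dc:creator",   # ODF: person who last modified the file
    "modified": "dc:date",
}


def read_odf_metadata(path: str) -> dict:
    """Return descriptive metadata found in an ODF file's meta.xml."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    meta = root.find("office:meta", NS)
    if meta is None:
        return {}
    props = {}
    for name, tag in TEXT_FIELDS.items():
        el = meta.find(tag, NS)
        if el is not None and el.text:
            props[name] = el.text
    # Keywords are stored as repeated meta:keyword elements.
    keywords = [k.text for k in meta.findall("meta:keyword", NS) if k.text]
    if keywords:
        props["keywords"] = keywords
    # Counts (pages, tables, images, words...) are attributes of
    # meta:document-statistic, e.g. meta:page-count="2".
    stats = meta.find("meta:document-statistic", NS)
    if stats is not None:
        prefix = "{%s}" % NS["meta"]
        for key, value in stats.attrib.items():
            if key.startswith(prefix):
                props[key[len(prefix):]] = int(value)
    return props
```

For example, `read_odf_metadata("sample.odt")` (a hypothetical file name) returns keys such as `title`, `created_by` and `page-count` for whichever properties the file actually records.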

PDFs

Xena Viewer has its own Adobe-like viewer for PDFs. When we look at the Document Properties for a normalised PDF, we get the following descriptive metadata:

  • File size
  • Page count
  • Title
  • Author
  • Creator (i.e. software that created the original document)
  • Producer (i.e. software that converted it to PDF)
  • Date of creation
  • Date of modification
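In a PDF, these properties come from the document information dictionary. Its dates are strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional. To check that creation and modification dates survived normalisation, it helps to compare parsed instants rather than raw strings, because the same moment can be written with different offsets. The sketch below assumes the date strings have already been extracted with some PDF tool. `parse_pdf_date` is a hypothetical helper, and it tolerates a missing `D:` prefix, which some producers omit.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional offset (Z, +HH'mm' or -HH'mm').
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<Y>\d{4})(?P<m>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<M>\d{2})?(?P<S>\d{2})?"
    r"(?:(?P<tz>[Zz+\-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string; missing components take their defaults."""
    m = _PDF_DATE.match(value.strip())
    if not m:
        raise ValueError(f"not a PDF date string: {value!r}")
    dt = datetime(
        int(m["Y"]), int(m["m"] or 1), int(m["d"] or 1),
        int(m["H"] or 0), int(m["M"] or 0), int(m["S"] or 0),
    )
    tz = m["tz"]
    if tz in ("+", "-"):
        offset = timedelta(hours=int(m["tzh"] or 0), minutes=int(m["tzm"] or 0))
        dt = dt.replace(tzinfo=timezone(offset if tz == "+" else -offset))
    elif tz in ("Z", "z"):
        dt = dt.replace(tzinfo=timezone.utc)
    # With no offset the result is naive (local time unknown).
    return dt
```

Two dates written with different offsets compare equal if they denote the same instant. For example, `parse_pdf_date("D:20100301093000+01'00'") == parse_pdf_date("D:20100301083000Z")` is `True`.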

Images

We may not need much descriptive metadata for individual images. We examined the converted images with a free EXIF viewer tool and found that the date values (creation and modification) are intact in the normalised object.
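The same check can be scripted once the EXIF tags for the original and the normalised image have been dumped, for example from an EXIF viewer, into dictionaries keyed by tag name. EXIF stores dates as `YYYY:MM:DD HH:MM:SS` with no time zone, sometimes with a trailing NUL byte. Mapping the blog's "creation" and "modification" dates to the `DateTimeOriginal` and `DateTime` tags is an assumption, and `compare_exif_dates` is a hypothetical helper.

```python
from datetime import datetime

# EXIF date format (no time zone information).
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"
# Assumed tags for "creation" and "modification" dates.
DATE_TAGS = ("DateTimeOriginal", "DateTime")


def _parse(value: str) -> datetime:
    # Strip padding and any trailing NUL terminator before parsing.
    return datetime.strptime(value.strip("\x00 "), EXIF_DATE_FORMAT)


def compare_exif_dates(original: dict, normalised: dict) -> dict:
    """Report whether each date tag is intact after normalisation."""
    report = {}
    for tag in DATE_TAGS:
        before, after = original.get(tag), normalised.get(tag)
        if before is None:
            report[tag] = "absent in original"
        elif after is None:
            report[tag] = "missing"
        elif _parse(before) == _parse(after):
            report[tag] = "intact"
        else:
            report[tag] = "changed"
    return report
```

Comparing parsed values rather than raw strings means that harmless differences such as a trailing NUL are not reported as changes.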

Emails

For emails, the AIP will be either a standard Xena normalised object or a binary Xena object. Broadly, the former can be exported to an XML version and the latter back to its native format (.msg file). Whichever transformation we apply, we are guaranteed a minimum set of descriptive metadata:

  • Title
  • Author
  • Recipient
  • Date

Results – progress so far

Below is a PDF of my analysis table for descriptive metadata (and significant properties).

We also have Kit’s assessment of the transformations, including his comments on whether a reasonable amount of metadata is intact.
