Preserving Digital Content – Taking first steps with the AOR toolkit

I have long had an interest in promoting digital preservation, most recently through the relaunch of the AIDA toolkit as the AOR toolkit. However, in my work I meet a lot of people in a lot of organisations for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

AOR Toolkit – Possible uses

To speculate on this for a while, we could instead consider (a) what kind of digital content people typically have and (b) what they want to do with it. It’s possible that the new AOR Toolkit can help as a first step in assessing your capability to perform (b).

One milieu that we’re most familiar with is higher education, where the (a) is born-digital research data and the (b) is research data management. This may in time shade into a long-term preservation need, but not exclusively. That said, the main driver for many to manage research data in an organised way is actually the requirement of the funders for the research data to be preserved.

Not unrelated to that example, repository managers may not be primarily interested in preservation, but they certainly have a need to manage (a) born-digital publications and research papers and (b) their metadata, storage, dissemination, and use, perhaps using a repository. On the other hand, as the content held in a publications repository increases over time, repository managers may need to become more interested in selection and preservation.

Another possibility is electronic records management. The (a) is born-digital records, and the (b) could include such activities as classification, retention scheduling, meeting legislative requirements, metadata management, storage (in the short to mid-term), and security. In such scenarios, not all digital content need be kept permanently, and the outcome is not always long-term digital preservation for all content types.

AOR Toolkit – Beyond the organisation

Digital librarians, managers of image libraries, in short anyone who holds digital content, is probably eligible for inclusion in my admittedly very loose definition of a potential AOR Toolkit user. I would like to think the toolkit could apply not just to organisations but also to individual projects, both large and small. All the user has to do is set the parameters. It might even be a way into understanding your own capability for “personal archiving”, i.e. ensuring the longevity of your own personal digital history, identity and collections in the form of digital images, documents, and social media presence. Use the AOR toolkit to assess your own PC and hard drive, in other words.
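
To make that last suggestion a little more concrete, here is a minimal sketch – not part of the AOR Toolkit, and purely hypothetical – of what a first step in personal archiving might look like: before asking any readiness questions, simply survey what you actually hold. It assumes a Python environment and uses your Documents folder as an example target.

```python
# Hypothetical first step in "personal archiving": survey what you actually hold.
# This is NOT part of the AOR Toolkit; it is an illustrative sketch only.
import time
from collections import Counter
from pathlib import Path


def survey(root: Path):
    """Tally file extensions and rough age bands for everything under `root`."""
    extensions = Counter()
    age_bands = Counter()
    now = time.time()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        extensions[path.suffix.lower() or "(no extension)"] += 1
        age_years = (now - path.stat().st_mtime) / (365 * 24 * 3600)
        if age_years < 1:
            age_bands["under 1 year"] += 1
        elif age_years < 5:
            age_bands["1 to 5 years"] += 1
        else:
            age_bands["over 5 years"] += 1
    return extensions, age_bands


if __name__ == "__main__":
    # The folder is an assumption; point it at whatever you want to assess.
    exts, ages = survey(Path.home() / "Documents")
    print("Most common formats:", exts.most_common(10))
    print("Age profile:", dict(ages))
```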

It remains to be seen whether the AOR Toolkit can match any of my wide-eyed optimistic predictions, but at least for this new iteration we have attempted to expand the scope of the toolkit and the definitions of its elements, in order to bring it a step closer to a more comprehensive, if not actually universally applicable, assessment tool.

AOR Toolkit – Addressing a community need?

Results from our recent training needs survey also indicate a general need for assessment in the context of digital preservation. When suggesting subjects that are not currently being taught enough, some respondents explicitly identified the following requirements, which indicate how assessment would help advance their case:

  • Self-assessment and audit
  • Assessment/criteria/decision in the context of RDM
  • Quality analysis as part of preservation planning and action
  • Benchmarking in digital preservation (i.e. what to do when unable to comply with OAIS)
  • Key performance indicators for digital preservation
  • What to check over time

In the same survey, when asked about “expected benefits of training”, the responses were even more interesting. There were 32 answers which I classified under strategy and planning, many of them indicating the need for assessment and analysis as a first step; likewise, 21 answers alluded to the ability to implement a preservation system, with many references to “next steps” and understanding organisational “capacity”. One response in particular is worth quoting in full:

“We have recognised and assessed the problem, decided on a strategy and are nearing the purchase of a system to cope with what we currently have, but once this is done we will need to create two projects – one to address ongoing work and one to resolve legacy work created by our stop-gap solution. I’d expect training to answer both these needs.”

All of the above is simply to reiterate what I said in March: “I hope to make the new AOR toolkit into something applicable to a wider range of digital content scenarios and services.”

Self-assessment as digital preservation training aid

I have always liked to encourage people to assess their organisation and its readiness to undertake digital preservation. It’s possible that AIDA and the new AOR Toolkit could continue to have a small part in this process.

Self-assessment in DPTP

We have incorporated exercises in self-assessment as a digital preservation training aid in the DPTP course for many years. We don’t do this so much these days, but we used to get students to map themselves against the OAIS Reference Model. The idea was that they could identify gaps in the Functional Entities and in information package creation, and work out who their Producers and Consumers were. We would ask them to draw it up as a flipchart sketch, using dotted lines to express missing elements or gaps.

Another exercise was to ask students to make an informed guess as to where their organisation would sit on the Five Organisational Stages model proposed by Anne Kenney and Nancy McGovern. The most common response was Stage 1 “Acknowledge” or Stage 2 “Act”. We also asked which leg of their three-legged stool (Organisation, Technology, or Resources) was shortest or longest. The most memorable response we ever had to the stool exercise was one student’s drawing of an upholstered Queen Anne chair.

Other self-assessment models we have introduced to our classes include:

  • The NDSA Levels of Digital Preservation, which is good because it’s so compact and easy to understand. Admittedly, in the version we were talking about, it only assessed the workings of a repository (not a whole organisational setup) and focussed on technological capabilities such as checksums and storage. This may change if the recent proposal to add a row for “Access” goes forward.
  • The DP Capability Maturity Model. In this model we liked the very rich descriptions of what it’s like to be operating at one of the proposed five levels of success.
  • The DRAMBORA toolkit, which emphasises risk assessment of a repository.

We also tried to encourage students to look at using elements of the TRAC and TDR audit regime purely from a self-assessment viewpoint. These tools can be time-consuming and costly if you’re undergoing full audited certification, but there’s nothing to stop an organisation using them for their own gap analysis or self-assessment needs.

As a matter of fact, this line of thinking fed into the SPRUCE toolkit I worked on with Chris Fryer; together we created a useful and pragmatic assessment method. ULCC prepared a cut-down and simplified version of ISO 16363, retaining only those requirements considered essential for the purposes of the project. The project added value by proposing systems assessment, product analysis, and user stories as part of the process. My 2013 blog post alludes once again to the various assessment toolkits that can be found in the digital preservation landscape.

Review of self-assessment landscape

Are there too many toolkits, and are they really any good? Christoph Becker at the University of Toronto has been wondering that himself, and his team conducted a study on the assessment model landscape, which became a paper published at iPRES. His work in evaluating these assessment frameworks continues:

“Assessment models such as AIDA, DPCMM and others are very particular artifacts, and there are methodologies to design, apply and evaluate such models effectively and rigorously. Substantial knowledge and specific methodology from Information Systems research provides a foundation for the effective design, application and evaluation of frameworks such as AIDA.

“We have just completed an in-depth review of the state of the art of assessment frameworks in Digital Preservation. The article is currently under review; a much more informal initial overview was presented at IPRES (Emily Maemura, Nathan Moles, Christoph Becker. A Survey of Organizational Assessment Frameworks in Digital Preservation. In: International Conference on Digital Preservation (IPRES 2015), November 2015, Chapel Hill.)

“We also recently completed a detailed investigation that leveraged the foundations mentioned above to analyze AIDA and the DPCMM in detail from both theory and practice in two real organizations: The University of Toronto Libraries, and the Austrian State Archives (i.e. we conducted four assessments). We conducted these case studies not to evaluate the organizations, but instead, to evaluate the frameworks.

“We could now design a new assessment model from scratch, and that is our default plan. However, our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared. And then, since assessment is often geared toward improvement, the next question is how to support and demonstrate that improvement over time.”

AIDA’s new name: AOR Toolkit

The hardest part of any project is devising a name for the output. The second hardest thing is devising a name that can also be expressed as a memorable acronym.

I think one of the most successful instances I have encountered was the CAMiLEON Project. This acronym unpacks into Creative Archiving at Michigan and Leeds Emulating the Old on the New. It brilliantly manages to include the names of both sponsoring Institutions, accurately describe the work of the project, and still end up as a memorable one-word acronym. The word itself, of course, resembles “chameleon”, the lizard which the project quite naturally used as its logo. When you consider that the project itself was about Emulation – a particular approach to digital preservation that involves “copying” IT environments – then that emblem is strikingly apposite to the meaning of the work.

From AIDA to AOR toolkit

I realised that any new name and acronym for AIDA could never possibly tick all those boxes. In February we put it out to the social media arena, offering prizes to anyone who could help us devise something suitable. The dilemma was expressed here. Meanwhile I tried making use of various online acronym generation tools, and found myself getting into an even worse mess of linguistic spaghetti.

In the end I decided to abandon acronyms, and instead settled for:

The Assessing Organisational Readiness (AOR) Toolkit

Acceptable abbreviations of this name would include AOR or AORT. AOR is an acronym already – it can mean “Album-Oriented Rock” or “Area Of Responsibility”. The second one is not entirely unsuitable for this toolkit.

Rationale for AOR toolkit:

  1. This is simpler and shorter than Assessing Organisational Readiness for Managing Digital Content or similar
  2. It captures the three most important functions of the toolkit (the “digital” side of it is almost irrelevant, you could say)
  3. It includes “readiness”, which the old AIDA missed, and which is central to the toolkit
  4. It allows users to make other interpretations of what “managing digital content” means to them (e.g. it could mean preservation, but it could also mean providing access), without closing off these meanings

I do wonder, though, whether “cute” project acronyms have had their day. When I was doing web-archiving for the JISC around 2006-2007, almost every project had one, and we ended up with rather forced constructions such as this one.

From AIDA to CARDIO

AIDA had a part to play in the creation of another assessment toolkit, CARDIO. This project is owned and operated by HATII at the University of Glasgow, and Joy Davidson of the Digital Curation Centre was the architect behind the toolkit.

CARDIO (Collaborative Assessment of Research Data Infrastructure and Objectives) is targeted at Research Data Management (RDM) and the digital outputs associated with research – be they publications or data. The processes for managing these digital assets have been a concern for HE Institutions in the UK for some time now. CARDIO measures an Institution’s capacity and preparedness for doing RDM.

If you’ve been following our blog posts on this subject, you’ll recognise overlap here with AIDA. But where AIDA was assessing a potentially very wide range of digital asset types, CARDIO was far more focussed and specific. As such, there was a very real need in our project to understand the audience, the environment, and the context of research in higher education. It was targeted at three very specific users in this milieu: the Data Liaison Officer, the Data Originator, and the Service Provider. For more detail, see the CARDIO website.

I worked with Joy in 2011-2012 to contribute an AIDA-like framework to her new assessment tool. The finished product ended up as webforms, designed by developers at HATII, but ULCC supplied the underlying grid and the text of the assessments. The basic structure of three legs and numbered elements survived, but the subjects and the wording had to change. For instance, new elements we devised specifically for this task included “Sharing of Research Data / Access to Research Data” and “Preservation and Continuity of Research Data”.
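
Incidentally, the grid structure itself – legs, numbered elements, and five stages per element – is simple enough to sketch in code. The snippet below is a purely hypothetical illustration of that shape: the leg names are the three legs of the stool mentioned earlier, and two of the element names are borrowed from the CARDIO examples just given, but the numbering, the placement of elements under legs, and the scores are all invented for the example.

```python
# Hypothetical sketch of an AIDA/CARDIO-style assessment grid:
# three "legs", numbered elements within each leg, and a stage score (1-5) per element.
# Element numbering, placement, and scores are invented for illustration only.
assessment = {
    "Organisation": {
        "O1 Policy and strategy": 2,
        "O2 Preservation and continuity of research data": 1,
    },
    "Technology": {
        "T1 Storage and backup": 3,
        "T2 Sharing of / access to research data": 2,
    },
    "Resources": {
        "R1 Staffing and skills": 2,
        "R2 Sustainable funding": 1,
    },
}


def leg_profile(grid):
    """Average stage per leg: a crude view of which leg of the stool is shortest."""
    return {leg: sum(scores.values()) / len(scores) for leg, scores in grid.items()}


print(leg_profile(assessment))
# {'Organisation': 1.5, 'Technology': 2.5, 'Resources': 1.5}
```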

The actual reworking was done by ULCC with a team of volunteers, who received small payments from a project underspend. Fortunately these 12 volunteers were all experts in just the right fields – data management, academic research, digital preservation, copyright, and other appropriate subjects.

I could give you a long report of their insightful comments and helpful suggestions, which show how AIDA was reformed and reshaped into CARDIO. Some reviewers rethought the actual target of the assessment statements; others were strong on technical aspects. Some highlighted “jargon alerts”. Through this work, we improved the consistency of the meaning of the five stages across the three legs, and we added many details that are directly relevant to the HE community and to managing research data.

Benefits of CARDIO

Since its launch, CARDIO has frequently been used as a first step by UK Institutions embarking on a programme of managing research data; they use it to assess their institutional capability for RDM.

I’ll end with one very insightful paragraph from a reviewer, which shows a detailed grasp of how an organisational assessment like AIDA or CARDIO can work:

“Processes, workflows, and policy grow more well-defined and rigid all the way up to stage 4, which represents a well-honed system suited to the internal needs of the repository. From that point onward, the progression to stage 5 is one of outward growth, with processes and workflows becoming more fluid to meet the needs of possible interoperating partners/collaborators. I generally do not see this “softening” in the 5 stages of CARDIO – rather, the 5th stage often represents things being fixed in place by legislation, a position that can become quite limiting if the repository’s (or stake holders’) needs change in the future.”

The AIDA toolkit: use cases

There have been a few isolated uses of the old AIDA Toolkit, and in this blog post I will try to recount some of them.

In the beginning…

In the toolkit’s first phase, in 2009, I was aided greatly by five UK HE Institutions who volunteered to act as guinea pigs and do test runs, but this was mainly to help me improve the structure and the wording. However, Sarah Jones of HATII was very positive about its potential in 2010.

“AIDA is very useful for seeing where your strengths and weaknesses lie. The results could provide a benchmark too, so if you go on to make some changes you can measure their effects…AIDA sounds particularly useful for your context too as this is about institutional readiness and assessing where strengths and weaknesses lie to determine areas for investment.”

I also used AIDA as part of a consultancy for a digital preservation strategy, working with the digital archivist at Diageo in 2012; they said:

“We agree that the AIDA assessment would be worthwhile doing as it will give us a good idea of where we are in terms of readiness and the areas we need to focus on to enable the implementation of a digital preservation strategy and system.”

Sarah Makinson of SOAS also undertook an AIDA assessment.

Further down the line…

Between 2011 and 2015, the toolkit was published and made available for download on a Jisc-hosted project website. During that time various uses were made of AIDA by an international audience:

Natalya Kusel used it for benchmarking collection care; she had

“been looking for some free self-assessment tools that I can use for benchmarking the current ‘health’ of collections care. I’m looking for something that will help me identify how the firm currently manages digital assets that have a long retention period so I can identify risks and plan for improvement.”

Anthony Smith used it as a teaching aid in a programme sponsored by UNESCO’s Intergovernmental Oceanographic Data Exchange.

Kelcy Shepherd of Amherst College used it in her workshops.

“Coincidentally, the Five Colleges, a consortium I’m involved in, used the Toolkit a few years ago. Each institution completed the survey to ascertain levels of readiness at the various institutions, and determine areas where it would make sense to collaborate. This helped us identify some concrete steps that we could take together as a consortium.”

Walter D Ray, the Political Papers archivist at Southern Illinois University, used it to assess his library’s readiness:

“I’m glad to see work is being done on the AIDA toolkit. We used it for our self-assessment and found it helpful. As my boss, Director of Special Collections Pam Hackbart-Dean says, “the digital readiness assessment was a useful tool in helping give us direction.” I would add that it helped us define the issues we needed to confront.

“Since then we have developed some policies and procedures, revised our Deed of Gift form, set up a digital forensics workstation, and put a process in place to handle digital projects coming from elsewhere on campus. We greatly appreciate the work you’ve done on the AIDA toolkit.”

However, on the less positive side, Nathan Moles and Christoph Becker of the University of Toronto studied AIDA as part of their “in-depth review of the state of the art of assessment frameworks in Digital Preservation.” Their survey of the landscape indicates the following:

“Our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared.”

AIDA in particular was found wanting:

“I think AIDA provides an interesting basis to start, but also currently has some shortcomings that we would need to see addressed to ensure that the resulting insights are well-founded. Most importantly, the fundamental concepts and constructs used in the model are currently unclear and would benefit from being set on a clear conceptual foundation.”

These stories show that AIDA had more of a shelf-life and wider application than I originally expected. Our hope is that the new AOR Toolkit will give the ideas a new lease of life and continue to be of practical help to those performing assessments.