Preserving Digital Content – Taking first steps with the AOR toolkit

I have long had an interest in promoting digital preservation, most recently with the relaunch of the AIDA toolkit as the AOR toolkit. However, in my work I meet a lot of people in a lot of organisations for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

AOR Toolkit – Possible uses

To speculate on this for a while, we could instead consider (a) what kind of digital content people typically have and (b) what they want to do with it. It’s possible that the new AOR Toolkit can help as a first step to assessing your capability to perform (b).

One milieu that we’re most familiar with is higher education, where the (a) is born-digital research data and the (b) is research data management. This may in time shade into a long-term preservation need, but not exclusively. That said, the main driver for many to manage research data in an organised way is actually the requirement of the funders for the research data to be preserved.

Not unrelated to that example, repository managers may not be primarily interested in preservation, but they certainly have a need to manage (a) born-digital publications and research papers and (b) their metadata, storage, dissemination, and use, perhaps using a repository. On the other hand, as the content held in a publications repository increases over time, repository managers may need to become more interested in selection and preservation.

Another possibility is electronic records management. The (a) is born-digital records, and the (b) could include such activities as classification, retention scheduling, meeting legislative requirements, metadata management, storage (in the short to mid-term), and security. In such scenarios, not all digital content need be kept permanently, and the outcome is not always long-term digital preservation for all content types.

AOR toolkit – Beyond the organisation

Digital librarians, managers of image libraries, in short anyone who holds digital content is probably eligible for inclusion in my admittedly very loose definition of a potential AOR Toolkit user. I would like to think the toolkit could apply, not just to organisations, but also to individual projects both large and small. All the user has to do is to set the parameters. It might even be a way into understanding your own personal capability for “personal archiving”, i.e. ensuring the longevity of your own personal digital history, identity and collections in the form of digital images, documents, and social media presence. Use the AOR toolkit to assess your own PC and hard drive, in other words.

It remains to be seen if the AOR Toolkit can match any of my wide-eyed optimistic predictions, but at least for this new iteration we have attempted to expand the scope of the toolkit, and expanded the definitions of the elements, in order to bring it a step closer towards a more comprehensive, if not actually universally applicable, assessment tool.

AOR toolkit – addressing a community need?

Results from our recent training needs survey also indicate there is a general need for assessment in the context of digital preservation. In terms of suggestions made for subjects that are not currently being taught enough, some respondents explicitly identified the following requirements which indicate how assessment would help advance their case:

  • Self-assessment and audit
  • Assessment/criteria/decision in the context of RDM
  • Quality analysis as part of preservation planning and action
  • Benchmarking in digital preservation (i.e. what to do when unable to comply with OAIS)
  • Key performance indicators for digital preservation
  • What to check over time

In the same survey, when asked about “expected benefits of training”, an even more interesting set of responses was forthcoming. There were 32 answers that I classified under strategy and planning, many of them indicating the need for assessment and analysis as a first step; likewise, there were 21 answers alluding to the ability to implement a preservation system, with many references to “next steps” and understanding organisational “capacity”. One response in particular is worth quoting in full:

“We have recognised and assessed the problem, decided on a strategy and are nearing the purchase of a system to cope with what we currently have, but once this is done we will need to create two projects – one to address ongoing work and one to resolve legacy work created by our stop-gap solution. I’d expect training to answer both these needs.”

All of the above is simply to reiterate what I said in March: “I hope to make the new AOR toolkit into something applicable to a wider range of digital content scenarios and services.”

Self-assessment as digital preservation training aid

I have always liked to encourage people to assess their organisation and its readiness to undertake digital preservation. It’s possible that AIDA and the new AOR Toolkit could continue to have a small part in this process.

Self-assessment in DPTP

We have incorporated exercises in self-assessment as a digital preservation training aid in the DPTP course for many years. We haven’t done it much lately, but we used to get students to map themselves against the OAIS Reference Model. The idea was that they could identify gaps in the Functional Entities, in information package creation, and in who their Producers / Consumers were. We would ask them to draw it up as a flipchart sketch, using dotted lines to express missing elements or gaps.

Another exercise was to ask students to make an informed guess as to where their organisation would sit on the Five Organisational Stages model proposed by Anne Kenney and Nancy McGovern. The most common response was Stage 1 “Acknowledge” or Stage 2 “Act”. We also asked which leg of their three-legged stool (Organisation, Technology, or Resources) was shortest or longest. The most memorable response to the stool exercise was one student’s drawing of an upholstered Queen Anne chair.

Other self-assessment models we have introduced to our classes include:

  • The NDSA Levels of Digital Preservation, which is good because it’s so compact and easy to understand. Admittedly, in the version we were talking about, it only assessed the workings of a repository (not a whole organisational setup) and focussed on technological capability like checksums and storage. This may change if the recent proposal, to add a row for “Access”, goes forward.
  • The DP Capability Maturity Model. In this model we liked the very rich descriptions of what it’s like to be operating at one of the proposed five levels of success.
  • The DRAMBORA toolkit, which emphasises risk assessment of a repository.

We also tried to encourage students to look at using elements of the TRAC and TDR audit regime purely from a self-assessment viewpoint. These tools can be time-consuming and costly if you’re undergoing full audited certification, but there’s nothing to stop an organisation using them for its own gap analysis or self-assessment needs.

As a matter of fact, this line of thinking fed into the SPRUCE toolkit I worked on with Chris Fryer; together we created a useful and pragmatic assessment method. ULCC prepared the cut-down and simplified version of ISO 16363, by retaining only those requirements considered essential for the purposes of this project. The project added value by proposing systems assessment, product analysis, and user stories as part of the process. My 2013 blog post alludes once again to the various assessment toolkits that can be found in the digital preservation landscape.

Review of self-assessment landscape

Are there too many toolkits, and are they really any good? Christoph Becker at the University of Toronto has been wondering that himself, and his team conducted a study on the assessment model landscape, which became a paper published at iPRES. His work in evaluating these assessment frameworks continues:

“Assessment models such as AIDA, DPCMM and others are very particular artifacts, and there are methodologies to design, apply and evaluate such models effectively and rigorously. Substantial knowledge and specific methodology from Information Systems research provides a foundation for the effective design, application and evaluation of frameworks such as AIDA.

“We have just completed an in-depth review of the state of the art of assessment frameworks in Digital Preservation. The article is currently under review; a much more informal initial overview was presented at IPRES (Emily Maemura, Nathan Moles, Christoph Becker. A Survey of Organizational Assessment Frameworks in Digital Preservation. In: International Conference on Digital Preservation (IPRES 2015), November 2015, Chapel Hill.)

“We also recently completed a detailed investigation that leveraged the foundations mentioned above to analyze AIDA and the DPCMM in detail from both theory and practice in two real organizations: The University of Toronto Libraries, and the Austrian State Archives (i.e. we conducted four assessments). We conducted these case studies not to evaluate the organizations, but instead, to evaluate the frameworks.

“We could now design a new assessment model from scratch, and that is our default plan. However, our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared. And then, since assessment is often geared toward improvement, the next question is how to support and demonstrate that improvement over time.”

AIDA’s new name: AOR Toolkit

The hardest part of any project is devising a name for the output. The second hardest thing is devising a name that can also be expressed as a memorable acronym.

I think one of the most successful instances I encountered was the CAMiLEON Project. This acronym unpacks into Creative Archiving at Michigan and Leeds Emulating the Old on the New. It brilliantly manages to include the names of both sponsoring institutions, accurately describe the work of the project, and still end up as a memorable one-word acronym. The word itself resembles “chameleon”, of course – a certain lizard, which the project quite naturally used as its logo. When you consider that the project itself was about emulation – a particular approach to digital preservation that involves “copying” IT environments – that emblem is strikingly apposite to the meaning of the work.

From AIDA to AOR toolkit

I realised that the new AIDA name and acronym could never possibly tick all those boxes. In February we put it out to the social media arena, offering prizes to anyone who could help us devise something suitable. The dilemma was expressed here. Meanwhile I tried making use of various online acronym generation tools, and found myself getting into an even worse mess of linguistic spaghetti.

In the end I decided to abandon acronyms, and instead settled for:

The Assessing Organisational Readiness (AOR) Toolkit

Acceptable abbreviations of this name would include AOR or AORT. AOR is an acronym already – it can mean “Album-Oriented Rock” or “Area Of Responsibility”. The second one is not entirely unsuitable for this toolkit.

Rationale for AOR toolkit:

  1. This is simpler and shorter than Assessing Organisational Readiness for Managing Digital Content or similar
  2. It captures the three most important functions of the toolkit (the “digital” side of it is almost irrelevant, you could say)
  3. It includes “readiness”, which the old AIDA missed, and which is central to the toolkit
  4. It allows users to make other interpretations of what “managing digital content” means to them (e.g. it could mean preservation, but it could also mean providing access), without closing off these meanings

I do wonder though if “cute” project acronyms have had their day now. When I was doing web-archiving for the JISC, almost every project had one around 2006-2007, and we ended up with rather forced constructions such as this one.

From AIDA to CARDIO

AIDA had a part to play in the creation of another assessment toolkit, CARDIO. This project is owned and operated by HATII at the University of Glasgow, and Joy Davidson of the Digital Curation Centre was the architect behind the toolkit.

CARDIO (Collaborative Assessment of Research Data Infrastructure and Objectives) is targeted at Research Data Management (RDM) and the digital outputs associated with research – be they publications or data. The processes for managing these digital assets have been a concern for HE institutions in the UK for some time now. CARDIO measures an institution’s capacity and preparedness for doing RDM.

If you’ve been following our blog posts on this subject, you’ll recognise overlap here with AIDA. But where AIDA was assessing a potentially very wide range of digital asset types, CARDIO was far more focussed and specific. As such, there was a very real need in our project to understand the audience, the environment, and the context of research in higher education. It was targeted at three very specific users in this milieu: the Data Liaison Officer, the Data Originator, and the Service Provider. For more detail, see the CARDIO website.

I worked with Joy in 2011-2012 to contribute an AIDA-like framework to her new assessment tool. The finished product ended up as webforms, designed by developers at HATII, but ULCC supplied the underlying grid and the text of the assessments. The basic structure of three legs and numbered elements survived, but the subjects had to change, and the wording had to change. For instance, new elements we devised specifically for this task included “Sharing of Research Data / Access to Research Data” and “Preservation and Continuity of Research Data”.

The actual reworking was done by ULCC with a team of volunteers, who received small payments from a project underspend. Fortunately these 12 volunteers were all experts in just the right fields – data management, academic research, digital preservation, copyright, and other appropriate subjects.

I could give you a long report of their insightful comments and helpful suggestions, which show how AIDA was reformed and reshaped into CARDIO. Some reviewers rethought the actual target of the assessment statements; others were strong on technical aspects. Some highlighted “jargon alerts”. Through this work, we improved the consistency of the meaning of the five stages across the three legs, and we added many details that are directly relevant to the HE community and to managing research data.

Benefits of CARDIO

Since its launch, CARDIO has frequently been used as a first step by UK institutions embarking on a programme of managing research data. They use CARDIO to assess their institutional capability for RDM.

I’ll end with one very insightful paragraph from a reviewer which shows a detailed grasp of how an organisational assessment like AIDA and CARDIO can work:

“Processes, workflows, and policy grow more well-defined and rigid all the way up to stage 4, which represents a well-honed system suited to the internal needs of the repository. From that point onward, the progression to stage 5 is one of outward growth, with processes and workflows becoming more fluid to meet the needs of possible interoperating partners/collaborators. I generally do not see this “softening” in the 5 stages of CARDIO – rather, the 5th stage often represents things being fixed in place by legislation, a position that can become quite limiting if the repository’s (or stake holders’) needs change in the future.”

The AIDA toolkit: use cases

There have been a few isolated uses of the old AIDA Toolkit. In this blog post I will try to recount some of these AIDA toolkit use cases.

In the beginning…

In its first phase, in 2009, I was aided greatly by five UK HE institutions that volunteered to act as guinea pigs and do test runs, but this was mainly to help me improve the structure and the wording. However, Sarah Jones of HATII was very positive about its potential in 2010.

“AIDA is very useful for seeing where your strengths and weaknesses lie. The results could provide a benchmark too, so if you go on to make some changes you can measure their effects…AIDA sounds particularly useful for your context too as this is about institutional readiness and assessing where strengths and weaknesses lie to determine areas for investment.”

I also used AIDA as part of consultancy for a digital preservation strategy, working with the digital archivist at Diageo in 2012; they said

“We agree that the AIDA assessment would be worthwhile doing as it will give us a good idea of where we are in terms of readiness and the areas we need to focus on to enable the implementation of a digital preservation strategy and system.”

Sarah Makinson of SOAS also undertook an AIDA assessment.

Further down the line…

Between 2011 and 2015, the toolkit was published and made available for download on a Jisc-hosted project website. During that time various uses were made of AIDA by an international audience:

Natalya Kusel used it for benchmarking collection care; she had

“been looking for some free self-assessment tools that I can use for benchmarking the current ‘health’ of collections care. I’m looking for something that will help me identify how the firm currently manages digital assets that have a long retention period so I can identify risks and plan for improvement.”

Anthony Smith used it as a teaching aid for part of UNESCO’s Intergovernmental Oceanographic Data Exchange sponsored teaching programme.

Kelcy Shepherd of Amherst College used it in her workshops.

“Coincidentally, the Five Colleges, a consortium I’m involved in, used the Toolkit a few years ago. Each institution completed the survey to ascertain levels of readiness at the various institutions, and determine areas where it would make sense to collaborate. This helped us identify some concrete steps that we could take together as a consortium.”

Walter D Ray, the Political Papers archivist at Southern Illinois University, used it to assess his library’s readiness:

“I’m glad to see work is being done on the AIDA toolkit. We used it for our self-assessment and found it helpful. As my boss, Director of Special Collections Pam Hackbart-Dean says, “the digital readiness assessment was a useful tool in helping give us direction.” I would add that it helped us define the issues we needed to confront.

“Since then we have developed some policies and procedures, revised our Deed of Gift form, set up a digital forensics workstation, and put a process in place to handle digital projects coming from elsewhere on campus. We greatly appreciate the work you’ve done on the AIDA toolkit.”

However, on the less positive side, Nathan Moles and Christoph Becker of University of Toronto studied AIDA as part of their “in-depth review of the state of the art of assessment frameworks in Digital Preservation.” Their survey of the landscape indicates the following:

“Our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared.”

AIDA in particular was found wanting:

“I think AIDA provides an interesting basis to start, but also currently has some shortcomings that we would need to see addressed to ensure that the resulting insights are well-founded. Most importantly, the fundamental concepts and constructs used in the model are currently unclear and would benefit from being set on a clear conceptual foundation.”

These stories show that AIDA had more of a shelf-life and more application than I originally expected. Our hope is that the new AOR Toolkit will give the ideas a new lease of life and continue to be of practical help to some in performing assessments.

Reworking AIDA: Storage

In the fourth of our series of posts on reworking the AIDA self-assessment toolkit, we look at a technical element – Managed Storage.

Reworking AIDA Storage

In reworking the toolkit, we are now looking at the 11th Technology Element. In the “old” AIDA, this was called “Institutional Repository”, and it pretty much assessed whether the University had an Institutional Repository (IR) system and the degree to which it had been successfully implemented, and was being used.

For the 2009 audience, and given the scope of what AIDA was about, an IR was probably just the right thing to assess. At the time, Institutional Repository software was the new thing and a lot of UK HE & FE institutions were embracing it enthusiastically. Of course, your basic IR doesn’t really do storage by itself; it enables sharing of resources, provides managed access and perhaps some automated metadata creation, and allows remote submission of content. An IR system such as EPrints can be used as an interface to storage – as a matter of fact it has a built-in function called “Storage Manager” – but it isn’t a tool for configuring the servers where content is stored.

Storage in 2016

In 2016, a few things occurred to me while thinking about the storage topic.

  1. I doubt I shall ever understand everything to do with storage of digital content, but since working on the original AIDA my understanding has improved somewhat. I now know that it is at least technically possible to configure IT storage in ways that match the expected usage of the content. Personally, I’m particularly interested in such configuration for long-term preservation purposes.
  2. I’m also aware that it’s possible for a sysadmin – or even a digital archivist – to operate some kind of interface with the storage server, using for instance an application like “storage manager”, that might enable them to choose suitable destinations for digital content.
  3. Backup is not the same as storage.
  4. Checksums are an essential part of validating the integrity of stored digital objects.

I have thus widened the scope of Element TECH 11 so that we can assess more than the limited workings of an IR. I also went back to two other related elements in the TECH leg, and attempted to enrich them.

To address (1), the capability that is being assessed is not just whether your organisation has a server room or network storage, but rather whether you have identified your storage needs correctly and have configured the right kind of storage to keep your digital content (and deliver it to users). We might add that this capability has nothing to do with the quantity, number, or size of your digital materials.

To assess (2), we’ve identified the requirement for an application or mechanism that helps put things into storage, take them out again, and assist with access while they are in storage. We could add that this interface mechanism is not doing the same job as metadata, capability for which is assessed elsewhere.

To address (3), I went back to TECH 03 and changed its name from “Ensuring Availability” to “Ensuring Availability / Backing Up”. The element description was then expanded with more detail about backup actions: we try to describe the optimum backup scenario, based on actual organisational needs, and to provide caveats for cases where multiple copies can cause syncing problems. Work done on the CARDIO toolkit was very useful here.

To incorporate (4), I thought it best to include checksums in element TECH 04, “Integrity of Information”. Checksum creation and validation is now explicitly suggested as one possible method to ensure integrity of digital content.
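
To make the checksum suggestion concrete, here is a minimal sketch in Python of what checksum creation and validation can look like in practice. The element does not prescribe any particular tool or algorithm, so the algorithm choice and file path below are purely illustrative assumptions, not part of the toolkit itself.

```python
import hashlib
from pathlib import Path

def checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Compute a checksum for a file, reading in chunks to cope with large objects."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate(path: Path, recorded: str, algorithm: str = "sha256") -> bool:
    """Re-compute the checksum and compare it with the value recorded at ingest."""
    return checksum(path, algorithm) == recorded

if __name__ == "__main__":
    target = Path("collection/report.pdf")  # hypothetical file in storage
    recorded_value = checksum(target)       # record this at ingest
    print("recorded:", recorded_value)
    print("still intact:", validate(target, recorded_value))  # run during periodic integrity checks
```

The point is simply that a value recorded at ingest can be re-computed later and compared; any mismatch signals that the stored object has changed or become corrupted.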

Managed storage as a whole is thus distributed among several measurable TECH elements in the new toolkit.

In this way I’m hoping to arrive at a measurable capability for managed storage that does not pre-empt the use the organisation wishes to make of such storage. The wording is such that even a digital preservation strategy could be assessed in the new toolkit – as could many other uses. If I can get this right, it would be an improvement on simply assessing the presence of an Institutional Repository.

Reworking AIDA: Legal Compliance

Today we’re looking briefly at legal obligations concerning management of your digital content.

The original AIDA had only one section on this, and it covered Copyright and IPR. These issues were important in 2009 and are still important today, especially in the context of research data management when academics need to be assured that attribution, intellectual property, and copyright are all being protected.

Legal Compliance – widening the scope

For the new toolkit, in keeping with my plan for a wider scope, I wanted to address additional legal concerns. The best solution seemed to be to add a new component to assess them.

What we’re assessing under Legal Compliance:

  1. Awareness of responsibility for legal compliance.
  2. The operation of mechanisms for controlling access to digital content, such as by licenses, redaction, closure, and release (which may be timed).
  3. Processes of review of digital content holdings, for identifying legal and compliance issues.

Legal Compliance – Awareness

The first one is probably the most important of the three. If nobody in the organisation is even aware of their own responsibilities, this can’t be good. My view would be that any effective information manager – archivist, librarian, records manager – is probably handling digital content with potential legal concerns regarding its access, and has a duty of care. But a good organisation will share these responsibilities and embed awareness into every role.

Legal Compliance – Mechanisms & Procedures

Secondly, we’d assess whether the organisation has any means (policies, procedures, forms) for controlling access and closure; and thirdly, whether there’s a review process that can seek out any legal concerns in certain digital collections.

Legislation regimes vary across the world, of course, and this makes it challenging to devise a model that is internationally applicable. The new version of the model name-checks specific pieces of UK legislation, such as the Data Protection Act and the Freedom of Information Act. On the other hand, other countries have their own versions of similar legislation; and copyright laws are widespread, even when they differ in detail and interpretation.

The value of the toolkit, if indeed it proves to have any, is not that we’re measuring an organisation’s specific point-by-point compliance with a certain Statute; rather, we’re assessing the high-level awareness of legal compliance, and what the organisation does to meet it.

Interestingly, the high-level application of legal protection across an organisation is something which can appear somewhat undeveloped in other assessment tools.

The ISO 16363 code of practice refers to copyright implications, intellectual property and other legal restrictions on use only in the context of compiling good Content Information and Preservation Description Information.

The expectation is that “An Archive will honor all applicable legal restrictions. These issues occur when the OAIS acts as a custodian. An OAIS should understand the intellectual property rights concepts, such as copyrights and any other applicable laws prior to accepting copyrighted materials into the OAIS. It can establish guidelines for ingestion of information and rules for dissemination and duplication of the information when necessary. It is beyond the scope of this document to provide details of national and international copyright laws.”

Personally I’ve always been disappointed by the lack of engagement implied here. To be fair though, the Code does cite many strong examples of “Access Rights” metadata, when it describes instances of what exemplary “Preservation Description Information” should look like for Digital Library Collections.

The DPCMM maturity model likewise doesn’t see fit to assess legal compliance as a separate entity, and it is not singled out as one of its 15 elements. However, the concept of “ensuring long‐term access to digital content that has legal, regulatory, business, and cultural memory value” is embedded in the model.

Updating the AIDA toolkit

This week, I have been mostly reworking and reviewing the ULCC AIDA toolkit. We’re planning to relaunch it later this year, with a new name, new scope, and new scorecard.

AIDA toolkit – a short history

The AIDA acronym stands for “Assessing Institutional Digital Assets”. Kevin Ashley and I completed this JISC-funded project in 2009, and the idea was that it could be used by any University – i.e. an Institution – to assess its own capability for managing digital assets.

At the time, AIDA was certainly intended for an HE/FE audience; that’s reflected in the “Institutional” part of the name, and in the type of digital content in scope – content likely to be familiar to anyone working in HE: digital libraries, research publications, digital datasets. As a matter of fact, AIDA was pressed into action as a toolkit directly relevant to the needs of managing research data, as is shown by its reworking in 2011 into the CARDIO Toolkit.

I gather CARDIO, under the auspices of Joy Davidson, HATII and the DCC, has since been quite successful and its take-up among UK Institutions to measure or benchmark their own preparedness for Research Data Management perhaps indicates we were doing something right.

A new AIDA toolkit for 2016

My plan is to open up the AIDA toolkit so that it can be used by more people, apply to more content, and operate on a wider basis. In particular, I want it to apply to:

  • Not just Universities, but any Organisation that has digital content
  • Not just research / library content, but almost anything digital (the term “Digital Assets” always seemed vague to me, whereas “Digital Asset Management” is in fact something very specific and may refer to particular platforms and software)
  • Not just repository managers, but also archivists, records managers, and librarians working with digital content.

I’m also going to be adding a simpler scorecard element; we had one for AIDA before, but it got a little too “clever” with its elaborate weighted scores.
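
By way of illustration only – this is a hypothetical sketch of my own, not the published AIDA scorecard – a “simpler” approach might score each element from 1 to 5 (the five stages) and report a plain, unweighted average per leg. The element labels below are made up for the example.

```python
# Hypothetical unweighted scorecard: each element is scored 1-5 (the five stages)
# and a leg's score is the plain average, with no weighting applied.
from statistics import mean

scores = {
    "Organisation": {"ORG 01": 3, "ORG 02": 2, "ORG 03": 4},
    "Technology":   {"TECH 03": 2, "TECH 04": 3, "TECH 11": 1},
    "Resources":    {"RES 01": 2, "RES 02": 2},
}

for leg, elements in scores.items():
    print(f"{leg}: {mean(elements.values()):.1f} out of 5")
```

Dropping the weights keeps the arithmetic transparent, which is really the point: the score should prompt discussion about readiness, not hide behind a clever formula.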

Readers may legitimately wonder whether the community really “needs” another self-assessment tool; we teach several of the known models on our Digital Preservation Training Programme, including the use of the TRAC framework for self-assessment purposes, and since doing AIDA the excellent DPCMM has become available – indeed, the latter has influenced my thinking. The new AIDA toolkit will continue to be a free download, though, and we’re aiming to retain its overall simplicity, which we believe is one of its strengths.

A new acronym

As part of this plan, I’m keen to bring out and highlight the “Capability” and “Management” parts of the AIDA toolkit, factors which have been slightly obscured by its current name and acronym. With this in mind, I need a new name and a new acronym. The elements that must be included in the title are:

  • Assessing or Benchmarking
  • Organisational
  • Capacity or Readiness [for]
  • Management [of]
  • Digital Content

I’ve already tried feeding these combinations through various online acronym generators, and come up empty. Hence we would like to invite the wider digital preservation community to use the power of crowd-sourcing and help us collect suggestions and ideas. Simply comment below or tweet us at @dart_ulcc using the #AIDAthatsnotmyname hashtag. Naturally, the winner(s) of this little crowd-sourcing contest will receive written credit in the final relaunched AIDA toolkit.