Things to consider before undertaking a digitisation project

Counter-intuitive as it may seem, this blog post will try to advance the idea that embarking on a project to digitise your paper collections isn’t always a great idea. This isn’t to say you should abandon the idea completely, but we would encourage you to think it through. You could read this post as a sort of cautionary tale.

The Harvard report Selecting Research Collections for Digitization proposes a number of very sound reasons why a Higher and Further Education (HFE) institution should pause before it commits resources to any large and complex digitisation project. It provides the reader with a series of questions that will help a good project planner steer a way through the decision process.

Among the reasons identified by these experts, I will single out two of my favourite themes:

Is anyone even interested?

Look at the material you’re intending to digitise. Does it have any value? Do you think readers, users, researchers and customers are going to be interested in it? Even if they are interested, why does it improve the situation for them to access it in digital form? Will usage of the material increase? If you increase access to thousands more people around the world who look at the material through your online catalogue, is that a genuine improvement? Why?

The answers to these questions may seem obvious to you, but this line of thinking can also expose some of our assumptions and pre-conceived ideas about our relationship with our audience, and the real value of serving content digitally.

We might assume a collection is going to be popular when it isn’t. We might assume that simply scanning a book and putting images of the pages online is all we need to do. Have we even asked the readers what they would like?

Can you go on supporting it?

This is about the very real problem of ongoing costs. We may assume that once all the scans are produced, the project budget can be closed. In fact, it continues to cost you money to store, support, manage and steward your digitised collections; and that’s leaving aside the cost of long-term preservation, should you realise there’s permanent value in the digital material you have created. In short, it may cost more than you think.

My former colleague Patricia Sleeman carried out a survey in 2009 of a number of HFE institutions that had received JISC funding for digitisation projects over the previous decade. She found:

“Four principal themes surfaced through analysis of the preservation plans of the digitisation projects that relate the maturity of institution to the likely success of their digitisation efforts. These are the need for preservation policies; collection management procedures; robust preservation infrastructures; and sustainability. In short, institutions or consortia which have clarity in these four areas considerably reduce the risks associated with long term access to digitized collections.”

Both of these reports may have been aimed primarily at HFE audiences in a research context, but I think the lessons apply to any organisation that intends to digitise content, including those in the commercial sector.

You’re considering spending a lot of money on digitising this collection, and potentially committing the resources of people, technology, and time. Proceeding on overly optimistic assumptions can lead to difficulties in the future.

However, don’t let this discourage you…

When you’ve decided to say “yes”

The benefits of digitisation have probably occurred to you already (it saves wear and tear on originals, disseminates more content to a wider audience, benefits the organisation, and may help with income generation…). I also like to encourage project managers to rethink, if possible, the collection’s potential for engaging with its intended audience. Are we happy to continue the traditional model of the searcher visiting the searchroom and looking at a box of photographs with captions, only doing it in a “digital” manner? Wouldn’t we like to use web tools like page-turners and zoom devices to enhance and improve on the experience in some way?

The great thing is that if you’ve done scanning according to best practices, you can repurpose your resources (as Access Copies) in a myriad of ways, making the most of access technologies. You’re now opening the doors for a potential dialogue with your user community, responding to changes in user needs and repurposing the way you serve your content. All your hard work will have paid off.

Priorities for business scanning

A business may decide to scan all their current paperwork, but this is not quite the same as a managed digitisation project.

Quite often a project like this is undertaken for a number of reasons: to save money, to improve efficiency, and to save space occupied by paper. The dream of the “paperless office” has been haunting us for about 30 years now. It still hasn’t come true, at least not in the way it was promised. I can personally recall a time when scanning bureaux appeared in the UK almost overnight, offering to convert the contents of 25 filing cabinets into digital scans and put them all onto a single CD-ROM.

The prospect of doing this often appealed to senior executives, especially as the next logical step in their minds would be to get all that paper destroyed (a suggestion that usually causes an information manager to shudder).

How it differs from a traditional digitisation project

Which brings us to the next aspect that interests me. How long are we intending to keep these scans? A digitisation project for a library or archive collection will most likely result in digital content which we wish to preserve and keep permanently, because it’s both a valuable digital asset and a digital surrogate of an important part of our collections. However, when we take on “scanning for business”, as I call it, it’s possible the scans might have a relatively short shelf-life.

This is where it starts to shade into a records management concern. In fact my ideal would be to see a scanning project owned by the records manager, with one eye on user satisfaction, another on protecting business and legal needs, and a third eye on the possible long-term retention needs.

Taking all this into account, ideally we’d try to frame this project with a different emphasis from the concerns we have when digitising for preservation purposes. Our list of priorities when scanning for business might look a bit like this:

Metadata

People need to find stuff again, and any automated retrieval system will only work if there’s sufficient metadata for the objects stored in it. We’d want to think about using pre-determined metadata schemas, depending on the nature of the content, along with tags, folders, and naming rules that will help users retrieve content. My point here is that metadata decisions will tend to be driven by immediate user needs, rather than archival or library cataloguing standards.
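As a sketch of what such a naming rule might look like in practice, here is a minimal automated check against a hypothetical convention (department code, date, subject); the pattern itself is an invented example, not a standard:

```python
import re

# Hypothetical convention: DEPT-YYYYMMDD-subject.ext, e.g. FIN-20230401-invoices.pdf
NAME_RULE = re.compile(r"^[A-Z]{2,4}-\d{8}-[a-z0-9_]+\.(pdf|tiff?)$")

def is_valid_name(filename: str) -> bool:
    """Return True if a scanned file's name follows the agreed convention."""
    return bool(NAME_RULE.match(filename))

print(is_valid_name("FIN-20230401-invoices.pdf"))  # True
print(is_valid_name("scan001.pdf"))                # False
```

A check like this can run at the point of ingest, so that badly named files are caught while the person who scanned them can still say what they are.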

Image quality

For a long-term preservation project, our first thought would be of high-resolution image files encoded in robust, open formats. For business scanning, we can probably compromise on quality. If we can get away with lower-resolution images in compressed files, it’s worth considering. It may also depend on whether the staff want OCR as well as images, which is yet another consideration.
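To make the trade-off concrete, here is a rough back-of-envelope calculation of uncompressed scan sizes; the page dimensions and resolutions are illustrative, not recommendations:

```python
def uncompressed_mb(width_in, height_in, dpi, bytes_per_pixel=3):
    """Approximate uncompressed size in MB for a 24-bit RGB scan of one page."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bytes_per_pixel / (1024 ** 2)

# An A4 page is roughly 8.27 x 11.69 inches
master = uncompressed_mb(8.27, 11.69, 600)  # preservation-quality master
access = uncompressed_mb(8.27, 11.69, 150)  # lower-resolution business copy

print(round(master), round(access))  # roughly 100 MB vs 6 MB, before compression
```

Halving the resolution quarters the file size, so a business-grade access copy can be an order of magnitude cheaper to store than a preservation master, before compression is even considered.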

Retention and disposal

Our plan for scanning must align with records management plans. The content is still maintained for as long as there’s a business need, just as it was in paper form. Likewise, we’d hope staff will co-operate with our recommended best practices for file naming and description, to assist with those retention decisions.
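One advantage of scanned records is that a retention schedule can be applied mechanically. A minimal sketch, with entirely illustrative retention periods (a real schedule would come from the records manager):

```python
from datetime import date

# Illustrative retention periods in years, by record type - not a real schedule
RETENTION_YEARS = {"invoice": 7, "correspondence": 3, "contract": 12}

def disposal_due(record_type: str, created: date, today: date) -> bool:
    """True once a record has passed its retention period and is due for review."""
    years = RETENTION_YEARS[record_type]
    return today >= created.replace(year=created.year + years)

print(disposal_due("invoice", date(2015, 6, 1), date(2023, 1, 1)))   # True
print(disposal_due("contract", date(2015, 6, 1), date(2023, 1, 1)))  # False
```

Note that “due for disposal” here means “due for review”: the decision to destroy should still pass through a human sign-off, exactly as it would for paper.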

Authenticity

We’re all concerned with creating “authentic” digital objects, but the business need in this scenario might be slightly different from how an archivist or a researcher regards an authentic digital object. In the archives scenario, the archivist wants to be sure the preserved object is a genuine representation of the original, and so do their users. In the business scenario, we not only need to be assured of that, but we also want hard evidence that this is the case, for when the auditors start asking questions. We’re thus facing at least two tough tasks – ensuring the scans are authentic when they’re created, and then maintaining that authenticity through daily use of the scans. We’d certainly want some form of evidence chain and audit trail for that.
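The fixity side of that evidence chain can be sketched very simply: record a checksum when the scan is created, log it in an append-only trail, and re-verify it on each use. The field names below are my own invention, not a standard:

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_of(data: bytes) -> str:
    """Fixity value, recorded when the scan is created and re-checked on use."""
    return hashlib.sha256(data).hexdigest()

def audit_entry(filename: str, data: bytes, event: str) -> str:
    """One line of a simple append-only audit log (field names are illustrative)."""
    return json.dumps({
        "file": filename,
        "event": event,
        "sha256": sha256_of(data),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

scan = b"...image bytes..."
logged = json.loads(audit_entry("FIN-001.tiff", scan, "ingest"))

# Later: re-compute the checksum and compare with the logged value
assert logged["sha256"] == sha256_of(scan)
```

Any mismatch between the stored file and its logged checksum is exactly the kind of hard evidence question an auditor would raise, so the log itself needs protecting too.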

From here, we’ve got the bare bones of a successful business scanning project. We might soon be in a better position to safely destroy paper originals, if indeed that was one of the drivers or project goals. That destruction needs to be carried out with due care and attention. You’d certainly want all the digital content signed off for authenticity, to support the admissibility of the digital objects as legal documents.

If, however, you succeed in securely shredding a large number of boxes of paper, you’ve now freed up storage and shelf space. That is something with a cash value. If you keep metrics of progress in this area, you’re ready to start proving the value of your project to the organisation.

Projects like this aren’t necessarily easier to carry out than a library/archive focussed digitisation project, and they still require much planning and engagement with stakeholders. As I’ve tried to show above, the priorities have a slightly different emphasis. However, the results can be something of genuine benefit to your organisation, and will prove the value of the Information Manager/Records Manager/Archivist roles and services.

Five benefits of a digitisation programme

We see digitisation as a form of project management, and any managed project needs to account for at least three core things – costs, risks, and benefits. It’s important to think about the benefits that a digitisation programme will bring – not just to you as a collection manager, but to your users and to your organisation. Sometimes these benefits can be overlooked, or not considered and assessed in detail. In this post we’ll pick out some of the possible benefits digitisation can bring.

Saves originals

Archivists and librarians will recognise the scenario – there’s a precious, irreplaceable resource, or one that is fragile (the paper may be crumbling), or it’s the only available copy in the country. What’s more, it’s in constant demand, so it’s subjected to frequent handling every time it’s retrieved from the stacks by the staff, then further handling in the searchroom. These precious documents and books don’t like being out in the light too often. Digitisation greatly reduces all the above risks and provides what, in the old analogue world, would have been called a “surrogate” copy.

Main beneficiaries: archivists, librarians

Meets user needs

This may seem obvious, but it’s still surprising how many digitisation projects start and end with the collection manager’s decision, and don’t take the audience of users into account. There ought to be a formal process of assessing user needs at the start of a project, and the application of metrics to determine whether user needs have actually been met. This doesn’t always happen; digitisation decisions can be driven instead by internal staff meetings, advisory boards, or the recommendations of external consultants.

It might be more beneficial to consider user-centric methods and approaches like focus groups, customer surveys, online questionnaires, and statistics on searchroom use. A successful digitisation project aimed directly at satisfying a real user need can reap visible dividends for the organisation, in terms of visits, web page hits, raised profile, user satisfaction, and user engagement.

Main beneficiaries: users, the institution

Improves or enhances access

This is surely one of the main benefits of digitising any resource. If planned and executed correctly, it can result in a string of related benefits for you and your organisation: increased access through the web, reaching more users, and increasing not just the size but the diversity of your audience. But it’s not enough to just throw an existing image collection on the web in a gallery browser and let the power of the internet do the rest.

Collection managers should take the opportunity to rethink the potential of the resources, listen to user needs, and use technology to provide more imaginative ways to recast and enhance access to the content. There are possibilities for discovery metadata as well as cataloguing metadata, for navigational links that allow many entry points to a collection instead of a traditional hierarchical catalogue, and plug-in tools that can deliver popular and attractive ways to serve the content to users.

One of the most prominent of these is the page-turner with zoom tools, so common with online books. These are not merely gimmicks to be used for their own sake; they can offer your users more direct engagement with your collections. And we haven’t even mentioned crowd-sourcing yet…

Main beneficiaries: users, researchers

Saves space

This scenario is a bit of an outlier: it’s primarily a records management/organisational change story (although other information management professionals may consider it too). The common motivator here is that the office is running out of space and it would be convenient to scan all the current papers into digital form and start “working digitally”. Managers who have this bright idea can immediately see a cost saving in terms of storage space, with visions of now-empty filing cabinets being removed from costly office space.

True, space saving can be a massive benefit – but people still have to find the materials. A project like this has to be managed very carefully and with a lot of preparation, especially giving due attention to metadata, which doesn’t automatically appear from nowhere when you take folders out of an organised filing system. And scanning is not cost-neutral either. Even so, if you can do this right, you’ll be contributing a genuine improvement to current working practice, and you will save money and space.

Main beneficiaries: staff, organisation, managers

A step towards digital preservation

The gain here is that the digitisation process can seriously lengthen the life of your valuable resources. Through digitisation, you could begin the process of long-term digital preservation. The scenario would be that you continue to keep the original analogue materials, but also keep the digitised version you have created; after all, it has cost you a lot of money to create it (staff time, server space), and its ongoing value to the organisation is already being demonstrated.

Treat the digitised resource with as much care and respect as you would your archival originals, and you’re on the road to digital preservation. As part of the project planning you would want to factor in the long-term preservation goal, before you even lift the lid of the scanner.

Main beneficiaries: archivist, institution

These are just five of the many benefits that a well-managed digitisation project can bring. Other topics would include income generation, pro-active user engagement, and attracting new customers to your offering. Understanding benefits (along with the costs and risks) is a positive way of understanding the digitisation task and delivering the project successfully.

Six reasons higher education should think about preserving research data

What does it mean? 
Digital preservation is a way of planning a strategy for the long-term continuity and survival of important digital objects such as documents, images, other digital files, and research data outputs. Although it’s often perceived as an exclusively technical matter, it in fact goes well beyond an IT project.

It is more like a very specialist form of project management. Long-term preservation is extremely relevant for those working in higher education, particularly research data managers.

What sort of things would you want to preserve?
Broadly, there are two main objects in this context:

  1. Published outputs from the project, such as reports and findings, probably peer-reviewed and maybe published in an academic journal
  2. The raw data itself

Why should you care about it?
There are a number of compelling reasons why preserving research data is relevant to HE institutions just now. Here are just six:

1. Funders require it
Individual funders, the EPSRC, and the UK Research Councils have now made digital preservation an explicit requirement. The expectation is that datasets will be preserved for at least 10 years after the project’s completion. This expectation is now becoming a condition of funding. To put it at its starkest, one might even say: “no preservation, no funding”.

2. Openness
There is a national and international drive towards openness and sharing of data. While this can be achieved with a well-managed institutional repository, the task is made easier through a digital preservation plan that includes a well-defined and persistent metadata set and adherence to descriptive standards.

3. Your institution cares about its research
The university will have an interest in showcasing certain high-profile projects, or holding them up as examples of best practice. Preservation of these high-quality datasets will be useful in securing funding for new research projects.

4. Data owners care about their research 
The academic who created the dataset will have a strong interest in preserving their own work; they will be aware of the value it has for others in their discipline. Keeping the dataset in a state of preservation will support their current and future research work. It is extremely expensive, and well-nigh impossible, to recreate that dataset from scratch.

5. Users of the data have an interest
Through the drive towards open datasets, increasingly we find that a dataset can be reused, repurposed, and cited in new research. This is another strong driver for preservation.

In all, datasets have a demonstrable meaning and value beyond the life of the project; and they will have a long-term value as the life of that dataset extends beyond the local concerns of the Institution and it passes into the collaborative research community.

6. Ever-changing types of data
While some raw data can be static digital objects in an easily preservable format (e.g. word-processed files and spreadsheets), a lot of it is dynamic digital content which is harder to capture and harder to preserve.

It can commonly take the form of databases, but increasingly it also takes the form of blogs, websites, social media outputs, and other dynamic data. Some data creates very large files, a known problem for storage, repositories, and preservation alike.


Given the nature of these datasets – which can be complex, collaborative, large, and frequently updated – the time to start thinking about their preservation is at the start of the project, not at the end of the project’s life. This is just one of the problems which preservation can help to address.


Selection and Appraisal in the OAIS Model

Recently I attended the ARA Conference. On 31 August 2016 we heard three very useful presentations in the digital preservation strand from Matthew Addis of Arkivum, Sarah Higgins and Sally McInnes from Wales, and Mike Quinn from Preservica. Prompted by a question from a fellow archivist in the audience, I asked about the skills of selection and appraisal: can the OAIS Model accommodate them? My worry is that it cannot, and that the Model tends to present an over-simplified view in which the Submission Information Package (SIP) arrives in a “perfect state”, all ready to preserve, so that the process of transforming it into an Archival Information Package (AIP) can begin. Any archivist or records manager who’s ever handled a deposit or transfer of records will tell you that real life isn’t like that. As a result, the OAIS Model alienates the archivist.

I’m aware of those in our community who have advocated a stronger pre-ingest stage in OAIS. Some call it the “long tail” before Ingest. I believe there is a body of work underway to formalise the process as part of the standard: the Producer-Archive Interface Specification. And I’m aware of those contributions to the DPC OAIS wiki where suggestions are made for how to instigate it, and even automate it to some degree.

But that’s not quite what’s worrying me. Let’s get back to the basics of what we mean by Selection and Appraisal. I think these are very strong archivist skills, which could have tremendous value in the field of digital preservation.

The Record / Archive Series

When I worked as an archivist at the General Synod with paper records and paper archives, we would often appraise and select on a Series basis. What that means to me is that we could assess the value of the content in a contextual framework, based on other records which we knew were being created, or other archival series which we had already selected and kept in the archive. The collections strategy would be based on this approach, looking for a Series in the context of provenance. For instance, the originating body might be the Board for Social Responsibility (BSR); the record series could be “Minute Books”. We would always know to accept deposits of BSR Minutes, because we could trust these as being accurate records of the Board’s work. Likewise, if the BSR collected copies of another Board’s Minutes and Documents (e.g. The Central Board of Finance), we could apply a rule that excluded that series from accessioning, on the grounds that BSR were only receiving “copies for information”.

This process I’m describing is second nature to any archives or records management professional. An understanding of context, provenance, record series: all of these things help us identify the potential value of content. Indeed, a Series model is the foundation for all Archival arrangement, and is the cornerstone of our profession. It’s extremely efficient; it saves you from having to examine every single document.

Appraisal in OAIS

I wonder to myself how Series are expressed in the OAIS Model. I often think the Model is predicated on the individual digital object, rather than the record series. To put it another way, a Submission Information Package is not an ideal unit on which to carry out an appraisal. At which point you could tell me “here’s 100 related SIPs, there’s your record series”. Or “we’re putting all the PDFs of our Minutes into this single SIP”. But I would still worry. Through the basic action of ingesting a SIP, we’re starting a process where all subsequent preservation actions centre on the individual digital object – checksums, file format identification, file format characterisation, technical metadata extraction, and preservation metadata. And of course, the temptation is strong to automate these AIP-building actions, which has led us into building scripts that are entirely focused on a single characteristic – most commonly, the file format.
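Those object-level actions are easy to automate precisely because they only need to look at one file at a time. A minimal sketch of signature-based format identification makes the point; real tools such as DROID work from the much larger PRONOM registry, and the signatures below are just a few well-known examples:

```python
# A few well-known file signatures ("magic numbers")
SIGNATURES = {
    b"%PDF": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"II*\x00": "TIFF image (little-endian)",
    b"MM\x00*": "TIFF image (big-endian)",
}

def identify(first_bytes: bytes) -> str:
    """Identify a file format from its opening bytes, one object at a time."""
    for magic, name in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return name
    return "unknown"

print(identify(b"%PDF-1.7 ..."))  # PDF document
print(identify(b"II*\x00..."))    # TIFF image (little-endian)
```

Notice that nothing in this workflow knows, or cares, which record series the file belongs to – which is exactly the atomisation I am worrying about.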

Where’s the record / archival series in all this? It’s difficult to make it out. Maybe it gets reinstated or reconstructed at the point of cataloguing. Even so, it’s not hard to see why archivists can feel alienated by this view of what constitutes digital preservation. The integrity and contextual meaning of a collection is being overlooked, in favour of this atomised digital-object view. OAIS, if strictly interpreted, could bypass the Series altogether in favour of an assembly line workflow that simply processes one digital object after another.

I believe we need to rediscover the value of Appraisal and Selection; I call on all archivists to come forward and re-assert its importance in the digital realm.

In the meantime, some questions: Can anyone show me a way that Appraisal and Selection can truly be incorporated in an OAIS Model workflow? Is there room for considering a new “Series Information Package”, or something similar? Am I over-stressing the atomisation of OAIS?

Preserving Digital Content – Taking first steps with the AOR toolkit

I have long had an interest in promoting digital preservation, most recently with the relaunch of the AIDA toolkit as the AOR toolkit. However, in my work I meet a lot of people in a lot of organisations for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

AOR Toolkit – Possible uses

To speculate on this for a while, we could instead consider (a) what kind of digital content people typically have and (b) what they want to do with it. It’s possible that the new AOR Toolkit can help as a first step in assessing your capability to perform (b).

One milieu that we’re most familiar with is higher education, where the (a) is born-digital research data and the (b) is research data management. This may in time shade into a long-term preservation need, but not exclusively. That said, the main driver for many to manage research data in an organised way is actually the requirement of the funders for the research data to be preserved.

Not unrelated to that example, repository managers may not be primarily interested in preservation, but they certainly have a need to manage (a) born-digital publications and research papers and (b) their metadata, storage, dissemination, and use, perhaps using a repository. On the other hand, as the content held in a publications repository increases over time, repository managers may need to take more interest in selection and preservation.

Another possibility is electronic records management. The (a) is born-digital records, and the (b) could include such activities as classification, retention scheduling, meeting legislative requirements, metadata management, storage (in the short to mid-term), and security. In such scenarios, not all digital content need be kept permanently, and the outcome is not always long-term digital preservation for all content types.

AOR toolkit – Beyond the organisation

Digital librarians, managers of image libraries, in short anyone who holds digital content is probably eligible for inclusion in my admittedly very loose definition of a potential AOR Toolkit user. I would like to think the toolkit could apply, not just to organisations, but also to individual projects both large and small. All the user has to do is to set the parameters. It might even be a way into understanding your own personal capability for “personal archiving”, i.e. ensuring the longevity of your own personal digital history, identity and collections in the form of digital images, documents, and social media presence. Use the AOR toolkit to assess your own PC and hard drive, in other words.

It remains to be seen if the AOR Toolkit can match any of my wide-eyed optimistic predictions, but at least for this new iteration we have attempted to expand the scope of the toolkit, and expanded the definitions of the elements, in order to bring it a step closer towards a more comprehensive, if not actually universally applicable, assessment tool.

AOR toolkit – addressing a community need?

Results from our recent training needs survey also indicate a general need for assessment in the context of digital preservation. Among the suggestions for subjects that are not currently being taught enough, some respondents explicitly identified the following requirements, which indicate how assessment would help advance their case:

  • Self-assessment and audit
  • Assessment/criteria/decision in the context of RDM
  • Quality analysis as part of preservation planning and action
  • Benchmarking in digital preservation (i.e. what to do when unable to comply with OAIS)
  • Key performance indicators for digital preservation
  • What to check over time

In the same survey, when asked about the “expected benefits of training”, an even more interesting set of responses was forthcoming. There were 32 answers that I classified under strategy and planning, many of them indicating the need for assessment and analysis as a first step; and likewise, 21 answers alluding to the ability to implement a preservation system, with many references to “next steps” and understanding organisational “capacity”. One response in particular is worth quoting in full:

“We have recognised and assessed the problem, decided on a strategy and are nearing the purchase of a system to cope with what we currently have, but once this is done we will need to create two projects – one to address ongoing work and one to resolve legacy work created by our stop-gap solution. I’d expect training to answer both these needs.”

All of the above is simply to reiterate what I said in March: “I hope to make the new AOR toolkit into something applicable to a wider range of digital content scenarios and services.”

Self-assessment as digital preservation training aid

I have always liked to encourage people to assess their organisation and its readiness to undertake digital preservation. It’s possible that AIDA and the new AOR Toolkit could continue to have a small part in this process.

Self-assessment in DPTP

We have incorporated exercises in self-assessment as a digital preservation training aid in the DPTP course for many years. We haven’t done it much lately, but we used to get students to map themselves against the OAIS Reference Model. The idea was that they could identify gaps in the Functional Entities and information package creation, and work out who their Producers and Consumers were. We would ask them to draw it up as a flipchart sketch, using dotted lines to express missing elements or gaps.

Another exercise was to ask students to make an informed guess as to where their organisation would sit on the Five Organisational Stages model proposed by Anne Kenney and Nancy McGovern. The most common response we usually had was Stage 1 “Acknowledge” or Stage 2 “Act”. We also asked which leg of their three-legged stool (Organisation, Technology, or Resources) was shortest or longest. The most memorable response we ever had to the stool exercise produced a drawing by one student of an upholstered Queen Anne chair.

Other self-assessment models we have introduced to our classes include:

  • The NDSA Levels of Digital Preservation, which is good because it’s so compact and easy to understand. Admittedly, in the version we were talking about, it only assessed the workings of a repository (not a whole organisational setup) and focussed on technological capabilities like checksums and storage. This may change if the recent proposal to add a row for “Access” goes forward.
  • The DP Capability Maturity Model. In this model we liked the very rich descriptions of what it’s like to be operating at one of the proposed five levels of success.
  • The DRAMBORA toolkit, which emphasises risk assessment of a repository.

We also tried to encourage students to look at using elements of the TRAC and TDR audit regime purely from a self-assessment viewpoint. These tools can be time-consuming and costly if you’re undergoing full audited certification, but there’s nothing to stop an organisation using them for their own gap analysis or self-assessment needs.

As a matter of fact, this line of thinking fed into the SPRUCE toolkit I worked on with Chris Fryer; together we created a useful and pragmatic assessment method. ULCC prepared a cut-down and simplified version of ISO 16363, retaining only those requirements considered essential for the purposes of the project. The project added value by proposing systems assessment, product analysis, and user stories as part of the process. My 2013 blog post alludes once again to the various assessment toolkits that can be found in the digital preservation landscape.

Review of self-assessment landscape

Are there too many toolkits, and are they really any good? Christoph Becker at the University of Toronto has been wondering the same thing, and his team conducted a study of the assessment-model landscape, which became a paper published at iPRES. His work evaluating these assessment frameworks continues:

“Assessment models such as AIDA, DPCMM and others are very particular artifacts, and there are methodologies to design, apply and evaluate such models effectively and rigorously. Substantial knowledge and specific methodology from Information Systems research provides a foundation for the effective design, application and evaluation of frameworks such as AIDA.

“We have just completed an in-depth review of the state of the art of assessment frameworks in Digital Preservation. The article is currently under review; a much more informal initial overview was presented at IPRES (Emily Maemura, Nathan Moles, Christoph Becker. A Survey of Organizational Assessment Frameworks in Digital Preservation. In: International Conference on Digital Preservation (IPRES 2015), November 2015, Chapel Hill.)

“We also recently completed a detailed investigation that leveraged the foundations mentioned above to analyze AIDA and the DPCMM in detail from both theory and practice in two real organizations: The University of Toronto Libraries, and the Austrian State Archives (i.e. we conducted four assessments). We conducted these case studies not to evaluate the organizations, but instead, to evaluate the frameworks.

“We could now design a new assessment model from scratch, and that is our default plan. However, our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared. And then, since assessment is often geared toward improvement, the next question is how to support and demonstrate that improvement over time.”

AIDA’s new name: AOR Toolkit

The hardest part of any project is devising a name for the output. The second hardest thing is devising a name that can also be expressed as a memorable acronym.

I think one of the most successful instances I encountered was the CAMiLEON Project. The acronym unpacks into Creative Archiving at Michigan and Leeds Emulating the Old on the New. It brilliantly manages to include the names of both sponsoring Institutions, accurately describe the work of the project, and still end up as a memorable one-word acronym. The word itself resembles “chameleon”, of course, and the project quite naturally used that lizard as its logo. When you consider that the project was about Emulation – a particular approach to digital preservation that involves “copying” IT environments – the emblem is strikingly apposite to the meaning of the work.

From AIDA to AOR toolkit

I realised that the new AIDA name and acronym could never possibly tick all those boxes. In February we put it out to the social media arena, offering prizes to anyone who could help us devise something suitable. The dilemma was expressed here. Meanwhile I tried making use of various online acronym generation tools, and found myself getting into an even worse mess of linguistic spaghetti.

In the end I decided to abandon acronyms, and instead settled for:

The Assessing Organisational Readiness (AOR) Toolkit

Acceptable abbreviations of this name would include AOR or AORT. AOR is already an acronym – it can mean “Album-Oriented Rock” or “Area Of Responsibility”. The latter is not entirely unsuitable for this toolkit.

Rationale for AOR toolkit:

  1. This is simpler and shorter than Assessing Organisational Readiness for Managing Digital Content or similar
  2. It captures the three most important functions of the toolkit (the “digital” side of it is almost irrelevant, you could say)
  3. It includes “readiness”, which the old AIDA missed, and which is central to the toolkit
  4. It allows users to make other interpretations of what “managing digital content” means to them (e.g. it could mean preservation, but it could also mean providing access), without closing off these meanings

I do wonder though if “cute” project acronyms have had their day now. When I was doing web-archiving for the JISC, almost every project had one around 2006-2007, and we ended up with rather forced constructions such as this one.

From AIDA to CARDIO

AIDA had a part to play in the creation of another assessment toolkit, CARDIO. This project is owned and operated by HATII at the University of Glasgow, and Joy Davidson of the Digital Curation Centre was the architect behind the toolkit.

CARDIO (Collaborative Assessment of Research Data Infrastructure and Objectives) is targeted at Research Data Management (RDM) and the digital outputs associated with research – be they publications or data. The processes for managing these digital assets have been a concern for HE Institutions in the UK for some time now. CARDIO measures an Institution’s capacity and preparedness for doing RDM.

If you’ve been following our blog posts on this subject, you’ll recognise overlap here with AIDA. But where AIDA was assessing a potentially very wide range of digital asset types, CARDIO was far more focussed and specific. As such, there was a very real need in our project to understand the audience, the environment, and the context of research in higher education. It was targeted at three very specific users in this milieu: the Data Liaison Officer, the Data Originator, and the Service Provider. For more detail, see the CARDIO website.

I worked with Joy in 2011-2012 to contribute an AIDA-like framework to her new assessment tool. The finished product ended up as webforms, designed by developers at HATII, but ULCC supplied the underlying grid and the text of the assessments. The basic structure of three legs and numbered elements survived, but the subjects and the wording had to change. For instance, new elements we devised specifically for this task included “Sharing of Research Data / Access to Research Data” and “Preservation and Continuity of Research Data”.

The actual reworking was done by ULCC with a team of volunteers, who received small payments from a project underspend. Fortunately these 12 volunteers were all experts in just the right fields – data management, academic research, digital preservation, copyright, and other appropriate subjects.

I could give you a long report of their insightful comments and helpful suggestions, which show how AIDA was reformed and reshaped into CARDIO. Some reviewers rethought the actual target of the assessment statements; others were strong on technical aspects. Some highlighted “jargon alerts”. Through this work, we improved the consistency of the meaning of the five stages across the three legs, and we added many details that are directly relevant to the HE community and to managing research data.

Benefits of CARDIO

Since its launch, CARDIO has frequently been used as a first step by UK Institutions embarking on a programme of managing research data; they use it to assess their institutional capability for RDM.

I’ll end with one very insightful paragraph from a reviewer which shows a detailed grasp of how an organisational assessment like AIDA and CARDIO can work:

“Processes, workflows, and policy grow more well-defined and rigid all the way up to stage 4, which represents a well-honed system suited to the internal needs of the repository. From that point onward, the progression to stage 5 is one of outward growth, with processes and workflows becoming more fluid to meet the needs of possible interoperating partners/collaborators. I generally do not see this “softening” in the 5 stages of CARDIO – rather, the 5th stage often represents things being fixed in place by legislation, a position that can become quite limiting if the repository’s (or stake holders’) needs change in the future.”