Reworking AIDA: Storage

In the fourth of our series of posts on reworking the AIDA self-assessment toolkit, we look at a technical element – Managed Storage.

Reworking AIDA Storage

In reworking the toolkit, we are now looking at the 11th Technology Element. In the “old” AIDA, this was called “Institutional Repository”, and it pretty much assessed whether the University had an Institutional Repository (IR) system and the degree to which it had been successfully implemented and was being used.

For the 2009 audience, and given the scope of what AIDA was about, an IR was probably just the right thing to assess. In 2009, Institutional Repository software was the new thing and a lot of UK HE & FE institutions were embracing it enthusiastically. Of course, a basic IR doesn’t really do storage by itself: it certainly enables sharing of resources, manages access, performs some automated metadata creation, and allows remote submission of content. An IR system such as EPrints can be used as an interface to storage – as a matter of fact it has a built-in function called “Storage Manager” – but it isn’t a tool for configuring the servers where content is stored.

Storage in 2016

In 2016, a few things occurred to me while thinking about the storage topic.

  1. I doubt I shall ever understand everything to do with storage of digital content, but since working on the original AIDA my understanding has improved somewhat. I now know that it is at least technically possible to configure IT storage in ways that match the expected usage of the content. Personally, I’m particularly interested in such configuration for long-term preservation purposes.
  2. I’m also aware that it’s possible for a sysadmin – or even a digital archivist – to operate some kind of interface with the storage server, using, for instance, an application like “Storage Manager”, which might enable them to choose suitable destinations for digital content.
  3. Backup is not the same as storage.
  4. Checksums are an essential part of validating the integrity of stored digital objects.

I have thus widened the scope of Element TECH 11 so that we can assess more than the limited workings of an IR. I also went back to two other related elements in the TECH leg, and attempted to enrich them.

To address (1), the capability being assessed is not just whether your organisation has a server room or network storage, but rather whether you have identified your storage needs correctly and have configured the right kind of storage to keep your digital content (and deliver it to users). We might add that this capability has nothing to do with the quantity, number, or size of your digital materials.

To address (2), we’ve identified the requirement for an application or mechanism that helps put things into storage, take them out again, and assist with access while they are in storage. We could add that this interface mechanism is not doing the same job as metadata, the capability for which is assessed elsewhere.
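
As an illustration only, here is a minimal sketch of what such an interface mechanism might look like, assuming a simple filesystem-backed store; the class and method names are hypothetical and do not correspond to any particular IR product or its “Storage Manager”.

```python
from pathlib import Path
import shutil


class ManagedStore:
    """Hypothetical minimal interface to a filesystem-backed store:
    put content in, take it out again, and list what is held."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def deposit(self, source_file: str, item_id: str) -> Path:
        """Copy a file into managed storage under a chosen identifier."""
        dest_dir = self.root / item_id
        dest_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(source_file, dest_dir))

    def retrieve(self, item_id: str, filename: str, target_dir: str) -> Path:
        """Copy a stored file back out, e.g. to deliver it to a user."""
        return Path(shutil.copy2(self.root / item_id / filename, target_dir))

    def holdings(self) -> list:
        """List the item identifiers currently held in storage."""
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())
```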

To address (3), I went back to TECH 03 and changed its name from “Ensuring Availability” to “Ensuring Availability / Backing Up”. The element description was then enriched with more detail about backup actions: we try to describe the optimum backup scenario, based on actual organisational needs, and to flag the caveats that apply when multiple copies cause syncing problems. Work done on the CARDIO toolkit was very useful here.
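
One simple check for that syncing caveat, offered here only as a sketch (not something prescribed by the toolkit), is to compare a primary storage location with one of its backup copies and report anything that has drifted out of sync:

```python
import filecmp


def report_sync_drift(primary_dir: str, backup_dir: str) -> None:
    """Compare a primary copy with a backup copy and report files that are
    missing from either side or that appear to differ.

    Note: filecmp.dircmp makes a shallow, top-level comparison based on
    file metadata; a checksum comparison (see the next element) is stronger.
    """
    cmp = filecmp.dircmp(primary_dir, backup_dir)
    if cmp.left_only:
        print("Only in primary (not yet backed up):", cmp.left_only)
    if cmp.right_only:
        print("Only in backup (removed or renamed in primary?):", cmp.right_only)
    if cmp.diff_files:
        print("Present in both copies but different:", cmp.diff_files)
    if not (cmp.left_only or cmp.right_only or cmp.diff_files):
        print("Primary and backup copies appear to be in sync.")
```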

To incorporate (4), I thought it best to include checksums in element TECH 04, “Integrity of Information”. Checksum creation and validation is now explicitly suggested as one possible method to ensure integrity of digital content.
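
As a rough illustration of what checksum creation and validation can look like in practice, here is a minimal sketch using SHA-256 hashes from the Python standard library; the manifest format and function names are my own, not anything mandated by the element.

```python
import hashlib
from pathlib import Path


def make_manifest(directory: str) -> dict:
    """Record a SHA-256 checksum for every file under a directory."""
    manifest = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            # Reads whole files into memory; fine for a sketch.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(directory))] = digest
    return manifest


def check_fixity(directory: str, manifest: dict) -> list:
    """Re-hash the stored files and return any that fail validation."""
    failures = []
    for relpath, expected in manifest.items():
        path = Path(directory) / relpath
        if not path.is_file():
            failures.append(relpath)  # file missing from storage
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            failures.append(relpath)  # content has changed since deposit
    return failures
```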

Managed storage as a whole is thus distributed among several measurable TECH elements in the new toolkit.

In this way I’m hoping to arrive at a measurable capability for managed storage that does not pre-empt the use the organisation wishes to make of such storage. The wording is such that even a digital preservation strategy could be assessed in the new toolkit – as could many other uses. If I can get this right, it would be an improvement on simply assessing the presence of an Institutional Repository.

Reworking AIDA: Legal Compliance

Today we’re looking briefly at legal obligations concerning management of your digital content.
The original AIDA had only one section on this, and it covered Copyright and IPR. These issues were important in 2009 and are still important today, especially in the context of research data management, where academics need to be assured that attribution, intellectual property, and copyright are all being protected.

Legal Compliance – widening the scope

For the new toolkit, in keeping with my plan for a wider scope, I wanted to address additional legal concerns. The best solution seemed to be to add a new component to assess them.

What we’re assessing under Legal Compliance:

  1. Awareness of responsibility for legal compliance.
  2. The operation of mechanisms for controlling access to digital content, such as by licences, redaction, closure, and release (which may be timed).
  3. Processes of review of digital content holdings, for identifying legal and compliance issues.

Legal Compliance – Awareness

The first one is probably the most important of the three. If nobody in the organisation is even aware of their own responsibilities, this can’t be good. My view would be that any effective information manager – archivist, librarian, records manager – is probably handling digital content with potential legal concerns regarding its access, and has a duty of care. But a good organisation will share these responsibilities and embed awareness into every role.

Legal Compliance – Mechanisms & Procedures

Secondly, we’d assess whether the organisation has any means (policies, procedures, forms) for controlling access and closure; and thirdly, whether there’s a review process that can seek out any legal concerns in certain digital collections.

Legislation regimes vary across the world, of course, and this makes it challenging to devise a model that is internationally applicable. The new version of the model name-checks specific acts in UK legislation, such as the Data Protection Act and the Freedom of Information Act. Other countries have their own versions of similar legislation, and copyright laws are widespread, even where they differ on detail and interpretation.

The value of the toolkit, if indeed it proves to have any, is not that we’re measuring an organisation’s specific point-by-point compliance with a certain Statute; rather, we’re assessing the high-level awareness of legal compliance, and what the organisation does to meet it.

Interestingly, the high-level application of legal protection across an organisation is something which can appear somewhat undeveloped in other assessment tools.

The ISO 16363 code of practice refers to copyright implications, intellectual property and other legal restrictions on use only in the context of compiling good Content Information and Preservation Description Information.

The expectation is that “An Archive will honor all applicable legal restrictions. These issues occur when the OAIS acts as a custodian. An OAIS should understand the intellectual property rights concepts, such as copyrights and any other applicable laws prior to accepting copyrighted materials into the OAIS. It can establish guidelines for ingestion of information and rules for dissemination and duplication of the information when necessary. It is beyond the scope of this document to provide details of national and international copyright laws.”

Personally I’ve always been disappointed by the lack of engagement implied here. To be fair though, the Code does cite many strong examples of “Access Rights” metadata, when it describes instances of what exemplary “Preservation Description Information” should look like for Digital Library Collections.

The DPCMM maturity model likewise doesn’t see fit to assess legal compliance as a separate entity, and it is not singled out as one of its 15 elements. However, the concept of “ensuring long‐term access to digital content that has legal, regulatory, business, and cultural memory value” is embedded in the model.

Reworking the AIDA toolkit: why we added new sections to cover Depositors and Users

Why are we reworking the AIDA toolkit?

The previous AIDA toolkit covered digital content in an HE & FE environment. As such, it made a few basic assumptions about usage; one assessment element was not really about the users at all, but about the Institutional capability for measuring use of resources. To put it another way, an Institution might be maintaining a useless collection of material that nobody looks at (at some cost). What mechanism do you have to monitor and measure use of assets?

That is useful, but also limited. For the new toolkit, I wanted to open up the whole question of usage, and base the assessment on a much wider interpretation of the “designated user community”. This catch-all term seems to have come our way via the OAIS reference model, and it has certainly caught on in the community. As I would have it, it should mean:

  • Anyone who views, reads and uses digital material.
  • They do it for many purposes and in many situations – I would like user scenarios to include internal staff looking at born-digital records in an EDRMS, or readers downloading ebooks, or photographers browsing a digital image gallery, or researchers running an app on a dataset.

Understanding these needs, and meeting them with appropriate mechanisms, ought to be what any self-respecting digital content service is about.

Measuring organisational commitment to users

I thought about how I could turn that organisational commitment into a measurable, assessable thing, and came up with four areas of benchmarking:

  • Creating access copies of digital content, and providing a suitable technological platform to play them on
  • Monitoring and measuring user engagement with digital content, including feedback
  • Evaluation of the user base to identify their needs
  • Some mechanism whereby the organisation relates the user experience to the actual digital content; user evaluation will be an indicator here.

This includes the original AIDA element, but adds more to it. I’d like to think a lot of services can recognise their user community provision in the above.

After that, I thought about the other side of the coin – the people who create and deposit the material with our service in the first place. Why not add a new element to benchmark this?

Measuring organisational commitment to depositors

The OAIS reference model’s collective term for these people is “Producers”, a piece of jargon I have never much cared for. We decided to stick with “Depositors” for this new element; I’m more interested in the fact that they are transferring content to us, whether or not they actually “produced” it. As I would have it, a Depositor means:

  • Anyone who is a content creator, submitter, or donor, putting digital material into your care.
  • Again, they do it in many situations: external depositors may donate collections to an archive; internal users may transfer their department’s born-digital records to an organisational record-keeping system; researchers may deposit publications, or datasets, in a repository.

When trying to benchmark this, it occurred to me that there is a two-way obligation going on in this transfer situation; we have to do things, and so do the depositors. We don’t have to be specific about these obligations in the toolkit; we just assess whether they are understood and supported.

In reworking the toolkit, I came up with the following assessable things:

  • Whether obligations are understood, both by depositors and the staff administering deposits
  • Whether there are mechanisms in place for allowing transfer and deposit
  • Whether these mechanisms are governed by formal procedures
  • Whether these mechanisms are supported by documents and forms, and a good record-keeping method

For both Users and Depositors, there will of course be legal dimensions that underpin access, and which may even impact on transfer methods. However, these legal aspects are catered for in two other benchmarking elements, which will be the subject of another blog post.

Conclusion

With these two new elements, I have fed in information and experience gained from teaching the DPTP, and from my consultancy work; I hope to make the new AIDA into something applicable to a wider range of digital content scenarios and services.