Research datasets: lockdown or snapshot?

In today’s blog post we’re going to talk about digital preservation planning in the context of research datasets. We’re planning a one-day course for research data managers, where we can help you make preservation planning decisions that intersect with and complement your research data management plan.

When we’re dealing with datasets of research materials, there’s often a question about when (and whether) it’s possible to “close” the dataset. The dataset is likely to be a cumulative entity, especially if it’s a database, continually accumulating new records and new entries. Is there ever a point at which the dataset is “finished”? If you ask a researcher, it’s likely they will say it’s an ongoing concern, and they would rather not have it taken away from them and put into an archive.

For the data manager wishing to protect and preserve this valuable data, there are two possibilities.

The first is to “lock down” the dataset

This would involve intervening at a suitable date or time, for instance at the completion of a project, and negotiating with the researcher and other stakeholders. If everyone can agree on a lockdown, it means that no further changes can be made to the dataset; no more new records added, and existing records cannot be changed.

A locked-down dataset is somewhat easier to manage in a digital preservation repository, especially if it’s not being requested for use very frequently. However, this approach doesn’t always match the needs of the institution, nor the researcher who created the content. This is where the second possibility comes into play.

The second possibility is to take “snapshots” of the dataset

This involves a capture action that abstracts records from the dataset and preserves them as a “view” of the dataset at a particular moment in time. The dataset itself remains intact, and can continue being used for live data as needed: it can still be edited and updated.

Taking dataset snapshots is a more pragmatic way of managing and preserving important research data, while meeting the needs of the majority of stakeholders. However, it also requires more effort: a strategic approach, more planning, and a certain amount of technical capability. In terms of planning, it might be feasible to take snapshots of a large and frequently-updated dataset on a regular basis, e.g. every year or every six months; doing so will tend to create reliable, well-managed views of the data.

Another valid approach would be to align the snapshot with a particular piece of research

For instance, when a research paper is published, the snapshot of the dataset should reflect the basis on which the analysis in that paper was carried out. The dataset snapshot would then act as a strong affirmation of the validity of the dataset. This is a very good approach, but requires the data manager and archivist to have a detailed knowledge of the content, and more importantly the progress of the research cycle.

The ideal scenario would be to have your researcher on board with your preservation programme, and get them signed up to a process like this; at crucial junctures in their work, they could request snapshots of the dataset, or even be empowered to take them themselves.

In terms of the technical capability for taking snapshots, it may be as simple as running an export script on a database, but it’s likely to be a more delicate and nuanced operation. The parameters of the export may have to be discussed and managed quite carefully.
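
By way of illustration, here is a minimal sketch of what such an export might look like, assuming the dataset lives in a single SQLite file; the file names and layout are purely illustrative, and a dataset held in PostgreSQL or MySQL would need the equivalent dump utility, with the export parameters negotiated as described above.

```python
import hashlib
import json
import sqlite3
from datetime import date, datetime, timezone
from pathlib import Path

def snapshot_dataset(live_db: Path, snapshot_dir: Path) -> Path:
    """Copy the live database into a dated snapshot file and record
    a checksum and capture time alongside it."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()
    snapshot_path = snapshot_dir / f"{live_db.stem}_snapshot_{stamp}.db"

    # sqlite3's online backup API copies the database safely, so the
    # live copy can remain in use while the snapshot is taken.
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(snapshot_path)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()

    manifest = {
        "source": str(live_db),
        "snapshot": snapshot_path.name,
        "captured": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(snapshot_path.read_bytes()).hexdigest(),
    }
    snapshot_path.with_suffix(".json").write_text(json.dumps(manifest, indent=2))
    return snapshot_path

# e.g. snapshot_dataset(Path("research_data.db"), Path("snapshots/2016"))
```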

Lastly we should add that these operations by themselves don’t constitute the entirety of digital preservation. They are both strategies to create an effective capture of a particular resource; but capture alone is not preservation.

That resource must pass into the preservation repository and undergo a series of preservation actions in order to be protected and usable in the future. There will be several variations on this scenario, as there are many ways of gathering and storing data. We know that institutions struggle with this area, and there is no single agreed “best practice.”

Preserving Digital Content – Taking first steps with the AOR toolkit

I have long had an interest in promoting digital preservation, most recently through the relaunch of the AIDA toolkit as the AOR toolkit. However, in my work I meet a lot of people in a lot of organisations for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

AOR Toolkit – Possible uses

To speculate on this for a while, we could instead consider (a) what kind of digital content people typically have and (b) what do they want to do with it. It’s possible that the new AOR Toolkit can help as a first step to assessing your capability to perform (b).

One milieu that we’re most familiar with is higher education, where the (a) is born-digital research data and the (b) is research data management. This may in time shade into a long-term preservation need, but not exclusively. That said, the main driver for many to manage research data in an organised way is actually the requirement of the funders for the research data to be preserved.

Not unrelated to that example, repository managers may not be primarily interested in preservation, but they certainly have a need to manage (a) born-digital publications and research papers and (b) their metadata, storage, dissemination, and use, perhaps using a repository. On the other hand, as the content held in a publications repository over time starts to increase, the repository managers may need to become more interested in selection and preservation.

Another possibility is electronic records management. The (a) is born-digital records, and the (b) could include such activities as classification, retention scheduling, meeting legislative requirements, metadata management, storage (in the short to mid-term), and security. In such scenarios, not all digital content need be kept permanently, and the outcome is not always long-term digital preservation for all content types.

AOR toolkit – Beyond the organisation

Digital librarians, managers of image libraries, in short anyone who holds digital content is probably eligible for inclusion in my admittedly very loose definition of a potential AOR Toolkit user. I would like to think the toolkit could apply, not just to organisations, but also to individual projects both large and small. All the user has to do is to set the parameters. It might even be a way into understanding your own personal capability for “personal archiving”, i.e. ensuring the longevity of your own personal digital history, identity and collections in the form of digital images, documents, and social media presence. Use the AOR toolkit to assess your own PC and hard drive, in other words.

It remains to be seen if the AOR Toolkit can match any of my wide-eyed optimistic predictions, but at least for this new iteration we have attempted to expand the scope of the toolkit and the definitions of its elements, in order to bring it a step closer towards a more comprehensive, if not actually universally applicable, assessment tool.

AOR toolkit – addressing a community need?

Results from our recent training needs survey also indicate there is a general need for assessment in the context of digital preservation. When suggesting subjects that are not currently taught enough, some respondents explicitly identified the following requirements, which indicate how assessment would help advance their case:

  • Self-assessment and audit
  • Assessment/criteria/decision in the context of RDM
  • Quality analysis as part of preservation planning and action
  • Benchmarking in digital preservation (i.e. what to do when unable to comply with OAIS)
  • Key performance indicators for digital preservation
  • What to check over time

In the same survey, when asked about “expected benefits of training”, even more interesting responses were forthcoming. There were 32 answers which I classified under strategy and planning, many of them indicating the need for assessment and analysis as a first step; and likewise, 21 answers alluding to the ability to implement a preservation system, with many references to “next steps” and understanding organisational “capacity”. One response in particular is worth quoting in full:

“We have recognised and assessed the problem, decided on a strategy and are nearing the purchase of a system to cope with what we currently have, but once this is done we will need to create two projects – one to address ongoing work and one to resolve legacy work created by our stop-gap solution. I’d expect training to answer both these needs.”

All of the above is simply to reiterate what I said in March: “I hope to make the new AOR toolkit into something applicable to a wider range of digital content scenarios and services.”

Self-assessment as digital preservation training aid

I have always liked to encourage people to assess their organisation and its readiness to undertake digital preservation. It’s possible that AIDA and the new AOR Toolkit could continue to have a small part in this process.

Self-assessment in DPTP

We have incorporated exercises in self-assessment as a digital preservation training aid in the DPTP course for many years. We haven’t done it much lately, but we used to get students to map themselves against the OAIS Reference Model. The idea was that they could identify gaps in the Functional Entities and in information package creation, and work out who their Producers / Consumers were. We would ask them to draw it up as a flipchart sketch, using dotted lines to express missing elements or gaps.

Another exercise was to ask students to make an informed guess as to where their organisation would sit on the Five Organisational Stages model proposed by Anne Kenney and Nancy McGovern. The most common response was Stage 1 “Acknowledge” or Stage 2 “Act”. We also asked which leg of their three-legged stool (Organisation, Technology, or Resources) was shortest or longest. The most memorable response we ever had to the stool exercise was a drawing by one student of an upholstered Queen Anne chair.

Other self-assessment models we have introduced to our classes include:

  • The NDSA Levels of Digital Preservation, which is good because it’s so compact and easy to understand. Admittedly, in the version we were talking about, it only assessed the workings of a repository (not a whole organisational setup) and focussed on technological capabilities like checksums and storage. This may change if the recent proposal to add a row for “Access” goes forward.
  • The DP Capability Maturity Model. In this model we liked the very rich descriptions of what it’s like to be operating at one of the proposed five levels of success.
  • The DRAMBORA toolkit, which emphasises risk assessment of a repository.

We also tried to encourage students to look at using elements of the TRAC and TDR audit regime purely from a self-assessment viewpoint. These tools can be time-consuming and costly if you’re undergoing full audited certification, but there’s nothing to stop an organisation using them for their own gap analysis or self-assessment needs.

As a matter of fact, this line of thinking fed into the SPRUCE toolkit I worked on with Chris Fryer; together we created a useful and pragmatic assessment method. ULCC prepared a cut-down and simplified version of ISO 16363, retaining only those requirements considered essential for the purposes of that project. The project added value by proposing systems assessment, product analysis, and user stories as part of the process. My 2013 blog post alludes once again to the various assessment toolkits that can be found in the digital preservation landscape.

Review of self-assessment landscape

Are there too many toolkits, and are they really any good? Christoph Becker at the University of Toronto has been wondering that himself, and his team conducted a study on the assessment model landscape, which became a paper published at iPRES. His work in evaluating these assessment frameworks continues:

“Assessment models such as AIDA, DPCMM and others are very particular artifacts, and there are methodologies to design, apply and evaluate such models effectively and rigorously. Substantial knowledge and specific methodology from Information Systems research provides a foundation for the effective design, application and evaluation of frameworks such as AIDA.

“We have just completed an in-depth review of the state of the art of assessment frameworks in Digital Preservation. The article is currently under review; a much more informal initial overview was presented at IPRES (Emily Maemura, Nathan Moles, Christoph Becker. A Survey of Organizational Assessment Frameworks in Digital Preservation. In: International Conference on Digital Preservation (IPRES 2015), November 2015, Chapel Hill.)

“We also recently completed a detailed investigation that leveraged the foundations mentioned above to analyze AIDA and the DPCMM in detail from both theory and practice in two real organizations: The University of Toronto Libraries, and the Austrian State Archives (i.e. we conducted four assessments). We conducted these case studies not to evaluate the organizations, but instead, to evaluate the frameworks.

“We could now design a new assessment model from scratch, and that is our default plan. However, our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared. And then, since assessment is often geared toward improvement, the next question is how to support and demonstrate that improvement over time.”

The AIDA toolkit: use cases

There have been a few isolated uses of the old AIDA Toolkit. In this blog post I will try to recount some of these AIDA toolkit use cases.

In the beginning…

In the toolkit’s first phase in 2009, I was greatly aided by five UK HE Institutions who volunteered to act as guinea pigs and do test runs, though this was mainly to help me improve the structure and the wording. However, Sarah Jones of HATII was very positive about its potential in 2010.

“AIDA is very useful for seeing where your strengths and weaknesses lie. The results could provide a benchmark too, so if you go on to make some changes you can measure their effects…AIDA sounds particularly useful for your context too as this is about institutional readiness and assessing where strengths and weaknesses lie to determine areas for investment.”

I also used AIDA as part of consultancy for a digital preservation strategy, working with the digital archivist at Diageo in 2012; they said:

“We agree that the AIDA assessment would be worthwhile doing as it will give us a good idea of where we are in terms of readiness and the areas we need to focus on to enable the implementation of a digital preservation strategy and system.”

Sarah Makinson of SOAS also undertook an AIDA assessment.

Further down the line…

Between 2011 and 2015, the toolkit was published and made available for download on a Jisc-hosted project website. During that time various uses were made of AIDA by an international audience:

Natalya Kusel used it for benchmarking collection care; she had

“been looking for some free self-assessment tools that I can use for benchmarking the current ‘health’ of collections care. I’m looking for something that will help me identify how the firm currently manages digital assets that have a long retention period so I can identify risks and plan for improvement.”

Anthony Smith used it as a teaching aid for part of UNESCO’s Intergovernmental Oceanographic Data Exchange sponsored teaching programme.

Kelcy Shepherd of Amherst College used it in her workshops.

“Coincidentally, the Five Colleges, a consortium I’m involved in, used the Toolkit a few years ago. Each institution completed the survey to ascertain levels of readiness at the various institutions, and determine areas where it would make sense to collaborate. This helped us identify some concrete steps that we could take together as a consortium.”

Walter D Ray, the Political Papers archivist at Southern Illinois University, used it to assess his library’s readiness:

“I’m glad to see work is being done on the AIDA toolkit. We used it for our self-assessment and found it helpful. As my boss, Director of Special Collections Pam Hackbart-Dean says, “the digital readiness assessment was a useful tool in helping give us direction.” I would add that it helped us define the issues we needed to confront.

“Since then we have developed some policies and procedures, revised our Deed of Gift form, set up a digital forensics workstation, and put a process in place to handle digital projects coming from elsewhere on campus. We greatly appreciate the work you’ve done on the AIDA toolkit.”

However, on the less positive side, Nathan Moles and Christoph Becker of University of Toronto studied AIDA as part of their “in-depth review of the state of the art of assessment frameworks in Digital Preservation.” Their survey of the landscape indicates the following:

“Our work showed that (too) many models have already been designed. Most models have been designed with a focus on practice (which is good), but in very informal ways without rigorous design methods (which is not so good). Aside from a model, there’s also need for a tool, a method, guidance, and empirical evidence from real-world applications to be developed and shared.”

AIDA in particular was found wanting:

“I think AIDA provides an interesting basis to start, but also currently has some shortcomings that we would need to see addressed to ensure that the resulting insights are well-founded. Most importantly, the fundamental concepts and constructs used in the model are currently unclear and would benefit from being set on a clear conceptual foundation.”

These stories show that AIDA had more of a shelf-life and more application than I originally expected. Our hope is that the new AOR Toolkit will give the ideas a new lease of life and continue to be of practical help to some in performing assessments.

Reworking AIDA: Storage

In the fourth of our series of posts on reworking the AIDA self-assessment toolkit, we look at a technical element – Managed Storage.

Reworking AIDA Storage

In reworking the toolkit, we are now looking at the 11th Technology Element. In the “old” AIDA, this was called “Institutional Repository”, and it pretty much assessed whether the University had an Institutional Repository (IR) system and the degree to which it had been successfully implemented, and was being used.

For the 2009 audience, and given the scope of what AIDA was about, an IR was probably just the right thing to assess. In 2009, Institutional Repository software was the new thing and a lot of UK HE & FE institutions were embracing it enthusiastically. Of course your basic IR doesn’t really do storage by itself; certainly it enables sharing of resources, manages access, perhaps automates some metadata creation, and allows remote submission of content. An IR system such as EPrints can be used as an interface to storage – as a matter of fact it has a built-in function called “Storage Manager” – but it isn’t a tool for configuring the servers where content is stored.

Storage in 2016

In 2016, a few things occurred to me while thinking about the storage topic.

  1. I doubt I shall ever understand everything to do with storage of digital content, but since working on the original AIDA my understanding has improved somewhat. I now know that it is at least technically possible to configure IT storage in ways that match the expected usage of the content. Personally, I’m particularly interested in such configuration for long-term preservation purposes.
  2. I’m also aware that it’s possible for a sysadmin – or even a digital archivist – to operate some kind of interface with the storage server, using for instance an application like “storage manager”, that might enable them to choose suitable destinations for digital content.
  3. Backup is not the same as storage.
  4. Checksums are an essential part of validating the integrity of stored digital objects.

I have thus widened the scope of Element TECH 11 so that we can assess more than the limited workings of an IR. I also went back to two other related elements in the TECH leg, and attempted to enrich them.

To address (1), the capability being assessed is not just whether your organisation has a server room or network storage, but rather whether you have identified your storage needs correctly and have configured the right kind of storage to keep your digital content (and deliver it to users). We might add that this capability has nothing to do with the quantity, number, or size of your digital materials.

To assess (2), we’ve identified the requirement for an application or mechanism that helps put things into storage, take them out again, and assist with access while they are in storage. We could add that this interface mechanism is not doing the same job as metadata, capability for which is assessed elsewhere.

To address (3), I went back to TECH 03 and changed its name from “Ensuring Availability” to “Ensuring Availability / Backing Up”. The element description was then enriched with more detail concerning backup actions: we try to describe the optimum backup scenario, based on actual organisational needs, and provide caveats for when multiple copies can cause syncing problems. Work done on the CARDIO toolkit was very useful here.

To incorporate (4), I thought it best to include checksums in element TECH 04, “Integrity of Information”. Checksum creation and validation is now explicitly suggested as one possible method to ensure integrity of digital content.
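
For illustration, a minimal sketch of what checksum creation and later validation might look like across a storage area; the manifest layout here is my own assumption, not something prescribed by the element.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a SHA-256 checksum without reading the whole file into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(storage_root: Path, manifest: Path) -> None:
    """Record a checksum for every file under the storage area."""
    lines = [f"{sha256_of(p)}  {p.relative_to(storage_root)}"
             for p in sorted(storage_root.rglob("*")) if p.is_file()]
    manifest.write_text("\n".join(lines) + "\n")

def verify_manifest(storage_root: Path, manifest: Path) -> list:
    """Return the files whose current checksum no longer matches the manifest."""
    failures = []
    for line in manifest.read_text().splitlines():
        recorded, name = line.split("  ", 1)
        if sha256_of(storage_root / name) != recorded:
            failures.append(name)
    return failures
```

Run regularly, say as a scheduled job, the verification step turns fixity from a one-off check into an ongoing assurance that stored objects have not silently changed.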

Managed storage as a whole is thus distributed among several measurable TECH elements in the new toolkit.

In this way I’m hoping to arrive at a measurable capability for managed storage that does not pre-empt the use the organisation wishes to make of such storage. The wording is such that even a digital preservation strategy could be assessed in the new toolkit – as could many other uses. If I can get this right, it would be an improvement on simply assessing the presence of an Institutional Repository.

Reworking the AIDA toolkit: why we added new sections to cover Depositors and Users

Why are we reworking the AIDA toolkit?

The previous AIDA toolkit covered digital content in an HE & FE environment. As such, it made a few basic assumptions about usage; one assessment element was not really about the users at all, but about the Institutional capability for measuring use of resources. To put it another way, an Institution might be maintaining a useless collection of material that nobody looks at (at some cost). What mechanism do you have to monitor and measure use of assets?

That is useful, but also limited. For the new toolkit, I wanted to open up the whole question of usage, and base the assessment on a much wider interpretation of the “designated user community”. This catch-all term came our way via the OAIS reference model, and it seems to have caught on in the community. As I would have it, it should mean:

  • Anyone who views, reads and uses digital material.
  • They do it for many purposes and in many situations – I would like user scenarios to include internal staff looking at born-digital records in an EDRMS, or readers downloading ebooks, or photographers browsing a digital image gallery, or researchers running an app on a dataset.

Understanding these needs, and meeting them with appropriate mechanisms, ought to be what any self-respecting digital content service is about.

Measuring organisational commitment to users

I thought about how I could turn that organisational commitment into a measurable, assessable thing, and came up with four areas of benchmarking:

  • Creating access copies of digital content, and providing a suitable technological platform to play them on
  • Monitoring and measuring user engagement with digital content, including feedback
  • Evaluation of the user base to identify their needs
  • Some mechanism whereby the organisation relates the user experience to the actual digital content; user evaluation will be an indicator here.

This includes the original AIDA element, but adds more to it. I’d like to think a lot of services can recognise their user community provision in the above.

After that, I thought about the other side of the coin – the people who create and deposit the material with our service in the first place. Why not add a new element to benchmark this?

Measuring organisational commitment to depositors

The OAIS reference model’s collective term for these people is “Producers”, a piece of jargon I have never much cared for. We decided to stick with “Depositors” for this new element; I’m more interested in the fact that they are transferring content to us, whether or not they actually “produced” it. As I would have it, a Depositor means:

  • Anyone who is a content creator, submitter, or donor, putting digital material into your care.
  • Again, they do it in many situations: external depositors may donate collections to an archive; internal users may transfer their department’s born-digital records to an organisational record-keeping system; researchers may deposit publications, or datasets, in a repository.

When trying to benchmark this, it occurred to me there’s a two-way obligation going on in this transfer situation; we have to do stuff, and so do the depositors. We don’t have to be specific about these obligations in the toolkit; just assess whether they are understood, and supported.

In reworking the toolkit, I came up with the following assessable things:

  • Whether obligations are understood, both by depositors and the staff administering deposits
  • Whether there are mechanisms in place for allowing transfer and deposit
  • Whether these mechanisms are governed by formal procedures
  • Whether these mechanisms are supported by documents and forms, and a good record-keeping method

For both Users and Depositors, there will of course be legal dimensions that underpin access, and which may even impact on transfer methods. However, these legal aspects are catered for in two other benchmarking elements, which will be the subject of another blog post.

Conclusion

With these two new elements, I have fed in information and experience gained from teaching the DPTP, and from my consultancy work; I hope to make the new AIDA into something applicable to a wider range of digital content scenarios and services.

Building a Digital Preservation Strategy

IRMS ARAI Event 19 November 2015

Last week I was in Dublin where I gave a presentation for the IRMS Ireland Group at their joint meeting with ARA Ireland. It was great for me personally to address a roomful of fellow Archivists and Records Managers, and learn more about how they’re dealing with digital concerns in Ireland. I heard a lot of success stories and met some great people.

Sarah Hayes, the Chair of IRMS Ireland, heard me speak earlier this year at the Celtic Manor Hotel (the IRMS Conference) and invited me to talk at her event. As a matter of fact I got a similar invite from IRMS Wales this year, but Sarah wanted new content from me, specifically on the subject of Building a Digital Preservation Strategy.

How to develop a digital preservation strategy

My talk on developing a digital preservation strategy made the following points:

  • Start small, and grow the service
  • You already have knowledge of your collections and users – so build on that
  • Ask yourself why you are doing digital preservation, and who will benefit
  • Build use cases
  • Determine your own organisational capacity for the task
  • Increase your metadata power
  • Determine your digital preservation strategy (or strategies) in advance of talking to IT, or a vendor

I also presented some imaginary scenarios that would address digital preservation needs incrementally and meet requirements for different audiences:

  • Bit-level preservation (access deferred)
  • Emphasis on access and users
  • Emphasis on archival care of digital objects
  • Emphasis on legal compliance
  • Emphasis on income generation

Event Highlights

In fact the whole day was themed on Digital Preservation issues. John McDonough, the Director of the National Archives of Ireland, gave encouraging reports of how they are managing electronic records by “striding up the slope of enlightenment”. There’s an expectation that public services in Ireland must be “digital by default”, with an emphasis on continual online access to archival content in digital form. John is clear that archives in Ireland “underpin citizens’ rights” and are crucial to the “development of Nation and statehood”, which fits the picture I have of Dublin’s culture – it’s a city with a very clear sense of its own identity, and history.

In terms of change management and advocacy for working digitally, Joanne Rothwell has single-handedly transformed the records management of Waterford City and County Council, using SharePoint. Her resourceful use of an alphanumeric File Index allows machine-readable links between paper records and born-digital content, thus preserving continuity of materials. She also uses SharePoint’s site-creation facility to build a virtual space for holding “non-current” records, which replicates existing file structures. It’s splendid to see sound records management practice carry across into the digital realm so successfully.

DPTP alumnus from the class of November 2011, Hugh Campbell of the Public Record Office of Northern Ireland, has developed a robust and effective workflow for the transfer, characterisation and preservation of digital content. It’s not only a model of good practice, but he’s done it all in-house with his own team, using open source tools and developer skills.

During the breaks I managed to mingle and met many other professionals in Ireland who have responded well to digital challenges. I was especially impressed by Liz Robinson, the Records Officer for the Health and Safety Authority in Ireland. We agreed that any system implementation should only proceed after a thorough planning period, where the organisation establishes its own workflows and procedures, and does proper requirements gathering. This ought to be a firm foundation in advance of purchasing and implementing a system. Sadly, we’ve both seen projects where the system drove the practice, rather than the other way around.

Plan, plan and plan again before you speak to a vendor; this was the underlying message of my ‘How to develop a digital preservation strategy’ talk, so it was nice to be singled out in one Tweet as a “particular highlight” of the day.

Digital Preservation: new assessment tools

This year I collaborated with Chris Fryer of Northumberland Estates on a project under the auspices of Jisc’s SPRUCE funding. It ended up as a case study: an assessment of available digital preservation solutions. The main aim was to build outputs that would have value to smaller organisations who intend to implement digital preservation on a limited budget; Chris in particular wanted something aligned very closely to his own business case and local practices.

We believe that the methodology we used on this project, if not the actual deliverables, will have some reuse value for other small organisations. There are four useful outputs in our toolkit:

  1. A requirements shopping list – a specification of what the chosen system would have to do
  2. An assessment form – the same shopping list, expressed as a scored checklist to assess a system
  3. Example(s) of assessments of real-world solutions
  4. A very simple self-assessment form for scoring organisational preparedness for digital preservation, based on ISO 16363.

The Requirements Deliverable is essentially a “shopping list” of what the chosen system has to do to perform digital preservation. It was built from a combination of:

  1. The OAIS standard (somewhat selectively)
  2. US National Library of Medicine 2007 specification
  3. Suggestions sent by Jen Mitcham (Digital Archivist at the University of York), QA supplier to the project

We wanted to keep the specification concise, manageable and realistic so that it would meet the immediate business needs of Northumberland Estates, while also adhering to best practice. The project team agreed that it was not necessary to adhere to every last detail of OAIS compliance. This approach might horrify purists, but it worked in this context.

The Assessment Form deliverable is a recasting of the requirements document into a form that could be used for assessing a preservation solution. We added a simple scoring range, and a weighted score methodology to add weight to the “essential” requirements.
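
As a rough sketch of how weighted scoring works in principle (the requirements, weights and score range below are invented for illustration, not copied from the project deliverable):

```python
# Each entry: (requirement, weight, score awarded to the candidate system).
# Here "essential" requirements carry double weight and scores run 0-3.
requirements = [
    ("Generates and validates checksums on ingest", 2, 3),
    ("Records preservation metadata for each object", 2, 2),
    ("Identifies file formats and supports migration", 1, 2),
    ("Provides a public access interface", 1, 1),
]

weighted_total = sum(weight * score for _, weight, score in requirements)
maximum = sum(weight * 3 for _, weight, _ in requirements)
print(f"Weighted score: {weighted_total}/{maximum} "
      f"({100 * weighted_total / maximum:.0f}%)")
```

The point of the weighting is simply that a system scoring well on “desirable” features cannot outrank one that fails to meet the essentials.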

With these two deliverables, we achieved a credible specification and assessment method that is a good fit for Northumberland Estates. Our methodology shows it would be possible for any small organisation to devise their own suitable specification. It is based not exclusively on OAIS, but on the business needs of NE and a simple understanding of the user workflow.

We used our documents to assess actual solutions (I looked at Preservica, the cloud-based version of Safety Deposit Box). Using these assessments, NE stands a better chance of selecting the right system for their business needs, and using a process that can be repeated and objectively verified.

This method should be regarded as quick and easy. Since we used supplier information, success of the method depends on whether that information is accurate and truthful. But it would be a good first step to selecting a supplier. More in-depth assessments of systems are possible.

Lastly we built the cut-down ISO 16363 assessment. This was suggested by the project sponsor to compensate for the technology-heavy direction we had been heading in. ULCC prepared the cut-down and simplified version of ISO 16363, by retaining only those requirements considered essential for the purposes of this project.

This deliverable was explicitly intended to complement and enhance the assessment of the repository solution, so as to be effective in the context of this project. In particular, all of the standard’s section 4 on Digital Object Management is omitted in this deliverable, since most of its essential detail is already expressed in the repository assessment document.

The scoring element uses the Five Organisational Stages model (Kenney / McGovern). This is a very strong model and I also used it in the preparation of AIDA and for my contributions to CARDIO.

There are already a lot of self-assessment tools available for repositories, including very thorough and comprehensive tools like TRAC and DRAMBORA. But with this quick and easy approach, we show it is possible for an organisation to perform a credible ISO self-assessment in a very short time. Users of this tool effectively conduct a mini-gap analysis of their organisation, the results of which could be used as a starting point for building a business case.

Chris’s final report on the project exists as a blog post. The deliverables can be downloaded from the SPRUCE project wiki.

Every man his own modified digital object

Today we’ve just completed our Future-Proofing study at ULCC and sent the final report to the JISC Programme Manager, with hopes of a favourable sign-off so that we can publish the results on our blog.

It was a collaboration between myself and Kit Good, the records manager here at UoL. We’re quite pleased with the results. We wanted to see if we could create preservation copies of core business documents that require permanent preservation, but do it using a very simple intervention and with zero overheads. So we worked with a simple toolkit of services and software that can plug into a network drive; we used open source migration and validation tools. Our case study sought to demonstrate the viability of this approach. Along the way we learned a lot about how the Xena digital preservation software operates, and how (combined with Open Office) it does a very credible job of producing bare-bones Archival Information Packages and putting information into formats with improved long-term prospects.

The project has worked on a small test corpus of common Institutional digital records, performed preservation transformations on them, and conducted a systematic evaluation to ensure that the conversions worked, that the finished documents render correctly, that sufficient metadata has been generated for preservation purposes and can feasibly be extracted and stored in a database, and that the results are satisfactory and fit for purpose.
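
As a very rough analogue (and emphatically not the project’s actual Xena workflow), the sketch below shows the general shape of such a migrate-and-record step, assuming LibreOffice’s soffice command is available on the PATH; the paths and metadata fields are illustrative only.

```python
import hashlib
import subprocess
from pathlib import Path

def make_preservation_copy(source: Path, out_dir: Path) -> dict:
    """Convert an office document to ODF and record minimal preservation
    metadata (checksums of the original and of the converted copy)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Headless LibreOffice conversion to an open, well-documented format.
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "odt",
         "--outdir", str(out_dir), str(source)],
        check=True,
    )
    converted = out_dir / (source.stem + ".odt")
    return {
        "original": source.name,
        "preservation_copy": converted.name,
        "original_sha256": hashlib.sha256(source.read_bytes()).hexdigest(),
        "copy_sha256": hashlib.sha256(converted.read_bytes()).hexdigest(),
    }

# e.g. make_preservation_copy(Path("minutes_2009.doc"), Path("preservation/minutes"))
```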

The results show us that it is possible to build a low-cost, practical preservation solution that addresses immediate preservation problems, makes use of available open source tools, and requires minimal IT support. We think the results of the case study can feasibly be used by other Institutions facing similar difficulties, and scaled up to apply to the preservation of other and more complex digital objects. It will enable non-specialist information professionals to perform certain preservation and information management tasks with a minimum of preservation-specific theoretical knowledge.

Future-Proofing won’t solve your records management problems, but it stands a chance of empowering records managers by allowing them to create preservation-worthy digital objects out of their organisation’s records, without the need for an expensive bespoke solution.

Archiving a wiki

On dablog recently I have put up a post with a few observations about archiving a MediaWiki site. The example is the UKOLN Repositories Research Team wiki DigiRep, selected for the JISC to add to their UKWAC collection (or to put it more accurately, pro-actively offered for archiving by DigiRep’s manager). The post illustrates a few points which we have touched on in the PoWR Handbook, which I’d like to illuminate and amplify here.

Firstly, we don’t want to gather absolutely everything that’s presented as a web page in the wiki, since the wiki contains not only the user-input content but also a large number of automatically generated pages (versioning, indexing, admin and login forms, etc). This stems from the underlying assumption about doing digital preservation, namely that it costs money to capture and store digital content, and it goes on costing money to keep on storing it. (Managing this could be seen as good housekeeping. The British Library Life and Life2 projects have devised ingenious and elaborate formulae for costing digital preservation, taking all the factors into account to enable you to figure out whether you can really afford to do it.) In my case, there are two pressing concerns: (a) I don’t want to waste time and resource in the shared gather queue while Web Curator Tool gathers hundreds of pages from DigiRep, and (b) I don’t want to commit the JISC to paying for expensive server space, storing a bloated gather which they don’t really want.

Secondly, the above assumptions have led to me making a form of selection decision, i.e. to exclude from capture those parts of the wiki I don’t want to preserve. The parts I don’t want are the edit history and the discussion pages. The reason I don’t want them is because UKWAC users, the target audience for the archived copy – or the designated user community, as OAIS calls it – probably don’t want to see them either. All they will want is to look at the finished content, the abiding record of what it was that DigiRep actually did.
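
The practical mechanism in a crawler such as Web Curator Tool is a set of exclusion filters applied to the harvest. As a very loose sketch of the kind of URL filtering involved (these patterns are illustrative of typical MediaWiki URLs, not the filters actually used for DigiRep):

```python
import re

# Typical MediaWiki URL patterns for automatically generated pages;
# the exact expressions depend on how the target wiki builds its URLs.
EXCLUDE_PATTERNS = [
    r"[?&]action=(edit|history|raw)",   # edit forms and page histories
    r"[?&](oldid|diff)=",               # old revisions and diffs
    r"[/=]Special:",                    # login, search, index and admin pages
    r"[/=]Talk:",                       # discussion pages
]

def in_scope(url: str) -> bool:
    """Return True if the URL should be gathered for the archive."""
    return not any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)
```

Everything that passes the filter is the “finished content”; everything excluded is the machinery around it.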

This selection aspect led to Maureen Pennock’s reply, which is a very valid point – there are some instances where people would want to look at the edit history. Who wrote what, when…and why did it change? If that change-history is retrievable from the wiki, should we not archive it? My thinking is that yes, it is valuable, but only to a certain audience. I would think the change history is massively important to the current owner-operators of DigiRep, and that as its administrators they would certainly want to access that data. But then I put on my Institutional records management hat, and start to ask them how long they really want to have access to that change history, and whether they really need to commit the Institution to its long-term (or even permanent) preservation. Indeed, could their access requirement be satisfied merely by allowing the wiki (presuming it is reasonably secure, backed-up etc.) to go on operating the way it is, as a self-documenting collaborative editing tool?

All of the above raises some interesting questions which you may want to consider if undertaking to archive a wiki in your own Institution. Who needs it, how long for, do we need to keep every bit of it, and if not then which bits can we exclude? Note that they are principally questions of policy and decision-making, and don’t involve a technology-driven solution; the technology comes in later, when you want to implement the decisions.