Anti-folder, pro-searching

Chris Loftus at the University of Sheffield has detected a trend among tech giants Google and Microsoft in their cloud storage provision. They would prefer us to make more use of searches to find material, rather than store it in named folders.

With MS SharePoint at least – which is more than just cloud storage, it's a whole collaborative environment with built-in software and numerous features – my sense is that Microsoft would be happier if we moved away from using folders. One reason might be that these cloud-based, web-accessed environments struggle if the pathway or URL grows too long; presumably the more folders you add, the longer the string becomes, and the worse the problem gets. So there's a practical technical reason right there: we wanted a way to work collaboratively in the cloud, but maybe some web browsers can't cope with deeply nested paths.

However, I also think SharePoint's owners are trying to edge us towards taking another view of our content. This is probably based on its use of metadata. SharePoint offers a rich array of tags; one instance that springs to mind is the "Create Column" feature, which enables users to build their own metadata fields for their content (such as Department Name) and populate them with their own values. This enables the user to create a custom view of thousands of documents, with useful fields arranged in columns. The columns can be searched, filtered, sorted and rearranged.
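To make that concrete, here is a minimal sketch in plain Python (nothing SharePoint-specific; the column names and documents are invented for illustration) of how a metadata-driven view can filter and sort content without any folder hierarchy:

```python
# A minimal sketch: documents as rows of metadata, "columns" as dict keys.
# The field names and documents are invented for illustration.
documents = [
    {"title": "Q1 budget", "department": "Finance", "doc_type": "Spreadsheet", "year": 2015},
    {"title": "Staff handbook", "department": "HR", "doc_type": "Policy", "year": 2014},
    {"title": "Q2 budget", "department": "Finance", "doc_type": "Spreadsheet", "year": 2015},
]

# A "view" is just a filter plus a sort order - no folder path required.
finance_view = sorted(
    (d for d in documents if d["department"] == "Finance"),
    key=lambda d: d["year"],
)

for doc in finance_view:
    print(doc["title"], "-", doc["year"])
```

The same pool of documents can serve any number of such views, which is precisely what a fixed folder hierarchy cannot do.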

This could be called a "paradigm shift" by those who like such jargon… it's a way of moving towards a "faceted view" of individual documents, based on metadata selections, not unlike the faceted views offered by Institutional Repository software (which allow browsing by year, author name, or department; see this page for instance).

Advocates of this approach would say that the faceted view is arguably more flexible and better than the views of documents afforded by the old hierarchical folder structure in Windows, which tends to flatten access to a single point of entry, followed by drilling down a single route and opening more and more sub-folders. Anecdotally, I have heard of enthusiasts who actively welcome this future – "we'll make folders a thing of the past!"

In doing this, perhaps Microsoft are exploiting a feature which has been present in their products for some time now, even before SharePoint. I mean document properties: when one creates a Word file, some of these properties (including dates) are generated automatically. Some of them (Title, Comments) can be added by the user, if so inclined. Some can be auto-populated, for instance a person's name – if the institution managed to find a way to sync Outlook address book data, or the Identity Management system, with document authoring.
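By way of a rough illustration (a sketch using the third-party python-docx library, which I believe exposes these fields as "core properties"; the file name is invented), the properties can be read and written programmatically:

```python
# A sketch using the third-party python-docx library (pip install python-docx).
# The file name "report.docx" is invented for illustration.
from docx import Document

doc = Document("report.docx")
props = doc.core_properties

# Some properties are generated automatically when the file is created...
print("Created: ", props.created)
print("Modified:", props.modified)

# ...while others can be filled in by the user, or by an institutional process
# (for instance, pulling the author's name from an identity management system).
props.title = "Quarterly report"
props.author = "A. N. Archivist"
props.comments = "Draft for review"

doc.save("report.docx")
```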

Few users have ever bothered much with creating or using document properties, in my experience. It's true they aren't really that "visible". If you right-click on any given file, you can see some of them. Some are also visible if you pick certain "details" from a drop-down, which then turn into columns in Windows Explorer; successive versions of Explorer have gradually tweaked that feature. In one sense, SharePoint has found a way to expose these fields and leverage the properties more dynamically. Did I mention SharePoint is like a gigantic database?

I should add that success with SharePoint metadata depends on an organisation taking the trouble to do it, and configuring the system accordingly. If you don't, SharePoint probably isn't much of an improvement over the old Windows Explorer way. If you do want to configure it that way, I would say it's a process that should be managed by a records manager or someone who knows about naming conventions and rules for metadata entry; I seem to be saying it's not unlike building an old-school (paper) file registry with a controlled vocabulary. How 19th-century is that? But if that path is not followed, might there not be a risk of free-spirited column-adding and naming by individual users, resulting in metadata (and views) that are of value only to themselves?

However, I would probably be in favour of anything that moves us away from the "paper metaphor". What I mean by this is that storing word-processed files, spreadsheets and emails in (digital) folders has encouraged us to think we can carry on working the old pre-digital way, and imagine that we are "doing the filing" by putting pieces of paper into named folders. This has led to tremendous errors in electronic records management systems, which likewise perpetuate the paper-based myth, and create the illusion that records can be managed, sentenced and disposed of on a folder basis. Any digital change offers us an opportunity to rethink the way we do things, but the paper metaphor gets in the way of that. If nothing else, SharePoint allows us a way of apprehending content that is arguably "truer" to computer science.

Updating the AIDA toolkit

This week, I have been mostly reworking and reviewing the ULCC AIDA toolkit. We’re planning to relaunch it later this year, with a new name, new scope, and new scorecard.

AIDA toolkit – a short history

The AIDA acronym stands for "Assessing Institutional Digital Assets". Kevin Ashley and I completed this JISC-funded project in 2009, and the idea was that it could be used by any University – i.e. an Institution – to assess its own capability for managing digital assets.

At the time, AIDA was certainly intended for an HE/FE audience; that's reflected in the "Institutional" part of the name, and in the type of digital content in scope – content likely to be familiar to anyone working in HE, such as digital libraries, research publications and digital datasets. As a matter of fact, AIDA was pressed into action as a toolkit directly relevant to the needs of Managing Research Data, as is shown by its reworking in 2011 into the CARDIO Toolkit.

I gather that CARDIO, under the auspices of Joy Davidson, HATII and the DCC, has since been quite successful; its take-up among UK Institutions as a way to measure or benchmark their own preparedness for Research Data Management perhaps indicates we were doing something right.

A new AIDA toolkit for 2016

My plan is to open up the AIDA toolkit so that it can be used by more people, apply to more content, and operate on a wider basis. In particular, I want it to apply to:

  • Not just Universities, but any Organisation that has digital content
  • Not just research / library content, but almost anything digital (the term "Digital Assets" always seemed vague to me, whereas "Digital Asset Management" is in fact something very specific and may refer to particular platforms and software)
  • Not just repository managers, but also archivists, records managers, and librarians working with digital content.

I’m also going to be adding a simpler scorecard element; we had one for AIDA before, but it got a little too “clever” with its elaborate weighted scores.

Readers may legitimately wonder if the community really “needs” another self-assessment tool; we teach several of the known models on our Digital Preservation Training Programme, including the use of the TRAC framework for self-assessment purposes; and since doing AIDA, the excellent DPCMM has become available, and indeed the latter has influenced my thinking. The new AIDA toolkit will continue to be a free download, though, and we’re aiming to retain its overall simplicity, which we believe is one of its strengths.

A new acronym

As part of this plan, I’m keen to bring out and highlight the “Capability” and “Management” parts of the AIDA toolkit, factors which have been slightly obscured by its current name and acronym. With this in mind, I need a new name and a new acronym. The elements that must be included in the title are:

  • Assessing or Benchmarking
  • Organisational
  • Capacity or Readiness [for]
  • Management [of]
  • Digital Content

I’ve already tried feeding these combinations through various online acronym generators, and come up empty. Hence we would like to invite the wider digital preservation community to help, using the power of crowd-sourcing to collect suggestions and ideas. Simply comment below or tweet us at @dart_ulcc using the #AIDAthatsnotmyname hashtag. Naturally, the winner(s) of this little crowd-sourcing contest will receive written credit in the final relaunched AIDA toolkit.

Building a Digital Preservation Strategy

IRMS ARAI Event 19 November 2015

Last week I was in Dublin where I gave a presentation for the IRMS Ireland Group at their joint meeting with ARA Ireland. It was great for me personally to address a roomful of fellow Archivists and Records Managers, and learn more about how they’re dealing with digital concerns in Ireland. I heard a lot of success stories and met some great people.

Sarah Hayes, the Chair of IRMS Ireland, heard me speak earlier this year at the Celtic Manor Hotel (the IRMS Conference) and invited me to talk at her event. Matter of fact I got a similar invite from IRMS Wales this year, but Sarah wanted new content from me, specifically on the subject of Building a Digital Preservation Strategy.

How to develop a digital preservation strategy

My talk on developing a digital preservation strategy made the following points:

  • Start small, and grow the service
  • You already have knowledge of your collections and users – so build on that
  • Ask yourself why you are doing digital preservation, and who will benefit
  • Build use cases
  • Determine your own organisational capacity for the task
  • Increase your metadata power
  • Determine your digital preservation strategy (or strategies) in advance of talking to IT, or a vendor

I also presented some imaginary scenarios that would address digital preservation needs incrementally and meet requirements for different audiences:

  • Bit-level preservation (access deferred)
  • Emphasis on access and users
  • Emphasis on archival care of digital objects
  • Emphasis on legal compliance
  • Emphasis on income generation

Event Highlights

In fact the whole day was themed on Digital Preservation issues. John McDonough, the Director of the National Archives of Ireland, gave encouraging reports of how they are managing electronic records by "striding up the slope of enlightenment". There's an expectation that public services in Ireland must be "digital by default", with an emphasis on continual online access to archival content in digital form. John is clear that archives in Ireland "underpin citizens' rights" and are crucial to the "development of Nation and statehood", which fits the picture I have of Dublin's culture – it's a city with a very clear sense of its own identity and history.

In terms of change management and advocacy for working digitally, Joanne Rothwell has single-handedly transformed the records management of Waterford City and County Council, using SharePoint. Her resourceful use of an alphanumeric File Index allows machine-readable links between paper records and born-digital content, thus preserving continuity of materials. She also uses SharePoint's site-creation facility to build a virtual space for holding "non-current" records, one that replicates existing file structures. It's splendid to see sound records management practice carry across into the digital realm so successfully.

DPTP alumnus from the class of November 2011, Hugh Campbell of the Public Record Office of Northern Ireland, has developed a robust and effective workflow for the transfer, characterisation and preservation of digital content. It’s not only a model of good practice, but he’s done it all in-house with his own team, using open source tools and developer skills.

During the breaks I managed to mingle and meet many other professionals in Ireland who have responded well to digital challenges. I was especially impressed by Liz Robinson, the Records Officer for the Health and Safety Authority in Ireland. We agreed that any system implementation should only proceed after a thorough planning period, in which the organisation establishes its own workflows and procedures and does proper requirements gathering. This ought to be a firm foundation in advance of purchasing and implementing a system. Sadly, we've both seen projects where the system drove the practice, rather than the other way around.

Plan, plan and plan again before you speak to a vendor: this was the underlying message of my ‘How to develop a digital preservation strategy’ talk, so it was nice to be singled out in one Tweet as a "particular highlight" of the day.

Making Progress in Digital Preservation: Part 3 – Roundtable

This one-day event on 31 October 2014 was organised by the DPC. The day concluded with a roundtable discussion, featuring a panel of the speakers and taking questions from the floor. The level of engagement from delegates throughout the event was clearly shown in the interesting questions posed to the panel, the thoughtful responses and the buzz of general discussion in this session. Among many interesting topics covered, three stand out as typical of the breadth of knowledge and interest shown at the event.

First, a fundamental question about the explosion of digital content and how it will impact on our work. How can we keep all of this stuff, where will we put it, and how much will it really cost? Sarah Middleton urged us to attend the upcoming 4C Conference in London to hear discussion of cutting-edge ideas about large-scale storage approaches. Catherine Hardman reminded us of one of the most obvious archival skills, which we sometimes tend to forget: selection. We do not have to keep “everything”, and a well-formulated selection policy continues to be an effective way to target the preservation of the most meaningful digital resources.

Next, a question on copyright and IPR as it applies to archives/archivists, and hence to digital preservation, quickly spun out into the audience and back to different panel members in a lively discussion. The general inability of the current legislation, formed in a world of print, to deal with the digital reality of today was quickly identified as an obstacle both to those engaged in digital preservation and to users seeking access to digital resources.

The Hargreaves report was mentioned (by Ed Pinsent of ULCC) and given an approving nod for the sensible approach it took to bringing legislation into the 21st century. However, the slow pace at which any change has actually been implemented was of concern to all, and was felt to be damaging to the need to preserve material. The issues around copyright and IPR were knowledgeably discussed from a wide variety of perspectives, including the cultural heritage sector, specialist collections, and archaeological data and resources; equally important among delegates was the inability to fully open up collections to users while complying with the law as it stands.

Some hope was found, though, in the recent (and ongoing) Free Our History campaign. Using the national and international awareness of various exhibitions, broadcasts and events to mark the anniversary of the First World War, the campaign has focussed on the WW1 content that museums, libraries and archives are unable to display because of current copyright law. Led by the National Library of Scotland, many memory institutions and cultural heritage institutions have joined the CILIP campaign to prominently exhibit a blank piece of paper, representing the many items which cannot be publicly displayed. The visual impact of such displays has caught attention, and the accompanying petition is currently being addressed by the UK government.

The third issue raised during this session was the suggestion for more community activity, for example more networking and exchange of experience opportunities. Given the high rate of networking during lunchtime and breaks, not to mention the lively discussions and questions, this was greeted with enthusiasm. Kurt Helfrich from RIBA explained his idea for an informal group to organise site visits and exchange of experience sessions among themselves, perhaps based in London to start off with. Judging by the level of interest among delegates to share their own work and learn from others during this day, this would be really useful to many. Leaving the event with positive plans for practical action felt a very fitting way to end an event around making progress in digital preservation.

Download the slides from this event

Making Progress in Digital Preservation: Part 1 – The path towards a steady state

This one-day event on 31 October 2014 was organised by the DPC and hosted at the futuristic, spacious offices of HSBC, where the presentation facilities and the catering were excellent. All those attending were given plenty of mental exercises by William Kilbride. He said he wanted to build on his "Getting Started in Digital Preservation" events and help everyone move further along the path towards a steady state, where digital preservation starts to become "business as usual". The very first exercise he proposed was a brief discussion in which people shared things they had tried, and what had and hadn't worked.

Kurt Helfrich from The RIBA Library said his organisation had a large number of staff administering a historic archive; its various databases, created at different times for different needs, would be better if connected. He was keen to collaborate with other RIBA teams and link "silos" in his agency.

Lindsay Ould from King's College London said "starting small worked for us". They've built a standalone virtual machine, using locally-owned kit, and are using it for "manual" preservation; when they've got the process right, they can automate it and bring in network help from IT.

When asked about "barriers to success", over a dozen hands in the room went up. Common themes: building the momentum to get preservation going in the first place; extracting a long-term commitment from Executives, who lose interest when they see it's not going to be finished in 12 months. There's a need to do advocacy regularly, not just once; and a need to convince depositors to co-operate. IT departments, especially in the commercial sector, are slow to see the point of digital preservation if its "business purpose" – a euphemism for "income stream", I would say – is not immediately apparent. Steph Taylor of ULCC pointed out that the case studies and tools in our profession are mostly geared to the needs of large memory institutions, not the dozens of county archives and small organisations who were in the room.

Ed Pinsent (i.e. me) delivered a talk on conducting a preservation assessment survey, paying particular attention to the Digital Preservation Capability Maturity Model and other tools and standards. If done properly, this could tell you useful things about your capability to support digital preservation; you could even use the evidence from the survey to build a business case for investment or funding. The tricky thing is choosing the model that’s right for you; there are about a dozen available, with varying degrees of credibility as to their fundamental basis.

Catherine Hardman from the Archaeology Data Service (ADS) is one who is very much aware of "income streams", since the profession of archaeology has become commercialised and somewhat profit-driven. She now has to engage with many depositors as paying customers. To that end, she's devised a superb interface called ADS Easy that allows them to upload their own deposits and add suitable metadata through a series of web forms. The process also incorporates a costing calculator, so that the real costs of archiving (based on file size) can be estimated; it even acts as a billing system, creating and sending out invoices (a toy sketch of the costing idea follows below). Putting this much onus on depositors is, in fact, a proven and effective way of engaging with your users. In the same vein, ADS have published good practice guidance on things to consider when using CAD files, and advice on metadata to add to a Submission Package. Does she ever receive non-preferred formats in a transfer? Yes, and the response is to send them back – the ADS has had interesting experiences with "experimental" archaeologists in the field.

Kurt Helfrich opened up the discussion here, speaking of the lengthy process before deposit that is sometimes needed; he memorably described it as a "pre-custodial intervention". Later in the day, William Kilbride picked up this theme: maybe "starting early", while good practice, is not ambitious enough. Maybe we have to begin our curation activities before the digital object is even created!
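As flagged above, here is a toy sketch of a file-size-based costing calculator (all rates and figures are invented for illustration; they are not ADS's actual pricing):

```python
# A toy sketch of a file-size-based archiving cost calculator.
# The per-gigabyte rate and handling fee are invented, not ADS's real pricing.
RATE_PER_GB = 20.00    # notional cost per gigabyte archived
HANDLING_FEE = 50.00   # notional flat fee per deposit

def estimate_cost(file_sizes_mb):
    """Estimate the archiving cost of a deposit from its file sizes (in MB)."""
    total_gb = sum(file_sizes_mb) / 1024
    return HANDLING_FEE + total_gb * RATE_PER_GB

deposit = [350.0, 120.5, 2048.0]   # three files, sizes in megabytes
print(f"Estimated cost: £{estimate_cost(deposit):.2f}")   # Estimated cost: £99.19
```

The point is less the arithmetic than the transparency: depositors can see the cost implications of their data before they transfer it.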

Catherine also perceived an interesting shift in user expectations: users want more from digital content, and leaps in technology make them impatient for speedy delivery. As part of meeting this need, ADS have embraced the OAI-PMH protocol, which enables them to reuse their collections metadata and enhance their services to multiple external stakeholders.
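For readers unfamiliar with it, OAI-PMH is a simple HTTP-based protocol for harvesting metadata records from a repository. A minimal sketch (using only the Python standard library; the base URL is invented, as every repository publishes its own):

```python
# A minimal OAI-PMH harvesting sketch using only the standard library.
# The base URL is invented for illustration; real repositories publish their own.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.ac.uk/oai"
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

with urllib.request.urlopen(BASE_URL + "?" + urllib.parse.urlencode(params)) as resp:
    tree = ET.parse(resp)

# Harvested records carry simple Dublin Core metadata; print the titles.
DC = "{http://purl.org/dc/elements/1.1/}"
for title in tree.iter(DC + "title"):
    print(title.text)
```

Because the protocol is this simple, any external service can harvest and reuse the metadata, which is exactly the kind of reuse Catherine described.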

There is no doubt that having a proper preservation policy in place would go some way towards helping address issues like this. When Kirsty Lee from the University of Edinburgh asked how many of us already had a signed-off policy document, the response level was not high. She then shared with us the methodology she's using to build a policy at Edinburgh, and it's a meticulous, well-thought-through process indeed. Her flowcharts show her constructing a complex "matrix" of separate policy elements, all drawn from a number of reports and sources which tend to say similar things in different ways; her triumph has been to distil this array of information and, equally importantly, arrange the elements in a meaningful order.

Kirsty is upbeat and optimistic about the value of a preservation policy. It can be a statement of intent; a mandate for the archive to support digital records and archives. It provides authority and can be leverage for a business case; it helps get senior management buy-in. To help us understand, she gave us an excellent handout which listed some two dozen elements; the exercise was to pick only the ones that suit our organisation, and to put them in order of priority. The tough part was coming up with a “single sentence that defines the purpose of your policy” – I think we all got stumped by this!

Download the slides from this event

IT skills for archivists and librarians

In September this year Dave Thompson of the Wellcome Library asked a question by Twitter, one which is highly relevant to digital preservation practice and learning skills. Addressing digital archivists and librarians, he asked: “Do we need to be able to do all ourselves, or know how to ask for what is required?”

My answer is “we need to do both”…and I would add a third thing to Dave’s list. We also need to understand enough of what is happening when we get what we ask for, whether it’s a system, tool, application, storage interface, or whatever.

Personally, I’ve got several interests here. I’m a traditional archivist (got my diploma in 1992 or thereabouts) with a strong interest in digital preservation, since about 2004.

As an archivist wedded to paper and analogue methods, for some years I was fiercely proud of my lack of IT knowledge. Whenever forced to use IT, I found I was always happier when I could open an application, see it working on the screen, and experiment with it until it did what I wanted it to do. On this basis, for example, I loved playing around with the File Information Tool Set (FITS).

When I first managed to get some output from FITS, it was like seeing the inside of a file format for the first time. I could see the tags and values of a TIFF file, some of which I was able to recognise as those elusive "significant properties" you hear so much about. So this is what they look like! From my limited understanding of XML – the format FITS writes its output in – I knew that XML is structured and can be stored in a database. That meant I'd be able to store those significant properties as fields in a database, and interrogate them. This would give me the intellectual control I used to relish with my old card catalogues in the late 1980s. I could see from this how it would be possible to have "domain" over a digital object.
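To give a flavour of the exercise, here is a minimal sketch of that idea (the file names are invented, and real FITS output is namespaced and more deeply structured than this simple tag/value walk suggests):

```python
# A minimal sketch: walk a FITS-style XML report and store element/value
# pairs in SQLite so they can be interrogated later. File names are invented,
# and real FITS output is richer than this flat walk implies.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("properties.db")
conn.execute("CREATE TABLE IF NOT EXISTS properties (file TEXT, field TEXT, value TEXT)")

tree = ET.parse("image01_fits.xml")
for elem in tree.iter():
    if elem.text and elem.text.strip():
        field = elem.tag.split("}")[-1]   # strip any XML namespace for readability
        conn.execute(
            "INSERT INTO properties VALUES (?, ?, ?)",
            ("image01.tif", field, elem.text.strip()),
        )
conn.commit()

# "Intellectual control": query the stored properties like a card catalogue.
for field, value in conn.execute(
    "SELECT field, value FROM properties WHERE field LIKE 'image%'"
):
    print(field, "=", value)
```

Crude as it is, this is the database-of-properties idea in miniature: once the values are in fields, they can be filtered, compared and reported across thousands of files.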

There’s a huge gap, I know, between me messing around on my desktop and the full functionality of a preservation system like Preservica. But with exercises like the above, I feel closer to the goal of being able to “ask for what is required”, and more to the point, I could interpret the outputs of this functionality to some degree. I certainly couldn’t do everything myself, but I want to feel that I know enough about what’s happening in those multiple “black boxes” to give me the confidence I need as an archivist that my resources are being preserved correctly.

I would like to think it's possible to equip archivists, librarians and data managers with the same degree of confidence; teaching them "just enough" of what is happening in these complex processes, while translating machine processes into concrete metaphors that an information professional can grasp and understand. In short, I believe these things are knowable, and archivists should know them. Of course it's important that the next step is to open a meaningful discussion with the developer, data centre manager, or database engineer (i.e. "ask for what is required"), but it's also important to keep that dialogue open: to go on asking, and to continue understanding what these tools and systems are doing. There is a school of thought that progress in digital preservation can only be made when information professionals and IT experts collaborate more closely, and I would align myself with that.

Digital Preservation: new assessment tools

This year I collaborated with Chris Fryer of Northumberland Estates on a project under the auspices of Jisc's SPRUCE funding. It ended up as a case study: an assessment of available digital preservation solutions. The main aim was to build outputs that would have value to smaller organisations intending to implement digital preservation on a limited budget; Chris in particular wanted something aligned very closely to his own business case and local practices.

We believe that the methodology we used on this project, if not the actual deliverables, will have some reuse value for other small organisations. There are four useful outputs in our toolkit:

  1. A requirements shopping list – a specification of what the chosen system would have to do
  2. An assessment form – the same shopping list, expressed as a scored checklist to assess a system
  3. Example(s) of assessments of real-world solutions
  4. A very simple self-assessment form for scoring organisational preparedness for digital preservation, based on ISO 16363.

The Requirements Deliverable is essentially a “shopping list” of what the chosen system has to do to perform digital preservation. It was built from a combination of:

  1. The OAIS standard (somewhat selectively)
  2. US National Library of Medicine 2007 specification
  3. Suggestions sent by Jen Mitcham (Digital Archivist at the University of York), QA supplier to the project

We wanted to keep the specification concise, manageable and realistic so that it would meet the immediate business needs of Northumberland Estates, while also adhering to best practice. The project team agreed that it was not necessary to adhere to every last detail of OAIS compliance. This approach might horrify purists, but it worked in this context.

The Assessment Form deliverable is a recasting of the requirements document into a form that can be used for assessing a preservation solution. We added a simple scoring range, and a weighted methodology that gives extra weight to the "essential" requirements.
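Something like the following toy sketch captures the idea (the requirement names, weights and 0–3 scoring range are invented for illustration, not the actual project figures):

```python
# A toy sketch of a weighted assessment scorecard. The requirements,
# weights and 0-3 scoring range are invented for illustration.
requirements = [
    # (requirement, weight: essential = 2, desirable = 1)
    ("Generates checksums on ingest", 2),
    ("Extracts technical metadata", 2),
    ("Provides a public access interface", 1),
]

# How one candidate system scored, 0 (absent) to 3 (fully met).
scores = {
    "Generates checksums on ingest": 3,
    "Extracts technical metadata": 2,
    "Provides a public access interface": 1,
}

total = sum(scores[req] * weight for req, weight in requirements)
maximum = sum(3 * weight for _, weight in requirements)
print(f"Weighted score: {total}/{maximum}")   # Weighted score: 11/15
```

The weighting means a system that nails the essentials outscores one that is strong only on the nice-to-haves.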

With these two deliverables, we achieved a credible specification and assessment method that is a good fit for Northumberland Estates. Our methodology shows it would be possible for any small organisation to devise their own suitable specification. It is based not exclusively on OAIS, but on the business needs of NE and a simple understanding of the user workflow.

We used our documents to assess actual solutions (I looked at Preservica, the cloud-based version of Safety Deposit Box). Using these assessments, NE stands a better chance of selecting the right system for their business needs, and using a process that can be repeated and objectively verified.

This method should be regarded as quick and easy. Since we used supplier information, the success of the method depends on whether that information is accurate and truthful. But it would be a good first step towards selecting a supplier; more in-depth assessments of systems are possible.

Lastly we built the cut-down ISO 16363 assessment. This was suggested by the project sponsor to compensate for the technology-heavy direction we had been heading in. ULCC prepared the cut-down and simplified version of ISO 16363, by retaining only those requirements considered essential for the purposes of this project.

This deliverable was explicitly intended to complement and enhance the assessment of the repository solution, so as to be effective in the context of this project. In particular, all of the standard’s section 4 on Digital Object Management is omitted in this deliverable, since most of its essential detail is already expressed in the repository assessment document.

The scoring element uses the Five Organisational Stages model (Kenney / McGovern). This is a very strong model and I also used it in the preparation of AIDA and for my contributions to CARDIO.

There are already a lot of self-assessment tools available for repositories, including very thorough and comprehensive tools like TRAC and DRAMBORA. But with this quick and easy approach, we show it is possible for an organisation to perform a credible ISO self-assessment in a very short time. Users of this tool effectively conduct a mini gap analysis of their organisation, the results of which could be used as a starting point for building a business case.

Chris’s final report on the project exists as a blog post. The deliverables can be downloaded from the SPRUCE project wiki.

Enhancing Linnean Online: The AIDA metrics

From the Enhancing Linnean Online project blog

We’ve mentioned so far Beagrie’s metrics for measuring improvements to the management of academic research data, and the Ithaka metrics for measuring improvements to delivery of content, particularly with regard to the operations of an organisation’s business model.

A third possibility is making use of UoL’s AIDA toolkit, a desk-based assessment method which has gone through many iterations and possible applications. Over time, we’ve shown how it could be used for digital assets, records management, and even research data (although admittedly it has never been used in anger in those situations). AIDA doesn’t intend to measure assets, but instead measures the capability of the Institution (or the owning Organisation) to preserve its own digital resources.

In July 2011 we produced a detailed reworking of AIDA that could be used specifically for research data. This was part of the JISC-funded IDMP project, and the intention was that AIDA could feed into the DCC's online assessment tool, CARDIO. The detail of the reworked AIDA was assisted greatly by the expertise of numerous external consultants, recruited internationally and bringing a wide range of skillsets. They fine-tuned the wording of the AIDA assessment statements to make it into a benchmarking tool with great potential.

AIDA is predicated on the notion of "continuous improvement", and expresses its benchmarking with an adapted version of the "Five Stages" model originally developed at Cornell University by Anne Kenney and Nancy McGovern. It also uses their "Three Legs" framework to ensure that the three mainstays of digital preservation (i.e. Organisation, Technology and Resources) are properly investigated.
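As a rough illustration of how the two frameworks fit together, here is a toy sketch (the assessment elements and scores are invented; a real AIDA assessment is far more detailed):

```python
# A toy sketch of Five Stages / Three Legs benchmarking.
# The elements and scores below are invented for illustration.
STAGES = {1: "Acknowledge", 2: "Act", 3: "Consolidate",
          4: "Institutionalise", 5: "Externalise"}

# Each "leg" holds stage ratings (1-5) for a few assessment elements.
assessment = {
    "Organisation": {"Preservation policy": 2, "Metadata practice": 3},
    "Technology":   {"Storage": 3, "Format management": 2},
    "Resources":    {"Staffing": 2, "Funding": 2},
}

# Report an approximate stage per leg, so weak legs stand out.
for leg, elements in assessment.items():
    stage = round(sum(elements.values()) / len(elements))
    print(f"{leg}: around Stage {stage} ({STAGES[stage]})")
```

Benchmarking per leg, rather than with a single overall score, shows exactly where an organisation is lagging; repeating the exercise later measures whether it has climbed a stage.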

We think there may be some scope for applying AIDA to JISC ELO, mainly as an analysis tool or knowledge base for measuring the results of responses to questionnaires and surveys. It could assess broadly whether the Linnean Online service finds itself at a Stage Two or Stage Three. We could subsequently measure whether the enhancements, once implemented, have moved the service forward to a Stage Four or Stage Five.

This could be done with a little tweaking of the wording of the current iteration of AIDA, and through selective / partial application of its benchmarks. We think it would be a good fit for the ELO project strands which discuss Metadata, Licensing, and Preservation Policy – all of which are expressed in the Organisation leg of AIDA. The Resources leg of AIDA could be tweaked to measure improvements in the area of ELO’s Revenue Generation. One of the most salient features of AIDA is its flexibility.

Versions of the adapted AIDA toolkit can be found via the project blog, although the improved CARDIO version has not been published as yet.