What does an archivist do?

This post is a response to a Tweet from Judith Dray seen recently. The plea was for a “cool, interesting, and accessible way to describe what an archivist does”.

I worked as a “traditional” archivist for the General Synod of the Church of England for about 15 years. When I say traditional, I mean I worked with paper records and archives. I could easily describe what I did in the usual terms, involving cataloguing, indexing, arrangement, description, and boxing of materials and putting them on shelves. In so doing, I would probably confirm the clichéd view that an archivist is a solitary hermit who loses themselves in the abstruse rules of provenance and original order.

But that misses the bigger picture. The work of an archivist only has any value if we put it in context. This context involves tangible things like other people, organisations, and the work and life of other people; and also involves abstract ideas, like culture, meaning, and history.

Below, I have written seven Tweetable responses to Judith Dray. But I had to unpack each Tweet into a paragraph of prose. I may have resorted to some hyperbole and rhetoric, but I like to think there is still a grain of truth in my ravings and fantasising. For this post, I have thought myself back into the past, and temporarily forgotten whatever I might know about digital preservation.

1. An archivist brings order to chaos.

Just give an archivist a random-seeming mess of unsorted papers and see how quickly that mess is transformed into an accessible collection. This is because the archivist is applying sorting skills, based on their knowledge of parent collections, parent organisation, and former owners. Colleagues at the Synod sometimes assumed I was just “doing the filing”, but I think there’s more to it.

2. An archivist reflects the truth of an organisation.

If you want to know the core meaning, truth or essence of any organisation – from a business to a school to a textile factory – the archive holds the authoritative version of it. The archivist brings out that truth, through adhering to the fundamental principles of provenance (where the papers came from) and original order (how they were kept). These two principles may sound musty and boring, yet have proven surprisingly robust as a reliable method for reflecting the truth.

3. An archivist has the holistic view.

A good archivist isn’t just there at the end of the life of a record, but is there right at the start; they know the creators and understand precisely why they create the records that they do. In this way, they connect to and engage with the creating organisation in ways that surpass even the most diligent executive officer or auditor. The development of records management in the 20th century only served to strengthen this inherently archival virtue. At one stage in the 1990s, commercial companies tried to harness that rare skill and monetise it, turning it into something called “Knowledge Management”. Naturally, this failed!

4. An archivist engenders trust in their depositors.

The real value of an archivist’s role has to be seen in the context of people and agencies who use archives. Among these people, the depositors, creators and owners of the resources are key. Over time the “culture” of archives has created and diligently nurtured a trust bond, a covenant if you will, that enables depositors to place their faith in a single archivist or an entire memory institution. That trust has been hard won, but we got there through applying effective procedures for due diligence, managing and documenting every stage in the transfer of content in ways that ensured the integrity of the resource, informed by the “holistic” skill (see above).

5. An archivist enables use and re-use of the archives.

A second key group of archive users comprises the researcher, the scholar, the historian, the reader. In today’s impoverished world the beleaguered archivist has been obliged to reframe “readers” as “customers”, seeing them as an income stream, but the cultural truth is much richer. Archives don’t change; but the historian’s interpretation of the source keeps evolving all the time. If any historian seeks to validate or challenge the interpretation of another, the archives are there – waiting silently for consultation. The same resource can be used to research multiple topics, depending on the “lens” the researcher chooses to apply; there is a well-known archival resource which began life as a land survey, yet in its lifetime it has been used as statistical evidence for population studies, income distribution, place names, family history, and more.

6. An archivist can beat Google hands-down.

In our insatiable lust for faster and deeper browser searches, we sometimes tend to overlook the value of structure. Structure is something an archivist has hard-wired into their genetic code, and it’s what makes archival cataloguing a superior way of organising and presenting information concisely and meaningfully. It’s not about sticking obstinately to the arcane rules of ISAD(G) or insisting on the Fonds-Series-Item hierarchy to the point of madness, but about understanding the structure of meaning, the way that one piece of information “belongs” to another, and how we can use these relationships to bring out the inner truth of the collection. Compared to this deep understanding, any given Google search return may give the user a quick hit of satisfaction, yet it is severely fractured, lacking in context, and disconnected from the core.

7. An archivist makes history manageable.

Any given archive probably represents a very small percentage of the actual records that were created at the time; this is especially true of any 20th century collection. Archivists are able to select the core 5% from this abundance, and yet still preserve the truth of the organisation. We don’t keep “everything”, to put it another way; we keep just the right amount. The skills of appraisal and selection are among the most valuable tools we have for any society that wants to manage its collective memory, yet these skills are taken for granted and under-valued, even by archivists themselves. We can feasibly scale this up to address the challenge of digital content. At a time when the world is creating more digital data than we can store or contain, let alone preserve, the skills of selection and appraisal will be needed more than ever.

PASIG17 reflections: Sheridan’s disruptive digital archive

I was very interested to hear John Sheridan, Head of Digital at The National Archives, present on this theme. He is growing new ways of thinking about archival care in relation to digital preservation. As per my previous post, when these phrases occur in the same sentence then you have my attention. He has blogged about the subject this year (for the Digital Preservation Coalition), but clearly the subject is becoming deeper all the time. Below, I reflect on three of the many points that he makes concerning what he dubs the “disruptive digital archive”.

The paper metaphor is nearing end of life

Sheridan suggests “the deep-rooted nature of paper-based thinking and its influence on our thinking” needs to change and move on. “The archival catalogue is a 19th century thing, and we’ve taken it as far as we can in the 20th century”.

I love a catalogue, but I still agree; and I would extend this to electronic records management. And here I repeat an idea stated some time ago by Andrew Wilson, currently working on the E-ARK project. We (as a community) applied a paper metaphor when we built file plans for EDRM systems, and this approach didn’t work out too well. That approach requires a narrow insistence on single locations for digital objects, locations exactly matching against the retention needs of each object. Not only is this hard work for everyone who has to do “electronic filing”, it proved not to work in practice. It’s one-dimensional, and it stems from the grand error of the paper metaphor.

I would still argue there’d be a place in digital preservation for sorting and curation, “keeping like with like” in directories, though I wouldn’t insist on micro-managing it; and, as archivists and records managers we need to make more use of two things computers can do for us.

One of them is linked aliases: the digital content sits permanently in one place on the server, most likely in an order that has nothing to do with “original order”, while aliased links, or a METS catalogue, do the work of presenting a view of the content based on a logical sequence or hierarchy, one that the archivist, librarian, and user are happy with. In METS, for instance, the <FLocat> element records where each file actually sits, while the structural map (<structMap>) presents the logical arrangement.
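
To make the alias idea a little more concrete, here is a minimal sketch in Python. The METS fragment is invented for illustration (the file IDs, storage paths and labels are all hypothetical), and real METS files carry far more than this. It only shows the principle: the file section says where each object physically sits, and the structural map lays a logical hierarchy over the top.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical METS fragment: IDs, paths and labels are invented for illustration.
METS_XML = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                         xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="store/0a/9f3c.pdf"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="store/7b/11d2.pdf"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="logical">
    <mets:div LABEL="Committee papers">
      <mets:div LABEL="Minutes 1998"><mets:fptr FILEID="F2"/></mets:div>
      <mets:div LABEL="Minutes 1999"><mets:fptr FILEID="F1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def logical_view(xml_text):
    """Return (logical path, physical location) pairs, in structMap order."""
    root = ET.fromstring(xml_text)
    href_attr = "{%s}href" % NS["xlink"]

    # Physical side: where each file actually sits on the server.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        locations[file_el.get("ID")] = flocat.get(href_attr)

    # Logical side: walk the hierarchy and follow each pointer to its file.
    view = []

    def walk(div, path):
        here = path + [div.get("LABEL")]
        for fptr in div.findall("mets:fptr", NS):
            view.append(("/".join(here), locations[fptr.get("FILEID")]))
        for child in div.findall("mets:div", NS):
            walk(child, here)

    for top in root.findall("mets:structMap/mets:div", NS):
        walk(top, [])
    return view


for logical, physical in logical_view(METS_XML):
    print(logical, "->", physical)
```

The storage paths here are deliberately meaningless; the order the user sees comes entirely from the structural map, so the same files could be presented under a second arrangement without moving anything.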

The second one is making use of embedded metadata in Office documents and emails. Though it’s not always possible to get these properties assigned consistently and well, doing so would allow us to view / retrieve / sort materials in a more three-dimensional manner, which the single directory view doesn’t allow us to do.
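
As a rough illustration of what I mean, modern Office files (.docx, .xlsx, .pptx) are zip packages, and the basic properties live in a small XML part called docProps/core.xml. The sketch below, in Python, reads a few of those properties and groups documents by one of them. The sample documents are built in memory purely for demonstration, and the helper names (make_sample, facet) are my own inventions; older binary formats and emails would need different handling.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(source):
    """Read a few embedded properties from an Office Open XML package."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    wanted = {
        "title": "dc:title",
        "creator": "dc:creator",
        "created": "dcterms:created",
        "keywords": "cp:keywords",
    }
    props = {}
    for name, tag in wanted.items():
        node = root.find(tag, NS)
        # Properties are often missing in practice, so record None rather than fail.
        props[name] = node.text if node is not None else None
    return props


def facet(records, key):
    """Group document titles by one property: a crude 'faceted view'."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record["title"])
    return groups


def make_sample(title, creator, created):
    """Build a tiny in-memory stand-in for an Office file (illustration only)."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s" xmlns:dcterms="%s">'
        "<dc:title>%s</dc:title><dc:creator>%s</dc:creator>"
        "<dcterms:created>%s</dcterms:created></cp:coreProperties>"
    ) % (NS["cp"], NS["dc"], NS["dcterms"], title, creator, created)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


records = [
    read_core_properties(make_sample("Agenda", "J. Smith", "2015-11-19T09:00:00Z")),
    read_core_properties(make_sample("Minutes", "J. Smith", "2015-11-20T10:00:00Z")),
    read_core_properties(make_sample("Budget", "A. Jones", "2015-11-21T11:00:00Z")),
]
print(facet(records, "creator"))
```

The same records could just as easily be grouped by creation date or keyword, which is the point: the view no longer depends on which directory a file happens to sit in.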

I dream of a future where both approaches will apply in ways that allow us these “faceted views” of our content, whether that’s records or digital archives.

Get over the need for tidiness

“We are too keen to retrofit information into some form of order,” said Sheridan. “In fact it is quite chaotic.” That resonates with me as much as it would with my other fellow archivists who worked on the National Digital Archive of Datasets, a pioneering preservation service set up by Kevin Ashley and Ruth Vyse for TNA. When we were accessioning and cataloguing a database – yes, we did try and catalogue databases – we had to concede there is really no such thing as an “original order” when it comes to tables in a relational database. We still had to give them ISAD(G) compliant citations, so some form of arrangement and ordering was required, but this is a limitation of ISAD(G), which I still maintain is far from ideal when it comes to describing born-digital content.

I accept Sheridan’s chaos metaphor…one day we will square this circle; we need some new means of understanding and performing arrangement that is suitable for the “truth” of digital content, and that doesn’t require massive amounts of wasteful effort.

Trust

Sheridan’s broad message was that “we need new forms of trust”. I would say that perhaps we need to embrace both new forms and old forms of trust.

In some circles we have tended to define trust in terms of the checksum – exclusively defining trust as a computer science thing. We want checksums, but they only prove that a digital object has not changed; they’re not an absolute demonstration of its trustworthiness. I think Somaya Langley has recently articulated this very issue in the DPC blog, though I can’t find the reference just now.
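
A small sketch in Python shows the limits of what a checksum can tell us. I am assuming SHA-256 here, though any digest would make the same point, and the sample content is invented. The check confirms that the bytes are the same as when the digest was recorded, and nothing more.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of some content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded at ingest."""
    return sha256_of(data) == recorded_digest


# Invented sample content, for illustration only.
original = b"Minutes of the meeting held on 19 November"
recorded = sha256_of(original)

# Unchanged content passes; a single altered character fails.
print(fixity_ok(original, recorded))
print(fixity_ok(b"Minutes of the meeting held on 18 November", recorded))

# A document that was wrong or forged before ingest passes just as happily,
# because the digest only describes the bytes that arrived.
forged = b"Minutes of a meeting that never took place"
print(fixity_ok(forged, sha256_of(forged)))
```

That last case is the gap: fixity tells us nothing about where the object came from or whether it was ever what it claims to be.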

Elsewhere, we have framed the trust discussion in terms of the Trusted Digital Repository, a complex and sometimes contentious narrative. One outcome has been that to demonstrate trust, an expensive overhead in terms of certification tick-boxing is required. It’s not always clear how this exercise demonstrates trust to users…see the Twitter snippet below.

Me, I’m a big fan of audit trails – and not just PREMIS, which only audits what happens in the repository. I think every step from creation to disposal should be logged in some way. I often bleat about rescuing audit trails from EDRM systems and CMS systems. And I’d love to see a return to that most despised of paper forms, the Transfer List, expressed in digital form. And I don’t just mean a manifest, though I like them too.

Lastly, there’s supporting documentation. We were very strong on that in the NDAD service too, a provision for which I am certain we have Ruth Vyse to thank. We didn’t just ingest a dataset, but also lots of surrounding reports, manuals, screenshots, data dictionaries, code bases…anything that explained more about the dataset, its owners, its creation, and its use. Naturally our scrutiny also included a survey of the IT environment that was needed to support the database in its original location.

All of this documentation, I believe, goes a long way to engendering trust, because it demonstrates the authenticity of any given digital resource. A single digital object can’t be expected to demonstrate this truth on its own account; it needs the surrounding contextual information, and multiple instances of such documentation give a kind of “triangulation” on the history. This is why the archival skill of understanding, assessing and preserving the holistic context of the resource continues to be important for digital preservation.

Conclusion

Sheridan’s call for “disruption” need not be heard as an alarmist cry, but there is a much-needed wake-up call to the archival profession in his words. It is an understatement to say that the digital environment is evolving very quickly, and we need to respond to the situation with equal alacrity.

The meaning of the term Archive

In this blog post I should like to disambiguate uses of the word “archive”. I have found the term is often open to misunderstandings and misinterpretation. Since I come from a traditional archivist background, I will begin with a definition whose meaning is clear to me.

At any rate, it is a definition that pre-dates computers, digital content, and the internet; the arrival of these agencies has brought us new, ambiguous meanings of the term. Some instances of this follow below. In each instance, I will be looking for whether these digital “archives” imply or offer “permanence”, a characteristic I would associate with a traditional archive. 

    1. In the paper world: an archive is any collection of documents needed for long-term preservation, e.g. for historical, cultural heritage, or business purposes. It can also mean the building where such documents are permanently stored, in accordance with archival standards, or even the memory institution itself (e.g. The National Archives).
    2. In the digital world: a “digital archive” ought to refer to a specific function of a much larger process called digital preservation. This offers permanent retention, managed storage, and a means of keeping content accessible in the long term. The organisation might use a service like this for keeping content that has no current business need, but is still needed for historical or legal reasons. Therefore, the content is no longer held on a live system.
      The OAIS Reference Model devised the term “Archival Storage” for describing this, and calls it a Functional Entity of the Model; this means it can apply to the function of the organisation that makes this happen, the system that governs it, or the servers where the content is actually stored. More than just storage, it requires system logging, validation, and managed backups on a scale and frequency that exceeds the average network storage arrangement. The outcome of this activity is long-term preservation of digital content.
    3. In the IT world: a sysadmin might identify a tar, zip or gz file as an “archive”. This is an accumulation of multiple files within a single wrapper. The wrapper may or may not perform a compression action on the content. The zipped “archive” is not necessarily being kept; the “archiving” action is the act of doing the zipping / compression.
    4. On a blog: a blog platform, such as WordPress or Google Blogger, organises its pages and posts according to date-based rules. WordPress automatically builds directories to store the content in monthly and annual partitions. These directories are often called “archives”, and the word itself appears on the published blog page. In this context the word “archives” simply designates “non-current content”, in order to distinguish it from this month’s current posts. This “archive” is not necessarily backed up, or preserved; and in fact it is still accessible on the live blog.
    5. In network management: the administrator backs up content from the entire network on a regular basis. They might call this action “archiving”, and may refer to the data, the tapes/discs on which the data are stored, or even the server room as the “archive”. In this instance, it seems to me the term is used to distinguish the backups from the live network. In case of a failure (e.g. accidental data deletion, or the need for a system restore), they would retrieve the lost data from the most recent “archive”. However: none of these “archives” are ever kept permanently. Rather, they are subject to a regular turnover and refreshment programme, meaning that the administrator only ever retains a few weeks or months of backups.
    6. Cloud storage services may offer services called “Data Archive” or “Cloud Archive”. In many cases this service performs the role of extended network storage, except that it might be cheaper than storing the data on your own network. Your organisation also might decide to use this cheaper method to store “non-current” content. In neither case is the data guaranteed to be preserved permanently, unless the provider explicitly states it is, or the provider is using cloud storage as part of a larger digital preservation approach.
    7. For emails: In MS Outlook, there is a term called AutoArchive. When run, this routine will move emails to an “archive” directory, based on rules (often associated with the age of the email) which the user can configure. The action also does a “clear out”, i.e. a deletion, of expired content, again based on rules. There is certainly no preservation taking place. This “AutoArchive” action is largely about moving email content from one part of the system to another, in line with rules. I believe a similar principle has been used to “archive” a folder or list in SharePoint, another Microsoft product. Some organisations scale up this model for email, and purchase enterprise “mail archiving” systems which apply similar age-based rules to the entire mail server. Unless explicitly applied as an additional service, there is no preservation taking place, just data compression to save space.

    To summarise:

    • The term “archive” has been used in a rather diffuse manner in the IT and digital worlds, and can mean variously “compression”, “aggregated content”, “backing up”, “non-current content”, and “removal from the live environment”. While useful and necessary, none of these are guaranteed to offer the same degree of permanence as digital preservation. Of these examples, only digital preservation (implementation of which is a complex and non-trivial task) offers permanent retention, protection, and replayability of your assets.
    • If you are an archivist, content owner, or publisher: when dealing with vendors, suppliers, or IT managers, be sure you take the time to discuss and understand what is meant by the term “archive”, especially if you’re purchasing a service that includes the term in some way.

Building a Digital Preservation Strategy

IRMS ARAI Event 19 November 2015

Last week I was in Dublin where I gave a presentation for the IRMS Ireland Group at their joint meeting with ARA Ireland. It was great for me personally to address a roomful of fellow Archivists and Records Managers, and learn more about how they’re dealing with digital concerns in Ireland. I heard a lot of success stories and met some great people.

Sarah Hayes, the Chair of IRMS Ireland, heard me speak earlier this year at the Celtic Manor Hotel (the IRMS Conference) and invited me to talk at her event. As a matter of fact, I got a similar invite from IRMS Wales this year, but Sarah wanted new content from me, specifically on the subject of Building a Digital Preservation Strategy.

How to develop a digital preservation strategy

My talk on developing a digital preservation strategy made the following points:

  • Start small, and grow the service
  • You already have knowledge of your collections and users – so build on that
  • Ask yourself why you are doing digital preservation, and who will benefit
  • Build use cases
  • Determine your own organisational capacity for the task
  • Increase your metadata power
  • Determine your digital preservation strategy (or strategies) in advance of talking to IT, or a vendor

I also presented some imaginary scenarios that would address digital preservation needs incrementally and meet requirements for different audiences:

  • Bit-level preservation (access deferred)
  • Emphasis on access and users
  • Emphasis on archival care of digital objects
  • Emphasis on legal compliance
  • Emphasis on income generation

Event Highlights

In fact the whole day was themed on Digital Preservation issues. John McDonough, the Director of the National Archives of Ireland, gave encouraging reports of how they are managing electronic records by “striding up the slope of enlightenment”. There’s an expectation that public services in Ireland must be “digital by default”, with an emphasis on continual online access to archival content in digital form. John is clear that archives in Ireland “underpin citizen’s rights” and are crucial to the “development of Nation and statehood”, which fits the picture I have of Dublin’s culture – it’s a city with a very clear sense of its own identity, and history.

In terms of change management and advocacy for working digitally, Joanne Rothwell has single-handedly transformed the records management of Waterford City and County Council, using SharePoint. Her resourceful use of an alphanumeric File Index allows machine-readable links between paper records and born-digital content, thus preserving continuity of materials. She also uses SharePoint’s site-creation facility to build a virtual space for holding “non-current” records, which replicate existing file structures. It’s splendid to see sound records management practice carry across into the digital realm so successfully.

DPTP alumnus from the class of November 2011, Hugh Campbell of the Public Record Office of Northern Ireland, has developed a robust and effective workflow for the transfer, characterisation and preservation of digital content. It’s not only a model of good practice, but he’s done it all in-house with his own team, using open source tools and developer skills.

During the breaks I managed to mingle and met many other professionals in Ireland who have responded well to digital challenges. I was especially impressed by Liz Robinson, the Records Officer for the Health and Safety Authority in Ireland. We agreed that any system implementation should only proceed after a thorough planning period, where the organisation establishes its own workflows and procedures, and does proper requirements gathering. This ought to be a firm foundation in advance of purchasing and implementing a system. Sadly, we’ve both seen projects where the system drove the practice, rather than the other way around.

Plan, plan and plan again before you speak to a vendor; this was the underlying message to my ‘How to develop a digital preservation strategy’ talk, so it was nice to be singled out in one Tweet as a “particular highlight” of the day.