Speaking as a traditional archivist, I love cataloguing. I never thought I'd find myself having to justify cataloguing work, but now that it's possible to attach a cost to everything, cost has become a serious consideration.
Experts who understand that specialist work has a real cost tell us that detailed cataloguing may be turning into a luxury we can't afford. This post considers some of the things that make it expensive, and applies those lessons to digitised content.
I'm proposing that metadata serves two important functions: first, to make digitised content intelligible to human beings; second, to make it possible for a computer to store, manage, and process that content.
Human-readable catalogues for your digitised collections are an absolute must, whether that means an archive catalogue written in ISAD(G), a library catalogue written in MARC21, or a resource described using Dublin Core. We have standards to work to, and increasingly we have computer-based cataloguing tools (such as Calm, Adlib, or AtoM) that facilitate the task. I like to think of these tools as helping to turn human-readable descriptions into metadata, i.e. something a computer can store and process.
That's great if we're writing a catalogue from scratch, but that's not always the case; sometimes the original resource arrives with metadata attached, perhaps created by its owner. That person probably didn't work to a standard, so if we want to recycle their metadata we face a "mapping" task: normalising non-standard metadata to standard fields. This is both an intellectual exercise and an IT task, typically involving importing and exporting values between spreadsheets.
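As a minimal sketch of what that mapping task can look like in practice (the column names and mapping table here are invented for illustration, not taken from any real project), a small script can rename a donor's ad-hoc spreadsheet columns to standard fields:

```python
import csv
import io

# Hypothetical mapping from an owner's ad-hoc column names to
# Dublin Core-style standard fields (illustrative only).
FIELD_MAP = {
    "Doc Title": "dc:title",
    "Written By": "dc:creator",
    "Year": "dc:date",
}

def normalise(rows):
    """Rename non-standard columns to standard fields, dropping the rest."""
    return [
        {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        for row in rows
    ]

# Simulated export from the owner's spreadsheet.
raw = io.StringIO("Doc Title,Written By,Year,Internal Ref\nMinutes,J. Smith,1952,X-17\n")
records = normalise(csv.DictReader(raw))
print(records)  # [{'dc:title': 'Minutes', 'dc:creator': 'J. Smith', 'dc:date': '1952'}]
```

The mapping table is where the intellectual work lives; the script merely applies it, which is why the exercise is both an archival and an IT task.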
Records managers might also take an interest in a normalising process like this: describing business documents in a records management environment, and ensuring the context and meaning of the content are accurate and useful. The difference is that they might apply that metadata in a live environment, rather than after the fact. Anyone who's about to embark on a SharePoint project will recognise this; one way of looking at the transition from your old Document Management System to SharePoint is as a vast metadata modelling exercise. Given the amount of metadata SharePoint can support – for individual documents as well as for folders and creators – this is worth thinking about.
It's not just about building an inventory of the resources. Wouldn't we like to apply our cataloguing skills to help users on their journey, adding navigational elements to web pages: structured views, clickable links, and faceted views of the collections based on elements such as dates, names, and subjects? All of this is possible if you regard these things as metadata: individual fields which can be stored in databases and manipulated by web technology.
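To make the faceting idea concrete, here is a small sketch (the records and field names are invented examples) of how catalogue fields, once stored as structured data, can be grouped into the facets a website would offer:

```python
from collections import defaultdict

# A handful of illustrative catalogue records (titles and values invented).
records = [
    {"title": "Parish registers", "date": "1890s", "subject": "religion"},
    {"title": "Mill wage books", "date": "1890s", "subject": "industry"},
    {"title": "Union minutes", "date": "1920s", "subject": "industry"},
]

def build_facets(records, fields):
    """Group record titles under each value of the chosen facet fields."""
    facets = {f: defaultdict(list) for f in fields}
    for rec in records:
        for f in fields:
            facets[f][rec[f]].append(rec["title"])
    return facets

facets = build_facets(records, ["date", "subject"])
print(facets["subject"]["industry"])  # ['Mill wage books', 'Union minutes']
```

A real site would do this grouping in a database or search index rather than in memory, but the principle is the same: each facet is just a metadata field turned into a navigation route.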
All of this takes time and money, but so far the expense lies in specialist information expertise and the hours of effort spent carrying out the work. Metadata can also be "computationally expensive", though: that is, it carries a potential cost to your IT.
A large-scale digitisation project, particularly one that gets serious about metadata creation, sharing, and interoperability, will typically create a lot of pages, often stored as XML files. These XML files can serve many purposes, including describing the resource and expressing its relationships to other resources.
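For a flavour of what such a file might contain, here is a sketch that builds one (the element names loosely echo Dublin Core but the identifiers and structure are invented, not a real schema):

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative descriptive record for one digitised page.
record = ET.Element("record", id="img-0042")
ET.SubElement(record, "title").text = "Charter of 1215 (page 1)"
ET.SubElement(record, "date").text = "1215"
# Express a relationship to another digitised resource.
ET.SubElement(record, "relation", type="isPartOf").text = "img-0041-0050"

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Multiply a record like this by tens of thousands of scanned pages and the storage, indexing, and bandwidth implications discussed below start to add up.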
Creating lots of XML pages is a grand thing to do, but they take up server space, especially when there are so many of them, even if each file individually has a small "footprint". Indexing that metadata is also expensive, requiring database operations and processing power; and even serving metadata has a cost, as one more strain on your bandwidth.
The general conclusion here is certainly not to abandon cataloguing and metadata creation, but to be aware of the costs to your organisation, and consider ways of reducing the burden, finding economies of scale, and concentrating your effort on delivering a core of essential metadata for your digitised content. This of course involves knowing the collections, and knowing the users. But that would be the subject of another post!