I very much enjoyed the DPC’s latest event on metadata, particularly the first half of the day, which concentrated on the PREMIS preservation metadata standard. One of my interests is how I can improve my teaching of this subject when I’m training students on the Digital Preservation Training Programme. Angela Dappert’s excellent presentation and exercise, now available online, have been enormously helpful for this.
My tendency has been to introduce standards like PREMIS and METS to my eager students in a linear, top-down manner, explaining data models and structure…only to find them somewhat overwhelmed by the detail and by the effort that implementing these standards seems to require. I sense that some students get the impression that they are (a) compelled to use these standards in order to succeed at digital preservation in the first place, or (b) obliged to implement them in a certain way. The worst case would be if they assumed they had to use every field they possibly could, to arrive at a “complete” profile of a digital object.
Angela’s common-sense approach is to turn this question on its head. To paraphrase JFK’s inaugural address, we should “Ask not what we must do for PREMIS, ask what PREMIS can do for us.” Angela puts the questions in this order:
- What are the entities and objects we need to describe?
- What metadata do we need to do that?
- Which standard do we use for which metadata?
- How do we implement the selected metadata schemas?
Done this way, it quickly becomes clear that the range of metadata you actually need to collect and store is much more manageable. You choose what you want, then find a metadata standard that suits your needs. Further, your selection decisions – because they are aligned to your overall selection policy – will be driven to some extent by what your user community want and what your repository can support. How much time can your staff actually spend extracting and parsing metadata, and are they really adding any value by doing it? Is there really an audit requirement that obliges you to demonstrate you have run a virus check three times?
The four lessons I jotted down, and will be adding to the DPTP course, are:
- Seek more information from your content creators when you need it – and don’t be afraid to ask for it!
- Ask the creators for a manifest of all the files in their submission. (When I was an archivist, I’d always insist on a transfer list…)
- Will this metadata be useful? Always ask what function you are supporting with your hard work.
- Analyse and understand your domain – what can your repository support?
At a stroke, Angela has shown how PREMIS is achievable when we start showing that verbose data dictionary just who’s the boss around here!