When is it a good time for a file format migration?

I used to teach a one-day course on file format migration. The course advanced the idea that migration, although one of the oldest and best-understood methods of digital preservation, can still carry a risk of loss. To mitigate that risk, we want to make the case for use cases and acceptance criteria – good old-fashioned planning, in short.

When would it be a good time to migrate a file? And when would it be good not to migrate, or at any rate defer the decision? We can think of some plausible scenarios, and will discuss them briefly below.

We think the community has moved on from its earlier line of thought, which ran along the lines of “migrate as soon as possible, ideally at point of ingest”. The risks of careless migration are hopefully better understood now, and we don’t want to rush into a bad decision. That said, some digital preservation systems still have an automated migration action built into the ingest routine.

Do migrate if: 

  • You don’t trust the format of the submission. The depositor may have sent you something in an obscure, obsolete, or unsupported file format. A scenario like this is likely to involve a private depositor, or an academic who insists on working in their “special” way. Obsolescence (or the imminent threat of it) is a well-established motivator for bringing out the conversion toolkit, though there are some who would disagree.
  • Your archive/repository works to a normalisation policy. This means that you limit the number of preservation formats you work with, and convert all ingests to the standard set which you support. The policy might be to migrate all Microsoft Office documents to their OpenOffice equivalents. Indeed, this rule is built into Xena, the open-source tool from the National Archives of Australia. Normalisation may have a downside, but it can create economies in how many formats you need to commit to supporting, and may go some way to “taming” wild deposits that arrive in a variety of formats.
  • You want to provide access to the content immediately. This means creating an access copy of the resource, for instance by migrating a TIFF image to a JPEG. Some would say this doesn’t really qualify as migration, but it does involve a re-encoding action, which is why we mention it. This access copy may not have to meet the same high standards as a preservation copy.
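A normalisation policy of the kind described above is, at heart, a lookup table. The following Python sketch is only an illustration of that idea: the format lists, the policy table, and the function name plan_normalisation are all hypothetical, and it keys on file extensions purely for brevity, whereas a real workflow would rely on a format identification tool rather than trusting the extension.

```python
from pathlib import Path

# Hypothetical policy table: incoming extension -> preservation extension.
# These mappings are illustrative assumptions, not a recommendation.
NORMALISATION_POLICY = {
    ".doc": ".odt",
    ".docx": ".odt",
    ".xls": ".ods",
    ".xlsx": ".ods",
    ".ppt": ".odp",
    ".pptx": ".odp",
}

# Hypothetical set of formats the repository has committed to supporting as-is.
SUPPORTED_FORMATS = {".odt", ".ods", ".odp", ".pdf", ".tif", ".tiff"}


def plan_normalisation(filename: str) -> str:
    """Return the action the policy prescribes for one deposited file."""
    ext = Path(filename).suffix.lower()
    if ext in SUPPORTED_FORMATS:
        # Already in the supported set, so no migration is needed.
        return "keep"
    target = NORMALISATION_POLICY.get(ext)
    if target is not None:
        return f"migrate to {target}"
    # Not covered by the policy: a person needs to look at it.
    return "review"
```

Anything the table does not cover falls through to “review” rather than being migrated by guesswork, which keeps the policy, and not the tool, in charge of the decision.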

Don’t migrate if: 

  • The resource is already stored in a quality format. The deposit you are ingesting may already be encoded in a format that is widely accepted as meeting a preservation standard, in which case migration is arguably not necessary. To ascertain this and verify the format, use DROID or other identification tools. To learn about preservation-standard formats, start with the Library of Congress resource Sustainability of Digital Formats.
  • There is no immediate need for you to migrate. In this scenario, you fear that the ingested content’s format may become obsolete one day, but your research (starting with the PRONOM online registry) indicates that the risk is some way off – maybe even 10-15 years away. In that case, deferring the migration is your best policy. Be sure to add a “note to self” in the form of preservation metadata about this decision, and a trigger date in your database that will remind you to take action.
  • You want to migrate, but currently lack the IT skills. To this scenario we could add “you lack the tools to do migration” or even “you lack a suitable destination format”. You’ve searched COPTR and still come up empty. Through no fault of your own, technology has simply not yet found a reliable way to migrate the format you wish to preserve, and a tool for migration does not exist. In this instance, don’t wait for the solution – put the content into preservation storage, with a “note to self” (see above) that action will be taken at some point when the technology, tools, skills, and formats are available.
  • You have no preservation plan. This refers to the over-arching strategy which governs your approach to digital preservation. Part of it is an agreed action plan for what you will do when faced with particular file formats, including a detailed workflow, choice of conversion tool, and clear rationale for why you’re doing it that way. Ideally, in compiling this action plan, you will have understood the potential losses that migration can cause to the content, and the archivist (and the organisation) will have signed off on how much of a “hit” is acceptable. Without a plan like this, you’re at risk of guessing which is the best migration pathway, and your decisions end up being guided by the tools (which are limited) rather than your own preservation needs.
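The “note to self” and trigger date mentioned in the list above could be recorded in many ways. As a minimal sketch, assuming nothing about any particular preservation system, the Python below keeps one record per deferred decision and lists those whose review date has arrived. The class DeferredMigration, its field names, and the function due_for_review are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeferredMigration:
    """A hypothetical 'note to self' about a migration that was deferred."""
    object_id: str      # identifier of the stored object
    file_format: str    # format as identified at ingest
    reason: str         # why migration was deferred
    decided_on: date    # when the decision was taken
    review_on: date     # trigger date for revisiting the decision


def due_for_review(records, today):
    """Return the records whose trigger date is today or already past."""
    return [r for r in records if r.review_on <= today]
```

A short usage example, with made-up identifiers and dates:

```python
notes = [
    DeferredMigration("obj-001", "WordPerfect 5.1", "no reliable tool yet",
                      date(2020, 1, 10), date(2025, 1, 10)),
    DeferredMigration("obj-002", "TIFF", "risk judged to be years away",
                      date(2020, 1, 10), date(2032, 1, 10)),
]
for note in due_for_review(notes, date(2026, 6, 1)):
    print(note.object_id, note.reason)
```

Recording the reason alongside the date means that whoever picks up the reminder can see why the decision was deferred, not just that it was.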