File formats…or data streams?

On 1st December Malcolm Todd of The National Archives gave a good account of the work he’s been doing on File Formats for Preservation, resulting in a substantial new Technology Watch report for the DPC. It was a seminar hosted by William Kilbride, with participants from the BBC, the BL, NLW and others. The afternoon was useful and interesting for me since I teach an elementary module on file formats in a preservation context for our DPTP courses.

My naïve thinking in the area has been characterised by the assumption that the process is rather static or linear, and that the problem we’re facing is broadly the same every time; migrate data from a format that’s about to become obsolete or unsupported, onto another format that’s stable, supported, and open. MS Word document to PDF or PDF/A…now that, I can understand!

In fact, I learned at least two ways of thinking about formats that hadn’t occurred to me before. One simple one is costs; some formats can cost more to preserve than others. This can be calculated in terms of storage costs, multiplied over time, and the costs associated with migrations to new versions of that format. Continue reading “File formats…or data streams?”

Web Continuity Project at The National Archives

Ed and I were pleased to come across an interesting document, recently received from The National Archives, describing their Web Continuity Project. This is the latest of the many digital preservation initiatives undertaken by TNA/PRO, that began with EROS and NDAD in the mid 1990s, leading to the UK Government Web Archive and other recent […]

From the JISC-PoWR Project blog.


Ed and I were pleased to come across an interesting document, recently received from The National Archives, describing their Web Continuity Project. This is the latest of the many digital preservation initiatives undertaken by TNA/PRO, that began with EROS and NDAD in the mid 1990s, leading to the UK Government Web Archive and other recent digital preservation initiatives (many in conjunction with BL and the JISC).

The Web Continuity Project arises from a request by Jack Straw, as leader of the House of Commons in 2007, that government departments ensure continued access to online documents. Further research revealed that:

  • Government departments are increasingly citing URLs in answer to Parliamentary Questions
  • 60% of links in Hansard to UK government websites for the period 1997 to 2006 are now broken
  • Departments vary considerably: for one, every link works; for another every link is broken. (TNA’s own website is not immune!)

Continue reading “Web Continuity Project at The National Archives”

Digital preservation in a nutshell, part II

As Richard noted in Part I, digital preservation is a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” But what sort of digital materials might be in scope for the PoWR project?

We think it extremely likely that institutional web resources are going to include digital materials […]

Originally published on the JISC-PoWR blog.


As Richard noted in Part I, digital preservation is a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” But what sort of digital materials might be in scope for the PoWR project?

We think it extremely likely that institutional web resources are going to include digital materials such as “records created during the day-to-day business of an organisation” and “born-digital materials created for a specific purpose”.

What we want is to “maintain access to these digital materials beyond the limits of media failure or technological change”. This leads us to consider the longevity of certain file formats, the changes undergone by proprietary software, technological obsolescence, and the migration or emulation strategies we’ll use to overcome these problems.

By migration we mean “a means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next.” In contrast, emulation is “a means of overcoming technological obsolescence of hardware and software by developing techniques for imitating obsolete systems on future generations of computers.”

Note also that when we talk about preserving anything, “for as long as necessary” doesn’t always mean “forever”. For the purposes of the PoWR project, it may be worth us considering medium-term preservation for example, which allows “continued access to digital materials beyond changes in technology for a defined period of time, but not indefinitely.”

We also hope to consider the idea of life-cycle management. According to DPC, “The major implications for life-cycle management of digital resources is the need actively to manage the resource at each stage of its life-cycle and to recognise the inter-dependencies between each stage and commence preservation activities as early as practicable.”

From these definitions alone, it should be apparent that success in the preservation of web resources will potentially involve the participation and co-operation of a wide range of experts: information managers, asset managers, webmasters, IT specialists, system administrators, records managers, and archivists.

(All the quotations and definitions above are taken from the DPC’s online handbook.)

Digital preservation in a nutshell (Part I)

One of the goals of PoWR is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources. Anyone coming for the first time to the field of digital preservation can find it a daunting area, with very distinct terminology and concepts. Some of […]

From the JISC-PoWR Project blog.


One of the goals of PoWR is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources. Anyone coming for the first time to the field of digital preservation can find it a daunting area, with very distinct terminology and concepts. Some of these are drawn from time-honored approaches to managing things like government records or institutional archives, while others have been developed exclusively in the digital domain. It is an emerging and evolving field that can take some time to get your head round: so we thought it was a good idea to offer a series of brief primers.

Starting, naturally, with digital preservation: this is defined as a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary” (Digital Preservation Coalition, 2002). Continue reading “Digital preservation in a nutshell (Part I)”