Scan Once for All Purposes – some cautionary tales

The acronym SOAP – Scan Once For All Purposes – has evolved over time among digitisation projects, and it’s a handy way to remember a simple rule of thumb: don’t scan content until you have a clear understanding of all the intended uses that will be made of the resource. This may seem simple, but in some projects it may have been overlooked in the rush to push digitised content out.

One reason for the SOAP rule is that digitisation is expensive. Turning analogue content into digital content costs money, staff time and expertise, and requires costly hardware.

Taking books or archive boxes off shelves, scanning them, and reshelving them all takes time. Scanning paper can damage the original resource, so to minimise that risk we’d only want to do it once. In some extreme cases, scanning can even destroy a resource; there are projects which have sacrificed an entire run of a print journal to the scanner, in order to allow “disaggregation”, which is a euphemistic way of saying “we cut them up with a scalpel to create separate scanner-friendly pages”.

Beyond that, there are digital and planning considerations which underline the importance of the Scan Once For All Purposes rule. To demonstrate this, let’s look at some imaginary but perfectly plausible scenarios for a digitisation project, and see what the consequences of failing to plan could be.

Scenario 1
An organisation decides to scan a collection of photographs of old buses from the 1930s, because they’re so popular with the public. Unfortunately, nobody told them about the differences between file formats, so the images end up as low-resolution compressed JPEGs scanned at 72 DPI, because the web manager advised that was best for sending the images over the web.

Consequence: the only real value these JPEGs have is as access copies. If we wanted to use them for commercial purposes, such as printing, a 72 DPI scan would not contain enough pixels for a print of any useful size. Further, if a researcher wanted to examine details of the buses, there wouldn’t be enough data in the scan for a proper examination, so chances are they would have to come in to the searchroom anyway. Result: the photographs are once again subjected to wear and tear. And weren’t we trying to protect these photographs in some way?
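To put rough numbers on this, here is a minimal sketch of the arithmetic. The 6 x 4 inch photograph, the 600 DPI alternative and the 300 DPI print target are all assumptions chosen for illustration; the scenario itself only specifies the 72 DPI scan.

```python
def scan_pixels(width_in, height_in, scan_dpi):
    """Pixel dimensions produced by scanning a print of the given size."""
    return round(width_in * scan_dpi), round(height_in * scan_dpi)


def max_print_size_inches(width_px, height_px, print_dpi=300):
    """Largest print, in inches, that still has print_dpi pixels per inch.

    300 DPI is an assumed print target, not a figure from the scenario.
    """
    return width_px / print_dpi, height_px / print_dpi


# Hypothetical 6 x 4 inch photograph of a bus.
low = scan_pixels(6, 4, 72)    # the access-quality scan from Scenario 1
high = scan_pixels(6, 4, 600)  # an assumed higher-resolution scan

print(low, max_print_size_inches(*low))
print(high, max_print_size_inches(*high))
```

Under these assumptions the 72 DPI scan holds only 432 x 288 pixels, which would print at well under two inches wide, while the 600 DPI scan of the same photograph could be printed at twice its original size.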

Scenario 2
The organisation has another go at the project – assuming they have any money left in the budget. This time they’re better informed about the value of high-resolution scans, and the right file formats for supporting that much image data in a lossless, uncompressed manner. Unfortunately, they didn’t tell the network manager they wanted to do this.

Consequence: the library soon finds their departmental “quota” of space on the server has been exceeded three times over. Because this quota system is managed automatically in line with an IT policy, the scans are now at risk of being deleted, with a notice period of 24 hours.
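A back-of-envelope storage estimate, made before scanning starts, is the sort of figure the network manager would want to see. The sketch below assumes uncompressed 24-bit RGB images of 3600 x 2400 pixels (a 6 x 4 inch photograph at 600 DPI) and a collection of 5,000 photographs; none of these numbers come from the scenario, and real files carry some extra overhead for headers and metadata.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Approximate size of the raw pixel data for one uncompressed image."""
    return width_px * height_px * channels * bits_per_channel // 8


def collection_gigabytes(n_images, bytes_per_image):
    """Total size of the collection in gigabytes (1 GB = 10**9 bytes)."""
    return n_images * bytes_per_image / 1e9


per_image = uncompressed_bytes(3600, 2400)       # assumed scan dimensions
total_gb = collection_gigabytes(5000, per_image)  # assumed collection size

print(per_image)  # bytes of pixel data per image
print(total_gb)   # gigabytes for the whole collection
```

With these assumed figures each image needs about 26 MB and the collection about 130 GB, before any backups or derived copies are counted. Switching to 16 bits per channel would double both numbers.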

Scenario 3
The organisation succeeds in securing enough server space for the high-resolution scans. After a few months running the project, it turns out the users are not satisfied with viewing low-resolution JPEGs and demand online access to full-resolution, zoomable TIFF images. The library agrees to this, and asks the IT manager to move their TIFF scans onto the web server for this purpose.

Consequence: through constant web access and web traffic, the original TIFF files are now exposed to a strong possibility of corruption. Since they’re the only copies, the organisation is now putting an important digital asset at risk. Further, the action of serving such large files over the web – particularly for this dynamic use, involving a zoom viewer – is putting a severe strain on the organisation’s bandwidth, and costing more money.

The simple solution to all of the above imaginary scenarios could be SOAP. The ideal would be for the organisation to handle the original photographs precisely once as part of this digitisation project, and not have to re-scan them because they got it wrong first time. The scanning operation should produce a single high-quality digital image, scanned at a high resolution and encoded in a dependable, robust format. We would refer to this as the “original”.

The project could then derive further digital objects from the “original”, such as access copies stored in a lower-resolution format. However, this is not part of the scanning operation; it’s managed as part of an image manipulation stage of the project, and is entirely digital. The photographs, now completely ‘SOAPed’, are already safely back in their archive box.

The digital “originals” should now go into a safe part of the digital store. They would never be used as part of the online service, and users would not get hold of them. To meet the needs of scenario 3, the project now has to plan a routine that derives further copies from the originals; but these should be encoded in a way that makes them suitable for web access, most likely using a file format with a scalable bitstream that allows the zoom tool to work.
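As a rough illustration of what a zoomable derivative involves, the sketch below lists the series of progressively halved resolutions that a zoom viewer might request from a single derived copy. The halving scheme, the 256-pixel stopping point and the 3600 x 2400 pixel original are assumptions for illustration; the actual levels depend on the file format and viewer the project chooses.

```python
def resolution_levels(width_px, height_px, min_side=256):
    """List image sizes from full resolution down, halving each time.

    Stops once the longer side is no larger than min_side (an assumed
    thumbnail-sized lower limit). Odd dimensions are rounded up.
    """
    levels = [(width_px, height_px)]
    w, h = width_px, height_px
    while max(w, h) > min_side:
        w, h = (w + 1) // 2, (h + 1) // 2
        levels.append((w, h))
    return levels


# Hypothetical 3600 x 2400 pixel original.
for level in resolution_levels(3600, 2400):
    print(level)
```

The point of the exercise is that every one of these levels comes from the derived copy on the web server; the “original” in the safe part of the store is read once to create that copy and is not touched by the viewer.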

All of the above SOAP operations depend on the project manager having a good dialogue with the network manager and the web manager; a trait which such projects share with long-term digital preservation. As can be seen, a little bit of planning will save the project money and get the desired results without having to perform a scan twice over.
