Anti-folder, pro-searching

Chris Loftus at the University of Sheffield has detected a trend among tech giants Google and Microsoft in their cloud storage provision. They would prefer us to make more use of search to find material, rather than store it in named folders.

With MS SharePoint at least – which is more than just cloud storage, it’s a whole collaborative environment with built-in software and numerous features – my sense is that Microsoft would be happier if we moved away from using folders. One reason for this might be that these cloud-based, web-accessed environments struggle if the pathway or URL is too long; presumably the more folders you add, the more the string grows, and you make the problem worse. So there’s a practical technical reason right there; we wanted a way to work collaboratively in the cloud, but maybe some web browsers can’t cope.
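To make the point about path length concrete, here is a minimal sketch of how a document's URL grows with each nested folder. The base URL, folder names and the 400-character limit are all assumptions chosen for illustration; check the actual limit that applies in your own environment.

```python
from urllib.parse import quote

# Illustrative limit only (an assumption, not taken from the post).
ASSUMED_LIMIT = 400


def document_url(base, folders, filename):
    """Join a base URL, nested folder names and a file name into one encoded URL."""
    parts = [quote(part) for part in [*folders, filename]]
    return base.rstrip("/") + "/" + "/".join(parts)


def url_lengths_by_depth(base, folders, filename):
    """Return the URL length at each folder depth, from no folders to all of them."""
    return [
        len(document_url(base, folders[:depth], filename))
        for depth in range(len(folders) + 1)
    ]


def exceeds_limit(url, limit=ASSUMED_LIMIT):
    """True if the URL is longer than the (assumed) limit."""
    return len(url) > limit


if __name__ == "__main__":
    # Hypothetical site and folder names.
    base = "https://example.org/docs"
    folders = ["Finance Dept", "Annual Reports", "2015 to 2016", "Drafts for review"]
    for depth, length in enumerate(url_lengths_by_depth(base, folders, "report.docx")):
        print(depth, length)
```

Note that spaces in folder names are percent-encoded, so each one costs three characters in the URL rather than one, which makes descriptive folder names more expensive than they look.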

However, I also think SharePoint’s owners are trying to edge us towards taking another view of our content. This is probably based on its use of metadata. SharePoint offers a rich array of tags; one instance that springs to mind is the “Create Column” feature that enables users to build their own metadata fields for their content (such as Department Name) and populate them with their own values. This enables the user to create a custom view of thousands of documents, with useful fields arranged in columns. The columns can be searched, filtered, sorted and rearranged.

This could be called a “paradigm shift” by those who like such jargon…it’s a way of moving towards a “faceted view” of individual documents, based on metadata selections, not unlike the faceted views offered by Institutional Repository software (which allow browsing by year, name of author, departments; see this page for instance).
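As a rough illustration of what a “faceted view” means in practice, the sketch below treats each document as a flat record with metadata fields, counts the values available under a facet, and narrows the set by selecting values. It uses plain Python dictionaries, not SharePoint or any repository software; the sample documents and field names are invented for the example.

```python
from collections import Counter

# Invented sample records; the field names stand in for user-defined columns.
documents = [
    {"title": "Budget forecast", "department": "Finance", "year": 2015, "author": "Smith"},
    {"title": "Audit report", "department": "Finance", "year": 2016, "author": "Jones"},
    {"title": "Reading list", "department": "Library", "year": 2016, "author": "Smith"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of one metadata field."""
    return Counter(doc[field] for doc in docs if field in doc)


def filter_by(docs, **selections):
    """Keep only the documents that match every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in selections.items())
    ]


def sort_by(docs, field):
    """Return the documents ordered by one metadata field."""
    return sorted(docs, key=lambda doc: doc[field])


if __name__ == "__main__":
    print(facet_counts(documents, "department"))
    print([doc["title"] for doc in filter_by(documents, year=2016, author="Smith")])
```

The same document can be reached through any combination of fields, in any order, whereas a folder path fixes one order of drilling down.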

Advocates of this approach would say that this faceted view is arguably more flexible and better than the views of documents afforded by the old hierarchical folder structure in Windows, which tends to flatten out access to a single point of entry, which must be followed by drilling-down into a single route and opening more sub-folders. Anecdotally, I have heard of enthusiasts who actively welcome this future – “we’ll make folders a thing of the past!”

In doing this, perhaps Microsoft are exploiting a feature which has been present in their product for some time now, even before SharePoint. I mean document properties; when one creates a Word file, some of these properties (including dates) are generated. Some of them (Title, Comments) can be added by the user, if so inclined. Some can be auto-populated, for instance a person’s name – if the institution managed to find a way to synch Outlook address book data, or the Identity Management system, with document authoring.

Few users have ever bothered much with creating or using document properties, in my experience. It’s true they aren’t really that “visible”. If you right-click on any given file, you can see some of them. Some of them are also visible if you decide to pick certain “details” from a drop-down, which then turn into columns in Windows Explorer. Successive versions of Explorer have gradually tweaked that feature. In one sense, SharePoint has found a way to expose these fields and to leverage the properties even more dynamically. Did I mention SharePoint is like a gigantic database?

I might want to add that success in SharePoint metadata depends on an organisation taking the trouble to do it, and configuring the system accordingly. If you don’t, SharePoint probably isn’t much of an improvement over the old Windows Explorer way. If you do want to configure it that way, I would say it’s a process that should be managed by a records manager or someone who knows about naming conventions and rules for metadata entry; I seem to be saying it’s not unlike building an old-school (paper) file registry with a controlled vocabulary. How 19th-century is that? But if that path is not followed, might there not be the risk of free-spirited column-adding and naming by individual users, resulting in metadata (and views) that are only of value to themselves?
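The idea of a controlled vocabulary can be sketched in a few lines: a managed list of approved terms per field, and a check that maps what a user types onto an approved term or rejects it. The field name and terms below are hypothetical, and this is plain Python, not a SharePoint feature.

```python
# Hypothetical approved terms, as a records manager might maintain them.
CONTROLLED_VOCABULARY = {
    "department": {"Finance", "Human Resources", "Library"},
}


def normalise(value):
    """Collapse whitespace and ignore case so trivial variants match."""
    return " ".join(value.split()).casefold()


def validate_entry(field, value, vocabulary=CONTROLLED_VOCABULARY):
    """Return the approved term that matches the value, or None if there is none."""
    approved = {normalise(term): term for term in vocabulary.get(field, ())}
    return approved.get(normalise(value))


if __name__ == "__main__":
    for typed in ["  finance ", "human   resources", "Fin. Dept"]:
        print(repr(typed), "->", validate_entry("department", typed))
```

Entries that differ only in case or spacing resolve to the one approved spelling, while a free-spirited variant such as “Fin. Dept” is rejected and can be referred back to whoever manages the list.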

However, I would probably be in favour of anything that moves us away from the “paper metaphor”. What I mean by this is that storing word-processed files, spreadsheets and emails in (digital) folders has encouraged us to think we can carry on working the old pre-digital way, and imagine that we are “doing the filing” by putting pieces of paper into named folders. This has led to tremendous errors in electronic records management systems, which likewise perpetuate this paper-based myth, and create the illusion that records can be managed, sentenced and disposed of on a folder basis. Any digital change offers us an opportunity to rethink the way we do things, but the paper metaphor gets in the way of that. If nothing else, SharePoint allows us a way of apprehending content that is arguably “truer” to computer science.

Preserving Digital Content – Taking first steps with the AOR toolkit

I have long had an interest in promoting digital preservation, most recently with the relaunch of the AIDA toolkit as the AOR toolkit. However, in my work I meet a lot of people in a lot of organisations for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

AOR Toolkit – Possible uses

To speculate on this for a while, we could instead consider (a) what kind of digital content people typically have and (b) what they want to do with it. It’s possible that the new AOR Toolkit can help as a first step to assessing your capability to perform (b).

One milieu that we’re most familiar with is higher education, where the (a) is born-digital research data and the (b) is research data management. This may in time shade into a long-term preservation need, but not exclusively. That said, the main driver for many to manage research data in an organised way is actually the requirement of the funders for the research data to be preserved.

Not unrelated to that example, repository managers may not be primarily interested in preservation, but they certainly have a need to manage (a) born-digital publications and research papers and (b) their metadata, storage, dissemination, and use, perhaps using a repository. On the other hand, as the content held in a publications repository over time starts to increase, the repository managers may need to become more interested in selection and preservation.

Another possibility is electronic records management. The (a) is born-digital records, and the (b) could include such activities as classification, retention scheduling, meeting legislative requirements, metadata management, storage (in the short to mid-term), and security. In such scenarios, not all digital content need be kept permanently, and the outcome is not always long-term digital preservation for all content types.

AOR toolkit – Beyond the organisation

Digital librarians, managers of image libraries, in short anyone who holds digital content is probably eligible for inclusion in my admittedly very loose definition of a potential AOR Toolkit user. I would like to think the toolkit could apply, not just to organisations, but also to individual projects both large and small. All the user has to do is to set the parameters. It might even be a way into understanding your own personal capability for “personal archiving”, i.e. ensuring the longevity of your own personal digital history, identity and collections in the form of digital images, documents, and social media presence. Use the AOR toolkit to assess your own PC and hard drive, in other words.

It remains to be seen if the AOR Toolkit can match any of my wide-eyed optimistic predictions, but at least for this new iteration we have attempted to expand the scope of the toolkit, and the definitions of its elements, in order to bring it a step closer to a more comprehensive, if not actually universally applicable, assessment tool.

AOR toolkit – addressing a community need?

Results from our recent training needs survey also indicate there is a general need for assessment in the context of digital preservation. In terms of suggestions made for subjects that are not currently being taught enough, some respondents explicitly identified the following requirements which indicate how assessment would help advance their case:

  • Self-assessment and audit
  • Assessment/criteria/decision in the context of RDM
  • Quality analysis as part of preservation planning and action
  • Benchmarking in digital preservation (i.e. what to do when unable to comply with OAIS)
  • Key performance indicators for digital preservation
  • What to check over time

In the same survey, when asked about “expected benefits of training”, an even more interesting response was forthcoming. There were 32 answers which I classified under strategy and planning, many of the responses indicating the need for assessment and analysis as a first step; and likewise, 21 answers alluding to the ability to implement a preservation system, with many references to “next steps” and understanding organisational “capacity”. One response in particular is worth quoting in full:

“We have recognised and assessed the problem, decided on a strategy and are nearing the purchase of a system to cope with what we currently have, but once this is done we will need to create two projects – one to address ongoing work and one to resolve legacy work created by our stop-gap solution. I’d expect training to answer both these needs.”

All of the above is simply to reiterate what I said in March: “I hope to make the new AOR toolkit into something applicable to a wider range of digital content scenarios and services.”