File formats are a critical feature of digital preservation. This is hardly news to specialists, but the repository managers of our exemplars also expressed an interest in file formats in their objectives, so I was keen to discover what the recent report on file formats from the Digital Preservation Coalition (DPC) might offer them.
 Malcolm Todd (The National Archives), File Formats for Preservation, DPC Technology Watch Report Series, Report 09-02, 2 December 2009
This is a well-researched, wide-ranging and deeply considered critical review of recent work on file formats. The report is intended ‘to assist repository managers and the preservation community’. My impression is that it may work better for the latter than the former. ‘Repositories’ is used as a generic term; institutional repositories are mentioned by reference only.
In KeepIt our approach aims to be practical, joining up a series of tools for the workflow of file format management. This builds on an approach elaborated as active preservation at the National Archives, which the report does not mention directly. It does observe that “The development and use of tools developed within the digital preservation community has a mostly separate literature from that of defining and implementing selection criteria.” (section 5)
The report’s summary says “At the time of writing, there is apparent consensus on five main criteria for file format selection.” It goes on to list the criteria, but work here has already progressed beyond this. The P2 registry that this project described at iPres 2009 links hundreds of criteria for different formats.
The report continues: “The main finding of this report is to support the proposal by Rog and van Wijk of the National Library of the Netherlands (2008) that such criteria should be used as a tool to work out the detailed implementation of a clear preservation strategy according to a prioritisation appropriate to the repository. This is essential to make sense of an otherwise bewildering array of considerations and provides key governance to ensure a preservation institution is managing the risk of obsolescence to its holdings.”
In other words, there has to be a way of connecting the format information with the repository requirements. KeepIt is doing this by integrating the P2 registry with Plato, a planning tool developed by Andreas Rauber and colleagues at Vienna University of Technology for the Planets project. It’s this joined-up approach that has been conspicuously lacking from file format management workflows for preservation so far: “Some of the current literature appears to minimise this (interdependence)”. Although a work in progress, this potentially ground-breaking approach will be the focal point of our ongoing KeepIt course on digital preservation tools (see module 4).
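To make the idea of prioritising criteria per repository a little more concrete, here is a minimal sketch of weighted format scoring. It is only an illustration of the general principle, not how Plato or the P2 registry work internally, and not the scheme published by Rog and van Wijk. The criterion names, format names, scores and weights are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass(frozen=True)
class FormatProfile:
    """A format and its score (0.0 to 1.0) against each criterion.

    Hypothetical structure: in practice the scores would come from a
    format registry rather than being typed in by hand.
    """
    name: str
    scores: Mapping[str, float]


def weighted_score(profile: FormatProfile, weights: Mapping[str, float]) -> float:
    """Return the weighted average of a format's scores.

    `weights` expresses one repository's priorities: a higher weight
    means the criterion matters more to that repository.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(profile.scores)
    if missing:
        raise KeyError(f"{profile.name} has no score for: {sorted(missing)}")
    weighted_sum = sum(weights[c] * profile.scores[c] for c in weights)
    return weighted_sum / total_weight


def rank_formats(
    profiles: Sequence[FormatProfile], weights: Mapping[str, float]
) -> List[FormatProfile]:
    """Order formats from best to worst for the given priorities."""
    return sorted(profiles, key=lambda p: weighted_score(p, weights), reverse=True)


if __name__ == "__main__":
    # Made-up formats and scores, for illustration only.
    candidates = [
        FormatProfile("format_a", {"openness": 1.0, "adoption": 0.0}),
        FormatProfile("format_b", {"openness": 0.0, "adoption": 1.0}),
    ]
    # A repository that values openness twice as much as adoption.
    priorities = {"openness": 2.0, "adoption": 1.0}
    for profile in rank_formats(candidates, priorities):
        print(profile.name, round(weighted_score(profile, priorities), 2))
```

The point of the sketch is that the same format information yields a different ranking once a repository changes its weights, which is why the criteria only make sense alongside a stated preservation strategy.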
“Integrating the ability of formats to represent information content into scoring criteria seems some way off except for very simple digital objects”, but not as far off as it may seem (see slides 23, 24).
My initial thought on the report was that it would be useful only to the extent that the work it reviews is useful, but on detailed reading I have reappraised that view. The report’s effect is to progress the work it reports on, by bringing new insights and making little-noted connections explicit. This is probably the most complex aspect of the investigation, but it is worth the effort.
Effectively it’s describing the foundations for work that has already moved forward significantly: “this topic has progressed rapidly in the last decade. This research has improved considerably our understanding of effective format management strategies – even if the proliferation of initiatives and tools seems at first to render it less accessible.” I think it is fair to say this work is even further advanced than this report recognised.
