Repositories Are Doin’ It For Themselves (with due deference to Annie Lennox)
“Up to a point, Lord Copper” (with due deference to Evelyn Waugh)
 When faced with the challenge of doing digital preservation, what will institutional repository managers choose to do? The KeepIt project aimed to find out, and now we have some answers.
Digital preservation is not a straightforward choice. The obvious answer, often provided by those without such responsibility, is to preserve everything forever. The practicality is different.
Before they begin, some repository managers in higher education institutions are faced with other contradictory voices, arguing that it is too soon for preservation to be a priority. That can be attributed to their current stage in the anticipated lifecycle of digital repositories, that is, at an early stage: many institutional repositories are less than 5 years old. For many, growing content volumes is a bigger concern.
Of more direct impact for repositories are economic constraints: limited resources, staffing and access to specialist expertise. The primary drivers of resources for digital preservation are national heritage and legislation, neither of which has much bearing on institutional repositories, yet.
Faced with these views, what would repositories actually choose for themselves? How can we tie these apparently disparate issues together? Working with four selected exemplar repositories in the UK for 18 months, KeepIt set out to support and inform, but not prescribe. We would develop some repository preservation tools – based on EPrints repository software, since the exemplars were all using this repository software – and run a structured course based on practical preservation tools. The exemplars themselves helped to design and shape the course.
“The lens of digital preservation can provide the vision for shaping the whole repository context.”
We surveyed each exemplar initially to identify where preservation may have a role to play. We then asked each exemplar to specify their preservation objectives.
What we found by the end of the project was that different repositories make different preservation choices, and that they are entirely pragmatic in these choices: they do not seek to over-extend or over-emphasise preservation among their active plans, but to phase different tools and approaches into their strategies. There is no single approach.
You can discover what each exemplar chose, in their own voices:
- NECTAR, a research repository – NECTAR and the Data Asset Framework; Reflections on KeepIt
- UAL, an arts repository – Final thoughts
- EdShare, an educational repository – File type analysis for education repositories; Lessons from KeepIt
- eCrystals, a science data repository – Preserving data, a costs-based analysis; Lessons for a science repository
So that we might extend these results beyond these specific exemplars to other repositories, it is helpful to relate these choices to factors that might characterise the exemplars.
Institutional context
First, what these repositories have in common is that they are, or aspire to become, institutional in scope, or part of the institutional infrastructure for disseminating the digital outputs of the academic enterprise. In this respect they have to understand the institutional context in which they operate, which resolves to scope, policy, resources, and ultimately risk, as well as the technical infrastructure. In other words, this is nothing short of how the manager of a repository fulfils the objectives set for the repository by both the senior management of the institution and the producers of the academic content the repository chooses to target, that is, taking both a top-down and a bottom-up view of the institution.
Tools such as DAF and AIDA show us this context is wider than we might have expected. As well as auditing the scope of academic outputs currently being produced within the institution, we have to anticipate the changing profiles of these outputs, that is, both the nature of that content and its impact and its usage, as well as the volumes of the different types of content within the profile. One of the curious and less appreciated effects of digital preservation is that it tends to be applied to current content, but it also involves anticipation of future content.
When it comes to the real institutional context we find this is rarely as well defined as we would like or expect. This means that in some cases repositories have sought to become the drivers of, for example, policy. Where AIDA helps is more subtle, allowing the repository to identify institutional context that is either explicit or latent.
“Our exemplars show there is a ready buy-in for preservation tools, providing these are set within the repository.”
Limited resources, most obviously financial but also staffing and access to specialist preservation expertise, will define the extent of what can be done. There is no point raising expectations if resources don’t permit. Translating expectations into quantified resources therefore matters, and lifecycle costing and benefits analysis tools, such as LIFE and KRDS, are needed.
The final part of the institutional context is that once users buy into the repository, by providing content in response to various different prompts such as open access advocacy or policy, they will begin to look for trust and reassurance that their content is well looked after. A similar view will permeate the institutional management who invest in the repository, either tangibly or in terms of goodwill, but because they are managers rather than content producers their perspective will centre on risk and how that risk could affect the institution, that is, what could go wrong? This is where a tool like DRAMBORA can help by highlighting the risks that can impact the repository.
With this in mind we can now turn the question around, from “which preservation tools did the exemplars choose?” to “what do the tools chosen tell us about the repositories?”
First, a repository that applies DAF is likely to be seeking to expand or confirm its scope. It may be a relatively new repository, or one that is responding to new management prompts or is seeking to influence or inform future management decisions affecting the repository.
A repository that is concerned with tools to assess the impact of costs on preservation probably has quite a large or rapidly growing volume of content and/or an uncertain or cyclical financial income.
A repository that chooses a risk analysis tool such as DRAMBORA may have a substantial body of content, is confident in its targeting of content providers and the type of content it presents, and is beginning to think of this content more in terms of required or implied responsibilities. This might be in response to management concerns. As we have also seen, this approach can preempt such concern and can instead be turned to advantage. Highlighting risks can engage management in supporting the necessary actions and providing the necessary resources to minimise the impact of the identified risks.
Technical infrastructure
Technical approaches can tend to dominate pre-conceived ideas of what digital preservation involves. We have just seen how this must be balanced with the substantial element of institutional context. Nevertheless, our exemplars show there is a ready buy-in for preservation tools for storage, format identification, preservation planning and format transformation, providing these are set within the repository. That is what we did with the EPrints preservation apps and, because the exemplars all use EPrints, we were able to set them up with the facilities offered by the apps. In effect, the buy-in involved allowing us to accelerate the natural upgrade cycle for their repositories to install the latest version of EPrints (v3.2), which is required to use the apps, to fit the timescale of the project.
As a first-stage use of these apps, format profiles were produced for each exemplar. Admittedly the extent of the involvement of each exemplar in producing the profiles varied – some produced these themselves, others were produced by the KeepIt developer, Dave Tarrant. The exemplars, and many others at a series of international workshops, were introduced to the full format management workflow supported by the apps, but none has yet gone beyond identifying the format profiles. It is reasonable they should get used to producing, interpreting, updating and acting on the profiles before using these as the basis for preservation planning.
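For readers who have not met a format profile, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how the EPrints preservation apps work: it guesses a format from each filename extension using the standard-library `mimetypes` module, which is a much cruder method than a dedicated format identification tool, and the function name and file list are hypothetical.

```python
from collections import Counter
import mimetypes


def format_profile(filenames):
    """Count files per guessed MIME type; unrecognised extensions are grouped as 'unknown'."""
    profile = Counter()
    for name in filenames:
        # guess_type looks only at the filename extension, not the file contents
        mime, _encoding = mimetypes.guess_type(name)
        profile[mime or "unknown"] += 1
    return profile


# Hypothetical deposit list, invented purely for illustration
files = ["thesis.pdf", "slides.pdf", "poster.png", "notes.txt", "data.unknownext"]
profile = format_profile(files)

# Print the profile, most common format first
for fmt, count in profile.most_common():
    print(f"{fmt}: {count}")
```

Even a rough count like this shows which formats dominate a repository and how many files cannot be identified at all, which is the kind of information a profile is meant to put in front of the manager before any preservation planning begins.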
“What we found was different repositories make different preservation choices, and that they are entirely pragmatic in these choices. There is no single approach.”
Format profiles are not just internal tools for preservation, however. They reveal what we might now recognise as distinctive fingerprints of different types of repository. They tell us much more about the repository than just file types. Among the exemplars we have an emerging category of repository – focussed on collecting teaching and learning materials, or OERs (open educational resources) – but not yet a formal consensus on how these should be organised, institutionally or nationally, by content types, etc.
A repository that uses format profiling not just to examine its own files but those of other similar repositories may be seeking to be at the forefront of defining the genre of that type of repository, or it may be reflecting uncertainty and seeking to identify a community consensus on how such repositories should be organised. This is a real application of the fingerprint characterisation offered by format profiles. Unfortunately, what was found initially was that other repositories don’t have the facility to produce such profiles, because they don’t have tools installed, so full comparison will have to wait. Without the right tools to run with your repository it’s not easy.
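Once two repositories can each produce a profile, one simple way to compare their fingerprints might look like the sketch below. It is an assumption on our part rather than anything the KeepIt tools provide: each profile is converted to shares of the total, and the overlap is the sum of the smaller share for each format. The function names and all the counts are hypothetical.

```python
from collections import Counter


def shares(profile):
    """Convert raw counts into each format's share of the whole profile."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {fmt: n / total for fmt, n in profile.items()}


def overlap(profile_a, profile_b):
    """Sum the smaller share per format: 1.0 is an identical mix, 0.0 is no format in common."""
    a, b = shares(profile_a), shares(profile_b)
    return sum(min(a.get(fmt, 0.0), b.get(fmt, 0.0)) for fmt in set(a) | set(b))


# Invented counts for two teaching repositories and one research repository
teaching_a = Counter({"application/pdf": 50, "application/vnd.ms-powerpoint": 30, "video/mp4": 20})
teaching_b = Counter({"application/pdf": 40, "application/vnd.ms-powerpoint": 40, "video/mp4": 20})
research = Counter({"application/pdf": 95, "text/plain": 5})

print(round(overlap(teaching_a, teaching_b), 2))
print(round(overlap(teaching_a, research), 2))
```

With these made-up figures the two teaching repositories overlap far more with each other than either does with the research repository. A measure like this only becomes meaningful when the repositories being compared identify formats in the same way.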
Conclusion
This is what our exemplars show us about how different types of repository at different stages of development might tackle preservation when provided with appropriate tools. Don’t expect a revolution, nor expect preservation to suddenly become top priority for repositories. They will engage with preservation at a pace and in a way that is appropriate for their needs and assists them to make progress with their current problems. Would we expect any different? What we can learn from these exemplars, because they represent real repositories and real cases, not prescriptions, is where and how the approaches they have chosen might be adopted by other repositories depending on the stage of development reached, scope and institutional (and perhaps national) context and technical platform (repository software).
As we can see, digital preservation is not just about assuring future access for the content a repository has already acquired. Content growth and preservation, risk and resources: these are often two sides of the same issue. The lens of digital preservation can provide the vision for shaping the whole repository context.
This is likely to be the last substantive Diary entry from the KeepIt project. Our time is already up. Thanks for reading and commenting on this blog. To all digital repositories, keep up the good work, and do all you need to ensure your content remains accessible.
 
							