
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0024-2535.htm

LR 54,9

DIGITAL DIRECTIONS

Strategies for managing digital content formats


Andrew Williamson
Researcher, Centre for Digital Library Research, University of Strathclyde, Glasgow, UK
Abstract
Purpose: With heavy ongoing investment in the creation, storage and delivery of electronic content, it is important to consider the long-term preservation of the resources produced.
Design/methodology/approach: A viewpoint paper based on extensive practitioner experience with the management of digitisation, digital preservation, and quality assurance procedures.
Findings: The choice of file and media formats for the content can have a significant effect on long-term access to electronic content.
Practical implications: Gives useful insights into some of the issues surrounding the choice of open or proprietary formats. The paper also examines some of the pitfalls of a proprietary approach and suggests some strategies that might be employed for managing digital content formats in the long term.
Originality/value: An attempt to provide clear, experience-based strategies on how best to engage in the long-term management of digital content formats.
Keywords: Digital storage, Information systems, Digital libraries
Paper type: Viewpoint

Received 25 May 2005 Reviewed 31 May 2005 Revised 1 June 2005 Accepted 2 June 2005

Library Review, Vol. 54 No. 9, 2005, pp. 508-513. © Emerald Group Publishing Limited, 0024-2535. DOI 10.1108/00242530510629515

Introduction: a standards-based approach to digital preservation
Digital content has become an increasingly important element in many library collections over recent years. At institutional, regional, national and international levels, large sums of money are being invested in the creation of such content, and the means of storage and delivery to users. Google made headlines in 2004, pledging to spend between $150 million and $200 million over a decade on digitising some 15 million books from library collections in the USA and the UK (Riding, 2005). Also in the UK, the NOF-Digi programme has invested £50 million across 150 projects to produce and publish online material that supports lifelong learning for all (Nicholson and Macgregor, 2003).

Consideration must be given at an early stage to ensuring the longevity of digital resources, in order to protect and maximise the return on the investment in content creation. One of the key components in ensuring resource longevity is the choice of file and media formats used to create, store, and deliver digital content, and the strategies that are employed to manage these in the long term.

Guidance from funding bodies and advisory services now generally recommends, and in some cases mandates, a standards-based approach to the entire process, arguing that electronic content should be created, stored, maintained and disseminated using open standards whenever possible. An example of such guidance can be found in UKOLN (2003). The UK Joint Information Systems Committee (JISC) Quality Assurance Focus (QA Focus, 2003) identified the following as the characteristics of open standards:
• They are the product of an open standards-making process;
• Documentation of the standard is freely available;
• The standard can be used unrestricted by patent or licence issues;
• The standard is ratified by a recognised standards body, such as NISO.

An open standards approach brings a wide range of benefits including:
• Resources are freed from dependence on a single application, or particular hardware platforms;
• Resources can be preserved and accessed over the long term.


One open standard that is becoming ever more important and widespread is Extensible Markup Language (XML). Yeates (2002, p. 72) asks, "Why are so many librarians, archivists and museum curators talking about XML?" and then answers the question by illustrating the potential (and problems) of exporting data from legacy systems to XML in order to promote interoperability, resource discovery and, by implication, non-proprietary digital preservation.

While preference should always be given to an open standards approach, it is important to realise that situations will arise where an open approach is not possible and proprietary formats will be chosen instead. These formats are owned by an organisation or group (e.g. Microsoft), may sometimes be accepted as de facto standards through sheer ubiquity, and might even be referred to as standards, but cannot be regarded as open since the owner could theoretically choose to change the format or the conditions of usage at any time.

The main focus of this brief paper is on the proprietary approach: it considers some of the reasons why organisations may choose a proprietary format, the problems this might cause in the future, and some of the strategies which may be employed to manage digital content formats, both open and proprietary, in the long term.

Why might organisations choose proprietary formats?
Organisations or individuals may choose to utilise proprietary rather than open formats for a number of reasons:
• Delayed development of open formats. For certain content types there may be no suitable open format available at the time that the content is being created;
• Organisational expertise. Proprietary software and formats (e.g. Microsoft Office) may already be widely deployed within an organisation, with staff being trained and comfortable in their use;
• Resourcing. There may be a reluctance to move to an open standards approach due to the additional training and software costs required, particularly when ubiquitous proprietary solutions are already easily available.

What problems can the choice of proprietary formats cause?
The choice of proprietary media and/or storage formats can lead to digital preservation problems in the future, arising from both the choice of digital media and the file formats encoded on that media.

Media format issues
When a physical media format is chosen for the storage of electronic content, consideration must be given to the possibility of that format becoming obsolete over time. This can particularly be a problem with new storage technologies, where a number of similar formats may be competing or coexisting in the marketplace, e.g. the competition between VHS and Betamax format video recorders, or the current market for recordable DVD technology, which sees several competing standards vying for dominance (D'Ambrise, 2004). There is always the possibility that one format will eventually dominate, whether through technological superiority or the power of marketing, thus marginalising competitors and, ultimately over time, rendering any opposing formats obsolete.

Darlington et al. (2003) outline a famous example of media obsolescence: that of the BBC Domesday Project, a collection of digital content created in 1986 to mark the 900th anniversary of the original Domesday Book. The content was stored using a proprietary laser disk format, the media and players for which were no longer available, thus rendering the output of the innovative project virtually inaccessible. Darlington et al. outline the painstaking work undertaken in 2002 and 2003 to preserve the content, noting that the work had taken place just in time, while some original systems and hardware were still available and workable.

It is clear that physical storage media (CDs, tapes, etc.), the associated storage hardware, and the necessary software for reading/writing the media must be considered and maintained together, as each becomes effectively useless without the others. If hardware develops faults over time, it may become impossible to retrieve the content from the media, and the faults may damage the media itself, compounding the problems. Equally, pristine hardware cannot protect against data loss due to compromised media. As some degree of physical degradation is inevitable over time, the strategies outlined later should be employed to mitigate loss.

File format issues
The choice of proprietary file formats adds further complexity to ensuring long-term access to electronic content.
Proprietary software applications are regularly updated with new versions. While functionality may not change markedly from one version to its immediate successor, cumulative changes to a file format may become more significant in the longer term, potentially jeopardising backwards compatibility.

Maintaining copies of legacy software may seem desirable, but can be fraught with problems. Just like application software, operating systems are also periodically upgraded and may, in the long term, simply cease to support legacy packages as underlying system architectures develop. For example, the release of Service Pack 2 for Windows XP in 2004 witnessed reports of functionality problems with over 200 applications (Leyden, 2004). Maintaining older operating systems may not be an attractive solution, particularly in an online networked environment where there exists an increased risk of new security problems emerging in unsupported legacy systems.

Strategies for managing digital formats
As outlined above, the choice of media and file formats for the storage of electronic content could cause serious problems for the long-term accessibility of the materials, particularly where a proprietary format has been used. Whatever the choice of approach, strategies must be put in place to manage digital formats over the long term, in order to mitigate (or avoid altogether) the problems outlined earlier. It is not within the remit of this short paper to explore the intricacies of each strategy; references and further reading lists are provided for this purpose. Rather, the strategies outlined below serve to raise awareness among readers as to the options available to those practitioners engaging in the long-term management of digital resources.

These strategies can be grouped under six headings. Most of the strategic elements within each are interlinked, and few will be successful in the long term if pursued in isolation. It is also worth noting that while some of these elements are more applicable to the proprietary approach, they are generally valid across all electronic content, regardless of format. Each of these strategic components might also be problematic within organisations, particularly those in project-funded environments, where staffing and other technical resources may not be readily available beyond the funded lifespan of a project.

Strategy 1: documentation
It is somewhat ironic that the preservation of digital resources begins with the preservation of staff knowledge and with sound knowledge management practices. Quality documentation is a key component of any preservation strategy, and it is important that information about the technical decisions taken at each stage of the creation, storage and maintenance process is available in the long term, possibly after those staff who had direct knowledge and experience of the process have moved on.

Strategy 2: migration
Migration involves ensuring that all electronic content is held in a format which is useable and accessible by current software and hardware, keeping content up to date with the latest developments and guarding against format obsolescence. Where content is stored using a proprietary format, it is particularly desirable to migrate to a suitable open standard format, as and when one becomes available. Migration is potentially time-consuming, complex and expensive, and could represent a significant drain on organisational resources in the long term, particularly as the need to migrate may depend on the progress of a volatile technology industry.
Moreover, migration can potentially inhibit functionality inherent in the original. Such costs must be balanced against the initial investment in content creation and the value of long-term access to the content.

Strategy 3: refreshment
Refreshment is the periodic transfer of electronic content to newer storage media (e.g. CD/DVD/DAT tape). This helps to guard against data loss due to media degradation. The timing of refreshment cycles should be informed by manufacturers' information on, and practitioners' experience of, the typical lifespan of their physical media. It is advisable to check a random sample of used storage media on a regular basis, at least annually, to ensure that the physical media remain accessible and the contents remain intact. If problems emerge within the sample, then urgent refreshment action should be taken. A prudent strategy would be to ensure that content is held on at least two types of digital media and in different physical locations.

Strategy 4: emulation
In the event of system or media obsolescence, organisations may choose to create or use emulation software to mimic the behaviour of obsolete hardware and operating systems, and enable use of legacy software. The emergence of a significant market in legacy emulators would seem a real possibility as and when access problems begin to be widespread.


However, it should be recognised that, although emulation avoids the repeated costs associated with migration, the widespread deployment of specialist emulation software in libraries remains tentative, with further research and wider practical experimentation sorely required. New emulators can be costly unless there is scope to reap economies of scale, and such software often has to be created in parallel with significant computer paradigm shifts. Indeed, as Jones and Beagrie (2002) note, such costs can quite feasibly surpass those incurred by a repeated migration strategy.

Strategy 5: controlled storage
To mitigate the degradation of storage media and access devices, these should be stored and operated in suitable environmental conditions, ideally within the environmental tolerances specified by manufacturers. Storage media should be handled as infrequently as possible, and movement that exposes the media to significantly different environmental conditions should be kept to a minimum. Backup media should ideally be stored offsite, as a precaution against disasters that may damage onsite resources.

Strategy 6: backup/recovery procedures
Digital content is inherently vulnerable to loss or damage from hardware or software faults. Resources must therefore be allocated to the backup and recovery requirements of an organisation. Initial backups should be created at the time a resource is created, with a regular routine implemented so that further backups are created during the lifetime of the resource. The recovery phase must also be considered. Procedures for data recovery should be tested periodically to ensure that data can be restored from backup media, and that the media remain compatible with changes in backup technology.
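
The random-sample checks recommended under Strategy 3 and the periodic recovery tests recommended under Strategy 6 both depend on being able to tell whether a stored copy still matches the original. One common way of doing this, offered here only as an illustrative sketch and not as a procedure prescribed by this paper, is to record a checksum for each file when the resource is created and to recompute it for a random sample of files on each storage medium or restored backup. The function names (sha256_of, build_manifest, check_sample), the choice of SHA-256 and the directory layout are assumptions made for the example.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, List, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path.

    Intended to be run once, when the resource is created, and the result
    kept alongside the other preservation documentation.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_sample(root: Path, manifest: Dict[str, str], sample_size: int,
                 seed: Optional[int] = None) -> List[str]:
    """Check a random sample of manifest entries against the copy at root.

    Returns the relative paths that are missing or whose checksum no longer
    matches; an empty list means the sampled files are intact.
    """
    rng = random.Random(seed)
    names = sorted(manifest)
    chosen = rng.sample(names, min(sample_size, len(names)))
    failures = []
    for name in chosen:
        candidate = root / name
        if not candidate.is_file() or sha256_of(candidate) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

In use, build_manifest would be run against the master copy, and check_sample against each mounted storage medium or test restore in turn. Any path it returns would be a signal to begin the urgent refreshment action described under Strategy 3. The sample size and the frequency of checks remain matters of local judgement.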
Concluding thoughts
Whether as a result of organisational expertise issues, resource issues, or because a suitable open standard has yet to be developed, organisations will often be compelled to use, and will occasionally choose, a proprietary format for their digital resources. This brief paper has outlined the rationale behind such behaviour and has aimed to highlight some of the problems proprietary formats can cause for digital resource management, together with suitable strategies for managing digital formats, both open and proprietary, in the long term.

The huge sums being invested in the creation of electronic content have the potential to create a golden digital heritage for future generations. For this potential to be realised, however, attention must be given at all stages of the content creation, storage and delivery process to the digital content formats being employed, and steps must be taken to actively manage content formats over time, to guard against the dangers of creeping technical obsolescence or long-term degradation of resources.
References
D'Ambrise, R. (2004), "DVD update: from double layers to blue lasers", Computer Technology Review, Vol. 24 No. 5, pp. 30-2.
Darlington, J., Finney, A. and Pearce, A. (2003), "Domesday Redux: the rescue of the BBC Domesday Project videodisc", Ariadne, No. 36, available at: www.ariadne.ac.uk/issue36/tna/ (accessed 2 June 2005).
Jones, M. and Beagrie, N. (2002), Preservation Management of Digital Materials: A Handbook, British Library, London, available at: www.dpconline.org/graphics/handbook/ (accessed 2 June 2005).
Leyden, J. (2004), "200 apps clash with XP SP2", The Register, 17 August 2004, available at: www.theregister.co.uk/2004/08/17/xp_sp2_glitches/ (accessed 2 June 2005).
Nicholson, D. and Macgregor, G. (2003), "NOF-Digi: putting UK culture online", OCLC Systems and Services, Vol. 19 No. 3, pp. 96-9.
QA Focus (2003), What are Open Standards?, UKOLN, University of Bath, available at: www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-11/html/ (accessed 2 June 2005).
Riding, A. (2005), "France detects a cultural threat in Google", The New York Times, 11 April.
UKOLN (2003), Technical Guidelines for Digital Content Creation Programmes, Working Draft Version 0.05, UKOLN, University of Bath, available at: www.minervaeurope.org/structure/workinggroups/servprov/documents/techguid005draft.pdf (accessed 2 June 2005).
Yeates, R. (2002), "An XML infrastructure for archives, libraries and museums: resource discovery in the COVAX project", Program: Electronic Library and Information Systems, Vol. 36 No. 2, pp. 72-88.

Further reading
Lin, L.S., Ramaiah, C.K. and Wal, P.K. (2003), "Problems in the preservation of electronic records", Library Review, Vol. 52 No. 3, pp. 117-25.
New Opportunities Fund (2004), NOF-Digitise Programme Manual: Digital Preservation, NOF-Digitise Technical Advisory Service, University of Bath, available at: www.ukoln.ac.uk/nof/support/manual/digital-preservation/ (accessed 2 June 2005).
Semple, N. (2004), "Developing a digital preservation strategy at Edinburgh University Library", VINE: The Journal of Information and Knowledge Management Systems, Vol. 34 No. 1, pp. 33-7.
