Chapter 12

Basic Preservation Strategies

Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat. (Sun Tzu)

There are a number of basic preservation strategies upon which one can build more complex strategies. These are the ones which are described explicitly or implicitly by OAIS, based around ensuring that the digital object will be usable and understandable to the Designated Community. Of course one also has to maintain the trail of information to support evidence of authenticity and other PDI.

Many publications on digital preservation say that the available strategies may be summed up in the phrase "emulate or migrate". We show here that this is inadequate. OAIS discusses some important aspects of information preservation as follows.

The fast-changing nature of the computer industry and the ephemeral nature of electronic data storage media are at odds with the key purpose of an OAIS: to preserve information over a long period of time. No matter how well an OAIS maintains its current holdings, it will eventually need to migrate much of its holdings to different media (which may or may not involve changing the bit sequences) and/or to a different hardware or software environment to keep them accessible. Today's digital data storage media can typically be kept at most a few decades before the probability of irreversible loss of data becomes too high to ignore. Further, the rapid pace of technology evolution makes many systems much less cost-effective after only a few years.

In addition to the technology changes there will be changes to the Knowledge Base of the Designated Community which will affect the Representation Information needed.

There are a number of fundamental approaches to information preservation.
In the first, the Content Data Object remains in its original form, and access and use are achieved by providing adequate descriptions of the digital encoding with Structure and Semantic Representation Information; in some cases the original access and use mechanisms are adequate, in which case software emulation (using Other Representation Information) may be useful, although this tends to limit the ways

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_12, © Springer-Verlag Berlin Heidelberg 2011

in which the Content Data Object may be used. One advantage of leaving the bit sequences unchanged is that evidence of Authenticity is more easily sustained. Alternatively the object may be changed into one that can be processed with contemporary access and use mechanisms. This is referred to in OAIS as a Transformation, a type of Migration, which is discussed below. There are implications for Authenticity which are discussed in Chap. 13, particularly Sect. 13.6.2. The following matrix shows the various combinations of these alternatives.

|                          | Content data object unchanged | Content data object changed |
| Access service unchanged | If using the original software executable: emulation. If using the original source code: rebuild executable. | Re-implement access service |
| Access service changed   | Implement new access services based on the representation information describing original content data object | Implement new access services based on the representation describing the new content data object |

12.1 Description: Adding Representation Information


As should be clear from the discussion in earlier chapters, it is necessary to maintain the Representation Network so that it is adequate for a member of the Designated Community to continue to understand and use the digital object. However, things change over time and so the Representation Network must be altered appropriately. In order to do this, the techniques extensively discussed in Chap. 8 to identify any potential gaps in the Representation Network can be used. Practical ways of doing this are described in detail in Chap. 16 and illustrated in Part II. This approach allows the greatest flexibility because one has the ability to discover entirely new ways of looking at the digital objects; however, whilst it can be the most rewarding, it can also be the most difficult.

12.2 Maintaining Access


An alternative to using description is to maintain the current ways of accessing the digital object, and OAIS discusses several ways of doing this. One can think of this in terms of interfaces, either programmatic or user interfaces. In addition, hardware emulation can be viewed as doing essentially the same thing; it receives the more extensive discussion it deserves in Sect. 7.9, although another type of emulation is described below.

12.2.1 Access and Use Services


OAIS discusses maintaining the Dissemination API in order to continue to support applications which the Designated Community uses to access and use the digital object. This is closely related to the ideas of virtualisation discussed in Sect. 7.8. The virtualisation approach has the advantage that it lets the Designated Community use their favourite applications to access and use the digital object. This can be consistent with maintaining the Dissemination API by means of appropriate software wrappers. A number of options are discussed in some detail in Chap. 9.
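
The idea of keeping a Dissemination API stable by means of a software wrapper can be illustrated with a minimal sketch. Everything named here (the `TableAccess` interface, the two store classes and the `mean_of` consumer) is hypothetical and chosen only for illustration; OAIS does not prescribe any particular API.

```python
from typing import Protocol


class TableAccess(Protocol):
    """Hypothetical Dissemination API that consumer applications rely on."""

    def column(self, name: str) -> list[float]: ...


class OriginalStore:
    """Stands in for the original access service: data held column by column."""

    def __init__(self, columns: dict[str, list[float]]) -> None:
        self._columns = columns

    def column(self, name: str) -> list[float]:
        return list(self._columns[name])


class RowStoreWrapper:
    """Wrapper exposing the same API over a differently organised holding."""

    def __init__(self, header: list[str], rows: list[list[float]]) -> None:
        self._header = header
        self._rows = rows

    def column(self, name: str) -> list[float]:
        # Translate the column-oriented request into row-oriented access.
        index = self._header.index(name)
        return [row[index] for row in self._rows]


def mean_of(source: TableAccess, name: str) -> float:
    """A consumer application written only against the API."""
    values = source.column(name)
    return sum(values) / len(values)


original = OriginalStore({"flux": [1.0, 2.0, 3.0]})
wrapped = RowStoreWrapper(["time", "flux"], [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
```

The consumer function `mean_of` works unchanged with either object, which is the point of the wrapper: the archive may reorganise its holdings while the application the Designated Community uses sees the same interface.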

12.2.2 Access Software Look and Feel


This option focuses on the assumption that the Designated Community wishes to maintain the original look and feel of the Content Information of a set of AIUs as presented by a specified application or set of applications. Discussion of hardware emulation, which provides the ultimate maintenance of look and feel, is provided in Sect. 7.9.

Conceptually, the OAIS provides (i.e. makes available/points to) a software environment that allows the Consumer to view the AIU's Content Information through the application's transformation and presentation capabilities. For example, there may be a desire to use a particular application that extracts data from an ISO 9660 CD-ROM and presents it as a multi-spectral image. This application runs under a particular operating system, requires a set of control information and use of a CD-ROM reading device, and presents the information to driver software for a particular display device. In some cases this application may be so pervasive that all members of the Designated Community have access to the environment and the OAIS merely designates the Content Data Object to be the bit string used by the application. Alternatively, an OAIS may supply (as Representation Information) such an environment, including the Access Software application, when the environment is less readily available. However, as the OAIS and/or the Designated Community moves to new computing environments, at some point the application will cease to function or will function incorrectly. At such a point Transformation will become an attractive option.

12.2.2.1 Emulation of Look and Feel the Hard Way

It is worth discussing in a little more detail another way of maintaining look and feel when, for example, neither the compiled version of the application (or of the libraries it depends upon) nor the source code is available.
The term emulation may be applied to this technique since emulation may be defined as "the ability of a computer program or electronic device to imitate another program or device" [79]. The OAIS may, despite the drawbacks, consider emulation for the access application in the following way. If the application provides a well-known set of operations and a well-defined API for access, the API could be adequately documented and tested to attempt an emulation of that application. However, if the consumer interface is primarily one of display or other devices which affect human senses (e.g., sound), this reverse engineering becomes nearly impossible, because it may not be obvious when the application runs but does not function correctly for all possible inputs. To guarantee the discovery of all such situations, it would be necessary to record the Access Software's correctly functioning output, and preserve this alongside the emulation. The behaviour would then need to be checked against the results obtained from the emulation. This may be quite difficult if the application has many different modes of operation. Further, if the application's output is primarily sent to a display device, recording this stream does not guarantee that the display looks the same in the new environment, and therefore the combination of application and environment may no longer be giving completely correct information to the Consumer. Maintaining a consistent look and feel may require, as a starting point, capturing that look and feel with a separate recording to use as validation information.

In general, it may be difficult if not impossible to formally describe the look and feel. However, a number of Transformational Information Properties may essentially define criteria against which preservation may be tested; validation against these Information Properties would be a necessary, although not always sufficient, condition for testing the adequacy of the preservation activity.
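
The idea of recording the Access Software's correct output and checking an emulation against it can be sketched as follows. This is only an illustration under stated assumptions: `original_render` and `emulated_render` are hypothetical stand-ins for the original application and its re-implementation, and the use of SHA-256 digests stored as JSON is one possible way of keeping the reference results, not something the text prescribes.

```python
import hashlib
import json


def original_render(value: int) -> str:
    """Hypothetical stand-in for the original Access Software."""
    return f"{value:>6d}"


def emulated_render(value: int) -> str:
    """Hypothetical stand-in for the re-implementation being validated."""
    return str(value).rjust(6)


def record_reference(render, inputs):
    """Run the original while it still works and keep one digest per input."""
    return {
        str(i): hashlib.sha256(render(i).encode("utf-8")).hexdigest()
        for i in inputs
    }


def validate(render, reference):
    """Return the inputs whose output differs from the recorded reference."""
    mismatches = []
    for key, digest in reference.items():
        produced = hashlib.sha256(render(int(key)).encode("utf-8")).hexdigest()
        if produced != digest:
            mismatches.append(int(key))
    return mismatches


test_inputs = [0, 7, 42, -5, 123456]
reference = record_reference(original_render, test_inputs)
# The serialised reference would be preserved alongside the emulation.
preserved = json.dumps(reference, sort_keys=True)
mismatches = validate(emulated_render, json.loads(preserved))
```

An empty `mismatches` list shows agreement only on the recorded inputs. As the text notes, this is a necessary rather than sufficient check, and it says nothing about whether a display device renders the output identically in the new environment.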

12.3 Migration/Transformation
At some point it may be decided that maintaining the original medium or the Representation Network for a digital object is not practical for cost reasons, or does not meet requirements for some other reason. Therefore the digitally encoded information must be encoded in some other way, either the same bit sequences on new media or else changed bit sequences. It is possible to identify four primary digital Migration types. The primary types, ordered by increasing risk of information loss, are:

1. Operations which do not change the bit sequences
   Refreshment: A Digital Migration where a media instance, holding one or more AIPs or parts of AIPs, is replaced by a media instance of the same type by copying the bits on the medium used to hold AIPs and to manage and access the medium. As a result, the existing Archival Storage mapping infrastructure, without alteration, is able to continue to locate and access the AIP. As discussed at the start of the book many processes go on to translate from magnetic domains (for a magnetic disk) to bits. This bit copy may not be a physical copy.
   Replication: A Digital Migration where there is no change to the Packaging Information, the Content Information and the PDI. The bits used to convey these information objects are preserved in the transfer to the same or new media-type instance. Refreshment is also a Replication, but Replication may require changes to the Archival Storage mapping infrastructure.
2. Operations which change the bit sequences
   Repackaging: A Digital Migration where there is some change in the bits of the Packaging Information.
   Transformation: A Digital Migration where there is some change in the Content Information or PDI bits while attempting to preserve the full information content.

Transformation deserves some extended discussion, which follows.
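
For the first group of Migration types above, those that leave the bit sequences unchanged, the essential check is that the copy holds the same bits as the source. The sketch below illustrates this with a digest comparison. The use of SHA-256, the file names and the `replicate` helper are assumptions made for illustration; OAIS does not mandate any particular fixity mechanism.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Digest of a file's bit sequence, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, target: Path) -> str:
    """Copy source to target and confirm the bits are unchanged."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"bit sequence changed while copying {source}")
    return after


# Temporary files stand in for the old and new media instances.
with tempfile.TemporaryDirectory() as workdir:
    old_medium = Path(workdir) / "old_medium.bin"
    new_medium = Path(workdir) / "new_medium.bin"
    old_medium.write_bytes(b"archival information package")
    digest = replicate(old_medium, new_medium)
    copied = new_medium.read_bytes()
```

The returned digest could be kept with the package's fixity information, so that later Refreshments or Replications can be checked against the same value.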

12.3.1 Transformation
Transformation implies a change in the bit sequence of either the Content Information or the PDI. In many discussions of digital preservation the term Migration is used when in fact what is meant is specifically Transformation, because the aim in those discussions is to change the digital encoding of the information.

Given a certain piece of information there could be many different ways of encoding it digitally. For example an image could be encoded as a TIFF file or a JPEG; a document could be held as Word or PDF; a table containing scientific data could be held as a FITS table or as a CSV (comma-separated values) file. Each of these alternatives would need its own, different, Representation Network.

However some Transformations make more sense than others. A Transformation will commonly be regarded as changing from one data format to another, but one must also think about the associated semantics. Some formats have little or no room for the semantics. Another consideration is the number and types of applications commonly associated with the various formats. For example an image could be regarded as a table where each of the cells contains a number. However it would not make good sense to encode the image as a CSV file because of the loss of semantics involved. Moreover the applications (e.g. spreadsheet programs) normally used to deal with a CSV file do not normally display the data as one would expect an image to be displayed.

With regard to the semantics, one can supplement the capabilities of a particular format with something else, e.g. the CSV file could have an associated text file to supply the semantic information, such as the meanings of the columns, which would otherwise be missing. In this case one would need the Representation Information for (1) the CSV file, (2) the text file and (3) the relationship between them.
While this is possible, the more attractive option would be to choose a new format which can itself handle the required semantics, with available applications that supply the required functionality, at least as well as the original format. Therefore given a piece of digitally encoded information that one needs to preserve, the transformation which one should reasonably apply is not arbitrary.
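
The arrangement of a CSV file supplemented by a text file carrying the missing semantics can be sketched as follows. The column names, units, the file name `observations.csv` and the choice of JSON as the syntax of the text file are all assumptions made for illustration.

```python
import csv
import io
import json

rows = [{"t": 0.0, "flux": 1.5}, {"t": 1.0, "flux": 1.7}]

# (1) The CSV file: values and bare column names only.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["t", "flux"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# (2) The supplementary text file: meanings and units of the columns.
sidecar_text = json.dumps(
    {
        # (3) The relationship: which CSV file this text describes.
        "describes": "observations.csv",
        "columns": {
            "t": {"meaning": "time since start of observation", "unit": "s"},
            "flux": {"meaning": "measured flux", "unit": "Jy"},
        },
    },
    indent=2,
)


def load_with_semantics(csv_text: str, sidecar_text: str):
    """Read the values and refuse columns the sidecar does not describe."""
    sidecar = json.loads(sidecar_text)
    reader = csv.DictReader(io.StringIO(csv_text))
    undescribed = set(reader.fieldnames) - set(sidecar["columns"])
    if undescribed:
        raise ValueError(f"no semantics recorded for: {sorted(undescribed)}")
    data = [{key: float(value) for key, value in row.items()} for row in reader]
    return data, sidecar["columns"]


data, semantics = load_with_semantics(csv_text, sidecar_text)
```

Even in this small case three things have to be understood and preserved: the CSV syntax, the sidecar syntax and the link between the two. This is part of why a single format that carries its own semantics is described above as the more attractive option.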

There are deep reasons for making a careful choice and documenting that choice appropriately. This is discussed in detail in Sect. 13.6. However there are a number of useful points which should be made here. For example one can think of the ideal Transformation in which the new digital object has the same information as the original. If this is the case then it should be possible to confirm this by means of another Transformation back to the original bit sequence. If one can find this pair of Transformations then one can define (following the revised version of OAIS):

Reversible Transformation: A Transformation in which the new representation defines a set (or a subset) of resulting entities that are equivalent to the resulting entities defined by the original representation. This means that there is a one-to-one mapping back to the original representation and its set of base entities.

On the other hand if one looks at the other transformations mentioned above, for example from FITS to CSV, then one would, without additional information (e.g. the supplementary text file mentioned above), lose information and therefore not be able to make the reverse transformation. It is therefore reasonable to define:

Non-Reversible Transformation: A Transformation which cannot be guaranteed to be a Reversible Transformation.

An important point to note is that the definition of non-reversible is drawn as broadly as possible. For example one does not have to prove that there is no backward transformation, only that one cannot guarantee that such a transformation can be constructed. We will come back to these definitions in Chap. 13, where they play an important role in considerations of Authenticity.
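
The distinction between the two definitions can be illustrated with a round-trip check: apply a forward Transformation, apply a candidate backward Transformation, and compare with the original. The sketch below is an assumption-laden illustration, not the OAIS test itself: the small table, the two encodings and all function names are hypothetical, and comparison is made on the decoded content rather than on bit sequences.

```python
import json


def to_json(table):
    """Forward Transformation that keeps names, units and values."""
    return json.dumps(table, sort_keys=True)


def from_json(text):
    """Backward Transformation for to_json."""
    return json.loads(text)


def to_bare_csv(table):
    """Forward Transformation that keeps values only (names and units dropped)."""
    columns = [column["values"] for column in table["columns"]]
    return "\n".join(",".join(repr(v) for v in row) for row in zip(*columns))


def from_bare_csv(text):
    """Best-effort backward Transformation: names and units must be guessed."""
    rows = [[float(cell) for cell in line.split(",")] for line in text.splitlines()]
    return {
        "columns": [
            {"name": f"col{i}", "unit": None, "values": list(values)}
            for i, values in enumerate(zip(*rows))
        ]
    }


def round_trips(table, forward, backward):
    """True if backward(forward(table)) reproduces the original table."""
    return backward(forward(table)) == table


table = {
    "columns": [
        {"name": "t", "unit": "s", "values": [0.0, 1.0]},
        {"name": "flux", "unit": "Jy", "values": [1.5, 1.7]},
    ]
}

json_ok = round_trips(table, to_json, from_json)
csv_ok = round_trips(table, to_bare_csv, from_bare_csv)
```

Here the JSON pair reproduces the table while the bare CSV pair does not, because the column names and units cannot be recovered. A successful round trip on one sample is evidence only. In line with the broad definition above, a Transformation would be treated as non-reversible unless the backward mapping can be guaranteed in general.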

12.4 Summary
This chapter has raced through a number of the basic preservation strategies and techniques; it should be clear that each technique has its own strengths and weaknesses, and one must be careful to recognise these. The reader must be careful not to be misled by the amount of material on emulation here; this was simply a useful location for that material. Other preservation techniques are discussed in much more detail throughout this book. Other chapters are devoted to descriptive Representation Information and also to Transformations. In Part II we provide examples of many of these techniques with evidence to support their efficacy when applied appropriately.
