Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat. (Sun Tzu)
There are a number of basic preservation strategies upon which one can build more complex strategies. These are the ones described explicitly or implicitly by OAIS, and they are based around ensuring that the digital object will be usable and understandable to the Designated Community. Of course one also has to maintain the trail of information that supports evidence of authenticity, along with other PDI. Many publications on digital preservation say that the available strategies may be summed up in the phrase "emulate or migrate". We show here that this is inadequate.
OAIS discusses some important aspects of information preservation as follows. The fast-changing nature of the computer industry and the ephemeral nature of electronic data storage media are at odds with the key purpose of an OAIS: to preserve information over a long period of time. No matter how well an OAIS maintains its current holdings, it will eventually need to migrate much of its holdings to different media (which may or may not involve changing the bit sequences) and/or to a different hardware or software environment to keep them accessible. Today's digital data storage media can typically be kept at most a few decades before the probability of irreversible loss of data becomes too high to ignore. Further, the rapid pace of technology evolution makes many systems much less cost-effective after only a few years.
In addition to the technology changes there will be changes to the Knowledge Base of the Designated Community, which will affect the Representation Information needed. There are a number of fundamental approaches to information preservation.
In the first, the Content Data Object remains in its original form, and access and use are achieved by providing adequate descriptions of the digital encoding with Structure and Semantic Representation Information; in some cases the original access and use mechanisms are adequate, in which case software emulation (using Other Representation Information) may be useful, although this tends to limit the ways
D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_12, © Springer-Verlag Berlin Heidelberg 2011
in which the Content Data Object may be used. One advantage of leaving the bit sequences unchanged is that evidence of Authenticity is more easily sustained. Alternatively the object may be changed into one that can be processed with contemporary access and use mechanisms. This is referred to in OAIS as a Transformation, a type of Migration, which is discussed below. There are implications for Authenticity which are discussed in Chap. 13, particularly Sect. 13.6.2. The following matrix shows the various combinations of these alternatives.
Content Data Object unchanged
   Access service unchanged: if using the original software executable, emulation; if using the original source code, rebuild the executable.
   Access service changed: implement new access services based on the Representation Information describing the original Content Data Object.
Content Data Object changed
   Access service unchanged: re-implement the access service.
   Access service changed: implement new access services based on the Representation Information describing the new Content Data Object.
12.2 Maintaining Access
tested to attempt an emulation of that application. However, if the consumer interface is primarily one of display or other devices which affect human senses (e.g., sound), this reverse engineering becomes nearly impossible, because it may not be obvious when the application runs but does not function correctly for all possible inputs. To guarantee the discovery of all such situations, it would be necessary to record the Access Software's correctly functioning output and preserve this alongside the emulation. The behaviour would then need to be checked against the results obtained from the emulation. This may be quite difficult if the application has many different modes of operation. Further, if the application's output is primarily sent to a display device, recording this stream does not guarantee that the display looks the same in the new environment, and therefore the combination of application and environment may no longer be giving completely correct information to the Consumer. Maintaining a consistent look and feel may require, as a starting point, capturing that look and feel with a separate recording to use as validation information. In general, it may be difficult if not impossible to formally describe the look and feel. However, a number of Transformational Information Properties may essentially define criteria against which preservation may be tested; validation against these Information Properties would be a necessary, although not always sufficient, condition for testing the adequacy of the preservation activity.
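The check described above, recording the correctly functioning output and comparing it with what the emulation produces, might be sketched as follows. This is only an illustration under stated assumptions: `validate_emulation`, the `recorded` mapping and the two stand-in "emulations" are hypothetical names, and a byte-for-byte comparison is assumed to be meaningful, which holds only for output that is not primarily display or sound.

```python
from typing import Callable, Dict, List


def validate_emulation(
    recorded_outputs: Dict[bytes, bytes],
    emulated_app: Callable[[bytes], bytes],
) -> List[bytes]:
    """Return the inputs for which the emulation disagrees with the recording."""
    mismatches = []
    for test_input, expected in recorded_outputs.items():
        if emulated_app(test_input) != expected:
            mismatches.append(test_input)
    return mismatches


# Hypothetical recording made while the original Access Software still ran.
recorded = {b"abc": b"ABC", b"x1": b"X1"}


def faithful(data: bytes) -> bytes:
    """Stand-in for an emulation that behaves like the original."""
    return data.upper()


def faulty(data: bytes) -> bytes:
    """Stand-in for an emulation that runs but mishandles the digit 1."""
    return data.upper().replace(b"1", b"?")


print(validate_emulation(recorded, faithful))  # []
print(validate_emulation(recorded, faulty))    # [b'x1']
```

An empty result only shows agreement on the recorded inputs; as the text notes, it cannot guarantee correct behaviour for all possible inputs or modes of operation.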
12.3 Migration/Transformation
At some point it may be decided that maintaining the original medium or the Representation Network for a digital object is not practical for cost reasons, or does not meet requirements for some other reason. Therefore the digitally encoded information must be encoded in some other way, either the same bit sequences on new media or else changed bit sequences. It is possible to identify four primary digital Migration types. The primary types, ordered by increasing risk of information loss, are:
1. Operations which do not change the bit sequences
   Refreshment: A Digital Migration where a media instance, holding one or more AIPs or parts of AIPs, is replaced by a media instance of the same type by copying the bits on the medium used to hold AIPs and to manage and access the medium. As a result, the existing Archival Storage mapping infrastructure, without alteration, is able to continue to locate and access the AIP. As discussed at the start of the book, many processes go on to translate from magnetic domains (for a magnetic disk) to bits. This bit copy may not be a physical copy.
   Replication: A Digital Migration where there is no change to the Packaging Information, the Content Information and the PDI. The bits used to convey these information objects are preserved in the transfer to the same or new media-type instance. Refreshment is also a Replication, but Replication may require changes to the Archival Storage mapping infrastructure.
2. Operations which change the bit sequences
   Repackaging: A Digital Migration where there is some change in the bits of the Packaging Information.
   Transformation: A Digital Migration where there is some change in the Content Information or PDI bits while attempting to preserve the full information content. This deserves some extended discussion, which follows.
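The four Migration types can be told apart by asking which bits change and whether the Archival Storage mapping infrastructure is affected. The sketch below is one possible reading of the definitions above, not an OAIS-defined procedure; the `MigrationChange` fields and the `classify_migration` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MigrationChange:
    """What a planned migration would alter (illustrative fields only)."""
    content_or_pdi_bits_changed: bool
    packaging_bits_changed: bool
    same_media_type: bool
    mapping_infrastructure_changed: bool


def classify_migration(change: MigrationChange) -> str:
    """Name the most specific Migration type, checking the riskiest first."""
    if change.content_or_pdi_bits_changed:
        return "Transformation"
    if change.packaging_bits_changed:
        return "Repackaging"
    # No bits of the AIP change from here on.
    if change.same_media_type and not change.mapping_infrastructure_changed:
        return "Refreshment"
    return "Replication"


# Copying a tape to a new tape of the same type, with no mapping changes.
print(classify_migration(MigrationChange(False, False, True, False)))   # Refreshment
# Moving unchanged AIPs to a different media type.
print(classify_migration(MigrationChange(False, False, False, True)))   # Replication
# Converting the Content Information to another format.
print(classify_migration(MigrationChange(True, False, False, True)))    # Transformation
```

Because every Refreshment is also a Replication, the function reports the narrower term whenever both apply.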
12.3.1 Transformation
Transformation implies a change in the bit sequence of either the Content Information or the PDI. In many discussions of digital preservation the term Migration is used when in fact what is meant is specifically Transformation, because the aim in those discussions is to change the digital encoding of the information. Given a certain piece of information there could be many different ways of encoding it digitally. For example an image could be encoded as a TIFF file or a JPEG; a document could be held as Word or PDF; a table containing scientific data could be held as a FITS table or as a CSV (comma-separated values) file. Each of these alternatives would need its own, different, Representation Network.
However some Transformations make more sense than others. A Transformation will commonly be regarded as changing from one data format to another, but one must also think about the associated semantics. Some formats have little or no room for the semantics. Another consideration is the number and types of applications commonly associated with the various formats. For example an image could be regarded as a table where each of the cells contains a number. However it would not make good sense to encode the image as a CSV file because of the loss of semantics involved. Moreover the applications (e.g. spreadsheet programmes) normally used to deal with a CSV file do not normally display the data as one would expect an image to be displayed. With regard to the semantics, one can supplement the capabilities of a particular format with something else, e.g. the CSV file could have an associated text file to supply the semantic information, such as the meanings of the columns, which would otherwise be missing. In this case one would need the Representation Information for (1) the CSV file, (2) the text file and (3) the relationship between them.
While this is possible, the more attractive option would be to choose a new format which can itself handle the required semantics, with available applications that supply the required functionality, at least as well as the original format. Therefore given a piece of digitally encoded information that one needs to preserve, the transformation which one should reasonably apply is not arbitrary.
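The arrangement of a CSV file plus an associated text file, discussed above, can be made concrete with a small sketch. Everything specific here is an assumption made for illustration: the measurements, the column meanings, the "index: meaning [unit]" layout of the text file, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical measurements; values and units are invented for illustration.
rows = [(1.0, 20.5), (2.0, 21.1)]

# (1) The CSV file: values only, with no room for meanings or units.
csv_buffer = io.StringIO()
csv.writer(csv_buffer, lineterminator="\n").writerows(rows)
csv_text = csv_buffer.getvalue()

# (2) The associated text file: one line per column.
sidecar_text = "0: time since start [s]\n1: temperature [deg C]\n"


def describe_columns(sidecar: str) -> dict:
    """Parse the text file into a mapping from column index to meaning."""
    meanings = {}
    for line in sidecar.splitlines():
        index, meaning = line.split(": ", 1)
        meanings[int(index)] = meaning
    return meanings


def read_with_semantics(csv_data: str, sidecar: str) -> list:
    """(3) The relationship: entry i of the text file describes CSV column i."""
    meanings = describe_columns(sidecar)
    records = []
    for row in csv.reader(io.StringIO(csv_data)):
        records.append({meanings[i]: float(value) for i, value in enumerate(row)})
    return records


records = read_with_semantics(csv_text, sidecar_text)
print(records[0])
```

Without the text file and the rule linking it to the columns, a future reader of `csv_text` would see only bare numbers, which is why all three pieces need their own Representation Information.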
There are deep reasons for making a careful choice and documenting that choice appropriately. This is discussed in detail in Sect. 13.6. However there are a number of useful points which should be made here. For example one can think of the ideal Transformation in which the new digital object has the same information as the original. If this is the case then it should be possible to confirm this by means of another Transformation back to the original bit sequence. If one can find this pair of Transformations then one can define (following the revised version of OAIS):
Reversible Transformation: A Transformation in which the new representation defines a set (or a subset) of resulting entities that are equivalent to the resulting entities defined by the original representation. This means that there is a one-to-one mapping back to the original representation and its set of base entities.
On the other hand if one looks at the other transformations mentioned above, for example from FITS to CSV, then one would, without additional information (e.g. the supplementary text file mentioned above), lose information and therefore not be able to make the reverse transformation. It is therefore reasonable to define:
Non-Reversible Transformation: A Transformation which cannot be guaranteed to be a Reversible Transformation.
An important point to note is that the definition of non-reversible is drawn as broadly as possible. For example one does not need to prove there is no backward transformation, only that one cannot guarantee that such a transformation can be constructed. We will come back to these definitions in Chap. 13 where they play an important role in considerations of Authenticity.
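The idea of confirming a Transformation by transforming back to the original bit sequence can be sketched as a round-trip test. This is a minimal illustration, not a procedure taken from OAIS: the function names are hypothetical, and a change of text encoding is used as a stand-in for a change of data format.

```python
from typing import Callable


def round_trip_recovers_bits(
    original: bytes,
    forward: Callable[[bytes], bytes],
    backward: Callable[[bytes], bytes],
) -> bool:
    """Apply the forward and backward Transformations and compare the bits."""
    return backward(forward(original)) == original


def utf8_to_utf16(data: bytes) -> bytes:
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_ascii(data: bytes) -> bytes:
    # Characters outside ASCII are replaced by "?", so information is lost.
    return data.decode("utf-8").encode("ascii", errors="replace")


def ascii_to_utf8(data: bytes) -> bytes:
    return data.decode("ascii").encode("utf-8")


sample = "Café 12°".encode("utf-8")

print(round_trip_recovers_bits(sample, utf8_to_utf16, utf16_to_utf8))  # True
print(round_trip_recovers_bits(sample, utf8_to_ascii, ascii_to_utf8))  # False
```

A successful round trip on sample objects is supporting evidence rather than proof of reversibility; conversely, under the broad definition above a Transformation counts as non-reversible whenever such a pair cannot be guaranteed, without any need to prove that none exists.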
12.4 Summary
This chapter has raced through a number of the basic preservation strategies and techniques; it should be clear that each technique has its own strengths and weaknesses, and one must be careful to recognise these. The reader must be careful not to be misled by the amount of material on emulation here; this was a useful location for this material. Other preservation techniques are discussed in much more detail throughout this book. Other chapters are devoted to descriptive Representation Information and also to Transformations. In Part II we provide examples of many of these techniques with evidence to support their efficacy when applied appropriately.