You are on page 1of 29

Sponsored by:

This story appeared on JavaWorld at http://www.javaworld.com/javaworld/jw-01-1999/jw-01-step.html

Use a RandomAccessFile to build a low-level database


How to extend a RandomAccessFile to support the storage and retrieval of arbitrary record data
By Derek Hamner, JavaWorld.com, 01/01/99 As I searched JavaWorld's site for ideas for this month's Step by Step, I came across only a few articles covering low-level ile access. Although high-level APIs such as JDBC give us the lexibility and power needed in large enterprise applications, many smaller applications require a more simple and elegant solution. In this article, we will build an extension to the RandomAccessFile class that allows us to store and retrieve records. This "records ile" will be equivalent to a persistent hashtable, allowing keyed objects to be stored and retrieved from ile storage.

A primer on i les and records


Before we jump headlong into the example, let's start with a basic backgrounder. We'll begin by deining some terms pertaining to iles and records, then we'll briely discuss class java.io.RandomAccessFile and platform-dependency. Terminology The following deinitions are tuned to our example, rather than to traditional database terminology.

Record -- A collection of related data stored in a ile. A record typically has multiple i elds, each being a named and typed item of information. Key -- An identiier for a record. Keys are usually unique. File -- A sequential collection of data stored in some sort of stable storage such as a hard drive. Nonsequential i le access -- Allows data to be read from arbitrary locations in the ile. File pointer -- A number holding the position of the next byte of data to be read from a ile.

PDF created with pdfFactory Pro trial version www.pdffactory.com

Record pointer -- A record pointer is a ile pointer that points to the location where a particular record begins. Index -- A secondary means of accessing records in a ile; that is, it maps keys to record pointers. Heap -- A sequential ile of unordered and variable-sized records. A heap requires some external indexing in order to meaningfully access the records. Persistence -- Refers to storing an object or record for a certain length of time. This length of time is typically longer than the span of one process, so objects are usually persisted in iles or databases.

Overview of class java.io.RandomAccessFile Class RandomAccessFile is Java's way of providing nonsequential access to iles. The class allows us to jump to a certain location in the ile by using the seek() method. Once the ile pointer has been positioned, data can be read from and written to the ile using the DataInput and DataOutput interfaces. These interfaces allow us to read and write data in a platform-independent manner. Other handy methods in RandomAccessFile allow us to check and set the length of the ile. Platform-dependent considerations Modern databases rely on disk drives for storage. Data on a disk drive is stored in blocks, which are distributed across tracks and surfaces. The disk's seek time and rotational delay dictate how data can be most eficiently stored and retrieved. A typical database management system relies closely on the disk's attributes in order to streamline performance. Unfortunately (or fortunately, depending on your interest in low-level ile I/O!), these parameters lie far from reach when using a high-level ile API such as java.io. Given this fact, our example will disregard the optimizations that knowledge of the disk's parameters could provide.

Designing the RecordsFile example


Now we're ready to design our example. To start, I'll lay out some design requirements and goals, resolve issues of concurrent access, and specify the low-level ile format. Before proceeding to the implementation, we'll also look at the main record operations and their corresponding algorithms. Requirements and goals Our main goal in this example is to use a RandomAccessFile to provide a way of storing and retrieving record data. We'll associate a key of type String with each record as a means of uniquely identifying it. The keys will be limited to a maximum length, although the record data will not be limited. For the purposes of this example, our records will consist of only one ield -- a "blob" of binary data. The ile code will not attempt to interpret the record data in any way. As a second design goal, we'll require that the number of records our ile supports not be ixed at creation time. We will allow the ile to grow and shrink as records are inserted and removed. Because our index and record data will be stored in the same ile, this restriction will cause us to add extra logic to dynamically increase the index space by reorganizing records.

PDF created with pdfFactory Pro trial version www.pdffactory.com

Accessing data in a ile is orders of magnitude slower than accessing data in memory. This means that the number of ile accesses the database performs will be the determining performance factor. We'll require that our main database operations not depend on the number of records in the ile. In other words, they'll be of constant order time with respect to ile accesses. As a inal requirement, we'll assume our index is small enough to load into memory. This will make it easier for our implementation to fulill the requirement that dictates access time. We'll mirror the index in a Hashtable, which provides immediate record header lookups. Code Correction The code for this article has a bug that causes it to throw a NullPointerException in many possible cases. There is a routine named insureIndexSpace(int) in the abstract class BaseRecordsFile. The code is intended to move existing records to the end of the ile if the index area needs to expand. After the "irst" record's capacity is reset to its actual size, it is moved to the end. The dataStartPtr is then set to point to the second record in the ile. Unfortunately, if there was free space in the irst record, the new dataStartPtr will not point to a valid record, since it was incremented by the irst record's length rather than its capacity. The modiied Java source for BaseRecordsFile can be found in Resources. from Ron Walkup Senior Software Engineer bioMerieux, Inc.

Synchronization and concurrent i le access For simplicity, we start by supporting only a single-thread model, in which ile requests are prohibited from happening concurrently. We can accomplish this by synchronizing the public access methods of the BaseRecordsFile and RecordsFile classes. Note that you can relax this restriction to add support for concurrent reads and writes on nonconlicting records: You'll need to maintain a list of locked records and interleave reads and writes for concurrent requests. Details of the i le format We will now explicitly deine the format of the records ile. The ile consists of three regions, each with its own format.

PDF created with pdfFactory Pro trial version www.pdffactory.com

The records i le format The i le headers region. This irst region holds the two essential headers needed to access records in our ile. The irst header, called the data start pointer, is a long that points to the start of the record data. This value tells us the size of the index region. The second header, called the num records header, is an int that gives the number of records in the database. The headers region starts on the irst byte of the ile and extends for FILE_HEADERS_REGION_LENGTH bytes. We'll use readLong() and readInt() to read the headers, and writeLong() and writeInt() to write the headers. The index region. Each entry in the index consists of a key and a record header. The index starts on the irst byte after the ile headers region and extends until the byte before the data start pointer. From this information, we can calculate a ile pointer to the start of any of the n entries in the index. Entries have a ixed length -- the key data starts on the irst byte in the index entry and extends MAX_KEY_LENGTH bytes. The corresponding record header for a given key follows immediately after the key in the index. The record header tells us where the data is located, how many bytes the record can hold, and how many bytes it is actually holding. Index entries in the ile index are in no particular order and do not map to the order in which the records are stored in the ile. Record data region. The record data region starts on the location indicated by the data start pointer and extends to the end of the ile. Records are positioned back-to-back in the ile with no free space permitted between records. This part of the ile consists of raw data with no header or key information. The database ile ends on the last block of the last record in the ile, so there is no extra space at the end of the ile. The ile grows and shrinks as records are added and deleted. The size allocated to a record does not always correspond to the actual amount of data the record contains. The record can be thought of as a container -- it may be only partially full. Valid record data is positioned at the start of the record. Supported operations and their algorithms The RecordsFile will support the following main operations:

PDF created with pdfFactory Pro trial version www.pdffactory.com

Insert -- Adds a new record to the ile Read -- Reads a record from the ile Update -- Updates a record Delete -- Deletes a record Ensure capacity -- Grows the index region to accommodate new records

Before we step through the source code, let's go over the chosen algorithms for each of these operations:

Insert. This operation inserts a new record into the ile. To insert, we: 1. 2. 3. 4. 5. Make sure the key being inserted isn't already contained in the ile Ensure the index region is large enough for the additional entry Find free space in the ile large enough to hold the record Write the record data to the ile Add the record header to the index

Read. This operation retrieves a requested record from the ile based on a key. To retrieve a record, we: 1. Use the index to map the given key to the record header 2. Seek down to the start of the data (using the pointer to the record data stored in the header) 3. Read the record's data from the ile

Update. This operation updates an existing record with new data, replacing the new data with the old. The steps for our update vary, depending on the size of the new record data. If the new data its into the existing record, we: 1. Write the record data to the ile, overwriting the previous data 2. Update the attribute that holds the length of the data in the record's header

Otherwise, if the data is too big for the record, we: 1. Perform a delete operation on the existing record 2. Perform an insert of the new data

PDF created with pdfFactory Pro trial version www.pdffactory.com

Delete. This operation removes a record from the ile. To delete a record, we: 1. Reclaim the space allocated to the record being removed by either shrinking the ile, if the record is the last one in the ile, or by adding its space to an adjacent record 2. Remove the record's header from the index by replacing the entry being deleted with the last entry in the index; this ensures the index is always full, with no empty spaces between entries Ensure capacity. This operation makes sure the index region is large enough to accommodate additional entries. In a loop, we move records from the front to the end of the ile until there is suficient space. To move one record we: 1. Locate the record header of the irst record in the ile; note that this is the record with data at the top of the record data region -- not the record with the irst header in the index 2. Read the target record's data 3. Grow the ile by the size of the target record's data by using the setLength(long) method in RandomAccessFile 4. Write the record data to the bottom of the ile 5. Update the data pointer in the record that was moved 6. Update the global header that points to the irst record's data

Implementation details -- stepping though the source code


We're now ready to get our hands dirty and work through the code for the example. You can download the complete source from Resources. Note: You must use the Java 2 platform (formerly known as JDK 1.2) to compile the source. Class BaseRecordsFile BaseRecordsFile is an abstract class and is the main implementation of our example. It deines the main access methods as well as a slew of utility methods for manipulating records and index entries.
package hamner.db; import java.io.*;

PDF created with pdfFactory Pro trial version www.pdffactory.com

import java.util.*;

public abstract class BaseRecordsFile { // The database file. private RandomAccessFile file; // Current file pointer to the start of the record data. protected long dataStartPtr; // Total length in bytes of the global database headers. protected static final int FILE_HEADERS_REGION_LENGTH = 16; // Number of bytes in the record header. protected static final int RECORD_HEADER_LENGTH = 16; // The length of a key in the index. protected static final int MAX_KEY_LENGTH = 64; // The total length of one index entry - the key length plus the record header length. protected static final int INDEX_ENTRY_LENGTH = MAX_KEY_LENGTH + RECORD_HEADER_LENGTH; // File pointer to the num records header. protected static final long NUM_RECORDS_HEADER_LOCATION = 0; // File pointer to the data start pointer header. protected static final long DATA_START_HEADER_LOCATION = 4; /** * Creates a new database file, initializing the appropriate headers. Enough space is allocated in * the index for the specified initial size. */ protected BaseRecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException { File f = new File(dbPath); if (f.exists()) { throw new RecordsFileException("Database already exits: " + dbPath); } file = new RandomAccessFile(f, "rw"); dataStartPtr = indexPositionToKeyFp(initialSize); // Record Data Region starts were the setFileLength(dataStartPtr); // (i+1)th index entry would start. writeNumRecordsHeader(0); writeDataStartPtrHeader(dataStartPtr); } /** * Opens an existing database file and initializes the dataStartPtr. The accessFlags * parameter can be "r" or "rw" -- as defined in RandomAccessFile. */ protected BaseRecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException { File f = new File (dbPath); if(!f.exists()) { throw new RecordsFileException("Database not found: " + dbPath); } file = new RandomAccessFile(f, accessFlags); dataStartPtr = readDataStartHeader(); } /** * Returns an Enumeration of the keys of all records in the database. */ public abstract Enumeration enumerateKeys(); /** * Returns the number or records in the database. */ public abstract int getNumRecords(); /** * Checks there is a record with the given key. */ public abstract boolean recordExists(String key); /** * Maps a key to a record header. */ protected abstract RecordHeader keyToRecordHeader(String key) throws RecordsFileException;

PDF created with pdfFactory Pro trial version www.pdffactory.com

/** * Locates space for a new record of dataLength size and initializes a RecordHeader. */ protected abstract RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, /** * Returns the record to which the target file pointer belongs - meaning the specified location * in the file is part of the record data of the RecordHeader which is returned. Returns null if * the location is not part of a record. (O(n) mem accesses) */ protected abstract RecordHeader getRecordAt(long targetFp) throws RecordsFileException; protected long getFileLength() throws IOException { return file.length(); } protected void setFileLength(long l) throws IOException { file.setLength(l); } /** * Reads the number of records header from the file. */ protected int readNumRecordsHeader() throws IOException { file.seek(NUM_RECORDS_HEADER_LOCATION); return file.readInt(); } /** * Writes the number of records header to the file. */ protected void writeNumRecordsHeader(int numRecords) throws IOException { file.seek(NUM_RECORDS_HEADER_LOCATION); file.writeInt(numRecords); } /** * Reads the data start pointer header from the file. */ protected long readDataStartHeader() throws IOException { file.seek(DATA_START_HEADER_LOCATION); return file.readLong(); } /** * Writes the data start pointer header to the file. */ protected void writeDataStartPtrHeader(long dataStartPtr) throws IOException { file.seek(DATA_START_HEADER_LOCATION); file.writeLong(dataStartPtr); } /** * Returns a file pointer in the index pointing to the first byte * in the key located at the given index position. */ protected long indexPositionToKeyFp(int pos) { return FILE_HEADERS_REGION_LENGTH + (INDEX_ENTRY_LENGTH * pos); } /** * Returns a file pointer in the index pointing to the first byte * in the record pointer located at the given index position. */ long indexPositionToRecordHeaderFp(int pos) { return indexPositionToKeyFp(pos) + MAX_KEY_LENGTH; } /** * Reads the ith key from the index. */ String readKeyFromIndex(int position) throws IOException {

PDF created with pdfFactory Pro trial version www.pdffactory.com

} /** * Reads the ith record header from the index. */ RecordHeader readRecordHeaderFromIndex(int position) throws IOException { file.seek(indexPositionToRecordHeaderFp(position)); return RecordHeader.readHeader(file); } /** * Writes the ith record header to the index. */ protected void writeRecordHeaderToIndex(RecordHeader header) throws IOException { file.seek(indexPositionToRecordHeaderFp(header.indexPosition)); header.write(file); } /** * Appends an entry to end of index. Assumes that insureIndexSpace() has already been called. */ protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOExce DbByteArrayOutputStream temp = new DbByteArrayOutputStream(MAX_KEY_LENGTH); (new DataOutputStream(temp)).writeUTF(key); if (temp.size() > MAX_KEY_LENGTH) { throw new RecordsFileException("Key is larger than permitted size of " + MAX_KEY_LENGTH + " bytes") } file.seek(indexPositionToKeyFp(currentNumRecords)); temp.writeTo(file); file.seek(indexPositionToRecordHeaderFp(currentNumRecords)); newRecord.write(file); newRecord.setIndexPosition(currentNumRecords); writeNumRecordsHeader(currentNumRecords+1); } /** * Removes the record from the index. Replaces the target with the entry at the * end of the index. */ protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOEx if (header.indexPosition != currentNumRecords -1) { String lastKey = readKeyFromIndex(currentNumRecords-1); RecordHeader last = keyToRecordHeader(lastKey); last.setIndexPosition(header.indexPosition); file.seek(indexPositionToKeyFp(last.indexPosition)); file.writeUTF(lastKey); file.seek(indexPositionToRecordHeaderFp(last.indexPosition)); last.write(file); } writeNumRecordsHeader(currentNumRecords-1); } /** * Adds the given record to the database. */ public synchronized void insertRecord(RecordWriter rw) throws RecordsFileException, IOException { String key = rw.getKey(); if (recordExists(key)) { throw new RecordsFileException("Key exists: " + key); } insureIndexSpace(getNumRecords() + 1); RecordHeader newRecord = allocateRecord(key, rw.getDataLength()); writeRecordData(newRecord, rw); addEntryToIndex(key, newRecord, getNumRecords()); }

file.seek(indexPositionToKeyFp(position)); return file.readUTF();

PDF created with pdfFactory Pro trial version www.pdffactory.com

/** * Updates an existing record. If the new contents do not fit in the original record, * then the update is handled by deleting the old record and adding the new. */ public synchronized void updateRecord(RecordWriter rw) throws RecordsFileException, IOException { RecordHeader header = keyToRecordHeader(rw.getKey()); if (rw.getDataLength() > header.dataCapacity) { deleteRecord(rw.getKey()); insertRecord(rw); } else { writeRecordData(header, rw); writeRecordHeaderToIndex(header); } } /** * Reads a record. */ public synchronized RecordReader readRecord(String key) throws RecordsFileException, IOException { byte[] data = readRecordData(key); return new RecordReader(key, data); } /** * Reads the data for the record with the given key. */ protected byte[] readRecordData(String key) throws IOException, RecordsFileException { return readRecordData(keyToRecordHeader(key)); } /** * Reads the record data for the given record header. */ protected byte[] readRecordData(RecordHeader header) throws IOException { byte[] buf = new byte[header.dataCount]; file.seek(header.dataPointer); file.readFully(buf); return buf; } /** * Updates the contents of the given record. A RecordsFileException is thrown if the new data does not * fit in the space allocated to the record. The header's data count is updated, but not * written to the file. */ protected void writeRecordData(RecordHeader header, RecordWriter rw) throws IOException, RecordsFileExc if (rw.getDataLength() > header.dataCapacity) { throw new RecordsFileException ("Record data does not fit"); } header.dataCount = rw.getDataLength(); file.seek(header.dataPointer); rw.writeTo((DataOutput)file); } /** * Updates the contents of the given record. A RecordsFileException is thrown if the new data does not * fit in the space allocated to the record. The header's data count is updated, but not * written to the file. */ protected void writeRecordData(RecordHeader header, byte[] data) throws IOException, RecordsFileExcepti if (data.length > header.dataCapacity) { throw new RecordsFileException ("Record data does not fit"); } header.dataCount = data.length; file.seek(header.dataPointer); file.write(data, 0, data.length); }

PDF created with pdfFactory Pro trial version www.pdffactory.com

/** * Deletes a record. */ public synchronized void deleteRecord(String key) throws RecordsFileException, IOException { RecordHeader delRec = keyToRecordHeader(key); int currentNumRecords = getNumRecords(); if (getFileLength() == delRec.dataPointer + delRec.dataCapacity) { // shrink file since this is the last record in the file setFileLength(delRec.dataPointer); } else { RecordHeader previous = getRecordAt(delRec.dataPointer -1); if (previous != null) { // append space of deleted record onto previous record previous.dataCapacity += delRec.dataCapacity; writeRecordHeaderToIndex(previous); } else { // target record is first in the file and is deleted by adding its space to // the second record. RecordHeader secondRecord = getRecordAt(delRec.dataPointer + (long)delRec.dataCapacity); byte[] data = readRecordData(secondRecord); secondRecord.dataPointer = delRec.dataPointer; secondRecord.dataCapacity += delRec.dataCapacity; writeRecordData(secondRecord, data); writeRecordHeaderToIndex(secondRecord); } } deleteEntryFromIndex(key, delRec, currentNumRecords); } // Checks to see if there is space for and additional index entry. If // not, space is created by moving records to the end of the file. protected void insureIndexSpace(int requiredNumRecords) throws RecordsFileException, IOException { int currentNumRecords = getNumRecords(); long endIndexPtr = indexPositionToKeyFp(requiredNumRecords); if (endIndexPtr > getFileLength() && currentNumRecords == 0) { setFileLength(endIndexPtr); dataStartPtr = endIndexPtr; writeDataStartPtrHeader(dataStartPtr); return; } while (endIndexPtr > dataStartPtr) { RecordHeader first = getRecordAt(dataStartPtr); byte[] data = readRecordData(first); first.dataPointer = getFileLength(); first.dataCapacity = data.length; setFileLength(first.dataPointer + data.length); writeRecordData(first, data); writeRecordHeaderToIndex(first); dataStartPtr += first.dataCapacity; writeDataStartPtrHeader(dataStartPtr); } }

PDF created with pdfFactory Pro trial version www.pdffactory.com

/** * Closes the file. */ public synchronized void close() throws IOException, RecordsFileException { try { file.close(); } finally { file = null; } } }

Figure 1. HREF="http://www.javaworld.com/javaworld/jw-01-1999/step/hamner /db/BaseRecordsFile.java">BaseRecordsFile

Because the code is fairly well documented, I won't step through the entire class; I'll only cover the more important segments.
private RandomAccessFile file;

All ile access is done through an instance of RandomAccessFile. We declare it as private so we don't have to worry that subclasses will inadvertently corrupt our data format.
protected long dataStartPtr;

This is the current ile pointer to the start of the record data. It mirrors the value of the data start pointer header in the ile in order to reduce ile accesses.
protected BaseRecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException { File f = new File(dbPath); if (f.exists()) { throw new RecordsFileException("Database already exits: " + dbPath); } file = new RandomAccessFile(f, "rw"); dataStartPtr = indexPositionToKeyFp(initialSize); setFileLength(dataStartPtr); writeNumRecordsHeader(0); writeDataStartPtrHeader(dataStartPtr); }

The irst constructor for this class is used to create a new database ile. After checking that the ile doesn't already exist, the RandomAccessFile is created. The dataStartPtr is set to the byte where the

PDF created with pdfFactory Pro trial version www.pdffactory.com

(n+1)th index entry would be, meaning exactly enough space is allocated for the index region to hold n index entries. The two ile headers are then written to the ile using the writeNumRecordsHeader() and writeDataStartPtrHeader() methods.
protected BaseRecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException { File f = new File (dbPath); if(!f.exists()) { throw new RecordsFileException("Database not found: " + dbPath); } file = new RandomAccessFile(f, accessFlags); dataStartPtr = readDataStartHeader(); }

The other constructor is used to open an existing database. The accessFlags parameter determines whether the ile will be read-only (r) or read-write (rw). Once the ile is opened, the dataStartPtr variable is initialized to the value held in the ile.
protected int readNumRecordsHeader() throws IOException { file.seek(NUM_RECORDS_HEADER_LOCATION); return file.readInt(); }

The above method is used to fetch the number of record headers from the ile. We irst use seek() to position the ile pointer, and then use readInt() to read the data. This method is typical of how most data access methods work in this class.
protected long indexPositionToKeyFp(int pos) { return FILE_HEADERS_REGION_LENGTH + (INDEX_ENTRY_LENGTH * pos); }

The indexPositionToKeyFp() method maps a position in the ile index to a ile pointer pointing to the irst byte of the key part of the index entry. Because the index entries are ixed length, it's an easy task to calculate this value; we just multiply the length of one entry by the position and then add FILE_HEADERS_REGION_LENGTH bytes to account for the headers region at the start of the ile. The indexPositionToRecordHeaderFp() method is identical to this method except it adds an offset to account for the length of a key. These two methods enable us to iterate over all the records in the ile.

protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOExcept DbByteArrayOutputStream temp = new DbByteArrayOutputStream(MAX_KEY_LENGTH); (new DataOutputStream(temp)).writeUTF(key); if (temp.size() > MAX_KEY_LENGTH) { throw new RecordsFileException("Key is larger than permitted size of " + MAX_KEY_LENGTH + " bytes") } file.seek(indexPositionToKeyFp(currentNumRecords)); temp.writeTo(file); file.seek(indexPositionToRecordHeaderFp(currentNumRecords));

PDF created with pdfFactory Pro trial version www.pdffactory.com

newRecord.write(file); newRecord.setIndexPosition(currentNumRecords); writeNumRecordsHeader(currentNumRecords+1);

The addEntryToIndex() method adds a new entry to the end of the index, and then updates the record count by calling writeNumRecordsHeader(). It assumes there is suficient space in the index to hold the entry. The temp stream is used to ensure the key its in the allowed space. A RecordsFileException is thrown if the key doesn't it.

protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOExce if (header.indexPosition != currentNumRecords -1) { String lastKey = readKeyFromIndex(currentNumRecords-1); RecordHeader last = keyToRecordHeader(lastKey); last.setIndexPosition(header.indexPosition); file.seek(indexPositionToKeyFp(last.indexPosition)); file.writeUTF(lastKey); file.seek(indexPositionToRecordHeaderFp(last.indexPosition)); last.write(file); } writeNumRecordsHeader(currentNumRecords-1); }

The deleteEntryFromIndex() method removes an entry from the index by copying the last entry into the position of the entry being removed. After moving the entry, the record count header is decremented.
public synchronized void insertRecord(RecordWriter rw) throws RecordsFileException, IOException { String key = rw.getKey(); if (recordExists(key)) { throw new RecordsFileException("Key exists: " + key); } insureIndexSpace(getNumRecords() + 1); RecordHeader newRecord = allocateRecord(key, rw.getDataLength()); writeRecordData(newRecord, rw); addEntryToIndex(key, newRecord, getNumRecords()); }

We insert a record by growing the index space (if necessary) by calling insureIndexSpace() and then calling allocateRecord() to locate a position for the new record data. The data is then written to the ile and the entry is added to the index.
public synchronized void updateRecord(RecordWriter rw) throws RecordsFileException, IOException { RecordHeader header = keyToRecordHeader(rw.getKey()); if (rw.getDataLength() > header.dataCapacity) { deleteRecord(rw.getKey()); insertRecord(rw); } else { writeRecordData(header, rw);

PDF created with pdfFactory Pro trial version www.pdffactory.com

writeRecordHeaderToIndex(header);

If the new record data its in the original space allocated to the record, updateRecord() writes the new record data into that space, then updates the header. Otherwise, we must do the high-level, delete-and-insert operation to allocate enough space for the new data.

protected void writeRecordData(RecordHeader header, RecordWriter rw) throws IOException, RecordsFileExcep if (rw.getDataLength() > header.dataCapacity) { throw new RecordsFileException ("Record data does not fit"); } header.dataCount = rw.getDataLength(); file.seek(header.dataPointer); rw.writeTo((DataOutput)file); }

This method irst checks that the new data its in the old space and then positions the ile pointer to the dataPointer ield in the header. It then calls the writeTo() method of the RecordWriter in order to write the data to the ile. There is also a writeRecordData() method, which takes a raw byte buffer instead of a RecordWriter.
public synchronized void deleteRecord(String key) throws RecordsFileException, IOException { RecordHeader delRec = keyToRecordHeader(key); int currentNumRecords = getNumRecords(); if (getFileLength() == delRec.dataPointer + delRec.dataCapacity) { // shrink file since this is the last record in the file setFileLength(delRec.dataPointer); } else { RecordHeader previous = getRecordAt(delRec.dataPointer -1); if (previous != null) { // append space of deleted record onto previous record previous.dataCapacity += delRec.dataCapacity; writeRecordHeaderToIndex(previous); } else { // target record is first in the file and is deleted by adding its space to // the second record. RecordHeader secondRecord = getRecordAt(delRec.dataPointer + (long)delRec.dataCapacity); byte[] data = readRecordData(secondRecord); secondRecord.dataPointer = delRec.dataPointer; secondRecord.dataCapacity += delRec.dataCapacity; writeRecordData(secondRecord, data); writeRecordHeaderToIndex(secondRecord); } } deleteEntryFromIndex(key, delRec, currentNumRecords); }

We can delete a record in one of several ways. If the record is the last one in the ile, we simply shrink the ile by calling the setFileLength() method. Otherwise, we have two choices. If the record

PDF created with pdfFactory Pro trial version www.pdffactory.com

has a record before it in the ile (which is found by calling getRecordAt()), we add its space onto the previous record's space; if the record is the irst one in the ile, its space is added onto the second record's space.
protected void insureIndexSpace(int requiredNumRecords) throws RecordsFileException, IOException { int currentNumRecords = getNumRecords(); long endIndexPtr = indexPositionToKeyFp(requiredNumRecords); if (endIndexPtr > getFileLength() && currentNumRecords == 0) { setFileLength(endIndexPtr); dataStartPtr = endIndexPtr; writeDataStartPtrHeader(dataStartPtr); return; } while (endIndexPtr > dataStartPtr) { RecordHeader first = getRecordAt(dataStartPtr); byte[] data = readRecordData(first); first.dataPointer = getFileLength(); first.dataCapacity = data.length; setFileLength(first.dataPointer + data.length); writeRecordData(first, data); writeRecordHeaderToIndex(first); dataStartPtr += first.dataCapacity; writeDataStartPtrHeader(dataStartPtr); } }

The insureIndexSpace() is used to grow the index. If the ile is empty, the ile is grown to the desired size. In the while loop, records are read from the top of the ile and written to the end of the ile. The dataStartPtr is updated with each iteration. Class RecordsFile RecordsFile extends BaseRecordsFile to provide an implementation for the abstract methods.
package hamner.db; import java.io.*; import java.util.*; public class RecordsFile extends BaseRecordsFile { /** * Hashtable which holds the in-memory index. For efficiency, the entire index * is cached in memory. The hashtable maps a key of type String to a RecordHeader. */ protected Hashtable memIndex; /** * Creates a new database file. The initialSize parameter determines the * amount of space which is allocated for the index. The index can grow * dynamically, but the parameter is provide to increase * efficiency. */ public RecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException { super(dbPath, initialSize); memIndex = new Hashtable(initialSize); } /** * Opens an existing database and initializes the in-memory index. */ public RecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException {

PDF created with pdfFactory Pro trial version www.pdffactory.com

} /** * Returns an enumeration of all the keys in the database. */ public synchronized Enumeration enumerateKeys() { return memIndex.keys(); } /** * Returns the current number of records in the database. */ public synchronized int getNumRecords() { return memIndex.size(); } /** * Checks if there is a record belonging to the given key. */ public synchronized boolean recordExists(String key) { return memIndex.containsKey(key); } /** * Maps a key to a record header by looking it up in the in-memory index. */ protected RecordHeader keyToRecordHeader(String key) throws RecordsFileException { RecordHeader h = (RecordHeader)memIndex.get(key); if (h==null) { throw new RecordsFileException("Key not found: " + key); } return h; } /** * This method searches the file for free space and then returns a RecordHeader * which uses the space. (O(n) memory accesses) */ protected RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, IOExcept // search for empty space RecordHeader newRecord = null; Enumeration e = memIndex.elements(); while (e.hasMoreElements()) { RecordHeader next = (RecordHeader)e.nextElement(); int free = next.getFreeSpace(); if (dataLength <= next.getFreeSpace()) { newRecord = next.split(); writeRecordHeaderToIndex(next); break; } } if (newRecord == null) { // append record to end of file - grows file to allocate space long fp = getFileLength(); setFileLength(fp + dataLength); newRecord = new RecordHeader(fp, dataLength); } return newRecord;

super(dbPath, accessFlags); int numRecords = readNumRecordsHeader(); memIndex = new Hashtable(numRecords); for (int i = 0; i < numRecords; i++) { String key = readKeyFromIndex(i); RecordHeader header = readRecordHeaderFromIndex(i); header.setIndexPosition(i); memIndex.put(key, header); }

PDF created with pdfFactory Pro trial version www.pdffactory.com

} /** * Returns the record to which the target file pointer belongs - meaning the specified location * in the file is part of the record data of the RecordHeader which is returned. Returns null if * the location is not part of a record. (O(n) mem accesses) */ protected RecordHeader getRecordAt(long targetFp) throws RecordsFileException { Enumeration e = memIndex.elements(); while (e.hasMoreElements()) { RecordHeader next = (RecordHeader) e.nextElement(); if (targetFp >= next.dataPointer && targetFp < next.dataPointer + (long)next.dataCapacity) { return next; } } return null; } /** * Closes the database. */ public synchronized void close() throws IOException, RecordsFileException { try { super.close(); } finally { memIndex.clear(); memIndex = null; } } /** * Adds the new record to the in-memory index and calls the super class add * the index entry to the file. */ protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOExce super.addEntryToIndex(key, newRecord, currentNumRecords); memIndex.put(key, newRecord); } /** * Removes the record from the index. Replaces the target with the entry at the * end of the index. */ protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOEx super.deleteEntryFromIndex(key, header, currentNumRecords); RecordHeader deleted = (RecordHeader)memIndex.remove(key); }

Figure 2. BaseRecordsFile

As with the previous class, I won't discuss all the methods here because most of the code is straightforward and well documented.
protected Hashtable memIndex;

The instance variable memIndex is used to cache the entire ile index in memory. The hashtable maps

PDF created with pdfFactory Pro trial version www.pdffactory.com

keys to record headers.


public RecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException { super(dbPath, accessFlags); int numRecords = readNumRecordsHeader(); memIndex = new Hashtable(numRecords); for (int i = 0; i < numRecords; i++) { String key = readKeyFromIndex(i); RecordHeader header = readRecordHeaderFromIndex(i); header.setIndexPosition(i); memIndex.put(key, header); } }

This constructor initializes the ile by calling the super's constructor and then prepares the in-memory index by iterating over the index entries while using the readKeyFromIndex() and readRecordHeaderFromIndex() methods to add entries to the hashtable.
protected RecordHeader keyToRecordHeader(String key) throws RecordsFileException { RecordHeader h = (RecordHeader)memIndex.get(key); if (h==null) { throw new RecordsFileException("Key not found: " + key); } return h; }

This method maps a record's key to its RecordHeader. Note that because no ile accesses are required, and the lookup is done through the hashtable, this operation takes place in constant order time; that is, the time it takes to look up one record is not a function of the number of records in the ile.

protected RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, IOExceptio // search for empty space RecordHeader newRecord = null; Enumeration e = memIndex.elements(); while (e.hasMoreElements()) { RecordHeader next = (RecordHeader)e.nextElement(); int free = next.getFreeSpace(); if (dataLength <= next.getFreeSpace()) { newRecord = next.split(); writeRecordHeaderToIndex(next); break; } } if (newRecord == null) { // append record to end of file - grow file to allocate space long fp = getFileLength(); setFileLength(fp + dataLength); newRecord = new RecordHeader(fp, dataLength); } return newRecord; }

PDF created with pdfFactory Pro trial version www.pdffactory.com

This method locates free space for a new record of size dataLength by iterating over all the entries in memIndex until adequate space is found. If space is not found in an existing record, the new record is appended to the end of the ile. If the record does contain space, it is split into two parts: one for the existing record and one for the new record. I need to point out here that this is not the optimal solution for this operation because it makes inserting an element an order n operation with respect to memory accesses. A more advanced implementation of the RecordsFile class would keep a sorted memory map so that inding free space would give log(n) performance. However, this implementation still holds true to our design requirement, since the number of ile accesses required is not dependent on the number of records in the ile.
protected RecordHeader getRecordAt(long targetFp) throws RecordsFileException { Enumeration e = memIndex.elements(); while (e.hasMoreElements()) { RecordHeader next = (RecordHeader) e.nextElement(); if (targetFp >= next.dataPointer && targetFp < next.dataPointer + (long)next.dataCapacity) { return next; } } return null; }

This method locates the record that has data residing at the indicated position in the ile. As with the last method, this implementation is O(n) with respect to memory accesses and leaves room for improvement. Class RecordHeader
RecordHeader provides a wrapper to hold key information about a record. package hamner.db; import java.io.*; public class RecordHeader { /** * File pointer to the first byte of record data (8 bytes). */ protected long dataPointer; /** * Actual number of bytes of data held in this record (4 bytes). */ protected int dataCount; /** * Number of bytes of data that this record can hold (4 bytes). */ protected int dataCapacity; /** * Indicates this header's position in the file index. */ protected int indexPosition; protected RecordHeader() { }

PDF created with pdfFactory Pro trial version www.pdffactory.com

protected RecordHeader(long dataPointer, int dataCapacity) { if (dataCapacity < 1) { throw new IllegalArgumentException("Bad record size: " + dataCapacity); } this.dataPointer = dataPointer; this.dataCapacity = dataCapacity; this.dataCount = 0; } protected int getIndexPosition() { return indexPosition; } protected void setIndexPosition(int indexPosition) { this.indexPosition = indexPosition; } protected int getDataCapacity() { return dataCapacity; } protected int getFreeSpace() { return dataCapacity - dataCount; } protected void read(DataInput in) throws IOException { dataPointer = in.readLong(); dataCapacity = in.readInt(); dataCount = in.readInt(); } protected void write(DataOutput out) throws IOException { out.writeLong(dataPointer); out.writeInt(dataCapacity); out.writeInt(dataCount); } protected static RecordHeader readHeader(DataInput in) throws IOException { RecordHeader r = new RecordHeader(); r.read(in); return r; } /** * Returns a new record header which occupies the free space of this record. * Shrinks this record size by the size of its free space. */ protected RecordHeader split() throws RecordsFileException { long newFp = dataPointer + (long)dataCount; RecordHeader newRecord = new RecordHeader(newFp, getFreeSpace()); dataCapacity = dataCount; return newRecord; }

Figure 3. RecordHeader

Here are the highlights of the source:


protected long dataPointer;

The dataPointer is a ile pointer to the irst byte of this record's data.

PDF created with pdfFactory Pro trial version www.pdffactory.com

protected int dataCount;

The dataCount is used to keep track of the actual number of bytes of data held in this record. This is the number of valid bytes written and should not to be confused with the dataCapacity.
protected int dataCapacity;

The dataCapacity is the number of bytes of data this record can hold. This value will always be equal to or larger than the dataCount.
protected int indexPosition;

The index position is used to speed accesses to this header's entry in the ile. Unlike the other ields in this class, this value is stored only in-memory and is not written to the ile as part of the record header.
protected void write(DataOutput out) throws IOException { out.writeLong(dataPointer); out.writeInt(dataCapacity); out.writeInt(dataCount); }

The write() method writes the header to the ile.


protected void read(DataInput in) throws IOException { dataPointer = in.readLong(); dataCapacity = in.readInt(); dataCount = in.readInt(); }

This method reads and initializes the header ields from the ile. Note that the ields are read in the same order in which they were written.
protected RecordHeader split() throws RecordsFileException { long newFp = dataPointer + (long)dataCount; RecordHeader newRecord = new RecordHeader(newFp, getFreeSpace()); dataCapacity = dataCount; return newRecord; }

PDF created with pdfFactory Pro trial version www.pdffactory.com

The split() method returns a new record header, which occupies the free space of this record. It also shrinks this record's size by the size of its free space. Class RecordWriter RecordWriter is used to construct a new record and write it to the ile as a byte buffer. To accomplish this, the user writes data into a DbByteArrayOutputStream, which is in turn written to the ile. This design protects the ile from being corrupted by the user, since the user is never allowed to write directly to the ile. This speeds things up, since the user can build a record (often times consuming due to object serialization) without blocking other users from reading and writing records.
package hamner.db; import java.io.*; public class RecordWriter { String key; DbByteArrayOutputStream out; ObjectOutputStream objOut; public RecordWriter(String key) { this.key = key; out = new DbByteArrayOutputStream(); } public String getKey() { return key; } public OutputStream getOutputStream() { return out; } public ObjectOutputStream getObjectOutputStream() throws IOException { if (objOut == null) { objOut = new ObjectOutputStream(out); } return objOut; } public void writeObject(Object o) throws IOException { getObjectOutputStream().writeObject(o); getObjectOutputStream().flush(); } /** * Returns the number of bytes in the data. */ public int getDataLength() { return out.size(); } /** * Writes the data out to the stream without re-allocating the buffer. */ public void writeTo(DataOutput str) throws IOException { out.writeTo(str); } }

Figure 4. RecordWriter

Here are the code highlights for this class:


String key; DbByteArrayOutputStream out;

PDF created with pdfFactory Pro trial version www.pdffactory.com

ObjectOutputStream objOut; public RecordWriter(String key) { this.key = key; out = new DbByteArrayOutputStream(); }

The constructor sets the key ield and then creates a DbByteArrayOutputStream, which is used to buffer the data written to the record.
public ObjectOutputStream getObjectOutputStream() throws IOException { if (objOut == null) { objOut = new ObjectOutputStream(out); } return objOut; }

This method returns an ObjectOutputStream that can be used to store serialized objects in the record.
public void writeObject(Object o) throws IOException { getObjectOutputStream().writeObject(o); getObjectOutputStream().flush(); }

This utility method serializes an object to the record.


public int getDataLength() { return out.size(); }

This method returns the number of bytes of data in this record. Note that this method will only return the correct value if all buffered streams constructed off of the output stream have been lushed.
public void writeTo(DataOutput str) throws IOException { out.writeTo(str); }

This method writes the record data to the DataOutput stream. It uses the writeTo() method of class DbByteArrayOutputStream. Class RecordReader

PDF created with pdfFactory Pro trial version www.pdffactory.com

A RecordReader is used to read data from the record. It is a utility class that makes record data more accessible than it is in its raw byte buffer form.
package hamner.db; import java.io.*; public class RecordReader { String key; byte[] data; ByteArrayInputStream in; ObjectInputStream objIn; public RecordReader(String key, byte[] data) { this.key = key; this.data = data; in = new ByteArrayInputStream(data); } public String getKey() { return key; } public byte[] getData() { return data; } public InputStream getInputStream() throws IOException { return in; } public ObjectInputStream getObjectInputStream() throws IOException { if (objIn == null) { objIn = new ObjectInputStream(in); } return objIn; } /** * Reads the next object in the record using an ObjectInputStream. */ public Object readObject() throws IOException, OptionalDataException, ClassNotFoundException { return getObjectInputStream().readObject(); } }

Figure 5. RecordReader

Here's a look at the code highlights:


String key; byte[] data; ByteArrayInputStream in; ObjectInputStream objIn;

The RecordReader has four instance variables: The record's key A byte buffer that holds the record data

PDF created with pdfFactory Pro trial version www.pdffactory.com

A ByteArrayInputStream for reading the record data An ObjectInputStream used for reading serialized objects from the record

public RecordReader(String key, byte[] data) { this.key = key; this.data = data; in = new ByteArrayInputStream(data); }

The constructor initializes the key and data ields and then creates the ByteArrayInputStream off of the data buffer.
public InputStream getInputStream() throws IOException { return in; }

This method returns an input stream for reading ields from the record.
public ObjectInputStream getObjectInputStream() throws IOException { if (objIn == null) { objIn = new ObjectInputStream(in); } return objIn; }

This method returns an ObjectOutputStream for reading serialized objects. The user should use either this method or the getInputStream() method for reading data -- mixed access will produce undeined results.
public Object readObject() throws IOException, OptionalDataException, ClassNotFoundException { return getObjectInputStream().readObject(); }

This is a utility method to read the next serialized object from the objIn stream. Class DbByteArrayOutputStream DbByteArrayOutputStream is a small utility class that extends ByteArrayOutputStream in order to provide a way to write the buffer to a DataOutput without reallocating it.
package hamner.db;

PDF created with pdfFactory Pro trial version www.pdffactory.com

import java.io.*; /** * Extends ByteArrayOutputStream to provide a way of writing the buffer to * a DataOutput without re-allocating it. */ public class DbByteArrayOutputStream extends ByteArrayOutputStream { public DbByteArrayOutputStream() { super(); } public DbByteArrayOutputStream(int size) { super(size); } /** * Writes the full contents of the buffer a DataOutput stream. */ public synchronized void writeTo (DataOutput dstr) throws IOException { byte[] data = super.buf; int l = super.size(); dstr.write(data, 0, l); } }

Figure 6. DbByteArrayOutputStream

Class RecordsFileException RecordsFileException is a simple exception class that indicates that an error has occurred while performing an operation on the records ile.
package hamner.db; public class RecordsFileException extends Exception { public RecordsFileException (String msg) { super(msg); } }

Figure 7. RecordsFileException

Using the RecordsFile class


The following code creates a new RecordsFile and inserts a test record that consists of a serialized Date. The records ile is created with an initial capacity of 64. We can use this code in any application that needs to persistently store the last time that a particular resource was accessed.
RecordsFile recordsFile = new RecordsFile("testDatabase.jdb", 64); RecordWriter rw = new RecordWriter("foo.lastAccessTime"); rw.writeObject(new Date()); recordsFile.insertRecord(rw);

Later in our application, when we wish to retrieve the record, the code would be as follows:

PDF created with pdfFactory Pro trial version www.pdffactory.com

RecordReader rr = recordsFile.readRecord("foo.lastAccessTime"); Date d = (Date)rr.readObject(); System.out.println("last access was at: " + d.toString());

With this small amount of code, we have accomplished a rather important task: we have inserted an arbitrarily sized object into our ile, and then retrieved it again. When a new access to the resource takes place at a later time, we can update the contents of the record as follows:
RecordWriter rw = new RecordWriter("foo.lastAccessTime"); rw.writeObject(new Date()); recordsFile.updateRecord(rw);

Finally, if we need to remove the record from the database, we can use the deleteRecord() method:
recordsFile.deleteRecord("foo.lastAccessTime");

To see this code in action, check out the TestRecords class provided in the source distribution for this example.

Possible additions and optimizations


You could improve on our example in several ways. To improve performance you could restructure the operations that depend on iterating over the entire in-memory index by keeping a datastructure that holds pointers to the free space in the record heap. Another optimization would be to add a "quick insert" that always appends the record to the end of the ile. This approach would be useful when an entire database is being constructed from scratch; that is, when there is never free space between record data. Other improvements to reduce the number of ile accesses and, hence, improve performance include caching recently accessed records, and using a buffer to load and store index entries as a single unit, rather than calling separate reads and writes for each ield in the entry.

Conclusion
This example has shown that we can design and implement our own low-level ile format to store records. Although it was designed as a means of illustrating the uses of the RandomAccessFile class, it has practical applications in a wide range of programming situations. Here are a few possible examples that come to mind:

About the author

PDF created with pdfFactory Pro trial version www.pdffactory.com

Derek Hamner has extensive experience designing and implementing collaborative systems in Java. He is president of CSR Inc., a North Carolina startup that creates Internet classroom applications for distance learning and corporate training. Derek is also coauthor of Java Network Programming, Second Edition, which covers many related networking and I/O issues, including the Java 2 platform (formerly known as JDK 1.2), and is due for release January 1999. A persistent data cache: A RecordsFile could be used to cache arbitrary data in a ile for applications such as Web browsers, Web servers, and proxy servers, among others. A "properties" ile for arbitrary objects: The Java properties ile format, provided by class java.util.Properties is a very useful means of storing and retrieving name/value pairs. This example provides the same basic functionality, but allows the user to store binary data. Any application where the data model is too simple to justify the use of a heavyweight database API such as JDBC.

All contents copyright 1995-2010 Java World, Inc. http://www.javaworld.com

PDF created with pdfFactory Pro trial version www.pdffactory.com

You might also like