Database design guide

www.techrepublic.com

TechRepublic's database design guide


If an enterprise's data is its lifeblood, then the database design can be the most important part of an application. Volumes have been written on this topic, and entire college degree programs have been built around it. However, as has been said time and time again on TechRepublic, there's no teacher like experience. So we recently asked our members to share their experience by providing their favorite database design tips. Developer Republic's editors selected the 60 best tips from the more than 130 responses we received. We then compiled them into this document, organized into five sections for ease of reference:

Section 1: Before you build

Here are 12 tips on laying the groundwork for your project, from naming conventions to gathering business requirements.

Section 2: Table design

These 24 guidelines cover everything from fields you should include in every table to common pitfalls and how to avoid them.

Section 3: Key selection

What should your keys be? Here are 10 tips on the correct use of system-generated primary keys and when (and how) to index fields for best performance.

Section 4: Ensuring data integrity

Find out how to help your database keep itself clean and healthy. These eight tips focus on keeping bad data to a minimum.

Section 5: Miscellaneous tips

The collection of tips wraps up with everything that didn't fit into the first four sections: six general rules of thumb to make your life easier. Enjoy!

2001 TechRepublic, Inc. www.techrepublic.com. All rights reserved.

Section 1: Before you build


1. Do your homework
When designing a new database, research not only your business needs but also the existing system. Few database projects are built from scratch; there is almost always an existing system (maybe not computerized) that the organization is using to fulfill its needs. Obviously, the existing system is not perfect; otherwise, you wouldn't be building a new one. But by studying it, you may discover nuances that you would otherwise have missed. If nothing else, examining the existing system is usually good for a chuckle or two.
- Lamont Adams

I took on a side project for a local transportation company to develop a simple storage app in Access. I laid out the parameters of the project, reviewed them with the customer, previewed a working model that ran perfectly in our development environment, and finally deployed the app, which promptly developed terminal whooping cough and died right in front of my eyes! It took hours of hair pulling before I realized the company had two database apps running on the network that required explicit and very restrictive user accounts and permissions. More hours and less available hair later, a solution was finally created, using the customer's system, but not before some very embarrassing moments. Moral of the story: Do your homework, and if you're developing an app in a common environment such as Access or Interbase, dig a little deeper than the surface.
- kg

2. Define a standard object-naming scheme


Always define a strategy for naming your DB objects. For tables, at the start of a project, decide whether they will all be plural or singular and stick to it. Define simple rules for aliases of tables (for example, the first four letters from tables with one-word names, the first two from each word with two-word names, one letter from each of the first two words and two from the last for three-word tables, and so on). For work tables, prefix the table name with WORK_ and append the name of the program that uses it. For columns, use a set of rules for keys. For example, if a key is numeric, use _NO as the suffix; if it is character, use _CODE. Use standard prefixes and suffixes for column names. For instance, if you have a lot of money fields, add the suffix _AMT to each column. A useful rule for date columns is to always start the column name with DATE_.
- richard
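The alias rule described above is mechanical enough to express as a small function. The following is only a sketch: the function name table_alias is hypothetical, and it assumes that the words in a table name are separated by underscores, which the tip does not state.

```python
def table_alias(table_name: str) -> str:
    """Build a short alias from an underscore-separated table name.

    Rules taken from the tip:
      one word    -> first four letters
      two words   -> first two letters of each word
      three words -> one letter from each of the first two words,
                     two letters from the last word
    """
    words = [w for w in table_name.upper().split("_") if w]
    if len(words) == 1:
        return words[0][:4]
    if len(words) == 2:
        return words[0][:2] + words[1][:2]
    if len(words) == 3:
        return words[0][:1] + words[1][:1] + words[2][:2]
    # The tip leaves longer names open ("and so on"), so this sketch refuses them.
    raise ValueError(f"no alias rule defined for {table_name!r}")


aliases = {name: table_alias(name)
           for name in ("CUSTOMER", "ORDER_ITEM", "CUSTOMER_ORDER_ITEM")}
```

Whatever rule a team picks, generating aliases from one function (or one documented table) keeps them predictable across queries.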

Watch your naming conventions across table names, report names, and query names. It can get very confusing very fast as to what you are working with and where it is. If you insist on naming these components identically, at least identify them with table, query, or report at the beginning of the name of the object.
- rrydenm

With Microsoft Access, it is acceptable to use qry, rpt, tbl, and mod to identify objects (e.g., tbl_Employees). When I deal with SQL Server (or Oracle), I still use tbl to reference tables, but I use sp_company (currently sp_feft_) for stored procedures, because I sometimes keep copies of special ones I've written if I've found a clever way to do something, and v_ for views. When we implement SQL Server 2000, I will use udf_ (or something similar) for the functions I write.
- Timothy J. Bruce

3. Plan ahead
Back in the early 1980s, when working with an asset ledger system and a System/38, I had an opportunity to make all the date fields so they would handle the year 2000 problem without a lot of extra work. Many people said that I should forget about working on the problem because it would take too much effort to deal with it. (This was long before it became known as the Y2K problem.) I bit the bullet back then, planning ahead. It took a couple of weeks to make all the changes in the set of programs. But because of that preplanning, Y2K mods should have been minimal. (The last I heard, the programs were going strong on an AS/400 in 1995. The only problem with them at that time was the removal of comments from the source code.)
- generalist

4. Get The Data Model Resource Book


For those looking for sample models, The Data Model Resource Book by Len Silverston, W. H. Inmon, and Kent Graziano is the best data modeling book you can have. This book includes chapters on many common data areas, such as people, organizations, and work effort.
- minstrelmike

5. Think about the future, but don't forget past lessons


I have always found it useful to ask the users how they see their requirements changing in the future. This accomplishes two things: First, it gives you a good idea of where the design has to be especially flexible and avoid performance bottlenecks, and second, you know that if changes occur that were not on this list, the user group will be as surprised as you are.
- chrisdk

Remember the past as well! This is where experience really pays off, and we developers may be able to help each other by sharing our own. Even if users think that they will never need to have more than one phone number or need to separate first name from middle name, we should try to sell them on it. We all have had those "if only I'd done it this way" moments.
- dhattrem

6. Do logical design before physical design


Use logical design before diving into physical design. With the number of CASE tools available that allow design at a logical level, you can usually get a better overall understanding of what is needed in the database as a whole.
- chardove

7. Know the business


Echoing [tip number 1] a little, do not put a single table on your ER model [you do have a model, right? See tip number 9] before you are 100 percent sure about what the system is supposed to do from the client's point of view. This knowledge will save you a lot of time on the next stages. Once you know the business requirements, you can make a lot of decisions on your own.
- rangel

If I may expand on this a bit: Once you think you know the business, do a quick whiteboard ERD with the client. Use the client's terms and try to explain back to them what you think you heard. By also expressing the cardinality of each relation, in terms of may, will, must, etc., you can get the client to correct your understanding and then put in a much better starting ERD.
- teburlew

8. Create a data dictionary and ER diagram


Always take the time to create an ER diagram and data dictionary. They should contain at least the data types of each field and what the primary and foreign keys are in each table. These are time-consuming to create but are essential for other developers to understand the design. Creating them early helps avoid a lot of confusion later and allows anyone who understands databases to figure out how to retrieve data from the database.
- bgumbert

I can't stress enough how important it is to keep up-to-date documentation: ER diagrams, which are very useful for showing relationships between tables, and a data dictionary that describes what each field is used for and any aliases that may exist. Documenting SQL statements is a must as well.
- vanduin.chris.cj

9. Always create a model


A picture is worth a thousand words: Not only can the developer read and implement it, but it can be used to talk with the user. That helps promote a collaborative approach and it's less likely that large holes will be present in the first database design. The model doesn't have to be grandiose; it could even be simply handwritten on a piece of paper. Just making sure the logical relationships hang together will yield a huge benefit later on.
- Dana Daigle

10. Design from the output in


When defining database table and field requirements (inputs), first look at existing or desired reports, queries, and screens (outputs) to determine what tables and fields will be necessary to support these outputs. A simple example would be: If the customer will require a report that sorts, breaks, or subtotals by ZIP code, be sure to include a separate ZIP code field rather than lump the ZIP code into an address field.
- peter.marshall

11. Reporting tips


Understand how users will report on the data most often: batch or online? By day, week, month, quarter, or year? Consider creating summary tables if needed. System-generated primary keys are difficult to manage for reporting. Users perform lookups against secondary keys on tables with system-generated primary keys, often returning many duplicates. The performance is generally awful, and the confusion is high.
- kol

12. Make sure you understand the customer


This may seem obvious, but requirements come from the customer (think both internal and external customers here). Don't assume that what the customer writes in the requirements is what he or she really wants. Ask for the customer's interpretation of what the requirements "say" and, as development proceeds, check back with the customer to ensure that his or her needs are still being met. Invariably, an "I'll know what I want when I see it" approach will cause major rework when the database doesn't deliver something that the customer never wrote down. Worse yet, your interpretation of their requirements only belongs to you and might be incorrect.
- kgilson

Section 2: Table design and field selection


1. Remember to audit changes over time
Whenever I design a new database, I consider which data fields may change over time. The obvious example here, and one that is very commonly overlooked, is last name. Whenever I build a system to store customer information, I tend to store the last name field (and other transitory data items) in a separate table along with additional data fields for Date From and Date To, in order to track all changes to that data item.
- Shropshire Lad
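A history table of this kind might look like the sketch below, which uses Python's built-in sqlite3 module so that it can be run as is. The table and column names (customer_last_name, date_from, date_to) and the helper functions are assumptions made for illustration; the tip only specifies a separate table with Date From and Date To fields. Dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY
);
CREATE TABLE customer_last_name (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    last_name   TEXT NOT NULL,
    date_from   TEXT NOT NULL,   -- ISO date the name took effect
    date_to     TEXT             -- NULL means the name is still current
);
""")


def change_last_name(conn, customer_id, new_name, change_date):
    """Close the current name row (if any) and open a new one."""
    conn.execute(
        "UPDATE customer_last_name SET date_to = ? "
        "WHERE customer_id = ? AND date_to IS NULL",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO customer_last_name (customer_id, last_name, date_from) "
        "VALUES (?, ?, ?)",
        (customer_id, new_name, change_date),
    )


def last_name_on(conn, customer_id, on_date):
    """Return the last name that was in effect on the given ISO date."""
    row = conn.execute(
        "SELECT last_name FROM customer_last_name "
        "WHERE customer_id = ? AND date_from <= ? "
        "AND (date_to IS NULL OR date_to > ?)",
        (customer_id, on_date, on_date),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO customer (customer_id) VALUES (1)")
change_last_name(conn, 1, "Jones", "1998-03-01")
change_last_name(conn, 1, "Smith", "2001-06-15")
```

Because old rows are closed rather than overwritten, a question such as "what was this customer called when the 1999 invoice was issued?" remains answerable.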

2. Use meaningful field names


I once worked on a project I inherited from another programmer who liked to name fields using the name of the on-screen control that displayed the data from that field. That's all well and good, but unfortunately, she also liked to name her controls using some strange convention that combined Hungarian notation with the order in which she added the controls to the UI: cbo1, txt2, txt2_b, and so on. Unless you are using a system that restricts you to short field names, make them as descriptive as possible, within reason, of course. It's possible to go overboard with this. Customer_Shipping_Address_Street_Line_1 is very descriptive and meaningful, but no one would want to have to type it more than once.
- Lamont Adams

3. Use prefixes for recurring names


If you have fields of the same type (like a LastName) in multiple tables, name them with a table-specific prefix (CusLastName). This helps keep your sanity when you start doing joins.
- notoriousDOG

4. Provide auditing for time-sensitive data


For time-sensitive data, include a "last updated date/time" field. Time stamps can be useful for debugging data problems, reprocessing/reloading data by date, and purging old data.
- kol

5. Normalize and data-drive


Normalize to at least 3rd Normal Form. You will make life so much easier for yourself and others down the road if you keep your data normalized. Put as much in the database as you can. For example, if your UI accesses outside data sources (flat files, XML documents, other databases, etc.), store the connection or path information in support tables for your UI. Also, if the UI performs tasks like workflow (sends e-mails, prints letters, changes record status), store the data to generate the workflow in the database as well. It takes a little more effort up front, but if those processes are data-driven rather than hard-coded in the UI, policy changes and maintenance are much easier. In fact, if the process is data-driven, you can give much of that responsibility back to the users and let them maintain their own workflow processes and change them without coming to you.
- tduvall

6. Normalize, but don't overnormalize


For those unfamiliar with the term, normalization helps eliminate the redundancy of data in a database by ensuring that all fields in a table are atomic. There are several forms of normalization, but the Third Normal Form (3NF) is generally regarded as providing the best compromise between performance, extensibility, and data integrity. Briefly, 3NF states that:

* Each value in a table is to be represented only once.
* Each row in a table should be uniquely identifiable. (It should have a unique key.)
* No nonkey information that relies upon another key should be stored in the table.

Databases in 3NF are characterized by a group of tables storing related data that is joined together through keys. For example, a 3NF database for storing customers and their related orders would likely have two tables: Customer and Order. The Order table would not contain any information about an order's related customer. Instead, it would store the key that identifies the row containing the customer's information in the Customer table. Higher levels of normalization exist, but is more normal necessarily better? Not always. In fact, for some projects, even 3NF may introduce too much complexity into the database to be worth the rewards.
- Lamont Adams

There are many legitimate instances where a denormalized table is necessary for speed. I'm in the middle of a financial analyzer in which some 40-second queries were reduced to a couple of seconds with a denormalized table. When I have to do that, I never put the denormalized tables in the basic design. Instead, I make them derivative, so that it is always possible to regenerate the denormalized table from the original if it gets corrupted. It is not terribly difficult to keep the denormalized tables up to date with triggers and the like or even to do a union of the denormalized table to a certain date and do joins on the normalized tables later.
- epepke
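Both contributions can be illustrated together: a normalized Customer/Order pair joined through a key, plus a denormalized summary table that is purely derivative and can be rebuilt at any time. This is a sketch using Python's sqlite3 module. The names customer_order, order_summary, and rebuild_order_summary are hypothetical, and the order table is called customer_order here only because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- The order row stores only the customer's key, not the customer's details.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      INTEGER NOT NULL
);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO customer_order VALUES (10, 1, 250), (11, 1, 100), (12, 2, 75);
""")


def rebuild_order_summary(conn):
    """Regenerate the denormalized table from the normalized originals."""
    conn.executescript("""
    DROP TABLE IF EXISTS order_summary;
    CREATE TABLE order_summary AS
        SELECT c.customer_id,
               c.name,
               COUNT(o.order_id)          AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_amount
        FROM customer c
        LEFT JOIN customer_order o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name;
    """)


rebuild_order_summary(conn)
summary = conn.execute(
    "SELECT name, order_count, total_amount FROM order_summary "
    "ORDER BY customer_id"
).fetchall()
```

Since order_summary holds nothing that cannot be recomputed, dropping and rebuilding it is always safe, which is the property the second contributor relies on.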

7. Microsoft Access reporting tip


If you're using Microsoft Access, use user-friendly field names instead of coded names: Customer Name instead of txtCNaM. That way, when you use the wizards for forms and reports, the names will be something people can read, not geekspeak.
- jwoodruf

8. Inactive or unused indicator


One thing I have found helpful is to add a field to indicate if the record is no longer active in the business. Be it a customer, an employee, or a widget, it helps to be able to filter on active or inactive status when running queries. This eliminates a lot of questions when a new user is working on the data and prevents problems associated with deleting records once they are no longer used.
- theoden
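A minimal sketch of such an indicator, again with Python's sqlite3 module, follows. The column name is_active and the helper functions are assumptions for illustration; the tip does not name the field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    is_active   INTEGER NOT NULL DEFAULT 1   -- 1 = active, 0 = inactive
);
INSERT INTO employee (employee_id, name) VALUES (1, 'Ann'), (2, 'Bob');
""")


def deactivate(conn, employee_id):
    """Mark a record inactive instead of deleting it."""
    conn.execute(
        "UPDATE employee SET is_active = 0 WHERE employee_id = ?",
        (employee_id,),
    )


def active_names(conn):
    """Return only the records still active in the business."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE is_active = 1 ORDER BY employee_id"
    )
    return [r[0] for r in rows]


deactivate(conn, 2)
still_active = active_names(conn)
```

The inactive row stays in the table, so anything that references it (old orders, time sheets, and so on) keeps working.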

9. Use role entities to define columns belonging to a category


When you need to define things as belonging to a specific category or having a specific role, use a role entity intersection to create specific relations that are timebound and therefore self-documenting. Rather than having a PERSON entity with a Title field in it, why not have a PERSON entity and a PERSON_TYPE entity to describe that person? Then, when John Smith, Engineer gets promoted to John Smith, Director and finally to John Smith, CIO, all you need do is change the key of the relationship between two tables, PERSON and PERSON_TYPE, and add a date/time field to know when the change occurred. This way, your PERSON_TYPE table contains all possible types of PERSON, such as: Associate, Engineer, Director, CIO, CEO, etc. The alternative is to always change the PERSON record to reflect new titles, and you lose your audit trail as to what timeframe each individual was in which position.
- teburlew
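One way to read this tip is sketched below with Python's sqlite3 module. It is an interpretation, not the contributor's schema: the intersection table person_role, its effective_from column, and the title_on helper are hypothetical names, and the sketch records each promotion as a new dated row rather than updating a key in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE person_type (
    person_type_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL UNIQUE
);
-- Intersection of PERSON and PERSON_TYPE; each row is timebound.
CREATE TABLE person_role (
    person_id      INTEGER NOT NULL REFERENCES person(person_id),
    person_type_id INTEGER NOT NULL REFERENCES person_type(person_type_id),
    effective_from TEXT NOT NULL    -- ISO date the role began
);
INSERT INTO person VALUES (1, 'John Smith');
INSERT INTO person_type VALUES (1, 'Engineer'), (2, 'Director'), (3, 'CIO');
INSERT INTO person_role VALUES
    (1, 1, '1995-01-01'),
    (1, 2, '1998-07-01'),
    (1, 3, '2001-02-01');
""")


def title_on(conn, person_id, on_date):
    """Return the title a person held on the given ISO date, if any."""
    row = conn.execute(
        "SELECT t.title "
        "FROM person_role r "
        "JOIN person_type t ON t.person_type_id = r.person_type_id "
        "WHERE r.person_id = ? AND r.effective_from <= ? "
        "ORDER BY r.effective_from DESC LIMIT 1",
        (person_id, on_date),
    ).fetchone()
    return row[0] if row else None


title_in_1999 = title_on(conn, 1, "1999-12-31")
```

The PERSON row itself never changes when John Smith is promoted, so the audit trail of who held which position, and when, is preserved.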

10. Use generic entity names to organize data


The simplest way to organize data is by using generic names: PERSON, ORGANIZATION, ADDRESS, PHONE, etc. When you combine these or create specific unique secondary (subtype) entities of these, you can get specific. The main reason for using generic terms to start is that all business people can conceptualize in the abstract. Once you have these generic abstracts, you can get very specific in the secondaries. For instance, PERSON can be Employee, Spouse, Patient, Client, Customer, Vendor, Teacher, etc. Likewise, ORGANIZATION can be MyCompany, MyDepartment, Competitor, Hospital, Warehouse, Government, etc. And finally, ADDRESS can be Site, Location, Home, Work, Client, Vendor, Corporate, FieldOffice, etc. By using generic abstract terms to identify classes of "things," you gain the greatest flexibility in relating the data to meet business needs while at the same time reducing the amount of redundant storage you need for the data.
- teburlew

11. Remember that there may be users outside the United States
When designing a database that will be used on the Web or other international stage, remember that most countries have a different format for fields like ZIP/post codes, and some, like New Zealand, do not use these codes.
- billh

12. If it's repetitive, it needs a separate table


If you find yourself repeating an entry, make a new table and a new relationship.
- Alan Rash

13. Three useful fields that should be added to every table


* dRecordCreationDate: default to Now() in VB or GETDATE() in SQL Server.
* sRecordCreator: declare as NOT NULL DEFAULT USER in SQL Server.
* nRecordVersion: the version identifier of the record; helps to accurately interpret any missing or null data in that record.
- Peter Ritchie
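The three fields can be tried out with Python's sqlite3 module, as sketched below. SQLite is an assumption made so the example runs anywhere: it uses CURRENT_TIMESTAMP in place of GETDATE(), and because SQLite has no USER default, this sketch has the application pass the creator in explicitly. The widget table and add_widget helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widget (
    widget_id           INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    dRecordCreationDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    sRecordCreator      TEXT NOT NULL,            -- supplied by the application
    nRecordVersion      INTEGER NOT NULL DEFAULT 1
);
""")


def add_widget(conn, name, creator):
    """Insert a row; creation date and version come from column defaults."""
    cur = conn.execute(
        "INSERT INTO widget (name, sRecordCreator) VALUES (?, ?)",
        (name, creator),
    )
    return cur.lastrowid


widget_id = add_widget(conn, "sprocket", "app_user")
created, creator, version = conn.execute(
    "SELECT dRecordCreationDate, sRecordCreator, nRecordVersion "
    "FROM widget WHERE widget_id = ?",
    (widget_id,),
).fetchone()
```

Letting the database fill in the date and version means no caller can forget them.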

14. Multiple fields for Address & Phone


One line for the street address is no longer enough. Address_Line1, Address_Line2, Address_Line3 offers more flexibility. Also, telephone numbers and e-mail addresses are no longer address-specific. They probably need their own tables, with type and some kind of preferred flag.
- dwnerd

Be careful not to overnormalize, which can lead to performance problems. While separate address and phone tables commonly are best, it may be appropriate to store the preferred information in a parent table (e.g., Customer) if you will need to access it often. The trade-off between normalization and speed of access can be significant.
- dhattrem

15. Use multiple name fields


I'm amazed by how many people make name one field in a database. I tend to think that is the sign of a beginning developer, but having seen it enough times on Web sites, I'm not so sure. So enter the first name and last name as separate fields (include the middle initial field if it's appropriate); then concatenate the fields later in your queries.
- klempan

Klempan isn't the only one to notice widespread use of a single name field. You have several options for making it user-friendly. One of my favorites is to simply create a computed column in the same table that will automatically concatenate the normalized fields yet change when the data changes. However, this can get tricky when using modeling software. A view is also a great way to insulate users/developers from the tedium of concatenating fields.
- damon
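The view approach might be sketched as follows with Python's sqlite3 module. The view name person_full_name and the formatting of the middle initial (a trailing period) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,
    first_name     TEXT NOT NULL,
    middle_initial TEXT,
    last_name      TEXT NOT NULL
);
-- The view concatenates the separate fields, so queries never have to.
CREATE VIEW person_full_name AS
    SELECT person_id,
           first_name
           || CASE
                  WHEN middle_initial IS NULL OR middle_initial = '' THEN ''
                  ELSE ' ' || middle_initial || '.'
              END
           || ' ' || last_name AS full_name
    FROM person;
INSERT INTO person VALUES
    (1, 'Ada', NULL, 'Lovelace'),
    (2, 'John', 'Q', 'Public');
""")

full_names = [
    row[0]
    for row in conn.execute(
        "SELECT full_name FROM person_full_name ORDER BY person_id"
    )
]
```

Because the view is computed at query time, a change to any underlying name field shows up in full_name immediately, with nothing stored twice.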

16. Watch out for mixed-case object names and special characters
Something that has caused me grief in the past is an existing database with mixed-case object names (CustomerData). The problem I ran into was porting from Access to Oracle. I didn't want mixed-case objects, so I had to change them manually. If this database/application might someday grow to need a larger, more powerful database, use uppercase and include the underscore character for better readability (CUSTOMER_DATA). Another big no-no is putting spaces in object names.
- bfren

17. Watch out for reserved words


Make sure that none of your field names are reserved words, either for your database system or commonly used access methods. As an example, I recently ODBC-ed to a table that used DESC as the field name for description. Choke! DESC is a reserved word abbreviation for DESCENDING. A SELECT * on the table worked, but I wound up pulling a bunch of extra useless stuff across the wire.
- Daniel Jordan
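A check like this is easy to automate before a schema is created. The sketch below is deliberately small: RESERVED_WORDS is an illustrative subset chosen for this example, not a complete list for any product, and find_reserved is a hypothetical helper. A real check should load the reserved-word list published for the specific database and access method in use.

```python
# Illustrative subset only; every database publishes its own full list.
RESERVED_WORDS = {
    "ASC", "DESC", "FROM", "GROUP", "ORDER", "SELECT", "TABLE", "WHERE",
}


def find_reserved(field_names):
    """Return the proposed field names that collide with a reserved word."""
    return [name for name in field_names if name.upper() in RESERVED_WORDS]


proposed = ["cust_id", "desc", "Order", "description"]
collisions = find_reserved(proposed)
```

Here the comparison is case-insensitive, since SQL keywords are, and a name such as description passes because only the whole word is compared.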

18. Be consistent with field names and types


When naming fields and specifying their data types, be consistent. If the field is called agreement_number in one table, don't change the name to ref1 in another. If the data type is integer in one table, don't make it char in another. Remember that other people will have to work with and understand the database after you've moved to the well-paying job where you're more appreciated.
- setanta

19. Choose numeric types carefully


Beware using smallint and tinyint types in SQL. It may be tempting, but remember that the field type must accommodate any calculations you wish SQL to perform. For example, if you want to see total sales for a month, and your invoice total field is smallint, you won't be able to perform the calculation if the total is over $32,767.
- egermain
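The $32,767 figure is the upper limit of a signed 16-bit integer, which a short sketch can make concrete. This example only demonstrates the 16-bit range using Python's struct module; whether a particular database widens the result of a sum or raises an overflow error is engine-specific and is not modeled here. The invoice amounts and the fits_smallint helper are made up for illustration.

```python
import struct


def fits_smallint(value: int) -> bool:
    """Return True if value fits in a signed 16-bit integer."""
    try:
        struct.pack("<h", value)   # "h" is a signed 16-bit short
        return True
    except struct.error:
        return False


invoice_totals = [15000, 12000, 9000]   # each fits comfortably on its own
monthly_total = sum(invoice_totals)      # the sum does not
```

Each individual invoice fits, but the monthly total does not, which is exactly the situation the tip warns about: size the type for the largest calculated value, not just the largest stored one.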

20. Flag for deletion rather than delete


Include a "delete flag" field so rows can be flagged for deletion. Never delete rows individually in a relational database; always use purge programs and be careful to maintain referential integrity.
- kol

21. Avoid triggers


There are usually other ways to accomplish what a trigger does. Triggers can become gotchas later when trying to debug a problem. If you absolutely have to use a trigger, document it centrally.
- kol

22. Include a versioning mechanism


One suggestion that has always served me well is to include some mechanism in the database to determine which "version" of the database you are using. No matter how hard you try to fix requirements, over time your users' requirements will almost always change. Eventually, this may require a change in your database structure. Although you can determine what version of the database structure you are looking at by checking for the existence of new fields or indexes, I have always found it most useful to store this explicitly in a table.
- Richard Foster
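Storing the version explicitly could be as simple as the following sketch, written with Python's sqlite3 module. The table name schema_version, its columns, and the two helper functions are assumptions for illustration; the tip only says the version should live in a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schema_version (
    version     INTEGER NOT NULL,
    applied_on  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    description TEXT
);
INSERT INTO schema_version (version, description)
VALUES (1, 'initial structure');
""")


def current_version(conn):
    """Return the highest structure version recorded in the database."""
    return conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]


def record_upgrade(conn, version, description):
    """Log a structural change once its upgrade script has run."""
    conn.execute(
        "INSERT INTO schema_version (version, description) VALUES (?, ?)",
        (version, description),
    )


record_upgrade(conn, 2, "added customer e-mail table")
version_now = current_version(conn)
```

An application can then compare current_version against the version it expects at startup and refuse to run, or run an upgrade, when the two differ.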

23. Make text fields larger than you need


ID-type text fields, such as customer ID or purchase order number, should be made larger than you think you need because you'll end up needing the extra characters before long. For instance, suppose that your customer ID is based upon a 10-digit sequential value. Make the field 12 or 13 characters long instead. Does this waste space? A little, but not as much as you think: A field with three extra characters would only increase the database size by about 3 megabytes if there were a million records in it, plus a little more for a larger index. But the extra space will allow for growth without needing to restructure the entire database at some future date. How many megabytes would you be willing to sacrifice now to avoid having to restructure a dozen, or two dozen, data tables, or having to update a bunch of programs whose subroutines rely on the length of the field a year from now?
- tlundin
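The 3-megabyte estimate is simple arithmetic, spelled out below. The sketch assumes a fixed-width field stored at one byte per character and ignores index overhead; both are assumptions, and variable-width or multi-byte encodings would change the numbers. The function name is hypothetical.

```python
def extra_storage_bytes(extra_chars, record_count, bytes_per_char=1):
    """Extra bytes consumed by widening a fixed-width field."""
    return extra_chars * record_count * bytes_per_char


# Three extra characters across one million records.
extra_bytes = extra_storage_bytes(3, 1_000_000)
extra_megabytes = extra_bytes / 1_000_000
```

Under these assumptions the widened field costs 3,000,000 bytes, matching the "about 3 megabytes" in the tip.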

24. Column naming tip


We find that if you use a unique prefix with a column name for each table, it can greatly simplify the writing of SQL statements. This does have the drawback of breaking those automatic table-linking tools that link on common column names that some databases come with now, but even these tools can sometimes get the join wrong. As a simple example, consider two tables: Customer and Order. The Customer table is given prefix cu_, so its fields would be named: cu_name_id, cu_surname, cu_initials, cu_address, etc. We'll give the Order table the prefix or_ and name its fields or_order_id, or_cust_name_id, or_quantity, or_description, etc. So a trivial SQL statement to select a given row from this database looks like this:

    -- Order is a reserved word in most SQL dialects, so it is quoted here
    SELECT *
    FROM Customer, "Order"
    WHERE cu_surname = 'MYNAME'
      AND cu_name_id = or_cust_name_id
      AND or_quantity = 1;

While without those field prefixes, it would look like this:

    SELECT *
    FROM Customer, "Order"
    WHERE Customer.surname = 'MYNAME'
      AND Customer.name_id = "Order".cust_name_id
      AND "Order".quantity = 1;

There is a lot less typing involved in the first SQL statement, even in this trivial example. Expand this to a query with five tables and many more columns and this tip's usefulness becomes more apparent.
- Bryce Stenberg

Section 3: Key selection and indexing


1. Plan ahead for data mining
I learned the hard way. After our marketing department called over 80,000 contacts and filled out the necessary data on each customer (no small task, I might add), I decided to target-market certain groups of clients. When I initially designed the tables for the form fields, I tried not to add too many fields to the primary index so as to speed the database up. I then realized that specific group lookups and mining were inaccurate and slow. I rebuilt and remerged the data with the proper fields in the primary index. I found that index planning is critical: why have fax number as a primary indexed field when I want to create lookups for system type instead? I can still search by fax number, but it is not nearly as important to me as the system type. By making the latter a primary field, the database is reindexed as it is updated and searches are much faster.
- hscovell

That's the difference between indexing in an operational data store (ODS) environment vs. in a data warehouse (DW) environment. In a DW environment, you need to consider how your marketing department is going to construct their campaigns. They, not the DBA, should be defining what constitutes key information. This is a case where an architect or database marketer should analyze the structure to determine best case for both performance and validity of output.
- teburlew

2. Use system-generated primary keys


This is similar to [tip number 1], but I feel it's important to repeat. If you always design your database to use system-generated keys as the primary keys, you control the referential integrity of the database. This way, the database, not humans, controls access to each row of data stored. An added advantage in using system-generated keys for primary keys is that it is easier to identify logic flaws when going through dumps when you have a consistent key structure.
- teburlew
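A system-generated key can be demonstrated with Python's sqlite3 module, as in the sketch below. The customer table, the human-entered account_code column, and the add_customer helper are hypothetical; SQLite's AUTOINCREMENT stands in for whatever identity or sequence mechanism your database provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated by the database
    account_code TEXT NOT NULL UNIQUE,               -- entered by a person
    name         TEXT NOT NULL
);
""")


def add_customer(conn, account_code, name):
    """Insert a customer and return the key the database assigned."""
    cur = conn.execute(
        "INSERT INTO customer (account_code, name) VALUES (?, ?)",
        (account_code, name),
    )
    return cur.lastrowid


first_id = add_customer(conn, "ACME-01", "Acme")
second_id = add_customer(conn, "GLOB-01", "Globex")

# A mistyped account code can be corrected without touching the primary key,
# so rows in other tables that reference customer_id are unaffected.
conn.execute(
    "UPDATE customer SET account_code = 'ACME-02' WHERE customer_id = ?",
    (first_id,),
)
```

The human-readable code stays unique and searchable, while the relationships between tables hang on a value nobody ever types.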

3. Break up fields for indexing


Along with separating name fields and inclusion of fields to support user-defined reports, consider breaking other fields, even primary keys, into their component elements so that they may then be indexed. Indexing will increase the speed of execution of SQL and report generator scripts. For example, I routinely create reports where I have to use a SQL LIKE expression because a case number field was not separated into its basic parts of year, serial number, case type, and defendant code. Performance is generally bad, and these reports would run much faster if the year and type fields were separate indexed fields.
- rdelval
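The case number example might be broken up as sketched below, using Python's sqlite3 module. The table name court_case, the column names, the index name, and the sample rows are all assumptions; the tip names only the four components.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE court_case (
    case_id        INTEGER PRIMARY KEY,
    case_year      INTEGER NOT NULL,
    serial_number  INTEGER NOT NULL,
    case_type      TEXT NOT NULL,
    defendant_code TEXT NOT NULL
);
-- Year and type are separate columns, so they can be indexed directly.
CREATE INDEX idx_case_year_type ON court_case (case_year, case_type);
INSERT INTO court_case (case_year, serial_number, case_type, defendant_code)
VALUES (2000, 17, 'CR', 'A'),
       (2001, 3,  'CV', 'A'),
       (2001, 4,  'CR', 'B');
""")


def cases_for(conn, year, case_type):
    """Find cases by exact year and type instead of LIKE on a packed string."""
    return conn.execute(
        "SELECT serial_number, defendant_code FROM court_case "
        "WHERE case_year = ? AND case_type = ? ORDER BY serial_number",
        (year, case_type),
    ).fetchall()


criminal_2001 = cases_for(conn, 2001, "CR")
```

Equality tests on dedicated columns give the database something an index can serve, whereas a LIKE pattern against one packed case number string often forces a scan of every row.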

4. Four key rules for keys


* Always create foreign keys for linked fields.
* All keys should be unique.
* Avoid compound keys.
* A foreign key should always link to a unique key field.
- Peter Ritchie
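A declared foreign key lets the database itself reject rows that point nowhere. The sketch below uses Python's sqlite3 module; the tables and the try_insert_order helper are hypothetical, and the PRAGMA line is a SQLite detail (it enforces foreign keys only when asked), not something other databases need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- single-column unique key
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    -- The linked field is a declared foreign key to a unique key field.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO customer_order VALUES (100, 1);
""")


def try_insert_order(conn, order_id, customer_id):
    """Return True if the order was accepted, False if the key check failed."""
    try:
        conn.execute(
            "INSERT INTO customer_order VALUES (?, ?)",
            (order_id, customer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_order(conn, 101, 1)     # customer 1 exists
rejected = try_insert_order(conn, 102, 999)   # customer 999 does not
```

With the constraint in place, an order for a nonexistent customer never reaches the table, regardless of which application attempted the insert.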

5. Don't forget the index


Indexing is one of the most efficient ways to retrieve your data. Ninety-five percent of my performance and tuning issues have been resolved with an index. As a rule of thumb, I generally use a unique clustered index on the logical primary key, a unique nonclustered index on the system key (for stored procedures), and nonclustered indexes on any foreign key columns. Remember though, indexes are like salt: too much can be a bad thing. Consider how much room you have for the database, how the table is going to be accessed, and whether that access will primarily be for read or write.
- tduvall

Most databases index primary key fields automatically upon creation, but don't forget to index foreign key fields; they'll be used every time you want to run a query that shows a record from the primary table and all related tables. Also, don't index memo/notes fields and try to stay away from indexing large (many characters) fields, since this will make your indexes take up more space.
- gbrayton

6. Don't index small, high-activity tables


Do not give any keys to small tables, especially if they have high amounts of insert and delete activity. The index maintenance on those inserts and deletes may cost you more time than a table space scan.
- kbpatel

7. Never use Social Security Number (SSN) as a key


One should never use SSN as a database key. Aside from the privacy angle and the fact that the government is moving toward disallowing the use of SSN except for income-related purposes, it needs to be hand entered. Never ever use a hand-entered key as the primary key since once you enter it wrong, the only choice you have is to delete the entire record and start over.
- teburlew

When I was in college in the 1970s, I recall that the SSN was used as the student ID despite the fact that such usage was illegal. People knew that it was illegal, but they used it anyway. Decades later, as identity theft increases, the college campus I'm on now is going through the pains of removing SSN from those screens and reports that use them but don't need them. It is a major problem, mandated by the state but not funded.
- generalist

8. Take the user's keys away


When deciding which field or fields to use as keys in a table, always consider the fields that users will be editing. It's usually a bad idea to choose a user-editable field as a key. Doing so forces you to take one of these two actions:

Restrict the user from editing the field after the record's creation. If you do so, you may discover that your application isn't flexible enough when business requirements suddenly change and users need to edit that uneditable field. What happens when a user makes a mistake in data entry and doesn't notice until the record is saved? Delete and re-create? What if the record isn't re-creatable; suppose the customer left?

Provide some way of detecting and correcting key collisions. Usually, this can be done with some effort, but it is expensive in terms of performance. Also, a key correction may wind up being possible only from outside the data layer, forcing you to break the isolation between your data and business/UI layers.

The underlying maxim here is this: Make your design fit the user; don't make the user fit the design. Lamont Adams

The reason we don't make primary keys updateable is that in a relational model, they provide the links between the various tables. For example, the Customer table will have a primary key (say, CustomerID) and customers will place orders, kept in a separate table. The primary key of the Order table may well be something like OrderNo (a unique number) or a composite of OrderNo, CustomerID, and date. Whichever key you choose, you will need to store the CustomerID on the Order table to ensure that you can find the record for the customer who placed each order. If you change the CustomerID in the Customer table, you must find all related records in the Order table and change them too. Otherwise, you will have orders that don't belong to a customer, and you will upset the referential integrity of your database.

If referential integrity rules are enforced at table level, which they should be, then it can be almost impossible to change the key of one record and all associated records throughout the database without a lot of code and appending and deleting of records. This process is frequently prone to errors and should be avoided. ljboast
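As a rough illustration of tips 7 and 8, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, customer_order, member_code) are hypothetical, not taken from any contributor's system. Because the order row references a system-generated key, a mistyped hand-entered value can be corrected with a single-row update, and no related record has to be touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,    -- system-generated key, never edited
        member_code TEXT NOT NULL UNIQUE    -- hand-entered, so it stays editable
    );
    CREATE TABLE customer_order (
        order_no    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")

# The code is keyed in with a typo.
cur = conn.execute("INSERT INTO customer (member_code) VALUES (?)", ("AB-1010",))
customer_id = cur.lastrowid
conn.execute("INSERT INTO customer_order (customer_id) VALUES (?)", (customer_id,))

# Correcting the typo changes one row; the order still points at the same customer.
conn.execute(
    "UPDATE customer SET member_code = ? WHERE customer_id = ?",
    ("AB-1001", customer_id),
)

row = conn.execute("""
    SELECT c.member_code
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchone()
print(row[0])
```

Had member_code been the primary key, the same correction would have meant rewriting every order row as well.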

9. Candidate keys sometimes make the best primary key


Remember, humans are the ones who have to query the data. Although it is not always possible, if you have a candidate key, go ahead and use it as the primary key. That way, you have the value everywhere it is referenced. This keeps people using the database from having to join tables to properly filter data. On a database with tightly controlled domain tables, this overhead can be significant. If something is a true candidate key, it meets the criteria for a primary key! My point is that if you have a candidate key, such as state_code in a state table, don't create a sequential key on top of an existing key that cannot change and is unique. You've done nothing but create extra worthless data. Consider the example below. Now instead of:
SELECT COUNT(*) FROM address, state_ref WHERE address.state_id = state_ref.state_id AND state_ref.state_code = 'TN'

I do:
SELECT COUNT(*) FROM address WHERE state_code = 'TN'

If you get several of these simple joins caused by the overuse of sequential keys in a table, the overhead can really mount. Stocker
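The following sketch, written with Python's sqlite3 module, shows one way the state example could be laid out. The address and state_ref tables and the state_code column come from the tip above; the other columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state_ref (
        state_code TEXT PRIMARY KEY,   -- the candidate key is the primary key
        state_name TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        city       TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES state_ref(state_code)
    );
    INSERT INTO state_ref VALUES ('TN', 'Tennessee'), ('KY', 'Kentucky');
    INSERT INTO address (city, state_code) VALUES
        ('Nashville', 'TN'), ('Memphis', 'TN'), ('Louisville', 'KY');
""")

# The filter value lives in the address table itself, so no join is needed.
tn_count = conn.execute(
    "SELECT COUNT(*) FROM address WHERE state_code = 'TN'"
).fetchone()[0]
print(tn_count)
```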

10. Don't forget your foreign keys


Most databases index primary key fields automatically upon creation. But don't forget to index foreign key fields; they'll be used every time you want to run a query that shows a record from the primary table and its related records. Also, don't index memo/notes fields, and try to stay away from indexing large (many characters) text fields; this will make your indexes take up more space. gbrayton
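A small sketch of this advice, using Python's sqlite3 module with hypothetical customer and customer_order tables. SQLite is one system that does not create an index on a foreign key column by itself, so the index is created explicitly, and EXPLAIN QUERY PLAN is then used to check that a lookup by customer_id goes through it. Other database systems have their own plan-inspection commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_order (
        order_no    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- Index the foreign key column used to find a customer's orders.
    CREATE INDEX ix_customer_order_customer_id ON customer_order (customer_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_no FROM customer_order WHERE customer_id = ?",
    (1,),
).fetchall()

# The last column of each plan row holds the human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```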

Section 4: Ensuring data integrity


1. Use constraints to enforce data integrity, not business rules
If you are dealing with requirements that are based on business rules, they should be validated in the business layer/UI: If the business rules later change, updates need only be made in one place. If the requirements are based on the need to maintain data integrity, they should be validated through constraints in the database layer. If you do use constraints in the data layer, make sure there is a way to communicate the reason why an update failed a constraint check back to the UI, in language the user understands. Unless you have been very verbose in your field naming, field names themselves are rarely sufficient. Lamont Adams

Whenever possible, use the database system for data integrity. This includes integrity not only by design, through normalization, but also by functionality. Add triggers to ensure that data is correct when written. Do not rely on the business layer to ensure data integrity; it can't ensure cross-table (foreign key) integrity, so don't force other integrity rules onto it. Peter Ritchie
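One possible way to report a failed constraint check in the user's language is to give each constraint a name and keep a small table of messages keyed by that name. The sketch below uses Python's sqlite3 module; the order_line table, the constraint name, and the FRIENDLY_MESSAGES mapping are all hypothetical. It relies on SQLite including the constraint name in its error text, and other database systems report constraint failures differently.

```python
import sqlite3

# Hypothetical mapping from constraint names to messages the UI can show.
FRIENDLY_MESSAGES = {
    "ck_order_line_quantity_positive": "Quantity must be greater than zero.",
}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        line_id  INTEGER PRIMARY KEY,
        quantity INTEGER NOT NULL,
        CONSTRAINT ck_order_line_quantity_positive CHECK (quantity > 0)
    )
""")

def insert_line(quantity):
    """Return None on success, or a user-facing message if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO order_line (quantity) VALUES (?)", (quantity,))
        return None
    except sqlite3.IntegrityError as exc:
        for name, message in FRIENDLY_MESSAGES.items():
            if name in str(exc):
                return message
        return "The change could not be saved."

print(insert_line(5))
print(insert_line(0))
```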

2. Distributed data systems


For distributed systems, estimate your amount of data after five years (medium) or 10 years (large) before you decide whether to replicate all your data at every site or keep your data only at one place. When you transfer data to other sites, it's better to set some flags in a database field. Update your flags after the targeted sites have received your data. To carry out the transfer, write your own batch processing or scheduling program to run at specific time intervals rather than asking a user to post it at the end of the day. Copy your maintenance data, like calculation constants and interest rates, locally and set a version number to make sure that the data is the same at every site. SuhairTechRepublic
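The flag-based transfer described here might look something like the following sketch, written with Python's sqlite3 module. The rate_change table, the transferred column, and the send_to_site function are hypothetical stand-ins; a real batch job would call whatever transport the sites actually use and would be run by a scheduler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rate_change (
        change_id   INTEGER PRIMARY KEY,
        new_rate    REAL NOT NULL,
        transferred INTEGER NOT NULL DEFAULT 0   -- 0 = pending, 1 = receipt confirmed
    );
    INSERT INTO rate_change (new_rate) VALUES (4.25), (4.50);
""")

def send_to_site(rows):
    """Stand-in for the real transfer; returns the ids the remote site confirmed."""
    return [change_id for change_id, _new_rate in rows]

def run_transfer_batch():
    """Send every pending row, then flag only the rows the site confirmed."""
    pending = conn.execute(
        "SELECT change_id, new_rate FROM rate_change WHERE transferred = 0"
    ).fetchall()
    confirmed = send_to_site(pending)
    with conn:
        conn.executemany(
            "UPDATE rate_change SET transferred = 1 WHERE change_id = ?",
            [(change_id,) for change_id in confirmed],
        )
    return len(confirmed)

print(run_transfer_batch())  # first run sends both pending rows
print(run_transfer_batch())  # second run finds nothing left to send
```

Because the flag is updated only after confirmation, a failed transfer simply leaves the rows pending for the next scheduled run.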

3. Enforce referential integrity


There is no good way to eliminate bad data after it's in the database, so you should attempt to eliminate it before it's in the database. Activate the database system's referential integrity feature. This will keep your data clean but will force developers to put more time into handling error conditions. kol
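As a concrete illustration, the sketch below turns on referential integrity in SQLite through Python's sqlite3 module and shows the kind of error handling the tip warns about. The customer and customer_order tables and the place_order function are hypothetical. The PRAGMA statement is specific to SQLite, where foreign key enforcement is off unless requested for each connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_order (
        order_no    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer (customer_id, name) VALUES (1, 'Acme');
""")

def place_order(customer_id):
    """Return True if the order was stored, False if the database rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customer_order (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused an order for a customer that does not exist.
        return False

print(place_order(1))
print(place_order(99))
```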

4. Relationships
If there is a many-to-one relationship between two entities, and there is any remote possibility that it could turn into a many-to-many relationship, make it many-to-many to start with. It is harder to go from an existing many-to-one relationship to a many-to-many relationship than it is to have a many-to-many relationship to begin with. CS Data Architect
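A many-to-many relationship is commonly modeled with a junction table. The sketch below, using Python's sqlite3 module, assumes hypothetical employee and department entities: at first each employee has one department, and when the rule changes a second row is added without any schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (employee_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per employee/department pairing.
    CREATE TABLE employee_department (
        employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
        department_id INTEGER NOT NULL REFERENCES department(department_id),
        PRIMARY KEY (employee_id, department_id)
    );
    INSERT INTO employee   VALUES (1, 'Pat');
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO employee_department VALUES (1, 10);
    -- Later the rule changes and Pat joins a second department.
    INSERT INTO employee_department VALUES (1, 20);
""")

departments = [
    row[0]
    for row in conn.execute("""
        SELECT d.name
        FROM employee_department AS ed
        JOIN department AS d ON d.department_id = ed.department_id
        WHERE ed.employee_id = 1
        ORDER BY d.name
    """)
]
print(departments)
```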

5. Use views
To provide another layer of abstraction between your database and your application's code, try building views specifically for the use of your application rather than letting it access tables directly. This also provides you with a little more freedom when handling database changes. Gay Howe
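A minimal sketch of an application-specific view, using Python's sqlite3 module. The customer table, its columns, and the app_customer view are hypothetical. The application queries only the view, so the underlying table could later be split or renamed as long as the view keeps returning the same columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        internal_note TEXT               -- not meant for the application
    );
    -- The view exposes only what the application needs.
    CREATE VIEW app_customer AS
        SELECT customer_id, first_name || ' ' || last_name AS display_name
        FROM customer;
    INSERT INTO customer (first_name, last_name, internal_note)
        VALUES ('Ada', 'Lovelace', 'internal only');
""")

# Application code reads the view, never the table.
rows = conn.execute("SELECT customer_id, display_name FROM app_customer").fetchall()
print(rows)
```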

6. Plan for data retention and recovery


Think through the data retention policy and build it into the design. Design your data recovery processes up front; you will need them. Use a data dictionary that can be published to users/developers for easy data identification and be sure to document data sources. Write online updates to "update queues" that can be used later to reprocess updates in case of data loss. kol

7. Use stored procedures to let the system do the hard work


Having gone to lots of trouble to generate a high-integrity DB solution, my team (rightly!) decided to encapsulate small groups of functionally related tables by providing a suite of regular stored procedures to access each group, in order to speed up and simplify client code development. During this, we found that the common approach from 3GL coders was to trap all possible error conditions, as per standard 3GL good practice:
DECLARE @Cnt INT
SELECT @Cnt = COUNT(*) FROM [<Table>] WHERE [<primary key column>] = <new value>
IF @Cnt = 0
BEGIN
    INSERT INTO [<Table>] ( [<primary key column>] ) VALUES ( <new value> )
END
ELSE
BEGIN
    <indicate duplication error>
END

Whereas one non-3GL coder would rather do the following:

INSERT INTO [<Table>] ( [<primary key column>] ) VALUES ( <new value> )
IF @@ERROR = 2627 -- Literal error code for a primary key constraint violation
BEGIN
    <indicate duplication error>
END

The second is a lot simpler, and in fact, utilizes the power we have given the database by all that integrity-ensuring effort. Although I personally don't like the use of the embedded literals (2627), that can be easily replaced with a bit of preprocessing. Remember, the DB is not just a repository for data; it can empower and simplify your coding. a-smith
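The same idea carries over to client code. The sketch below is not the contributor's T-SQL; it is an assumed equivalent using Python's sqlite3 module, with a hypothetical part table and add_part function. The insert is attempted directly, and the primary key constraint is left to detect the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY)")

def add_part(part_no):
    """Insert directly and let the primary key constraint catch duplicates."""
    try:
        with conn:
            conn.execute("INSERT INTO part (part_no) VALUES (?)", (part_no,))
        return "inserted"
    except sqlite3.IntegrityError:
        # In this table the only constraint that can fail is the primary key.
        return "duplicate"

print(add_part("A-100"))
print(add_part("A-100"))
```

Besides being shorter, this form avoids the gap between the check and the insert in which another session could add the same key.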

8. Make use of lookups


The best way to control data integrity is to limit a user's choices. Wherever possible, have a distinct list of values for a user to select from. This will cut down on typing errors and misunderstandings and provide consistency in the data. Some common data that's good for turning into lookups: state codes, status codes, titles, etc. CS Data Architect
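A lookup can be as simple as a small reference table that feeds a pick list. In this sketch, written with Python's sqlite3 module, the status_ref and ticket tables, their codes, and the status_choices function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforce the reference
conn.executescript("""
    CREATE TABLE status_ref (
        status_code TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE ticket (
        ticket_id   INTEGER PRIMARY KEY,
        status_code TEXT NOT NULL REFERENCES status_ref(status_code)
    );
    INSERT INTO status_ref VALUES ('OPEN', 'Open'), ('CLSD', 'Closed');
""")

def status_choices():
    """Values to offer in a pick list instead of a free-text box."""
    return conn.execute(
        "SELECT status_code, description FROM status_ref ORDER BY description"
    ).fetchall()

print(status_choices())
```

The foreign key on ticket.status_code backs up the pick list, so a value that bypasses the UI is still rejected.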

Section 5: Miscellaneous tips


1. Document, document, document
Document and explain any shorthand, naming conventions, restrictions, functions, etc. nickypendragon

Use the database's facility for commenting tables, columns, triggers, etc. Yes, it is more work, but it pays huge dividends in the long run for further development, support, and tracking of modifications. chardove

Depending on what database system you use, there may be some software that will give you a decent start on the documentation. You might want to start with the largest pieces and work inward, getting more and more detailed. Or you might want to do a lifecycle walkthrough, starting when new data is entered and detailing each piece as you go. No matter how you choose to do it, always document your databases, either within the database itself or in a separate document. That way, when you come back a year from now to do "version 2," or when another developer steps in, you'll be less likely to make any blunders. mrs_helm

2. Use plain English (or whatever your language is) instead of codes
There are a number of good reasons why we use codes for things (e.g., 9935A might be the supply code for a box of ink pens, 4XF788-Q might be the accounting code for a business you buy things from). That's great, but users tend to think in English, not codes. The accountant who's been there for five years probably knows exactly who 4XF788-Q is, but the new accountant won't have a clue. When creating pull-down menus, lists, reports, etc., sort them by the plain-English names. If you need a code, show the user the plain-English names beside the codes. I also put a pop-up help statement telling the user that after they make their selection, only the code will appear. amasa

3. Keep some general information around


I have also found it most useful to have a table containing general database information. In that table, I place information such as the current version of the database, the date it was last checked/repaired (for Access), the name(s) of related design documents, customer information, etc. This provides a simple mechanism for keeping track of the database, especially useful in non-client/server situations when customers complain their database is not working as expected and e-mail or FTP the file to you. Richard Foster
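Such a table could be a simple key/value list. The sketch below uses Python's sqlite3 module; the db_info table, its keys, and the sample values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_info (
        info_key   TEXT PRIMARY KEY,
        info_value TEXT NOT NULL
    );
    -- Sample values only; a real database would record its own details.
    INSERT INTO db_info VALUES
        ('schema_version', '1.4'),
        ('last_checked',   '2001-06-30'),
        ('design_doc',     'design-notes.doc');
""")

def get_info(key):
    """Return the stored value for a key, or None if it is not recorded."""
    row = conn.execute(
        "SELECT info_value FROM db_info WHERE info_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

print(get_info("schema_version"))
```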

4. Test, test, and test again


After building or revising a database, always test the data fields with live input from users. Most importantly, run user tests and work with users to ensure that the data types you chose fit the needs and requirements of the business. Testing needs to be completed before the new database is put into live service. juneebug

5. Validating the design


A general technique for validating the database design during development is to look at the database through the prototype of the application it is supporting. In other words, for each area of the prototyped application that will eventually show data, make sure that you can look at the data model and see how the data will be extracted. jgootee

6. Access-specific design tip


For complex Microsoft Access database applications, put all of the primary tables into one database file; then add other database files that carry out specific functions relating to the original tables. Link to the primary tables in the primary file as needed to carry out those functions. Examples include data entry, data QC, statistical analysis, reports to management or governmental agencies, and various types of read-only queries. This approach simplifies assignment of user and group permissions, and it also groups and compartmentalizes application functions, making them easier to manage when it becomes necessary to modify them. Dennis Walden
