
Computing for Clinical Scientists (SBI-101)

Extract a data structure into third normal form using a formalised process. (Competency 5)
• Third normal form.
• Relational database concepts (primary key, joins, etc.).

The dataset (prepared by Stuart Hardwick) contains patient IDs and appointment information, with
four results per appointment.

S:\SAH\1st Yrs\Trainee Tutorials\SQL\MOCK_DATA.csv

Figure 1. Sample extract of the patient and appointment data (.csv format)
PatientID  | First Name | Surname    | DOB       | Gender | DOA       | TOA   | Clinician | Result1 | Result2 | Result3 | Result4
-----------+------------+------------+-----------+--------+-----------+-------+-----------+---------+---------+---------+--------
5281586483 | Corrina    | Haithwaite | 17-Mar-64 | 0      | 29-Jun-17 | 06:23 | Stuart    | 1       | 3       | 4       | 1
1674236905 | Fina       | Duffer     | 28-Jun-41 | 1      | 20-Mar-17 | 06:35 | Steve     | 3       | 4       | 3       | 0
827592388  | Karlie     | Genever    | 11-Oct-70 | 1      | 03-Jul-17 | 08:19 | Steve     | 1       | 2       | 2       | 0
7278668875 | Sidonia    | Messum     | 20-Mar-84 | 0      | 21-Jul-17 | 06:02 | Steve     | 2       | 1       | 3       | 0
9168997159 | Yorker     | Hazelhurst | 06-Jul-68 | 0      | 21-Sep-16 | 06:49 | Steve     | 2       | 1       | 3       | 0
3093140722 | Patrice    | Cocci      | 13-Nov-75 | 1      | 15-Feb-17 | 06:02 | Stuart    | 4       | 2       | 1       | 1
7184550072 | Hulda      | Elen       | 26-Apr-58 | 0      | 16-Dec-16 | 05:21 | Stuart    | 2       | 2       | 1       | 0
7643201781 | Godiva     | Kerin      | 29-Aug-89 | 1      | 11-Apr-17 | 07:17 | Matt      | 4       | 4       | 1       | 1
1116779218 | Erny       | Kensall    | 23-Feb-46 | 1      | 22-Mar-17 | 08:43 | Stuart    | 1       | 3       | 1       | 1
3578819182 | Elie       | Doig       | 22-May-45 | 0      | 21-Sep-16 | 06:48 | Steve     | 3       | 4       | 3       | 0
1793329826 | Mercy      | Stranks    | 27-Apr-73 | 0      | 11-May-17 | 08:40 | Stuart    | 4       | 4       | 2       | 0
3158715821 | Gertrudis  | Fritschel  | 10-Feb-49 | 1      | 21-Feb-17 | 08:17 | Steve     | 4       | 1       | 2       | 0

Database normalisation makes a table easier to handle, query and edit. It organises the data into
multiple related tables to reduce data redundancy, so that each fact is stored only once. Reducing
the redundancy also tends to reduce the overall size of the database.
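As a rough illustration of the redundancy described above, the sketch below counts how often each clinician name is stored in the twelve rows of Figure 1. The variable names are illustrative only; the values are copied from the Clinician column of the figure.

```python
from collections import Counter

# Clinician column of the twelve rows in Figure 1, in row order.
clinicians = [
    "Stuart", "Steve", "Steve", "Steve", "Steve", "Stuart",
    "Stuart", "Matt", "Stuart", "Steve", "Stuart", "Steve",
]

counts = Counter(clinicians)

# In the flat table a name is stored once per appointment row.
stored_flat = len(clinicians)
# In a separate clinician table each name would be stored once.
stored_normalised = len(counts)

print(counts.most_common())
print(stored_flat, stored_normalised)
```

In the flat table a clinician's name is written on every appointment, so correcting a name would mean editing many rows; a separate clinician table would hold each name once.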

Normalisation of a database is a multi-step process:

First Normal Form (1NF): each row must be unique, and each column must hold a single value per
row, with no lists or repeating groups packed into one cell. This can increase redundancy (for
example, a clinician's name is repeated on every appointment), but each row is still unique. In the
case of the patient and appointment data, we would start by ensuring that each row is unique and
that every cell under the following columns holds one value:
PatientID First Name Surname DOB Gender DOA TOA Clinician Result1 Result2 Result3 Result4
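The first normal form checks can be sketched in a few lines of Python. This is a minimal example, assuming the .csv file is comma-delimited and that a semicolon inside a cell would indicate several values packed together; both are assumptions, and `check_first_normal_form` is a hypothetical helper, not part of the tutorial materials. Three rows from Figure 1 are used as sample input.

```python
import csv
import io

# Three rows copied from Figure 1, assumed to be comma-delimited.
SAMPLE_CSV = """PatientID,First Name,Surname,DOB,Gender,DOA,TOA,Clinician,Result1,Result2,Result3,Result4
5281586483,Corrina,Haithwaite,17-Mar-64,0,29-Jun-17,06:23,Stuart,1,3,4,1
1674236905,Fina,Duffer,28-Jun-41,1,20-Mar-17,06:35,Steve,3,4,3,0
827592388,Karlie,Genever,11-Oct-70,1,03-Jul-17,08:19,Steve,1,2,2,0
"""


def check_first_normal_form(csv_text):
    """Return a list of problems; an empty list means the checks passed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        # Every row needs exactly one value per column.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} values, found {len(row)}"
            )
        # Assumed convention: ';' marks several values squeezed into one cell.
        if any(";" in cell for cell in row):
            problems.append(f"line {line_no}: a cell holds more than one value")
        # Rows must be unique as a whole.
        key = tuple(row)
        if key in seen:
            problems.append(f"line {line_no}: duplicate row")
        seen.add(key)
    return problems


problems = check_first_normal_form(SAMPLE_CSV)
print(problems)
```

An empty list means the sample passed all three checks; appending a copy of an existing row would be reported as a duplicate.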

Second Normal Form (2NF): the table must first be in first normal form and have a primary key
that uniquely identifies each row. In addition, every non-key column must depend on the whole
primary key, not on just part of it. Information that does not describe the primary key is
separated into new tables.

Third Normal Form (3NF): the table must first be in second normal form, and every non-prime
attribute must depend directly on the primary key rather than on another non-key attribute (no
transitive dependencies). Achieving third normal form may require the introduction of additional
keys (such as appointment ID or clinician ID) to provide unique identifiers for the new tables.
Because the data is spread over multiple connected tables to minimise redundancy, related values
are reached by following keys (e.g. you cannot access results from the patient name directly, but
have to use the patient ID and appointment ID to reach the results table via INNER JOIN).
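The INNER JOIN route from patient to results can be sketched with Python's built-in `sqlite3` module. The table names, column names and the appointment and results ID values below are assumptions for illustration; the patient row and result values come from the first row of Figure 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    first_name TEXT,
    surname    TEXT
);
CREATE TABLE appointment (
    appointment_id INTEGER PRIMARY KEY,
    patient_id     INTEGER REFERENCES patient(patient_id),
    doa            TEXT
);
CREATE TABLE results (
    results_id     INTEGER PRIMARY KEY,
    appointment_id INTEGER REFERENCES appointment(appointment_id),
    result1 INTEGER, result2 INTEGER, result3 INTEGER, result4 INTEGER
);
-- First row of Figure 1; appointment_id 1 and results_id 1 are made up.
INSERT INTO patient VALUES (5281586483, 'Corrina', 'Haithwaite');
INSERT INTO appointment VALUES (1, 5281586483, '29-Jun-17');
INSERT INTO results VALUES (1, 1, 1, 3, 4, 1);
""")

# Follow patient -> appointment -> results through the keys.
rows = conn.execute(
    """
    SELECT p.surname, a.doa, r.result1, r.result2, r.result3, r.result4
    FROM patient AS p
    INNER JOIN appointment AS a ON a.patient_id = p.patient_id
    INNER JOIN results AS r ON r.appointment_id = a.appointment_id
    WHERE p.patient_id = ?
    """,
    (5281586483,),
).fetchall()
print(rows)
```

The results table holds no patient details of its own; the surname appears in the output only because the two joins link the rows through patient ID and appointment ID.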

Second and third normal form are often achieved simultaneously. In the case of this patient and
appointment data, we split the original table into four tables to reduce redundancy and to give
each table a single primary key. A table that refers to another table also includes that table's
primary key as a foreign key in order to connect the tables (Figure 2).

Figure 2. Organisation of 4 separate tables in order to achieve third normal form.

Patient table:      PatientID | First Name | Surname | DOB | Gender

Appointment table:  Clinician ID | Time of Arrival | Date of Arrival | Appointment ID | Patient ID

Clinician table:    Clinician ID | Clinician Name

Results table:      Appointment ID | Results ID | Results 1 | Results 2 | Results 3 | Results 4
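The four-table layout of Figure 2 could be built along the following lines with Python's `sqlite3` module. This is a sketch under assumptions: the table and column names are adapted from the Figure 2 headings, the appointment, clinician and results IDs are generated automatically by SQLite, and `build_database` is a hypothetical helper. Three rows from Figure 1 serve as input.

```python
import sqlite3

# Rows from Figure 1: PatientID, First Name, Surname, DOB, Gender,
# DOA, TOA, Clinician, Result1..Result4.
FLAT_ROWS = [
    (5281586483, "Corrina", "Haithwaite", "17-Mar-64", 0,
     "29-Jun-17", "06:23", "Stuart", 1, 3, 4, 1),
    (1674236905, "Fina", "Duffer", "28-Jun-41", 1,
     "20-Mar-17", "06:35", "Steve", 3, 4, 3, 0),
    (827592388, "Karlie", "Genever", "11-Oct-70", 1,
     "03-Jul-17", "08:19", "Steve", 1, 2, 2, 0),
]

SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    first_name TEXT, surname TEXT, dob TEXT, gender INTEGER
);
CREATE TABLE clinician (
    clinician_id   INTEGER PRIMARY KEY,
    clinician_name TEXT UNIQUE
);
CREATE TABLE appointment (
    appointment_id  INTEGER PRIMARY KEY,
    patient_id      INTEGER NOT NULL REFERENCES patient(patient_id),
    clinician_id    INTEGER NOT NULL REFERENCES clinician(clinician_id),
    date_of_arrival TEXT,
    time_of_arrival TEXT
);
CREATE TABLE results (
    results_id     INTEGER PRIMARY KEY,
    appointment_id INTEGER NOT NULL REFERENCES appointment(appointment_id),
    result1 INTEGER, result2 INTEGER, result3 INTEGER, result4 INTEGER
);
"""


def build_database(flat_rows):
    """Split flat rows into the four tables and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    for (pid, first, surname, dob, gender,
         doa, toa, clinician, *results) in flat_rows:
        # Patients and clinicians are stored once, however often they recur.
        conn.execute(
            "INSERT OR IGNORE INTO patient VALUES (?, ?, ?, ?, ?)",
            (pid, first, surname, dob, gender),
        )
        conn.execute(
            "INSERT OR IGNORE INTO clinician (clinician_name) VALUES (?)",
            (clinician,),
        )
        (clinician_id,) = conn.execute(
            "SELECT clinician_id FROM clinician WHERE clinician_name = ?",
            (clinician,),
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO appointment "
            "(patient_id, clinician_id, date_of_arrival, time_of_arrival) "
            "VALUES (?, ?, ?, ?)",
            (pid, clinician_id, doa, toa),
        )
        # The new appointment's ID becomes the foreign key in results.
        conn.execute(
            "INSERT INTO results "
            "(appointment_id, result1, result2, result3, result4) "
            "VALUES (?, ?, ?, ?, ?)",
            (cur.lastrowid, *results),
        )
    conn.commit()
    return conn


conn = build_database(FLAT_ROWS)
for table in ("patient", "clinician", "appointment", "results"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(table, count)
```

With these three input rows the clinician table ends up with two entries, because 'Steve' is stored once even though he appears on two appointments.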

Using a UML diagram we are able to visualise the connections between the tables of the normalised
dataset. The diagram shows the type of relationship between each pair of tables:
• 1:1 relationships: the primary key of one of the tables is included as a foreign key in the other
table.
• 1:many relationships: the primary key of the table on the '1' side is added as a foreign key in the
table on the 'many' side, so one entry can be linked to more than one entry in the other table.
• Many-to-many relationships: a new table (join table) is created, whose primary key is composed
of the primary keys from the two original tables.
These relationships are represented in Figure 3.

Figure 3. EER (enhanced entity relationship) diagram representing the interactions of data tables.
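The join-table approach for many-to-many relationships can be sketched as follows. The `patient_clinician` table is hypothetical and is not one of the four tables in Figure 2; it is here only to show a composite primary key built from two foreign keys. The first two pairings follow Figure 1, and the extra pairing added at the end is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient   (patient_id   INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE clinician (clinician_id INTEGER PRIMARY KEY, clinician_name TEXT);
-- Hypothetical join table: one row per (patient, clinician) pairing.
CREATE TABLE patient_clinician (
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),
    clinician_id INTEGER NOT NULL REFERENCES clinician(clinician_id),
    PRIMARY KEY (patient_id, clinician_id)
);
INSERT INTO patient VALUES (5281586483, 'Haithwaite'), (1674236905, 'Duffer');
INSERT INTO clinician VALUES (1, 'Stuart'), (2, 'Steve');
INSERT INTO patient_clinician VALUES (5281586483, 1), (1674236905, 2);
""")


def insert_pair(patient_id, clinician_id):
    """Try to add a pairing; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO patient_clinician VALUES (?, ?)",
                (patient_id, clinician_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    insert_pair(5281586483, 2),   # new pairing (made up for illustration)
    insert_pair(5281586483, 2),   # same pairing again: composite key clash
    insert_pair(5281586483, 99),  # unknown clinician: foreign key violation
]
print(outcomes)
```

The composite primary key allows one patient to be linked to several clinicians and one clinician to several patients, while preventing the same pairing from being stored twice.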
