
4.) Column Value Management


Objectives

Upon completion of this module, the student should be able to:

Insert into a table using the DEFAULT VALUES feature.

Perform string functions using the POSITION feature.

Test case sensitivity with the LOWER function.

Rename a column of a table.

Using DEFAULT VALUES

Teradata has the ability to insert a row using only the DEFAULT VALUES keywords.
For this feature to work successfully, one of the following statements must be true
for each column of the table:

- the column has defined default values

- the column has a default system value specified

- the column permits nulls

If none of these statements is true, an insert using DEFAULT VALUES will fail. Note
that such an insert may be executed multiple times as long as no uniqueness
attributes of the table are violated.
Column Attributes:

NOT NULL - Nulls are not permitted for this column

DEFAULT 22 - Unless otherwise specified, assign the column a value of 22

DEFAULT DATE '2010-01-01' - Unless otherwise specified, assign a date of Jan 1, 2010

WITH DEFAULT - Assign the system default, spaces for char strings, zero for
numeric data types and current date for date data type

DEFAULT TIME - Assign the current time to the integer data type

DEFAULT USER - Assign the user id of the session to the character string

The command - INSERT INTO tablename DEFAULT VALUES;

Will insert defined default values into each column.

Will insert a null if no default is defined.

Will fail if no default is defined and a null is not allowed.

Create a Test table


CREATE TABLE test_tbl
(cola SMALLINT NOT NULL DEFAULT 22
,colb CHAR(1)
,colc DATE DEFAULT DATE '2010-01-01'
,cold DEC(3,2) NOT NULL WITH DEFAULT
,cole TIME(0) DEFAULT CURRENT_TIME
,colf INT DEFAULT TIME
,colg CHAR(8) DEFAULT USER);

Populate the test table


INSERT INTO test_tbl DEFAULT VALUES;

SELECT * FROM test_tbl;

cola  colb  colc      cold  cole      colf    colg
----  ----  --------  ----  --------  ------  --------
  22  ?     10/01/01   .00  15:27:31  152731  TD036

INSERT INTO test_tbl DEFAULT VALUES;

SELECT * FROM test_tbl;

cola  colb  colc      cold  cole      colf    colg
----  ----  --------  ----  --------  ------  --------
  22  ?     10/01/01   .00  15:27:31  152731  TD036
  22  ?     10/01/01   .00  15:27:43  152743  TD036

Defaulting Methods

Defaulting of data values may also be accomplished by the use of positional
commas in an INSERT statement. The positional commas indicate the use of a
default value if one is specified and a null if one is not. If neither outcome is
possible, an error is returned.
Let's look at the traditional method of defaulting values.

The INSERT
INSERT INTO test_tbl VALUES (,,,,,,);

The SELECT
SELECT * FROM test_tbl;

cola  colb  colc      cold  cole      colf    colg
----  ----  --------  ----  --------  ------  --------
  22  ?     10/01/01   .00  15:27:31  152731  TD036
  22  ?     10/01/01   .00  15:27:43  152743  TD036
  22  ?     10/01/01   .00  15:33:13  153313  TD036

While it is possible to alter a column definition via an ALTER TABLE statement, care
must be used to not add attributes which are in conflict with existing attributes or
existing data. Adding a new column with a NOT NULL attribute to a table returns an
error if the table already has rows defined. The new column must initially be set to
either a null or a value for the existing rows.

The ALTER TABLE


ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL;
***Failure 3559 Column COLH is not NULL and it has no default value.
By adding the WITH DEFAULT phrase, the NOT NULL attribute is permitted. The new
column being added will carry the system default value initially.

The Correct ALTER TABLE


ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL WITH DEFAULT;

The INSERT
INSERT INTO test_tbl DEFAULT VALUES;

The SELECT
SELECT * FROM test_tbl;

cola  colb  colc      cold  cole      colf    colg      colh
----  ----  --------  ----  --------  ------  --------  ----
  22  ?     10/01/01   .00  15:27:31  152731  TD036        0
  22  ?     10/01/01   .00  15:27:43  152743  TD036        0
  22  ?     10/01/01   .00  15:33:13  153313  TD036        0
  22  ?     10/01/01   .00  15:38:42  153842  TD036        0

Creating a Table
CREATE TABLE abc (a INT NOT NULL);

The INSERT
INSERT INTO abc DEFAULT VALUES;
***Failure 3811 Column 'a' is NOT NULL. Give the column a value.

Creating Tables in Teradata Mode

Tables may be created in either Teradata mode or ANSI mode. Tables created in
Teradata mode will have all character columns defined as NOT CASESPECIFIC by
default. This means that the data will be stored in the column in the same case that
it was entered. It will also be returned to the user in this stored case; however, all
testing against the column will ignore case specificity.
The default comparison operation for non-casespecific columns is always
non-casespecific.

CREATE a Table
CREATE TABLE case_nsp_test
(col1 CHAR(7)
,col2 CHAR(7));
(Columns created in Teradata mode are defaulted to NOT CASESPECIFIC)

SHOW the Table


SHOW TABLE case_nsp_test;
CREATE SET TABLE PED.case_nsp_test ,NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC,
col2 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX (col1);

INSERT into the Table


INSERT INTO case_nsp_test VALUES('LAPTOP','laptop');

SELECT from the Table


SELECT * FROM case_nsp_test WHERE col1=col2;

Result

col1     col2
-------  -------
LAPTOP   laptop

Because both columns are defined as NCS, all comparison tests will be done NCS.

Creating ANSI Mode Tables

Tables created in ANSI mode will have character columns defaulted to
CASESPECIFIC. As before, data will be stored and retrieved in the case of the
original INSERT; however, all tests and comparisons of the data will be done in
casespecific mode.

In order to do non-casespecific testing, also called case blind testing, it is
necessary to apply a function such as the UPPER function to both sides of the
comparison, thus rendering case a non-factor in the test.
Example
Initiate an ANSI session.
.SET SESSION TRANSACTION ANSI;
.LOGON L7544/tdxxx;

CREATE a Table
CREATE TABLE case_sp_test
(col1 CHAR(7)
,col2 CHAR(7));

(Columns created in ANSI mode are defaulted to Casespecific.)


SHOW the Table
SHOW TABLE case_sp_test;

CREATE MULTISET TABLE tdxxx.case_sp_test,NO FALLBACK,


NO BEFORE JOURNAL,
NO AFTER JOURNAL
( col1 CHAR(7) CHARACTER SET LATIN CASESPECIFIC,
col2 CHAR(7) CHARACTER SET LATIN CASESPECIFIC)
PRIMARY INDEX (col1);
INSERT into the Table
INSERT INTO case_sp_test VALUES('LAPTOP','laptop');

SELECT from the Table


SELECT * FROM case_sp_test WHERE col1=col2;
***Query completed. No rows found.

SELECT using UPPER


SELECT * FROM case_sp_test WHERE UPPER(col1)=UPPER(col2);
col1     col2
-------  -------
LAPTOP   laptop

In ANSI, it is necessary to perform a case blind test to do non-case specific testing.

The LOWER Function (1 of 2)

The LOWER function behaves similarly to the UPPER function but in the reverse
direction. LOWER may be used as the choice for case blind test, just as UPPER may.
The LOWER function:

Allows case blind testing on case specific strings.

Allows storage and retrieval of lower case characters.

Example

SELECT * FROM case_sp_test WHERE LOWER(col1)=LOWER(col2);

col1     col2
-------  -------
LAPTOP   laptop

Note: Case blind test.

UPDATE the Table


UPDATE case_sp_test SET col2=col1;

SELECT * FROM case_sp_test;

col1     col2
-------  -------
LAPTOP   LAPTOP

Both LOWER and UPPER may be used to change the stored contents of a column from
one case to another. This may be done via an UPDATE statement which applies the
function to the updated column values.

UPDATE the Table


UPDATE case_sp_test SET col1=LOWER(col1);

SELECT * FROM case_sp_test;

col1     col2
-------  -------
laptop   LAPTOP
The LOWER function provides the reverse capabilities of the UPPER function.

The LOWER Function (2 of 2)


A third way to accomplish a case blind test is via use of the NOT CASESPECIFIC
attribute applied to the column test. This may be abbreviated to NOT CS and is
applied to the column within parentheses following the column name. Unlike UPPER
and LOWER, NOT CS is not a function but rather an attribute of the column, to be
applied to the column as part of the test in this situation.


Note that the use of NOT CS, while accomplishing the same case blind test, is not
considered ANSI standard syntax. If ANSI standard compliance is required, the case
blind test should be done using either LOWER or UPPER.
Example
Let's INSERT a second row into the table.

INSERT INTO case_sp_test VALUES ('LAPTOP','LAPTOP');


SELECT * FROM case_sp_test;

col1     col2
-------  -------
laptop   LAPTOP
LAPTOP   LAPTOP

SELECT * FROM case_sp_test WHERE col1=col2;

col1     col2
-------  -------
LAPTOP   LAPTOP

Note: Case sensitive result.

SELECT * FROM case_sp_test WHERE col1=LOWER(col2);

col1     col2
-------  -------
laptop   LAPTOP

SELECT * FROM case_sp_test WHERE col1(NOT CS)=col2(NOT CS);

col1     col2
-------  -------
laptop   LAPTOP
LAPTOP   LAPTOP

Note: Case blind test but non-ANSI syntax.

SELECT LOWER(col1) FROM case_sp_test;

Lower(col1)
-----------
laptop
laptop

Note: Convert display to lowercase.

POSITION Function

The POSITION function is the ANSI standard form of the INDEX function of Teradata
SQL. They are both used for locating the position of a string within a string.
Both functions require two arguments, the column or character string to be tested,
and the character or string of characters to be located.
With the INDEX function, the two arguments are separated with a comma.
With the POSITION function, the more English-like IN keyword is used in place of a
comma.
While both functions will continue to be available for compatibility purposes, it is
suggested that future coding be done with POSITION.
Example

SELECT INDEX ('laptop','p');

Index('laptop','p')
-------------------
                  3

SELECT INDEX ('laptop','top');

Index('laptop','top')
---------------------
                    4

The POSITION function is the ANSI standard function for locating a string within a
string.

SELECT POSITION ('p' IN 'laptop');

Position('p' in 'laptop')
-------------------------
                        3

SELECT POSITION ('top' IN 'laptop');

Position('top' in 'laptop')
---------------------------
                          4
Both POSITION and INDEX are available functions, but only POSITION is
ANSI standard.

POSITION and Case Sensitivity

Care must be exercised in executing the same scripts in both ANSI and Teradata
session mode, particularly where issues of case sensitivity are involved. This may
become very significant in using the POSITION function which will or will not
attempt to match case depending on the session type.

Note that in three of the five examples shown here, the ANSI session produces a
different result than the Teradata session.
Examples
Note: "ANSI result" implies an ANSI transaction session mode. "Teradata result"
implies a BTET transaction session mode.

SELECT POSITION ('top' IN 'laptop');
    ANSI result: 4    Teradata result: 4

SELECT POSITION ('TOP' IN 'laptop');
    ANSI result: 0    Teradata result: 4

SELECT POSITION (upper('top') IN 'laptop');
    ANSI result: 0    Teradata result: 4

SELECT POSITION (lower('TOP') IN 'laptop');
    ANSI result: 4    Teradata result: 4

SELECT POSITION (lower('TOP') IN 'LAPTOP');
    ANSI result: 0    Teradata result: 4

Renaming Columns

Columns in a table may be renamed using the ALTER TABLE command. In order to
qualify for renaming, a column must not be referenced by any external objects and
must be assigned a new name not already in use by the table.
A column which participates in any index is not a candidate for renaming. Likewise,
a column which is either a referenced or referencing column in a referential
integrity constraint may not be renamed.
Note that renaming a column does not cascade the new name to any macros or
views which reference it. The views and macros will no longer function properly
until they have been updated to reflect the new name.
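As a quick illustration (hypothetical object names, not objects used elsewhere in
this module), a view referencing a renamed column fails until the view is replaced:

CREATE VIEW dept_v AS SELECT dept_no FROM dept_tbl;
ALTER TABLE dept_tbl RENAME dept_no TO department_number;
SELECT * FROM dept_v;    /* fails - the view still references dept_no */
REPLACE VIEW dept_v AS SELECT department_number FROM dept_tbl;
SELECT * FROM dept_v;    /* works again */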
A column may be renamed to a different name provided that:

The new name doesn't already exist in the table.

The column is not part of an index.

The column is not part of any referential integrity constraints.

The affected column is not referenced in the UPDATE OF clause of a trigger.

Example

CREATE a Table
CREATE TABLE rename_test
(col1 INT
,col2 INT
,col3 INT)
UNIQUE PRIMARY INDEX (col1)
,INDEX (col3);

Now ALTER it
ALTER TABLE rename_test RENAME col2 TO colb;

SHOW the Table


SHOW TABLE rename_test;
CREATE SET TABLE PED.rename_test ,NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1 INTEGER,
colb INTEGER,
col3 INTEGER)
UNIQUE PRIMARY INDEX (col1)
INDEX (col3);

ALTER it Again
ALTER TABLE rename_test RENAME col1 TO cola;
Result

Failure: Column COL1 is an index column and cannot be modified.

ALTER it Again
ALTER TABLE rename_test RENAME col3 TO colc;
Result

Failure: Column COL3 is an index column and cannot be modified.

Lab

Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com

Be sure to change your default database to the Customer_Service


database in order to run these labs.
Click on the buttons to the left to see the answers.

A. Show the first and last name of any employee in the employee table who
uses an initial followed by a period, instead of his/her first name. Use
position to solve this.
B. Show any employee who has an upper case letter B in his first name, but
in a position other than the first position. Use the POSITION function in
an ANSI mode session to solve this. While in ANSI mode, use the LIKE
operator to solve this problem again. Try these solutions in Teradata
(BTET) mode. Do they produce the same result? Is there a solution to this
problem in Teradata mode?
C. Show first and last name of any employee who has the letter 'a' in the
same position in both their first and last name. (Return to BTET mode
first.)
D. Display first names in lowercase and last names in uppercase for
employees whose last name begins with the letter R. Use Position to solve
this.
E. Create a small table according to the following definition.
   Hint: For the remainder of these labs, it will be helpful to reset your
   default database to your userid. (DATABASE tdxxx;)

   CREATE TABLE rename_tbl
   (col1 INT
   ,col2 INT
   ,col3 INT);

F. Populate the table with one row as follows. INSERT INTO rename_tbl (1,2,3);
G. Create a view to access this table as follows:

   CREATE VIEW rename_view AS
   SELECT col1 AS vcol1 ,col2 AS vcol2
   FROM rename_tbl;

H. Select the row via the view.
I. Attempt to rename col1 of rename_tbl to be colA. Then attempt to rename col2
   of rename_tbl to be colB, select the row from the view again, replace the view
   with the renamed column, and select the row from the view one more time.

5.) Identity Columns and Key Retrieval


Objectives

After completing this module, you should be able to:

Use the Identity Column feature to generate an identity column for a table.

Use this feature to implement a Primary Key column.

Use this feature to implement a Unique column.

Automatically retrieve generated identity column values after row insertion.

Identity Column Features

The Generated Identity Column feature permits the automated generation of a
column value based on a prescribed sequencing and interval set. A typical
application of this feature would be to generate the values associated with a
system-assigned primary key.
The following are options which can be used with this feature:
GENERATED ALWAYS - will always generate a value for this column, whether or not
a value has been specified.
GENERATED BY DEFAULT - will generate a value for this column only when
defaulting or a null is specified for the column value.
START WITH - the value which will be used to start the generated sequence.

(default is 1)
INCREMENT BY - the interval which is used for each new generated value. (default
is 1)
MINVALUE - the smallest value which can be placed in this column (default is the
smallest value supported by the data type of the column)
MAXVALUE - the largest value which can be placed in this column (default is the
largest value supported by the data type of the column)
CYCLE - after the maxvalue has been generated, restart the generated values
using the MINVALUE.
Only numeric data types from the following list may be used for identity columns:

INTEGER

SMALLINT

BYTEINT

DECIMAL

NUMERIC

The identity column feature only supports whole numbers.
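As a sketch of how the options above combine (hypothetical table, not used
elsewhere in this module):

CREATE TABLE badge_seq
(badge_id INTEGER GENERATED ALWAYS AS IDENTITY
     (START WITH 100
      INCREMENT BY 10
      MINVALUE 100
      MAXVALUE 1000
      CYCLE)
,owner_name CHAR(20))
UNIQUE PRIMARY INDEX (badge_id);

This definition would generate 100, 110, 120, and so on; once 1000 is reached, the
CYCLE option restarts the sequence at the MINVALUE of 100.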

Use of ALWAYS, MAXVALUE and CYCLE


Create a table with an ALWAYS system-generated unique primary index with a
maximum value of 3.
CREATE SET TABLE test_idCol ,FALLBACK
(
Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3)
,Col2 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX ( Col1 );
Populate the table as follows:
INSERT INTO test_idcol VALUES (, 9);
INSERT INTO test_idcol VALUES (NULL, 9);
INSERT INTO test_idcol VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

SELECT * FROM test_idCol ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            9
          3            9

Things to notice:
A value is generated regardless of whether:

a default is specified

a null is specified

a value is specified

When a value is specified and replaced, a warning is given.


Now, let's add another row.
INSERT INTO test_idcol VALUES (7, 9);
***Failure 5753 Numbering for Identity Column Col1 is over its limit.

It was not possible to add another row, because the MAXVALUE has been achieved.
In fact, no new row may be added to this table ever again.
Remove the rows from the table.
DELETE FROM test_idCol;
Now try to add the additional row.
INSERT INTO test_idcol VALUES (7, 9);
***Failure 5753 Numbering for Identity Column Col1 is over its limit.

Note that even after you delete all rows from the table, you cannot add new rows if
you have exceeded MAXVALUE and you have not specifed the CYCLE option. You
must drop and recreate this table in order to add rows.
Create another table similar to the previous one, but with the CYCLE option.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3 CYCLE),
Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Again, insert the first three rows as follows:

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            9
          3            9

Now, attempt to execute the same three insert statements a second time.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
-2147483647            9
-2147483646            9
-2147483645            9
          1            9
          2            9
          3            9

Notice what happens here:

After hitting the MAXVALUE, it reverts to the default minimum value.

It does not revert to the START WITH value, which is 1.

The default minimum value for an integer is approximately negative two billion.

Each subsequent insert increments the value by the default increment of 1.

Use of ALWAYS, MINVALUE and CYCLE


Drop and recreate the table with a MINVALUE of 1 specified.

DROP TABLE test_id3;
CREATE SET TABLE test_id3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3 CYCLE),
Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            9
          3            9

Now, insert a 4th row.

INSERT INTO test_id3 VALUES (NULL, 9);
***Failure 2801 Duplicate unique prime key error in PaulD.TEST_ID3.

What happened here?

The fourth row would have made Col1 = 1 due to the CYCLE option.

This violates the uniqueness of the primary index Col1, thus it is rejected.

Drop and recreate the table with a MINVALUE specified and with a non-unique
primary index.

DROP TABLE test_id3;
CREATE SET TABLE TEST_ID3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3 CYCLE),
Col2 INTEGER)
PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

Now try to add the 4th row.

INSERT INTO test_id3 VALUES (NULL, 9);
***Failure 2802 Duplicate row error in PaulD.TEST_ID3.

What happened this time?

The fourth row would have had values (1,9) due to the CYCLE option.

This violates the 'no duplicate row' rule, thus it is rejected.

The following rows are inserted.

INSERT INTO test_id3 VALUES (, 7);
INSERT INTO test_id3 VALUES (NULL, 7);
INSERT INTO test_id3 VALUES (6, 7);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number.

SELECT * FROM test_id3 ORDER BY 2,1;

       Col1         Col2
-----------  -----------
          1            7
          2            7
          3            7
          1            9
          2            9
          3            9

All rows are inserted successfully and the identity column recycles.

Handling Gaps in the Sequence


Now, let's drop and recreate the table as previously with a Unique Primary Index.

DROP TABLE test_id3;
CREATE SET TABLE test_id3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 5 CYCLE),
Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a
system-generated number

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            9
          3            9

Remove the second row inserted.

DELETE FROM test_id3 WHERE Col1 = 2;

Insert an additional row.

INSERT INTO test_id3 VALUES (NULL, 9);

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          3            9
          4            9

What happened here?

Gaps in the sequence are not filled.

The column always knows what its last used value is.

It increments that value and attempts to assign the next value.

Now, add the following two additional rows.

INSERT INTO test_id3 VALUES (, 9); - adds the (5,9) row
INSERT INTO test_id3 VALUES (, 9); - fails

*** Failure 2802 Duplicate row error in TEST_ID3.

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          3            9
          4            9
          5            9

Things to notice:

The final insert intended to insert a (1,9) row.

This is because the MAXVALUE is 5 and CYCLE was specified.

It couldn't do that since the (1,9) row already exists.

Insert an additional row.

INSERT INTO test_id3 VALUES (NULL, 8); - adds (2,8)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            8
          3            9
          4            9
          5            9

Note that the increment occurs, even though the insert of 1 previously
failed.

Generating BY DEFAULT (1 OF 2)
Now let's switch to BY DEFAULT mode. This mode will generate a value only when a
value is not explicitly expressed.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(
Col1 INTEGER GENERATED BY DEFAULT AS IDENTITY
(MINVALUE 1 MAXVALUE 5 CYCLE),
Col2 INTEGER )
PRIMARY INDEX ( Col1 );

Add two rows to the table.

INSERT INTO test_id3 VALUES (1, 9);
INSERT INTO test_id3 VALUES (3, 9);

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          3            9

Note that no identity columns were generated. The values were provided
explicitly.
INSERT INTO test_id3 VALUES (, 9);
*** Failure 2802 Duplicate row error in TEST_ID3.
Things to notice:

This insert attempted to add the row (1,9) again.

This is because it used the START WITH value of 1.

Because this row already exists, the insert fails.

Add the following row.

INSERT INTO test_id3 VALUES (, 8); - adds (2,8)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            8
          3            9

Notice, 2 is the generated value. The insert of the value 1 previously failed,
and it will not be reused until the sequence cycles.

Now, add the following row.

INSERT INTO test_id3 VALUES (6, 9); - adds (6,9)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            8
          3            9
          6            9

Note, added values are explicit, thus no defaulting.

Insert the following three additional rows.

INSERT INTO test_id3 VALUES (, 8); - adds (3,8)
INSERT INTO test_id3 VALUES (, 8); - adds (4,8)
INSERT INTO test_id3 VALUES (, 8); - adds (5,8)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            9
          2            8
          3            8
          3            9
          4            8
          5            8
          6            9

Note, the last generated value, prior to these inserts, was 2.


Add the following row.

INSERT INTO test_id3 VALUES (, 7); - adds (1,7)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            7
          1            9
          2            8
          3            8
          3            9
          4            8
          5            8
          6            9

Note that the recycle has begun by reverting back to the MINVALUE of 1.

Generating BY DEFAULT (2 OF 2)

Consider that the last row generated for this table is the (1,7) row.

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            7
          1            9
          2            8
          3            8
          3            9
          4            8
          5            8
          6            9

Now remove all rows from the table.

DELETE FROM test_id3;

Add the following row to the table.

INSERT INTO test_id3 VALUES (, 6);

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          2            6

Things To Notice:

The id generation picks up where it left off at 2.

Emptying the table does not reset to the START WITH value.

Add the following row to the table.

INSERT INTO test_id3 VALUES (4, 6);

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          2            6
          4            6

Note that no identity columns were generated. The value 4 was provided
explicitly.

Now insert the following three rows.

INSERT INTO test_id3 VALUES (, 5); - adds (3,5)
INSERT INTO test_id3 VALUES (, 5); - adds (4,5)
INSERT INTO test_id3 VALUES (, 5); - adds (5,5)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          2            6
          3            5
          4            5
          4            6
          5            5

Note that another 4 row is generated even though one already exists.
The generating mechanism operates independently of pre-existing rows.

INSERT INTO test_id3 VALUES (, 5); - adds (1,5)

SELECT * FROM test_id3 ORDER BY 1;

       Col1         Col2
-----------  -----------
          1            5
          2            6
          3            5
          4            5
          4            6
          5            5

Note, the recycle begins again.

Rules and Restrictions Of Identity Columns

Identity Column as an Attribute

An Identity Column (IC) is considered an attribute of a column.

You cannot drop (or modify) the IC attribute of a column.

You can drop an IC column from a table.

An IC attribute cannot co-exist with any of the following attributes:

DEFAULT
BETWEEN
COMPRESS
CHECK
REFERENCES
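A minimal sketch of what these rules permit (hypothetical table): the IC attribute
itself cannot be dropped or modified, but the entire column can be dropped,
provided no other rule (such as membership in an index) blocks it.

CREATE TABLE audit_log
(log_seq INTEGER GENERATED BY DEFAULT AS IDENTITY
,msg VARCHAR(100))
PRIMARY INDEX (msg);

ALTER TABLE audit_log DROP log_seq;  /* allowed - drops the whole IC column */
/* There is no ALTER TABLE syntax to remove only the IC attribute of log_seq. */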

Some Rules for Identity Columns (ICs)

Only one IC is permitted per table.

It can be any column in the table with some exceptions.

IC's cannot be any part of any of the following:

Composite indexes (primary or secondary)

Hash or Join Indexes

Partitioned Primary Indexes

Value-ordered Indexes

Upserts are not supported on tables where the PI is an IC.

Column compression cannot be specified with ICs.

Bulk Row Inserts with Identity Columns (ICs)

IC's are not supported for load utilities Fastload and Multiload.

IC's with multi-session BTEQ or multi-statement TPUMP are permitted.

Bulk inserts across multiple sessions cannot guarantee that the sequence of
the IC numbers will correlate to the sequence of the specified INSERT
statements.

Bulk inserts done via INSERT SELECT also cannot guarantee that the
sequence of the assigned IC's will be unbroken. This is because each AMP
pre-allocates a range of numbers based on a pre-defined interval (specified
in the DBS Control Record). Consequently each AMP will provide its own
sequence independently of the others.

Auto-Generated Key Retrieval

History
Prior to the auto-generated key retrieval feature, there was no simple way to
determine the value assigned to the identity column for an INSERTed row of a
table. If there was a unique column, or a unique combination of columns, with a
USI assigned, then the user could do a SELECT of the identity column, qualifying
on the unique column(s). This required an additional query request and would, in
some cases, require an all-AMP operation, and therefore was considered inefficient.
It also presented a bigger problem in the case of INSERT-SELECT which usually
adds multiple rows to the table in question.

Business Value
Having the IdCol values automatically returned enhances applications that require
quick or immediate retrieval of assigned identity values.

Examples

To enable this feature, use the new .SET command in BTEQ:

.[SET] AUTOKEYRETRIEVE [OFF|COLUMN|ROW]


Where:
OFF = Disabled
COLUMN = Enabled, display IDCol only
ROW = Enabled, display entire row

Example
Let's say we have the following CREATE TABLE statement:

CREATE TABLE customerDetails


(custID INTEGER GENERATED ALWAYS AS IDENTITY,
custName VARCHAR (30),
city VARCHAR (20),
phoneNo INTEGER
);
With Teradata Database V2R6.1 and prior, we enter the following:

INSERT INTO CustomerDetails
(, 'John', 'London', 919866234567);

And we receive the following results:

*** Insert completed. One row added.
*** Total elapsed time was 1 second.

We now have no idea what value was assigned to the customer without doing a
separate retrieve of this row. In Teradata V2R6.1 and prior, quick retrieval of the
new custID was possible only if a USI was present on the table. It then required a
separate SELECT of the identity column, using a qualifying condition on the unique
column(s).
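For example, assuming (hypothetically) that a USI existed on custName, the
pre-V2R6.2 approach would require a second request:

SELECT custID
FROM customerDetails
WHERE custName = 'John';  /* an extra round trip just to learn the generated key */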
In Teradata Database V2R6.2, we can enter the following:
.SET AUTOKEYRETRIEVE ROW;

INSERT INTO customerDetails VALUES
(, 'George', 'London', 919848123);

And we receive the following results:

*** Insert completed. One row added.
*** Total elapsed time was 1 second.

custID  custName  city    phoneno
------  --------  ------  ---------
     2  George    London  919848123

Note: If no .SET AUTOKEYRETRIEVE statement is used, the system functions
as before; no rows or IdCol values are displayed.

Considerations and Limitations

Considerations
INSERTs have the additional cost of row retrieval if Auto Generated Key Retrieval
(AGKR) is requested. However, the additional retrieval takes less time overall
compared with having to run a separate SELECT to retrieve the identity value.

Limitations
The following limitations apply:

Lab

This feature supports explicit single INSERT and INSERT SELECT


statement only. It does not support any other form of inserts, e.g., Upsert,
MERGE-INTO, Triggered Inserts, Multiload and Fastload.

Iterated INSERTs have to adhere to the 2048 spool limit of the Array
Support feature. A max of 1024 iterations is possible as each iteration uses
an AGKR spool and a response spool.

This feature is enabled through the Client, e.g., JDBC driver or BTEQ. If the
Client version does not include the AGKR capability, then there will be no
AGKR response for INSERT requests.

Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).

Click on the buttons to the left to see the answers.

The lab problems in this module use the following table which
exists in the Customer_Service database.

CREATE SET TABLE agent_sales ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(agent_id INTEGER,
sales_amt INTEGER)
UNIQUE PRIMARY INDEX ( agent_id );

In this lab, we will attempt to assign a badge to each agent. Each
badge must be given a sequentially unique id as it is assigned to an
agent. We will use an identity column to accomplish this.

1.) Create a new table in your own database using the following text:

CREATE SET TABLE agent_badge ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(badge_id INTEGER GENERATED ALWAYS AS IDENTITY,
agent_id INTEGER)
UNIQUE PRIMARY INDEX ( agent_id);
2a.) Populate this table from the agent_sales table in the Customer_Service
database using an INSERT SELECT. The first column should be assigned a
NULL and the second column should select the agent_id from the agent_sales
table. To keep the example concise, only populate this table with agents whose id
is less than 40.
2b.) Verify the resulting table. Order the rows by badge_id. Are the identity
column values what you might have expected?
3a.) We will now try to recreate the agent_badge table with sequentially assigned
ids starting with number one and with no gaps. First, drop and recreate the table.
3b.) Now, set up and execute a BTEQ EXPORT script which exports the
agent ids to a file called agent_exp.


3c.) Now, set up and execute a BTEQ IMPORT script which imports values from
the agent_exp file into the agent_badge table.
3d.) Verify the results by selecting all rows of the agent_badge table, ordered by
badge_id.

6.) Multi-Column Compression


Objectives

After completing this module, you should be able to:

Implement column compression on a table using the ALTER TABLE command.

Recognize the benefits of column compression implemented with ALTER TABLE.

Implement multi-value column compression for table columns.

Recognize certain benefits, considerations and limitations of multi-value
column compression.

Multiple Value Column Compression

The COMPRESS phrase allows values in one or more columns of a permanent table to
be compressed to zero space, thus reducing the physical storage space required for
a table.
The COMPRESS phrase has three variations:

Compression Option                    What Happens
------------------------------------  --------------------------------------
COMPRESS                              Nulls are compressed.
COMPRESS NULL                         Nulls are compressed.
COMPRESS <constant>,<constant>,...    Nulls and the specified <constant>
                                      value(s) are compressed.

Note: COMPRESS & COMPRESS NULL mean the same thing.


The last of these three options allows up to 255 values to be compressed for a
single column.
Example 1
An example of single-value column compression:

CREATE TABLE bank_account_data


(customer_id INTEGER
,account_type CHAR(10) COMPRESS 'SAVINGS');
Things to notice in this example:

Both nulls and the string 'SAVINGS' will be compressed when the row is
stored.

The value 'SAVINGS' is written in the table header for table
bank_account_data on each AMP in the system.

Example 2
An example of multiple-value column compression:
CREATE TABLE bank_account_data
(customer_id INTEGER
,account_type CHAR(10) COMPRESS ('SAVINGS','CHECKING','CD','MUTUAL FUND'));
Things to notice in this example:

Each of the four specified strings is now compressed when the row is
stored.

Nulls are also compressed.

Each of these values is written to the table header on each AMP.

Zero space is taken in the physical row for each of these values.

This can significantly reduce the amount of space needed to contain the
table.

Compression has two primary benefits:

It reduces system storage costs.

It enhances system performance.

Impact Of Multiple Value Column Compression

System Storage Costs:


Compression reduces storage costs by storing more 'logical' data using fewer
'physical' resources. In general, compression causes physical rows to be smaller,
consequently permitting more rows per data block and thus fewer data blocks for
the table.
The amount of storage reduction is a function of the following factors:

The number of values compressed.

The size of the values compressed.

The percentage of rows in the table with these values.
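A rough back-of-the-envelope illustration of these factors, using hypothetical
numbers: for a CHAR(10) column in a 10,000,000-row table where 'SAVINGS' occurs in
60% of the rows, compressing that one value saves roughly 10 bytes x 10,000,000 x
0.60 = 60,000,000 bytes (about 57 MB), less the small per-row presence bits and one
table-header entry per AMP.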

System Performance:
System performance can be expected to improve as a result of value compression.
Because each data block can hold more rows, fewer blocks need to be read to
resolve a query, and thus fewer physical I/O's can be expected to take place. Also,
because the data remains compressed while in memory, more rows can be available
in cache for processing per I/O.

Compression Transparency:
Compression is transparent to all user applications, utilities, ETL tools, ad hoc
queries and views.

Compression Suggestions

Examples of highly compressible values:


Any of the following should be considered candidates for compression, when the
frequency of their occurrence is high:

Nulls

Zeroes

Spaces

Default Values

Suggested Application Columns For Compression:


Any column with high-frequency values, or with a relatively small number of
distinct values, should be considered a candidate for compression. The following
is a list of possible candidate columns.

State

City

Country

Automobile Make

Credit Card Type

Account Type

First Name

Last Name

Limitations On Multi-Value Column Compression

Limitations:

A maximum of 255 values may be compressed per column.

The maximum size of a compressed value is 255 bytes.

Only columns with a fixed physical length may be compressed - e.g. CHAR
but not VARCHAR.

Primary index columns cannot be compressed.

The aggregate of all compressed values may not exceed the maximum size
of a table header (64K).

What is Not Compressible


The following data types are not compressible:

INTERVAL

TIME

TIMESTAMP

VARCHAR

VARBYTE

VARGRAPHIC

ALTER TABLE Compression

The SQL ALTER TABLE command supports adding, changing, or deleting column
compression on one or more existing columns of a table, whether the table is
loaded with data or is empty.

History
Traditionally there has been a trade-off between the desire to store more data and
the cost of storing the additional data. The column compression feature of the
Teradata database has always provided an opportunity to reduce the amount of
data to be physically carried in the database by storing frequently repeating values
in a table header rather than repeating them for every row in the table. The column
compression feature thereby helps improve the trade-off between these competing
requirements, making it less expensive to store more data.
Without the ALTER TABLE Compression feature, when compression requirements
on a table change, it would be necessary to recreate the table with the new
compression requirements specified, and then reload the table. The ALTER TABLE
Compression feature makes it easier to add, change or delete compression
requirements from an existing table without the user having to recreate and reload
the table.

Examples

The following are examples of the ALTER TABLE Compression feature. Note,
whenever the COMPRESS attribute is applied to a nullable column, nulls will always
be compressed by default. If other compression values are specified, they will be
compressed in addition to null compression.
There is an implied sequence in the statements that follow:

Example Set 1:
ALTER TABLE Table1 ADD Col1 COMPRESS;

If the column Col1 exists and is nullable, then the column will be a
compressible column with NULL as the compress value.

If the column Col1 does not exist, then the column is added to the table and
the column will be a compressible column with NULL as the compress value.

ALTER TABLE Table1 ADD Col1 COMPRESS NULL;

Same as previous example.

ALTER TABLE Table1 ADD Col1 COMPRESS 'Savings';

The column will be compressed for nulls and for the constant value 'Savings'.

If the column is already compressed, the constant value 'Savings' will replace
the existing compress value or list of values.

ALTER TABLE Table1 ADD Col1 COMPRESS (NULL, 'Savings', 'Checking');

The column will be compressed on the specified compress list.

If the column is already compressed, the new compress list will replace the
existing compress value or list of values.

Null will be compressed in either case.

Example Set 2:
ALTER TABLE Table1 ADD Col2 COMPRESS 0;

Column will be compressed for one value - zero.

Nulls will also be compressed if the column is nullable (default is nullable).

ALTER TABLE Table1 ADD Col2 COMPRESS (0, 100, 1000);

Add compressed values (100, 1000).

Note: value zero must be restated if this follows the previous ALTER statement.

ALTER TABLE Table1 ADD Col2 COMPRESS (NULL,0,100,1000,10000);

Adds compressed value 10000.

Only 10000 is added to the list since NULL, 0, 100 and 1000 are already
compressed in the prior example.

ALTER TABLE Table1 ADD Col2 COMPRESS (NULL, 0, 100);

Reduces the compression list to null and two values.

Values 1000 and 10000 are no longer compressed.

ALTER TABLE Table1 ADD Col2 NO COMPRESS;

All compressed values are now disabled.

Column is now uncompressed.

Compression Optimizations

ALTER TABLE Compression Optimizations


The following listing shows the mechanics of how compression is physically
implemented via the ALTER TABLE command and the optimizations used.

The table will actually be rebuilt at the time of the execution of the ALTER
TABLE statement.

Space overhead requirement for rebuilding a table is around 2 MB. The table
is rebuilt one cylinder at a time, so no matter how big or small a table is, the
overhead remains the same.

A full duplicate copy of the table is never created nor required.

The ALTER process is restartable via checkpoints in the Transient Journal.

No Rollback process is possible.

If a restore of the original table is desirable, this can be accomplished by:

- Re-ALTERing the table with the original compression specifications.
- An archive of the original table followed by a restore.

Considerations and Limitations

Limitations
The ALTER TABLE ADD cname syntax allows certain other attributes to be
included in the same statement as the COMPRESS attribute. The following are
exceptions to this rule:

A column CONSTRAINT cannot be defined at the same time.

A COMPRESS modification with a NULL value in the compress list is
not allowed in conjunction with a NOT NULL attribute change.

Altering a non-compressible column to a compressible column is not
allowed if changing the column to an Identity column at the same
time. These changes may be implemented separately.

Additional Considerations

Compressing columns on which secondary indexes are defined is
allowed, unless the index is either the PK or the FK of a
Referential Constraint.

An Exclusive lock is required on the table being compressed.

Compression Versus VARCHAR

Character data which has significant length variability can also be stored using a
VARCHAR data type to save space. VARCHAR stores only the actual value with a
two-byte length field in the physical row, while omitting trailing blanks. VARCHAR
data types are not eligible for compression because they are not fixed length.
When debating whether VARCHAR or compression is preferable for a character
column, three factors are to be considered:

The average field length of the character data.

The maximum field length of the character data.

The frequency of occurrence of the compressible values.

The following rules dictate which approach should be favored:

Choose VARCHAR - when the difference between maximum and average field
length is high and the frequency of occurrence is low.

Choose Compression - when the difference between maximum and average field
length is low and the frequency of occurrence is high.

Choose VARCHAR - when there is no clear winner between the two. This is
because VARCHAR uses slightly less CPU resource.
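A sketch of how these rules might be applied (hypothetical column profiles and
table names):

/* 'city' - maximum length 25, average length close to the maximum, and a
   small set of high-frequency values: favor compression. */
CREATE TABLE cust_addr_c
(cust_id INTEGER
,city CHAR(25) COMPRESS ('Los Angeles','San Diego','Sacramento'));

/* 'comment_txt' - maximum length 200, average length around 30, and values
   that rarely repeat: favor VARCHAR. */
CREATE TABLE cust_note_v
(cust_id INTEGER
,comment_txt VARCHAR(200));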

Summary

Multiple Value Column Compression

This feature permits up to 255 values to be compressed for a given column. The
benefits of using this feature are:

Decreased usage of disk space.

Increased table query performance.

ALTER TABLE Compression


This feature of the SQL ALTER TABLE command supports adding, changing, or
deleting column compression on one or more existing columns of a table,
regardless of whether the table is loaded with data or is empty. When ALTER TABLE
is used to change or initiate column compression, the table will be rebuilt internally.

Lab

Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).
Click on the answer button to the left to see the answers.

1.) Display the table definition for the Customer_Service.accounts table.


2.) How many distinct values of the column 'city' are found in the accounts table
and how many occurrences are there for each value?
3.) How much space is taken by this table currently?

4.) Create the table Accounts in your database with compression specified for
the following 'city' values: 'Culver City', 'Hermosa Beach', 'Los Angeles',
'Santa Monica'
5.) Populate your table with the rows from
Customer_Service.accounts
6.) See how much space your table requires compared to the uncompressed version
in the Customer_Service database, as seen in lab #3.
7.) How many distinct cities, states and zip codes are contained in the accounts
table? Use a single query to answer this question.

7.) Join Indexes and NUSI's


Objectives

After completing this module, you should be able to:

Describe the purpose of value-ordered NUSI's and their implementation

Describe the purposes of Join Indexes and their implementation.

Distinguish between single-table and multi-table join indexes.

Add a NUSI to a Join Index.

NUSI Review

Non-Unique Secondary Indexes (NUSI's) are a Teradata index feature which permits
defining non-primary indexes on non-unique columns. Typically, this is done to
improve performance on queries which use the column or columns in the WHERE
clause selection criteria. NUSI's may be created either as a part of the CREATE
TABLE syntax, or they may be created after table creation using CREATE INDEX
syntax. NUSI's may be easily dropped when their presence is no longer needed by
using the DROP INDEX syntax.

Alternative 1 CREATE TABLE Syntax


Create an 'employee' table with a NUSI on the job code.

CREATE SET TABLE employee ,FALLBACK
(
employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
last_name CHAR(20) NOT NULL,
first_name VARCHAR(30) NOT NULL,
hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code);

Alternative 2 CREATE INDEX Syntax


Create a NUSI on the job code column for existing 'employee' table.

CREATE INDEX (job_code) ON employee;

Example DROP INDEX Syntax


Drop the NUSI on the job code column of the 'employee' table.

DROP INDEX (job_code) ON employee;


Upon creation of a NUSI, a subtable is built on each AMP. The subtable contains a
row for each NUSI value to be found on this AMP and the row-ids of the associated
base table rows which are co-located on the AMP.

Rows are sequenced in the subtable based on the hash of the NUSI value. While
this is convenient for finding all rows with a particular NUSI value, it is less useful
for doing 'range' searches. For example, the index created here would be useful in
finding all employee rows with a job code of 122100, but less useful in locating all
employee rows whose job code is between 122000 and 123000.

Value Ordered NUSIs

Value Ordered NUSI's allow NUSI subtable rows to be sorted based on a data
value, rather than on a hash of the value. This is extremely useful for range
processing where a sequence of values between an upper and lower limit is desired.

Alternative 1 CREATE TABLE Syntax


Create an 'employee' table with a value-ordered NUSI on the job code.

CREATE SET TABLE employee ,FALLBACK
(
employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
last_name CHAR(20) NOT NULL,
first_name VARCHAR(30) NOT NULL,
hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code) ORDER BY VALUES (job_code);

Alternative 2 CREATE INDEX Syntax


Create a value-ordered NUSI on the job code column of existing 'employee' table.

CREATE INDEX (job_code) ORDER BY VALUES (job_code) ON employee;


The optimizer may now choose this index to do range searches on job codes.
The NUSI's are automatically maintained by the Teradata database, that is, when a
base table row changes values, any corresponding values in the NUSI subtable are
also changed. It is never necessary to do anything to maintain a secondary index.
You can only create it, drop it and collect statistics on it.
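For example, collecting statistics on the NUSI created above might look like this
(one form of the COLLECT STATISTICS syntax):

COLLECT STATISTICS ON employee INDEX (job_code);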

Limitations of Value-Ordered NUSI's

A column defined as a value-ordered index column must be:

A single column

A column which is a part of or all of the index definition

A numeric column - non-numerics are not allowed

No greater than four bytes in length - INT, SMALLINT, BYTEINT, DATE and
DEC are valid

Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any precision digits.

Index Covering
If a query references only those columns that are contained within a given index,
the index is said to "cover" the query. In these cases, it is often more efficient for
the optimizer to access only the index subtable and avoid accessing the base table
rows altogether.
Covering will be considered for any query that references only columns defined in a
given NUSI. These columns can be specified anywhere in the query including the:
1. SELECT list
2. WHERE clause
3. aggregate functions
4. GROUP BY
5. expressions
The presence of a WHERE condition on each of the indexed columns is not a
guarantee for using the index to cover the query. The optimizer will consider the
appropriateness and cost of 'covering' versus other alternative access paths and
choose the optimal plan.
The potential performance gains from index covering require no user intervention
and will be transparent except for the improved access time. The use of the NUSI
can be validated by reviewing the execution plan returned by EXPLAIN.
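As a sketch, covering can be observed by EXPLAINing a query that references only
the NUSI column; if the plan touches only the index subtable and never the base
table, the NUSI covered the query:

EXPLAIN
SELECT job_code, COUNT(*)
FROM employee
GROUP BY job_code;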

Join Index

The Join Index feature provides indexing techniques that can improve the
performance of certain types of queries. The Join Index is a physical structure,
populated with rows that contain columns from one or more tables. Once created, it
becomes an option available to the optimizer but is never directly accessed by the
user.
Its purpose is to aid in the joining of tables by providing needed data from an index
rather than having to access the base rows of the table. By using a join index the
optimizer may be able to avoid having to access or redistribute many individual
tables and their base rows.
The Join Index supports syntax for the following types of indexes:
1. Multiple-table Join Index - Used to pre-join multiple tables
2. Single-table Join Index - Used to rehash and redistribute the rows of a
single table based on a specified column or columns
3. Aggregate Join Index - Used to create an aggregate index to be used as a
summary table
In this section we will be discussing the first two only. The third item, the
aggregate join index, is discussed in a separate module of this training.

Multiple-Table Join Indexes

Multiple-table Join Indexes are used to pre-join two or more tables. Consider the
following tables which are in the 'Student' database.

CREATE TABLE customers
( cust_id INTEGER NOT NULL,
cust_name CHAR(15),
cust_addr CHAR(25) )
UNIQUE PRIMARY INDEX ( cust_id );

CREATE TABLE orders
( order_id INTEGER NOT NULL,
order_date DATE FORMAT 'yyyy-mm-dd',
cust_id INTEGER,
order_status CHAR(1) )
UNIQUE PRIMARY INDEX ( order_id );
The relationship of these tables is as follows:

There are 49 orders with valid customers.
There is 1 order that has an invalid customer.
There is 1 valid customer who has no orders.

Query 1 Without Join Index

How many orders have assigned customers?

SELECT COUNT(order_id)
FROM orders
WHERE cust_id IS NOT NULL;

Count(order_id)
---------------
             50

A join index will not help this query. The 'orders' table alone covers the query.

Query 2 Without Join Index

How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             49

A join index can help this query. Two tables are needed to cover the query.

Creating A Join Index

The following shows the creation of a join index which will improve the performance
of any joins it can cover.

CREATE JOIN INDEX cust_ord_ix


AS SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
The join index is comprised of a 'fixed portion' (first parenthesis) and a 'repeatable
portion' (second parenthesis). This represents a denormalization of the data and
logically looks like the following:

CUST_ID  CUST_NAME  ORDER_ID  ORDER_STATUS  ORDER_DATE
1001     ABC Corp   501       C             990120
                    502                     990220
1002     BCD Corp   503                     990320
                    504                     990420
                    505                     990520
                    506                     990620
                    507                     990122
                    508                     990222
                    509                     990322

Now, let's revisit the same query again.

Query 2 With Join Index

How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             49

The join index helps this query because it covers the query, and therefore the
result may be generated without ever accessing the rows of the base tables.

Compare the costs as shown by EXPLAIN.

Without Join Index: .39 secs.
With Join Index:    .17 secs.

This represents a 50% decrease in query time because of the join index.

The join index is automatically maintained by the Teradata RDBMS; that is,
when a base table row changes values, any corresponding values in the join index
are also changed. It is never necessary to do anything to maintain a join index.
You can only create it and drop it.

The join index seen above may also be used to cover other queries.

(Repeated For Convenience)

CUST_ID  CUST_NAME  ORDER_ID  ORDER_STATUS  ORDER_DATE
1001     ABC Corp   501       C             990120
                    502                     990220
1002     BCD Corp   503                     990320
                    504                     990420
                    505                     990520
                    506                     990620
                    507                     990122
                    508                     990222
                    509                     990322

Query 3 With Join Index

How many valid customers have assigned orders in January 1999?

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;

Count(cust_id)
--------------
             9

Because the 'order_date' column is included in the join index, once again
this query is covered by it.

Compare the costs as shown by EXPLAIN.

Without Join Index: .40 secs.
With Join Index:    .17 secs.

This represents a greater than 50% decrease in query time because of the
join index.

Assigning the Primary Index For Join Indexes

Join Indexes are always assigned a primary index in order to hash distribute the
index rows across the AMPs. In the example created here, the Primary Index of the
Join Index is the column Cust_id.
We can explicitly specify the primary index for the Join Index or allow it to default
to the first column specified. In our upcoming look at Single-Table Join Indexes, we
will see the usefulness of being able to choose and specify a primary index on a
Join Index.

CREATE JOIN INDEX cust_ord_ix AS


SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);

Value Ordering Join Index Rows


Optionally, you can specify the sequencing of the join index rows on each AMP.
Normally, the rows will be sequenced on each AMP by the hash value of the primary
index for the Join Index. Because this default sequencing limits the efficiency for
doing 'range' processing, an ORDER BY clause is available which allows this default
sequencing to be overridden.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
ORDER BY c.cust_id
PRIMARY INDEX (cust_id);

Query 3 With Join Index ORDERed BY cust_id

How many distinct valid customers with customer ids between 1001 and 1005 have
assigned orders?

SELECT COUNT(DISTINCT(c.cust_id))
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE c.cust_id BETWEEN 1001 AND 1005;
Count(Distinct(cust_id))
-----------------------5
Because this query accesses a range of customer ids, the optimizer can access the
rows of the Join Index more efficiently because the qualifying rows are already
sequenced by cust_id and thus easily located.
The rules for the ORDER BY column are the same as for Value-Ordered NUSI's.
The ORDER BY column must be:

A single column

A column which is part of or all of the fixed-portion index definition

A numeric column - non-numerics are not allowed

No greater than four bytes in length - INT, SMALLINT, BYTEINT, DATE and DEC are
valid

Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any fractional digits.
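To make the four-byte rule concrete, here is a hypothetical sketch (the 'invoices'
table, its columns, and the index name are illustrative only, not part of the course
examples). DEC(5,0) is stored in four bytes with no fractional digits, so it
qualifies as an ORDER BY column; a column such as DEC(3,2) would be rejected
because of its fractional digits.

CREATE TABLE invoices
(invoice_id INTEGER
,amount_class DECIMAL(5,0)   /* 4-byte storage, no fractional digits */
,invoice_total DECIMAL(9,2)) /* fractional digits - not eligible */
PRIMARY INDEX (invoice_id);

CREATE JOIN INDEX inv_ix AS
SELECT amount_class, invoice_id, invoice_total
FROM invoices
ORDER BY amount_class
PRIMARY INDEX (amount_class);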

NUSIs On Join Indexes

A NUSI may be created on a join index and may be used to improve access to the
join index rows. In the example just seen, we ordered the rows of the join index by
'cust_id' in order to facilitate 'range' processing on customer numbers.
Because the rows of the join index can only be sequenced by one column, we need
to use another technique to facilitate 'range' processing for the order date.
We can solve this problem by adding a NUSI on the join index and value ordering it
on the order date. NUSI's on join indexes can be built as part of the CREATE JOIN
INDEX statement, or they can be added after join index creation using the CREATE
INDEX statement.

Alternative 1 CREATE JOIN INDEX Syntax

Create the same join index and also create a NUSI on the Join Index for the
'order_date' column. Value order the NUSI on the 'order_date' column.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name),(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
ORDER BY c.cust_id        /* This ORDER BY controls how the rows of the
                             Join Index will be sorted on the AMPs */
PRIMARY INDEX (cust_id)
INDEX (order_date) ORDER BY (order_date)
                          /* This ORDER BY controls how the rows of the
                             NUSI will be sorted on the AMPs */ ;

Alternative 2 CREATE INDEX Syntax

Create a NUSI on the existing join index 'cust_ord_ix'.

CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON cust_ord_ix;
Note: The keyword VALUES is optional.
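A NUSI added in either fashion can later be removed without dropping the join
index itself (a hedged sketch; the DROP INDEX form mirrors the CREATE INDEX
form):

DROP INDEX (order_date) ON cust_ord_ix;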

Single-Table Join Indexes

Single-Table Join Indexes are created to rehash and redistribute the rows of a
table by a column other than the Primary Index column. The redistributed index
table may be a subset of the columns (vertical subset) of the base table. It can
significantly reduce the costs associated with doing a table redistribution for join
processing.
In building join plans for two tables, the optimizer must first decide how to ensure
that all joinable rows are co-located on the same AMP. If both tables are being
joined on their respective Primary Index columns, the joinable rows are already
co-located on the same AMP, thus no redistribution of data is needed. If either
table is not using its Primary Index columns as the join column(s), then a
redistribution must occur.
Single-Table Join Indexes provide the ability to 'pre-distribute' the rows of a
table based on the hash of the join value. This will eliminate the need for the
optimizer to require a redistribution to perform the join - it can take advantage of
the already distributed rows of the Single-Table Join Index.
Consider the two tables 'employee' and 'department'.

CREATE SET TABLE employee ,FALLBACK
( employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
last_name CHAR(20) NOT NULL,
first_name VARCHAR(30) NOT NULL,
hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

CREATE TABLE department, FALLBACK
(department_number SMALLINT
,department_name CHAR(30) NOT NULL
,budget_amount DECIMAL(10,2)
,manager_employee_number INTEGER )
UNIQUE PRIMARY INDEX (department_number);
Assume we would like to perform the following query.

Query 4
Select all employee numbers, their department number and department name.

SELECT e.employee_number
,d.department_number
,d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Joining these two tables on the 'department_number' column might require a
redistribution of the rows of the employee table. The department table is already
distributed based on the PI column 'department_number', but the employee table is
distributed on the PI column 'employee_number'. Depending on the size of the
tables, the redistribution can become a costly operation.
One possible technique to expedite this join would be to create a Single-Table Join
Index on the employee table as follows:

CREATE JOIN INDEX emp_deptno
AS SELECT employee_number, department_number
FROM employee
PRIMARY INDEX (department_number);
Executing Query 4 again would give the optimizer the opportunity to use the join
index and thus enable it to avoid the cost of redistributing the rows of the employee
table. If we EXPLAIN the query, we can see that the join index was indeed used.

Query 4 (Explained)
Select all employee numbers, their department number and department name.

EXPLAIN SELECT e.employee_number
,d.department_number
,d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
(Partial Listing)
4. We do an all-AMPs JOIN step from PED.d by way of a RowHash match scan
with no residual conditions, which is joined to PED.emp_deptno. PED.d and
PED.emp_deptno are joined using a merge join, with a join condition of
("PED.emp_deptno.department_number = PED.d.department_number").
The result goes into Spool 1, which is built locally on the AMPs. The size of
Spool 1 is estimated with low confidence to be 24 rows. The estimated time
for this step is 0.18 seconds.

Summary

The following are index options available for query performance enhancement which
we have seen in this module.

Hash-ordered NUSI's - traditional NUSI's

Value-ordered NUSI's - to facilitate range searches on the index value

Single-table join indexes - to pre-hash-distribute rows of one table to
co-locate with joinable rows

Multi-table join indexes - to pre-join existing table rows from multiple
tables

Join indexes may be further enhanced by applying the following features to them:

Define the PI of the join index to distribute the join index rows most
effectively

Define the ordering of the join index rows to sequence the join index rows

Define a NUSI on the join index to access rows in the join index more
effectively (Multi-table join index only)

Define the ordering of the NUSI rows to sequence the NUSI rows, either
hash or value-based (Multi-table join index only)

Lab
Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Telnet button in the lower left
hand screen of the course. Two windows will pop-up: a BTEQ
Instruction Screen and your Telnet Window. Sometimes the BTEQ
Instructions get hidden behind the Telnet Window. You will need
these instructions to log on to Teradata.

Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE tdxxx;).

Click on the buttons to the left to see the answers.


A.1 Make copies in your own database of the city and state tables
which are located in the Student database. You may accomplish this
by using the following SQL:
CREATE TABLE state AS Student.state WITH DATA;
CREATE TABLE city AS Student.city WITH DATA;

Look at the table definitions for the City table and the State table.
Construct a query which returns the following information by inner
joining these two tables. Order the results by city population within
state population.

City Name   City Population   State Name   State Population
...         ...               ...          ...

A.2 Create a Join Index in your own database called citystateidx. The
fixed portion of the index should contain the state name and the state
population. The variable portion should contain the city name and
the city population.
A.3 Now rerun the query to see if you get the same result. If you do,
EXPLAIN the query to see if your Join Index was used.
B.1 Drop the Join Index citystateidx.
B.2 Modify the query to only show states whose population is
between one and three million. Run the query.
B.3 Recreate the Join Index, however this time ensure that the index
rows will be sorted by the state population column.
B.4 Rerun the query to ensure the same results, then EXPLAIN the
query to determine if the Join Index was used.
C.1 Add a NUSI on the Join Index on the city population column.
Value order the NUSI by city population.

C.2 Rerun the query to ensure the same results, then EXPLAIN the
query to determine if the NUSI was used.
8.) Aggregate Join Indexes
Objectives

Upon completion of this module, you should be able to:

Create an aggregate join index.

Determine when an aggregate join index is advantageous.

Determine when an aggregate join index is being used by the optimizer.

Join Index Review

A Join Index is an optional index which may be created by the user for one of the
following three purposes:

Pre-join multiple tables

Distribute the rows of a single table on the hash value of a foreign key value

Aggregate one or more columns of a single table or multiple tables into a
summary table

The first two listed purposes are covered in an earlier module of this training
program.
In this module, we will concentrate on the last of the three purposes:
aggregating columns into a join index that the optimizer may choose to use as
a summary table.

Why An Aggregate Index

Summary Tables
Queries which involve counts, sums, or averages over large tables require
processing to perform the needed aggregations. If the tables are large, query
performance may be affected by the cost of performing the aggregations.
Traditionally, when these queries are run frequently, users have built summary
tables to expedite their performance. While summary tables do help query
performance, there are disadvantages associated with them as well.

Summary Tables Limitations

Require the creation of a separate table

Require initial population of the table

Require refresh of changing data, either via update or reload

Require queries to be coded to access summary tables, not the base tables

Allow for multiple versions of the truth when the summary tables are not
up-to-date

Aggregate Indexes
Aggregate indexes provide a solution that enhances the performance of the query
while reducing the requirements placed on the user. All of the above listed
limitations are overcome with their use.
An aggregate index is created similarly to a join index with the difference that
sums, counts and date extracts may be used in the definition. A denormalized
summary table is internally created and populated as a result of creation. The index
can never be accessed directly by the user. It is available only to the optimizer as a
tool in its query planning.
Aggregate indexes do not require any user maintenance. When underlying base
table data is updated, the aggregate index totals are adjusted to reflect the
changes. While this requires additional processing overhead when a base table is
changed, it guarantees that the user will have up-to-date information in the index.
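For instance, once an aggregate index exists, keeping it current requires nothing
beyond normal DML against the base table (a minimal, hypothetical sketch using
the daily sales table defined below; the inserted values are illustrative only):

/* The system adjusts the totals stored in any aggregate index defined
   on this table automatically - no user refresh step is needed. */
INSERT INTO daily_sales_2010 VALUES (10, DATE '2010-09-15', 150.00);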

Aggregate Index Properties

Aggregate Indexes are similar to other Join Indexes in that they are:

Automatically kept up to date without user involvement.

Never accessed directly by the user.

Optional and provide an additional choice for the optimizer.

Incompatible with Multiload and Fastload, which may not be used to load
tables for which join indexes are defined.

Aggregate Indexes are different from other Join Indexes in that they:

Use the SUM and COUNT functions.

Permit use of EXTRACT YEAR and EXTRACT MONTH from dates.

You must have one of the following two privileges to create any join index:

DROP TABLE rights on each of the base tables (or the containing database)
or,

INDEX privilege on each of the base tables

Additionally, you must have this privilege:

CREATE TABLE on the database or user which will own the join index

The following table will be used in the subsequent examples:

CREATE SET TABLE PED.daily_sales_2010 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(itemid INTEGER
,salesdate DATE FORMAT 'YY/MM/DD'
,sales DECIMAL(9,2))
PRIMARY INDEX ( itemid );

Without An Aggregate Index

Consider the following problem, without the use of an aggregate index.


Example

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;
Yr    Mon   Sum(sales)
----  ----  ----------
2009     1     2150.00
2009     2     1950.00
2009     8     1950.00
2009     9     2100.00
2010     1     1950.00
2010     2     2100.00
2010     8     2200.00
2010     9     2550.00

EXPLAINing Without An Aggregate Index

Explaining the previous query shows us that this is a primary index access against
the 'daily_sales_2010' table. (Note that because the cost of aggregation is not
calculated, no final cost for the query is generated.)

EXPLAIN SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;

Explanation

1. First, we do a SUM step to aggregate from PED1.daily_sales_2010 by way
of the primary index "PED1.daily_sales_2010.itemid = 10" with a residual
condition of ("((EXTRACT(YEAR FROM
(PED1.daily_sales_2010.salesdate )))= 2009) OR ((EXTRACT(YEAR FROM
(PED1.daily_sales_2010.salesdate )))= 2010)"), and the grouping identifier
in field 1. Aggregate Intermediate Results are computed locally, then placed
in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1
rows.
2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of
the primary index "PED1.daily_sales_2010.itemid = 10" into Spool 1, which
is built locally on that AMP. Then we do a SORT to order Spool 1 by the sort
key in spool field1. The size of Spool 1 is estimated with high confidence to
be 1 row. The estimated time for this step is 0.17 seconds.
3. Finally, we send out an END TRANSACTION step to all AMPs involved in
processing the request. -> The contents of Spool 1 are sent back to the user
as the result of statement 1.

Creating An Aggregate Index

Creating a join index gives the optimizer the option of using the 'pre-aggregated'
information kept in the index, thus avoiding the need to generate a separate
aggregation step.

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
,EXTRACT(YEAR FROM salesdate) AS Yr
,EXTRACT(MONTH FROM salesdate) AS Mon
,SUM(sales) AS SumSales
FROM daily_sales_2010
GROUP BY 1,2,3;
If we run the exact same query as before, with no changes, we will get the same
result; however, this time it will take advantage of the aggregate index.

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND Yr IN ('2009','2010')
GROUP BY 1,2
ORDER BY 1,2;
Yr    Mon   Sum(sales)
----  ----  ----------
2009     1     2150.00
2009     2     1950.00
2009     8     1950.00
2009     9     2100.00
2010     1     1950.00
2010     2     2100.00
2010     8     2200.00
2010     9     2550.00

EXPLAINing The Use Of Aggregate Index

Explaining the previous query shows us that this time the aggregate index is
employed.

EXPLAIN SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;

Explanation

1. First, we do a SUM step to aggregate from join index table
PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in field 1.
Aggregate Intermediate Results are computed locally, then placed in Spool
2. The size of Spool 2 is estimated with low confidence to be 4 to 4 rows.
2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of
the primary index "PED1.monthly_sales.Item = 10" into Spool 1, which is
built locally on that AMP. Then we do a SORT to order Spool 1 by the sort
key in spool field1. The size of Spool 1 is estimated with low confidence to
be 4 rows. The estimated time for this step is 0.17 seconds.
3. Finally, we send out an END TRANSACTION step to all AMPs involved in
processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.
Because the aggregations are already calculated and available in the index, the
costs associated with step one are reduced. The cost of step two is unchanged
(0.17).
Because aggregation costs are not currently carried in EXPLAIN text, the savings in
processing time for step one are not shown, however the response time reduction
for the user can and should be substantial.

SHOWing The Aggregate Index

Join index definitions may be seen using the SHOW JOIN INDEX construct.
Example
Show the aggregate index named monthly_sales.

SHOW JOIN INDEX monthly_sales;
CREATE JOIN INDEX PED1.monthly_sales ,NO FALLBACK AS
SELECT COUNT(*)(FLOAT, NAMED CountStar )
,PED1.daily_sales_2010.itemid
(NAMED Item )
,EXTRACT(YEAR FROM (PED1.daily_sales_2010.salesdate))
(NAMED Yr )
,EXTRACT(MONTH FROM (PED1.daily_sales_2010.salesdate))
(NAMED Mon )
,SUM(PED1.daily_sales_2010.sales )(NAMED SumSales )
FROM PED1.daily_sales_2010
GROUP BY 2,3,4
PRIMARY INDEX ( Item );
In showing an index definition, some changes should be noted:

All column names are fully qualified to the database level.

A count of rows is automatically added if not specified in the definition, thus
supporting count aggregations. This can be seen in the SHOW result -
(COUNT(*) NAMED CountStar).

If both COUNT and SUM are in the index, AVERAGE calculations may also
make use of the index.
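Building on the last point, a query such as the following could have its average
computed from the stored SumSales and CountStar values rather than from the
detail rows (a hedged sketch against the same table; whether the index is used
remains the optimizer's decision):

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, AVG(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1,2
ORDER BY 1,2;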

Covering The Query - Example 1

Ultimately, just as with any join index, it is the optimizer's choice whether or not
the index is useful for a specific query. The index created previously is repeated
here for convenience.

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
,EXTRACT(YEAR FROM salesdate) AS Yr
,EXTRACT(MONTH FROM salesdate) AS Mon
,SUM(sales) AS SumSales
FROM daily_sales_2010
GROUP BY 1,2,3;
Any query against the 'daily_sales_2010' table which requests an aggregation in
any of the following formats can make use of the 'monthly_sales' index.

Sum of sales by year

Sum of sales by month

Sum of sales by month within year

Grand total sum of sales

An index is said to 'cover' the query (or cover part of the query) if the optimizer can
generate the query results using the index as a replacement for one or more of the
specified tables.
Example
Show the grand total sales for item 10 as contained in the daily_sales_2010 table.

SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;
itemid   Sum(sales)
------   ----------
    10     16950.00
EXPLAIN SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;

Explanation (Partial)

1. First, we do a SUM step to aggregate from join index table PED1.monthly_sales
by way of the primary index "PED1.monthly_sales.Item = 10" with no residual
conditions, and the grouping identifier in field 1. Aggregate Intermediate Results
are computed locally, then placed in Spool 2. The size of Spool 2 is estimated
with high confidence to be 1 to 1 rows.
Note: This is an example of the index 'monthly_sales' covering the query.

Covering The Query - Example 2

Show the total sales for item 10 for 2010.

SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND EXTRACT (YEAR FROM salesdate) = '2010'
GROUP BY 1;
itemid   Sum(sales)
-------  ----------
     10     8800.00
EXPLAIN SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND EXTRACT (YEAR FROM salesdate) = '2010'
GROUP BY 1;

Explanation (Partial)

1. First, we do a SUM step to aggregate from join index table
PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in field 1.
Aggregate Intermediate Results are computed locally, then placed in Spool
2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.

Aggregate Indexes With Functions

Aggregate indexes are not used in conjunction with queries using SUM Window,
COUNT Window, WITH or WITH BY functions. Because these functions must process
and display all qualifying detail rows, the value of the aggregate index is reduced.
Explaining any query using these functions will validate that the index is not used.
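For example, a windowed aggregate such as the following must return every
qualifying detail row, so the pre-aggregated monthly totals cannot stand in for
the base table (a hedged sketch; the optimizer would read 'daily_sales_2010'
directly):

SELECT itemid
, salesdate
, sales
, SUM(sales) OVER (PARTITION BY itemid
                   ORDER BY salesdate
                   ROWS UNBOUNDED PRECEDING) AS running_sales
FROM daily_sales_2010
WHERE itemid = 10;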

Lab

Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server.
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com.
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).

Click on the buttons to the left to see the answers.



1.) Prior to doing this lab exercise, it will be necessary to recreate a
copy of the employee table in your user space. Accomplish this with
the following commands:

DATABASE tdxxx;
CREATE TABLE employee AS Customer_Service.employee WITH DATA;

Create an aggregate index called 'dept_sals' which sums all of the
salaries of the employees in the employee table by department.
2.) Write a query which shows each department number and the sum
of salaries for that department. Order results by department number.
3.) Explain the query in lab #2. Does it use the aggregate index?
4.) Modify the query in lab #2 and add a column which shows the
average salary in each department. Call this column 'Avg_Sal'.
5.) Explain this query and note if the aggregate index was used.
9.) Hash Indexes
Objectives

After completing this module, you should be able to:

Recognize the performance advantage of using hash indexes.

Create and implement hash indexes.

Determine when hash indexes are used by a query.

Indexes Revisited

In the next few pages, we will be looking at Hash Indexes and their properties.
Because they share in common many attributes of secondary indexes and join
indexes, let's first review the basics of secondary indexes and join indexes.

Secondary Indexes
Secondary indexes are defined to provide alternate access pathways to the base
rows of a single table. Users may define secondary indexes, but they cannot be
accessed directly by the user, nor can the user affect how the index rows are
distributed. Their use or non-use is an option for the optimizer in its query planning.
The following are properties of secondary indexes:

They contain pointers to the base rows of the table

Are always defined on a single table

Can 'cover' certain queries, but their primary purpose is locating base rows

Secondary Indexes exist in two formats:

Unique Secondary Index (USI) - there is a one-to-one relationship
between the index rows and the base table rows.

Non-Unique Secondary Index (NUSI) - there is a one-to-many
relationship between the index rows and the base table rows. NUSI rows
may be either 'hash' or 'value' ordered.

Join Indexes
Join indexes are defined to reduce the number of rows processed in generating
result sets from certain types of queries, especially joins. Like secondary indexes,
users may not directly access join indexes. They are an option available to the
optimizer in query planning. The following are properties of join indexes:

Are used to replicate and 'pre-join' information from several tables into a
single structure.

Are designed to cover queries, reducing or eliminating the need for access to
the base table rows.

Usually do not contain pointers to base table rows (unless user defined to do
so).

Are distributed based on the user choice of a Primary Index on the Join
Index.

Permit Secondary Indexes to be defined on the Join Index (except for Single
Table Join Indexes), with either 'hash' or 'value' ordering.

Join Indexes exist in three general formats:

Single Table Join Index (STJI)

- Defined on a single table, usually for the purpose of redistributing the
  table rows based on the hash value of a foreign key column (or columns).
- Facilitates the ability to join the foreign key table with the primary
  key table.

(Multi-Table) Join Indexes (JI)

- A join index which contains 'pre-joined' data from two or more tables.
- Facilitates join operations by reducing or eliminating the need to
  redistribute and join base table rows.

Aggregate Join Index (AJI)

- A join index which contains an aggregation operator such as COUNT or SUM.
- Facilitates aggregation queries wherein the pre-aggregated values
  contained in the AJI may be used instead of relying on base table
  calculations.

Hash Index Definition

Hash Indexes are database objects that are user-defined for the purpose of
improving query performance. They are file structures which contain properties of
both secondary indexes and join indexes.

Like Secondary Indexes

Hash Indexes are similar to secondary indexes in the following ways:

They are created for a single table only.

They contain information which allows access to base table rows.

The CREATE syntax is very similar to a secondary index.

They may sometimes cover a query without use of the base table rows.

Like Join Indexes

Hash Indexes are similar to join indexes in the following ways:

They 'pre-locate' joinable rows to a common location.

The distribution and sequencing of the rows are user specified.

They are very similar to single-table join indexes (STJI), however with added
functionality.

Unlike Join Indexes

Hash Indexes are unlike join indexes in the following ways:

No aggregation operators are permitted.

They are always defined on a single table.

They automatically contain the base table PI value as part of the hash index
subtable row.

They contain additional information needed to locate the base table row (e.g.
uniqueness value).

No secondary indexes may be built on the hash index.

Note:

All indexes, whether secondary, join or hash, are automatically updated by
the system when the underlying table rows are changed.

Hash Index Examples

Creating Hash Indexes


Example 1

Consider the following Hash Index definition:

CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);
This index is built for the table 'emp1' which is defined as follows:

CREATE SET TABLE emp1
(employee_number INTEGER
,manager_employee_number INTEGER
,department_number INTEGER
,job_code INTEGER
,last_name CHAR(20) NOT NULL
,first_name VARCHAR(30) NOT NULL
,hire_date DATE NOT NULL
,birthdate DATE NOT NULL
,salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );
Points to consider about this hash index definition:

Each hash index row contains the employee number and the department number.

Specifying the employee number is unnecessary, since it is the primary
index of the base table and will therefore be automatically included.

The BY clause indicates that the rows of this index will be distributed by the
employee_number hash value.

The ORDER BY clause indicates that the index rows will be ordered on each
AMP in sequence by the employee_number hash value.

Example 2
The same hash index definition could have been abbreviated as follows:

CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1;
This is essentially the same definition because of the defaults for hash indexes.

The BY clause defaults to the primary index of the base table.

The ORDER BY clause defaults to the order of the base table rows.

Hash Index Definition Rules


There are two key rules which govern the use of the BY and ORDER BY clauses:

The column(s) specified in the BY clause must be a subset of the columns which
make up the hash index.

When the BY clause is specified, the ORDER BY clause must also be specified.
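To make the first rule concrete, a definition like the following would be rejected
because the BY column 'job_code' is not among the columns which make up the
hash index (a hypothetical sketch; 'hash_bad' is illustrative only):

CREATE HASH INDEX hash_bad
(employee_number, department_number) ON emp1
BY (job_code)                 /* not a subset of the index columns - error */
ORDER BY HASH (job_code);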

Covered Query
The following is an example of a simple query which is covered by this index:

SELECT employee_number, department_number FROM emp1;

Normally, this query would result in a full table scan of the employee table. With the
existence of the hash index, the optimizer can pick a less costly approach, namely
retrieve the necessary information directly from the index rather than accessing the
lengthier (and costlier) base rows.
Consider the explain of this query:
EXPLAIN SELECT employee_number, department_number FROM emp1;
1) First, we lock a distinct TD000."pseudo table" for read on a
RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an
all-rows scan with no residual conditions into Spool 1, which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 8 rows. The estimated time for this step is 0.15
seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request. -> The contents of Spool 1 are sent back to
the user as the result of statement 1. The total estimated time is
0.15 seconds.

Example 3
The following is an alternate definition of the hash index 'hash_1'.
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY VALUES(employee_number);

Points to consider about this hash index definition:

This definition produces the same hash index, however the index rows are
ordered based on employee_number value rather than the hash value.

This might be more useful for certain 'range processing' queries (see the
sketch after this list).

This definition would be equally helpful in covering the query indicated
previously. The order of index rows would be of no significance.
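A minimal sketch of such a range query (the predicate values are hypothetical;
whether the value-ordered index is used remains the optimizer's choice):

SELECT employee_number, department_number
FROM emp1
WHERE employee_number BETWEEN 1001 AND 1050;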

Hash Indexes and Joins

Creating Hash Indexes For Joins


Example 1
Consider the following Hash Index definition:
CREATE HASH INDEX hash_2
(employee_number, department_number) ON emp1
BY (department_number)
ORDER BY HASH (department_number);
This hash index is to be used for the purpose of facilitating joins between the
'employee' and 'department' tables, based on the PK/FK relationship on
'department_number'.
EXPLAIN SELECT employee_number, department_name
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;
1) First, we lock a distinct TD000."pseudo table" for read on a
RowHash to prevent global deadlock for TD000.hash_2.
2) Next, we lock a distinct TD000."pseudo table" for read on a RowHash
to prevent global deadlock for TD000.d.
3) We lock TD000.hash_2 for read, and we lock TD000.d for read.
4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash
match scan with no residual conditions, which is joined to TD000.d.
TD000.hash_2 and TD000.d are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 1, which is
built locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 16 rows. The estimated time for this step is 0.18
seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request. -> The contents of Spool 1 are sent back to
the user as the result of statement 1. The total estimated time is
0.18 seconds.
Points to consider about the effect of the hash index on this join plan:

No redistribution of emp1 rows is needed.

No sort of emp1 rows is needed.

The merge join step (#4) is able to proceed directly after lock steps.

In this case, the hash index functions much the same as a single table join
index (STJI).

Hash Indexes and ROWID Pointers

Using the ROWID in Hash Indexes

Because the primary index is automatically carried in a Hash Index (as is the
uniqueness value associated with the base row-id), the system may easily
calculate the row-id of the base row. This permits column values not explicitly
contained in the hash index definition to be accessed and returned as part of a
covered query.
Example 1
Consider again the following Hash Index definition:
CREATE HASH INDEX hash_2
(employee_number, department_number) ON emp1
BY (department_number)
ORDER BY HASH (department_number);
Perform the same join on the two tables, however this time add the column
'job_code' to the SELECT. Note, this column isn't part of the hash index.
EXPLAIN
SELECT e.employee_number
, d.department_name
, e.job_code
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;
:
:
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match
scan. with no residual conditions, which is joined to TD000.hash_2.
TD000.d and TD000.hash_2 are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 2, which is
duplicated on all AMPs. The size of Spool 2 is estimated with low
confidence to be 64 rows. The estimated time for this step is 0.18
seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are
joined using a product join, with a join condition of ( "(Field_2 =
TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID FROM
5 FOR 4 )))"). The result goes into Spool 1, which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to be 8
rows. The estimated time for this step is 4.92 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request. -> The contents of Spool 1 are sent back to
the user as the result of statement 1. The total estimated time is
5.10 seconds.
Points to consider about this query plan:

The hash index is used in place of the employee table.

No redistribution of rows is needed.

Job-code values are returned by using the ROWID pointer to the base row
(Step #6).

Row hash locks are used to access the base rows of the employee table.

This plan assumes that both tables are fairly large.

A similar effect could have been achieved with a single table join index (STJI) by
adding an explicit ROWID to the index definition as follows:

CREATE JOIN INDEX ji_emp AS
SELECT employee_number, department_number, ROWID
FROM emp1;
The following page lists advantages of Hash Indexes over STJI's.

Hash Index Advantages and Limitations

Hash Index Advantages

Hash indexes, by comparison, have the following advantages over Single Table Join
Indexes.

They automatically contain the primary index of the base table.

They automatically contain ancillary information (such as uniqueness
number) needed to calculate the row-id of the base table row.

Their syntax is similar to secondary index syntax, thus simpler.

They are automatically compressed for storage.

Hash Index Limitations

The following are limitations of using Hash Indexes:

A total maximum of 32 hash indexes, join indexes and secondary indexes
can be associated with a table.

A hash index can consist of no more than 16 columns.

Hash indexes are not supported with the following Teradata features and
utilities:

- Multiload
- Fastload
- Archive/Recovery
- Triggers
- Permanent Journal
- Upsert Processing

Lab
Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server.
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com.
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).

Lab 1

1.) Prior to doing this lab exercise, it will be necessary to recreate
two tables in your user space. Accomplish this with the following
commands:

CREATE TABLE loc1 AS Customer_Service.location WITH DATA;
CREATE TABLE loc_emp1 AS Customer_Service.location_employee WITH DATA;

1a.) Create a hash index which will facilitate joins between the 'loc1'
and 'loc_emp1' tables. The hash index should:

- Contain the columns 'employee_number' and 'location_number'.
- Be distributed based on the hash of location_number.
- Be ordered based on the hash of location_number.
1b.) Once the hash index is successfully created, execute the
following join. EXPLAIN the join to see if the hash index was used.
SELECT l.location_number
, employee_number
, customer_number
FROM loc1 l INNER JOIN loc_emp1 l_e
ON l.location_number = l_e.location_number
ORDER BY 3;

10.) Materialized Views

Objectives

After completing this module, you should be able to do the following:

Implement a materialized view as a partially covered join index

Implement a materialized view as a sparse join index

Materialized Views and Hash Index Review

What Are Materialized Views?


Materialized views refer to precomputing and maintaining query results in a
database management system. In the Teradata database, materialized view
features are based on a range of capabilities built upon the existing join index
technology. Said differently, Teradata implements materialized views as join
indexes. Before looking at the materialized view features available with Teradata, it
will be helpful to review the concept of a hash index.

Hash Indexes
First, let's review a little bit about hash indexes as seen in a previous section.
CREATE HASH INDEX hash_2
(employee_number, department_number) ON emp1
BY (department_number)
ORDER BY HASH (department_number);
Hash indexes, by definition, are defined on a single table. This hash index
'hash_2' can be useful to the optimizer in handling the following query:


SELECT employee_number, department_name
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;
This query is partially covered by the hash index (HI). Partially covered means that
the optimizer cannot resolve the query with the hash index alone. It must bring in
another table - in this case the department table - in order to retrieve the
department_name column. Because the index is ordered on the hash of
department number, it can join the HI directly to the department table, since
both are sequenced on the same primary index (PI) hash values.
The join step of the EXPLAIN of this query is seen here:
4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash
match scan. with no residual conditions, which is joined to TD000.d.
TD000.hash_2 and TD000.d are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 1, which is
built locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 16 rows. The estimated time for this step is 0.18
seconds.
As can be seen, this is simply a join between the HI and the department table. The
optimizer picked this plan because it did not have to prepare either side of the
join - the rows are already in department number hash sequence for both the HI
and the department table. Accessing the employee table was not necessary to
resolve this query.

Join Backs
Note that the following query, which additionally selects the job_code column, is
also able to use the HI. This is due to the availability of the ROWID which is
implicitly included in all hash indexes. The implicit ROWID allows the optimizer to
'join back' to the base employee row to pick up additional information (i.e.,
job_code) not available in the HI itself.
SELECT e.employee_number
, d.department_name
, e.job_code
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match
scan. with no residual conditions, which is joined to TD000.hash_2.
TD000.d and TD000.hash_2 are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 2, which is
duplicated on all AMPs. The size of Spool 2 is estimated with low
confidence to be 64 rows. The estimated time for this step is 0.18
seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are
joined using a product join, with a join condition of ( "(Field_2 =
TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID
FROM 5 FOR 4 )))"). The result goes into Spool 1, which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 8 rows. The estimated time for this step is 4.92
seconds.

Step 5 joins the HI to the department table, and step 6 uses the implicit ROWID in the HI
to locate the base row for each employee to extract its job code. We consider the HI to
partially cover the query because it still requires information from another table.
A limitation of HI's is that they are by definition single table indexes.

Partial Covering Join Indexes

Multi-Table Partial Covering Join Indexes


We can also create a join index on multiple tables with the same 'partial covering'
capability of hash indexes.
CREATE JOIN INDEX ji_emp_dept1 AS
SELECT e.employee_number
, d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Note that the following query is the same query we executed before, however in
this case, the join index completely covers the query.
EXPLAIN SELECT employee_number, department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
1) First, we lock a distinct SQL00."pseudo table" for read on a
RowHash to prevent global deadlock for SQL00.ji_emp_dept1.
2) Next, we lock SQL00.ji_emp_dept1 for read.
3) We do an all-AMPs RETRIEVE step from SQL00.ji_emp_dept1 by way of
an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1
is estimated with low confidence to be 40 rows. The estimated time for
this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.06 seconds.
Note how simple the EXPLAIN becomes in this case. The query is completely
covered by a scan of the join index. No join is needed to get the answer set. The
real work of getting the answer set all occurs in step 3 of the EXPLAIN text.

Now create a second join index which includes the ROWID for the employee table.
CREATE JOIN INDEX ji_emp_dept2 AS
SELECT e.employee_number
, d.department_name
, e.ROWID
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Now in addition to selecting the employee number and department name, let's add
the job code to the select list. This column exists on the employee table and is not
an explicit part of the join index.
SELECT e.employee_number, d.department_name, e.job_code
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
The optimizer may choose to cover this query using the join index to acquire the
employee number and the department name, and it may use the rowid of the
employee table to 'join back' to the base employee row to acquire the job code.
Thus, it works similarly to a hash index, however the join index has the added
property of being defined on multiple tables. This provides the opportunity to 'join
back' to either table, assuming both ROWID's are specified in the join index
definition.
It is still considered a 'partial covering' index because it still had to join back to the
employee table to fully resolve the query, but it did not have to scan the entire
employee table.
The join back capability provided by the ROWID syntax is typically chosen by the
optimizer when the number of rows in the table is fairly large. Smaller table joins
may not demonstrate this approach.

Multiple Join Back Capability

Specifying Multiple Rowids In a Join Index


We could also have coded the join index as follows:
CREATE JOIN INDEX ji_emp_dept3 AS
SELECT e.employee_number
, d.department_name
, e.ROWID AS e_rowid
, d.ROWID AS d_rowid
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Now we are including the rowid for both tables, thus giving the optimizer the ability
to join back in either direction. Thus, a query like the following may use the join
index to join back to the department table, this time to pick up the department
manager's employee number.
SELECT e.employee_number
, d.department_name
, d.manager_employee_number
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Note, to use this option, aliases must be assigned to each rowid column selected in
the join index definition. If the columns are not 'renamed' using an alias, the
syntaxer will not allow more than one column named 'ROWID' in the same query.

Sparse Join Indexes

Definition of Sparse Join Index

A Sparse Index is a Join Index with a WHERE clause which restricts the participating
rows from the base tables. A Sparse Index can significantly reduce the size of the
join index which must be built and maintained by the system.
A sparse index makes sense when a definable subset of the rows in the underlying
tables are needed to satisfy a large percentage of the queries which will use it.
For example, a join index might be defined on a join of two tables, one a history
table containing 5 years of order data and the other a table providing the customer
name. Since the history table might be large, maintenance of this join index might
be costly. Furthermore, if 90% of the queries are typically requesting information
on the current year, it might make sense to create a sparse index with entries for
that year only. Queries for the remaining four years could access the base tables
since their frequency is much lower.
Examples of reasons for creating a Sparse Index might include:

Tables with lots of nulls which are ignored for purposes of most queries
(see the sketch after this list).

Tables with frequent access for rows which contain quantities above or below
a certain limit.

Tables which are time oriented and where the most frequent accesses are for
current information.
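A minimal sketch of the first case (the index name is hypothetical; assumes
'order_status' is nullable and that most queries ignore null-status rows):

CREATE JOIN INDEX ord_status_ix AS
SELECT order_id, order_status, order_date
FROM orders
WHERE order_status IS NOT NULL   /* null-status rows never enter the index */
PRIMARY INDEX (order_id);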

Example
Create a non-sparse join index between the customers and orders tables.

CREATE JOIN INDEX cust_ord_ix
AS SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);


How many orders have assigned valid customers?
SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             86
This query represents a simple join of the two tables to produce an aggregation.
The defined join index is able to resolve this query as seen in the extract of the
EXPLAIN seen here.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with no residual conditions, and the grouping identifier in
field 1. Aggregate Intermediate Results are computed globally, then placed in
Spool 4. The size of Spool 4 is estimated with high confidence to be 1 row.
The estimated time for this step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an
all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with high confidence to be 1 row. The
estimated time for this step is 0.03 seconds.

Now, let's try another query, this time restricting the time interval.
How many valid customers have assigned orders in January 2009?

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');
Once again, we see that the join index is able to cover the query. The number of
rows participating is however, expected to be much fewer.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with a condition of (
"(SQL00.cust_ord_ix.order_date <= DATE '2009-01-31') AND
(SQL00.cust_ord_ix.order_date >= DATE '2009-01-01')"), and the
grouping identifier in field 1. Aggregate Intermediate Results are computed
globally, then placed in Spool 4. The size of Spool
4 is estimated with high confidence to be 1 row. The estimated time for this
step is 0.06 seconds.

Since we anticipate that most of the queries against this table will involve rows
from the year 2009, we may wish to create a sparse index with only those rows
represented in the index.

Creating a Sparse Join Index

The following creates a new sparse join index which only includes rows for the year
2009.

CREATE JOIN INDEX cust_ord_ix_2009


AS SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE EXTRACT (YEAR FROM order_date) = '2009'
PRIMARY INDEX (cust_id);
Now, we can run the same query again and see if the optimizer uses the sparse
index.

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');
Count(cust_id)
--------------
            86
3) We do an all-AMPs SUM step to aggregate from
SQL00.cust_ord_ix_2009 by way of an all-rows scan with a condition
of ("(((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))<= 2009)
AND ((EXTRACT(MONTH FROM (SQL00.cust_ord_ix_2009.order_date )))<= 1))
AND ((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))>= 2009)"),
and the grouping identifier in field 1. Aggregate Intermediate Results are
computed globally, then placed in Spool 4. The size of Spool 4 is estimated
with high confidence to be 1 row. The estimated time for this step is 0.08
seconds.
Any query for the year 2009 which is covered by the sparse index will be optimized
to use the sparse index instead of the base tables.
Example

SELECT c.cust_id, cust_name, order_id, order_status, order_date
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE c.cust_id > 600
AND EXTRACT (YEAR FROM order_date) = '2009'
AND EXTRACT (MONTH FROM order_date) IN (2,3);
cust_id  cust_name        order_id  order_status  order_date
-------  ---------------  --------  ------------  ----------
   1009  YZA Corp              648                2009-02-08
   1004  JKL Corp              620                2009-02-17
   1008  VWX Corp              645                2009-03-04
   1005  MNO Corp              627                2009-03-27
   1006  PQR Corp              633                2009-03-24
   1009  YZA Corp              649                2009-03-08
   1004  JKL Corp              621                2009-03-17
   1008  VWX Corp              644                2009-02-04
   1005  MNO Corp              626                2009-02-27
   1006  PQR Corp              632                2009-02-24
   1003  GHI Corp              614                2009-02-12
   1007  STU Corp              639                2009-03-14
   1003  GHI Corp              615                2009-03-12
   1007  STU Corp              638                2009-02-14

The following is an EXPLAIN of this query. Note the use of the sparse index.
Explanation
----------------------------------------------------------------------1) First, we lock a distinct SQL00."pseudo table" for read
on a RowHash to prevent global deadlock for
SQL00.CUST_ORD_IX_2009.
2) Next, we lock SQL00.CUST_ORD_IX_2009 for read.
3) We do an all-AMPs RETRIEVE step from
SQL00.CUST_ORD_IX_2009 by way of an all-rows scan with a
condition of (
"(SQL00.CUST_ORD_IX_2009.cust_id > 600) AND (((EXTRACT(YEAR
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2009) AND
(((EXTRACT(MONTH
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2) OR
((EXTRACT(MONTH FROM
(SQL00.CUST_ORD_IX_2009.order_date )))=3 )))") into Spool 1
(group_amps), which is built locally on the AMPs. The size
of Spool 1 is estimated with no confidence to be 1 row. The
estimated time for this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs
involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the
result of statement 1. The total estimated time is 0.06
seconds.

Sparse Index Advantages and Limitations

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that
reduces the number of rows which would otherwise be included in the index. All
types of join indexes, including single-table, multi-table, simple or aggregate can be
sparse.
A sparse index makes sense when a definable subset of the rows in the join index
are needed to satisfy a large percentage of the queries which will use it.
By default, any join index, including a sparse join index, has a NUPI on the first
column specified. You can explicitly define other columns to be the primary index.
Any combination of AND, OR and IN conditions may be applied to the sparse index
WHERE clause (see the sketch below).
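A hedged sketch combining all three condition types in one sparse index WHERE
clause (the index name and predicate values are hypothetical; based on the
customers and orders tables used above):

CREATE JOIN INDEX cust_ord_ix_open AS
SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE order_status IN ('O', 'C')                      /* IN condition  */
AND (EXTRACT(YEAR FROM order_date) = '2009'           /* AND condition */
     OR cust_id > 1000)                               /* OR condition  */
PRIMARY INDEX (cust_id);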

Sparse Index Advantages:

Reduces the storage requirements for a join index subtable.

Access is faster since the size of the subtable is smaller.

Better maintenance performance since not all changes to the base table will
affect the sparse index.

Sparse Index Limitations:

Sparse indexes have the same restrictions as any join index.

They require additional space and maintenance resources over and above
the base table requirements.

Summary

Materialized views are a cross between an index and a view.


Like an index, a materialized view has subtable rows; it can carry a row-id for
join-back purposes and it requires maintenance when base table rows change.
Like a view, a materialized view represents a subset of the data in a table and the
view changes along with changes in the underlying base rows.
Materialized views are typically implemented to improve query performance. We
have discussed the following materialized view features in this module.
Partial-Covering Multi-Table Join Indexes - These are multi-table join indexes
which include a ROWID specification for one or more of the base tables. This permits
the optimizer to 'join-back' to the base row when information is needed which is not
provided in the join index. Ultimately, the optimizer will decide whether or not to
use this approach in place of another, depending on the associated query costs.
Sparse Join Indexes - These are join indexes which include a WHERE clause

which limits the base table rows that will be reflected in the join index. This permits
a join index to be built only for the rows which are most frequently accessed by
queries, such as current year or current month. The size of the join index is thereby
smaller and the maintenance costs are subsequently less. The optimizer will decide
if it can use a sparse index to reduce the costs associated with a given query.

Lab
Try It!

For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server.
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com.
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).

Click on the buttons to the left to see the answers.


1a.) Create and populate the following two tables in your database,
then run the UPDATE statement.
CREATE TABLE orders AS Student.orders WITH DATA;
CREATE TABLE customers AS Student.customers WITH DATA;
UPDATE orders
SET order_date = order_date + INTERVAL '10' YEAR;

1b.) Create a sparse join index named cust_ord_ix_2009 with the
following properties:

- Fixed index columns cust_id, cust_name
- Variable index columns order_id, order_status, order_date
- Inner join on customers and orders tables
- Join condition on the cust_id columns
- Where condition to include only open orders (order_status = 'O')
- NUPI index is on cust_id.

2.) Create a query that returns a count of all open orders held by
valid customers.