Join Index

4.
) Column Value Management

Objectives
Upon completion of this module, the student should be able to:
Insert into a table using the DEFAULT VALUES feature.
Perform string functions using the POSITION feature.
Test case sensitivity with the LOWER attribute.
Rename a column of a table.
Using DEFAULT VALUES
Teradata has the ability to insert a row using only the DEFAULT VALUES keywords.
For this feature to work successfully, one of the following statements must be true
for each column of the table:
- the column has defined default values
- the column has a default system value specified
- the column permits nulls
If none of these statements is true, an insert using DEFAULT VALUES will fail. Note
that such an insert may be executed multiple times as long as no uniqueness
attributes of the table are violated.
Column Attributes:
NOT NULL - Nulls are not permitted for this column
DEFAULT 22 - Unless otherwise specified, assign the column a value of 22
DEFAULT DATE '2010-01-01' - Unless otherwise specified, assign a date of

Jan 1, 2010
WITH DEFAULT - Assign the system default, spaces for char strings, zero for
numeric data types and current date for date data type
DEFAULT TIME - Assign the current time to the integer data type
DEFAULT USER - Assign the user id of the session to the character string
The command - INSERT INTO tablename DEFAULT VALUES;
Will insert defined default values into each column.
Will insert a null if no default is defined.
Will fail if no default is defined and a null is not allowed.
Create a Test table

CREATE TABLE test_tbl
(cola SMALLINT NOT NULL DEFAULT 22
,colb CHAR(1)
,colc DATE DEFAULT DATE '2010-01-01'
,cold DEC(3,2) NOT NULL WITH DEFAULT
,cole TIME(0) DEFAULT CURRENT_TIME
,colf INT DEFAULT TIME
,colg CHAR(8) DEFAULT USER);
Populate the test table

INSERT
SELECT
cola
-----22
INSERT
SELECT
cola
-----22
22
INTO test_tbl DEFAULT

* FROM test_tbl;
colb
colc
cold
---- -------- ----?
10/01/01
.00
INTO test_tbl DEFAULT
* FROM test_tbl;
colb
colc
cold
---- -------- ----?
10/01/01
.00
?
10/01/01
.00
VALUES;
cole
-------15:27:31
colf
----------152731
colg
-------TD036
cole
-------15:27:31
15:27:43
colf
----------152731
152743
colg
-------TD036
TD036
VALUES;
Defaulting Methods
Defaulting of data values may also be accomplished by the use of positional

commas in an INSERT statement. The positional commas indicate the use of a
default value if one is specified and a null if one is not. If neither outcome is
possible, an error is returned.
Let's look at the traditional method of defaulting values.
The INSERT
INSERT INTO test_tbl VALUES (,,,,,,);
The SELECT
SELECT * FROM test_tbl;
cola colb
colc
cold
------ ---- -------- -----
cole
--------
colf
-----------
colg
--------
22
22
22
?
?
?
10/01/01
10/01/01
10/01/01
.00
.00
.00
15:27:31
15:27:43
15:33:13
152731
152743
153313
TD036
TD036
TD036
While it is possible to alter a column definition via an ALTER TABLE statement, care
must be used to not add attributes which are in conflict with existing attributes or
existing data. Adding a new column with a NOT NULL attribute to a table returns an
error if the table already has rows defined. The new column must initially be set to
either a null or a value for the existing rows.
The ALTER TABLE

ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL;
***Failure 3559 Column COLH is not NULL and it has no default value.
By adding the WITH DEFAULT phrase, the NOT NULL attribute is permitted. The new
column being added will carry the system default value initially.
The Correct ALTER TABLE

ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL WITH DEFAULT;
The INSERT
INSERT INTO test_tbl DEFAULT VALUES;
The SELECT
SELECT
cola
-----22
22
22
22
* FROM test_tbl;
colb
colc
cold
---- -------- ----?
10/01/01
.00
?
10/01/01
.00
?
10/01/01
.00
?
10/01/01
.00
cole
-------15:27:31
15:27:43
15:33:13
15:38:42
colf
----------152731
152743
153313
153842
colg
-------TD036
TD036
TD036
TD036
colh
-----0
0
0
0
Creating a Table
CREATE TABLE abc (a INT NOT NULL)
The INSERT
INSERT INTO abc DEFAULT VALUES;
***Failure 3811 Column 'a' is NOT NULL. Give the column a value.
Creating Tables in Teradata Mode
Tables may be created in either Teradata mode or ANSI mode. Tables created in
Teradata mode will have all character columns defined as NON CASESPECIFIC by
default. This means that the data will be stored in the column in the same case that
it was entered. It will also be returned to the user in this stored case, however all
testing against the column will ignore case specificity.
The default comparison operation for non-casespecific columns is always noncasespecific.
CREATE a Table
CREATE TABLE case_nsp_test
(col1 CHAR(7)
,col2 CHAR(7));
(Columns created in Teradata mode are defaulted to NOT CASESPECIFIC)
SHOW the Table

SHOW TABLE case_nsp_test;
CREATE SET TABLE PED.case_nsp_test ,NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC,
col2 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX (col1);
INSERT into the Table

INSERT INTO case_nsp_test VALUES('LAPTOP','laptop');
SELECT from the Table

SELECT * FROM case_nsp_test WHERE col1=col2;
Result
col1
LAPTOP
col2
laptop
Because both columns are defined as NCS, all comparision tests will be done NCS.
Creating ANSI Mode Tables
Tables created in ANSI mode will have character columns defaulted to

CASESPECIFIC. As before, data will be stored and retrieved in the case of the
original INSERT, however all tests and comparisons of the data will be done in
casespecific mode.
In order to do non-casespecific testing, also called case blind testing, it is necessary to

apply a function such as the UPPER function to both sides of the comparison, thus
rendering case a non-factor in the test.
Example
Initiate an ANSI session.
.SET SESSION TRANSACTION ANSI;
.LOGON L7544/tdxxx;
CREATE a Table
CREATE TABLE case_sp_test
(col1 CHAR(7)
,col2 CHAR(7));
(Columns created in ANSI mode are defaulted to Casespecific.)

SHOW the Table
SHOW TABLE case_sp_test;
CREATE MULTISET TABLE tdxxx.case_sp_test,NO FALLBACK,

NO BEFORE JOURNAL,
NO AFTER JOURNAL
( col1 CHAR(7) CHARACTER SET LATIN CASESPECIFIC,
col2 CHAR(7) CHARACTER SET LATIN CASESPECIFIC)
PRIMARY INDEX (col1);
INSERT into the Table
INSERT INTO case_sp_test VALUES('LAPTOP','laptop');
SELECT from the Table

SELECT * FROM case_sp_test WHERE col1=col2;
***Query completed. No rows found.
SELECT using UPPER

SELECT * FROM case_sp_test WHERE UPPER(col1)=UPPER(col2);
col1
col2
------- ------LAPTOP
laptop
In ANSI, it is necessary to perform a case blind test to do non-case specific testing.
The LOWER Function (1 of 2)
The LOWER function behaves similarly to the UPPER function but in the reverse
direction. LOWER may be used as the choice for case blind test, just as UPPER may.
The LOWER function:
Allows case blind testing on case specific strings.
Allows storage and retrieval of lower case characters.
Example
SELECT *
col1
------LAPTOP
FROM case_sp_test WHERE LOWER(col1)=LOWER(col2);

col2
------laptop
Note: Case blind test.
UPDATE the Table

UPDATE case_sp_test SET col2=col1;
SELECT *
col1
------LAPTOP
FROM case_sp_test;
col2
------LAPTOP
Both LOWER and UPPER may be used to change the stored contents of a column from
one case to another. This may be done via an UPDATE statement which applies the
function to the updated column values.
UPDATE the Table

UPDATE case_sp_Test SET col1=LOWER(col1);
SELECT * FROM case_sp_test;
col1
col2
------- ------laptop
LAPTOP
The LOWER function provides the reverse capabilities of the UPPER function.
The LOWER Function (2 of 2)

A third way to accomplish a case blind test is via use of the NOT CASESPECIFIC
attribute applied to the column test. This may be abbreviated to NOT CS and is
applied to the column within parenthesis following the column name. Unlike UPPER
and LOWER, NOT CS is not a function but rather an attribute of the column, to be
applied to the column as part of the test in this situation.

Note that the use of NOT CS, while accomplishing the same case blind test, is not
considered ANSI standard syntax. If ANSI standard compliance is required, the case
blind test should be done using either LOWER or UPPER.
Example
Let's INSERT a second row into the table.
INSERT INTO case_sp_test VALUES ('LAPTOP','LAPTOP');

SELECT *
col1
------laptop
LAPTOP
FROM case_sp_test;
col2
------LAPTOP
LAPTOP
SELECT * FROM case_sp_test WHERE col1=col2;

col1
col2
------- ------LAPTOP
LAPTOP
Note: Case sensitive result.
SELECT * FROM case_sp_test WHERE col1=LOWER(col2);

col1
col2
------- ------laptop
LAPTOP
SELECT *
col1
------laptop
LAPTOP
FROM case_sp_test WHERE col1(NOT CS)=col2(NOT CS);

col2
------LAPTOP
LAPTOP
Note: Case blind test but non-ANSI syntax.
SELECT LOWER (col1) FROM case_sp_test;

Lower(col1)
----------laptop
laptop
Note: Convert display to lowercase.
POSITION Function
The POSITION function is the ANSI standard form of the INDEX function of Teradata
SQL. They are both used for locating the position of a string within a string.
Both functions require two arguments, the column or character string to be tested,
and the character or string of characters to be located.
With the INDEX function, the two arguments are separated with a comma.
With the POSITION function, the more English-like IN keyword is used in place of a
comma.
While both functions will continue to be available for compatibility purposes, it is
suggested that future coding be done with POSITION.
Example
SELECT INDEX ('laptop','p');

Index('laptop','p')
------------------3
SELECT INDEX ('laptop','top');
Index('laptop','top')
--------------------4
The POSITION function is the ANSI standard function for locating a string within a
string.
SELECT POSITION ('p' IN 'laptop');

Position('p' in 'laptop')
------------------------3
SELECT POSITION ('top' IN 'laptop');
Position('top' in 'laptop')
--------------------------4
Both POSITION and INDEX are available functions, but only POSITION is
ANSI standard.
POSITION and Case Sensitivity
Care must be exercised in executing the same scripts in both ANSI and Teradata
session mode, particularly where issues of case sensitivity are involved. This may
become very significant in using the POSITION function which will or will not
attempt to match case depending on the session type.
Note that in three of the five examples shown here, the ANSI session produces a
different result than the Teradata session.
Examples
Note: "ANSI result" implies an ANSI transaction session mode. "Teradata result"
implies a BTET transaction session mode.
SELECT POSITION ('top' IN 'laptop');
ANSI result Teradata result

4
4
SELECT POSITION ('TOP' IN 'laptop');

0
4
SELECT POSITION (upper('top') IN 'laptop');

0
4
SELECT POSITION (lower('TOP') IN 'laptop');

4
4
SELECT POSITION (lower('TOP') IN 'LAPTOP');

0
4
Renaming Columns
Columns in a table may be renamed using the ALTER TABLE command. In order to
qualify for renaming, a column must not be referenced by any external objects and
must be assigned a new name not already in use by the table.
A column which participates in any index is not a candidate for renaming. Likewise,
a column which is either a referenced or referencing column in a referential
integrity constraint may not be renamed.
Note that renaming a column does not cascade the new name to any macros or
views which reference it. The views and macros will no longer function properly
until they have been updated to reflect the new name.
A column may be renamed to a different name provided that:
The new name doesn't already exist in the table.
The column is not part of an index.
The column is not part of any referential integrity contraints.
The affected column is not referenced in the UPDATE OF clause of a trigger.
Example
CREATE a Table
CREATE TABLE rename_test
(col1 INT
,col2 INT
,col3 INT)
UNIQUE PRIMARY INDEX (col1)
,INDEX (col3);
Now ALTER it
ALTER TABLE rename_test RENAME col2 TO colb;
SHOW the Table

SHOW TABLE rename_test;
CREATE SET TABLE PED.rename_test ,NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
col1 INTEGER,
colb INTEGER,
col3 INTEGER)
UNIQUE PRIMARY INDEX (col1)
INDEX (col3);
ALTER it Again
ALTER TABLE rename_test RENAME col1 TO cola;
Result
Failure:
Column COL1 is an index column and cannot be modified.
ALTER it Again
ALTER TABLE rename_test RENAME col3 TO colc;
Result
Failure:
Lab
Column COL3 is an index column and cannot be modified.
Try It!
For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com
Be sure to change your default database to the Customer_Service

database in order to run these labs.
Click on the buttons to the left to see the answers.
Answers:
Lab A
Lab B
Lab C
Lab D
Lab E
A. Show the first and last name of any employee in the employee table who
uses an initial followed by a period, instead of his/her first name. Use
position to solve this.
B. Show any employee who has an upper case letter B in his first name, but
in a position other than the first position. Use the POSITION function in
an ANSI mode session to solve this. While in ANSI mode, use the LIKE
operator to solve this problem again. Try these solutions in Teradata
(BTET) mode. Do they produce the same result? Is there a solution to this
problem in Teradata mode?
C. Show first and last name of any employee who has the letter 'a' in the
same position in both their first and last name. (Return to BTET mode
first.)
D. Display first names in lowercase and last names in uppercase for
employees whose last name begins with the letter R. Use Position to solve
this.
E. Create a small table according to the following definition.
Hint: For the remainder of these labs, it will be helpful to reset your
default database to your userid. (DATABASE tdxxx;)
F.
G.
H.
I.
CREATE TABLE rename_tbl

(col1
INT
,col2
INT
,col3
INT);
Populate the table with one row as follows. INSERT INTO
rename_tbl (1,2,3);
Create a view to access this table as follows:
CREATE VIEW rename_view AS

SELECT col1 AS vcol1 ,col2 AS vcol2
FROM rename_tbl;
Select the row via the view.
Attempt to rename col1 of rename_tbl to be colA.
Attempt to rename col2 of the rename_tbl to be colB.
Attempt to select the row from the view again.
Replace the view with the renamed column.
Attempt to select the row from the view again.
5.) Identity Columns and Key Retrieval

Objectives
After completing this module, you should be able to:
Use the Identity Column feature to generate an identity column for a table.
Use this feature to implement a Primary Key column.
Use this feature to implement a Unique column.
Automatically retrieve generated identity column values after row insertion.
Identity Column Features
The Generated Identity Column feature permits the automated generation of a

column value based on a prescribed sequencing and interval set. A typical
application of this feature would be to generated the values associated with a
system assigned primary key.
The following are options which can be used with this feature:
GENERATED ALWAYS - will always generate a value for this column, whether or not
a value has been specified.
GENERATED BY DEFAULT - will generate a value for this column only when
defaulting or a null is specified for the column value.
START WITH - the value which will be used to start the generated sequence.
(default is 1)
INCREMENT BY - the interval which is used for each new generated value. (default
is 1)
MINVALUE - the smallest value which can be place in this column (default is
smallest value supported by the data type of the column)
MAXVALUE - the largest value which can be place in this column (default is largest
value supported by the data type of the column)
CYCLE - after the maxvalue has been generated, restart the generated values
using the MINVALUE.
Only numeric data types from the following list may be used for identity columns:
INTEGER
SMALLINT
BYTEINT
DECIMAL
NUMERIC
The identity column feature only supports whole numbers.
Use of ALWAYS, MAXVALUE and CYCLE

Create a table with an ALWAYS system-generated unique primary index with a
maximum value of 3.
CREATE SET TABLE test_idCol ,FALLBACK
(
Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3)
,Col2 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX ( Col1 );
Populate the table as follows:
INSERT INTO test_idcol VALUES (, 9);
INSERT INTO test_idcol VALUES (NULL, 9);
INSERT INTO test_idcol VALUES (6, 9);
*** Warning: 5789 Value for Identity Column is replaced by a

system-generated number
SELECT * FROM test_idCol ORDER BY 1;

Col1
----------1
2
3
Col2
----------9
9
9
Things to notice:
A value is generated regardless of whether:
a default is specified
a null is specified
a value is specified
When a value is specified and replaced, a warning is given.

Now, let's add another row.
***Failure 5753 Numbering for Identity Column Col1 is over its limit.
It was not possible to add another row, because the MAXVALUE has been achieved.
In fact, no new row may be added to this table ever again.
Remove the rows from the table.
DELETE FROM test_idCol;
Now try to add the additional row.
***Failure 5753 Numbering for Identity Column Col1 is over its limit.
Note that even after you delete all rows from the table, you cannot add new rows if
you have exceeded MAXVALUE and you have not specifed the CYCLE option. You
must drop and recreate this table in order to add rows.
Create another table similar to the previous one, but with the CYCLE option.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3
Col2 INTEGER)
Again, insert the first three rows as follows:
INSERT INTO test_id3 VALUES (, 9);
CYCLE),
INSERT INTO test_id3 VALUES (NULL, 9);

INSERT INTO test_id3 VALUES (6, 9);
*** Warning: 5789 Value for Identity Column is replaced by a system-generated
number
SELECT * FROM test_id3 ORDER BY 1;

Col1
Col2
----------- ----------1
9
2
9
3
9
Now, attempt to execute the same three insert statements a second time.
*** Warning: 5789 Value for Identity Column is replaced by a

system-generated number
Col1
-----------2147483647
-2147483646
-2147483645
1
2
3
Col2
----------9
9
9
9
9
9
Notice what happens here:
After hitting the MAXVALUE, it reverts to the default minimum value.
It does not revert to the START WITH value, which is 1.
The default minimum value for an integer is approximately negative two

billion.
Each subsequent insert increments the value by the default increment of 1.
Use of ALWAYS, MINVALUE and CYCLE

Drop and recreate table with a MINVALUE of 1 specified.
DROP TABLE test_id3;
CREATE SET TABLE test_id3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3
CYCLE),
Col2 INTEGER)
Insert the same three starter rows.
*** Warning: 5789 Value for Identity Column is replaced by a systemgenerated number
Col1
----------1
2
3
Col2
----------9
9
9
Now, insert a 4th row.

***Failure 2801 Duplicate unique prime key error in PaulD.TEST_ID3.
What happened here?
The fourth row would have made Col1 = 1 due to the CYCLE option.
This violates the uniqueness of the primary index Col1, thus it

is rejected.
Drop and recreate the table with a MINVALUE specified and with a nonunique primary index.
CREATE SET TABLE TEST_ID3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3
CYCLE),
Col2 INTEGER)
PRIMARY INDEX ( Col1 );
*** Warning: 5789 Value for Identity Column is replaced by a systemgenerated number
Now try to add the 4th row.
INSERT INTO test_idcol VALUES (NULL, 9);

***Failure 2802 Duplicate row error in PaulD.TEST_ID3.
What happened this time?
The fourth row would have had values (1,9) due to the CYCLE
option..
This violates the 'no duplicate row' rule, thus it is rejected.
The following rows are inserted.

*** Warning: 5789 Value for Identity Column is replaced by a systemgenerated number.
SELECT * FROM test_id3 ORDER BY 2,1;
Col1
----------1
2
3
1
2
3
Col2
----------7
7
7
9
9
9
All rows are inserted successfully and the identity column recycles.
Handling Gaps in the Sequence

Now, let's drop and recreate the table as previously with a Unique Primary Index.
CREATE SET TABLE test_id3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1
MAXVALUE 5 CYCLE),
Col2 INTEGER)
*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

Col1
----------1
2
3
Col2
----------9
9
9
Remove the second row inserted.

DELETE FROM test_id3 WHERE Col1 = 2;
Insert an additional row.
Col1
----------1
3
4
Col2
----------9
9
9
What happened here?
Gaps in the sequence are not filled.
The column always knows what its last used value is.
It increments that value and attempts to assign the next value.
Now, add the following two additional rows.

INSERT INTO test_id3 VALUES (, 9); - adds the (5,9) row
INSERT INTO test_id3 VALUES (, 9); - fails
*** Failure 2802 Duplicate row error in TEST_ID3.
SELECT
* FROM test_id3 ORDER BY 1;
Col1
----------1
3
4
5
Col2
----------9
9
9
9
Things to notice:
The final insert intended to insert a (1,9) row.
This is because the MAXVALUE is 5 and CYCLE was specified.
It couldn't do that since the (1,9) row already exists.
Insert an additional row.
INSERT INTO test_id3 VALUES (NULL, 8); - adds (2,8)

Col1
----------1
2
3
4
5
Col2
----------9
8
9
9
9
Note that the increment occurs, even though the insert of 1 previously
failed.
Generating BY DEFAULT (1 OF 2)
Now let's switch to BY DEFAULT mode. This mode will generate a value only when a
value is not explicitly expressed.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(
Col1 INTEGER GENERATED BY DEFAULT AS IDENTITY
(MINVALUE 1 MAXVALUE 5 CYCLE),
Col2 INTEGER )
PRIMARY INDEX ( Col1 );
Add two rows to the table.

Col1
----------1
3
Col2
----------9
9
Note that no identity columns were generated. The values were provided
explicitly.
*** Failure 2802 Duplicate row error in TEST_ID3.
Things to notice:
This insert attempted to add the row (1,9) again.
This is because it used the START WITH value of 1.
Because this row already exists, the insert fails.
Add the following row.

INSERT INTO test_id3 VALUES (, 8); - adds (2,8)
Col1
Col2
----------- ----------1
9
2
8
3
9
Notice, 2 is the generated value.
The insert the value 1 previously failed and it will not be reused
until it cycles.
Now, add the following row.
INSERT INTO test_id3 VALUES (6, 9); - adds (6,9)
Col1
----------1
2
3
6
Col2
----------9
8
9
9
Note, added values are explicit, thus no defaulting

Insert the following three additional rows.
Col1
----------1
2
3
3
5
6
Col2
----------9
8
8
9
8
8
9
Note, last generated value, prior to these inserts, was 2.

Add the following row.
Col1
----------1
1
2
3
3
4
5
6
Col2
----------7
9
8
8
9
8
8
9
Note that the recycle has begun by reverting back to the MINVALUE OF
1.
Generating BY DEFAULT (2 OF 2)
Consider that the last row generated for this table is the (1,7) row.

Col1
-----------
Col2
-----------
1
1
2
3
3
4
5
6
7
9
8
8
9
8
8
9
Now remove all rows from the table.
DELETE FROM test_id3;

Add the following row to the table.

Col1
----------2
Col2
----------6
Things To Notice:
The id generation picks up where it left off at 2.
Emptying the table does not reset to the START WITH value.
Add the following row to the table.

Col1
----------2
4
Col2
----------6
6
Note that no identity columns were generated. The value 4 was provided
explicitly.
Now insert the following three rows.
Col1
Col2
----------- ----------2
6
3
5
4
5
4
6
Note that another 4 row is generated even though one already exists.
The generating mechanism operates independently or pre-existing rows.
Col1
----------1
2
3
4
4
5
Col2
----------5
6
5
5
6
5
Note, the recycle begins again.
Rules and Restrictions Of Identity Columns
Identity Column as an Attribute
An Identity Column (IC) is considered an attribute of a column.
You cannot drop (or modify) the IC attribute of a column.
You can drop an IC column from a table.
An IC attribute cannot co-exist with any of the following attributes:

DEFAULT
BETWEEN
COMPRESS
CHECK
REFERENCES
Some Rules for Identity Columns (ICs)
Only one IC is permitted per table.
It can be any column in the table with some exceptions.
IC's cannot be any part of any of the following:
Composite indexes (primary or secondary)
Hash or Join Indexes
Partitioned Primary Indexes
Value-ordered Indexes
Upserts are not supported on tables where the PI is an IC.
Column compression cannot be specified with ICs.
Bulk Row Inserts with Identity Columns (ICs)
IC's are not supported for load utilities Fastload and Multiload.
IC's with multi-session BTEQ or multi-statement TPUMP are permitted.
Bulk inserts across multiple sessions cannot guarantee that the sequence of
the IC numbers will correlate to the sequence of the specified INSERT
statements.
Bulk inserts done via INSERT SELECT also cannot guarantee that the
sequence of the assigned IC's will be unbroken. This is because each AMP
pre-allocates a range of numbers based on a pre-defined interval (specified
in the DBS Control Record). Consequently each AMP will provide its own
sequence independently of the others.
Auto-Generated Key Retrieval
History
Prior to the auto-generated key retrieval feature, there was no simple way to
determine the value assigned to the identity column for an INSERTed row of a
table. If there was a unique column, or a unique combination of columns, with a
USI assigned, then the user could do a SELECT of the identity column, qualifying
on the unique column(s). This required an additional query request and would, in
some cases, require an all-AMP operation, and therefore was considered inefficient.
It also presented a bigger problem in the case of INSERT-SELECT which usually
adds multiple rows to the table in question.
Business Value
Having the IdCol values automatically returned enhances applications that require
quick or immediate retrieval of assigned identity values.
Examples
To enable this feature, use the new .SET command in BTEQ:
.[SET] AUTOKEYRETRIEVE [OFF|COLUMN|ROW]

Where:
OFF = Disabled
COLUMN = Enabled, display IDCol only
ROW = Enabled, display entire row
Example
Lets say we have the following CREATE TABLE statement:
CREATE TABLE customerDetails

(custID INTEGER GENERATED ALWAYS AS IDENTITY,
custName VARCHAR (30),
city VARCHAR (20),
phoneNo INTEGER
);
With Teradata Database V2R6.1 and prior, we enter the following:
INSERT INTO CustomerDetails
(, John, London, 919866234567);
And we receive the following results:
*** Insert completed. One row added.
*** Total elapsed time was 1 second.
We now have no idea what value was assigned to the customer without doing a
separate retrieve of this row. In Teradata V2R6.1 and prior, quick retrieval of the
new custID was possible only if a USI was present on the table. It then required a
separate SELECT of the identity column, using a qualifying condition on the unique
column(s).
In Teradata Database V2R6.2, we can enter the following:
.SET AUTOKEYRETRIEVE ROW;
INSERT INTO customerDetails VALUES

(, George, London, 919848123);
And we receive the following results:
*** Insert completed. One row added.

*** Total elapsed time was 1 second.
custID custName city
------ --------- ---------2 George
London
phoneno
-----------------919848123
Note: If no .SET AUTOKEYRETRIEVE statement is used, the system functions

as before no rows or IDcol values are displayed.
Considerations and Limitations
Considerations
INSERTs have the additional cost of row retrieval if Auto Generated Key Retrieval
(AGKR) is requested. However, the additional retrieval takes less time overall
compared with having to run a separate SELECT to retrieve the identity value.
Limitations
The following limitations apply:
Lab
This feature supports explicit single INSERT and INSERT SELECT

statement only. It does not support any other form of inserts, e.g., Upsert,
MERGE-INTO, Triggered Inserts, Multiload and Fastload.
Iterated INSERTs have to adhere to the 2048 spool limit of the Array
Support feature. A max of 1024 iterations is possible as each iteration uses
an AGKR spool and a response spool.
This feature is enabled through the Client, e.g., JDBC driver or BTEQ. If the
Client version does not include the AGKR capability, then there will be no
AGKR response for INSERT requests.
Try It!
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).

The lab problems in this modules use the following table which
exists in the Customer_Service database.
CREATE SET TABLE agent_sales ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(agent_id INTEGER,
sales_amt INTEGER)
UNIQUE PRIMARY INDEX ( agent_id );
In this lab, we will attempt to assign a badge to each agent. Each

badge must be given a sequentially unique id as it is assigned to an
agent. We will use an identity column to accomplish this.
1.) Create a new table in your own database using the following text:
Answers:
Lab 2
Lab 3
CREATE SET TABLE agent_badge ,NO FALLBACK ,

NO BEFORE JOURNAL,
NO AFTER JOURNAL
(badge_id INTEGER GENERATED ALWAYS AS IDENTITY,
agent_id INTEGER)
UNIQUE PRIMARY INDEX ( agent_id);
2a.) Populate this table from the agent_sales table in the Customer_Service
database using an INSERT SELECT. The first column should be assigned a
NULL and the second column should select the agent_id from the agent_sales
table. To keep the example concise, only populate this table with agents whose id
is less than 40.
2b.) Veryify the resulting table. Order the rows by badge_id. Are the identity
column values what you might have expected?
3a.) We will now try to recreate the agent_badge table with sequentially assigned
ids starting with number one and with no gaps. First, drop and recreate the table.
3b.) Now, set up and execute a BTEQ EXPORT
script which exports the
agent ids to a file called agent_exp.

3c.) Now, set up and execute a BTEQ IMPORT script which imports values from
the agent_exp file into the agent_badge table.
3d.) Verify the results by selecting all rows of the agent_badge table, ordered by
badge_id.
6.) Multi-Column Compression

Objectives
Implement column compression on a table using the ALTER TABLE

command.
Recognize the benefits of column compression implemented with ALTER

TABLE.
Implement multi-value column compression for table columns.
Recognize certain benefits, considerations and limitaions of multi-value

column compression.
Multiple Value Column Compression
The COMPRESS phrase allows values in one or more columns of a permanent table to
be compressed to zero space, thus reducing the physical storage space required for
a table.
The COMPRESS phrase has three variations:
Compression Option
What Happens
COMPRESS
Nulls are compressed.
COMPRESS NULL
Nulls are compressed.
COMPRESS
<constant>,<constant>,..
Nulls and the specified <constant>

value(s) are compressed.
Note: COMPRESS & COMPRESS NULL mean the same thing.

The last of these three options allows up to 255 values to be compressed for a
single column.
Example 1
An example of single-value column compression:
CREATE TABLE bank_account_data

(customer_id INTEGER
,account_type CHAR(10) COMPRESS 'SAVINGS');
Things to notice in this example:
Both nulls and the string 'Savings' will be compressed when the row is
stored.
The value of 'Savings' is written in the table header for table

bank_account_data on each AMP in the system.
Example 2
An example of multiple-value column compression:
CREATE TABLE bank_account_data
(customer_id INTEGER
,account_type CHAR(10) COMPRESS ('SAVINGS','CHECKING','CD','MUTUAL
FUND'));
Things to notice in this example:
Each of the four specified strings are now compressed when the row is
stored.
Nulls are also compressed.
Each of these values is written to the table header on each AMP.
Zero space is taken in the physical row for each of these values.
This can significantly reduce the amount of space needed to contain the
table.
Compression has two primary benefits:
It reduces system storage costs.
It enhances system performance.
Impact Of Multiple Value Column Compression
System Storage Costs:

Compression reduces storage costs by storing more 'logical' data using fewer
'physical' resources. In general, compression causes physical rows to be smaller,
consequently permitting more rows per data block and thus fewer data blocks for
the table.
The amount of storage reduction is a function of the following factors:
The number of values compressed.
The size of the values compressed.
The percentage of rows in the table with these values.
System Performance:
System performance can be expected to improve as a result of value compression.
Because each data block can hold more rows, fewer blocks need to be read to
resolve a query, and thus fewer physical I/O's can be expected to take place. Also,
because the data remains compressed while in memory, more rows can be available
in cache for processing per I/O.
Compression Transparency:
Compression is transparent to all user applications, utilities, ETL tools, ad hoc
queries and views.
Compression Suggestions
Examples of highly compressible values:

Any of the following should be considered candidates for compression, when the
frequency of their occurrence is high:
Nulls
Zeroes
Spaces
Default Values
Suggested Application Columns For Compression:

Any column with a high frequency values, or with a relatively small number of
values should be considered a candidate for compression. The following is a list of
possible candidate columns.
State
City
Country
Automobile Make
Credit Card Type
Account Type
First Name
Last Name
Limitations On Multi-Value Column Compression
Limitations:
A maximum of 255 values may be compressed per column.
The maximum size of a compressed value is 255 bytes.
Only columns with a fixed physical length may be compressed - e.g. CHAR
but not VARCHAR.
Primary index columns cannot be compressed.
The aggregate of all compressed values may not exceed the maximum size
of a table header (64K).
What is Not Compressible

The following data types are not compressible:
INTERVAL
TIME
TIMESTAMP
VARCHAR
VARBYTE
VARGRAPHIC
ALTER TABLE Compression
The SQL ALTER TABLE command supports adding, changing, or deleting column
compression on one or more existing columns of a table, whether the table is
loaded with data or is empty.
History
Traditionally there has been a trade-off between the desire to store more data and
the cost of storing the additional data. The column compression feature of the
Teradata database has always provided an opportunity to reduce the amount of
data to be physically carried in the database by storing frequently repeating values
in a table header rather than repeating them for every row in the table. The column
compression feature thereby helps improve the trade-off between these competing
requirements, making it less expensive to store more data.
Without the ALTER TABLE Compression feature, when compression requirements
on a table change, it would be necessary to recreate the table with the new
compression requirements specified, and then reload the table. The ALTER TABLE
Compression feature makes it easier to add, change or delete compression
requirements from an existing table without the user having to recreate and reload
the table.
Examples
The following are examples of the ALTER TABLE Compression feature. Note,
whenever the COMPRESS attribute is applied to a nullable column, nulls will always
be compressed by default. If other compression values are specified, they will be
compressed in addition to null compression.
There is an implied sequence in the statements that follow:
Example Set 1:
ALTER TABLE Table1 ADD Col1 COMPRESS;
If the column Col1 exists and is nullable, then the column will be a
compressible column with NULL as the compress value.
If the column Col1 does not exist, then the column is added to the table and
the column will be a compressible column with NULL as the compress value.
ALTER TABLE Table1 ADD Col1 COMPRESS NULL;
Same as previous example.
ALTER TABLE Table1 ADD Col1 COMPRESS Savings;
The column will be compressed for nulls and for the constant value Savings.
If the column is already compressed the constant value Savings will replace
the existing compress value or list of values.
ALTER TABLE Table1 ADD Col1 COMPRESS NULL,Savings, Checking;
The column will be compressed on the specified compress list.
If the column is already compressed, the new compress list will replace the
existing compress value or list of values.
Null will be compressed in either case.
Example Set 2:
ALTER TABLE Table1 ADD Col2 COMPRESS 0;
Column will be compressed for one value - zero.
Nulls will also be compressed if the column is nullable (default is nullable).
ALTER TABLE Table1 ADD Col2 COMPRESS (0, 100, 1000);
Add compressed values (100, 1000).
Note value zero must be restated if this follows the previous ALTER
statement.
ALTER TABLE Table1 ADD Col2 COMPRESS (NULL,0,100,1000,10000);
Adds compressed values 10000.
Only 10000 is added to list since NULL, 0, 100 and 1,000 are already
compressed in prior example.
ALTER TABLE Table1 ADD Col2 COMPRESS (NULL, 0, 100);
Reduces the compression list to null and two values.
Values 1,000 and 10,000 are not compressed.
ALTER TABLE Table1 ADD Col2 NO COMPRESS;
All compressed values are now disabled.
Column is now uncompressed.
Compression Optimizations
ALTER TABLE Compression Optimizations

The following listing shows the mechanics of how compression is physically
implemented via the ALTER TABLE command and the optimizations used.
The table will actually be rebuilt at the time of the execution of the ALTER
TABLE statement.
Space overhead requirement for rebuilding a table is around 2 MB. The table
is rebuilt one cylinder at a time, so no matter how big or small a table is, the
overhead remains the same.
A full duplicate copy of the table is never created nor required.
The ALTER process is restartable via checkpoints in the Transient Journal.
No Rollback process is possible.
If a restore of the original table is desirable this can be accomplished by:

- Re-ALTERing the table with the original compression specifications.
- An archive of the original table followed by a restore.
Considerations and Limitations
Limitations
The ALTER TABLE ADD cname syntax allows certain other attributes to be
included in the same statement as the COMPRESS attribute. The following are
exceptions to this rule:
A column CONSTRAINT cannot be defined at the same time.
A COMPRESS modification with a NULL value in the compress list is

not allowed in conjunction with a NOT NULL attribute change.
Altering a non-compressible column to a compressible column is not

allowed if changing the column to an Identity column at the same
time. These changes may be implemented separately.
Additional Considerations
Compressing columns on which secondary indexes are defined is

allowed, unless the index is the either the PK or the FK of a
Referential Constraint.
An Exclusive lock is required on the table being compressed.
Compression Versus VARCHAR

Character data which has a significant length variability can be also stored using a
VARCHAR data type to save space. VARCHAR stores only the actual value with a twobyte length field in the physical row, while omitting trailing blanks. VARCHAR data
types are not eligible for compression because they are not fixed length.
When debating whether VARCHAR or compression is preferable for a character
column, three factors are to be considered:
The average field length of the character data.
The maximum field length of the character data.
The frequency of occurrence of the compressible values.
The following rule dictates which approach should be favored:

Choose VARCHAR - when the difference between maximum and average field
length is high and the frequency of occurrence is low.
Choose Compression - when the difference between maximum and average field
length is low and the frequency of occurrence is high.

Choose VARCHAR - when there is no clear winner between the two. This is
because VARCHAR uses slightly less CPU resource.
Summary
Multiple Value Column Compression

This feature permits up to 255 values to be compressed for a given column. The
benefits of using this features are:
Decreased usage of disk space.
Increase in table query performance.
ALTER TABLE Compression

This feature of the SQL ALTER TABLE command supports adding, changing, or
deleting column compression on one or more existing columns of a table,
regardless of whether the table is loaded with data or is empty. When ALTER TABLE
is used to change or initiate column compression, the table will be rebuilt internally.
Lab
Try It!
Click on the answer button to the left to see the answers.
1.) Display the table definition for the Customer_Service.accounts table.

2.) How many distinct values of the column 'city' are found in the accounts table
and how many occurrences are there for each value?
3.) How much space is taken by this table currently?
Answers:
Lab 1
Lab 2
Lab 3
Lab 4
Lab 5
Lab 6
Lab 7
4.) Create the table Accounts in your database with compression

specified for the following 'city' values:
'Culver City', 'Hermosa Beach', 'Los
Angeles', 'Santa Monica'
5.) Populate your table with the rows from
Customer_Service.accounts
6.) See how much space your table requires compared to the uncompressed version
in the Customer_Service database, as seen in lab #3.
7.) How many distinct cities, states and zip codes are contained in the accounts
table. Use a single query to answer this question.
7.) Join Indexes and NUSI's

Objectives
Describe the purpose of value-ordered NUSI's and their implementation
Describe the purposes of Join Indexes and their implementation.
Distinguish between single-table and multi-table join indexes.
Add a NUSI to a Join Index.
NUSI Review
Non-Unique Secondary Indexes (NUSI's) are a Teradata index feature which permits
defining non-primary indexes on non-unique columns. Typically, this is done to
improve performance on queries which use the column or columns in the WHERE
clause selection criteria. NUSI's may be created either as a part of the CREATE
TABLE syntax, or they may be created after table creation using CREATE INDEX
syntax. NUSI's may be easily dropped when their presence is no longer needed by
using the DROP INDEX syntax.
Alternative 1 CREATE TABLE Syntax

Create an 'employee' table with a NUSI on the job code.
CREATE SET TABLE employee ,FALLBACK ,

(
employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
last_name CHAR(20) NOT NULL,
first_name VARCHAR(30) NOT NULL,
hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code);
Alternative 2 CREATE INDEX Syntax

Create a NUSI on the job code column for existing 'employee' table.
CREATE INDEX (job_code) ON employee;
Example DROP INDEX Syntax

Drop the NUSI on the job code column of the 'employee' table.
DROP INDEX (job_code) ON employee;

Upon creation of a NUSI, a subtable is built on each AMP. The subtable contains a
row for each NUSI value to be found on this AMP and the row-ids of the associated
base table rows which are co-located on the AMP.

Rows are sequenced in the subtable based on the hash of the NUSI value. While
this is convenient for finding all rows with a particular NUSI value, it is less useful
for doing 'range' searches. For example, the index created here would be useful in
finding all employee rows with a job code of 122100, but less useful in locating all
employee rows whose job code is between 122000 and 123000.
Value Ordered NUSIs
Value Ordered NUSI's allow NUSI subtable rows to be sorted based on a data
value, rather than on a hash of the value. This is extremely useful for range
processing where a sequence of values between an upper and lower limit is desired.

Create an 'employee' table with a value-ordered NUSI on the job code.
CREATE SET TABLE employee ,FALLBACK ,

(
employee_number INTEGER,
job_code INTEGER,
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code) ORDER BY VALUES (job_code);

Create a value-ordered NUSI on the job code column of existing 'employee' table.
CREATE INDEX (job_code) ORDER BY VALUES (job_code) ON employee;

The optimizer may now choose this index to do range searches on job codes.
The NUSI's are automatically maintained by the Teradata database, that is, when a
base table row changes values, any corresponding values in the NUSI subtable are
also changed. It is never necessary to do anything to maintain a secondary index.
You can only create it, drop it and collect statistics on it.
Limitations of Value-Ordered NUSI's
A column defined as a value-ordered index column must be:
A single column
A column which is a part of or all of the index definition
A numeric column non-numerics are not allowed
No greater than four bytes in length INT, SMALLINT, BYTEINT, DATE,

DEC are valid
Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any precision digits.
Index Covering
If a query references only those columns that are contained within a given index,
the index is said to "cover" the query. In these cases, it is often more efficient for
the optimizer to access only the index subtable and avoid accessing the base table
rows altogether.
Covering will be considered for any query that references only columns defined in a
given NUSI. These columns can be specified anywhere in the query including the:
1. SELECT list
2. WHERE clause
3. aggregate functions
4. GROUP BY
5. expressions
The presence of a WHERE condition on each of the indexed columns is not a
guarantee for using the index to cover the query. The optimizer will consider the
appropriateness and cost of 'covering' versus other alternative access paths and
choose the optimal plan.
The potential performance gains from index covering require no user intervention
and will be transparent except for the improved access time. The use of the NUSI
can be validated by reviewing the execution plan returned by EXPLAIN.
Join Index
The Join Index feature provides indexing techniques that can improve the
performance of certain types of queries. The Join Index is a physical structure,
populated with rows that contain columns from one or more tables. Once created, it
becomes an option available to the optimizer but is never directly accessed by the
user.
Its purpose is to aid in the joining of tables by providing needed data from an index
rather than having to access the base rows of the table. By using a join index the
optimizer may be able to avoid having to access or redistribute many individual
tables and their base rows.
The Join Index supports syntax for the following types of indexes:
1. Multiple-table Join Index - Used to pre-join multiple tables
2. Single-table Join Index - Used to rehash and redistribute the rows of a
single table based on a specified column or columns
3. Aggregate Join Index - Used to create an aggregate index to be used as a
summary table
In this section we will be discussing the first two only. The third item, aggregate
indexes, are discussed in a separate module of this training.
Multiple-Table Join Indexes
Multiple-table Join Indexes are used to pre-join two or more tables. Consider the
following tables which are in the 'Student' database.
CREATE TABLE customers

( cust_id INTEGER NOT NULL,
cust_name CHAR(15),
cust_addr CHAR(25) )UNIQUE PRIMARY INDEX ( cust_id );
CREATE TABLE orders
( order_id INTEGER NOT NULL,
order_date DATE FORMAT 'yyyy-mm-dd',
cust_id INTEGER,
order_status CHAR(1)) UNIQUE PRIMARY INDEX ( order_id );
The relationship of these table is demonstrated by the following diagram:
There are 49 orders with valid customers.

There is 1 order that has an invalid customer.
There is 1 valid customer who has no orders.
Query 1 Without Join Index

How many orders have assigned customers?
SELECT COUNT(order_id)
FROM orders
WHERE cust_id IS NOT NULL;
Count(order_id)
--------------50
A join index will not help this query. The 'order' table covers the query.
Query 2 Without Join Index

How many orders have assigned valid customers?
SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;
Count(order_id)
--------------49
A join index can help this query. Two tables are needed to cover the query.
Creating A Join Index
The following shows the creation of a join index which will improve the performance
of any joins it can cover.
CREATE JOIN INDEX cust_ord_ix

AS SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
The join index is comprised of a 'fixed portion' (first parenthesis) and a 'repeatable
portion' (second parenthesis). This represents a denormalization of the data and
logically looks like the following:
CUST_ID
1001
CUST_NAME
ABC Corp
ORDER_ID ORDER STATUS

501
C
ORDER_DATE
990120
502
990220
1002
BCD Corp
503
990320
504
990420
505
990520
506
990620
507
990122
508
990222
509
990322
Now, let's revisit the same query again.
Query 2 With Join Index

Count(order_id)
--------------49
The join index helps this query because it covers the query and
therefore the
result may be generated without ever accessing the rows of the base
table.
Compare the costs as shown by EXPLAIN.
Without Join Index

With Join Index
.39 secs.
.17 secs.
This represents a 50% decrease in query time because of the join

index.
The join index is automatically maintained by the Teradata RDBMS, that

is,
when a base table row changes values, any corresponding values in the
join index
are also changed.
It is never necessary to do anything to maintain a
join
index.
You can only create it and drop it.
The join index seen above may also be used to cover other queries.
(Repeated For Convenience)

CUST_ID
1001
1002
CUST_NAME
ABC Corp
BCD Corp
ORDER_ID ORDER STATUS

501
C
ORDER_DATE
990120
502
990220
503
990320
504
990420
505
990520
506
990620
507
990122
508
990222
509
990322
Query 3 With Join Index
How many valid customers have assigned orders in January 1999?
SELECT COUNT(c.cust_id)
WHERE o.order_date BETWEEN 990101 AND 990131;
Count(cust_id)
--------------9
Because the 'order_date' column is included in the join index, once

again
this query is covered by it.
Compare the costs as shown by EXPLAIN.
Without Join Index .40 secs.

With Join Index
.17 secs.
This represents a greater than 50% decrease in query time because of

the join index.
Assigning the Primary Index For Join Indexes
Join Indexes are always assigned a primary index in order to hash distribute the
index rows across the AMPs. In the example created here, the Primary Index of the
Join Index is the column Cust_id.
We can explicitly specify the primary index for the Join Index or allow it to default
to the first column specified. In our upcoming look at Single-Table Join Indexes, we
will see the usefullness of being able to choose and specify a primary index on a
Join Index.
CREATE JOIN INDEX cust_ord_ix AS

SELECT (c.cust_id, cust_name)
Value Ordering Join Index Rows

Optionally, you can specify the sequencing of the join index rows on each AMP.
Normally, the rows will be sequenced on each AMP by the hash value of the primary
index for the Join Index. Because this default sequencing limits the efficiency for
doing 'range' processing, an ORDER BY clause is available which allows this default
sequencing to be overridden.

SELECT (c.cust_id, cust_name)
ORDER BY c.cust_id
PRIMARY INDEX (cust_id)
Query 3 With Join Index ORDERed BY cust_id
How many distinct valid customers with customer ids between 1001 and 1005 have
assigned orders?
SELECT COUNT(DISTINCT(c.cust_id))
WHERE c.cust_id BETWEEN 1001 AND 1005;
Count(Distinct(cust_id))
-----------------------5
Because this query accesses a range of customer ids, the optimizer can access the
rows of the Join Index more efficiently because the qualifying rows are already
sequenced by cust_id and thus easily located.
The rules for the ORDER BY column are the same as for Value-Ordered NUSI's.
The ORDER BY column must be:
A single column
A column which is a part of or all of the fixed-portion index definition
A numeric column non-numerics are not allowed
No greater than four bytes in length INT, SMALLINT, BYTEINT, DATE, DEC are
valid
Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any precision digits.
NUSIs On Join Indexes
A NUSI may be created on a join index and may be used to improve access to the
join index rows. In the example just seen, we ordered the rows of the join index by
'cust_id' in order to facilitate 'range' processing on customer numbers.
Because the rows of the join index can only be sequenced by one column, we need
to use another technique to facilitate 'range' processing for the order date.
We can solve this problem by adding a NUSI on the join index and value ordering it
on the order date. NUSI's on join indexes can be built as part of the CREATE JOIN
INDEX statement, or they can be added after join index creation using the CREATE
INDEX statement.

Create the same join index and also create a NUSI on the Join Index for the
'order_date' column. Value order the NUSI on the 'order_date' column.
SELECT (c.cust_id, cust_name),(order_id, order_status, order_date)

ORDER BY c.cust_id /* This ORDER BY controls how the rows of the
Join Index will be sorted on the AMPs */
PRIMARY INDEX (cust_id)
INDEX (order_date) ORDER BY (order_date);
/* This ORDER BY controls how the rows of the
NUSI will be sorted on the AMPs */ ;

Create a NUSI on the existing join index 'cust_ord_ix'.
CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON

cust_ord_ix;
Note: The keyword VALUES is optional.
Single-Table Join Indexes
Single-Table Join Indexes are created to rehash and redistribute the rows of a
table by a column other than the Primary Index column. The redistributed index
table may be a subset of the columns (vertical subset) of the base table. It can
significantly reduce the costs associated with doing a table redistribution for join
processing.
In building join plans for two tables, the optimizer must first decide how to insure
that all joinable rows are co-located on the same AMP. If both tables are being
joined on their respective Primary Index columns, the joinable rows are already colocated on the same AMP, thus no redistribution of data is needed. If either table is
not using its Primary Index columns as the join column(s), then a redistribution
must occur.
Single-Table Join Indexes provide the ability to 'pre-distribute' the rows of a
table based on the hash of the join value. This will eliminate the need for the
optimizer to require a redistribution to perform the join - it can take advantage of
the already distributed rows of the Single-Table Join Index.
Consider the two tables 'employee' and 'department'.
CREATE SET TABLE employee ,FALLBACK

( employee_number INTEGER,
job_code INTEGER,
UNIQUE PRIMARY INDEX ( employee_number );
CREATE TABLE department, FALLBACK

(department_number SMALLINT
,department_name CHAR(30) NOT NULL
,budget_amount DECIMAL(10,2)
,manager_employee_number INTEGER )
UNIQUE PRIMARY INDEX (department_number);
Assume we would like to perform the following query.
Query 4
Select all employee numbers, their department number and department name.
SELECT e.employee_number
,d.department_number
,d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Joining these two tables on the 'department_number' column might require a
redistribution of the rows of the employee table. The department table is already
distributed based on the PI column 'department_number', but the employee table is
distributed on the PI column 'employee_number'. Depending on the size of the
tables, the redistribution can become a costly operation.
One possible technique to expedite this join would be to create a Single-Table Join
Index on the employee table as follows:
CREATE JOIN INDEX emp_deptno

AS SELECT employee_number, department_number
FROM employee
PRIMARY INDEX (department_number);
Executing Query 4 again would give the optimizer the opportunity to use the join
index and thus enable it to avoid the cost of redistributing the rows of the employee
table. If we EXPLAIN the query, we can see that the join index was indeed used.
Query 4 (Explained)
Select all employee number, their department number and department name.
EXPLAIN SELECT e.employee_number

,d.department_number
,d.department_name
(Partial Listing)
4. We do an all-AMPs JOIN step from PED.d by way of a RowHash match scan
with no residual conditions, which is joined to PED.emp_deptno. PED.d and
PED.emp_deptno are joined using a merge join, with a join condition of
("PED.emp_deptno.department_number = PED.d.department_number").
The result goes into Spool 1, which is built locally on the AMPs.' with low
confidence to be 24 rows. The estimated time for this step is 0.18 seconds.
Summary
The following are index options available for query performance enhancement which
we have seen in this module.
Hash-ordered NUSI's traditional NUSI's
Value-ordered NUSI's to facilitate range seaches on the index value
Single-table join indexes to pre-hash-distribute rows of one table to colocate with joinable rows
Multi-table join indexes to pre-join existing table rows from multiple

tables
Join indexes may be further enhanced by applying the following features to them:
Define the PI of the join index to distribute the join index rows most
effectively
Define the ordering of the join index rows to sequence the join index rows
Define a NUSI on the join index to access rows in the join index more
effectively
(Multi-table join index only)
Define the ordering of the NUSI rows to sequence the NUSI rows, either
hash or value-based
(Multi-table join index only)
Lab
Try It!
To start the online labs, click on the Telnet button in the lower left
hand screen of the course. Two windows will pop-up: a BTEQ
Instruction Screen and your Telnet Window. Sometimes the BTEQ
Instructions get hidden behind the Telnet Window. You will need
these instructions to log on to Teradata.
your own user id (i.e. DATABASE tdxxx;).
Answers:
Lab A.1
Lab A.2
Lab A.3
Lab B.1
Lab B.2
Lab B.3
Lab B.4
Lab C.1
Lab C.2
A.1 Make copies in your own database of the city and state tables
which are located in the Student database. You may accomplish this
by using the following SQL:
CREATE TABLE state AS Student.state WITH DATA ;
CREATE TABLE
city AS Student.city WITH DATA
Look at the table definitions for the City table and the State table.
Construct a query which returns the following information by inner
joining these two tables. Order the results by city population within
state population.
City Name
...
City Population
...
State Name
...
State Population
...
A.2 Create a Join Index in your own database called citystateidx. The
fixed potion of the index should contain the state name and the state
population. The variable portion should contain the city name and
the city population.
A.3 Now rerun the query to see if you get the same result. If you do,
EXPLAIN the query to see if your Join Index was used.
B.1 Drop the Join Index citystateidx.
B.2 Modify the query to only show states whose population is
between one and three million. Run the query.
B.3 Recreate the Join Index, however this time insure that the index
rows will be sorted by the state population column.
B.4 Rerun the query to insure the same results, then EXPLAIN the
query to determine if the Join Index was used.
C.1 Add a NUSI on the Join Index on the city population column.
Value order the NUSI by city population.
C.2 Rerun the query to insure the same results, then EXPLAIN the
query to determine if the NUSI was used.
8.) Aggregate Join Indexes
Objectives
Upon completion of this module, you should be able to:
Create an aggregate join index.
Determine when an aggregate join index is advantageous.
Determine when an aggregate join index is being used by the optimizer.
Join Index Review
A Join Index is an optional index which may be created by the user for one of the
following three purposes:
Pre-join multiple tables
Distribute the rows of a single table on the hash value of a foreign key value
Aggregate one or more columns of a single table or multiple tables into a

summary table
The first two listed purposes are covered in an earlier module of this training
program.
In this module, we will concentrate on the last of the three purposes aggregating
columns into a join index that the optimizer may choose to use as a summary table.
Why An Aggregate Index
Summary Tables
Queries which involve counts, sums, or averages over large tables require
processing to perform the needed aggregations. If the tables are large, query
performance may be affected by the cost of performing the aggregations.
Traditionally, when these queries are run frequently, users have built summary
tables to expedite their performance. While summary tables do help query
performance there are disadvantages associated with them as well.
Summary Tables Limitations
Require the creation of a separate table
Require initial population of the table
Require refresh of changing data, either via update or reload
Require queries to be coded to access summary tables, not the base tables
Allow for multiple versions of the truth when the summary tables are not upto-date
Aggregate Indexes
Aggregate indexes provide a solution that enhances the performance of the query
while reducing the requirements placed on the user. All of the above listed
limitations are overcome with their use.
An aggregate index is created similarly to a join index with the difference that
sums, counts and date extracts may be used in the definition. A denormalized
summary table is internally created and populated as a result of creation. The index
can never be accessed directly by the user. It is available only to the optimizer as a
tool in its query planning.
Aggregate indexes do not require any user maintenance. When underlying base
table data is updated, the aggregate index totals are adjusted to reflect the
changes. While this requires additional processing overhead when a base table is
changed, it guarantees that the user will have up-to-date information in the index.
Aggregate Index Properties
Aggregate Indexes are similar to other Join Indexes in that they are:
Automatically kept up to date without user involvement.
Never accessed directly by the user.
Optional and provide an additional choice for the optimizer.
Multiload and Fastload may not be used to load tables for which join indexes
are defined.
Aggregate Indexes are different from other Join Indexes in that they:
Use the SUM and COUNT functions.
Permit use of EXTRACT YEAR and EXTRACT MONTH from dates.
You must have one of the following two privileges to create any join index:
DROP TABLE rights on each of the base tables (or the containing database)
or,
INDEX privilege on each of the base tables
Additionally, you must have this privilege:
CREATE TABLE on the database or user which will own the join index
The following table will be used in the subsequent examples:
CREATE SET TABLE PED.daily_sales ,NO FALLBACK ,

NO BEFORE JOURNAL,
NO AFTER JOURNAL
(itemid INTEGER
,salesdate DATE FORMAT 'YY/MM/DD'
,sales DECIMAL(9,2))
PRIMARY INDEX ( itemid );
Without An Aggregate Index
Consider the following problem, without the use of an aggregate index.

Example
SELECT EXTRACT(YEAR FROM salesdate)AS Yr

, EXTRACT(MONTH FROM salesdate)AS Mon
, SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;
Yr
Mon Sum(sales)
----------- ----------- ----------2009
1
2150.00
2009
2
1950.00
2009
8
1950.00
2009
9
2100.00
2010
1
1950.00
2010
2
2100.00
2010
8
2200.00
2010
9
2550.00
EXPLAINing Without An Aggregate Index
Explaining the previous query shows us that this is a primary index access against
the 'daily_sales_2004' table. (Note that because the cost of aggregation is not
calculated, no final cost for the query is generated.)
EXPLAIN SELECT EXTRACT(YEAR FROM salesdate)AS Yr

, SUM(sales)
GROUP BY 1,2
ORDER BY 1,2;
Explanation
1. First, we do a SUM step to aggregate from PED1.daily_sales_2010 by way

of the primary index "PED1.daily_sales_2010.itemid = 10" with a residual
condition of ("((EXTRACT(YEAR FROM
(PED1.daily_sales_2010.salesdate )))= 2009) OR ((EXTRACT(YEAR FROM
(PED1.daily_sales_2010.salesdate )))= 2010)"), and the grouping identifier
in field 1. Aggregate Intermediate Results are computed locally, then placed
in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1
rows.
2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of
the primary index "PED1.daily_sales_2010.itemid = 10" into Spool 1, which
is built locally on that AMP. Then we do a SORT to order Spool 1 by the sort
key in spool field1. The size of Spool 1 is estimated with high confidence to
be 1 row. The estimated time for this step is 0.17 seconds.
3. Finally, we send out an END TRANSACTION step to all AMPs involved in
processing the request. -> The contents of Spool 1 are sent back to the user
as the result of statement 1.
Creating An Aggregate Index
Creating a join index gives the optimizer the option of using the 'pre-aggregated'
information kept in the index, thus avoiding the need to generate a separate
aggregation step.
CREATE JOIN INDEX monthly_sales AS

SELECT itemid AS Item
,EXTRACT(YEAR FROM salesdate) AS Yr
,EXTRACT(MONTH FROM salesdate) AS Mon
,SUM(sales) AS SumSales
GROUP BY 1,2,3;
If we run the exact query as previously, with no changes, we will get the same
result as before, however it will take advantage of the aggregate index.
SELECT EXTRACT(YEAR FROM salesdate)AS Yr

, SUM(sales)
WHERE itemid = 10 AND Yr IN ('2009','2010')
GROUP BY 1,2
ORDER BY 1,2;
Yr
Mon
Sum(sales)
--- -----------2009
1
2150.00
2009
2
1950.00
2009
8
1950.00
2009
9
2100.00
2010
1
1950.00
2010
2
2100.00
2010
8
2200.00
2010
9
2550.00
EXPLAINing The Use Of Aggregate Index
Explaining the previous query shows us that this time the aggregate index is
employed.
EXPLAIN SELECT EXTRACT(YEAR FROM salesdate)AS Yr

, SUM(sales)
GROUP BY 1,2
ORDER BY 1,2;
Explanation
1. First, we do a SUM step to aggregate from join index table

PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in field 1.
Aggregate Intermediate Results are computed locally, then placed in Spool
2. The size of Spool 2 is estimated with low confidence to be 4 to 4 rows.
2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of
the primary index "PED1.monthly_sales.Item = 10" into Spool 1, which is
built locally on that AMP. Then we do a SORT to order Spool 1 by the sort
key in spool field1. The size of Spool 1 is estimated with low confidence to
be 4 rows. The estimated time for this step is 0.17 seconds.
3. Finally, we send out an END TRANSACTION step to all AMPs involved in
processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.
Because the aggregations are already calculated and available in the index, the
costs associated with step one are reduced. The cost of step two is unchanged
(0.17).
Because aggregation costs are not currently carried in EXPLAIN text, the savings in
processing time for step one are not shown, however the response time reduction
for the user can and should be substantial.
SHOWing The Aggregate Index
Join index definitions may be seen using the SHOW JOIN INDEX construct.
Example
Show the aggregate index named monthly_sales;
SHOW JOIN INDEX monthly_sales;

CREATE JOIN INDEX PED1.monthly_sales ,NO FALLBACK AS
SELECT COUNT(*)(FLOAT, NAMED CountStar )
,PED1.daily_sales_2010.itemid
(NAMED Item )
,EXTRACT(YEAR FROM (PED1.daily_sales_2010.salesdate))
(NAMED Yr )
,EXTRACT(MONTH FROM (PED1.daily_sales_2010.salesdate))
(NAMED Mon )
,SUM(PED1.daily_sales_2010.sales )(NAMED SumSales )
FROM PED1.daily_sales_2010
GROUP BY 2,3,4
PRIMARY INDEX ( Item );
In showing an index definition, some changes should be noted:
All column names are fully qualified to the database level.
A count of rows is automatically added if not specified in the definition, thus

supporting count aggregations.
This can be seen in the SHOW result - (COUNT(*) NAMED CountStar).
If both COUNT and SUM are in the index, AVERAGE calculations may also
make use of the index.
Covering The Query - Example 1
Ultimately, just as with any join index, it is the optimizer's choice whether or not
the index is useful for a specific query. The index created previously is repeated
here for convenience.
CREATE JOIN INDEX monthly_sales AS

SELECT itemid AS Item
,EXTRACT(YEAR FROM salesdate) AS Yr
,EXTRACT(MONTH FROM salesdate) AS Mon
,SUM(sales) AS SumSales
GROUP BY 1,2,3;
Any query against the 'daily_sales_2010' table which requests an aggregation in
any of the following formats, will use the 'monthly_sales' index.
Sum of sales by year
Sum of sales by month
Sum of sales by month within year
Grand total sum of sales
An index is said to 'cover' the query (or cover part of the query) if the optimizer can
generate the query results using the index as a replacement for one or more of the
specified tables.
Example
Show the grand total sales for item 10 as contained in the daily_sales_2010 table.
SELECT itemid
,SUM(sales)
WHERE itemid = 10
GROUP BY 1;
itemid
Sum(sales)
---------------10
16950.00
EXPLAIN SELECT itemid
,SUM(sales)
WHERE itemid = 10
GROUP BY 1;
Explanation (Partial)
1. First, we do a SUM step to aggregate from join index table PED1.monthly_sales

by way of the primary index "PED1.monthly_sales.Item = 10" with no residual
conditions, and the grouping identifier in field 1. Aggregate Intermediate Results
are computed locally, then placed in Spool 2. The size of Spool 2 is estimated
with high confidence to be 1 to 1 rows.
Note: This is an example of the index 'monthly_sales' covering the query.
Covering The Query - Example 2
Show the total sales for item 10 for 2010.
SELECT itemid
,SUM(sales)
WHERE itemid = 10 AND EXTRACT (YEAR FROM salesdate) = '2010'
GROUP BY 1;
itemid Sum(sales)
------- --------10
8800.00
EXPLAIN SELECT itemid
,SUM(sales)
WHERE itemid = 10 AND EXTRACT (YEAR FROM salesdate) = '2010'
GROUP BY 1;
Explanation (Partial)
1. First, we do a SUM step to aggregate from join index table

PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in field 1.
Aggregate Intermediate Results are computed locally, then placed in Spool
2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.
Aggregate Indexes With Functions
Aggregate indexes are not used in conjunction with queries using SUM Window,
COUNT Window, WITH or WITH BY functions. Because these functions must process
and display all qualifying detail rows, the value of the aggregate index is reduced.
Explaining any query using these functions will validate that the index is not used.
Lab
Try It!

Answers:
Lab 1
Lab 2
Lab 3
Lab 4
Lab 5
1.) Prior to doing this lab exercise, it will be necessary to recreate a

copy of the employee table in your user space. Accomplish this with
the following commands:
DATABASE tdxxx;
CREATE TABLE employee AS Customer_Service.employee WITH
DATA;
Create an aggregate index called 'dept_sals' which sums all of the

salaries of the employees in the employee table by department.
2.) Write a query which shows each department number and the sum
of salaries for that department. Order results by department number.
3.) Explain the query in lab #2. Does it use the aggregate index?
4.) Modify the query in lab #2 and add a column which shows the
average salary in each department. Call this column 'Avg_Sal'
5.) Explain this query and note if the aggregate index was used.
9.) Hash Indexes
Objectives
After completing this module, you should be to:
Recognize the performance advantage of using hash indexes.
Create and implement hash indexes.
Determine when hash indexes are used by a query.
Indexes Revisited
In the next few pages, we will be looking at Hash Indexes and their properties.
Because they share in common many attributes of secondary indexes and join
indexes, let's first review the basics of secondary indexes and join indexes.
Secondary Indexes
Secondary indexes are defined to provide alternate access pathways to the base
rows of a single table. Users may define secondary indexes, but they cannot be
accessed directly by the user, nor can the user affect how the index rows are
distributed. Their use or non-use is a option to the optimizer in its query planning.
The following are properties of secondary indexes:
They contain pointers to the base rows of the table
Are always defined on a single table
Can 'cover' certain queries, but their primary purpose is locating base rows
Secondary Indexes exist in two formats:
Unique Secondary Index (USI) - there is a one-to-one relationship

between the index rows and the base table rows.
Non-Unique Secondary Index (NUSI) - there is a one-to-many

relationship between the index rows and the base table rows. NUSI rows
may be either 'hash' or 'value' ordered.
Join Indexes
Join indexes are defined to reduce the number of rows processed in generating
result sets from certain types of queries, especially joins. Like secondary indexes,
users may not directly access join indexes. They are an option available to the
optimizer in query planning. The following are properties of join indexes:
Are used to replicate and 'pre-join' information from several tables into a
single structure.
Are designed to cover queries, reducing or eliminating the need for access to
the base table rows.
Usually do not contain pointers to base table rows (unless user defined to do
so).
Are distributed based on the user choice of a Primary Index on the Join
Index.
Permit Secondary Indexes to be defined on the Join Index (except for Single
Table Join Indexes), with either 'hash' or 'value' ordering.
Join Indexes exist in three general formats:
Single Table Join Index (STJI) o
Defined on a single table, usually for the purpose of redistributing the

table rows based on the hash value of a foreign key column (or
columns).
Facilitates the ability to join the foreign key table with the primary
key table.
(Multi-Table) Join Indexes (JI) o
A join index which contains 'pre-joined' data from two or more tables.
Facilitates join operations by reducing or eliminating the need to

redistribute and join base table rows.
Aggregate Join Index (AJI) o
A join index which contains an aggregation operator such as COUNT or

SUM.
Facilitates aggregation queries wherein the pre-aggregated values

contained in the AJI may be used instead of relying on base table
calculations
Hash Index Definition
Hash Indexes are database objects that are user-defined for the purpose of
improving query performance. They are file structures which contain properties of
both secondary indexes and join indexes.
Like Secondary Indexes

Hash Indexes are similar to secondary indexes in the following ways:
They are created for a single table only.
They contain information which allows access to base table rows.
The CREATE syntax is very similar to a secondary index.
They may sometimes cover a query without use of the base table rows.
Like Join Indexes

Hash Indexes are similar to join indexes in the following ways:
They 'pre-locate' joinable rows to a common location.
The distribution and sequencing of the rows are user specified.
They are very similar to single-table join indexes (STJI), however with added
functionality.
Unlike Join Indexes

Hash Indexes are unike join indexes in the following ways:
No aggregation operators are permitted.
They are always defined on a single table.
Automatically contains base table PI value as part of the hash index subtable
row.
Contains additional information needed to locate the base table row (e.g.
uniqueness value).
No secondary indexes may be built on the hash index.
Note:
All indexes, whether secondary, join or hash, are automatically updated by

the system when the underlying table rows are changed.
Hash Index Examples
Creating Hash Indexes

Example 1
Consider the following Hash Index definition:
CREATE HASH INDEX hash_1

(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);
This index is built for the table 'emp1' which is defined as follows:
CREATE SET TABLE emp1

(employee_number
INTEGER
, manager_employee_number INTEGER
, department_number
INTEGER
, job_code
INTEGER
, last_name
CHAR(20) NOT NULL
, first_name VARCHAR(30) NOT NULL
, hire_date
DATE NOT NULL
, birthdate
DATE NOT NULL
, salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );
Points to consider about this hash index definition:
Each hash index row contains the employee number, the department
number.
Specifying the employee number is unnecessary, since it is the primary

index of the base table and will therefore be automatically included.
The BY clause indicates that the rows of this index will be distributed by the
employee_number hash value.
The ORDER BY clause indicates that the index rows will be ordered on each
AMP in sequence by the employee_number hash value.
Example 2
The same hash index definition could have been abbreviated as follows:

(employee_number, department_number) ON emp1;
This is essentially the same definition because of the defaults for hash indexes.
The BY
The ORDER BY clause defaults to the order of the base table rows.
clause defaults to the primary index of the base table.
Hash Index Definition Rules

There are two key rules which govern the use of the BY and ORDER BY clauses:
The column(s) specified in the BY clause must be a subset of the columns which
make up the hash index.
When the BY clause is specified, the ORDER BY clause must also be specified.
Covered Query
The following is an examply of a simple query which is covered by this index:
SELECT employee_number, department_number FROM emp1;
Normally, this query would result in a full table scan of the employee table. With the
existence of the hash index, the optimizer can pick a less costly approach, namely
retrieve the necessary information directly from the index rather than accessing the
lengthier (and costlier) base rows.
Consider the explain of this query:
EXPLAIN SELECT employee_number, department_number FROM emp1;
1) First, we lock a distinct TD000."pseudo table" for read on a
RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an allrows scan with no residual conditions into Spool 1, which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 8 rows. The estimated time for this step is 0.15
seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request. -> The contents of Spool 1 are sent back to
the user as the result of statement 1. The total estimated time is
0.15 seconds.
Example 3
The following is an alternate definition of the hash index 'hash_1'.
BY (employee_number)
ORDER BY VALUES(employee_number);
Points to consider about this hash index definition:
This definition produces the same hash index, however the index rows are
ordered based on employee_number value rather than the hash value.
This might be more useful for certain 'range processing' queries.
This definition would be equally helpful in covering the query indicated
previously. The order of index rows would be of no significance.
Hash Indexes and Joins
Creating Hash Indexes For Joins

Example 1
Consider the following Hash Index definition:
BY (department_number)
ORDER BY HASH (department_number);
This hash index is to be used for the purpose of facilitating joins between the
'employee' and 'department' tables, based on the PK/FK relationship on
'department_number'.
EXPLAIN SELECT employee_number, department_name
FROM emp1 e INNER JOIN dept1 d
1) First, we lock a distinct TD000."pseudo table" for read on a
RowHash to prevent global deadlock for TD000.hash_2.
2) Next, we lock a distinct TD000."pseudo table" for read on a RowHash
to prevent global deadlock for TD000.d. 3) We lock TD000.hash_2 for
read, and we lock TD000.d for read.
4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash
match scan. with no residual conditions, which is joined to TD000.d.
TD000.hash_2 and TD000.d are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 1, which is
built locally on the AMPs. The size of Spool 1 is estimated with low
seconds.
0.18 seconds.
Points to consider about the effect of the hash index on this join plan:
No redistribution of emp1 rows is needed.
No sort of emp1 rows is needed.
The merge join step (#4) is able to proceed directly after lock steps.
In this case, the hash index functions much the same as a single table join
index (STJI).
Hash Indexes and ROWID Pointers
Using the ROWID in Hash Indexes

Because the primary index is automatically carried in a Hash Index (as is the
uniqueness value associated with the base row-id), the system many easily
calculate the row-id of the base row. This permits columns values not explicitly
contained in the hash index definition to be accessed and returned as part of a
covered query.
Example 1
Consider again the following Hash Index definition:
Perform the same join on the two tables, however this time add the column 'jobcode' to the SELECT. Note, this column isn't part of the hash index.
EXPLAIN
, d.department_name
, e.job_code
:
:
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match
scan. with no residual conditions, which is joined to TD000.hash_2.
TD000.d and TD000.hash_2 are joined using a merge join, with a join
duplicated on all AMPs. The size of Spool 2 is estimated with low
seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are
joined using a product join, with a join condition of ( "(Field_2 =
TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID FROM
5 FOR 4 )))"). The result goes into Spool 1, which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to be 8
rows. The estimated time for this step is 4.92 seconds.
5.10 seconds. )
Points to consider about this query plan:
The hash index is used in place of the employee table
No redistribution of rows is needed.
Job-code values are returned by using the ROWID pointer to the base row
(Step #6).
Row hash locks are used to access the base rows of employee table.
This plan assumes that both tables are fairly large.
A similar effect could have been achieved with a single table join index (STJI) by
adding an explicit ROWID to the index definition as follows:
CREATE JOIN INDEX ji_emp AS SELECT employee_number, department_number,
ROWID FROM emp1;
The following page lists advantages of Hash Indexes over STJI's.
Hash Index Advantages and Limitations
Hash Index Advantages

Hash indexes, by comparison, have the following advantages over Single Table Join
Indexes.
They automatically contain the primary index of the base table.
They automatically contain ancillary information (such as uniqueness

number) needed to calculate the row-id of the base table row.
Their syntax is similar to secondary index syntax, thus simpler.
They are automatically compressed for storage.
Hash Index Limitations

The following are limitations of using Hash Indexes:
A total maximum of 32 hash indexes, join indexes and secondary indexes

can be associated with a table.
A hash index can consist of no more than 16 columns.
Hash indexes are not supported with the following Teradata features and
utilities:
o
Multiload
Fastload
Archive/Recovery
Triggers
Permanent Journal
Upsert Processing
Lab
Try It!
Lab 1
1.) Prior to doing this lab exercise, it will be necessary to recreate

two tables in your user space. Accomplish this with the following
commands:
CREATE TABLE loc1 AS
Customer_Service.location WITH DATA;
CREATE TABLE loc_emp1 AS Customer_Service.location_employee WITH
DATA;
1a.) Create a hash index which will facilitate joins between the 'loc1'
and 'loc_emp1' tables. The hash index should:
- Contain the columns 'employee_number' and 'location_number'.

- Be distributed based on the hash of location_number.
- Be ordered based on the hash of location_number.
1b.) Once the hash index is successfully created, execute the
following join. EXPLAIN the join to see if the hash index was used.
SELECT l.location_number
, employee_number
, customer_number
FROM loc1 l INNER JOIN loc_emp1 l_e
ON l.location_number = l_e.location_number
ORDER BY 3;
10.) Materialized Views

Objectives
After completing this module, you should be able to do the following:
Implement a materialized view as a partially covered join index
Implement a materialized view as a sparse join index
Materialized Views and Hash Index Review
What Are Materialized Views?

Materialized views refer to precomputing and maintaining query results in a
database management system. In the Teradata database, materialized view
features are based on a range of capabilities built upon the existing join index
technology. Said differently, Teradata implements materialized views as join
indexes. Before looking at the materialized view features available with Teradata, it
will be helpful to review the concept of a hash index.
Hash Indexes
First, let's review a little bit about hash indexes as seen in a previous section.
Hash indexes, by definition are defined on a single table. This hash index 'hash_2'
can be useful to the optimizer in handling the following query:

SELECT employee_number, department_name
This query is partially covered by the hash index (HI). Partially covered means that
the optimizer cannot resolve the query with the hash index alone. It must bring in
another table - in this case the department table - in order to retrieve the
department_name column. Because the index is ordered on the hash of
department number, it can join the HI directly to the department table based on
same primary indexes (PI)s.
The join step of the EXPLAIN of this query is seen here:
4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash
match scan. with no residual conditions, which is joined to TD000.d.
TD000.hash_2 and TD000.d are joined using a merge join, with a join
built locally on the AMPs. The size of Spool 1 is estimated with low
seconds.
As can be seen, this is simply a join between the HI and the department table. The
optimizer picked this plan because it did not have to prepare either side of the join the rows are already in department number hash sequence for both the HI and the
department table. Accessing the employee table was not necessary to resolve this
query.
Join Backs
Note that the following query, which additionally selects the job_code column, is
also able to use the HI. This is due to the availability of the ROWID which is
implicitly included in all hash indexes. The implicit ROWID allows the optimizer to 'join
back' to the base employee row to pick up additional information (i.e., job_code), not
available in the HI itself.
, d.department_name
, e.job_code
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match
scan. with no residual conditions, which is joined to TD000.hash_2.
TD000.d and TD000.hash_2 are joined using a merge join, with a join
duplicated on all AMPs. The size of Spool 2 is estimated with low
seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are
joined using a product join, with a join condition of ( "(Field_2 =

TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID
FROM 5 FOR 4 )))"). The result goes into Spool 1, which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 8 rows.
The estimated time for this step is 4.92 seconds.
Step 5 joins the HI to the department table, and step 6 uses the implicit ROWID in the HI
to locate the base row for each employee to extract its job code. We consider the HI to
partially cover the query because it still requires information from another table.
A limitation of HI's is that they are by definition single table indexes.
Partial Covering Join Indexes
Multi-Table Partial Covering Join Indexes

We can also create a join index on multiple tables with the same 'partial covering'
capability of hash indexes.
CREATE JOIN INDEX ji_emp_dept1 AS
, d.department_name
Note that the following query is the same query we executed before, however in
this case, the join index completely covers the query.
EXPLAIN SELECT employee_number, department_name
1) First, we lock a distinct SQL00."pseudo table" for read on a
RowHash to prevent global deadlock for SQL00.ji_emp_dept1.
2) Next, we lock SQL00.ji_emp_dept1 for read.
3) We do an all-AMPs RETRIEVE step from SQL00.ji_emp_dept1 by way of
an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1
is estimated with low confidence to be 40 rows. The estimated time for
this step is 0.06 seconds.
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.06 seconds.
Note how simple the EXPLAIN becomes in this case. The query is completely
covered by a scan of the join index. No join is needed to get the answer set. The
real work of getting the answer set all occurs in step 3 of the EXPLAIN text.
Now create a second join index which includes the ROWID for the employee table.
, d.department_name
, e.ROWID
Now in addition to selecting the employee number and department name, let's add
the job code to the select list. This column exists on the employee table and is not
an explicit part of the join index.
SELECT e.employee_number, d.department_name, e.job_code
The optimizer may choose to cover this query using the join index to acquire the
employee number and the department name, and it may use the rowid of the
employee table to 'join back' to the base employee row to acquire the job code.
Thus, it works similarly to a hash index, however the join index has the added
property of being defined on multiple tables. This provides the oppotunity to 'join
back' to either table, assuming both ROWID's are specified in the join index
definition.
It is still considered a 'partial covering' index because it still had to join back to the
employee table to fully resolve the query, but it did not have to scan the entire
employee table.
The join back capability provided by the ROWID syntax is typically chosen by the
optimizer when the number of rows in the table is fairly large. Smaller table joins
may not demonstrate this approach.
Multiple Join Back Capability
Specifying Multiple Rowids In a Join Index

We could also have coded the join index as follows:
, d.department_name
, e.ROWID AS e_rowid
, d.ROWID AS d_rowid
Now we are including the rowid for both tables, thus giving the optimizer the ability
to join back in either direction. Thus, a query like the following may use the join
index to join back to the department table this time, to pick up the department
manager's number.
, d.department_name
, d.manager_employee_number
Note, to use this option, aliases must be assigned to each rowid column selected in
the join index definition. If the columns are not 'renamed' using an alias, the
syntaxer will not allow more than one column named 'ROWID' in the same query.
Sparse Join Indexes
Definition of Sparse Join Index

A Spare Index is a Join Index with a WHERE clause which restricts the participating
rows from the base tables. A Sparse Index can significantly reduce the size of the
join index which must be built and maintained by the system.
A sparse index makes sense when a definable subset of the rows in the underlying
tables are needed to satisfy a large percentage of the queries which will use it.
For example, a join index might be defined on a join of two tables, one a history
table containing 5 years of order data and the other a table providing the customer
name. Since the history table might be large, maintenance of this join index might
be costly. Furthermore, if 90% of the queries are typically requesting information
on the current year, it might make sense to create a sparse index with entries for
that year only. Queries for the remaining four years could access the base tables
since their frequency is much lower.
Example of reasons for creating a Sparse Index, might include:
Tables with lots of nulls which are ignored for purposes of most queries
Tables with frequent access for rows which contain quantities above or below
a certain limit.
Tables which are time oriented and where the most frequent accesses are for
current information.
Example
Create a non-sparse join index between the customers and orders tables.
CREATE JOIN INDEX cust_ord_ix

Count(order_id)
--------------86
This query represents a simple join of the two tables to produce an aggregation.
The defined join index is able to resolve this query as seen in the extract of the
EXPLAIN seen here.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results are computed
globally, then placed in Spool 4. The size of Spool
4 is estimated with high confidence to be 1 row. The estimated time for this
step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an allrows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with high confidence to be 1
row. The estimated time for this step is 0.03
seconds.
Now, let's try another query, this time restricting the time interval.
How many valid customers have assigned orders in January 2002?
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-0131');
Once again, we see that the join index is able to cover the query. The number of
rows participating is however, expected to be much fewer.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with a condition of (
"(SQL00.cust_ord_ix.order_date <= DATE '2009-01-31') AND
(SQL00.cust_ord_ix.order_date >= DATE '2009-01-01')"), and the
grouping identifier in field 1. Aggregate Intermediate Results are computed
globally, then placed in Spool 4. The size of Spool
4 is estimated with high confidence to be 1 row. The estimated time for this
step is 0.06 seconds.
Since we anticipate that most of the queries against this table will involve rows
from the year 2009, we may wish to create a sparse index with only those rows
represented in the index.
Creating a Sparse Join Index
The following creates a new sparse join index which only includes rows for the year
2009.
CREATE JOIN INDEX cust_ord_ix_2009

WHERE EXTRACT (YEAR FROM order_date) = '2009'
Now, we can run the same query again and see if the optimizer uses the sparse
index.
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-0131');
Count(order_id)
--------------86
3) We do an all-AMPs SUM step to aggregate from
SQL00.cust_ord_ix_2009 by way of an all-rows scan with a condition
of ("(((EXTRACT(YEAR FROM
(SQL00.cust_ord_ix_2009.order_date )))< )))= 2009) AND
((EXTRACT(MONTH FROM <= 1 ))) AND ((EXTRACT(YEAR
FROM (SQL00.cust_ord_ix_2009.order_date )))>= 2009)"), and the are
computed globally, then placed in Spool 4. The size of Spool
time for this step is 0.08 seconds.
Any query for the year 2009 which is covered by the sparse index will be optimized
to use the sparse index instead of the base tables.
Example
SELECT c.cust_id, cust_name, order_id, order_status, order_date

WHERE c.cust_id > 600
AND EXTRACT (YEAR FROM order_date) = '2009'
AND EXTRACT (MONTH FROM order_date) IN (2,3);
cust_id
order_date
-------------------1009
08
1004
17
1008
04
cust_name
---------------
order_id
order_status
-----------
------------
YZA Corp
648
2009-02-
JKL Corp
620
2009-02-
VWX Corp
645
2009-03-
1005
MNO Corp
627
2009-03-
1006
PQR Corp
633
2009-03-
1009
YZA Corp
649
2009-03-
1004
JKL Corp
621
2009-03-
1008
VWX Corp
644
2009-02-
1005
MNO Corp
626
2009-02-
1006
PQR Corp
632
2009-02-
1003
GHI Corp
614
2009-02-
1007
STU Corp
639
2009-03-
1003
GHI Corp
615
2009-03-
1007
STU Corp
638
2009-02-
27
24
08
17
04
27
24
12
14
12
14
The following is an EXPLAIN of this query. Note the use of the sparse index.
Explanation
----------------------------------------------------------------------1) First, we lock a distinct SQL00."pseudo table" for read
on a RowHash to prevent global deadlock for
SQL00.CUST_ORD_IX_2009.
2) Next, we lock SQL00.CUST_ORD_IX_2009 for read.
3) We do an all-AMPs RETRIEVE step from
SQL00.CUST_ORD_IX_2009 by way of an all-rows scan with a
condition of (
"(SQL00.CUST_ORD_IX_2009.cust_id > 600) AND (((EXTRACT(YEAR
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2009) AND
(((EXTRACT(MONTH
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2) OR
((EXTRACT(MONTH FROM
(SQL00.CUST_ORD_IX_2009.order_date )))=3 )))") into Spool 1
(group_amps), which is built locally on the AMPs. The size
of Spool 1 is estimated with no confidence to be 1 row. The
estimated time for this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs
involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the
result of statement 1. The total estimated time is 0.06
seconds.
Sparse Index Advantages and Limitations
Sparse Join Indexes are a type of Join Index which contains a WHERE clause that
reduces the number of rows which would otherwise be included in the index. All
types of join indexes, including single-table, multi-table, simple or aggregate can be
sparse.
A sparse index makes sense when a definable subset of the rows in the join index
are needed to satisfy a large percentage of the queries which will use it.
By default, any join index, including sparse join index, has a NUPI on the first
column specified. You can explicitly define other columns to be the primary index.
Any combination of AND,OR,IN conditions may be applied to the sparse index WHERE
clause.
Sparse Index Advantages:
Reduces the storage requirements for a join index subtable.
Access is faster since the size of the subtable is smaller.
Better maintenance performance since not all changes to the base table will
affect the sparse index.
Sparse Index Limitations:
Sparse indexes have the same restrictions as any join index.
They require additional space and maintenance resources over and above
the base table requirements.
Summary
Materialized views are a cross between an index and a view.

Like an index, a materialized view has subtable rows, it can carry a row-id for joinback purposes and it requires maintenance when base table rows change.
Like a view, a materialized view represents a subset of the data in a table and the
view changes along with changes in the underlying base rows.
Materialized views are typically implemented to improve query performance. We
have discussed the following materialized view features in this module.
Partial-Covering Multi-Table Join Indexes - These are multi-table join indexes
which include a ROWID specification for one or more of the base tables. This permits
the optimizer to 'join-back' to the base row when information is needed which is not
provided in the join index. Ultimately, the optimizer will decide whether or not to
use this approach in place of another, depending on the associated query costs.
Sparse Join Indexes - These are join indexes which include a WHERE clause
which limits the base table rows that will be reflected in the join index. This permits
a join index to be built only for the rows which are most frequently accessed by
queries, such as current year or current month. The size of the join index is thereby
smaller and the maintenance costs are subsequently less. The optimizer will decide
if it can use a sparse index to reduce the costs associated with a given query.
Lab
Try It!
Answers:
Lab 1b
Lab 2
Lab 3
1a.) Create and populate the following two tables in your database,
then run the UPDATE statement.
CREATE TABLE orders AS Student.orders WITH DATA;
CREATE TABLE customers AS Student.customers WITH DATA;
UPDATE orders
SET order_date = order_date + INTERVAL '10' YEAR;
1b.) Create a sparse join index named cust_ord_ix_2009 with the

following properties:
Fixed index columns cust_id, cust_name
Variable index columns order_id, order_status, order_date
Inner join on customers and orders tables
Join condition on the cust_id columns
Where condition to include only open orders (order_status =

'O')
NUPI index is on cust_id.
2.) Create a query that returns a count of all open orders held by
valid customers.
3

Join Index

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Join Index

Uploaded by

Copyright:

Available Formats

4.

) Column Value Management

Upon completion of this module, the student should be able to:

Insert into a table using the DEFAULT VALUES feature.

Perform string functions using the POSITION feature.

Test case sensitivity with the LOWER attribute.

Rename a column of a table.

Using DEFAULT VALUES

- the column has defined default values

- the column has a default system value specified

- the column permits nulls

NOT NULL - Nulls are not permitted for this column

DEFAULT 22 - Unless otherwise specified, assign the column a value of 22

DEFAULT DATE '2010-01-01' - Unless otherwise specified, assign a date of

The command - INSERT INTO tablename DEFAULT VALUES;

Will insert defined default values into each column.

Will insert a null if no default is defined.

Will fail if no default is defined and a null is not allowed.

Create a Test table

Populate the test table

INTO test_tbl DEFAULT

Defaulting of data values may also be accomplished by the use of positional

The ALTER TABLE

The Correct ALTER TABLE

Creating Tables in Teradata Mode

SHOW the Table

INSERT into the Table

SELECT from the Table

Creating ANSI Mode Tables

Tables created in ANSI mode will have character columns defaulted to

In order to do non-casespecific testing, also called case blind testing, it is necessary to

(Columns created in ANSI mode are defaulted to Casespecific.)

CREATE MULTISET TABLE tdxxx.case_sp_test,NO FALLBACK,

SELECT from the Table

SELECT using UPPER

In ANSI, it is necessary to perform a case blind test to do non-case specific testing.

The LOWER Function (1 of 2)

Allows case blind testing on case specific strings.

Allows storage and retrieval of lower case characters.

FROM case_sp_test WHERE LOWER(col1)=LOWER(col2);

Note: Case blind test.

UPDATE the Table

UPDATE the Table

The LOWER Function (2 of 2)

applied to the column as part of the test in this situation.

INSERT INTO case_sp_test VALUES ('LAPTOP','LAPTOP');

SELECT * FROM case_sp_test WHERE col1=col2;

SELECT * FROM case_sp_test WHERE col1=LOWER(col2);

FROM case_sp_test WHERE col1(NOT CS)=col2(NOT CS);

Note: Case blind test but non-ANSI syntax.

SELECT LOWER (col1) FROM case_sp_test;

SELECT INDEX ('laptop','p');

SELECT POSITION ('p' IN 'laptop');

POSITION and Case Sensitivity

SELECT POSITION ('top' IN 'laptop');

ANSI result Teradata result

ANSI result Teradata result

ANSI result Teradata result

ANSI result Teradata result

ANSI result Teradata result

The new name doesn't already exist in the table.

The column is not part of an index.

The column is not part of any referential integrity contraints.

The affected column is not referenced in the UPDATE OF clause of a trigger.