Teradata has the ability to insert a row using only the DEFAULT VALUES keywords.
For this feature to work successfully, one of the following statements must be true
for each column of the table:
If none of these statements is true, an insert using DEFAULT VALUES will fail. Note
that such an insert may be executed multiple times as long as no uniqueness
attributes of the table are violated.
Column Attributes:
WITH DEFAULT - Assigns the system default: spaces for character strings, zero for
numeric data types, and the current date for the DATE data type
DEFAULT TIME - Assigns the current time to an integer column
DEFAULT USER - Assigns the user id of the session to a character string column
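The DDL for the test_tbl table used in the examples that follow is not reproduced in
this document. A definition consistent with the attributes above and with the sample
output below might look like this (column names and data types are reconstructions,
not the course's actual DDL):

```sql
-- Hypothetical test_tbl definition; defaults chosen to match the output shown.
CREATE TABLE test_tbl
 (cola INTEGER DEFAULT 22                     -- literal default value
 ,colb CHAR(4)                                -- no default: remains null
 ,colc DATE WITH DEFAULT                      -- system default: current date
 ,cold DECIMAL(5,2) WITH DEFAULT              -- system default: zero
 ,cole FLOAT FORMAT '99:99:99' DEFAULT TIME   -- current time, shown hh:mm:ss
 ,colf INTEGER DEFAULT TIME                   -- current time as integer hhmmss
 ,colg CHAR(7) DEFAULT USER);                 -- session user id
```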
The result after a single insert:

cole      colf    colg
--------  ------  -------
15:27:31  152731  TD036

The result after a second insert:

cole      colf    colg
--------  ------  -------
15:27:31  152731  TD036
15:27:43  152743  TD036
Defaulting Methods
The INSERT
INSERT INTO test_tbl VALUES (,,,,,,);
The SELECT
SELECT * FROM test_tbl;
cola    colb  colc      cold   cole      colf    colg
------  ----  --------  -----  --------  ------  -------
    22  ?     10/01/01    .00  15:27:31  152731  TD036
    22  ?     10/01/01    .00  15:27:43  152743  TD036
    22  ?     10/01/01    .00  15:33:13  153313  TD036
While it is possible to alter a column definition via an ALTER TABLE statement, care
must be taken not to add attributes which conflict with existing attributes or
existing data. Adding a new column with a NOT NULL attribute returns an error if
the table already has rows, because the new column would initially have to be set to
either a null or a value for the existing rows.
The INSERT
INSERT INTO test_tbl DEFAULT VALUES;
The SELECT
SELECT * FROM test_tbl;

cola    colb  colc      cold   cole      colf    colg     colh
------  ----  --------  -----  --------  ------  -------  ----
    22  ?     10/01/01    .00  15:27:31  152731  TD036       0
    22  ?     10/01/01    .00  15:27:43  152743  TD036       0
    22  ?     10/01/01    .00  15:33:13  153313  TD036       0
    22  ?     10/01/01    .00  15:38:42  153842  TD036       0
Creating a Table
CREATE TABLE abc (a INT NOT NULL)
The INSERT
INSERT INTO abc DEFAULT VALUES;
***Failure 3811 Column 'a' is NOT NULL. Give the column a value.
Tables may be created in either Teradata mode or ANSI mode. Tables created in
Teradata mode will have all character columns defined as NOT CASESPECIFIC by
default. This means that the data will be stored in the column in the same case in
which it was entered, and it will be returned to the user in this stored case; however,
all testing against the column will ignore case.
The default comparison operation for NOT CASESPECIFIC columns is always case-insensitive.
CREATE a Table
CREATE TABLE case_nsp_test
(col1 CHAR(7)
,col2 CHAR(7));
(Columns created in Teradata mode are defaulted to NOT CASESPECIFIC)
Result

col1     col2
-------  -------
LAPTOP   laptop
Because both columns are defined as NCS, all comparison tests will be done NCS.
Tables created in ANSI mode, by contrast, default their character columns to
casespecific mode.
CREATE a Table
CREATE TABLE case_sp_test
(col1 CHAR(7)
,col2 CHAR(7));
The LOWER function behaves similarly to the UPPER function but in the reverse
direction. LOWER may be used as the choice for case blind test, just as UPPER may.
The LOWER function:
Example
SELECT *
FROM case_sp_test;

col1     col2
-------  -------
LAPTOP   LAPTOP
Both LOWER and UPPER may be used to change the stored contents of a column from
one case to another. This may be done via an UPDATE statement which applies the
function to the updated column values.
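Such an update might be sketched as follows against the case_sp_test table (the
statement is an assumption consistent with the result shown below, not the course's
exact example):

```sql
-- Convert the stored contents of col2 to upper case in every row.
UPDATE case_sp_test
SET col2 = UPPER(col2);
```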
SELECT col2
FROM case_sp_test;

col2
-------
LAPTOP
LAPTOP
POSITION Function
The POSITION function is the ANSI standard form of the INDEX function of Teradata
SQL. They are both used for locating the position of a string within a string.
Both functions require two arguments, the column or character string to be tested,
and the character or string of characters to be located.
With the INDEX function, the two arguments are separated with a comma.
With the POSITION function, the more English-like IN keyword is used in place of a
comma.
While both functions will continue to be available for compatibility purposes, it is
suggested that future coding be done with POSITION.
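Side by side, the two forms look like this (the literals are illustrative only):

```sql
-- Teradata form: arguments separated by a comma.
SELECT INDEX('LAPTOP', 'TOP');        -- returns 4

-- ANSI form: the search string, the IN keyword, then the target string.
SELECT POSITION('TOP' IN 'LAPTOP');   -- returns 4
```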
Example
Care must be exercised in executing the same scripts in both ANSI and Teradata
session mode, particularly where issues of case sensitivity are involved. This may
become very significant in using the POSITION function which will or will not
attempt to match case depending on the session type.
Note that in three of the five examples shown here, the ANSI session produces a
different result than the Teradata session.
Examples
Note: "ANSI result" implies an ANSI transaction session mode. "Teradata result"
implies a BTET transaction session mode.
Columns in a table may be renamed using the ALTER TABLE command. In order to
qualify for renaming, a column must not be referenced by any external objects and
must be assigned a new name not already in use by the table.
A column which participates in any index is not a candidate for renaming. Likewise,
a column which is either a referenced or referencing column in a referential
integrity constraint may not be renamed.
Note that renaming a column does not cascade the new name to any macros or
views which reference it. The views and macros will no longer function properly
until they have been updated to reflect the new name.
A column may be renamed to a different name provided that the conditions above are met.
Example
CREATE a Table
CREATE TABLE rename_test
(col1 INT
,col2 INT
,col3 INT)
UNIQUE PRIMARY INDEX (col1)
,INDEX (col3);
Now ALTER it
ALTER TABLE rename_test RENAME col2 TO colb;
ALTER it Again
ALTER TABLE rename_test RENAME col1 TO cola;
Result
Failure:
ALTER it Again
ALTER TABLE rename_test RENAME col3 TO colc;
Result
Failure:
Lab
Try It!
For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop-up
with instructions on accessing the lab server
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com
A. Show the first and last name of any employee in the employee table who
uses an initial followed by a period, instead of his/her first name. Use
POSITION to solve this.
B. Show any employee who has an upper case letter B in his first name, but
in a position other than the first position. Use the POSITION function in
an ANSI mode session to solve this. While in ANSI mode, use the LIKE
operator to solve this problem again. Try these solutions in Teradata
(BTET) mode. Do they produce the same result? Is there a solution to this
problem in Teradata mode?
C. Show first and last name of any employee who has the letter 'a' in the
same position in both their first and last name. (Return to BTET mode
first.)
D. Display first names in lowercase and last names in uppercase for
employees whose last name begins with the letter R. Use Position to solve
this.
E. Create a small table according to the following definition.
Hint: For the remainder of these labs, it will be helpful to reset your
default database to your userid. (DATABASE tdxxx;)
F.
G.
H.
I.
Use the Identity Column feature to generate an identity column for a table. The
identity clause accepts the following options:
START WITH - the first value to be generated (default is 1)
INCREMENT BY - the interval which is used for each new generated value (default
is 1)
MINVALUE - the smallest value which can be placed in this column (default is the
smallest value supported by the data type of the column)
MAXVALUE - the largest value which can be placed in this column (default is the
largest value supported by the data type of the column)
CYCLE - after the MAXVALUE has been generated, restart the generated values
using the MINVALUE.
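Put together, these options appear inside the identity clause of a column definition.
A sketch (table and column names are illustrative, not from the course):

```sql
CREATE TABLE sales_log
 (row_id INTEGER GENERATED ALWAYS AS IDENTITY
     (START WITH 1
      INCREMENT BY 1
      MINVALUE 1
      MAXVALUE 999999
      CYCLE)
 ,amount DECIMAL(9,2))
UNIQUE PRIMARY INDEX (row_id);
```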
Only numeric data types from the following list may be used for identity columns:
INTEGER
SMALLINT
BYTEINT
DECIMAL
NUMERIC
Col2
-----------
9
9
9
Things to notice:
A value is generated regardless of whether:
a default is specified
a null is specified
a value is specified
It was not possible to add another row, because the MAXVALUE has been achieved.
In fact, no new row may be added to this table ever again.
Remove the rows from the table.
DELETE FROM test_idCol;
Now try to add the additional row.
INSERT INTO test_idcol VALUES (7, 9);
***Failure 5753 Numbering for Identity Column Col1 is over its limit.
Note that even after you delete all rows from the table, you cannot add new rows if
you have exceeded MAXVALUE and you have not specified the CYCLE option. You
must drop and recreate this table in order to add rows.
Create another table similar to the previous one, but with the CYCLE option.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3 CYCLE),
Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );
Again, insert the first three rows as follows:
INSERT INTO test_id3 VALUES (, 9);
Col2
-----------
9
9
9
9
9
9
CYCLE),
Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );
Insert the same three starter rows.
INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);
*** Warning: 5789 Value for Identity Column is replaced by a system-generated number
SELECT * FROM test_id3 ORDER BY 1;
Col1         Col2
-----------  -----------
          1            9
          2            9
          3            9
The fourth row would have made Col1 = 1 due to the CYCLE option.
Drop and recreate the table with a MINVALUE specified and with a nonunique primary index.
DROP TABLE test_id3;
CREATE SET TABLE TEST_ID3
(Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3
CYCLE),
Col2 INTEGER)
PRIMARY INDEX ( Col1 );
Insert the same three starter rows.
INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);
*** Warning: 5789 Value for Identity Column is replaced by a system-generated number
Now try to add the 4th row. The fourth row would have had values (1,9) due to the
CYCLE option.
Col2
-----------
7
7
7
9
9
9
All rows are inserted successfully and the identity column recycles.
Col2
-----------
9
9
9

Col2
-----------
9
9
9
The column always knows what its last used value is.
SELECT * FROM test_id3 ORDER BY 1;

Col1         Col2
-----------  -----------
          1            9
          3            9
          4            9
          5            9
Things to notice:
Col2
-----------
9
8
9
9
9
Note that the increment occurs, even though the insert of 1 previously
failed.
Generating BY DEFAULT (1 OF 2)
Now let's switch to BY DEFAULT mode. This mode will generate a value only when a
value is not explicitly expressed.
CREATE SET TABLE TEST_ID3 ,FALLBACK
(
Col1 INTEGER GENERATED BY DEFAULT AS IDENTITY
(MINVALUE 1 MAXVALUE 5 CYCLE),
Col2 INTEGER )
PRIMARY INDEX ( Col1 );
Col2
-----------
9
9
Note that no identity columns were generated. The values were provided
explicitly.
INSERT INTO test_id3 VALUES (, 9);
*** Failure 2802 Duplicate row error in TEST_ID3.
Things to notice:
Col2
-----------
9
8
9
9
Col2
-----------
9
8
8
9
8
8
9
Col2
-----------
7
9
8
8
9
8
8
9
Note that the recycle has begun by reverting back to the MINVALUE OF
1.
Generating BY DEFAULT (2 OF 2)
Consider that the last row generated for this table is the (1,7) row.
Col1         Col2
-----------  -----------
          1            7
          1            9
          2            8
          3            8
          3            9
          4            8
          5            8
          6            9
Col2
-----------
6
Things To Notice:
Emptying the table does not reset to the START WITH value.
Col1         Col2
-----------  -----------
          2            6
          4            6
Note that no identity columns were generated. The value 4 was provided
explicitly.
Now insert the following three rows.
INSERT INTO test_id3 VALUES (, 5); - adds (3,5)
INSERT INTO test_id3 VALUES (, 5); - adds (4,5)
INSERT INTO test_id3 VALUES (, 5); - adds (5,5)
SELECT * FROM test_id3 ORDER BY 1;
Col1         Col2
-----------  -----------
          2            6
          3            5
          4            5
          4            6
          5            5
Note that another row with Col1 = 4 is generated even though one already exists.
The generating mechanism operates independently of pre-existing rows.
INSERT INTO test_id3 VALUES (, 5); - adds (1,5)
SELECT * FROM test_id3 ORDER BY 1;
Col1         Col2
-----------  -----------
          1            5
          2            6
          3            5
          4            5
          4            6
          5            5
Identity Column Restrictions
IC's are not supported for load utilities Fastload and Multiload.
Bulk inserts across multiple sessions cannot guarantee that the sequence of
the IC numbers will correlate to the sequence of the specified INSERT
statements.
Bulk inserts done via INSERT SELECT also cannot guarantee that the
sequence of the assigned IC's will be unbroken. This is because each AMP
pre-allocates a range of numbers based on a pre-defined interval (specified
in the DBS Control Record). Consequently each AMP will provide its own
sequence independently of the others.
History
Prior to the auto-generated key retrieval feature, there was no simple way to
determine the value assigned to the identity column for an INSERTed row of a
table. If there was a unique column, or a unique combination of columns, with a
USI assigned, then the user could do a SELECT of the identity column, qualifying
on the unique column(s). This required an additional query request and would, in
some cases, require an all-AMP operation, and therefore was considered inefficient.
It also presented a bigger problem in the case of INSERT-SELECT which usually
adds multiple rows to the table in question.
Business Value
Having the IdCol values automatically returned enhances applications that require
quick or immediate retrieval of assigned identity values.
Example
Let's say we have the following CREATE TABLE statement:

phoneno
------------------
919848123
Considerations
INSERTs have the additional cost of row retrieval if Auto Generated Key Retrieval
(AGKR) is requested. However, the additional retrieval takes less time overall
compared with having to run a separate SELECT to retrieve the identity value.
Limitations
The following limitations apply:
Iterated INSERTs have to adhere to the 2048 spool limit of the Array
Support feature. A max of 1024 iterations is possible as each iteration uses
an AGKR spool and a response spool.
This feature is enabled through the Client, e.g., JDBC driver or BTEQ. If the
Client version does not include the AGKR capability, then there will be no
AGKR response for INSERT requests.
Lab
Try It!
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).
1.) Create a new table in your own database using the following text:
The COMPRESS phrase allows values in one or more columns of a permanent table to
be compressed to zero space, thus reducing the physical storage space required for
a table.
The COMPRESS phrase has three variations:
Compression Option                  What Happens
COMPRESS                            Nulls are compressed to zero space
COMPRESS NULL                       Nulls are compressed to zero space
COMPRESS <constant>,<constant>,..   Each listed constant (and nulls, if the
                                    column is nullable) is compressed to zero space
Both nulls and the string 'Savings' will be compressed when the row is
stored.
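The CREATE TABLE statement for this first example is not reproduced here; based on
the description above, it would take roughly this form (a sketch, not the course's
exact DDL):

```sql
-- Single-value compression: nulls are compressed by default on a nullable
-- compressed column, and the listed constant is compressed as well.
CREATE TABLE bank_account_data
 (customer_id INTEGER
 ,account_type CHAR(10) COMPRESS 'Savings');
```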
Example 2
An example of multiple-value column compression:
CREATE TABLE bank_account_data
(customer_id INTEGER
,account_type CHAR(10) COMPRESS ('SAVINGS','CHECKING','CD','MUTUAL
FUND'));
Things to notice in this example:
Each of the four specified strings is now compressed when the row is
stored.
Zero space is taken in the physical row for each of these values.
This can significantly reduce the amount of space needed to contain the
table.
System Performance:
System performance can be expected to improve as a result of value compression.
Because each data block can hold more rows, fewer blocks need to be read to
resolve a query, and thus fewer physical I/O's can be expected to take place. Also,
because the data remains compressed while in memory, more rows can be available
in cache for processing per I/O.
Compression Transparency:
Compression is transparent to all user applications, utilities, ETL tools, ad hoc
queries and views.
Compression Suggestions
Nulls
Zeroes
Spaces
Default Values
State
City
Country
Automobile Make
Account Type
First Name
Last Name
Limitations:
Only columns with a fixed physical length may be compressed - e.g. CHAR
but not VARCHAR.
The aggregate of all compressed values may not exceed the maximum size
of a table header (64K).
Columns of the following data types may not be compressed:
INTERVAL
TIME
TIMESTAMP
VARCHAR
VARBYTE
VARGRAPHIC
The SQL ALTER TABLE command supports adding, changing, or deleting column
compression on one or more existing columns of a table, whether the table is
loaded with data or is empty.
History
Traditionally there has been a trade-off between the desire to store more data and
the cost of storing the additional data. The column compression feature of the
Teradata database has always provided an opportunity to reduce the amount of
data to be physically carried in the database by storing frequently repeating values
in a table header rather than repeating them for every row in the table. The column
compression feature thereby helps improve the trade-off between these competing
requirements, making it less expensive to store more data.
Without the ALTER TABLE Compression feature, when compression requirements
on a table change, it would be necessary to recreate the table with the new
compression requirements specified, and then reload the table. The ALTER TABLE
Compression feature makes it easier to add, change or delete compression
requirements from an existing table without the user having to recreate and reload
the table.
Examples
The following are examples of the ALTER TABLE Compression feature. Note,
whenever the COMPRESS attribute is applied to a nullable column, nulls will always
be compressed by default. If other compression values are specified, they will be
compressed in addition to null compression.
There is an implied sequence in the statements that follow:
Example Set 1:
ALTER TABLE Table1 ADD Col1 COMPRESS;
If the column Col1 exists and is nullable, then the column will be a
compressible column with NULL as the compress value.
If the column Col1 does not exist, then the column is added to the table and
the column will be a compressible column with NULL as the compress value.
The column will be compressed for nulls and for the constant value Savings.
If the column is already compressed the constant value Savings will replace
the existing compress value or list of values.
If the column is already compressed, the new compress list will replace the
existing compress value or list of values.
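The ALTER TABLE statements that the last two bullets describe are not shown in
this document; they would look roughly like the following (a sketch consistent with
the surrounding text):

```sql
-- Compress nulls plus the constant value Savings:
ALTER TABLE Table1 ADD Col1 COMPRESS 'Savings';

-- Replace the existing compress value with a list of values:
ALTER TABLE Table1 ADD Col1 COMPRESS ('Savings', 'Checking');
```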
Example Set 2:
ALTER TABLE Table1 ADD Col2 COMPRESS 0;
Note value zero must be restated if this follows the previous ALTER
statement.
Only 10000 is added to list since NULL, 0, 100 and 1,000 are already
compressed in prior example.
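The follow-on statements implied by these notes might be sketched as follows (the
values follow the surrounding text; this is not the course's exact example):

```sql
-- Zero must be restated, or it drops out of the compress list:
ALTER TABLE Table1 ADD Col2 COMPRESS (0, 100, 1000);

-- Only 10000 is new; NULL, 0, 100 and 1000 are already compressed:
ALTER TABLE Table1 ADD Col2 COMPRESS (0, 100, 1000, 10000);
```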
Compression Optimizations
The table will actually be rebuilt at the time of the execution of the ALTER
TABLE statement.
Space overhead requirement for rebuilding a table is around 2 MB. The table
is rebuilt one cylinder at a time, so no matter how big or small a table is, the
overhead remains the same.
Limitations
The ALTER TABLE ADD cname syntax allows certain other attributes to be
included in the same statement as the COMPRESS attribute. The following are
exceptions to this rule:
Additional Considerations
Summary
Lab
Try It!
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).
Click on the answer button to the left to see the answers.
NUSI Review
Non-Unique Secondary Indexes (NUSI's) are a Teradata index feature which permits
defining non-primary indexes on non-unique columns. Typically, this is done to
improve performance on queries which use the column or columns in the WHERE
clause selection criteria. NUSI's may be created either as a part of the CREATE
TABLE syntax, or they may be created after table creation using CREATE INDEX
syntax. NUSI's may be easily dropped when their presence is no longer needed by
using the DROP INDEX syntax.
Value Ordered NUSI's allow NUSI subtable rows to be sorted based on a data
value, rather than on a hash of the value. This is extremely useful for range
processing where a sequence of values between an upper and lower limit is desired.
The ORDER BY column of a value-ordered NUSI must be:
A single column
A column which is part or all of the index definition
A numeric column (non-numerics are not allowed)
No greater than four bytes in length (INT, SMALLINT, BYTEINT, DATE, DEC are valid)
Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any precision digits.
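A value-ordered NUSI is requested with an ORDER BY VALUES clause on the index
definition. A sketch (table and column names are illustrative):

```sql
-- Sort the NUSI subtable rows by the data value rather than its hash,
-- which benefits range predicates such as BETWEEN.
CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON orders;
```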
Index Covering
If a query references only those columns that are contained within a given index,
the index is said to "cover" the query. In these cases, it is often more efficient for
the optimizer to access only the index subtable and avoid accessing the base table
rows altogether.
Covering will be considered for any query that references only columns defined in a
given NUSI. These columns can be specified anywhere in the query including the:
1. SELECT list
2. WHERE clause
3. aggregate functions
4. GROUP BY
5. expressions
The presence of a WHERE condition on each of the indexed columns is not a
guarantee for using the index to cover the query. The optimizer will consider the
appropriateness and cost of 'covering' versus other alternative access paths and
choose the optimal plan.
The potential performance gains from index covering require no user intervention
and will be transparent except for the improved access time. The use of the NUSI
can be validated by reviewing the execution plan returned by EXPLAIN.
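For example, given a NUSI containing every column a query references, the optimizer
may satisfy the query from the index subtable alone (names are illustrative; confirm
the plan with EXPLAIN):

```sql
CREATE INDEX (department_number, salary_amount) ON employee;

-- Every referenced column is in the NUSI, so it may cover the query:
SELECT department_number, AVG(salary_amount)
FROM employee
WHERE department_number BETWEEN 401 AND 403
GROUP BY department_number;
```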
Join Index
The Join Index feature provides indexing techniques that can improve the
performance of certain types of queries. The Join Index is a physical structure,
populated with rows that contain columns from one or more tables. Once created, it
becomes an option available to the optimizer but is never directly accessed by the
user.
Its purpose is to aid in the joining of tables by providing needed data from an index
rather than having to access the base rows of the table. By using a join index the
optimizer may be able to avoid having to access or redistribute many individual
tables and their base rows.
The Join Index supports syntax for the following types of indexes:
1. Multiple-table Join Index - Used to pre-join multiple tables
2. Single-table Join Index - Used to rehash and redistribute the rows of a
single table based on a specified column or columns
3. Aggregate Join Index - Used to create an aggregate index to be used as a
summary table
In this section we will be discussing the first two only. The third item, aggregate
indexes, is discussed in a separate module of this training.
Multiple-table Join Indexes are used to pre-join two or more tables. Consider the
following tables which are in the 'Student' database.
SELECT COUNT(order_id)
FROM orders
WHERE cust_id IS NOT NULL;
Count(order_id)
---------------
             50

A join index will not help this query. The 'orders' table covers the query.
SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;
Count(order_id)
---------------
             49
A join index can help this query. Two tables are needed to cover the query.
The following shows the creation of a join index which will improve the performance
of any joins it can cover.
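The CREATE JOIN INDEX statement itself is not reproduced in this document. Based
on the columns shown below and the primary index discussed later in this section, it
would take roughly this form (the index name is an assumption):

```sql
-- Pre-join customers and orders; the optimizer may read this index
-- instead of joining the base tables.
CREATE JOIN INDEX cust_ord_ix AS
SELECT c.cust_id, c.cust_name, o.order_id, o.order_date
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
```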
(Join index contents shown in the course: CUST_NAME values ABC Corp and BCD Corp
paired with ORDER_ID values 502-509 and ORDER_DATE values 990120-990620 and
990122-990322.)
SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;
Count(order_id)
---------------
             49

The join index helps this query because it covers the query, and therefore the
result may be generated without ever accessing the rows of the base tables.
Compare the costs as shown by EXPLAIN: .39 secs. without the join index versus
.17 secs. with the join index.
The join index seen above may also be used to cover other queries.
(Join index contents shown in the course: cust_id 1002 and CUST_NAME values ABC Corp
and BCD Corp paired with ORDER_ID values 502-509 and ORDER_DATE values 990120-990620
and 990122-990322.)
SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;
Count(cust_id)
--------------
             9
Join Indexes are always assigned a primary index in order to hash distribute the
index rows across the AMPs. In the example created here, the Primary Index of the
Join Index is the column Cust_id.
We can explicitly specify the primary index for the Join Index or allow it to default
to the first column specified. In our upcoming look at Single-Table Join Indexes, we
will see the usefulness of being able to choose and specify a primary index on a
Join Index.
How many distinct valid customers with customer ids between 1001 and 1005 have
assigned orders?
SELECT COUNT(DISTINCT(c.cust_id))
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE c.cust_id BETWEEN 1001 AND 1005;
Count(Distinct(cust_id))
------------------------
                       5
Because this query accesses a range of customer ids, the optimizer can access the
rows of the Join Index more efficiently because the qualifying rows are already
sequenced by cust_id and thus easily located.
The rules for the ORDER BY column are the same as for Value-Ordered NUSI's.
The ORDER BY column must be:
A single column
A column which is part or all of the fixed-portion index definition
A numeric column (non-numerics are not allowed)
No greater than four bytes in length (INT, SMALLINT, BYTEINT, DATE, DEC are
valid)
Note: Although DECIMAL data types are permitted, their storage length must not
exceed four bytes and they cannot have any precision digits.
A NUSI may be created on a join index and may be used to improve access to the
join index rows. In the example just seen, we ordered the rows of the join index by
'cust_id' in order to facilitate 'range' processing on customer numbers.
Because the rows of the join index can only be sequenced by one column, we need
to use another technique to facilitate 'range' processing for the order date.
We can solve this problem by adding a NUSI on the join index and value ordering it
on the order date. NUSI's on join indexes can be built as part of the CREATE JOIN
INDEX statement, or they can be added after join index creation using the CREATE
INDEX statement.
Single-Table Join Indexes are created to rehash and redistribute the rows of a
table by a column other than the Primary Index column. The redistributed index
table may be a subset of the columns (vertical subset) of the base table. It can
significantly reduce the costs associated with doing a table redistribution for join
processing.
In building join plans for two tables, the optimizer must first decide how to ensure
that all joinable rows are co-located on the same AMP. If both tables are being
joined on their respective Primary Index columns, the joinable rows are already
co-located on the same AMP, and no redistribution of data is needed. If either table
is not using its Primary Index columns as the join column(s), then a redistribution
must occur.
Single-Table Join Indexes provide the ability to 'pre-distribute' the rows of a
table based on the hash of the join value. This will eliminate the need for the
optimizer to require a redistribution to perform the join - it can take advantage of
the already distributed rows of the Single-Table Join Index.
Consider the two tables 'employee' and 'department'.
Query 4
Select all employee numbers, their department number and department name.
SELECT e.employee_number
,d.department_number
,d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Joining these two tables on the 'department_number' column might require a
redistribution of the rows of the employee table. The department table is already
distributed based on the PI column 'department_number', but the employee table is
distributed on the PI column 'employee_number'. Depending on the size of the
tables, the redistribution can become a costly operation.
One possible technique to expedite this join would be to create a Single-Table Join
Index on the employee table as follows:
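The single-table join index statement is not reproduced here; a sketch consistent
with the description above (the index name is an assumption):

```sql
-- Carry a vertical subset of employee, redistributed by the join column,
-- so the join to department needs no redistribution step.
CREATE JOIN INDEX emp_dept_ix AS
SELECT employee_number, department_number
FROM employee
PRIMARY INDEX (department_number);
```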
Query 4 (Explained)
Select all employee numbers, their department number and department name.
The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1
is estimated with low confidence to be 24 rows. The estimated time for this step is
0.18 seconds.
Summary
The following are index options available for query performance enhancement which
we have seen in this module.
Single-table join indexes to pre-hash-distribute rows of one table to co-locate with joinable rows
Join indexes may be further enhanced by applying the following features to them:
Define the PI of the join index to distribute the join index rows most
effectively
Define the ordering of the join index rows to sequence the join index rows
Define a NUSI on the join index to access rows in the join index more
effectively
(Multi-table join index only)
Define the ordering of the NUSI rows to sequence the NUSI rows, either
hash or value-based
(Multi-table join index only)
Lab
Try It!
For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Telnet button in the lower left
hand screen of the course. Two windows will pop-up: a BTEQ
Instruction Screen and your Telnet Window. Sometimes the BTEQ
Instructions get hidden behind the Telnet Window. You will need
these instructions to log on to Teradata.
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE tdxxx;).
A.1 Make copies in your own database of the city and state tables
which are located in the Student database. You may accomplish this
by using the following SQL:
CREATE TABLE state AS Student.state WITH DATA ;
CREATE TABLE city AS Student.city WITH DATA ;
Look at the table definitions for the City table and the State table.
Construct a query which returns the following information by inner
joining these two tables. Order the results by city population within
state population.
City Name    City Population    State Name    State Population
...          ...                ...           ...
A.2 Create a Join Index in your own database called citystateidx. The
fixed portion of the index should contain the state name and the state
population. The variable portion should contain the city name and
the city population.
A.3 Now rerun the query to see if you get the same result. If you do,
EXPLAIN the query to see if your Join Index was used.
B.1 Drop the Join Index citystateidx.
B.2 Modify the query to only show states whose population is
between one and three million. Run the query.
B.3 Recreate the Join Index, however this time ensure that the index
rows will be sorted by the state population column.
B.4 Rerun the query to ensure the same results, then EXPLAIN the
query to determine if the Join Index was used.
C.1 Add a NUSI on the Join Index on the city population column.
Value order the NUSI by city population.
C.2 Rerun the query to ensure the same results, then EXPLAIN the
query to determine if the NUSI was used.
8.) Aggregate Join Indexes
Objectives
A Join Index is an optional index which may be created by the user for one of the
following three purposes:
Pre-join multiple tables
Distribute the rows of a single table on the hash value of a foreign key value
Aggregate columns into a join index that can serve as a summary table
The first two listed purposes are covered in an earlier module of this training
program.
In this module, we will concentrate on the last of the three purposes: aggregating
columns into a join index that the optimizer may choose to use as a summary table.
Summary Tables
Queries which involve counts, sums, or averages over large tables require
processing to perform the needed aggregations. If the tables are large, query
performance may be affected by the cost of performing the aggregations.
Traditionally, when these queries are run frequently, users have built summary
tables to expedite their performance. While summary tables do help query
performance, they:
Require queries to be coded to access summary tables, not the base tables
Allow for multiple versions of the truth when the summary tables are not up-to-date
Aggregate Indexes
Aggregate indexes provide a solution that enhances the performance of the query
while reducing the requirements placed on the user. All of the above listed
limitations are overcome with their use.
An aggregate index is created similarly to a join index with the difference that
sums, counts and date extracts may be used in the definition. A denormalized
summary table is internally created and populated as a result of creation. The index
can never be accessed directly by the user. It is available only to the optimizer as a
tool in its query planning.
Aggregate indexes do not require any user maintenance. When underlying base
table data is updated, the aggregate index totals are adjusted to reflect the
changes. While this requires additional processing overhead when a base table is
changed, it guarantees that the user will have up-to-date information in the index.
Aggregate Indexes are similar to other Join Indexes in that they are:
MultiLoad and FastLoad may not be used to load tables for which join indexes
are defined.
Aggregate Indexes are different from other Join Indexes in that they:
You must have one of the following two privileges to create any join index:
DROP TABLE rights on each of the base tables (or the containing database)
or,
CREATE TABLE on the database or user which will own the join index
Explaining the previous query shows us that this is a primary index access against
the 'daily_sales_2004' table. (Note that because the cost of aggregation is not
calculated, no final cost for the query is generated.)
Explanation
Creating a join index gives the optimizer the option of using the 'pre-aggregated'
information kept in the index, thus avoiding the need to generate a separate
aggregation step.
Explaining the previous query shows us that this time the aggregate index is
employed.
Explanation
Because the aggregations are already calculated and available in the index, the
costs associated with step one are reduced. The cost of step two is unchanged
(0.17).
Because aggregation costs are not currently carried in EXPLAIN text, the savings in
processing time for step one are not shown, however the response time reduction
for the user can and should be substantial.
Join index definitions may be seen using the SHOW JOIN INDEX construct.
Example
Show the aggregate index named monthly_sales;
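The statement itself is not reproduced in this extract; a minimal sketch, assuming the index name used throughout this section, would be:

```sql
SHOW JOIN INDEX monthly_sales;
```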
If both COUNT and SUM are in the index, AVERAGE calculations may also
make use of the index.
Ultimately, just as with any join index, it is the optimizer's choice whether or not
the index is useful for a specific query. The index created previously is repeated
here for convenience.
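The definition itself did not survive extraction. The following is a sketch only: the table daily_sales_2010 and the columns itemid, salesdate, and sales come from the queries that follow, while the extract expressions, aliases, and grouping are assumptions consistent with an index that carries both COUNT and SUM by item and month.

```sql
CREATE JOIN INDEX monthly_sales AS
SELECT itemid
      ,EXTRACT(YEAR FROM salesdate)  AS yr    -- assumed alias
      ,EXTRACT(MONTH FROM salesdate) AS mth   -- assumed alias
      ,SUM(sales)   AS sum_sales
      ,COUNT(sales) AS count_sales            -- COUNT kept so AVERAGE can also be derived
FROM daily_sales_2010
GROUP BY 1, 2, 3;
```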
An index is said to 'cover' the query (or cover part of the query) if the optimizer can
generate the query results using the index as a replacement for one or more of the
specified tables.
Example
Show the grand total sales for item 10 as contained in the daily_sales_2010 table.
SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;
itemid   Sum(sales)
------   ----------
    10     16950.00
EXPLAIN SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;
Explanation (Partial)
are computed locally, then placed in Spool 2. The size of Spool 2 is estimated
with high confidence to be 1 to 1 rows.
Note: This is an example of the index 'monthly_sales' covering the query.
SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND EXTRACT(YEAR FROM salesdate) = 2010
GROUP BY 1;
itemid   Sum(sales)
------   ----------
    10      8800.00
EXPLAIN SELECT itemid
,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10 AND EXTRACT(YEAR FROM salesdate) = 2010
GROUP BY 1;
Explanation (Partial)
Aggregate indexes are not used in conjunction with queries using SUM Window,
COUNT Window, WITH or WITH BY functions. Because these functions must process
and display all qualifying detail rows, the value of the aggregate index is reduced.
Explaining any query using these functions will validate that the index is not used.
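For instance, a query like the following (names borrowed from the earlier examples) must produce a running total for every detail row, so the pre-aggregated index rows cannot substitute for the base table:

```sql
-- Window aggregate over detail rows; the aggregate index cannot cover this
SELECT itemid
      ,salesdate
      ,SUM(sales) OVER (PARTITION BY itemid
                        ORDER BY salesdate
                        ROWS UNBOUNDED PRECEDING) AS running_sales
FROM daily_sales_2010;
```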
Lab
Try It!
For this set of lab questions you will need information from the
Database Info document.
To start the online labs, click on the Access Lab Server button in
the lower left hand screen of the course. A window will pop up
with instructions on accessing the lab server.
You will need these instructions to log on to Teradata.
If you experience problems connecting to the lab server, contact
Training.Support@Teradata.com
Prior to doing these labs, it will be helpful to reset your default database to
your own user id (i.e. DATABASE TDnnnn;).
Indexes Revisited
In the next few pages, we will be looking at Hash Indexes and their properties.
Because they share in common many attributes of secondary indexes and join
indexes, let's first review the basics of secondary indexes and join indexes.
Secondary Indexes
Secondary indexes are defined to provide alternate access pathways to the base
rows of a single table. Users may define secondary indexes, but they cannot be
accessed directly by the user, nor can the user affect how the index rows are
distributed. Their use or non-use is an option to the optimizer in its query planning.
The following are properties of secondary indexes:
Can 'cover' certain queries, but their primary purpose is locating base rows
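As a refresher, secondary indexes are defined with a simple CREATE INDEX statement. The table and column names below are illustrative only (hire_date is a hypothetical DATE column chosen because value ordering requires a single short numeric or date column):

```sql
-- A plain NUSI on department_number
CREATE INDEX (department_number) ON emp1;

-- A value-ordered NUSI, useful for range access on a date column
CREATE INDEX (hire_date) ORDER BY VALUES (hire_date) ON emp1;
```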
Join Indexes
Join indexes are defined to reduce the number of rows processed in generating
result sets from certain types of queries, especially joins. Like secondary indexes,
users may not directly access join indexes. They are an option available to the
optimizer in query planning. The following are properties of join indexes:
Are used to replicate and 'pre-join' information from several tables into a
single structure.
Are designed to cover queries, reducing or eliminating the need for access to
Usually do not contain pointers to base table rows (unless user defined to do
so).
Are distributed based on the user choice of a Primary Index on the Join
Index.
Permit Secondary Indexes to be defined on the Join Index (except for Single
Table Join Indexes), with either 'hash' or 'value' ordering.
Facilitates the ability to join the foreign key table with the primary
key table.
A join index which contains 'pre-joined' data from two or more tables.
Hash Indexes are database objects that are user-defined for the purpose of
improving query performance. They are file structures which contain properties of
both secondary indexes and join indexes.
They may sometimes cover a query without use of the base table rows.
They are very similar to single-table join indexes (STJI), however with added
functionality.
Automatically contains base table PI value as part of the hash index subtable
row.
Contains additional information needed to locate the base table row (e.g.
uniqueness value).
Note:
Each hash index row contains the employee number and the department
number.
The BY clause indicates that the rows of this index will be distributed by the
employee_number hash value.
The ORDER BY clause indicates that the index rows will be ordered on each
AMP in sequence by the employee_number hash value.
Example 2
The same hash index definition could have been abbreviated as follows:
The BY clause defaults to the primary index of the base table.
The ORDER BY clause defaults to the order of the base table rows.
The column(s) specified in the BY clause must be a subset of the columns which
make up the hash index.
When the BY clause is specified, the ORDER BY clause must also be specified.
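Putting these rules together, the full and abbreviated forms of 'hash_1' would look like this. The full form is a reconstruction from the clauses described above; the abbreviated form simply omits BY and ORDER BY so the defaults apply:

```sql
-- Full form: explicit distribution and ordering on the employee_number hash
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);

-- Abbreviated form: BY and ORDER BY omitted, defaults apply
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1;
```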
Covered Query
The following is an example of a simple query which is covered by this index:
SELECT employee_number, department_number FROM emp1;
Normally, this query would result in a full table scan of the employee table. With the
existence of the hash index, the optimizer can pick a less costly approach, namely
retrieve the necessary information directly from the index rather than accessing the
lengthier (and costlier) base rows.
Consider the explain of this query:
EXPLAIN SELECT employee_number, department_number FROM emp1;
1) First, we lock a distinct TD000."pseudo table" for read on a
RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an
   all-rows scan with no residual conditions into Spool 1, which is built
locally on the AMPs. The size of Spool 1 is estimated with low
confidence to be 8 rows. The estimated time for this step is 0.15
seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.15 seconds.
Example 3
The following is an alternate definition of the hash index 'hash_1'.
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY VALUES(employee_number);
This definition produces the same hash index, however the index rows are
ordered based on employee_number value rather than the hash value.
The merge join step (#4) is able to proceed directly after lock steps.
In this case, the hash index functions much the same as a single table join
index (STJI).
Points to consider about this query plan:
Job-code values are returned by using the ROWID pointer to the base row
(Step #6).
Row hash locks are used to access the base rows of employee table.
A similar effect could have been achieved with a single table join index (STJI) by
adding an explicit ROWID to the index definition as follows:
CREATE JOIN INDEX ji_emp AS SELECT employee_number, department_number,
ROWID FROM emp1;
The following page lists advantages of Hash Indexes over STJI's.
Hash indexes are not supported with the following Teradata features and
utilities:
MultiLoad
FastLoad
Archive/Recovery
Triggers
Permanent Journal
Upsert Processing
Lab
Try It!
Lab 1
1a.) Create a hash index which will facilitate joins between the 'loc1'
and 'loc_emp1' tables. The hash index should:
Hash Indexes
First, let's review a little bit about hash indexes as seen in a previous section.
CREATE HASH INDEX hash_2
(employee_number, department_number) ON emp1
BY (department_number)
ORDER BY HASH (department_number);
Hash indexes, by definition, are defined on a single table. This hash index 'hash_2'
is distributed on the department_number hash value, making it useful for joins on
department_number.
Join Backs
Note that the following query, which additionally selects the job_code column, is
also able to use the HI. This is due to the availability of the ROWID which is
implicitly included in all hash indexes. The implicit ROWID allows the optimizer to 'join
back' to the base employee row to pick up additional information (i.e., job_code), not
available in the HI itself.
SELECT e.employee_number
, d.department_name
, e.job_code
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match
scan, with no residual conditions, which is joined to TD000.hash_2.
TD000.d and TD000.hash_2 are joined using a merge join, with a join
condition of ("TD000.hash_2.department_number =
TD000.d.department_number"). The result goes into Spool 2, which is
duplicated on all AMPs. The size of Spool 2 is estimated with low
confidence to be 64 rows. The estimated time for this step is 0.18
seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are
Step 5 joins the HI to the department table, and step 6 uses the implicit ROWID in the HI
to locate the base row for each employee to extract its job code. We consider the HI to
partially cover the query because it still requires information from another table.
A limitation of HI's is that they are by definition single table indexes.
Now create a second join index which includes the ROWID for the employee table.
CREATE JOIN INDEX ji_emp_dept2 AS
SELECT e.employee_number
, d.department_name
, e.ROWID
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Now in addition to selecting the employee number and department name, let's add
the job code to the select list. This column exists on the employee table and is not
an explicit part of the join index.
SELECT e.employee_number, d.department_name, e.job_code
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
The optimizer may choose to cover this query using the join index to acquire the
employee number and the department name, and it may use the rowid of the
employee table to 'join back' to the base employee row to acquire the job code.
Thus, it works similarly to a hash index, however the join index has the added
property of being defined on multiple tables. This provides the opportunity to 'join
back' to either table, assuming both ROWID's are specified in the join index
definition.
It is still considered a 'partial covering' index because it still has to join back to the
employee table to fully resolve the query, but it does not have to scan the entire
employee table.
The join back capability provided by the ROWID syntax is typically chosen by the
optimizer when the number of rows in the table is fairly large. Smaller table joins
may not demonstrate this approach.
Example: select each employee's number along with the department name and the
department manager's employee number.
SELECT e.employee_number
, d.department_name
, d.manager_employee_number
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
Note, to use this option, aliases must be assigned to each rowid column selected in
the join index definition. If the columns are not 'renamed' using an alias, the
syntaxer will not allow more than one column named 'ROWID' in the same query.
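A hypothetical definition along those lines, with each rowid renamed so both can be referenced in the same query (the index name and aliases are illustrative):

```sql
-- Hypothetical: both rowids aliased so the optimizer can 'join back' to either table
CREATE JOIN INDEX ji_emp_dept3 AS
SELECT e.employee_number
      ,d.department_name
      ,e.ROWID AS emp_rowid
      ,d.ROWID AS dept_rowid
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;
```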
Sparse join indexes are useful for situations such as:
Tables with lots of nulls which are ignored for purposes of most queries
Tables with frequent access for rows which contain quantities above or below
a certain limit.
Tables which are time oriented and where the most frequent accesses are for
current information.
Example
Create a non-sparse join index between the customers and orders tables.
CREATE JOIN INDEX cust_ord_ix
AS SELECT (c.cust_id, cust_name)
,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id;
Count(order_id)
---------------
             86
This query represents a simple join of the two tables to produce an aggregation.
The defined join index is able to resolve this query as seen in the extract of the
EXPLAIN seen here.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results are computed
globally, then placed in Spool 4. The size of Spool
4 is estimated with high confidence to be 1 row. The estimated time for this
step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an
   all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with high confidence to be 1
row. The estimated time for this step is 0.03
seconds.
Now, let's try another query, this time restricting the time interval.
How many valid customers have assigned orders in January 2009?
SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');
Once again, we see that the join index is able to cover the query. The number of
participating rows, however, is expected to be much smaller.
3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of
an all-rows scan with a condition of (
"(SQL00.cust_ord_ix.order_date <= DATE '2009-01-31') AND
(SQL00.cust_ord_ix.order_date >= DATE '2009-01-01')"), and the
grouping identifier in field 1. Aggregate Intermediate Results are computed
globally, then placed in Spool 4. The size of Spool
4 is estimated with high confidence to be 1 row. The estimated time for this
step is 0.06 seconds.
Since we anticipate that most of the queries against this table will involve rows
from the year 2009, we may wish to create a sparse index with only those rows
represented in the index.
The following creates a new sparse join index which only includes rows for the year
2009.
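The CREATE statement itself did not survive extraction. The following is a sketch only: the index name cust_ord_ix_2009 and the EXTRACT condition are taken from the EXPLAIN output shown for this index, while the column layout is assumed to mirror the non-sparse cust_ord_ix defined earlier.

```sql
-- Sparse version of cust_ord_ix: only 2009 order rows are stored in the index
CREATE JOIN INDEX cust_ord_ix_2009
AS SELECT (c.cust_id, cust_name)
         ,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2009;
```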
SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');
Count(order_id)
---------------
             86
3) We do an all-AMPs SUM step to aggregate from
   SQL00.cust_ord_ix_2009 by way of an all-rows scan with a condition
   of ("(((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))<= 2009)
   AND ((EXTRACT(MONTH FROM (SQL00.cust_ord_ix_2009.order_date )))<= 1)) AND
   ((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))>= 2009)"), and
   the Aggregate Intermediate Results are computed globally, then placed in
   Spool 4. The size of Spool 4 is estimated with high confidence to be 1
   row. The estimated time for this step is 0.08 seconds.
Any query for the year 2009 which is covered by the sparse index will be optimized
to use the sparse index instead of the base tables.
Example
[Result listing not recoverable from the source formatting: approximately
fourteen rows showing cust_name, order_id, and an order date for customer
ids 1003 through 1009 (GHI, JKL, MNO, PQR, STU, VWX, and YZA Corp), with
order ids between 614 and 649 and order dates in February and March 2009.]
The following is an EXPLAIN of this query. Note the use of the sparse index.
Explanation
-----------------------------------------------------------------------
1) First, we lock a distinct SQL00."pseudo table" for read
on a RowHash to prevent global deadlock for
SQL00.CUST_ORD_IX_2009.
2) Next, we lock SQL00.CUST_ORD_IX_2009 for read.
3) We do an all-AMPs RETRIEVE step from
SQL00.CUST_ORD_IX_2009 by way of an all-rows scan with a
condition of (
"(SQL00.CUST_ORD_IX_2009.cust_id > 600) AND (((EXTRACT(YEAR
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2009) AND
(((EXTRACT(MONTH
FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2) OR
((EXTRACT(MONTH FROM
(SQL00.CUST_ORD_IX_2009.order_date )))=3 )))") into Spool 1
(group_amps), which is built locally on the AMPs. The size
of Spool 1 is estimated with no confidence to be 1 row. The
estimated time for this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs
involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the
result of statement 1. The total estimated time is 0.06
seconds.
Sparse Join Indexes are a type of Join Index which contains a WHERE clause that
reduces the number of rows which would otherwise be included in the index. All
types of join indexes, including single-table, multi-table, simple or aggregate can be
sparse.
A sparse index makes sense when a definable subset of the rows in the join index
is needed to satisfy a large percentage of the queries which will use it.
By default, any join index, including sparse join index, has a NUPI on the first
column specified. You can explicitly define other columns to be the primary index.
Any combination of AND, OR, and IN conditions may be applied to the sparse index
WHERE clause.
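As an illustration of both points above, here is a sketch of a sparse index with a compound WHERE clause and an explicitly chosen primary index. The index name, status codes, and column list are hypothetical; only the orders table and its columns come from the surrounding examples:

```sql
-- Hypothetical sparse index: open ('O') or pending ('P') 2009 orders,
-- with an explicit NUPI on cust_id instead of the default first-column NUPI
CREATE JOIN INDEX open_ord_2009_ix AS
SELECT order_id, cust_id, order_status, order_date
FROM orders
WHERE order_status IN ('O', 'P')
  AND EXTRACT(YEAR FROM order_date) = 2009
PRIMARY INDEX (cust_id);
```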
Better maintenance performance since not all changes to the base table will
affect the sparse index.
They require additional space and maintenance resources over and above
the base table requirements.
Summary
A Sparse Join Index is a Join Index defined with a WHERE clause
which limits the base table rows that will be reflected in the join index. This permits
a join index to be built only for the rows which are most frequently accessed by
queries, such as current year or current month. The size of the join index is thereby
smaller and the maintenance costs are subsequently less. The optimizer will decide
if it can use a sparse index to reduce the costs associated with a given query.
Lab
Try It!
1a.) Create and populate the following two tables in your database,
then run the UPDATE statement.
CREATE TABLE orders AS Student.orders WITH DATA;
CREATE TABLE customers AS Student.customers WITH DATA;
UPDATE orders
SET order_date = order_date + INTERVAL '10' YEAR;
2.) Create a query that returns a count of all open orders held by
valid customers.