You are on page 1of 55

WG3

Database Languages
Stephen Cannan
Convenor

2002-05-06
Agenda

• Working Group changes


• ISO/IEC 9075
• ISO/IEC 20606
• Recap of recent changes
• Sketch of planned changes

2 S J Cannan
Working Group changes

• 3 ex-WG5 subprojects absorbed


– 20606-1 Authorization and Audit
– 20606-2 Distribution Schema
– 20606-3 Encompassing Transaction
• Effect on WG3 internal structure

3 S J Cannan
ISO/IEC 9075
• Multi-part standard
– Part 1: SQL/Framework
– Part 2: SQL/Foundation
– Part 3: SQL/CLI
– Part 4: SQL/PSM
– Part 9: SQL/MED
– Part 10: SQL/OLB
– Part 11: SQL/Schemata
– Part 13: SQL/JRT
– Part 14: SQL/XML

4 S J Cannan
Part 1: SQL/Framework

• Common definitions & concepts


• Structure of multi-part standard
• Basic conformance structure & statement
• About 85 pages

5 S J Cannan
Part 2: SQL/Foundation

• The “core” of the standard


• Includes:
– Traditional SQL
– Object-oriented SQL
– Module Language
– Host language bindings (except Java)
– Dynamic SQL
– Direct SQL
• Excludes:
– Information Schema & Definition Schema (in Part 11)
• About 1300 pages
6 S J Cannan
Part 3: SQL/CLI

• Call-Level Interface
• Best-known implementation: ODBC
• About 400 pages

7 S J Cannan
Part 4: SQL/PSM

• Persistent Stored Modules


• PSM-96 specified:
– functions & procedures
– SQL-server modules
– computational completeness
• In PSM:1999, functions & procedures moved to Foundation
(same in 2003)
• Analogous to PL/SQL, Transact-SQL, etc.
• About 170 pages

8 S J Cannan
Part 9: SQL/MED

• Management of External Data


• Foreign Servers, Foreign-Data Wrappers,
Foreign Tables
• SQL-aware vs non-SQL-aware
• Datalinks
• Merge with 20606-2 Distribution Schema
• About 500 pages

9 S J Cannan
Part 10: SQL/OLB

• Object Language Bindings


• Embedded SQL in Java
• SQLJ Part 0
• About 360 pages

10 S J Cannan
Part 11: SQL/Schemata

• Information Schema
• Definition Schema
• About 300 pages

11 S J Cannan
Part 13: SQL/JRT

• Java Routines and Types


• Java routines stored in an SQL database, and
invoked from SQL statements
• Java classes used as data types of SQL
columns
• About 200 pages

12 S J Cannan
Part 14: SQL/XML

• Under construction
• Using SQL and XML together
• About 150 pages, but still growing

13 S J Cannan
ISO/IEC 20206

• 20606-1 Authorization and Audit


• 20606-3 Encompassing Transaction

14 S J Cannan
Project 20206 –1
Authorization and Audit
• Addresses issues of:
– Authentication
– Access Control
– Transfer Integrity
– Transfer Confidentiality
– Non-repudiation

• No Working Draft available


• CD expected 2002-06-01
• John Hadjioannou listed as Editor but not present

15 S J Cannan
Project 20206 –3
Encompassing Transaction
• Interacts with external Transaction Managers
• Support for xa_ interface
– 9075-6 XA Specialization (SQL/Transaction)
• Propagates Global Context information

• No Working Draft available


• CD expected 2002-06-01
• Bob Sunday listed as Editor but has withdrawn

16 S J Cannan
Recap of recent changes:
ISO/IEC 9075 SQL:2003
• Since 1999 we have already added:
– Part 9 SQL/MED
– Part 10 SQL/OLB
– Part 13 SQL/JRT
– Addendum 1 SQL/OLAP
• SQL:2003 will be the 4th generation
– 1987, 1992, 1999
• Consists of:
– Many error corrections
(TC stands at 376 pages and growing)
– A few new features
– Part 14 SQL/XML

17 S J Cannan
New Features

• Data types
• Predicates
• Semantics
• Security
• Active Database

18 S J Cannan
New Data Types

• BIGINT
• MULTISET

19 S J Cannan
New Data Types — BIGINT

• Exact numeric, scale 0


• Precision of BIGINT  precision of INTEGER
• Must have same radix as SMALLINT and
INTEGER

20 S J Cannan
New Data Types — MULTISET

• Varying-length, unordered collections of element having specified type


• COL1 INTEGER MULTISET
• COL2 ROW( F1 BIGINT,
F2 VARHCAR(4000) )
MULTISET
• No (specified) maximum cardinality
• SQL:2003 collection types include:
– MULTISET
– ARRAY

21 S J Cannan
New Data Types — MULTISET

• INTEGER MULTISET()
– Empty multiset with integer element type (not null!)
• INTEGER MULTISET(2, 3, 5, 7)
– Integer multiset with first few primes
• INTEGER MULTISET(SELECT COL1
FROM TBL1
WHERE COL2 > 10)
– Integer multiset populated from values in column of table

22 S J Cannan
New Data Types — MULTISET

• CARDINALITY (value1)
– Type of value1 must be multiset
– Returns number of elements in value
• SET (value1)
– Type of value1 must be multiset
– Returns value1 with duplicate elements removed
• ELEMENT (value1)
– Type of value1 must be multiset
– Cardinality of value1 must be 1
– Returns the single element in value1

23 S J Cannan
New Data Types — MULTISET
• UNNEST(value1) AS corr-name
– Type of value1 must be multiset
– “Un-nests” value1 and turns the elements
into rows of a virtual table
– UNNEST MULTISET (2, 3, 5, 7) AS P

P
2
3
5
7
24 S J Cannan
New Data Types — MULTISET
• value1 MULTISET setop
quantifier value2
– setop — UNION or EXCEPT or INTERSECT
– quantifier — ALL or DISTINCT
• SELECT col1
MULTISET INTERSECT DISTINCT col2
FROM tbl1
WHERE CARDINALITY(col2) > 50
• Close analogs to ordinary set operators
UNION, EXCEPT, and INTERSECT
• However, ALL is the default quantifier

25 S J Cannan
New Data Types — MULTISET
• New aggregates:
– COLLECT — Transform the values in a group into a multiset
– FUSION — Form a union of the multisets in a group —
number of duplicates of a given value in the result is the
sum of the number of duplicates in the multisets in the rows
of the group
– INTERSECTION — Form an intersection of the multisets in
a group — number of duplicates of a given value in the
result is the minimum of the number of duplicates in the
multisets in the rows of the group

26 S J Cannan
New Predicates
• NORMALIZED
– If character string argument is in Unicode Normalization Form
C (NFC), returns true
• MEMBER
– If first argument is a member of multiset in second argument,
returns true
• SUBMULTISET
– If first argument is a submultiset of second argument, returns
true
• SET
– If argument is a set (no duplicate values), returns true

27 S J Cannan
NORMALIZED Predicate

• string_value IS [ NOT ] NORMALIZED


• Character repertoire of string_value must
be UCS
• Encourages, but does not force, normalization
to NFC

28 S J Cannan
MEMBER Predicate
• value [ NOT ] MEMBER [ OF ]
multiset
• multiset must be a multiset
• Type of value comparable to element type of
multiset
• If multiset is empty, returns false
• If value is equal to some element of multiset,
returns true
• Else, if some element of multiset is null, returns
unknown

29 S J Cannan
SUBMULTISET Predicate
• multiset1 [ NOT ] SUBMULTISET
[ OF ] multiset2
• Both multiset1 and multiset2 must be
multisets…
• …and their element types must be comparable
• If the cardinalities of multiset1 and
multiset2 are equal and if every value in
multiset1 has a corresponding value in
multiset2, then returns true

30 S J Cannan
SET Predicate

• multiset IS [ NOT ] A SET


• multiset must be a multiset
• If there are no duplicate values in multiset,
returns true
• Maximum of 1 null value in a set

31 S J Cannan
New Semantics

• MERGE statement
• OLAP: TABLESAMPLE
• Generated columns
• Identity columns and sequence generators

32 S J Cannan
New Semantics — MERGE

• If some row in the target table matches


some row in the source table according to
the specified predicate, then the row in the
subject table is updated
• If no row in the target table matches a given
row in the source table according to the
specified predicate, then the row from the
source table is inserted into the subject
table

33 S J Cannan
New Semantics — MERGE
• MERGE INTO table-name [ AS correlation ]
USING table-reference
ON search-condition
WHEN MATCHED THEN SET col = value
• MERGE INTO table-name [ AS correlation ]
USING table-reference
ON search-condition
WHEN NOT MATCHED THEN
INSERT [ ( col-list ) ]
VALUES ( val-list )
• Both MATCHED and NOT MATCHED clauses permitted
once each, in either order

34 S J Cannan
TABLESAMPLE

• New feature in the OLAP capability


• Permits evaluation of aggregates on samples
derived from database data
• Permits faster debugging when database is huge
• Two forms of sampling: BERNOULLI and
SYSTEM

35 S J Cannan
TABLESAMPLE

• TABLESAMPLE method ( percentage )


[ repeatable ]
• method is either BERNOULLI or SYSTEM
– Bernoulli: Sample table contains approximately percentage %
of the rows of the original table; the probability of a given row of
the original table appearing in the sample table is percentage
%, independently of every other row
– System: Sample table contains approximately percentage %
of the rows of the original table; the probability of a given row of
the original table appearing in the sample table is percentage
%
• repeatable acts as a random number seed for
sampling
36 S J Cannan
Generated Columns

• Ordinary columns of base tables: base columns


• Generated column: value computed from values of zero
or more base columns in same row
• CREATE TABLE EMPLOYEES (
EMP_ID INTEGER,
SALARY DECIMAL(7,2),
BONUS DECIMAL(7,2),
TOTAL_COMP GENERATED ALWAYS AS
( SALARY + BONUS ),
HR_CLERK GENERATED ALWAYS AS
( CURRENT_USER )
)

37 S J Cannan
Identity columns &
Sequence generators
• Identity column: Invented as mechanism to
allow automatic population of table keys
• Sequence generator: Invented to allow
generation of “next” value in a sequence
• Together, they allow generation of keys for
unique columns (or combinations of columns)
when required
• Application chooses: select value for identity
column or let sequence generator make choice

38 S J Cannan
Sequence generators

• Sequence generators
– Data type (exact numeric with scale zero)
– Starting value
– Increment (positive: ascending, negative:
descending)
– Minimum and maximum values
– Cycle option
– External (explicit schema object) or internal (part of
another schema object, such a column)

39 S J Cannan
External sequence generators

• CREATE SEQUENCE seqname AS type

}
START WITH value Common
INCREMENT BY value sequence
MAXVALUE value generator options
CYCLE
• Variations:
– NO CYCLE
– NO MAXVALUE, MINVALUE, NO MINVALUE
– Order of clauses can vary

40 S J Cannan
Sequence generators

• Every sequence generator has a “current base


value” — initially set to the start value
• Generate next value of a sequence generator:
NEXT VALUE FOR seqname
• Returns current base value + N * increment for
some N  0
– If computed value > MAXVALUE (or < MINVALUE)
and NO CYCLE, then raise an exception
– Otherwise, reset to MINVALUE (or MAXVALUE) and
compute new value for some N.

41 S J Cannan
Sequence generators

• INSERT INTO TBL ( COL1, COL2 )


VALUES ( 10,
NEXT VALUE FOR seqgen )
• CALL myproc ( NEXT VALUE FOR
seqgen )
• SET J = J + NEXT VALUE FOR seqgen

42 S J Cannan
Sequence generators

• ALTER SEQUENCE seqname


– RESTART WITH newbase — resets start of
computation
– Common sequence generator options — change
start value, maximum or minimum, increment, cycle
option
• DROP SEQUENCE seqname dropoption

43 S J Cannan
Identity columns &
Sequence generators
• Base tables may optionally have a single identity
column
• Data type: exact numeric, scale 0 — e.g.,
INTEGER
• Associated with internal sequence generator
– Start value
– Minimum and/or maximum value
– Increment value
– Cycle option
44 S J Cannan
Identity columns &
Sequence generators
• GENERATED ALWAYS or GENERATED BY
DEFAULT
– ALWAYS — Not allowed to UPDATE column;
INSERT requires OVERRIDING SYSTEM VALUE
– BY DEFAULT — Allowed to INSERT or UPDATE
column; if column not specified on INSERT, then
value automatically generated

45 S J Cannan
Identity columns &
Sequence generators
• CREATE TABLE employees (
EMP_ID INTEGER
GENERATED ALWAYS AS IDENTITY
START WITH 100
INCREMENT 1
MINVALUE 10
NO MAXVALUE
NO CYCLE,
SALARY DECIMAL(7,2), ...,
)
46 S J Cannan
New part – SQL/XML

• Out for FCD ballot


• Known not to be complete
• Current contents comprise infrastructure —
mappings of…
– SQL identifiers to & from XML QNames
– SQL types & values onto XML schema types &
values
– SQL tables onto XML schema types
– Publishing functions (SQL data expressed as XML)
– New built-in type for SQL: XML
47 S J Cannan
To be added:
Inserting XML into an SQL database
• Transform character string containing XML text into an
XML value
• INSERT INTO employees ( empxml )
VALUES XMLPARSE ( '<emp id="15339">
<name>Gurney Halleck</name>
<hiredate>3998-04-25</hiredate>
<dept>Security</dept></emp>'
STRIP WHITESPACE )
• No concrete proposal yet (still working out issues)

48 S J Cannan
To be added:
Inserting XML into an SQL database

• XMLPARSE does not validate


• But it does:
– Syntax check all in-line DTDs
– Process entity references defined in such DTDs
– Apply default values defined in such DTDs
• Whitespace handling not finalized:
– STRIP and
– PRESERVE are possible choices

49 S J Cannan
To be added:
Validating XML in an SQL context

• Validate an XML value against an XML schema


(possibly a DTD)
• INSERT INTO employees ( empxml )
VALUES XMLVALIDATE ( XMLPARSE ( '<emp
id="15339"><name>Gurney
Halleck</name><hiredate>3998-04-25
</hiredate><dept>Security</dept>
</emp>' STRIP WHITESPACE ) )
• No concrete proposal yet (still working out issues)

50 S J Cannan
To be added:
Validating XML in an SQL context

• Many remaining problems:


– How to reference DTDs and schemas
– Security problems of referencing arbitrary DTDs and
schemas
– Namespace considerations
• xsi:schemaLocation and
xsi:noNameSpaceSchemaLocation might
identify schemas outside the database — a
solution might involve caching schemas inside
the database

51 S J Cannan
To be added:
Retrieving XML from an SQL database

• Transform an XML value into a character string


containing XML text
• SELECT XMLSERIALIZE ( e.empxml )
INTO :hostvar
FROM emp AS e
WHERE ...
• No concrete proposal yet (still working out issues)

52 S J Cannan
To be added:
Retrieving XML from an SQL database
• Numerous problems not yet resolved:
– Precise SQL type of the function result
(CHARACTER, CHARACTER VARYING,
CHARACTER LARGE OBJECT)
– Actual character encoding (UTF8, UTF16, other) —
and SQL’s specified character set
– Specified character encoding in prologue
– DTD or schema inference? Creation?

53 S J Cannan
Summary
• ISO/IEC 9075
– Consolidation of changes since 1999
– Correction of errors
– New Part (SQL/XML)

• ISO/IEC 20206
– Area of concern
– Looking for:
– National Body participation
– Editor(s)

54 S J Cannan
Questions?

55 S J Cannan

You might also like