About me
Sr Manager of DB engineering at PayPal
Using Oracle since 1998
http://sai-oracle.blogspot.com
Scope
No OEM demos
Tested on Oracle 10.2 and 11.2
Some of the observations here can be incorrect or may
Introduction
Active session history (ASH) introduced in 10g
Needs a Diagnostics Pack license
Integrated with the database kernel and AWR
V$session
Single most important performance view
Active session count is the top KPI
V$session_wait is included in v$session
v$session (contd)
Fixed_table_sequence is the most underused column
Max(fixed_table_sequence) is the total number of db calls made
Increasing values for a session indicate activity
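The idea that a growing fixed_table_sequence marks an active session can be sketched by comparing two snapshots. This is only an illustration on made-up data: the function name, the snapshot dictionaries, and the session ids are hypothetical, and in practice the values would come from querying v$session twice.

```python
def sessions_with_activity(before, after):
    """Return the sids whose fixed_table_sequence grew between two snapshots.

    Both arguments map sid -> fixed_table_sequence (hypothetical layout).
    Sessions missing from the first snapshot are skipped, since there is
    nothing to compare against.
    """
    return sorted(
        sid for sid, seq in after.items()
        if sid in before and seq > before[sid]
    )


# Made-up snapshots taken a few seconds apart.
snap1 = {101: 5000, 102: 7311, 103: 42}
snap2 = {101: 5012, 102: 7311, 103: 57, 104: 3}
print(sessions_with_activity(snap1, snap2))
```

Session 102 is left out because its value did not move between the two snapshots.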
ASH overview
Snapshot of active sessions at one-second intervals
Mostly the same information as in v$session
Maintains a circular buffer of up to 254MB
ASH architecture
MMNL process captures a snapshot of active sessions at 1 second intervals
ASH is a join of x$kewash and x$ash
X$kewash has one record for every snapshot
X$kewash also has sample count and length
X$kewash is used to locate the snapshot address in x$ash
Is_awr_sample column is indexed
x$ash
ASH is indexed on sample_id and is_awr_sample columns
ASH buffer size is typically 2M per CPU but no more than 254M or 5% of SGA
An ASH sample record can be self-inconsistent due to lack of read consistency in the underlying x$ (v$) fixed tables
New records overwrite older records in the ASH circular buffer
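The sizing rule and the overwrite behaviour can be illustrated with a small sketch. It assumes the rule means "the smallest of 2M per CPU, 254M, and 5% of SGA", which is one reading of the bullet above and not a confirmed formula. The function name is hypothetical, and a Python deque only stands in for the real buffer.

```python
from collections import deque

MB = 1024 * 1024


def ash_buffer_bytes(cpu_count, sga_bytes):
    """Estimate the ASH buffer size under the assumed 'smallest limit wins' rule."""
    return min(2 * MB * cpu_count, 254 * MB, sga_bytes // 20)  # // 20 is 5%


# 16 CPUs with an 8 GB SGA: the per-CPU figure is the smallest limit.
print(ash_buffer_bytes(16, 8 * 1024 * MB) // MB)

# Circular buffer: once full, each new record pushes out the oldest one.
ring = deque(maxlen=3)
for sample_id in range(1, 6):
    ring.append(sample_id)
print(list(ring))
```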
AWR
Each AWR snapshot flushes ASH into DBA_HIST_ACTIVE_SESS_HISTORY
Only one out of every 10 ASH samples is preserved
ASH emergency flush when the buffer is 2/3 full
ASH snapshots in DBA_HIST_ASH_SNAPSHOT
Ashrpt.sql script provided under rdbms/admin
ADDM relies on ASH data
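Since in-memory samples are about one second apart and only one in ten reaches AWR, a row count can be turned into a rough estimate of active session time. The sketch below is an approximation under those two figures; the function and parameter names are hypothetical.

```python
def estimated_active_seconds(sample_rows, sampling_interval_s=1, disk_filter_ratio=1):
    """Rough active time represented by a number of ASH rows.

    Each in-memory row stands for about one sampling interval.
    Each row kept in AWR stands for disk_filter_ratio intervals.
    """
    return sample_rows * sampling_interval_s * disk_filter_ratio


# 360 rows read from the AWR history, with 1 in 10 samples kept.
print(estimated_active_seconds(360, disk_filter_ratio=10))
# 360 rows read from the in-memory buffer.
print(estimated_active_seconds(360))
```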
ASH parameters
_ash_enable to enable/disable ASH
_ash_disk_write_enable for AWR writes
_ash_disk_filter_ratio for sample records to write
data (can only be accessed after parameter set to true)
Set _ash_disk_filter_ratio to 1 during any database incident
Don't change _ash_sample_all, as inactive session data is not relevant and it generates more data
Set _ash_sampling_interval to lower values (10 ms) during RAT or snapshot standby testing (not a dynamic parameter, requires instance restart)
Sample data
Sample id and time
Session details
Sql details
Pl/sql details
Wait event information
Blocking session details
Client details
Time model information
Session details
IS_AWR_SAMPLE          VARCHAR2(1)
SESSION_ID             NUMBER
SESSION_SERIAL#        NUMBER
SESSION_TYPE           VARCHAR2(10)
USER_ID                NUMBER
SESSION_STATE          VARCHAR2(7)
XID                    RAW(8)
REMOTE_INSTANCE#       NUMBER
PGA_ALLOCATED          NUMBER
TEMP_SPACE_ALLOCATED   NUMBER
TOP_LEVEL_CALL#        NUMBER
TOP_LEVEL_CALL_NAME    VARCHAR2(64)
comparing session data across samples
XID indicates whether the session is in a transaction
Percentage of sampled sqls not having an XID indicates purely read-only traffic (can be a target for Active DG)
Pga_allocated and temp_space_allocated are helpful in determining big sort operations
Top_level_call_name indicates db calls like exec, fetch, commit, rollback, etc.
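The read-only percentage mentioned above is a simple ratio over the samples. A minimal sketch on made-up rows follows; the dictionary layout and the function name are hypothetical, and a real check would read the XID column from ASH.

```python
def read_only_pct(samples):
    """Percentage of samples with no XID, i.e. not inside a transaction."""
    if not samples:
        return 0.0
    no_xid = sum(1 for s in samples if s["xid"] is None)
    return 100.0 * no_xid / len(samples)


# Made-up samples: only one of the four carries a transaction id.
rows = [
    {"sql_id": "a1", "xid": None},
    {"sql_id": "b2", "xid": "0A000B00C1D2E3F4"},
    {"sql_id": "a1", "xid": None},
    {"sql_id": "c3", "xid": None},
]
print(read_only_pct(rows))
```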
Sql details
SQL_ID                 VARCHAR2(13)
IS_SQLID_CURRENT       VARCHAR2(1)
SQL_CHILD_NUMBER       NUMBER
SQL_OPCODE             NUMBER
SQL_OPNAME             VARCHAR2(64)
TOP_LEVEL_SQL_ID       VARCHAR2(13)
TOP_LEVEL_SQL_OPCODE   NUMBER
SQL_PLAN_HASH_VALUE    NUMBER
SQL_PLAN_LINE_ID       NUMBER
SQL_PLAN_OPERATION     VARCHAR2(30)
SQL_PLAN_OPTIONS       VARCHAR2(30)
SQL_EXEC_ID            NUMBER
SQL_EXEC_START         DATE
instance
Same sql_exec_id across snapshots indicates the session is executing the same sql
Sql plan details in ASH help determine expensive sqls
Execution state detail in columns (in_bind, in_parse, in_sql_execution, in_sequence_load)
Top_level_sql_id is helpful for pl/sql or recursive sql
Plsql details
PLSQL_ENTRY_OBJECT_ID      NUMBER
PLSQL_ENTRY_SUBPROGRAM_ID  NUMBER
PLSQL_OBJECT_ID            NUMBER
PLSQL_SUBPROGRAM_ID        NUMBER
IN_PLSQL_EXECUTION         VARCHAR2(1)
IN_PLSQL_RPC               VARCHAR2(1)
IN_PLSQL_COMPILATION       VARCHAR2(1)
in the pl/sql procedure
Plsql_object_id is the currently executing procedure
Query dba_procedures to map subprogram_id to subprogram name
Demo
Waitevents
EVENT           VARCHAR2(64)
EVENT_ID        NUMBER
EVENT#          NUMBER
SEQ#            NUMBER
P1TEXT          VARCHAR2(64)
P1              NUMBER
P2TEXT          VARCHAR2(64)
P2              NUMBER
P3TEXT          VARCHAR2(64)
P3              NUMBER
WAIT_CLASS      VARCHAR2(64)
WAIT_CLASS_ID   NUMBER
WAIT_TIME       NUMBER
TIME_WAITED     NUMBER
Waitevents (contd)
Event is null in ASH (unlike in v$session) when the session is on CPU
P1, p2, p3 are populated even when the session is on CPU (for the previous event)
Wait class is a higher-level dimension to group data
Wait_time is irrelevant in ASH (unlike in v$session)
Seq# is the sequence number of each wait event in a given session
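Because event is null while a session is on CPU, a top-activity breakdown has to label those samples itself. A minimal sketch on made-up samples; the dictionary layout and the function name are hypothetical.

```python
from collections import Counter


def top_activity(samples):
    """Count samples per event, labelling null events as ON CPU."""
    return Counter(s["event"] or "ON CPU" for s in samples).most_common()


# Made-up samples from a short interval.
rows = [
    {"event": None},
    {"event": "db file sequential read"},
    {"event": None},
    {"event": "log file sync"},
    {"event": "db file sequential read"},
    {"event": None},
]
for event, count in top_activity(rows):
    print(event, count)
```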
Waitevents (contd)
Seq# rolls over after reaching 64K in a given session
Sequential samples with the same seq# indicate the session is in the same wait event (extremely useful for RCA)
Event is not written to the AWR table wrh$_active_session_history (event_id is enough), to save space
Event_id and event# are synonymous (event_id is usually the same across DB versions)
All the wait time metrics are reported in microseconds
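Grouping consecutive samples by seq# shows how long a session stayed in one wait. The sketch below works on made-up tuples for a single session that is waiting in every sample; the tuple layout and the function name are hypothetical.

```python
from itertools import groupby


def wait_runs(samples):
    """Collapse ordered (sample_id, seq, event) tuples into runs.

    Returns one (seq, event, sample_count) tuple per run of consecutive
    samples sharing the same seq# and event.
    """
    runs = []
    for (seq, event), group in groupby(samples, key=lambda s: (s[1], s[2])):
        runs.append((seq, event, len(list(group))))
    return runs


# Made-up samples for one session, ordered by sample_id.
rows = [
    (1, 40, "enq: TX - row lock contention"),
    (2, 40, "enq: TX - row lock contention"),
    (3, 40, "enq: TX - row lock contention"),
    (4, 41, "log file sync"),
]
print(wait_runs(rows))
```

With one-second sampling, a run of three samples suggests the session was stuck in that single wait for roughly three seconds.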
Time_waited
Only populated when the session is done waiting on a given event; set to zero for the remaining previous samples
When ASH realizes that a session is done waiting for a given event, it goes back and updates time_waited in the most recent sample for that wait event with the same seq#
Time_waited is not populated if the session exits before ASH can capture it for the previous sample
Query max and avg(time_waited) for the samples where it is set
Demo
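Since most samples carry a zero time_waited, the max and average are only meaningful over the samples where it is set. A minimal sketch on made-up microsecond values; the function name is hypothetical.

```python
def time_waited_stats(time_waited_values):
    """Return (max, avg) over the non-zero time_waited values, or None."""
    populated = [t for t in time_waited_values if t > 0]
    if not populated:
        return None
    return max(populated), sum(populated) / len(populated)


# Made-up time_waited values in microseconds. Zeros are samples taken
# while the wait was still in progress.
values = [0, 0, 2_500_000, 0, 1_500]
print(time_waited_stats(values))
```

Averaging over all five samples instead would pull the figure down, because the zeros are placeholders and not real waits.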
Blocking session
BLOCKING_SESSION_STATUS    VARCHAR2(11)
BLOCKING_SESSION           NUMBER
BLOCKING_SESSION_SERIAL#   NUMBER
BLOCKING_INST_ID           NUMBER
BLOCKING_HANGCHAIN_INFO    VARCHAR2(1)
CURRENT_OBJ#               NUMBER
CURRENT_FILE#              NUMBER
CURRENT_BLOCK#             NUMBER
CURRENT_ROW#               NUMBER
busy waits, latch contention, library cache mutex contention, etc. (more useful than v$lock)
Current_obj#/file/block/row can be used to determine row lock contention
Current_obj#/file/block is populated for any I/O related wait event
Blocking_hangchain_info indicates multiple levels of lock contention
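Multiple levels of lock contention can be followed by walking blocking_session from a waiter up to the session at the head of the chain. The sketch below uses a made-up mapping within a single sample and ignores RAC instance ids; the function name and data are hypothetical.

```python
def root_blocker(sid, blocked_by):
    """Follow sid -> blocking session links until a session that is not blocked.

    blocked_by maps a waiting sid to the sid blocking it. The seen set
    stops the walk if the links form a cycle.
    """
    seen = set()
    while sid in blocked_by and sid not in seen:
        seen.add(sid)
        sid = blocked_by[sid]
    return sid


# Made-up chain: 30 waits on 20, which waits on 10.
chain = {30: 20, 20: 10}
print(root_blocker(30, chain))
```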
Client details
SERVICE_HASH   NUMBER
PROGRAM        VARCHAR2(48)
MODULE         VARCHAR2(64)
ACTION         VARCHAR2(64)
CLIENT_ID      VARCHAR2(64)
MACHINE        VARCHAR2(64)
PORT           NUMBER
ECID           VARCHAR2(64)
dimensions
Finding resource utilization by program, module, service, etc.
Useful for troubleshooting network related issues
Having this data in awr base tables would give more insight into changes being made to the middle tier
Time model
TM_DELTA_TIME                NUMBER
TM_DELTA_CPU_TIME            NUMBER
TM_DELTA_DB_TIME             NUMBER
DELTA_TIME                   NUMBER
DELTA_READ_IO_REQUESTS       NUMBER
DELTA_WRITE_IO_REQUESTS      NUMBER
DELTA_READ_IO_BYTES          NUMBER
DELTA_WRITE_IO_BYTES         NUMBER
DELTA_INTERCONNECT_IO_BYTE   NUMBER
tm_delta_time
Cpu time + db time can be greater than tm_delta_time
tm_delta_time is reported since the last time the session was sampled
Useful for finding top cpu/db time sessions in a given interval
Demo
reported over delta_time
Delta_time is different from tm_delta_time
Delta_time is reported since the last time the session was sampled
Useful for finding top I/O sessions during any interval
Demo
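Finding the top I/O sessions for an interval comes down to summing the delta columns per session. A minimal sketch on made-up rows; the dictionary layout and the function name are hypothetical, and the keys only mirror the ASH column names.

```python
from collections import defaultdict


def top_io_sessions(samples, n=3):
    """Sum read + write bytes per session and return the n largest."""
    totals = defaultdict(int)
    for s in samples:
        totals[s["session_id"]] += s["delta_read_io_bytes"] + s["delta_write_io_bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up samples inside the interval of interest.
rows = [
    {"session_id": 11, "delta_read_io_bytes": 8192, "delta_write_io_bytes": 0},
    {"session_id": 12, "delta_read_io_bytes": 1048576, "delta_write_io_bytes": 65536},
    {"session_id": 11, "delta_read_io_bytes": 16384, "delta_write_io_bytes": 8192},
]
print(top_io_sessions(rows, n=2))
```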
to identify contention caused sql with same binds)
Populating current_obj#/file#/block# for all logical reads when session is on cpu
Adding wait_time_micro from v$session and populating it just like in v$session across each sample
Populating last event when session is on cpu
Record redo usage per sql (ER# 8646714)
More awr samples for session outliers (ER# 8669416)
Reporting plsql line id being executed
DIY ASH
Do-it-yourself ASH by copying v$session active session data every second
Sql plan information will be unavailable
Not going to be a lightweight process
Time model information will be missing
Incomplete wait time metrics
Overall, a good alternative if the diagnostic pack license is not purchased
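The sampling loop behind a do-it-yourself ASH can be sketched as below. This is not the author's implementation. fetch_active_sessions is a hypothetical callable standing in for a query against v$session that returns one dictionary per active session, and the bounded deque is an assumed stand-in for a circular buffer.

```python
import time
from collections import deque


def run_sampler(fetch_active_sessions, samples, iterations,
                interval_s=1.0, sleep=time.sleep):
    """Copy the active sessions into samples once per interval."""
    for sample_id in range(1, iterations + 1):
        sample_time = time.time()
        for row in fetch_active_sessions():
            # Tag every copied row with the sample it belongs to.
            samples.append({"sample_id": sample_id,
                            "sample_time": sample_time, **row})
        sleep(interval_s)


def fake_fetch():
    # Stand-in for the real query; returns two made-up active sessions.
    return [{"sid": 101, "event": None},
            {"sid": 102, "event": "log file sync"}]


buffer = deque(maxlen=1000)  # oldest rows drop off once the buffer is full
# sleep is replaced with a no-op so the demonstration returns immediately.
run_sampler(fake_fetch, buffer, iterations=3, sleep=lambda _: None)
print(len(buffer))
```

Passing sleep as a parameter keeps the loop easy to exercise without waiting; in real use the default one-second sleep would apply.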
busy waits, latch contention, etc)
Fine-grained resource utilization metrics report
Finding effectiveness of pointing read-only traffic to Active Data Guard