
Wednesday, 11 January 2012

JOIN TECHNIQUE CONCEPTS IN ORACLE


This article is for DBAs who are not familiar with the basics of join techniques such as the hash join, nested loops, and the sort merge join. A DBA who does not know how a join happens in the background cannot work with the join order, so understanding the concepts must come first. Since this is an article to explain the concepts, we will not go too deep into them for now. On this cold night here in New Delhi, with some hot sweet corn soup on the table... oh, I am sorry, let me come to the point now.

HASH JOINS

Let's start with the hash join. As we know, the optimizer works out the access paths, the join method, and then the join order, in that sequence, and the hash join is a join method. The database uses hash joins to join large data sets. The optimizer uses the smaller of the two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joining rows. This method works best when the hash table built from the smaller table fits in available memory; the cost is then limited to a single read pass over each of the two tables.

The optimizer tends to choose a hash join when large data sets are being joined, or when a large fraction of a small table must be joined. Use the USE_HASH hint to instruct the optimizer to use a hash join when joining two tables.

However, in a production database with very large tables, it is not always easy to get the database to invoke hash joins without increasing the RAM regions that control them; for large tables, hash joins require a lot of RAM. Make sure that the smaller table is the driving table (the first table in the FROM clause when using the ORDERED hint), because a hash join builds its in-memory hash table from the driving table.
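A minimal sketch of forcing both the join method and the driving table, assuming two hypothetical tables SMALL_T and BIG_T (the names and columns are purely illustrative):

SELECT /*+ ORDERED USE_HASH(b) */
       s.id, b.payload
  FROM small_t s,   -- smaller input: the hash table is built on this
       big_t   b    -- larger input: scanned and probed against the hash table
 WHERE s.id = b.id;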

Make sure that you have proper settings for OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_MAX_PERMUTATIONS (Oracle9i and earlier) to limit the number of table join evaluations. Also check your values for HASH_AREA_SIZE and HASH_MULTIBLOCK_IO_COUNT. If you are using PGA_AGGREGATE_TARGET, remember that a single hash join will not use more than about 5% of that space, and you may need to override it.
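One way to override that limit for a single session is to fall back to manual work-area sizing; a sketch, with the size purely illustrative:

ALTER SESSION SET workarea_size_policy = MANUAL;  -- opt this session out of PGA_AGGREGATE_TARGET sizing
ALTER SESSION SET hash_area_size = 104857600;     -- 100 MB per hash join work area (illustrative value)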

SORT MERGE JOINS

Sort merge joins can join rows from two independent sources. Hash joins generally perform better than sort merge joins; however, a sort merge join can perform better than a hash join if the row sources are already sorted and no sort operation needs to be performed. Keep in mind that there is no concept of a driving table in a sort merge join. As this post is only for basic understanding, the essence is that the technique has two steps: a SORT is done on both inputs, and then the two sorted lists are MERGED together. If an input is already sorted by the join column, a sort join operation is not performed for that row source. However, a sort merge join always creates a positionable sort buffer for the right side of the join, so that it can seek back to the last match when duplicate join key values come out of the left side. Use the USE_MERGE hint to instruct the optimizer to go for a sort merge join.

Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >= (but not a nonequality such as !=). Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
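A minimal sketch of an inequality (band) join pushed to a sort merge, using the classic SCOTT demo tables EMP and SALGRADE (assumed here purely for illustration):

SELECT /*+ USE_MERGE(e s) */
       e.ename, e.sal, s.grade
  FROM emp e, salgrade s
 WHERE e.sal BETWEEN s.losal AND s.hisal;  -- band join: a hash join is not possible here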

NESTED LOOP JOINS

This type of join should be used when small subsets of data are being joined and the join condition is an efficient way of accessing the second table. Hold on here... by second table I mean the driven (inner) table. So there is a driving table which drives the driven table in the nested loop, of course. It is very important that the access of the inner (driven) table depends on data from the driving table: if the access paths of the driving and driven tables are independent, then every iteration of the outer loop retrieves the same rows again, degrading performance considerably. It works something like this: not exactly a Cartesian product, but if the access path is independent, then for every row of the outer loop the inner loop is evaluated in full, which hurts performance badly when joining large data sets.
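A minimal sketch, assuming hypothetical tables ORDERS (a small, filtered driving table) and ORDER_ITEMS with an index on its ORDER_ID join column:

SELECT /*+ LEADING(o) USE_NL(i) */
       o.order_id, i.line_no
  FROM orders o, order_items i
 WHERE o.order_id = i.order_id          -- an index on order_items(order_id) makes each inner probe cheap
   AND o.order_date = TRUNC(SYSDATE);   -- filter that keeps the driving row set small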

Oracle Database 11g introduces a new implementation for nested loop joins to reduce overall latency for physical I/O. When an index or a table block is not in the buffer cache and is needed to process the join, a physical I/O is required. Oracle Database 11g can batch multiple physical I/O requests and process them using a vector I/O instead of processing them one at a time.

The optimizer always works on the statistics it has; it is a mathematical function and it works on numbers. For example, a larger multiblock read count is likely to decrease the cost of a sort merge join relative to a nested loop join: if the database can read a large number of sequential blocks from disk in a single I/O, then an index on the inner table of the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.
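To check the current value and experiment at session level (the value 128 is illustrative, not a recommendation):

SHOW PARAMETER db_file_multiblock_read_count

ALTER SESSION SET db_file_multiblock_read_count = 128;  -- for testing the effect on join costs only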

Have a nice day ahead, DBAs... My next post will say more on how to improve the cost of joins, as that is a very important task for a DBA, especially when it comes to determining the join order.



TO SEE INTERNALS OF BACKGROUND PROCESSES

To really see what DBWR, LGWR, and the other background processes are doing, set event 10046 at level 12 for the background process in question. DO NOT do this on a production database.

Find the Oracle PID of the background process:

SELECT * FROM v$process WHERE background IS NOT NULL;

Attach to the process and enable tracing:

ORADEBUG SETORAPID 5
ORADEBUG EVENT 10046 trace name context forever, level 12

Turn off tracing with:

ORADEBUG EVENT 10046 trace name context off

The trace file will be in the Diag Trace directory for all processes. The SET EVENT command can have serious consequences in a database; events should NOT be set except at the direction of Oracle Support.
In Oracle Database 11g, all trace files are located in the Automatic Diagnostics Repository (ADR), a set of directories. The default location for trace files is $ORACLE_BASE/diag/rdbms/<db_name>/<instance_name>/trace.
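In 11g you can also ask the instance directly where the traces go, via the documented V$DIAG_INFO view:

SELECT name, value
  FROM v$diag_info
 WHERE name IN ('Diag Trace', 'Default Trace File');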


I will post another article very soon on how to read a SQL trace file.

---------------------------------------------------------------------------


Tuesday, 10 January 2012

A DIFFERENT CLOSE LOOK AT ASH

It is good to know ASH's basic usage, but there are some things that every DBA needs to know.

Let me first compare ASH with SQL*Trace.

ASH and SQL*Trace are not the same thing, but both are valuable tools for finding out where processes spend time. SQL*Trace (or event 10046, as we used to call it) has been my weapon of choice for solving performance issues for a very long time; it is extremely effective, and there is still a place for it.

There are difficulties with using SQL trace, especially in a production environment.

Firstly, it has a run-time overhead. You could afford to trace a single process, but you certainly could not trace the entire database. Beyond that:

1) You have to work with trace in a reactive way. You will probably not already be tracing a process when you experience a performance problem, so you need to run the process again and reproduce the poor performance with trace.

2) Trace will tell you if a session is blocked waiting on a lock. However, it will not tell you who is blocking you. ASH will do this (although there are limitations).

3) A trace file records everything that happens in a session, whereas ASH samples the session every second. Short-lived events will be missed, so the data has to be handled statistically.

4) There are problems with both approaches if you have the kind of application where you have lots of different SQL statements because the application uses literal values rather than bind variables (and cursor sharing is EXACT).

5) Oracle's TKPROF trace file profiler cannot aggregate these statements, but I have found another profiler called ORASRP (www.oracledba.ru/orasrp) that can. With ASH, you will see different SQL_IDs, but it can be effective to group statements with the same execution plan.

6) You may have trouble finding the SQL text in the SGA (or via the DBMS_XPLAN package) because it has already been aged out of the library cache. You may have similar problems with historical ASH data because the statement had been aged out when the AWR snapshot was taken.

7) A trace file, with STATISTICS_LEVEL set to ALL, will give you timings for each operation in the execution plan. So, you can see where in the execution plan the time was spent. ASH will only tell you how long the whole statement takes to execute, and how long was spent on which wait event.

If you look at my earlier post http://karandba.blogspot.com/2011/11/actual-use-of-active-session-history.html you will also see how to find the top SQL statements by time spent on CPU, waits, and I/O.
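Picking up point 2 above: a minimal sketch of using ASH to find who was blocking you, based on the BLOCKING_SESSION columns available in V$ACTIVE_SESSION_HISTORY from 10gR2 onwards (the enqueue filter and time window are illustrative assumptions):

SELECT blocking_session, blocking_session_serial#, COUNT(*) samples
  FROM v$active_session_history
 WHERE event LIKE 'enq:%'                -- lock waits only
   AND sample_time > SYSDATE - 10/1440   -- last 10 minutes
 GROUP BY blocking_session, blocking_session_serial#
 ORDER BY samples DESC;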






SERVICES CONFUSIONS CLEARED


ORACLE SERVICES

Well, I was pinged by email by one of my close clients, who asked what a service actually is in RAC, so I decided to post on this. Let's get to the point: to manage workloads efficiently, we assign services to a particular application or to a subset of an application's operations. You can also group work by type under services; for example, online users can be one service, batch processing another, and reporting yet another.

When you define a service, you specify which instances support that service. These are known as the PREFERRED instances. You can also define other instances to support a service if the service's preferred instance fails. These are known as AVAILABLE instances.

When you specify PREFERRED instances, you are specifying the number of instances on which a service will normally run. The Oracle Clusterware attempts to ensure that the service always runs on the number of nodes for which you have configured the service. Afterwards, due to either instance failure or planned service relocations, a service may be running on an AVAILABLE instance.

Services are integrated with Resource Manager which enables you to restrict the resources that are used by a service within an instance. The Resource Manager enables you to map a consumer group to a service so that users who connect with the service are members of the specified consumer group. Also, the Automatic Workload Repository (AWR) enables you to monitor performance by service.

Oracle Net Services provides connection load balancing to enable you to spread user connections across all of the instances that are supporting a service. For each service, you can define the method that you want the listener to use for load balancing by setting the connection load balancing goal, CLB_GOAL. You can also specify a single TAF policy for all users of a service by defining the FAILOVER_METHOD, FAILOVER_TYPE, and so on. Oracle RAC uses FAN to notify applications about configuration changes and the current service level that is provided by each instance where the service is enabled.
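As an illustration, a TAF policy can be attached to a service with the same DBMS_SERVICE package used below for CLB_GOAL; a sketch, where the service name POSTMAN and the retry/delay values are purely illustrative:

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'POSTMAN', failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC, failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT, failover_retries => 180, failover_delay => 5);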

Oracle Net Services provides the ability to balance client connections across the instances in an Oracle RAC configuration. There are two types of load balancing that you can implement: client-side and server-side load balancing. Client-side load balancing balances the connection requests across the Listeners. With server-side load balancing, the Listener directs a connection request to the best instance currently providing the service by using the load balancing advisory.

In an Oracle RAC database, client connections should use both types of connection load balancing.

FAN, FCF, and the load balancing advisory depend on an accurate connection load balancing configuration that includes setting the connection load balancing goal for the service. You can use a goal of either long or short for connection load balancing. These goals have the following characteristics:

Long := Use the LONG connection load balancing method for applications that have long-lived connections. This is typical for connection pools and SQL*Forms sessions. Long is the default connection load balancing goal. The following is an example of modifying a service, POSTMAN, with the PL/SQL DBMS_SERVICE package and the CLB_GOAL_LONG package constant to define the connection load balancing goal for long-lived sessions:

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'POSTMAN', clb_goal => DBMS_SERVICE.CLB_GOAL_LONG);

Short := Use the SHORT connection load balancing method for applications that have short-lived connections. The following example modifies the service known as ORDER, using the DBMS_SERVICE.CLB_GOAL_SHORT package constant to set the goal to short:

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'ORDER', CLB_GOAL => DBMS_SERVICE.CLB_GOAL_SHORT);



To make things simple, let me explain what these terms actually mean:

FAN (Fast Application Notification) := nothing but a notification mechanism that RAC uses to quickly alert applications about configuration and workload service-level changes.

LBA (Load Balancing Advisory) := the concept is simple; it makes recommendations to applications about where to direct requests to obtain the best service, based on the policy that you have defined for that service.

FCF (Fast Connection Failover) := finally, the ability of Oracle clients to provide rapid failover of connections by subscribing to FAN events.

One last thing, but not the least := there are two types of service-level goals: 1) Service Time and 2) Throughput.

Service Time attempts to direct work requests to instances according to response time; LBA statistics are based on the elapsed time for work done in the service plus the available bandwidth to the service.

Throughput attempts to direct work requests according to throughput; LBA statistics are based on the rate at which work is completed in the service plus the available bandwidth to the service.
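Both goals are set with the same package; a minimal sketch, again using the illustrative POSTMAN service (GOAL_SERVICE_TIME and GOAL_THROUGHPUT are DBMS_SERVICE package constants):

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'POSTMAN', goal => DBMS_SERVICE.GOAL_SERVICE_TIME);

-- or, for batch-style work:

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'POSTMAN', goal => DBMS_SERVICE.GOAL_THROUGHPUT);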

In this post I am not going to deal more with RAC concepts, since the question was only about services. But I shall be posting more on RAC on this blog, as I love RAC; in fact, every DBA loves RAC.

-----------------------------------------------------------------------------------------------------------

Related links:
11.2 RAC GOODIES AND SERVERPOOLS

11G R2 RAC PROBLEMS AND SOLUTIONS

CLONE DATABASE HOME IN 11GR2 RAC

NIC BONDING IN 11G R2 RAC

PREREQUISITE TO CONFIGURE GNS AND GPNP IN 11G R2 RAC
  
SETUP 10G RAC ON YOUR LAPTOP

SETUP AND PLAY WITH 11G R2 RAC AT YOUR HOME PC

11g R2 RAC - ADD INSTANCE MANUALLY


11gR2 RAC : USE RCONFIG TO CONVERT NON RAC DATABASE TO RAC DATABASE
http://koenigocm.blogspot.in/search/label/11gR2%20RAC%20%3A%20USE%20RCONFIG%20TO%20CONVERT%20NON%20RAC%20DATABASE%20TO%20%20RAC%20DATABASE

11g R2 RAC : TRACING SERVICES IN A RAC DATABASE
 http://koenigocm.blogspot.in/search/label/11g%20R2%20RAC%20%3A%20TRACING%20SERVICES%20%20IN%20A%20RAC%20DATABASE

11G R2 RAC: INSTANCE RECOVERY
 http://koenigocm.blogspot.in/search/label/11G%20R2%20RAC%3A%20INSTANCE%20RECOVERY

11g R2 RAC: MANAGING THIRD PARTY APPLICATIONS USING APPLICATION VIP
 http://koenigocm.blogspot.in/search/label/11g%20R2%20RAC%3A%20MANAGING%20THIRD%20PARTY%20APPLICATIONS%20USING%20APPLICATION%20VIP









TKPROF INTERPRETATION

Tkprof is an executable that 'parses' Oracle trace files to produce more
readable output. Remember that all the information in TkProf is available
from the base trace file.

If you have a system that is performing badly, a good way to identify 
problem SQL statements is to trace a typical user session and then use TkProf
to format the output using the sort functions on the tkprof command 
line. 

There are a huge number of sort options that can be accessed by simply 
typing 'TkProf' at the command prompt. A useful starting point is the 
'fchela' sort option which orders the output by elapsed time fetching (remember
that timing information is only available with timed_statistics set to true 
at the database level). The resultant .prf file will contain the most time 
consuming SQL statement at the start of the file.

Another useful parameter is sys. This can be used to prevent SQL statements 
run as user SYS from being displayed. This can make the output file much 
shorter and easier to manage.

Again, remember to always check that the TIMED_STATISTICS parameter is set to TRUE
as otherwise no time based comparisons can be made.
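Putting those options together, a typical invocation might look like this (the trace file name is illustrative):

tkprof orcl_ora_12345.trc orcl_ora_12345.prf sort=fchela sys=no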


Interpreting TkProf Output Guidelines
=====================================

Column Meanings
===============

call :   Statistics for each cursor's activity are divided into 3 areas: 

           Parse:   statistics from parsing the cursor. This 
                    includes information for plan generation etc.
           Execute: statistics for the execution phase of a cursor
           Fetch  : statistics for actually fetching the rows

count :  number of times we have performed a particular activity on this 
         particular cursor

cpu:     cpu time used by this cursor

elapsed: elapsed time for this cursor

disk:    This indicates the number of blocks read from disk. Generally you want
         to see blocks being read from the buffer cache rather than disk.
 
query :  This column is incremented if a buffer is read in Consistent mode.
         A Consistent mode buffer is one that has been generated to give
         a consistent read snapshot for a long running transaction. The buffer
         actually contains this status in its header.
 
current: This column is incremented if a buffer found in the buffer cache 
         that is new enough for the current transaction and is in current mode
         (and it is not a CR buffer). This applies to buffers that have been 
         read in to the cache as well as buffers that already exist in the 
         cache in current mode.

rows:    Rows retrieved by this step 


Explain plan 
============

Firstly, we advise that the autotrace feature of SQL*Plus be used 
on statements rather than using TkProf mainly because the TkProf output 
can be confusing with regard to whether the Rule or Cost Based optimizer 
has been used. 

Because TkProf explain plan does not show any costs or statistics, it is 
sometimes not possible to tell definitively which optimizer has been used.

That said, the following output from Tkprof explain plan is useful.
The Rows column next to the explain plan output shows the number of 
rows processed by that particular step. The information is gathered from the 
STAT lines for each cursor in the raw trace output. 

Remember that if the cursor is not closed then you will not see any output. 
Setting SQL_TRACE to false DOES NOT close PL/SQL child cursors. 
Cursors are closed in SQL*Plus immediately after execution.


TkProf Examples and Discussion
==============================

Examples:

Step 1 - Look at the totals at the end of the tkprof output
===========================================================

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
| call    | count |  cpu | elapsed |    disk |  query | current |   rows |
|---------|-------|------|---------|---------|--------|---------|--------|
| Parse   | [A] 7 | 1.87 |    4.53 |     385 |[G] 553 |      22 |      0 |
| Execute | [E] 7 | 0.03 |    0.11 | [P]   0 |[C]   0 | [D]   0 | [F]  0 |
| Fetch   | [E] 6 | 1.39 |    4.21 | [P] 128 |[C] 820 | [D]   3 | [F] 20 |
--------------------------------------------------------------------------
 
Misses in library cache during parse: 5
Misses in library cache during execute: 1
 
     8  user  SQL statements in session.
    12  internal SQL statements in session.
[B] 54  SQL statements in session.
     3  statements EXPLAINed in this session.

1. Compare [A] & [B] to spot over parsing. In this case we 
   have 7 parses for 54 statements which is ok.

2. You can use [P], [C] & [D] to determine the hit ratio.

Hit Ratio is logical reads/physical reads:

Logical Reads = Consistent Gets + DB Block Gets
Logical Reads = query           + current
Logical Reads = Sum[C]          + Sum[D]
Logical Reads = 0+820           + 0+3
Logical Reads = 820             + 3
Logical Reads = 823

Hit Ratio = 1 - (Physical Reads /  Logical Reads)
Hit Ratio = 1 - (Sum[P]         /   Logical Reads)
Hit Ratio = 1 - (128            /   823)
Hit Ratio = 1 - (0.16)
Hit Ratio = 0.84 or 84%

3. We want fetches to be less than the number of rows as this 
   will mean we have done less work (array fetching). 
   To see this we can compare [E] and [F].

[E] =  6 = Number of Fetches
[F] = 20 = Number of Rows

So we are doing 6 fetches to retrieve 20 rows - not too bad.
If array fetching were configured, then the rows could be retrieved with
fewer fetches.

Remember that an extra fetch will be done at the end to check that
the end of fetch has been reached.

4. [G] Shows reads on the dictionary cache for the statements.

   - this should not be a problem on Oracle7. 
   In this case we have done 553 reads from the 
   dictionary cache.

STEP 2 - Examine statements using high resource
===============================================

update ...
where  ...

| call    | count | cpu | elapsed | disk |   query | current |   rows |
|---------|-------|-----|---------|------|---------|---------|--------|
| Parse   |     1 |   7 |     122 |    0 |       0 |       0 |      0 |
| Execute |     1 |  75 |     461 |    5 | [H] 297 |   [I] 3 | [J]  1 |
| Fetch   |     0 |   0 |       0 |    0 |       0 |       0 |      0 |
-----------------------------------------------------------------------

[H] shows that this query is visiting 297 blocks to find the rows to 
    update
[I] shows that only 3 blocks are visited performing the update
[J] shows that only 1 row is updated.

297 blocks to update 1 row is a lot. 
Possibly there is an index missing?


STEP 3 - Look for over parsing
==============================

select ...

| call    | count |     cpu | elapsed | disk |  query | current |  rows |
|---------|-------|---------|---------|------|--------|---------|-------|
| Parse   | [M] 2 | [N] 221 |     329 |    0 |     45 |       0 |     0 |
| Execute | [O] 3 | [P]   9 |      17 |    0 |      0 |       0 |     0 |
| Fetch   |     3 |       6 |       8 |    0 | [L]  4 |       0 | [K] 1 |
-------------------------------------------------------------------------

Misses in library cache during parse: 2 [Q]

[K] shows that the query has returned 1 row.
[L] shows that we had to read 4 blocks to get this row back.
This is fine.

[M] shows that we are parsing the statement twice - this is not desirable
    especially as the cpu usage is high [N] in comparison to the execute 
    figures : [O] & [P]. [Q] shows that these parses are hard parses. If
    [Q] was 1 then the statement would have had 1 hard parse followed by
    a soft parse (which just looks up the already parsed detail in the 
    library cache). See Note:32895.1 for more details.


This is not a particularly bad example since the query has only been 
executed a few times. However excessive parsing should be avoided as far 
as possible by:

o Ensuring that code is shared:

   - use bind variables
   - make shared pool large enough to hold query definitions in memory
     long enough to be reused.
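A minimal sketch of the difference in SQL*Plus (EMP/EMPNO from the standard demo schema, assumed purely for illustration):

-- literal values: each distinct value is a new statement, so a new hard parse
SELECT ename FROM emp WHERE empno = 7369;
SELECT ename FROM emp WHERE empno = 7499;

-- bind variable: one shared cursor, soft parses after the first execution
VARIABLE n NUMBER
EXEC :n := 7369
SELECT ename FROM emp WHERE empno = :n;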


NOTE:
=====
Make sure you do not parse a 9i (or later) 10046 raw trace with an 8i (or earlier)
tkprof utility: the timings in the raw trace file are in centiseconds for 8i and
earlier versions, but in microseconds for 9i and later.

EXAMPLE TOP SQL QUERIES FROM SQLAREA

EXAMPLE V$SQLAREA QUERIES
-------------------------
 
Top 10 by Buffer Gets:

set linesize 100
set pagesize 100
SELECT * FROM
(SELECT substr(sql_text,1,40) sql,
        buffer_gets, executions, buffer_gets/executions "Gets/Exec",
        hash_value,address
   FROM V$SQLAREA
  WHERE buffer_gets > 10000 AND executions > 0
 ORDER BY buffer_gets DESC)
WHERE rownum <= 10
;

Top 10 by Physical Reads:

set linesize 100
set pagesize 100
SELECT * FROM
(SELECT substr(sql_text,1,40) sql,
        disk_reads, executions, disk_reads/executions "Reads/Exec",
        hash_value,address
   FROM V$SQLAREA
  WHERE disk_reads > 1000 AND executions > 0
 ORDER BY disk_reads DESC)
WHERE rownum <= 10
;

Top 10 by Executions:

set linesize 100
set pagesize 100
SELECT * FROM
(SELECT substr(sql_text,1,40) sql,
        executions, rows_processed, rows_processed/executions "Rows/Exec",
        hash_value,address
   FROM V$SQLAREA
  WHERE executions > 100
 ORDER BY executions DESC)
WHERE rownum <= 10
;

Top 10 by Parse Calls:

set linesize 100
set pagesize 100
SELECT * FROM
(SELECT substr(sql_text,1,40) sql,
        parse_calls, executions, hash_value,address
   FROM V$SQLAREA
  WHERE parse_calls > 1000
 ORDER BY parse_calls DESC)
WHERE rownum <= 10
;

Top 10 by Sharable Memory:

set linesize 100
set pagesize 100
SELECT * FROM 
(SELECT substr(sql_text,1,40) sql,
        sharable_mem, executions, hash_value,address
   FROM V$SQLAREA
  WHERE sharable_mem > 1048576
 ORDER BY sharable_mem DESC)
WHERE rownum <= 10
;

Top 10 by Version Count:

set linesize 100
set pagesize 100
SELECT * FROM 
(SELECT substr(sql_text,1,40) sql,
        version_count, executions, hash_value,address
   FROM V$SQLAREA
  WHERE version_count > 20
 ORDER BY version_count DESC)
WHERE rownum <= 10
;
-------------------------------------------------------------------------------------------

Related links :

LATCHES LOCKS PINS AND MUTEXES
MUTEXES AND LATCHES
PARENT AND CHILD CURSORS IN ORACLE
QUICK OVERVIEW OF SHARED POOL TUNING
SHARED POOL ARCHITECTURE

SQL PLAN MANAGEMENT – ORACLE 11g : Part-1
 http://koenigocm.blogspot.in/search/label/SQL%20PLAN%20MANAGEMENT%E2%80%93%20ORACLE%20%2011g%20%3A%20Part-1


SQL PLAN MANAGEMENT – ORACLE 11g : Part-2

 LIBRARY CACHE LOCK/PIN DEMONSTRATED
 http://koenigocm.blogspot.in/search/label/LIBRARY%20CACHE%20LOCK%2FPIN%20DEMONSTRATED

SOME TIPS ABOUT ASH

Oracle collects Active Session History (ASH) statistics (mostly wait statistics for different events) for all active sessions every second from V$SESSION and stores them in a circular FIFO buffer in the SGA. ASH records cover very recent session history, typically the last 5-10 minutes. If the buffer fills up before the AWR flushes it (by default every hour), the MMNL process (Manageability Monitor Light, which shows up as "Manageability Monitor light 2") flushes the data to disk, where it is stored in DBA_HIST_ACTIVE_SESS_HISTORY.
ASH resides in the SGA and its size is calculated as the lesser of:

  • total # of cpu x 2MB memory
  • 5% of shared pool
So on a 16-CPU server with a shared pool of 500MB:

  • ASH desired size: 16 x 2MB = 32MB
  • 5% of 500MB = 25MB (this is the lower value, so ASH will be 25MB)
ASH collects the following:

  • SQL_ID
  • SID
  • Client ID, Service ID
  • Program, Module, Action
  • Object, File, Block
  • Wait event number, actual wait time (if session is waiting) 
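Since every row in V$ACTIVE_SESSION_HISTORY is a one-second sample of an active session, counting samples approximates seconds of active database time. A minimal sketch of pulling the top SQL for the last 10 minutes (the time window is illustrative):

SELECT sql_id, COUNT(*) samples   -- roughly seconds of DB time
  FROM v$active_session_history
 WHERE sample_time > SYSDATE - 10/1440
   AND sql_id IS NOT NULL
 GROUP BY sql_id
 ORDER BY samples DESC;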
Some hidden parameters to control the behaviour of ASH-

To enable or disable ASH data collection-
_ash_enable=TRUE


_ash_sampling_interval = 1000 (milliseconds)
This is where the one-second sampling is specified. I'm tempted to try reducing this to a silly level and watching the server fall on its backside.

_ash_sample_all = FALSE

Oooh, this one would be fun! Why not sample all sessions, including those that aren't active? (Hint: you would see an enormous growth in the volume of data generated, so, again, I'm kidding.)

_ash_disk_write_enable = TRUE

Whether samples are flushed to the workload repository or not. Turning this off might initially seem a good idea to save space or improve performance a little, but (just a guess) I think it would confuse the hell out of ADDM when it couldn't find any ASH samples to correlate with the AWR information.

_ash_disk_filter_ratio = 10

Only one in every ten in-memory samples is written to the workload repository, which is why the on-disk history in DBA_HIST_ACTIVE_SESS_HISTORY is effectively sampled every 10 seconds rather than every second.