
Monday, 2 July 2012

ORACLE DATABASE BUFFER CACHE




                      BUFFER CACHE ARCHITECTURE



The database buffer cache is a memory structure in the SGA. Buffers in the cache hold blocks of data read from the data files; each buffer is sized to hold one database block. Oracle allows for a total of 8 different caches:

- 5 caches to allow different block sizes
- 3 caches to allow different behaviour

BUFFERS TO ALLOW DIFFERENT BLOCK SIZES :

- DEFAULT CACHE : Holds blocks of size DB_BLOCK_SIZE. Its size is determined by the parameter DB_CACHE_SIZE.

- DB nK CACHE : Holds blocks of size nK, where n = 2, 4, 8, 16 or 32 and nK is not equal to the default block size (DB_BLOCK_SIZE). Its size is determined by the parameter DB_nK_CACHE_SIZE. A tablespace with a non-default block size can only be created once the matching cache exists (see the sketch below).
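For instance, a minimal sketch of creating a 16K tablespace on a database whose default block size is 8K (the datafile path and tablespace name here are hypothetical):

   -- The 16K cache must be sized before a 16K tablespace can exist.
   ALTER SYSTEM SET db_16k_cache_size = 64M;

   CREATE TABLESPACE ts_16k
      DATAFILE '/u01/oradata/orcl/ts_16k_01.dbf' SIZE 100M
      BLOCKSIZE 16K;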

BUFFERS TO ALLOW DIFFERENT BEHAVIOUR :

- DEFAULT CACHE : Holds blocks of objects that have not been assigned to the keep or recycle cache (or that have been explicitly assigned to the default cache).

- KEEP CACHE : Holds blocks of small, frequently accessed objects so that they stay in memory longer. Its size is determined by the parameter DB_KEEP_CACHE_SIZE.

- RECYCLE CACHE : Designed to quickly age out blocks of rarely accessed, large objects. Its size is determined by the parameter DB_RECYCLE_CACHE_SIZE. (See the example after this list.)
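Objects are assigned to the keep or recycle cache through their storage clause. A minimal sketch, using hypothetical table names:

   -- Small, frequently read lookup table: keep its blocks in memory longer.
   ALTER TABLE scott.lookup_codes STORAGE (BUFFER_POOL KEEP);

   -- Large, rarely re-read table: let its blocks age out quickly.
   ALTER TABLE scott.audit_history STORAGE (BUFFER_POOL RECYCLE);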

Each buffer in any of the above caches can be in any of the following states :

- FREE : Free buffers are of two kinds:
         - Empty or unused
         - Buffers which are identical to the blocks on disk because
           . they have not been dirtied, or
           . they were dirtied and have been written to disk

- PINNED : A buffer being accessed (read/written) by a process.

- DIRTY : A buffer holding a block that has been changed by a server process and hence needs to be written to disk before it can be reused.

Buffer headers store metadata about the contents of the buffers and are used to manage the cache. The metadata includes information such as the data block address (DBA) of the block, the type of block (data, undo), the touch count, a pointer to the buffer, pointers to the previous/next buffer header in the cache buffer chain, and so on. There is one row in V$BH for every buffer. Buffer headers keep track of the various attributes and states of the buffers in the buffer cache.

The buffers are placed in the buffer cache in the form of linked lists (cache buffer chains) hanging off different hash buckets. While searching for a block in the buffer cache, Oracle does not look at the buffers one by one, as that would bring down performance drastically. Take the example of a library where we are looking for a book. If we search the whole library book by book it will take very long, but if the books are arranged by subject, we first go to the rack containing books of the desired subject and then search only within it; our search is over rather quickly. In the same manner, buffers are placed in different hash buckets depending upon the hash value of the data block address (DBA). While searching for a particular block, Oracle first hashes the DBA of the desired block, locates the hash bucket for that hash value, and then searches only the chain in that bucket. Different DBAs can hash to the same value (similar to a modulus function, i.e. mod(5,4) = mod(9,4) = mod(13,4) ...), so one hash bucket can contain blocks with different DBAs.

Each hash chain is protected by a latch. A process needs to get the relevant latch before it can scan a hash chain for a buffer, so that the linked list underneath does not change while it is searching. These latches are called cache buffer chain latches (CBC latches). One CBC latch protects multiple hash buckets.
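Since there is one row in V$BH per buffer, a quick way to see the mix of buffer states in the cache is to count the rows by status (a simple sketch, run with SELECT privilege on the V$ views):

   SELECT status, COUNT(*)
   FROM   v$bh
   GROUP  BY status;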

   To implement read consistency, a hash chain can also contain multiple versions of the same block (same DBA). For each DBA in a bucket, there is at least one current block, which contains all the committed and uncommitted changes made to rows in that block by different users. When a user requests a row from that block, he should receive a read consistent copy of the block, i.e. a copy that has all the changes (committed or uncommitted) made by him and only the committed changes made by other users. If the desired read consistent clone is already there, it is read and the user gets his results. If it is not there, a copy of the current block is made and marked as a clone, changes are rolled back on the clone using undo, and then the results are displayed to the user; in this case the number of consistent gets is higher. If a block is rapidly changed and rapidly queried, multiple CR copies of it can exist in the buffer cache. Such a block is called a hot block. All the CR copies have the same DBA and hence go to the same hash bucket, so that hash chain gets longer and takes longer to scan. Multiple users trying to scan the chain may have to wait to obtain the latch on the hash bucket, which can result in contention on the CBC latch.
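A common way to check for such contention is to look at the busiest 'cache buffers chains' latch children, as in this diagnostic sketch (the ROWNUM cut-off of 5 is arbitrary):

   SELECT *
   FROM   (SELECT addr, child#, gets, sleeps
           FROM   v$latch_children
           WHERE  name = 'cache buffers chains'
           ORDER  BY sleeps DESC)
   WHERE  ROWNUM <= 5;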

   CBC latch    Hash bucket    Buffer hash chains of buffer headers     Buffers

     L1 +------- B1 ------------ A <-> C                                   A
        |                                                                  B
        +------- B2 ------------ B <-> F <-> E <-> B                       C
                                       |_____________|                     D
                                              |                            E
                                          CR CLONES                        F

     L2 +------- B3
        |
        +------- B4

     L3 +------- B5
        |
        +------- B6

The number of hash buckets is kept high, and each CBC latch protects a number of buckets (e.g. 32), so that:

1. The hash chains linked to a bucket stay short.

2. The number of latches stays low, since a latch is a fairly large structure with a long code path.

The number of hash buckets is decided by the undocumented parameter _db_block_hash_buckets.
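The value can be checked from the hidden parameter tables. This is a sketch that must be run as SYS, since hidden parameters are not exposed in V$PARAMETER (the companion parameter _db_block_hash_latches is an assumption worth verifying on your version):

   SELECT a.ksppinm AS parameter, b.ksppstvl AS value
   FROM   x$ksppi a, x$ksppcv b
   WHERE  a.indx = b.indx
   AND    a.ksppinm IN ('_db_block_hash_buckets', '_db_block_hash_latches');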


  Whenever a block is modified, the changes are always made on the current copy of the block, and the statistic db block gets is incremented. The information in the hash buckets is actually the linked list of buffer headers pointing to blocks in the DB buffer cache. The latch on a hash chain can be obtained in

- SHARED MODE : to scan the buffers

- EXCLUSIVE MODE : to link/delink buffer headers to/from the chain

To know more about consistent reads in oracle click the following link: http://koenigocm.blogspot.in/search/label/CONSISTENT%20READS%20IN%20ORACLE

                ALGORITHM FOR HASH LOOKUP
         ---------------------------------------

1. Hash the DBA (Tablespace-id + File-id + Block-id).

2. Find the corresponding hash bucket.

3. Try to get the latch protecting the hash bucket.

   If not successful:

      - Spin for spin-count times.

        . If the latch was not obtained while spinning,
              then sleep, wake up and go to step 3 (try to get the latch)

          else (latch obtained)
              go to step 4 (walk the chain)

   else (latch obtained)
       Go to step 4 (walk the chain)

4. Walk the chain, reading buffer headers to see if the specific version of the block is already in the chain.

   If found (logical I/O):

      Access the buffer in the buffer cache, protected by the buffer pin/unpin action.

   else (not found -- physical I/O):

      - Find a free buffer in the buffer cache (obtain the LRU latch).
      - Unlink the buffer header for that buffer from its current chain (the LRU chain).
      - Link that buffer header to this hash chain.
      - Release the latch and read the block into the free buffer, keeping the buffer header pinned.
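The spin-and-sleep behaviour in step 3 is visible in the latch statistics. A quick check on the aggregate CBC latch (a sketch; the numbers only mean something against a baseline):

   SELECT name, gets, misses, spin_gets, sleeps
   FROM   v$latch
   WHERE  name = 'cache buffers chains';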
   
                        WORKING SETS

Buffers in the buffer cache are divided into working sets to improve the scalability of common buffer cache operations. Each working set has its own LRU list and two checkpoint queues.

- One DB writer may be responsible for writing the buffers of multiple working sets, so the write I/O workload can be distributed among the DB writers.

- Since each working set has its own LRU list, competition between processes looking for free buffers is reduced, as different readers pick buffers from different working sets.

     No. of working sets in 11g R2 = CPU_COUNT
     No. of DB writers in 11g R2 = CPU_COUNT/8
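These numbers can be cross-checked from the instance itself. A sketch, assuming access to the fixed table x$kcbwds (one row per working set descriptor, SYS only):

   SELECT COUNT(*) AS working_sets FROM x$kcbwds;

   SELECT name, value
   FROM   v$parameter
   WHERE  name IN ('cpu_count', 'db_writer_processes');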

Buffers in DB buffer cache are managed using various doubly linked lists:

- LRU list : contains free / dirty / pinned buffers. It has a

    . Least Recently Used (LRU) end, which has cold buffers that have not been accessed for a long time.

    . Most Recently Used (MRU) end, which has hot buffers that have been accessed recently.

  When a block is first read into the cache, its buffer is placed in the middle of the LRU list and moves towards the LRU or MRU end depending upon how frequently and how recently it has been accessed, which is in turn decided by the buffer's touch count. The touch count of a buffer is stored in its header. When a server process scans the LRU list to find free buffers, it moves buffers to the MRU or LRU end depending upon their touch count. Hence, it is possible that some buffers on the LRU end are not really cold; they may be there simply because no server process has got to them yet.

  As instance activity increases, dirty buffers on the LRU list need to be flushed to disk so that the buffers can be reused. Since blocks need to be written to disk in the order in which they were dirtied, another list is maintained, the checkpoint queue, which contains the dirty blocks in SCN order.

- Checkpoint queue : contains dirty buffers in the order in which they were dirtied (by SCN#). Whenever a block is dirtied, its buffer is attached here. A buffer can be simultaneously on both the LRU list and the checkpoint queue.

- Write list - Main : contains dirty buffers which have to be written by the DB writer to free up buffers in the buffer cache. When a server process scanning the LRU list for a free buffer finds dirty buffers, they are delinked from the LRU list and linked to this list.

- Write list - Aux : contains dirty buffers whose redo has already been written to the online redo log files and which are currently being written by the DB writer to the data files.
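The dirty buffer population that these lists manage can be sampled from V$BH -- a rough sketch, since the counts change constantly:

   SELECT dirty, COUNT(*)
   FROM   v$bh
   GROUP  BY dirty;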

In the next article I will discuss checkpointing.
----------------------------------------------------------------------------------------------------------

Related links:


HOME

BUFFER CACHE WAIT EVENTS
CASE STUDY ON BUFFER BUSY WAIT EVENT
CHECKPOINTS CONFUSIONS CLEARED
ORACLE CHECKPOINTS
ORACLE DATABASE BUFFER CACHE
SIMULATE BUFFER BUSY WAIT AND IDENTIFY HOT OBJECT
CONSISTENT READS IN ORACLE PART-I



CONSISTENT READS IN ORACLE PART-II 

FLUSH BUFFER CACHE

Friday, 24 February 2012

ORACLE 11GR2 DATABASE FLASH CACHE

Hi DBAs! I was working on the database flash cache and found it a very interesting feature of 11g, so I thought to share it with you as well. The database flash cache works as an L2 level of cache for the database and is very useful when memory becomes a bottleneck in an OLTP-kind of database. With this feature we can configure a flash drive (either one as cheap as a pen drive, or an expensive high quality flash drive) to store clean buffers that age out of the DB buffer cache.
NOTE: The database flash cache feature can only be configured on Oracle Enterprise Linux or Solaris.
To be able to use the database flash cache you should follow these steps.

1. Log in to the database and shut down the instance:

sql> shutdown immediate

2. Install patch 9352237 using the opatch utility:

[oracle@node1 9352237]$ /u01/app/oracle/product/11.2.0/db_home/OPatch/opatch apply

3. Install patch 8974084, which allows you to use the database flash cache on Oracle Enterprise Linux even if you don't have Exadata storage. The patch is named:

    8974084: META BUG FOR FLASH CACHE 11.2PL BUGS TO BACKPORT TO 11.2.0.1 OEL


Once you install the patches you can use any old flash device as a database flash cache. If you just want to experiment using a USB flash device, you first need to get that device mounted. On my test machine I created a directory "/mnt/usbflash" then created an /etc/fstab entry like this:

   /dev/sda1               /mnt/usbflash           vfat    noauto,users,rw,umask=0 0 0

On your system you might need to change "/dev/sda1" to another device depending on how your fixed disks are configured. You should then be able to mount the flash drive by typing "mount /dev/sda1". Make sure that the mount point is writable by oracle (chmod 777 /mnt/usbflash).
Once mounted, you configure the flash cache by setting the parameters DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE:

alter system set db_flash_cache_file='/mnt/usbflash/oraflashcache.dat';
alter system set db_flash_cache_size=1000M;

Note that the value of DB_FLASH_CACHE_FILE needs to be a file on the flash drive, not the flash drive mount point itself.

Once these parameters are set, the flash cache will be enabled and will act as a secondary cache to the buffer cache. When a block is aged out of the primary cache, it will still exist in the flash cache, and can be read back without a physical read to the spinning disk.
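A quick sanity check that the cache is actually configured (a simple sketch against V$PARAMETER):

   SELECT name, value
   FROM   v$parameter
   WHERE  name LIKE 'db_flash_cache%';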
Monitoring

There are a few ways to examine how the flash cache is being used. Firstly, V$SYSSTAT contains some new statistics showing the number of blocks added to the cache and the number of "hits" that were satisfied from the cache:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('physical read flash cache hits', 'physical reads',
                'consistent gets', 'db block gets', 'flash cache inserts');
Although flash reads save time when compared to db file sequential reads, the overhead of maintaining the cache can be high, because flash based SSD has a relatively severe write penalty.

All flash-based solid state disks have issues with write performance. However, cheap Multi Level Cell (MLC) flash takes about 3 times as long to write as the more expensive Single Level Cell (SLC). When a flash drive is new, the empty space can be written to in single page increments (usually 4KB). However, when the flash drive is older, writes typically require erasing a complete 128 page block, which is very much slower. My cheap USB drive was old and MLC, so it had very poor write performance. But even the best flash based SSD is going to be much slower for writes than for reads, and in some cases using a flash cache might slow a database down as a result. So monitoring is important.
There are a couple of other V$SYSSTAT statistics of interest:

select name, value from v$sysstat where name like '%flash cache%';
To examine the contents of the cache, we can examine the V$BH view. Buffers in the flash cache have STATUS values such as 'flashcur', allowing us to count the buffers from each object in the main buffer cache and in the flash cache:

SELECT owner || '.' || object_name AS object,
       SUM(CASE WHEN b.status LIKE 'flash%' THEN 1 END) AS flash_blocks,
       SUM(CASE WHEN b.status LIKE 'flash%' THEN 0 ELSE 1 END) AS cache_blocks,
       COUNT(*) AS total_blocks
FROM   v$bh b
       JOIN dba_objects ON (objd = object_id)
GROUP  BY owner, object_name
ORDER  BY 4 DESC;
Conclusion

The database flash cache is a cheap way to extend the buffer cache. We should be cautious, however, because the flash write penalty may result in performance problems.

Related links:

HOME



Monday, 9 January 2012

CASE STUDY ON BUFFER BUSY WAIT EVENT

PURPOSE
-------

This document discusses a rare and difficult to diagnose database performance problem characterized by extremely high buffer busy waits that occur at seemingly random times. The problem persists even after traditional buffer busy wait tuning practices are followed (typically, increasing the number of freelists for an object).
 
SCOPE & APPLICATION
-------------------

This document is intended for support analysts and customers.  It applies to 
both Unix and Windows-based systems, although the examples here will be 
particular to a Unix-based (Solaris) system.


In addition to addressing a specific buffer busy wait performance problem, in section II this document presents various techniques to diagnose and resolve this problem by using detailed data from a real-world example. The techniques illustrated here may be used to diagnose other I/O and performance problems.



RESOLVING INTENSE AND "RANDOM" BUFFER BUSY WAIT PERFORMANCE PROBLEMS
--------------------------------------------------------------------

This document is composed of two sections: a summary section that broadly discusses the problem and its resolution, and a detailed diagnostics section that shows how to collect and analyze various database and operating system diagnostics related to this problem. The detailed diagnostics section is provided to help educate the reader with techniques that may be useful in other situations.

I. Summary
~~~~~~~~~~

1. Problem Description
~~~~~~~~~~~~~~~~~~~~~~

At seemingly random times, without regard to overall load on the database, the following symptoms may be witnessed:

- Slow response times on an instance-wide level
- Long wait times for "buffer busy waits" in bstat/estat or StatsPack reports
- Large numbers of sessions waiting on buffer busy waits for a group of objects (identified in V$SESSION_WAIT)

Some tuning effort may have been spent in identifying the segments involved in the buffer busy waits and rebuilding those segments with a higher number of freelists or freelist groups (from 8.1.6 on, one can dynamically add process freelists; segments only need to be rebuilt if changing freelist groups). Even after adding freelists, the problem continues and is not diminished in any way (although regular, concurrency-based buffer busy waits may be reduced).

2. Problem Diagnosis
~~~~~~~~~~~~~~~~~~~~

The problem may be diagnosed by observing the following:

- The alert.log file shows many occurrences of ORA-600, ORA-7445 and core dumps during or just before the time of the problem.
- The core_dump_dest directory contains large core dumps during the time of the problem. There may either be many core dumps or a few very large core dumps (100s of MB per core file), depending on the size of the SGA.
- "sar -d" shows devices that are completely saturated and have high request queues and service times. These devices and/or their controllers are part of logical volumes used for database files.
- Buffer busy waits, write complete waits, db file parallel writes and enqueue waits are high (in the top 5 waits, usually in that order).

Note that in environments using Oracle Advanced Replication, the buffer busy waits may at times appear on segments related to replication (DEF$_AQCALL table, TRANORDER index, etc.).

3. Problem Resolution
~~~~~~~~~~~~~~~~~~~~~

The cause of the buffer busy waits and other related waits might be a saturated disk controller or subsystem impacting the database's ability to read or write blocks. The disk/controller may be saturated because of the many core dumps occurring simultaneously, each requiring hundreds of megabytes. If the alert.log or core_dump_dest directory has no evidence of core dumps, then the source of the I/O saturation must be found. It may be due to non-database processes, another database sharing the same filesystems, or a poorly tuned I/O subsystem.

The solution is as follows:

1) Find the root cause of the I/O saturation (core dumps, another process or database, or a poorly performing I/O subsystem) and resolve it.

OR,

2) If evidence of core dumps is found:

   - Find the causes of the core dumps and resolve them (patch, etc.)
   - Move the core_dump_dest location to a filesystem not shared with database files.
   - Use the following init.ora parameters to reduce or avoid the core dumps:

        shadow_core_dump = partial
        background_core_dump = partial

     These core dump parameters can also be set to "none", but this is not recommended unless the causes of the core dumps have been identified.

II. Detailed Diagnostics with Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section provides details on the diagnostic tools and methods used to resolve this problem and is intended to illustrate the use of various diagnostics that may be applied in other situations.

1. Diagnostics
~~~~~~~~~~~~~~

The following diagnostics will help fingerprint the problem:

- alert.log and trace files in the background_dump_dest, user_dump_dest, and core_dump_dest directories
- iostat or sar data (on Windows, the Performance Monitor utility with disk counters activated) with filesystem mapping tools or techniques (e.g., Veritas vxprint)
- StatsPack or BSTAT/ESTAT reports
- V$SESSION_WAIT

A. ALERT.LOG file
~~~~~~~~~~~~~~~~~

The alert.log file should always be checked in any database diagnostic effort. In this case, the following entries were created during the time of the buffer busy wait problem:

Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s008_2731.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [859063291] [] []
Wed May 16 00:38:15 2001
Errors in file /db_admin/prod/bdump/mlrep_s014_2737.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGSEGV] [Address not mapped to object] [926430458] [] []
Wed May 16 00:41:13 2001
Errors in file /db_admin/prod/bdump/mlrep_s005_930.trc:
ORA-07445: exception encountered: core dump [kzdurtc()+4] [SIGBUS] [Invalid address alignment] [1178678525] [] []

The stack traces corresponding to the core files indicated a problem during the login process. It was discovered that a core dump could be caused if users accidentally entered passwords greater than 48 characters. Furthermore, in this web-based application, users did not receive an error and would retry the login, producing another core dump with each click of the mouse.

B. SAR Diagnostics
~~~~~~~~~~~~~~~~~~

SAR, IOSTAT, or similar tools are critical to diagnosing this problem because they show the health of the I/O system during the time of the problem. The SAR data for this example is shown below (using "sar -d -f /var/adm/sa/sa16"):

SunOS prod1 5.6 Generic_105181-23 sun4u    05/16/01

01:00:00   device   %busy   avque   r+w/s   blks/s   avwait   avserv
           sd22       100    72.4    2100     2971      0.0     87.0
           sd22,c       0     0.0       0        0      0.0      0.0
           sd22,d       0     0.0       0        0      0.0      0.0
           sd22,e     100    72.4    2100     2971      0.0     87.0
                        /\
                        ||  extremely high queue values
                            (usually less than 2 during peak)

By mapping the sd22 device back to the device number (c3t8d0) and then back to the logical volume through to the filesystem (using "df" and Veritas' utility /usr/sbin/vxprint), it was determined that the filesystem shared the same controller (c3) as several database files (among them the datafiles for the SYSTEM tablespace). By looking within the filesystems on the aforementioned controller (c3), several very large (1.8 GB) core dumps were found in the core_dump_dest directory, produced around the time of the problem.

C. StatsPack
~~~~~~~~~~~~

During an episode of the performance problem, it is very helpful to collect samples using StatsPack. The interval between samples is ideally about 15 minutes, but even 1-hour intervals are acceptable. It is very valuable to also collect baseline samples of the same interval during similar activity levels when there is NOT a problem. By having the baseline, one will be able to see how certain statistics reflect the problem.

i. Wait Events

During the problem episode, "buffer busy waits" and "write complete waits" will be seen as the highest wait events. During the baseline load, these events are not significant. Other significant wait events during the problem may be "enqueue" and "db file parallel write". For example:

                                                     Total Wait  Avg wait  Waits
Event                            Waits   Timeouts     Time (cs)      (ms)   /txn
---------------------------- --------- ---------- ------------- --------- ------
buffer busy waits              225,559    211,961    24,377,029      1081    4.0
enqueue                         25,731     21,756     6,786,722      2638    0.5
Parallel Query Idle Wait - S     9,980      7,929     1,762,606      1766    0.2
SQL*Net message from dblink    435,845          0     1,288,965        30    7.7
db file parallel write           4,252          0     1,287,023      3027    0.1
write complete waits             5,872      5,658       581,066       990    0.1
db file sequential read      1,249,981          0       510,793         4   22.0

ii. Statistics

There may be evidence of DB Writer falling behind and slow write performance, as follows:

- a low number of DBWR timeouts
- the ratio (free buffers inspected)/(free buffers requested) * 100% > 5%
- a much higher number for "dirty buffers inspected" than the normal baseline

The following lists some key statistics to look at:

Statistic                        Total   per Second   per Trans
----------------------- -------------- ------------ -----------
consistent changes              43,523         12.1         2.4
free buffer inspected            6,087          1.7         0.3  <== much higher
free buffer requested          416,010        115.6        23.1      than normal
logons cumulative               15,718          4.4         0.9
physical writes                 24,757          6.9         1.4
write requests                     634          0.2         0.0

It is important to look at workload parameters to ensure the problem isn't due to simply more work being done. The statistics "consistent changes", "logons cumulative", "physical writes", and "write requests" are all compared to a baseline.

iii. Tablespace I/O Summary

The average wait times for tablespaces will be dramatically higher.

Tablespace IO Summary for DB: PROD  Instance: PROD  Snaps: 3578 - 3579

                       Avg Read                 Total   Avg Wait
Tablespace      Reads      (ms)      Writes     Waits       (ms)
----------- --------- --------- ----------- --------- ----------
BOM           482,368       7.0      18,865     3,161      205.9
CONF          157,288       0.6         420       779     9897.3  <= very high
CTXINDEX       36,628       0.5           7         4       12.5
RBS               613     605.7      23,398     8,253     7694.6  <= very high
SYSTEM         18,360       3.6         286        78      745.5
DB_LOW_DATA    16,560       2.6       1,335        14       24.3

Similar statistics per datafile may be seen in the "File IO Statistics" section of StatsPack.

iv. Buffer Busy Wait Statistics

Buffer wait Statistics for DB: PROD  Instance: PROD  Snaps: 3578 - 3579

                               Tot Wait        Avg
Class                 Waits   Time (cs)  Time (cs)
------------------ -------- ----------- ----------
data block          216,577  ##########        108  <== severe contention
undo header           5,072     609,734        120      ("off the scale")
undo block            3,770     333,876         89
free list                70      17,426        249
segment header            8          34          4

v. Enqueues

The StatsPack data for this case study shows many SQ (sequence) enqueue waits. SQ enqueues are held while Oracle updates the dictionary entry of the sequence with the next value (or the starting value for the next batch of sequence numbers to cache). The object id of the sequence is found by looking at the p2 column of V$SESSION_WAIT for 'enqueue' events and selecting from DBA_OBJECTS using that object_id (you must convert the value in the p2 column to decimal first). In this case it was for sys.audses$. This means the SYSTEM tablespace is being impacted by the I/O problems and is taking a long time to generate the next sequence number for "audses$".

Enqueue activity for DB: MLREP  Instance: MLREP  Snaps: 3578 - 3579

Enqueue        Gets      Waits
------- ----------- ----------
SQ            5,130      1,345  <== drill down on v$session_wait.p2 and
TX           80,735         63      DBA_OBJECTS for the object_id; in
SR            8,207          7      this case it was for sys.audses$
TM           93,225          4
CF            1,396          1

vi. Buffer Pool Statistics

The buffer pool statistics should be compared with the baseline. During the performance problem, write complete waits may be 10 to 100 times longer than during the baseline load (without the problem):

Buffer Pool Sets for DB: PROD  Instance: PROD  Snaps: 3578 - 3579

                                                   Free    Write    Buffer
Set    Buffer   Consistent   Physical   Physical  Buffer  Complete     Busy
 Id      Gets         Gets      Reads     Writes   Waits     Waits    Waits
--- --------- ------------ ---------- ---------- ------- --------- ---------
  1    99,132    4,060,929     97,235      2,860       0       384   148,348
  2    97,773    3,359,172     96,851      3,185       0       221       640
  3    97,320    3,486,183     96,592      3,014       0       303     1,024
  4    96,961    1,943,505     96,366      2,723       0       232       598
  5    97,182    1,508,089     96,223      2,798       0       107     5,731
...                                                      /\
                                                         || 10 times greater
                                                            than baseline

D. V$SESSION_WAIT
~~~~~~~~~~~~~~~~~

V$SESSION_WAIT can be used to see how many sessions are impacted by the buffer busy wait problem in real time, using a query as follows:

SELECT count(*), event
FROM   v$session_wait
WHERE  wait_time = 0
AND    event NOT IN ('smon timer', 'pmon timer', 'rdbms ipc message',
                     'SQL*Net message from client')
GROUP  BY event
ORDER  BY 1 DESC;

Output:

  COUNT(*) EVENT
---------- ----------------------------------------------------------------
       122 buffer busy waits
        15 enqueue
         5 db file parallel write
         3 SQL*Net message from dblink
         2 latch free
         1 SQL*Net more data to client

To see the file and block numbers in more detail:

SELECT count(*) num_waiters, p1 file#, p2 blk#, p3 class
FROM   v$session_wait
WHERE  event = 'buffer busy waits'
GROUP  BY p1, p2, p3;

NUM_WAITERS  FILE#    BLK#  CLASS
-----------  -----  ------  -----
         92      2   13487   1016
         73      2   27762   1016
         32      1   29697   1016

This shows that during the execution of the above query, there were 92 waiters for file 2, block 13487. To find the object name and type for the objects being waited on, use the following query (supplying the file numbers and block numbers shown above):

SELECT owner, segment_name, segment_type
FROM   dba_extents
WHERE  file_id = &file
AND    &blockid BETWEEN block_id AND block_id + blocks;

After querying against DBA_EXTENTS and supplying the file and block numbers, the following correlation can be made:

NUM_WAITERS  OWNER.SEGMENT_NAME     TYPE    FILE#    BLK#  CLASS
-----------  ---------------------  ------  -----  ------  -----
         92  SYSTEM.DEF$_TRANORDER  INDEX       2   13487   1016
         73  SYSTEM.DEF$_AQCALL     TABLE       2   27762   1016
         32  SYSTEM.DEF$_TRANORDER  INDEX       1   29697   1016

Normally, one would rebuild the above segments with more freelists or freelist groups (in 8.1.6 or higher you can add process freelists without rebuilding the segment) to reduce contention. However, in this case more freelists won't help.

2. Resolving the Problem
~~~~~~~~~~~~~~~~~~~~~~~~

With the above diagnostics, it was possible to conclude that the core dumps caused disk controller saturation and impeded the database from reading and writing blocks. By scanning back through the alert.log file and looking for occurrences of similar core dumps, it was possible to ascertain when the problems had appeared. Looking at StatsPack data for the same time period seen in the alert.log file, the buffer busy waits were indeed the highest wait event.

Since the customer could not change the application to limit the password length and there were no other places to put the core_dump_dest files, the following changes were made to the init.ora file:

    shadow_core_dump = partial
    background_core_dump = partial

These changes caused the core dumps to be created AFTER the Oracle processes detach from the SGA. The resulting core dumps were 24 MB on average instead of 1.8 GB. Later occurrences of core dumps were seen in the alert.log and cdump directory with no buffer busy wait side effects or performance degradation (and no I/O impact visible in sar data); this confirmed that the init.ora changes solved the problem.

---------------------------------------------------------------------------------------------------------


Friday, 30 December 2011

CHECKPOINTS CONFUSIONS CLEARED FROM KARAN

Well, a lot of DBAs in this world are still unaware of the checkpoint facts, so after having a lovely breakfast I decided to post on this....

  An RBA indicates a position in the redo log. CKPT records checkpoint information to the controlfile to maintain book-keeping information such as checkpoint progress. Each instance checkpoint refers to some RBA (called the checkpoint RBA) such that all redo prior to this RBA has been written to disk. Hence the recovery time is proportional to the difference between the checkpoint RBA and the end of the redo log.

A checkpoint proceeds in three phases. In phase 1, the process initiating the checkpoint marks the checkpoint RBA: the current RBA (the RBA of the last change made to any buffer) at the time the request is initiated. In phase 2, DBWR writes all required buffers, i.e. all buffers that were modified at RBAs less than or equal to the checkpoint RBA: given a checkpoint RBA, DBWR writes buffers from the head of the checkpoint queue until the low RBA of the buffer at the head of the queue is greater than the checkpoint RBA. In phase 3, after all required buffers have been written, the CKPT process records the completion of the checkpoint in the control file.

# The BCQ (buffer checkpoint queue) contains modified buffers linked in ascending order of their low RBAs. In response to a checkpoint request, buffers are written from the head of the checkpoint queue, i.e. in ascending order of low RBA values. After a buffer is written it is unlinked from the checkpoint queue.

# The ACQ (active checkpoint queue) contains active checkpoint requests. Whenever a checkpoint request is made, a new active checkpoint entry describing the request is added to the ACQ. Note that multiple checkpoint requests can be on the ACQ at the same time, e.g. datafiles have to be checkpointed before they are backed up, alter system checkpoint, or a log switch.

# Periodically, CKPT records this lowest low RBA to the controlfile (using a very lightweight controlfile update protocol). This periodically recorded lowest low RBA is the current position of the incremental checkpoint for this instance.

# So incremental checkpointing greatly improves crash recovery times with negligible impact on normal activity (see the query after this list).

# If DBWR continually writes from the head of the checkpoint queue, the instance checkpoint (the lowest low RBA of the modified buffers) keeps on advancing.
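The effect of incremental checkpointing on crash recovery can be watched through V$INSTANCE_RECOVERY -- a sketch (exact columns vary slightly between versions):

   SELECT recovery_estimated_ios, actual_redo_blks,
          target_redo_blks, target_mttr, estimated_mttr
   FROM   v$instance_recovery;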

The low and high RBAs for dirty buffers can be seen in X$BH. (There is also a recovery RBA which is used to record the progress of partial block recovery by PMON.) The incremental checkpoint RBA, the target RBA and the on-disk RBA can all be seen in X$TARGETRBA. The incremental checkpoint RBA and the on-disk RBA can also be seen in X$KCCCP. The full thread checkpoint RBA can be seen in X$KCCRT.

The checkpoint RBA is copied into the checkpoint progress record of the controlfile by the checkpoint heartbeat once every 3 seconds. Instance recovery, when needed, begins from the checkpoint RBA recorded in the controlfile. The target RBA is the point up to which DBWn should seek to advance the checkpoint RBA to satisfy instance recovery objectives.

The on-disk RBA is the point up to which LGWR has flushed the redo thread to the online log files. DBWn may not write a block for which the high RBA is beyond the on-disk RBA. Otherwise transaction recovery (rollback) would not be possible, because the redo needed to undo a change is always in the same redo record as the redo for the change itself.

The term sync RBA is sometimes used to refer to the point up to which LGWR is required to sync the thread. However, this is not a full RBA -- only a redo block number is used at this point.

-----------------------------------------------------------------------------------------------------------
