Pages

Monday, 2 July 2012

ORACLE DATABASE BUFFER CACHE




                      BUFEFR CACHE ARCHITECTURE



Database buffer cache is a memory structure in SGA. Buffers in DB cache are used to hold blocks of data read from the data files.Each buffer is sized to hold one database block. Oracle allows for a total of 8 different caches :

- 5 caches to allow different block sizes
- 3 caches to allow different behaviour

BUFFERS TO ALLOW DIFFRENT BLOCK SIZES :

- DEFAULT CACHE : To hold blocks of size DB_BLOCK_SIZE. Its size is determined by the parameter DB_CACHE_SIZE

- DB NK CACHE : To hold blocks of size = nK where n = 2,4,8,16,32 where n is not equal the size of the default block (DB_BLOCK_SIZE).  Its size is determined by the parameter DB_NK_CACHE_SIZE

BUFFERS TO ALLOW DIFFRENT BEHAVIOUR :

- DEFAULT CACHE : Cache where blocks of those objects are stored which are not assigned to keep or recycle cache or assigned default cache.

- KEEP CACHE : Cache to hold blocks of small sized objects for a longer time.Its size is determined by the parameter DB_KEEP_CACHE_SIZE.

- RECYCLE CACHE : Cache designed to quickly age out blocks of rarely accessed large sized objects. Its size is determined by the parameter DB_RECYCLE_CACHE_SIZE.

Each buffer in any of the above caches can be in any of the following states :

- FREE : Free buffers can be further of two kinds:
         - Empty or unused
         - Buffers which are identical to the blocks on disk because
           . they have not been dirtied or
           . they were dirtied and have been written to disk

- PINNED : A buffer being accessed (read/written) by a process.

- DIRTY : A buffer which holds a block that has been changed by a server process and hence needs to be writted to the disk before it can be reused.

Buffer headers store metadata about contents of the buffers and are used to manage the cache.The metadata includes information like Data block address (DBA) of the block, type of block (data,undo), touch count, pointer to the buffer, pointers to previous/next buffer in cache bufferchain etc. There is one row in V$BH for every buffer. Buffer headers keep track of various attributes and state of the buffers in buffer cache. The blocks are placed in the buffer cache in the form of linked lists (cache buffer chains) in different hash buckets. While searching for blocks in buffer cache oracle does not look at them buffer by buffer as it will bring down the performance drastically. Let us take the example of a library where we are looking for a blook. If we search the whole library book by book it will take very long but if the books are placed subject wise, we first need to go the rack containing books of the desired subject and then we search for the book we want. Our search will be over rather quickly. In the same manner, buffers are placed in different hash buckets depending upon the hash value of the data block address (DBA). While searching for a particular block, Oracle first hashes the DBA of the desired block, locates the hash bucket containing the block and then searches the blocks in that hash bucket. Different DBAs can hash to the same value (similar to modulus function i.e mod (5,4) = mod (9,4) = mod(13,4)....). Hence, one hash bucket can contain blocks with different DBA's. Each hash chain is protected by a latch. Processes need to get the relevant latch to allow them to scan a hash chain for a buffer so that while searching, the linked list underneath does not change. These latches are called cache buffer chain latches CBC latches). One CBC latch protects multiple hash buckets.

   To implement read consistency, a hash chain can also contain multiple versions of the  same block (DBA). For each DBA in a bucket, there is at least one current block which contains all the committed/uncommitted changes in different rows in that block made by different users. When a user requests a row from that block, he should receive a read consistent copy of the block i.e. the copy should have all the changes (committed / uncommitted)  made by him and only the committed changes made by other users. If the desired read consistent clone is already there, it is read and the user gets to see the results. If the read consistent clone is not there, a copy of the current block is made and marked as clone. Changes are rolled back on the clone after reading undo and then the user os displayed the results. In this case, No. of consistent gets are higher. If a block is rapidly changed and is rapidly queried, multiple CR copies can exist in buffer cache. Such a  block is called a hot block. All CR copies have the same DBA and hence go to the same hash bucket (CBC). The CBC gets longer and hence takes longer to scan. Multiple users trying to scan the CBC chain may have to wait to obtain the latch on the hash bucket. This can result in contention on CBC latch.

          CBC latch    Hash bucket          Buffer hash chains of           Buffers
                                                           buffer headers

            L1 +------- B1 ----------------- A <-> C                                     A
               |                                                                                         B
               +------- B2 ----------------- B <-> F <-> E <-> B                    C
                                                        |_____________|                     D
                                                                         |                               E
                                                                CR CLONES                     F

            L2 +------- B3
               |
               +------- B4

            L3 +------- B5
               |
               +------- B6

No. of buckets protected by each latch = 32 - is high so that

1. Hash chains linked to a bucket will be smaller.

2. No. of latches will be less since latch is a fairly large structure with along code path.

No. of hash buckets is decided by an uncodumented parameter _db_block_hash_buckets


  Whenever any block is modified, changes are always made on the current copy of the block and the statistic db block gets is incremented. Th information in hash buckets is actually the linked list of buffer headers pointing to blocks in DB buffercache. The latch on the hash chain can be obtained in

- SHARED MODE: to scan the buffers

- EXCLUSIVE MODE: To link/delink blocks to/from  the chain

To know more about consistent reads in oracle click the following link: http://koenigocm.blogspot.in/search/label/CONSISTENT%20READS%20IN%20ORACLE

                ALGORITHM FOR HASH LOOKUP
         ---------------------------------------

1. Hash the DBA (Tablespace-id + File-id + Block-id)

2. Find the corresponding hash bucket .

3. Try to get the latch protecting the hash bucket

   If not success

      - Spin for spin count times

        . If latch was not got while spinnning,
              then sleep, wakeup and go to step 3(try to get latch)

          else (latch obtained)
              go to step 4(walk the chain)

   else ( latch obtained)
       Go to step 4 (walk the chain)

4. Walk the chain, reading buffer headers to see if specific version of the buffer is already in the chain.

   If found (logical I/O)

      Access the buffer in buffer cache with protection of buffer pin/unpin action.

   else (not found) (Physical I/O)

      - find a free buffer in buffer cache (obtain LRU latch)
      - unlink the buffer header for that buffer from its current chain (LRU chain)
      - Link that buffer header with this buffer chain
      - Release the latch and read the block into that free buffer with buffer header unpinned.
   
                        WORKING SETS

Buffers in the buffer cache are divided into working sets to improve scalability of common buffer cache operation. Each working set has its own LRU list and two check point queues.

- One DB writer may be responsible to write buffers in multiple working sets. Therefore, write I/O workload can be distributed among various DB writers.

- Since each working set has  its own LRU list, competition between processes trying to look for free buffers is reduced as differebt readers pick buffers from different working sets.

     No. of working sets in 11g R2 = CPU_COUNT
     No. of DB writers in 11g R2 = CPU_COUNT/8

Buffers in DB buffer cache are managed using various doubly linked lists:

- LRU list : contains list of free / dirty / pinned buffers. It has a

    . Least Recently used (LRU) end which has cold buffers which have not been accessed for a long time.

    . Most recently used (MRU) end which have hot buffers which have been recently accessed.

  When a block is first read into the buffer, it is placed in the middle of the LRU list and moved to the LRU/MRU end depeneding upon how frequently and how recently it has been accessed which is further decided by a buffer's touch count. The touch count of a buffer is stored in its header .When a server process scans the LRU list to find free buffers, it moves the buffers to the MRU/LRU end depending upon its touch count. Hence, it is possible that some buffers on the LRU end might not be really cold. They may be there because they have not been searched by a server process yet.

  As instance activity increases, dirty buffers on LRU need to be flushed to the disk so that they can be reused. Since blocks need to be written to disk in the order in which they were dirtied, anothet list is maintained i.e.e checkpoint queue which contains dirty blocks in the order of SCN#.

- Checkpoint queue : contains dirty buffers in the order in which they were dirtied (SCN#). Whenever a block is dirtied, it is attached here. A buffer can be simlultaneously in both LRU list and checkpoint queue.

- Write list - Main : contains dirty buffers which have to be written by DB writer to free some buffers in the buffer cache. When a Server process is scanning LRU list for a free buffer, dirty buffers in LRU list are  delinked from LRU list before linking them to this list.

- Write list - Aux : contains dirty buffers  to which have already been written to the online redo log files and are currently being written by DB writer to data files.

In the next article I will discuss about checkpointing.
----------------------------------------------------------------------------------------------------------

Related links:


HOME

BUFFER CACHE WAIT EVENTS
CASE STUDY ON BUFFER BUSY WAIT EVENT
CHECKPOINTS CONFUSIONS CLEARED
ORACLE CHECKPOINTS
ORACLE DATABASE BUFFER CACHE
SIMULATE BUFFER BUSY WAIT AND IDENTIFY HOT OBJECT
 CONSISTENT READS IN ORACLE PART-I
http://koenigocm.blogspot.in/search/label/CONSISTENT%20READS%20IN%20ORACLE



CONSISTENT READS IN ORACLE PART-II 

FLUSH BUFFER CACHE

4 comments:

Unknown said...

Thanks M'aam for such a nice explanation...

Anonymous said...

Great info! Thanks for sharing this!

Avinash said...

Do you mean Working set as Hash Bucket ??
Just to clear my doubt. Thanks in Advance..

uj said...

Hi,

Can you please let me know if a user runs a query to fetch two columns from a table and another user fetches one column from the same table will the data blocks for second query will be written to the database buffer cache again?